Project: Setting Up Cloud Storage and Connecting a Database with DBeaver

Objective:

The goal of this project is to set up a cloud storage service (AWS S3 or Azure Blob Storage), create a PostgreSQL database, and connect to it using DBeaver for managing and querying the data. This project will help you understand how to configure cloud storage, set up a relational database, and use a SQL client to interact with the database.


Steps to Complete the Project:

1. Set Up Cloud Storage (AWS S3 or Azure Blob Storage)

Using AWS S3:

  • Create an AWS account (if you don’t have one).
  • Go to the S3 dashboard and create a new S3 bucket.
    • Set a unique bucket name and choose a region.
    • Leave default settings for now.
  • Upload a sample file (e.g., a CSV file or any dataset) to your S3 bucket.
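    • Example Python code (a minimal sketch) for uploading a local file to the bucket with boto3; the bucket name and file paths below are placeholders to replace with your own:
      import boto3
      
      # assumes AWS credentials are already configured (e.g., via `aws configure`)
      s3 = boto3.client('s3')
      bucket_name = 'your-bucket-name'            # the bucket created above
      local_file_path = '/path/to/your-file.csv'  # local file to upload
      object_key = 'your-file.csv'                # key (name) the object will have in S3
      
      # upload_file streams the local file into the bucket under the given key
      s3.upload_file(local_file_path, bucket_name, object_key)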

OR

Using Azure Blob Storage:

  • Create an Azure account (if you don’t have one).
  • Go to the Azure Portal and create a Storage Account.
    • Choose the appropriate region and resource group.
    • Once created, navigate to the Blob Storage section and create a container.
  • Upload a sample file (e.g., a CSV file) to your Azure Blob Storage.
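    • Example Python code (a minimal sketch) for uploading a local file with the azure-storage-blob library; the connection string, container name, and paths are placeholders:
      from azure.storage.blob import BlobServiceClient
      
      connection_string = "your_connection_string"  # from the storage account's Access keys
      container_name = "your_container_name"
      blob_name = "your-file.csv"                   # name the blob will have in the container
      local_file_path = "/path/to/your-file.csv"
      
      blob_service_client = BlobServiceClient.from_connection_string(connection_string)
      blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
      
      # upload_blob streams the file into the container; overwrite=True replaces an existing blob
      with open(local_file_path, "rb") as data:
          blob_client.upload_blob(data, overwrite=True)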

2. Set Up PostgreSQL Database

  • Install PostgreSQL locally on your machine or use a managed cloud database service such as AWS RDS or Azure Database for PostgreSQL.

    • For local installation:
      • Windows: Download the installer from the official PostgreSQL website.
      • Mac: Use Homebrew (brew install postgresql).
      • Linux: Use your distribution’s package manager (e.g., sudo apt-get install postgresql on Debian/Ubuntu).
  • Create a PostgreSQL database named test_db (or any other name).

    • Connect to the database using the psql terminal.
    • Create a simple table to store data (e.g., a table for storing basic product information):
      CREATE TABLE products (
          id SERIAL PRIMARY KEY,
          name VARCHAR(100),
          price DECIMAL
      );
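    • (Optional) Example Python code (a minimal sketch, assuming the psycopg2 library is installed) that creates the same table programmatically instead of through psql; the password is a placeholder:
      import psycopg2
      
      # connect to the test_db database created above
      conn = psycopg2.connect(
          host="localhost",
          port=5432,
          dbname="test_db",
          user="postgres",
          password="your_password",  # replace with your PostgreSQL password
      )
      
      # the connection context manager commits the transaction on success
      with conn, conn.cursor() as cur:
          cur.execute("""
              CREATE TABLE IF NOT EXISTS products (
                  id SERIAL PRIMARY KEY,
                  name VARCHAR(100),
                  price DECIMAL
              );
          """)
      conn.close()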

3. Install and Configure DBeaver

  • Download and install DBeaver (a SQL client tool that connects to databases).

    • Go to the DBeaver website and download the version compatible with your operating system.
  • Open DBeaver and create a new connection to the PostgreSQL database:

    • Select PostgreSQL as the database type.
    • Enter the database connection details (host, port, username, password, and database name).
      • For local PostgreSQL installation, the default values are typically:
        • Host: localhost
        • Port: 5432
        • Username: postgres
        • Password: Your PostgreSQL password
        • Database: test_db

4. Connect Cloud Storage with PostgreSQL

  • (Optional) For AWS S3: Use a tool like boto3 (AWS SDK for Python) to interact with the files stored in your S3 bucket. You could upload a CSV file and load it into your PostgreSQL database using Python.

    • Example Python code using boto3 to download a file from S3:
      import boto3
      
      s3 = boto3.client('s3')
      bucket_name = 'your-bucket-name'
      file_key = 'your-file.csv'
      local_file_path = '/path/to/save/file.csv'
      
      s3.download_file(bucket_name, file_key, local_file_path)
  • (Optional) For Azure Blob Storage: Use the azure-storage-blob Python library to interact with Azure Blob Storage.

    • Example Python code to download a file from Azure Blob Storage:
      from azure.storage.blob import BlobServiceClient
      
      connection_string = "your_connection_string"
      container_name = "your_container_name"
      blob_name = "your-file.csv"
      download_path = "/path/to/save/file.csv"
      
      blob_service_client = BlobServiceClient.from_connection_string(connection_string)
      container_client = blob_service_client.get_container_client(container_name)
      blob_client = container_client.get_blob_client(blob_name)
      
      with open(download_path, "wb") as download_file:
          download_file.write(blob_client.download_blob().readall())
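  • (Optional) After downloading the CSV with either snippet above, you can load it into the products table from Python, as mentioned for AWS S3.

    • Example Python code (a minimal sketch, assuming the psycopg2 library and a CSV file whose columns match name and price; adjust the column list to your file):
      import psycopg2
      
      conn = psycopg2.connect(
          host="localhost", port=5432, dbname="test_db",
          user="postgres", password="your_password",  # placeholder credentials
      )
      
      # COPY ... FROM STDIN streams the CSV rows into the table; HEADER skips the first line
      with conn, conn.cursor() as cur, open("/path/to/save/file.csv") as f:
          cur.copy_expert(
              "COPY products (name, price) FROM STDIN WITH (FORMAT csv, HEADER true)",
              f,
          )
      conn.close()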

5. Use DBeaver to Interact with PostgreSQL

  • Open DBeaver and connect to your PostgreSQL database.
  • Execute basic SQL queries such as:
    • Inserting data:
      INSERT INTO products (name, price) VALUES ('Laptop', 1000);
    • Querying the data:
      SELECT * FROM products;
  • You can now use DBeaver to perform other SQL operations like creating new tables, updating data, etc.

6. (Optional) Data Import from CSV

If you’ve uploaded a CSV file to your cloud storage (S3 or Azure Blob), you can use DBeaver to import this file into your PostgreSQL database:

  • In DBeaver, right-click on the table where you want to import data and select Import Data.
  • Choose the CSV file from your local machine (after downloading from cloud storage).
  • Map the CSV columns to the corresponding table columns.

Deliverables:

  1. Screenshots of the cloud storage (AWS S3 or Azure Blob) with uploaded files.
  2. PostgreSQL database schema (SQL script) for the products table.
  3. A Python script for interacting with the cloud storage and PostgreSQL database (if applicable).
  4. DBeaver connection details and queries performed on the database.
  5. A brief report documenting the steps taken and any challenges faced.

Skills Gained:

  • Configuring cloud storage (AWS S3 or Azure Blob Storage).
  • Setting up and connecting to a PostgreSQL database.
  • Using DBeaver as a SQL client for managing and querying data.
  • Integrating cloud storage with PostgreSQL (optional, but adds a real-world dimension).

This is a simple and effective project that will help learners get hands-on experience with cloud services, databases, and SQL client tools while reinforcing key data engineering concepts.