The goal of this project is to set up a cloud storage service (AWS S3 or Azure Blob Storage), create a PostgreSQL database, and connect to it using DBeaver for managing and querying the data. This project will help you understand how to configure cloud storage, set up a relational database, and use a SQL client to interact with the database.
Using AWS S3:
- Create an AWS account (if you don’t have one).
- Go to the S3 dashboard and create a new S3 bucket.
- Set a unique bucket name and choose a region.
- Leave default settings for now.
- Upload a sample file (e.g., a CSV file or any dataset) to your S3 bucket.
OR
Using Azure Blob Storage:
- Create an Azure account (if you don’t have one).
- Go to the Azure Portal and create a Storage Account.
- Choose the appropriate region and resource group.
- Once created, navigate to the Blob Storage section and create a container.
- Upload a sample file (e.g., a CSV file) to your Azure Blob Storage.
-
Install PostgreSQL locally on your machine or use a cloud database provider like AWS RDS or Azure PostgreSQL.
- For local installation:
- Windows: Download the installer from the official PostgreSQL website.
- Mac: Use Homebrew (
brew install postgresql). - Linux: Use the package manager (
sudo apt-get install postgresql).
- For local installation:
-
Create a PostgreSQL database named
test_db(or any other name).- Connect to the database using the
psqlterminal. - Create a simple table to store data (e.g., a table for storing basic product information):
CREATE TABLE products ( id SERIAL PRIMARY KEY, name VARCHAR(100), price DECIMAL );
- Connect to the database using the
-
Download and install DBeaver (a SQL client tool that connects to databases).
- Go to DBeaver website and download the version compatible with your operating system.
-
Open DBeaver and create a new connection to the PostgreSQL database:
- Select PostgreSQL as the database type.
- Enter the database connection details (host, port, username, password, and database name).
- For local PostgreSQL installation, the default values are typically:
- Host:
localhost - Port:
5432 - Username:
postgres - Password: Your PostgreSQL password
- Database:
test_db
- Host:
- For local PostgreSQL installation, the default values are typically:
-
(Optional) For AWS S3: Use a tool like
boto3(AWS SDK for Python) to interact with the files stored in your S3 bucket. You could upload a CSV file and load it into your PostgreSQL database using Python.- Example Python code using
boto3to download a file from S3:import boto3 s3 = boto3.client('s3') bucket_name = 'your-bucket-name' file_key = 'your-file.csv' local_file_path = '/path/to/save/file.csv' s3.download_file(bucket_name, file_key, local_file_path)
- Example Python code using
-
(Optional) For Azure Blob Storage: Use the
azure-storage-blobPython library to interact with Azure Blob Storage.- Example Python code to download a file from Azure Blob Storage:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient connection_string = "your_connection_string" container_name = "your_container_name" blob_name = "your-file.csv" download_path = "/path/to/save/file.csv" blob_service_client = BlobServiceClient.from_connection_string(connection_string) container_client = blob_service_client.get_container_client(container_name) blob_client = container_client.get_blob_client(blob_name) with open(download_path, "wb") as download_file: download_file.write(blob_client.download_blob().readall())
- Example Python code to download a file from Azure Blob Storage:
- Open DBeaver and connect to your PostgreSQL database.
- Execute basic SQL queries such as:
- Inserting data:
INSERT INTO products (name, price) VALUES ('Laptop', 1000);
- Querying the data:
SELECT * FROM products;
- Inserting data:
- You can now use DBeaver to perform other SQL operations like creating new tables, updating data, etc.
If you’ve uploaded a CSV file to your cloud storage (S3 or Azure Blob), you can use DBeaver to import this file into your PostgreSQL database:
- In DBeaver, right-click on the table where you want to import data and select Import Data.
- Choose the CSV file from your local machine (after downloading from cloud storage).
- Map the CSV columns to the corresponding table columns.
- Screenshots of the cloud storage (AWS S3 or Azure Blob) with uploaded files.
- PostgreSQL database schema (SQL script) for the
productstable. - A Python script for interacting with the cloud storage and PostgreSQL database (if applicable).
- DBeaver connection details and queries performed on the database.
- A brief report documenting the steps taken and any challenges faced.
- Configuring cloud storage (AWS S3 or Azure Blob Storage).
- Setting up and connecting to a PostgreSQL database.
- Using DBeaver as a SQL client for managing and querying data.
- Integrating cloud storage with PostgreSQL (optional, but adds a real-world dimension).
This is a simple and effective project that will help learners get hands-on experience with cloud services, databases, and SQL client tools while reinforcing key data engineering concepts.