Welcome to the google-cloud monorepo. This repository serves as a consolidated hub for various data engineering, AI/ML, and cloud architecture solutions built on Google Cloud Platform.
Each project within this monorepo is designed to solve specific challenges ranging from industrial data harmonization and natural language analysis to real-time data security and advanced BigQuery integrations.
| Project | Description | Technology Stack |
|---|---|---|
| vector-matching-rrf-pipeline | Industrial parts catalog harmonization using Vector Search and AI reasoning agents. | BigQuery ML, Vector Search, Go, Vertex AI |
| data-insights-agent | Natural language data analysis tool that enables querying BigQuery using plain English. | Google ADK, FastAPI, React, Vertex AI |
| biglake-iceberg-pipeline | Event-driven data pipeline using BigQuery Iceberg tables and AI-powered cleaning agents. | BigLake, Iceberg, Cloud Run, Gemini |
| google-cloud-bigquery-pii-masking-pipeline | Real-time PII masking pipeline for BigQuery data using Cloud DLP. | Dataflow, Cloud DLP, BigQuery |
| bigquery-gemini-with-remote-functions | Implementation of calling Gemini LLM models directly from BigQuery SQL. | BigQuery Remote Functions, Cloud Functions, Gemini |
| bigquery-fuzzy-match-embeddings-example | Fuzzy record matching and customer deduplication using BigQuery embeddings. | BigQuery ML, Text Embeddings |
vector-matching-rrf-pipeline: A dual-architecture approach to harmonizing disparate industrial parts catalogs. It uses BigQuery's native Vector Distance indexing for bulk resolution and an autonomous Go-based reasoning agent for complex edge-case verification.
data-insights-agent: An intelligent interface for data analysis. It uses the Google Agent Development Kit (ADK) to turn natural language questions into optimized SQL queries and returns insights and visualizations.
biglake-iceberg-pipeline: Demonstrates modern open-table formats on Google Cloud. This pipeline processes data into Iceberg format using BigLake, with an integrated AI agent that handles data quality checks and automated schema alignment.
google-cloud-bigquery-pii-masking-pipeline: A security-first data pipeline that automatically detects and masks sensitive PII (Personally Identifiable Information) in real time as data flows through BigQuery, supporting compliance with privacy regulations.
bigquery-gemini-with-remote-functions: A practical example showing how to extend BigQuery SQL with generative AI capabilities. By using remote functions, you can invoke Vertex AI's Gemini models directly within your SQL queries for text analysis, summarization, or translation.
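
To make the remote-function idea more concrete, here is a minimal sketch of the request and response shape a remote function endpoint handles: BigQuery sends a JSON body whose `calls` field holds one argument list per row, and expects a `replies` list of the same length and order. This is not the code from the project. The names `handle_remote_function_request` and `fake_gemini` are hypothetical, and `fake_gemini` is a stub standing in for a real model call so that the example runs without credentials.

```python
from typing import Any, Callable, Dict, List


def fake_gemini(prompt: str) -> str:
    """Hypothetical stand-in for a real Gemini call; it never contacts a model."""
    return f"[summary of {len(prompt)} chars]"


def handle_remote_function_request(
    body: Dict[str, Any],
    generate: Callable[[str], str] = fake_gemini,
) -> Dict[str, List[str]]:
    """Map a BigQuery remote-function request body to a response body.

    BigQuery batches rows into body["calls"], one argument list per row,
    and expects "replies" with exactly one entry per call, in order.
    """
    replies = []
    for args in body.get("calls", []):
        # Assumption: the SQL function takes a single STRING argument.
        prompt = "" if args[0] is None else str(args[0])
        replies.append(generate(prompt))
    return {"replies": replies}


if __name__ == "__main__":
    request_body = {"calls": [["first review text"], ["second review"]]}
    print(handle_remote_function_request(request_body))
```

In a deployed function, the stub would be replaced by an actual model client, and the returned dictionary would be serialized as the JSON HTTP response.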
bigquery-fuzzy-match-embeddings-example: A specialized example for data deduplication. It uses BigQuery ML to generate embeddings for customer records and calculates similarity scores to find and match inconsistent records (e.g., spelling variations, different formats).
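
The similarity-scoring step can be illustrated outside BigQuery with a small sketch. It assumes cosine similarity as the score and a fixed threshold of 0.95, neither of which is confirmed by the project itself. The three-dimensional vectors are made-up toy values, not real embeddings, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.95,  # Assumed cut-off; tune on real data.
) -> List[Tuple[str, str, float]]:
    """Compare every pair of records and keep those at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    # Highest-scoring candidate matches first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    toy_embeddings = {
        "cust_1": [0.90, 0.10, 0.00],  # e.g. "Jon Smith"
        "cust_2": [0.88, 0.12, 0.01],  # e.g. "John Smith"
        "cust_3": [0.00, 0.20, 0.95],  # an unrelated record
    }
    for id_a, id_b, score in find_duplicate_pairs(toy_embeddings):
        print(id_a, id_b, round(score, 4))
```

The all-pairs loop is quadratic in the number of records, so it only suits small samples. At catalog scale the comparison would normally be pushed down into the warehouse.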
Each directory is a self-contained project. Navigate to the individual project folders to find specific deployment guides, prerequisites, and documentation.

    # Example: Navigate to the Data Insights Agent
    cd data-insights-agent

Most projects within this monorepo are licensed under the MIT License. Please check the LICENSE file within each project directory for specific terms.