jesusdecastro/ingestion-config-agent

ODCS Agent - AI-Powered DataContract Assistant

WARNING - Proof of Concept: This is an experimental POC for exploring AI-powered ODCS DataContract generation. The code is functional but not production-ready. Expect rough edges, incomplete error handling, and areas needing refactoring.

POC Disclaimer

This is a Proof of Concept, not production software.

What this means:

  • Functional: Core features work and demonstrate the concept
  • Incomplete: Missing comprehensive error handling and edge-case coverage
  • Unoptimized: Performance and scalability not prioritized
  • Limited Testing: Basic testing only, no comprehensive test suite
  • Evolving: APIs and architecture may change significantly
  • Documentation: May lag behind code changes

Use for:

  • Exploring AI-powered contract generation
  • Understanding RAG system integration
  • Evaluating Strands Agents framework
  • Learning ODCS DataContract structure

Do NOT use for:

  • Production deployments
  • Critical business processes
  • Sensitive data processing
  • High-availability requirements

Overview

ODCS Agent is an intelligent assistant that helps data engineers create, validate, and manage ODCS DataContracts for Databricks and AWS environments. It combines:

  • AI-Powered Generation: Uses Claude Sonnet 4.5 via AWS Bedrock to generate production-ready contracts
  • RAG System: Retrieves relevant ODCS specifications and examples using FAISS vector search
  • Real-time Validation: Validates contracts against Pydantic models with immediate feedback
  • Interactive IDE: Web-based editor with proposal system for reviewing and accepting changes
  • Multi-language Support: Responds in English, Spanish, or French based on user input

What is ODCS?

ODCS (Open Data Contract Standard) is a specification for defining data contracts that describe:

  • Data structure and schema
  • Data quality rules
  • Ownership and governance
  • Ingestion schedules and sources
  • Transformation logic

Key Features

Intelligent Agent

  • Context-Aware: Understands current editor content and conversation history
  • Tool-Augmented: Uses specialized tools for documentation search, validation, and template loading
  • Streaming Responses: Real-time feedback as the agent generates contracts
  • Multi-turn Conversations: Maintains context across multiple interactions

RAG Knowledge Base

  • 23 ODCS Documents: Comprehensive specifications, examples, and best practices
  • Semantic Search: FAISS vector store with Bedrock Titan embeddings (1024 dimensions)
  • Quality Scoring: Ranks results by relevance and confidence
  • Auto-Reingestion: Easy updates when documentation changes
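
The retrieval idea above can be sketched without any external services. This is a toy illustration only: it swaps the FAISS index for a brute-force scan and the Bedrock Titan embeddings for bag-of-words counts, and the document names and texts are hypothetical rather than taken from the project's knowledge base.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for real embedding vectors."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Return the k document names most similar to the query, best first."""
    q = embed(query)
    scored = [(name, cosine(q, embed(body))) for name, body in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical knowledge-base snippets, not the project's actual documents.
DOCS = {
    "schema_guide": "schema section defines tables columns and data types",
    "quality_guide": "quality rules check null values and freshness",
    "ownership_guide": "ownership section lists team and steward contacts",
}

results = search("which quality rules check null values", DOCS)
print(results[0][0])
```

The similarity score doubles as a simple relevance ranking; a real vector store would replace the linear scan with an index lookup.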

Validation System

  • Pydantic Models: Strict validation against ODCS v1.0.0 schema
  • Detailed Errors: Clear, actionable error messages with fix suggestions
  • Field-Level Validation: Validates individual sections and fields
  • Production-Ready: Ensures generated contracts are immediately usable
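
As a rough illustration of field-level validation with actionable messages, here is a standard-library sketch. It is not the project's Pydantic models: the `ValidationIssue` type, the `validate_contract` helper, and the required field names are all hypothetical, chosen only to show the shape of the feedback.

```python
from dataclasses import dataclass


@dataclass
class ValidationIssue:
    field: str       # name of the offending field
    message: str     # what is wrong
    suggestion: str  # how to fix it


# Hypothetical required fields; the real ODCS models define their own.
REQUIRED_FIELDS = {"name": str, "version": str, "owner": str, "schema": list}


def validate_contract(contract: dict) -> list[ValidationIssue]:
    """Collect every field-level problem instead of stopping at the first."""
    issues = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in contract:
            issues.append(ValidationIssue(
                field,
                "field is missing",
                f"add a '{field}' entry of type {expected.__name__}",
            ))
        elif not isinstance(contract[field], expected):
            actual = type(contract[field]).__name__
            issues.append(ValidationIssue(
                field,
                f"expected {expected.__name__}, got {actual}",
                f"change '{field}' to a {expected.__name__}",
            ))
    return issues


# A draft with a wrongly typed version and no owner.
draft = {"name": "orders", "version": 1, "schema": []}
issues = validate_contract(draft)
for issue in issues:
    print(f"{issue.field}: {issue.message} ({issue.suggestion})")
```

Returning all issues at once, rather than raising on the first, is what lets an editor show every problem in a single pass.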

Interactive IDE

  • Monaco Editor: Full-featured YAML editor with syntax highlighting
  • Proposal System: Review and accept/reject AI-generated changes
  • Diff View: See exactly what changed before accepting
  • Session Management: Save and load work sessions
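
The proposal and diff flow can be approximated with Python's standard `difflib`. This is a sketch under assumptions: the frontend presumably uses Monaco's own diff view, and the `propose_diff` and `apply_decision` helpers and the YAML content below are hypothetical.

```python
import difflib


def propose_diff(current: str, proposed: str) -> str:
    """Render a unified diff between the editor content and a proposal."""
    lines = difflib.unified_diff(
        current.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile="editor",
        tofile="proposal",
    )
    return "".join(lines)


def apply_decision(current: str, proposed: str, accepted: bool) -> str:
    """Keep the current text unless the user accepts the proposal."""
    return proposed if accepted else current


# Hypothetical contract fragments for illustration.
current = "name: orders\nversion: 0.1.0\n"
proposed = "name: orders\nversion: 0.2.0\nowner: data-team\n"

diff_text = propose_diff(current, proposed)
print(diff_text)
```

Keeping the proposal separate from the editor content until it is accepted means a rejected change never touches the user's work.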

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     React Frontend (Vite)                    │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │ Monaco Editor│  │ Chat Interface│  │ Session Mgmt │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTP/REST
┌────────────────────────┴────────────────────────────────────┐
│                   Node.js API Server                         │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              FastAPI Python Bridge                    │   │
│  └────────────────────┬─────────────────────────────────┘   │
└───────────────────────┴─────────────────────────────────────┘
                        │
┌───────────────────────┴─────────────────────────────────────┐
│                   ODCS Agent (Strands)                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │ RAG System   │  │ Validation   │  │ Templates    │      │
│  │ (FAISS)      │  │ (Pydantic)   │  │ (YAML)       │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────┴────────────────────────────────────┐
│                    AWS Bedrock                               │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Claude Sonnet 4.5  │  Titan Embeddings v2          │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Technology Stack

Backend:

  • Python 3.11+ with Strands Agents framework
  • FastAPI for Python bridge
  • AWS Bedrock (Claude Sonnet 4.5, Titan Embeddings)
  • FAISS for vector search
  • Pydantic for validation

Frontend:

  • React 18 with TypeScript
  • Vite for build tooling
  • Monaco Editor for code editing
  • TailwindCSS for styling

Infrastructure:

  • Node.js API server (Express)
  • AWS SSO for authentication
  • Local development with hot reload

Quick Start

Prerequisites

  • Python 3.11+ with uv package manager
  • Node.js 18+ with pnpm
  • AWS Account with Bedrock access
  • AWS CLI configured with SSO
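
A quick way to confirm the command-line prerequisites before installing is a small check script. This sketch is not part of the repository; it only tests that the executables are on `PATH` and that the running interpreter is recent enough. It does not verify the Node.js version, Bedrock access, or SSO configuration.

```python
import shutil
import sys

# Executable names for uv, Node.js, pnpm, and the AWS CLI.
REQUIRED_TOOLS = ("uv", "node", "pnpm", "aws")


def check_prerequisites(tools: tuple[str, ...] = REQUIRED_TOOLS) -> dict[str, bool]:
    """Report which tools are on PATH and whether Python is 3.11 or newer."""
    report = {tool: shutil.which(tool) is not None for tool in tools}
    report["python>=3.11"] = sys.version_info >= (3, 11)
    return report


report = check_prerequisites()
for name, ok in report.items():
    print(f"{'ok' if ok else 'MISSING':8} {name}")
```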

Installation

  1. Clone the repository

    git clone https://github.com/your-org/odcs-agent.git
    cd odcs-agent
  2. Set up Python environment

    .\make.ps1 setup
  3. Install Node.js dependencies

    .\make.ps1 web-install
  4. Configure AWS credentials

    .\make.ps1 aws-login
  5. Start the application

    .\make.ps1 server
  6. Open in browser

    http://localhost:5173
    

First Contract

Try these commands in the chat:

  • English: "Generate a simple ODCS contract"
  • Spanish: "Genera un contrato ODCS simple"
  • French: "Générer un contrat ODCS simple"

The agent will generate a valid contract and propose it for review.

Project Structure

odcs-agent/
├── apps/
│   ├── api/              # Node.js API server
│   │   ├── src/          # API routes and services
│   │   └── python_bridge.py  # FastAPI bridge to Python
│   └── web/              # React frontend
│       └── src/          # React components and pages
├── backend/
│   ├── agent/            # Strands agent implementation
│   ├── models/           # Pydantic ODCS models
│   ├── rag/              # RAG system (FAISS, embeddings)
│   ├── storage/          # Storage abstractions
│   ├── templates/        # ODCS templates
│   └── tests/            # Test suite
├── data/
│   └── knowledge_base/   # ODCS documentation (23 docs)
│       ├── schemas/      # Pydantic schema reference
│       ├── examples/     # Minimal valid contracts
│       ├── documentation/# Section guides
│       └── best_practices/  # Validation troubleshooting
├── docs/                 # Project documentation
├── infrastructure/       # Terraform modules (future)
└── scripts/              # Build and utility scripts

Development

Code Quality Status

WARNING - POC Code Quality Notice:

This codebase prioritizes functionality over polish. Expect:

  • Inconsistent patterns: Different approaches in different modules
  • Minimal error handling: Happy path focus, limited edge case coverage
  • Limited validation: Basic input validation only
  • Sparse comments: Code is mostly self-documenting but lacks context
  • No comprehensive tests: Manual testing only
  • Technical debt: Known areas needing refactoring

Areas Needing Improvement:

  1. Error handling and recovery
  2. Input validation and sanitization
  3. Logging consistency and structure
  4. Test coverage (currently <10%)
  5. Code documentation and comments
  6. Performance optimization
  7. Security hardening

Before Production Use:

  • Add comprehensive error handling
  • Implement retry logic for external calls
  • Add input validation at all boundaries
  • Create full test suite (unit, integration, e2e)
  • Security audit and penetration testing
  • Performance testing and optimization
  • Code review and refactoring
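
The "retry logic for external calls" item could look something like the sketch below: exponential backoff with jitter around any callable. The `call_with_retry` helper and its parameters are hypothetical. Which exceptions are worth retrying for Bedrock calls is left as an assumption, so the default here uses built-in exception types only.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))  # add jitter
    raise ValueError("attempts must be at least 1")


# Simulated external call that fails twice before succeeding.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = call_with_retry(flaky, sleep=lambda _seconds: None)
print(result, calls["count"])
```

Passing `sleep` as a parameter keeps the helper testable without real delays.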

Available Commands

# Development
.\make.ps1 server         # Start full stack
.\make.ps1 web-dev        # Frontend only
.\make.ps1 api-dev        # API only
.\make.ps1 python-bridge  # Python bridge only

# Testing
.\make.ps1 test           # Run all tests
.\make.ps1 test-unit      # Unit tests only
.\make.ps1 test-integration # Integration tests only
.\make.ps1 test-coverage  # With coverage report

# Code Quality
.\make.ps1 format         # Format Python code
.\make.ps1 lint           # Check code quality
.\make.ps1 web-check      # Check frontend code

# Knowledge Base
.\make.ps1 reingest-kb    # Rebuild FAISS index

# AWS
.\make.ps1 aws-check      # Verify AWS setup
.\make.ps1 aws-login      # Login to AWS SSO

Running Tests

# All tests
.\make.ps1 test

# Specific test types
.\make.ps1 test-unit
.\make.ps1 test-integration

# With coverage
.\make.ps1 test-coverage

Code Style

  • Python: PEP 8, 88-char limit, type hints required
  • TypeScript: ESLint + Prettier
  • Commits: Conventional Commits format
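
For the commit convention, a header check along these lines could run in a local hook. It is a sketch, not something the repository ships: the list of accepted types is the commonly used set and an assumption here, since the README does not say which types the project allows.

```python
import re

# Commonly used types; an assumption, not a list taken from this project.
COMMIT_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# Header shape: type, optional (scope), optional "!", then ": description".
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Check only the first line of a commit message against the header shape."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    return HEADER_RE.match(header) is not None


for msg in ("feat: add amazing feature", "fix(api)!: handle empty body", "Added stuff"):
    print(f"{is_conventional(msg)!s:5} {msg}")
```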

Contributing

We welcome contributions! Please see CONTRIBUTING.md for:

  • Code of Conduct
  • Development workflow
  • Pull request process
  • Coding standards

Quick Contribution Guide

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests (.\make.ps1 test)
  5. Commit using conventional commits (git commit -m "feat: add amazing feature")
  6. Push to your fork (git push origin feature/amazing-feature)
  7. Open a Pull Request

Project Status

Current Version: 0.1.0 (Proof of Concept)
Last Updated: 2026-02-18

What Works

  • Agent generates ODCS contracts using Claude Sonnet 4.5
  • RAG system retrieves relevant documentation (23 documents)
  • Pydantic validation catches schema errors
  • Interactive IDE with Monaco editor
  • Multi-language support (EN, ES, FR)
  • Session management for saving work
  • Template system with 5 base templates

Known Limitations

Code Quality:

  • Minimal error handling in many paths
  • Limited input validation
  • Some code duplication
  • Inconsistent logging
  • No retry logic for AWS calls

Testing:

  • No automated test suite
  • Manual testing only
  • No CI/CD pipeline
  • No performance testing

Features:

  • Single-user only (no collaboration)
  • Local storage only (no S3 integration)
  • Limited to 23 documents in knowledge base
  • No schema versioning
  • No contract diff/merge tools

Infrastructure:

  • Development environment only
  • No production deployment
  • No monitoring or alerting
  • No backup/recovery

In Progress

  • Automated testing suite
  • Error handling improvements
  • Code refactoring and cleanup

Planned Improvements

Short Term (1-3 months):

  • Comprehensive test suite (unit, integration, property-based)
  • Better error handling and user feedback
  • Code quality improvements (linting, type checking)
  • CI/CD pipeline setup

Medium Term (3-6 months):

  • Schema versioning support
  • Template marketplace
  • S3 storage integration
  • Performance optimizations

Long Term (6-12 months):

  • Production-ready deployment
  • Collaborative editing
  • Contract diff/merge tools
  • Enterprise features (SSO, RBAC, audit logs)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For questions or support, please open an issue on GitHub.


Built for the data engineering community
