Storage for application data, models, and knowledge base files.
data/
├── app_state.enc                # Encrypted application state
├── knowledge_base/              # RAG knowledge base
│   ├── documentation/           # ODCS documentation
│   ├── examples/                # Contract examples
│   ├── best_practices/          # Best practices guides
│   ├── schemas/                 # Schema definitions
│   ├── templates/               # Template examples
│   ├── tutorials/               # Tutorial content
│   ├── faiss_index/             # FAISS vector index
│   ├── generated/               # Generated documentation
│   ├── documents.json           # Document metadata
│   └── index.json               # Knowledge base index
├── models/                      # ML models
│   └── models--sentence-transformers--all-MiniLM-L6-v2/
│       └── Sentence transformer embeddings model
└── templates/                   # Contract templates
    ├── custom/                  # Custom templates
    ├── education/               # Education industry
    ├── finance/                 # Finance industry
    ├── government/              # Government sector
    ├── healthcare/              # Healthcare industry
    ├── manufacturing/           # Manufacturing
    ├── retail/                  # Retail industry
    └── technology/              # Technology sector
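A quick way to confirm that a checkout matches the layout above is to look for the expected subdirectories. This is a minimal sketch, not part of the project: the `missing_dirs` helper and the `EXPECTED_DIRS` list are illustrative names, and the list is copied from the tree by hand.

```python
from pathlib import Path

# Subdirectories taken from the tree above, relative to the data root.
EXPECTED_DIRS = [
    "knowledge_base/documentation",
    "knowledge_base/examples",
    "knowledge_base/best_practices",
    "knowledge_base/schemas",
    "knowledge_base/templates",
    "knowledge_base/tutorials",
    "knowledge_base/faiss_index",
    "knowledge_base/generated",
    "models",
    "templates/custom",
    "templates/education",
    "templates/finance",
    "templates/government",
    "templates/healthcare",
    "templates/manufacturing",
    "templates/retail",
    "templates/technology",
]


def missing_dirs(data_root):
    """Return the expected subdirectories that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    for d in missing_dirs("data"):
        print(f"missing: data/{d}")
```

Some directories (for example `models/` and `knowledge_base/faiss_index/`) may only appear after the first run or after the index has been built, so a non-empty result is not necessarily an error.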
The knowledge base contains:
- Documentation: ODCS field descriptions and usage guides
- Examples: Complete contract examples for reference
- Best Practices: Validation rules and recommendations
- Schemas: JSON schema definitions
- Templates: Industry-specific templates
Vector search index for semantic retrieval:
- faiss.index: FAISS index file
- metadata.pkl: Document metadata
Auto-generated documentation:
- field_documentation.json: Field-level docs
- searchable_documents.json: Indexed documents
- faiss_validation.json: Index validation results
Pre-trained embedding model for semantic search:
- Model: all-MiniLM-L6-v2
- Dimension: 384
- Used for: Document embedding and retrieval
Downloaded automatically on first use.
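To make "document embedding and retrieval" concrete, the sketch below ranks 384-dimensional vectors by cosine similarity. It is a brute-force stand-in written with the standard library only; it does not use FAISS or the actual model, and the random vectors merely take the place of real embeddings. The `normalize` and `top_k` helpers are illustrative names, not project APIs.

```python
import math
import random

EMBEDDING_DIM = 384  # dimension of all-MiniLM-L6-v2 embeddings


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query, documents, k=5):
    """Return (index, cosine similarity) pairs for the k most similar documents."""
    q = normalize(query)
    scores = [
        (i, sum(a * b for a, b in zip(q, normalize(doc))))
        for i, doc in enumerate(documents)
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    # Random vectors stand in for real document embeddings.
    docs = [[rng.gauss(0, 1) for _ in range(EMBEDDING_DIM)] for _ in range(100)]
    print(top_k(docs[42], docs, k=3))
```

A vector index such as FAISS serves the same kind of top-k query far more efficiently at scale; the linear scan here is only meant to show what is being computed.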
Contract templates organized by industry:
- Custom: User-defined templates
- Education: Educational data contracts
- Finance: Financial data contracts
- Government: Government sector contracts
- Healthcare: Healthcare data contracts
- Manufacturing: Manufacturing data contracts
- Retail: Retail data contracts
- Technology: Technology sector contracts
Each template includes:
- Metadata section
- Scheduling configuration
- Technical ingestion setup
- Functional ingestion setup
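The four sections listed above lend themselves to a simple completeness check before a template is saved. The sketch below assumes a template is loaded as a dictionary; the key names (`metadata`, `scheduling`, `technical_ingestion`, `functional_ingestion`) are hypothetical and should be replaced with whatever keys the real templates use.

```python
# Hypothetical key names for the four sections every template includes.
REQUIRED_SECTIONS = (
    "metadata",
    "scheduling",
    "technical_ingestion",
    "functional_ingestion",
)


def missing_sections(template):
    """Return the required sections that are absent from a template dict."""
    return [name for name in REQUIRED_SECTIONS if name not in template]


if __name__ == "__main__":
    draft = {"metadata": {"name": "example"}, "scheduling": {}}
    print(missing_sections(draft))
```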
app_state.enc stores encrypted application state:
- User preferences
- Session information
- Cached data
from backend.rag.knowledge_base import KnowledgeBase

kb = KnowledgeBase(knowledge_base_path="data/knowledge_base")
results = kb.search("contract metadata", top_k=5)

from backend.storage.template_storage import TemplateStorage

storage = TemplateStorage(base_path="data/templates")
templates = storage.list_templates(category="finance")

from backend.rag.faiss_store import FAISSVectorStore

store = FAISSVectorStore(index_path="data/knowledge_base/faiss_index")
results = store.search(query_embedding, top_k=10)

python backend/scripts/build_faiss_index.py
python backend/scripts/generate_field_docs.py
python backend/scripts/generate_rag_knowledge.py

- Local Development: Files stored in data/ directory
- Production: Files stored in S3 bucket (configured via environment)
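One way the local/production split could be driven by the environment is sketched below. The variable names `DATA_S3_BUCKET`, `DATA_S3_PREFIX`, and `DATA_DIR` are assumptions made for illustration; this README does not state which variables the application actually reads.

```python
import os


def resolve_storage(env=None):
    """Return an s3:// URI if a bucket is configured, else a local directory."""
    env = os.environ if env is None else env
    bucket = env.get("DATA_S3_BUCKET")  # hypothetical variable name
    if bucket:
        prefix = env.get("DATA_S3_PREFIX", "data").strip("/")  # hypothetical
        return f"s3://{bucket}/{prefix}"
    return env.get("DATA_DIR", "data")  # hypothetical; defaults to data/


if __name__ == "__main__":
    print(resolve_storage())
```

Passing a plain dictionary as `env` keeps the function easy to test without touching the real process environment.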
Important files to back up:
- knowledge_base/faiss_index/: Vector index
- templates/: Custom templates
- app_state.enc: Application state
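The backup set above can be bundled into a single archive with the standard library. This is a minimal sketch for local development, assuming the archive is written somewhere outside the data directory; the `backup` function is an illustrative helper, not an existing project script.

```python
import tarfile
from pathlib import Path

# Paths to back up, relative to the data root (from the list above).
BACKUP_PATHS = ["knowledge_base/faiss_index", "templates", "app_state.enc"]


def backup(data_root, archive_path):
    """Write existing backup paths to a .tar.gz and return those that were added."""
    root = Path(data_root)
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for rel in BACKUP_PATHS:
            src = root / rel
            if src.exists():
                # arcname keeps paths inside the archive relative to the data root.
                tar.add(str(src), arcname=rel)
                added.append(rel)
    return added


if __name__ == "__main__":
    print(backup("data", "data-backup.tar.gz"))
```

Paths that do not exist yet (for example an index that has not been built) are skipped rather than treated as errors.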
- FAISS index: ~50-100 MB
- Models: ~100-200 MB
- Templates: ~10-20 MB
- Total: ~200-300 MB
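To compare an actual installation against the size estimates above, the on-disk footprint of each directory can be measured directly. A minimal sketch using only the standard library; `size_mb` is an illustrative helper name.

```python
from pathlib import Path


def size_mb(path):
    """Return the size of a file, or the total size of a directory tree, in MiB."""
    p = Path(path)
    if p.is_file():
        total = p.stat().st_size
    else:
        total = sum(f.stat().st_size for f in p.rglob("*") if f.is_file())
    return total / (1024 * 1024)


if __name__ == "__main__":
    for name in ("knowledge_base/faiss_index", "models", "templates"):
        print(f"{name}: {size_mb(Path('data') / name):.1f} MB")
```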
- Vector search: <100ms for top-10 results
- Template loading: <50ms
- Knowledge base initialization: <1s
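The latency figures above can be checked on a given machine with a small timing helper. This is a sketch using `time.perf_counter`; `time_ms` is an illustrative name, and the callable passed to it (for example a bound `search` method) is whatever operation is being measured.

```python
import time


def time_ms(fn, *args, repeats=5, **kwargs):
    """Return the best wall-clock time in milliseconds over several runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


if __name__ == "__main__":
    # Replace the lambda with the call to measure, e.g. a vector search.
    print(f"{time_ms(lambda: sum(range(10_000))):.3f} ms")
```

Taking the best of several runs reduces noise from caching and other processes; the first call after start-up is often slower than the steady-state numbers.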