Last Updated: 2025-11-02 Status: Current and Accurate (Generated from actual code analysis) Codebase Version: Production-ready with Script-Based Injection Architecture
The Code Telemetry Injector is a sophisticated code instrumentation system with two distinct execution pipelines:
- Script-Based Pipeline (--use-scripts flag) - 98.7% faster on cache hits, 0 LLM calls!
- Traditional Pipeline (default) - 2-3 LLM calls per file, flexible for complex cases
Key Innovation: Script-based injection achieves 100% cost savings on repeat runs by generating reusable insertion scripts and caching them by hash.
- System Architecture Overview
- Two Pipeline Modes
- When AI/LLM is Actually Used
- Component Catalog
- Data Flow Diagrams
- Performance & Cost Analysis
- Configuration
- Deployment Guide
flowchart TD
A[telemetry-inject.py<br/>CLI Entry Point] --> B[src/cli.py<br/>Main Orchestrator]
B --> C[Configuration]
C -->|Load| C1[.env file]
C -->|Parse| C2[Command-line flags]
B --> D[Provider Detection]
D --> D1[OpenAI/Anthropic/Ollama]
B --> E[Cost Tracking &<br/>Budget Management]
B --> F{Mode Selection}
F -->|Default| G[Traditional Pipeline<br/>2-3 LLM calls per file]
F -->|--use-scripts| H[Script-Based Pipeline<br/>0-1 LLM calls per function<br/>cache hit = 0 LLM calls!]
F --> I[Analysis Layer<br/>HybridAnalyzer]
I --> J[TreeSitterAnalyzer<br/>Python/JS/Go<br/>NO LLM]
I --> K[LLMAnalyzer<br/>Other languages<br/>YES LLM]
style H fill:#90EE90
style J fill:#90EE90
| Aspect | Traditional Pipeline | Script-Based Pipeline |
|---|---|---|
| Speed (first run) | 3-6 seconds/file | 7ms/function (template-based) |
| Speed (cached) | Same (no cache) | 0.3ms/function (98.7% faster!) |
| LLM Calls | 2-3 per file | 0 (cached), 0-3 (failures only) |
| Cost | $0.06-0.20/file | $0.00 (cached), $0.00-0.05 (failures) |
| Use Case | Simple projects | Large codebases, CI/CD |
| Deterministic | No (LLM variance) | Yes (template-based) |
File: src/cli.py:39-232 (async def process_with_scripts)
Flow:
1. CodeScanner → Find code files
2. FunctionExtractor → Parse functions (NO LLM, pure AST)
3. TelemetryGenerator → Generate snippets (template-based)
4. ParallelScriptProcessor → Process up to 12 functions concurrently
   ├─> ScriptGenerator → Generate insertion script
   │   ├─> Load lessons from docs/lessons/
   │   ├─> Template-based generation (NO LLM!)
   │   └─> Calculate SHA256 hash
   ├─> ScriptCache → Check cache by hash
   │   ├─> If CACHED: Execute cached script ✅ (0.3ms, $0)
   │   └─> If NOT CACHED: Continue...
   ├─> TestGenerator → Generate pytest tests
   ├─> ScriptValidator → Syntax + Security + Tests
   │   ├─> If PASS: Cache script, execute ✅
   │   └─> If FAIL: ScriptRefactorer.refactor() [LLM CALL]
   │       └─> Retry up to 3 times with LLM fixes
   └─> ScriptSandbox → Execute in isolated subprocess
5. FileReconstructor → Rebuild file with instrumented code
6. Write to output directory
LLM Usage: 0 calls on cache hit, 0-3 calls on test failures only
Key Components:
- src/parallel_script_processor.py - Async batch processing (12 workers)
- src/script_generator.py:34-603 - Template-based script generation
- src/script_cache.py - Hash-based storage (.telemetry_cache/)
- src/script_sandbox.py - Isolated execution
- src/script_validator.py - Syntax/security/test validation
- src/script_refactorer.py - LLM-powered self-healing (failures only)
Cache Structure:
.telemetry_cache/
├── scripts/python/calculate_ema_5fffbf67.py
├── tests/python/test_calculate_ema_5fffbf67.py
└── metadata/cache_index.json
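The hash-based cache layout above can be sketched as follows. This is a minimal illustration, not the actual `ScriptCache` implementation; the 8-character hash truncation and path-building helper are assumptions inferred from the example filenames.

```python
import hashlib
from pathlib import Path

def cache_key(function_source: str, length: int = 8) -> str:
    """Derive a short, deterministic key from the function's source code."""
    return hashlib.sha256(function_source.encode("utf-8")).hexdigest()[:length]

def cached_script_path(cache_root: Path, language: str,
                       function_name: str, source: str) -> Path:
    """Build the expected path of a cached insertion script (illustrative)."""
    key = cache_key(source)
    return cache_root / "scripts" / language / f"{function_name}_{key}.py"

# Identical source always maps to the same path, so repeat runs hit the cache.
path = cached_script_path(Path(".telemetry_cache"), "python",
                          "calculate_ema", "def calculate_ema(x): ...")
```

Because the key is derived purely from content, editing a function invalidates only that function's cached script.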
File: src/cli.py:329-653 (def main)
Flow:
1. CodeScanner → Find code files
2. HybridAnalyzer → Analyze code
   ├─> If Python/JS/Go: TreeSitterAnalyzer (NO LLM) ✅
   └─> If Other: LLMAnalyzer [LLM CALL #1]
3. TelemetryGenerator → Generate telemetry snippets (template-based)
4. RetryInjector → Inject with retry logic
   └─> CodeInjector [LLM CALL #2]
       ├─> Generate injection prompt
       ├─> Call LLM with code + telemetry
       ├─> Validate output (syntax check)
       └─> If failed N times: ReflectionEngine [LLM CALL #3]
           └─> Analyze failure, provide guidance, retry
5. FileReconstructor → Rebuild file
6. Write to output directory
LLM Usage: 2-3 calls per file (0-1 for analysis if the language isn't Python/JS/Go, 1 for injection, 0-1 for reflection)
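The retry-with-reflection control flow described above can be sketched like this. The callables `inject_once` and `reflect_on_failure` are illustrative placeholders, not the actual `RetryInjector` or `ReflectionEngine` API.

```python
def inject_with_retries(code, snippets, inject_once, reflect_on_failure,
                        max_retries=3, reflection_threshold=2):
    """Retry LLM injection; after `reflection_threshold` failures, ask the
    reflection step for guidance and fold it into the next attempt."""
    guidance = None
    for attempt in range(1, max_retries + 1):
        result = inject_once(code, snippets, guidance=guidance)
        if result.get("valid"):          # syntax validation passed
            return result
        if attempt >= reflection_threshold:
            # LLM CALL #3: analyze the failure and produce guidance
            guidance = reflect_on_failure(code, snippets, result.get("error"))
    raise RuntimeError(f"Injection failed after {max_retries} attempts")
```

This mirrors the `--max-retries` and `--reflection-threshold` flags: reflection is only paid for once injection has already failed the threshold number of times.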
Key Components:
- src/hybrid_analyzer.py - Smart selector (tree-sitter first, LLM fallback)
- src/tree_sitter_analyzer.py - Fast AST parsing (10-100x faster, $0)
- src/llm_analyzer.py:179-329 - LLM-based analysis [AI CALL]
- src/code_injector.py:34-250 - LLM-based injection [AI CALL]
- src/reflection_engine.py:42-180 - Failure analysis [AI CALL]
- src/retry_injector.py - Retry wrapper with validation
Based on code analysis, AI is called from exactly 9 source files:
Purpose: Code analysis when tree-sitter doesn't support the language
When: Traditional pipeline only; NOT called for Python/JS/Go/TypeScript
Provider: OpenAI, Anthropic, or Ollama
Cost: ~$0.01-0.10 per file
Time: 2-10 seconds
Triggers:
- Language not supported by tree-sitter (Ruby, Rust, C++, etc.)
- prefer_tree_sitter=False flag (override)
- force_tree_sitter=True will skip this (error instead)
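The selection logic can be sketched roughly as follows. The supported-language set and flag names mirror the description above, but the actual `HybridAnalyzer` interface may differ.

```python
TREE_SITTER_LANGUAGES = {"python", "javascript", "typescript", "go"}

def choose_analyzer(language: str, prefer_tree_sitter: bool = True,
                    force_tree_sitter: bool = False) -> str:
    """Pick the free AST analyzer when possible; fall back to the LLM."""
    lang = language.lower()
    if lang in TREE_SITTER_LANGUAGES and prefer_tree_sitter:
        return "tree_sitter"          # no LLM call, ~$0
    if force_tree_sitter:
        # per the note above: error instead of silently paying for an LLM call
        raise ValueError(f"tree-sitter does not support {language}")
    return "llm"                      # LLM CALL #1 in traditional mode
```

The point of the design is that cost is a function of language choice: Python/JS/Go codebases never trigger this particular LLM call.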
Code Location:
# src/llm_analyzer.py:179-329
def analyze_code(self, code: str, language: str) -> AnalysisResult:
    response = self.client.messages.create(  # ← LLM API CALL
        model=self.model,
        max_tokens=4096,
        temperature=0.1,
        messages=[{"role": "user", "content": prompt}]
    )

Purpose: Insert telemetry code into source files
When: Traditional pipeline only (NOT used in script mode)
Provider: OpenAI, Anthropic, or Ollama
Cost: ~$0.05-0.10 per file
Time: 2-5 seconds
Triggers:
- Every file in traditional mode
- Called by RetryInjector wrapper
Code Location:
# src/code_injector.py:34-250
def inject(self, code, snippets, language, ...) -> InjectionResult:
    response = self.client.messages.create(  # ← LLM API CALL
        model=model,
        max_tokens=16000,
        temperature=0.1,
        messages=[{"role": "user", "content": injection_prompt}]
    )

Prompt: 37+ lines with examples, enforces the "Don't Repeat Yourself" principle
Purpose: Analyze failure patterns and provide guidance
When: Traditional pipeline, after N injection failures (default: 2)
Provider: OpenAI, Anthropic, or Ollama
Cost: ~$0.02-0.05 per reflection
Time: 1-2 seconds
Triggers:
- Injection fails --reflection-threshold times (default: 2)
- Only in traditional mode
- Optional feature (can be disabled)
Code Location:
# src/reflection_engine.py:42-180
def reflect(self, code, snippets, ...) -> Dict:
    response = self.client.messages.create(  # ← LLM API CALL
        model=model,
        max_tokens=4096,
        temperature=0.1,
        messages=[{"role": "user", "content": reflection_prompt}]
    )

Purpose: Self-healing for script generation failures
When: Script mode only, when a template-based script fails pytest tests
Provider: OpenAI, Anthropic, or Ollama
Cost: ~$0.02-0.05 per refactor attempt
Time: 1-3 seconds
Triggers:
- Script mode (--use-scripts) only
- Only when ScriptValidator tests FAIL
- Rare if templates are well-designed
- Max attempts: configurable (default: 3)
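The validate-refactor loop described by these triggers can be sketched like this; `validate` and `llm_refactor` are illustrative callables standing in for `ScriptValidator` and the LLM-backed `ScriptRefactorer`, not the real API.

```python
def heal_script(script: str, validate, llm_refactor, max_attempts: int = 3):
    """Run validation; on failure, ask the LLM to refactor and try again.

    Returns (script, refactor_count). refactor_count is 0 on the happy path,
    which is why well-designed templates make this LLM call rare.
    """
    for attempt in range(max_attempts + 1):
        report = validate(script)       # syntax + security + pytest
        if report["passed"]:
            return script, attempt      # attempt == LLM refactors consumed
        if attempt == max_attempts:
            break
        script = llm_refactor(script, report["errors"])   # LLM CALL
    raise RuntimeError(f"Script still failing after {max_attempts} refactors")
```

Only scripts that pass validation are cached, so a healed script pays its LLM cost once and is free on every subsequent run.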
Code Location:
# src/script_refactorer.py:40-180
def refactor(self, script_code, validation_result, ...) -> str:
    response = self.client.messages.create(  # ← LLM API CALL
        model=model,
        max_tokens=8192,
        temperature=0.1,
        messages=[{"role": "user", "content": refactor_prompt}]
    )

Purpose: Generate telemetry snippets (rare, mostly template-based)
When: Complex patterns not covered by templates
Provider: OpenAI, Anthropic, or Ollama
Cost: ~$0.01-0.03 per generation
Time: 1-2 seconds
Triggers:
- Template generation fails (extremely rare)
- Complex custom patterns
- Most uses are template-based (NO LLM)
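The template-first policy above can be sketched as a dictionary lookup with an LLM fallback. The template contents and the `llm_generate` callable are illustrative placeholders, not the actual `TelemetryGenerator` internals.

```python
SNIPPET_TEMPLATES = {
    # language -> format string filled in per function (illustrative)
    "python": 'logger.info("enter {name}", extra={{"function": "{name}"}})',
}

def generate_snippet(language: str, function_name: str, llm_generate=None) -> str:
    """Prefer a deterministic template; only fall back to the LLM if none fits."""
    template = SNIPPET_TEMPLATES.get(language)
    if template is not None:
        return template.format(name=function_name)   # NO LLM, $0
    if llm_generate is None:
        raise KeyError(f"no template for {language} and no LLM fallback")
    return llm_generate(language, function_name)     # rare LLM call
```

Because the template path is deterministic, identical inputs always yield identical snippets, which is what makes the downstream script cache effective.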
- api_checker.py - Tests API connectivity during setup
- config_menu.py - Interactive configuration wizard
- retry_injector.py - Wrapper around code_injector (not a new LLM call)
- script_generator.py - Template-first, LLM fallback (uses script_refactorer)
| Component | Mode | Frequency | Cost/File | Time | Can Avoid? |
|---|---|---|---|---|---|
| LLMAnalyzer | Traditional | 0-1x | $0.01-0.10 | 2-10s | ✅ Yes (use Python/JS/Go) |
| CodeInjector | Traditional | 1x | $0.05-0.10 | 2-5s | ✅ Yes (use --use-scripts) |
| ReflectionEngine | Traditional | 0-1x | $0.02-0.05 | 1-2s | ✅ Yes (good prompts) |
| ScriptRefactorer | Script | 0-3x | $0.02-0.05 | 1-3s | ✅ Yes (good templates) |
| Total (Traditional) | - | 2-3x | $0.08-0.25 | 5-17s | Partial |
| Total (Script, cached) | - | 0x | $0.00 | 0.3ms | ✅ Yes (100%!) |
| Total (Script, uncached) | - | 0-3x | $0.00-0.15 | 1-9s | Partial |
- src/cli.py (985 lines) - Main entry point, pipeline orchestrator
- telemetry-inject.py (3 lines) - Thin wrapper for CLI
- src/scanner.py - Find code files, detect languages
- src/hybrid_analyzer.py - Smart analyzer selector
- src/tree_sitter_analyzer.py - Fast AST parsing (Python/JS/Go/TS) - NO LLM
- src/llm_analyzer.py - LLM-based analysis - YES LLM 🤖
- src/function_extractor.py - Extract functions from code
- src/telemetry_generator.py - Generate telemetry snippets (mostly templates)
- src/code_injector.py - LLM-based injection - YES LLM 🤖
- src/retry_injector.py - Retry wrapper with validation
- src/reflection_engine.py - Failure analysis - YES LLM 🤖
- src/file_reconstructor.py - Rebuild files from functions
- src/parallel_script_processor.py - Async batch processing (12 workers)
- src/script_generator.py - Template-based script generation (LLM fallback)
- src/script_cache.py - Hash-based caching system
- src/script_sandbox.py - Isolated subprocess execution
- src/script_validator.py - Syntax/security/test validation
- src/script_refactorer.py - LLM self-healing - YES LLM 🤖
- src/test_generator.py - Generate pytest tests
- src/cost_tracker.py - Track API costs, enforce budgets
- src/token_detector.py - Auto-detect model token limits
- src/verbose_logger.py - User-facing logging
- src/debug_trace_logger.py - Internal tracing
- src/telemetry_utils_writer.py - Write utility libraries
- src/config_menu.py - Interactive configuration
- src/api_checker.py - Test API connectivity
- src/model_manager.py - Model listing and recommendations
- src/ollama_model_pool.py - Multi-GPU model rotation (Ollama)
flowchart TD
START[User runs: python telemetry-inject.py ./src --use-scripts -v]
START --> SCAN
SCAN[1. SCAN PHASE<br/>CodeScanner<br/>Find all code files, detect languages<br/>Time: 100ms | Cost: $0 | LLM: NO]
SCAN --> EXTRACT
EXTRACT[2. EXTRACT PHASE<br/>FunctionExtractor<br/>Parse functions using AST<br/>Time: 1-5ms/function | Cost: $0 | LLM: NO]
EXTRACT --> ANALYZE
ANALYZE{3. ANALYZE PHASE<br/>HybridAnalyzer}
ANALYZE -->|Python/JS/Go| TREE[TreeSitterAnalyzer<br/>Time: <10ms | Cost: $0<br/>NO LLM]
ANALYZE -->|Other Languages| LLM_A[LLMAnalyzer 🤖<br/>Time: 2-10s | Cost: $0.01-0.10<br/>YES LLM]
TREE --> GENERATE
LLM_A --> GENERATE
GENERATE[4. GENERATE PHASE<br/>TelemetryGenerator<br/>Generate telemetry snippets<br/>Time: <1ms/snippet | Cost: $0 | LLM: NO]
GENERATE --> PARALLEL
PARALLEL[5. PARALLEL PROCESSING<br/>ParallelScriptProcessor<br/>Up to 12 concurrent workers]
PARALLEL --> SCRIPT_GEN
SCRIPT_GEN[ScriptGenerator.generate<br/>Load lessons, Generate script, Calculate hash]
SCRIPT_GEN --> CACHE_CHECK
CACHE_CHECK{ScriptCache.get}
CACHE_CHECK -->|CACHE HIT ✅| EXECUTE_CACHED[Execute cached script<br/>0.3ms, $0<br/>DONE!]
CACHE_CHECK -->|CACHE MISS| TEST_GEN
TEST_GEN[TestGenerator.generate<br/>NO LLM]
TEST_GEN --> VALIDATE
VALIDATE[ScriptValidator.validate<br/>Syntax + Security + pytest]
VALIDATE --> TEST_RESULT{Tests?}
TEST_RESULT -->|PASS ✅| STORE[ScriptCache.store<br/>Execute script<br/>DONE!]
TEST_RESULT -->|FAIL ❌| REFACTOR
REFACTOR[ScriptRefactorer.refactor<br/>LLM CALL 🤖<br/>Analyze error, Load lessons<br/>Generate improved version<br/>Retry validation up to 3 attempts]
REFACTOR --> VALIDATE
EXECUTE_CACHED --> RECONSTRUCT
STORE --> RECONSTRUCT
RECONSTRUCT[6. RECONSTRUCT PHASE<br/>FileReconstructor<br/>Replace original functions<br/>Time: 1-5ms | Cost: $0 | LLM: NO]
RECONSTRUCT --> WRITE
WRITE[7. WRITE PHASE<br/>File I/O<br/>Write instrumented code<br/>Time: 5-10ms | Cost: $0 | LLM: NO]
WRITE --> END[🎉 Instrumented Code!]
style EXECUTE_CACHED fill:#90EE90
style STORE fill:#90EE90
style TREE fill:#90EE90
style END fill:#FFD700
Total for Script Mode:
- First run (template-based): 7-15ms/function, $0
- Cached run: 0.3ms/function, $0 (98.7% faster!)
- Failures only: +1-3 LLM calls, +$0.02-0.05
flowchart TD
START[User runs: python telemetry-inject.py ./src -v]
START --> SCAN
SCAN[1. SCAN PHASE<br/>CodeScanner<br/>Find all code files, detect languages<br/>Time: 100ms | Cost: $0 | LLM: NO]
SCAN --> ANALYZE
ANALYZE{2. ANALYZE PHASE<br/>HybridAnalyzer}
ANALYZE -->|Python/JS/Go| TREE[TreeSitterAnalyzer<br/>Time: <10ms | Cost: $0<br/>NO LLM]
ANALYZE -->|Other Languages| LLM_A[LLMAnalyzer<br/>CALL #1 🤖<br/>Time: 2-10s | Cost: $0.01-0.10<br/>YES LLM]
TREE --> GENERATE
LLM_A --> GENERATE
GENERATE[3. GENERATE PHASE<br/>TelemetryGenerator<br/>Generate telemetry snippets<br/>Time: 50ms | Cost: $0 | LLM: NO]
GENERATE --> INJECT
INJECT[4. INJECT PHASE<br/>RetryInjector → CodeInjector<br/>CALL #2 🤖<br/>Generate prompt, Call LLM<br/>Validate syntax, Retry if failed<br/>Time: 2-5s | Cost: $0.05-0.10 | LLM: YES]
INJECT --> INJECT_RESULT{Result?}
INJECT_RESULT -->|Success ✅| RECONSTRUCT
INJECT_RESULT -->|Failed N times ❌| REFLECT
REFLECT[5. REFLECT PHASE<br/>ReflectionEngine<br/>CALL #3 🤖<br/>Analyze failure patterns<br/>Build reflection prompt<br/>Call LLM for insights<br/>Time: 1-2s | Cost: $0.02-0.05 | LLM: YES]
REFLECT --> INJECT
RECONSTRUCT[6. RECONSTRUCT PHASE<br/>FileReconstructor<br/>Rebuild file with instrumented code<br/>Time: 1-5ms | Cost: $0 | LLM: NO]
RECONSTRUCT --> WRITE
WRITE[7. WRITE PHASE<br/>File I/O<br/>Write instrumented code<br/>Time: 5-10ms | Cost: $0 | LLM: NO]
WRITE --> END[🎉 Instrumented Code!]
style TREE fill:#90EE90
style END fill:#FFD700
Total for Traditional Mode:
- Python/JS/Go: 2-7s/file, $0.05-0.15 (1-2 LLM calls)
- Other langs: 4-12s/file, $0.08-0.25 (2-3 LLM calls)
- With reflection: +1-2s, +$0.02-0.05 (1 additional call)
| Metric | Traditional | Script (First Run) | Script (Cached) | Improvement |
|---|---|---|---|---|
| Time | 500-700s | 0.7-1.5s | 0.03s | 23,333x faster |
| LLM Calls | 200-300 | 0-30 (failures) | 0 | 100% reduction |
| Cost | $10-25 | $0-1.50 | $0 | 100% savings |
| Deterministic | No | Yes | Yes | Perfect |
| Metric | Traditional | Script (First Run) | Script (Cached) | Improvement |
|---|---|---|---|---|
| Time | 600-1200s | 1.0-3.0s | 0.03s | 40,000x faster |
| LLM Calls | 300-400 | 0-50 (failures) | 0 | 100% reduction |
| Cost | $15-35 | $0-2.50 | $0 | 100% savings |
| Deterministic | No | Yes | Yes | Perfect |
| Phase | Time | LLM Calls | Cost |
|---|---|---|---|
| First Run | 1015ms | 0 | $0 |
| Cached Run | 15ms | 0 | $0 |
| Speedup | 98.7% | - | - |
| Result | 156 telemetry calls inserted, all tests pass | ✅ | ✅ |
# Provider Selection (auto-detects if not specified)
LLM_PROVIDER=openai|anthropic|ollama
# API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Model Configuration
LLM_MODEL=gpt-4|claude-3-5-sonnet-20241022|codellama
LLM_BASE_URL=http://localhost:11434/v1 # For Ollama
LLM_TIMEOUT=120 # seconds (default: 600 for Ollama, 120 for cloud)
# Feature Flags
DEBUG=true|false
DEBUG_TRACE=true|false
DEBUG_TRACE_LEVEL=TRACE|DEBUG|INFO|WARNING|ERROR
# Parallel Processing
MAX_PARALLEL=12 # Concurrent workers (script mode)

# Mode Selection
--use-scripts # Use script-based injection (recommended!)
# Processing Options
--max-parallel N # Override max concurrent workers (default: 12)
--no-parallel # Disable parallel processing
--max-retries N # Max retry attempts (default: 3)
--reflection-threshold N # Failures before reflection (default: 2)
# Output Options
-o, --output DIR # Output directory
--dry-run # Analyze without writing files
-v, --verbose # Detailed progress output
--validate # Validate instrumented code (default: on)
--no-validate # Skip validation
# LLM Configuration
--model NAME # Model override
--base-url URL # Custom API endpoint
--api-key KEY # API key override
--budget AMOUNT # Max API spend before abort
# Debugging
--debug-trace # Enable execution tracing
--trace-level LEVEL # Trace verbosity
--no-trace-console # Write trace to file only

1. Local Development (Free, Fast)
# Use Ollama locally
LLM_PROVIDER=ollama
LLM_BASE_URL=http://localhost:11434/v1
LLM_MODEL=codellama
# Run with script-based caching
python telemetry-inject.py ./src --use-scripts -v

2. Production CI/CD (Cost-Optimized)
# Use script-based mode for speed + caching
export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export LLM_MODEL=gpt-4
python telemetry-inject.py ./src --use-scripts --budget 5.00 --validate

3. Cloud with Budget Limit
# OpenAI with strict budget
export OPENAI_API_KEY=sk-...
export LLM_MODEL=gpt-4
python telemetry-inject.py ./src --budget 10.00 -v

# 1. Install dependencies
pip install -r requirements.txt
# 2. Configure
python telemetry-inject.py --configure
# 3. Run (recommended: script-based)
python telemetry-inject.py ./src --use-scripts -v

- Use Script Mode: Always prefer --use-scripts for speed and cost savings
- Set Budget Limits: Use --budget to prevent runaway costs
- Enable Validation: Keep --validate on (default) for quality assurance
- Use Ollama Locally: Free inference, great for development
- Monitor Cache: Check .telemetry_cache/metadata/cache_index.json for stats
# GitHub Actions example
- name: Instrument Code
run: |
python telemetry-inject.py ./src \
--use-scripts \
--validate \
--budget 5.00 \
--output ./instrumented
- name: Check Cache Hit Rate
run: |
python -c "
import json
with open('.telemetry_cache/metadata/cache_index.json') as f:
data = json.load(f)
print(f'Cached scripts: {len(data)}')
"# Check cache statistics
cat .telemetry_cache/metadata/cache_index.json | jq 'length'
# View debug trace
export DEBUG_TRACE=true
export DEBUG_TRACE_LEVEL=INFO
python telemetry-inject.py ./src --use-scripts -v
# Analyze costs
grep -h "cost_usd" logs/debug_trace_*.jsonl | jq -s 'map(.data.cost_usd) | add'

The Code Telemetry Injector achieves a 98.7% performance improvement and 100% cost savings on cached runs through:
- Smart Analysis: Tree-sitter for Python/JS/Go (NO LLM), LLM fallback for others
- Template-First Generation: Deterministic scripts, LLM only on failures
- Hash-Based Caching: Perfect deduplication, instant retrieval
- Parallel Processing: Up to 12 concurrent workers
- Self-Healing: Automatic refactoring on test failures
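The parallel-processing point above can be sketched with `asyncio` and a semaphore capping concurrency at 12 workers. This is a simplified model, not the actual `ParallelScriptProcessor` implementation.

```python
import asyncio

async def process_functions(functions, process_one, max_parallel: int = 12):
    """Process functions concurrently, at most `max_parallel` at a time.

    `process_one` is an async callable handling a single function
    (generate script -> check cache -> validate -> execute).
    """
    sem = asyncio.Semaphore(max_parallel)

    async def bounded(fn):
        async with sem:                 # at most max_parallel in flight
            return await process_one(fn)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(f) for f in functions))
```

In this model, `--max-parallel N` maps onto the semaphore size and `--no-parallel` is equivalent to `max_parallel=1`.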
Recommendation: Always use --use-scripts for production workloads.
Last Updated: 2025-11-02 Maintained By: Automated code analysis + human review Next Review: When major features are added