Version: 1.0
Date: 2025-11-17
Phase: 3 (Governance Enhancement)
Status: Design → Implementation
This document defines the governance architecture for cortex, building on the observability and validation foundations established in Phases 0-2. The governance layer provides automated compliance checking, data quality monitoring, PII detection, and executive visibility.
Key Objectives:
- Automated compliance with GDPR, SOC2, and internal policies
- Proactive data quality issue detection
- PII detection and protection
- Complete audit trail for governance operations
- Executive dashboard for governance metrics
┌─────────────────────────────────────────────────────────────┐
│                      GOVERNANCE LAYER                       │
│  Built on: Observability (Phase 1) + Validation (Phase 2)   │
└─────────────────────────────────────────────────────────────┘
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
           ▼                   ▼                   ▼
      ┌─────────┐         ┌─────────┐         ┌─────────┐
      │   PII   │         │  Data   │         │ Bypass  │
      │ Scanner │         │ Quality │         │ Auditor │
      └─────────┘         └─────────┘         └─────────┘
           │                   │                   │
           └───────────────────┼───────────────────┘
                               ▼
                      ┌─────────────────┐
                      │   Governance    │
                      │    Dashboard    │
                      └─────────────────┘
Automatically detect and flag Personally Identifiable Information (PII) in:
- Worker prompts and context
- Task descriptions
- Agent outputs
- Coordination files
High-Confidence PII:
- Email addresses: [\w\.-]+@[\w\.-]+\.\w+
- Phone numbers: \d{3}[-.]?\d{3}[-.]?\d{4}
- SSN: \d{3}-\d{2}-\d{4}
- Credit cards: \d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}
- API keys: (sk|pk)_[a-zA-Z0-9]{20,}
- AWS keys: AKIA[0-9A-Z]{16}
Medium-Confidence PII:
- IP addresses: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
- Usernames: Context-dependent
- Addresses: Natural language processing
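As a rough illustration of how the regex-based patterns above might be applied, the sketch below tests a string against each pattern with grep -E and prints the types that match. The function name detect_pii_types and its one-line-per-type output are assumptions made for this example, not the specified pii-scanner.sh API. The \d, \w, and \s shorthands are rewritten as POSIX bracket expressions because grep -E does not support them portably.

```shell
#!/usr/bin/env bash
# Sketch only: detect_pii_types is a hypothetical helper, not part of the
# specified API. It prints "<type> <confidence>" for every pattern that matches.

detect_pii_types() {
    local content="$1"
    local type confidence pattern

    # Each table row is: type, confidence, extended regex (rest of the line).
    while read -r type confidence pattern; do
        if printf '%s\n' "$content" | grep -Eq -- "$pattern"; then
            printf '%s %s\n' "$type" "$confidence"
        fi
    done <<'EOF'
email high [[:alnum:]_.-]+@[[:alnum:]_.-]+\.[[:alnum:]_]+
phone high [0-9]{3}[-.]?[0-9]{3}[-.]?[0-9]{4}
ssn high [0-9]{3}-[0-9]{2}-[0-9]{4}
credit_card high [0-9]{4}[[:space:]-]?[0-9]{4}[[:space:]-]?[0-9]{4}[[:space:]-]?[0-9]{4}
api_key high (sk|pk)_[a-zA-Z0-9]{20,}
aws_key high AKIA[0-9A-Z]{16}
ip_address medium [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}
EOF
}

# Usage: prints "email high" and "ip_address medium", one per line.
detect_pii_types "Contact alice@example.com from 10.0.0.1"
```

Note that these patterns are deliberately loose: an unseparated 16-digit card number also matches the phone pattern, so a real scanner would likely need to rank or de-duplicate overlapping findings.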
File: coordination/governance/lib/pii-scanner.sh
API:
scan_for_pii() {
    local content="$1"
    local context="${2:-unknown}"
    # Returns: JSON with findings
    # {
    #   "has_pii": true/false,
    #   "findings": [
    #     {"type": "email", "value": "redacted", "confidence": "high"},
    #     {"type": "api_key", "value": "redacted", "confidence": "high"}
    #   ],
    #   "risk_level": "high|medium|low"
    # }
}
redact_pii() {
    local content="$1"
    # Returns: content with PII redacted
}
Actions on Detection:
- Log to governance log
- Emit observability event
- Flag for review if high confidence
- Auto-redact if configured
- Block operation if critical PII detected
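The auto-redact action could be built on the redact_pii() signature shown earlier. The following is a minimal sketch, assuming sed -E is available and that each high-confidence match is replaced by a typed placeholder; the placeholder strings such as [REDACTED_EMAIL] are invented for this example and are not defined by the design.

```shell
#!/usr/bin/env bash
# Sketch only: one possible body for redact_pii(). Placeholder tokens are
# assumptions for illustration.

redact_pii() {
    local content="$1"

    # Order matters: SSN and card patterns run before the looser phone pattern
    # so that longer digit sequences are not partially redacted as phones.
    printf '%s\n' "$content" | sed -E \
        -e 's/[[:alnum:]_.-]+@[[:alnum:]_.-]+\.[[:alnum:]_]+/[REDACTED_EMAIL]/g' \
        -e 's/[0-9]{3}-[0-9]{2}-[0-9]{4}/[REDACTED_SSN]/g' \
        -e 's/[0-9]{4}[[:space:]-]?[0-9]{4}[[:space:]-]?[0-9]{4}[[:space:]-]?[0-9]{4}/[REDACTED_CARD]/g' \
        -e 's/[0-9]{3}[-.]?[0-9]{3}[-.]?[0-9]{4}/[REDACTED_PHONE]/g' \
        -e 's/(sk|pk)_[a-zA-Z0-9]{20,}/[REDACTED_API_KEY]/g' \
        -e 's/AKIA[0-9A-Z]{16}/[REDACTED_AWS_KEY]/g'
}

# Usage
redact_pii "mail bob@example.com or call 555-123-4567"
```

Typed placeholders keep the redacted text readable for reviewers while ensuring the original value never reaches the governance log.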
Proactively detect data quality issues before they cause system failures.
1. Schema Validation
- All JSON matches defined schemas
- Required fields present
- Data types correct
- Enum values valid
2. Referential Integrity
- Worker references valid tasks
- Tasks reference valid repositories
- Cross-references are consistent
3. Data Freshness
- Worker specs updated recently
- Task queue not stale
- Metrics current
4. Data Completeness
- No empty required fields
- Descriptions meaningful (not "TODO")
- Context has required information
5. Data Consistency
- Worker status matches reality
- Token budgets within limits
- Timestamps logical
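To make the completeness check concrete, here is a small sketch of how a description might be rejected when it is empty or only a placeholder. The function name is_meaningful_description is hypothetical, and every placeholder other than "TODO" (which the list above names) is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical completeness check for description fields.
# Returns 0 when the description looks meaningful, 1 when it is empty or a
# bare placeholder.

is_meaningful_description() {
    local description="$1"
    local trimmed

    # Strip leading and trailing whitespace before comparing.
    trimmed=$(printf '%s' "$description" |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')

    case "$trimmed" in
        ''|TODO|todo|TBD|tbd|FIXME|fixme)
            return 1
            ;;
    esac
    return 0
}

# Usage
if is_meaningful_description "TODO"; then
    echo "description ok"
else
    echo "description is a placeholder"
fi
```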
File: coordination/governance/lib/quality-monitor.sh
API:
check_data_quality() {
    local file_path="$1"
    local schema_name="${2:-auto}"
    # Returns: Quality report JSON
    # {
    #   "overall_score": 95,
    #   "issues": [
    #     {"severity": "warning", "check": "freshness", "message": "..."},
    #     {"severity": "error", "check": "schema", "message": "..."}
    #   ],
    #   "passed": 18,
    #   "failed": 2
    # }
}
Monitoring Daemon: scripts/daemons/quality-monitor-daemon.sh
- Runs every 5 minutes
- Checks all active workers, tasks, coordination files
- Emits observability events
- Updates quality dashboard
Track and audit all governance bypass operations to ensure accountability.
- Environment Variable: GOVERNANCE_BYPASS=true
- Explicit Flag: --bypass-governance
- Emergency Mode: System-wide bypass
- Schema Override: Writing without validation
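The first two mechanisms in the list above can be detected at the start of a script before any audited work begins. The sketch below shows one way to do that; the function name bypass_requested is an assumption, and emergency mode and schema overrides are not covered because the document does not specify how they are signalled.

```shell
#!/usr/bin/env bash
# Sketch only: bypass_requested is a hypothetical helper. It returns 0 when
# the caller asked for a governance bypass through the environment variable
# or the explicit flag, and 1 otherwise.

bypass_requested() {
    local arg

    if [ "${GOVERNANCE_BYPASS:-false}" = "true" ]; then
        return 0
    fi

    for arg in "$@"; do
        if [ "$arg" = "--bypass-governance" ]; then
            return 0
        fi
    done
    return 1
}

# Usage: pass the script's own arguments through.
if bypass_requested "$@"; then
    echo "bypass requested: audit before continuing"
fi
```

Centralizing detection in one function means every entry point reports bypasses the same way, which makes the 100% logging target easier to verify.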
File: coordination/governance/bypass-audit.jsonl
Format:
{
  "timestamp": "2025-11-17T21:45:00Z",
  "trace_id": "trace-...",
  "bypass_type": "governance|validation|access",
  "principal": "user@example.com",
  "component": "spawn-worker.sh",
  "reason": "Emergency production fix",
  "approved_by": "manager@example.com",
  "duration_minutes": 30,
  "scope": {
    "operation": "worker-spawn",
    "resources": ["worker-emergency-001"]
  },
  "risk_level": "high"
}
File: coordination/governance/lib/bypass-auditor.sh
API:
audit_bypass() {
    local bypass_type="$1"
    local reason="$2"
    local approved_by="${3:-none}"
    # Logs bypass to audit trail
    # Emits observability event
    # Checks if bypass is authorized
    # Returns: 0 if allowed, 1 if denied
}
check_bypass_authorization() {
    local principal="$1"
    local bypass_type="$2"
    # Checks authorization matrix
    # Returns: 0 if authorized, 1 if not
}
Authorization Matrix:
User Role | Validation | Governance | Access
-----------------|------------|------------|--------
Developer | ❌ | ❌ | ❌
Senior Engineer | ✅ (30min) | ❌ | ❌
Tech Lead | ✅ (2hr) | ✅ (30min) | ❌
Engineering Mgr | ✅ (8hr) | ✅ (2hr) | ✅ (30min)
Director | ✅ | ✅ | ✅
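The matrix above could be encoded as a simple lookup. The following sketch is one possible encoding, under several assumptions: the role identifiers (senior_engineer, tech_lead, and so on) and both function names are hypothetical, durations are expressed in minutes, and a check mark without a duration in the Director row is read as "no time limit". How a principal is mapped to a role is left out because the document does not describe it.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical encoding of the authorization matrix.

# Print the maximum bypass duration in minutes for a role and bypass type.
# Prints "unlimited" where the matrix shows no limit; returns 1 if denied.
max_bypass_minutes() {
    local role="$1"
    local bypass_type="$2"

    case "$role:$bypass_type" in
        senior_engineer:validation)      echo 30 ;;
        tech_lead:validation)            echo 120 ;;
        tech_lead:governance)            echo 30 ;;
        engineering_manager:validation)  echo 480 ;;
        engineering_manager:governance)  echo 120 ;;
        engineering_manager:access)      echo 30 ;;
        director:validation|director:governance|director:access)
            echo unlimited ;;
        *)
            return 1 ;;
    esac
}

# Return 0 if the role may bypass the given type for the requested minutes.
bypass_allowed() {
    local role="$1"
    local bypass_type="$2"
    local requested_minutes="$3"
    local limit

    limit=$(max_bypass_minutes "$role" "$bypass_type") || return 1
    [ "$limit" = "unlimited" ] || [ "$requested_minutes" -le "$limit" ]
}

# Usage
if bypass_allowed tech_lead validation 60; then
    echo "authorized"
else
    echo "denied"
fi
```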
Provide executive and operational visibility into governance health.
Compliance Metrics:
- PII incidents (last 7/30/90 days)
- Data quality score (0-100)
- Bypass operations (count, duration)
- Policy violations
- Audit coverage percentage
Operational Metrics:
- Active workers with quality issues
- Tasks with data problems
- Configuration drift
- Schema validation failures
Trend Analysis:
- Quality score over time
- PII incidents trend
- Bypass frequency
- Violation patterns
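One simple way to turn "quality score over time" into a trend label is to compare the current score with the previous period's score. The sketch below does that for integer scores where higher is better. The function name classify_trend, the default tolerance of 2 points, and the "declining" label are assumptions; "improving" and "stable" are the labels used in this document's dashboard summary.

```shell
#!/usr/bin/env bash
# Sketch only: classify_trend is a hypothetical helper for integer scores
# where a higher value is better (for example the 0-100 quality score).

classify_trend() {
    local previous="$1"
    local current="$2"
    local tolerance="${3:-2}"   # assumed dead band, in score points
    local delta=$(( ${current} - ${previous} ))

    if [ "$delta" -gt "$tolerance" ]; then
        echo improving
    elif [ "$delta" -lt "-$tolerance" ]; then
        echo declining
    else
        echo stable
    fi
}

# Usage: previous score 90, current score 95.
classify_trend 90 95
```

For metrics where lower is better, such as PII incidents or bypass frequency, the arguments would need to be swapped or the comparison inverted.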
File: coordination/governance/dashboard-summary.json
Format:
{
  "generated_at": "2025-11-17T21:45:00Z",
  "period": "24h",
  "compliance": {
    "pii_incidents": 0,
    "data_quality_score": 95,
    "bypass_count": 3,
    "violations": 1,
    "audit_coverage": 100
  },
  "quality": {
    "workers_with_issues": 2,
    "tasks_with_issues": 0,
    "schema_failures": 0,
    "referential_errors": 1
  },
  "trends": {
    "quality_trend": "improving",
    "pii_trend": "stable",
    "bypass_trend": "stable"
  },
  "alerts": [
    {
      "severity": "warning",
      "message": "2 workers have stale timestamps",
      "action": "Review worker status"
    }
  ]
}
CLI Tool: scripts/governance-report.sh
Usage:
# Show current governance status
./scripts/governance-report.sh --summary
# Generate compliance report
./scripts/governance-report.sh --compliance --period 30d
# Show PII incidents
./scripts/governance-report.sh --pii-incidents --since 7d
# Export for auditors
./scripts/governance-report.sh --export --format csv
- All governance events → observability events
- Trace IDs link governance to operations
- Dashboard queries observability indices
- PII scanner runs during validation
- Quality checks integrated into safe_write_json()
- Bypass auditing wraps validation overrides
- Bypass authorization checks access matrix
- Principal identity from COMMIT_RELAY_PRINCIPAL
- Audit trail includes access decisions
Right to be Forgotten:
- PII scanner identifies personal data
- Redaction tools for data removal
- Audit trail of deletions
Data Minimization:
- Quality checks flag unnecessary PII
- Auto-redaction reduces PII storage
- Retention policies enforced
Accountability:
- Complete audit trail
- Bypass tracking
- Principal attribution
Security:
- Access control integration
- Bypass requires authorization
- Audit trail immutable
Availability:
- Quality monitoring prevents outages
- Proactive issue detection
- Dashboard visibility
Confidentiality:
- PII detection and redaction
- Sensitive data flagging
- Access logging
Week 1 (Current):
- ✅ Architecture design (this document)
- PII scanner implementation
- Quality monitor core functions
- Bypass auditor framework
Week 2:
- Complete quality monitoring
- Dashboard data generation
- CLI reporting tool
- Integration testing
Week 3:
- Compliance framework validation
- Performance optimization
- Documentation
- Production deployment
Phase 3 Complete When:
- ✅ PII scanner catches 95%+ of known patterns
- ✅ Data quality score >90% system-wide
- ✅ 100% of bypasses logged and attributed
- ✅ Dashboard updates every 5 minutes
- ✅ Zero compliance violations in production
Long-term KPIs:
- PII incidents: 0 per month
- Data quality: >95% average
- Unauthorized bypasses: 0
- Audit coverage: 100%
- Audit readiness: <1 week to prepare for an audit
- Audit Trail Integrity: Append-only, immutable logs
- PII in Logs: Scanner itself doesn't log PII values
- Access Control: Bypass requires proper authorization
- Encryption: Sensitive governance data encrypted at rest
- Retention: Audit logs kept for 7 years (compliance requirement)
Next Steps: Begin implementation with PII scanner (highest risk reduction)