Add GitHub Copilot repository instructions and infrastructure config #8
- Create .github/copilot/instructions.md with comprehensive repository instructions
- Create config/matter.yaml with full system configuration
- Create infrastructure directories: k8s/, argo/, terraform/, scripts/
- Add README.md files documenting each infrastructure component
- Add .gitkeep files to track empty directories
Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
Update .github/workflows/codeql.yml to use github/codeql-action@v3 in all steps.
Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
Updated all CodeQL action references to v3 in commit f9dcc86. Changes made:
Note: Code scanning must be enabled in repository settings (Security → Code security and analysis → Code scanning) to allow CodeQL to upload results. This requires repository admin access.
Pull Request Overview
This PR successfully sets up GitHub Copilot repository instructions and supporting infrastructure documentation for the CVE Matter-Analysis OS. The implementation provides comprehensive guidance for AI agents working on this defense-only vulnerability analysis pipeline, along with detailed documentation for Kubernetes, Argo Workflows, Terraform, and utility scripts.
Key changes:
- Created comprehensive Copilot instructions with 15 sections covering agent operating rules, task workflow, CI/CD gates, and troubleshooting
- Added detailed configuration file (config/matter.yaml) with 22 top-level sections defining system behavior
- Established infrastructure documentation framework (k8s/, argo/, terraform/, scripts/) with README files
Reviewed Changes
Copilot reviewed 7 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| .github/copilot/instructions.md | Comprehensive 478-line guide covering agent rules, task order, quality gates, and development standards |
| config/matter.yaml | Complete 392-line YAML configuration defining pipeline behavior, thresholds, and infrastructure settings |
| terraform/README.md | Infrastructure as Code documentation covering GKE cluster, GPU nodes, and GCS storage setup |
| k8s/README.md | Kubernetes manifests documentation for deployments, CRDs, admission webhooks, and runtime classes |
| argo/README.md | Argo Workflows documentation for orchestrating epsilon-sweep pipeline with nightly scheduling |
| scripts/README.md | Utility scripts documentation covering deployment, Notion sync, capsule publishing, and maintenance |
| .github/workflows/codeql.yml | Updated CodeQL actions from v2 to v3 for security scanning |
| .gitkeep files | Created empty directories (config/, k8s/, argo/, terraform/, scripts/) for version control |
Comments suppressed due to low confidence (1)
.github/workflows/codeql.yml:72
- [nitpick] The separate "Upload SARIF results" step (lines 67-72) may be redundant. The github/codeql-action/analyze@v3 action automatically uploads results to GitHub Code Scanning. This separate upload step is typically only needed when generating SARIF files independently or when the analyze action fails. Consider removing this step unless there's a specific reason to keep it, as it may cause confusion or duplicate uploads.
    - name: Upload SARIF results
      uses: github/codeql-action/upload-sarif@v3
      if: always()
      with:
        sarif_file: ../results
      continue-on-error: true
GPU node pool configuration (Task 080):
- **Machine Type**: n1-standard-4 (or custom)
- **Accelerator**: NVIDIA Tesla T4 (or T4, V100, A100)
The text "(or T4, V100, A100)" is redundant since "NVIDIA Tesla T4" already includes "T4". Consider changing to "(or V100, A100)" or "(e.g., T4, V100, A100)" to avoid redundancy.
Suggested change:
- - **Accelerator**: NVIDIA Tesla T4 (or T4, V100, A100)
+ - **Accelerator**: NVIDIA Tesla T4 (e.g., T4, V100, A100)
- **Python 3.11+**: Primary language
- **NumPy, SciPy, scikit-learn**: Numerical computing and ML
- **Requests**: HTTP client for NVD API
- **StructLog**: Structured logging
Inconsistent capitalization of library name. The correct package name is structlog (all lowercase), as shown in requirements.txt. Should be "structlog" not "StructLog".
Suggested change:
- - **StructLog**: Structured logging
+ - **structlog**: Structured logging
### Monitoring & Observability
- **Prometheus**: Metrics collection
- **Structlog**: Structured logging (JSON)
Inconsistent capitalization of library name. The correct package name is structlog (all lowercase), as shown in requirements.txt. Should be "structlog" not "Structlog".
Suggested change:
- - **Structlog**: Structured logging (JSON)
+ - **structlog**: Structured logging (JSON)
    artifacts:
      repository:
        type: gcs # Options: gcs, s3, minio
        bucket: "cve-analysis-artifacts"
Inconsistent bucket naming. Line 282 specifies the Argo artifacts bucket as cve-analysis-artifacts, but line 310 specifies the infrastructure artifacts bucket as cve-artifacts. These should refer to the same bucket. Consider using cve-artifacts consistently (matching the naming pattern of cve-capsules and cve-logs), or update both to use cve-analysis-artifacts.
Suggested change:
- bucket: "cve-analysis-artifacts"
+ bucket: "cve-artifacts"
    capsules_bucket_name = "cve-capsules-bucket"
    artifacts_bucket_name = "cve-artifacts-bucket"
    logs_bucket_name = "cve-logs-bucket"
Inconsistent bucket naming between terraform/README.md and config/matter.yaml. The terraform example uses bucket names with -bucket suffix (e.g., cve-capsules-bucket), but config/matter.yaml lines 309-311 use names without the suffix (e.g., cve-capsules). These should be aligned for consistency.
Suggested change:
- capsules_bucket_name = "cve-capsules-bucket"
- artifacts_bucket_name = "cve-artifacts-bucket"
- logs_bucket_name = "cve-logs-bucket"
+ capsules_bucket_name = "cve-capsules"
+ artifacts_bucket_name = "cve-artifacts"
+ logs_bucket_name = "cve-logs"
Task: Set up Copilot instructions ✓
Successfully set up GitHub Copilot repository instructions to onboard the CVE Matter-Analysis OS repository.
Completion Status
- .github/copilot/instructions.md with comprehensive repository instructions
- .copilot/AGENT_GUIDE.md and task definitions
- config/matter.yaml configuration file
Files Created (11 files, 1,538 lines)
1. GitHub Copilot Instructions
.github/copilot/instructions.md (478 lines)
Key Features:
2. Configuration File
config/matter.yaml (392 lines, 22 top-level sections)
Configuration Sections:
3. Infrastructure Documentation
- k8s/README.md (88 lines)
- argo/README.md (131 lines)
- terraform/README.md (229 lines)
- scripts/README.md (220 lines)
4. Directory Structure
Created with .gitkeep files to track in git:
- config/ - System configuration files
- k8s/ - Kubernetes manifests
- argo/ - Argo Workflows
- terraform/ - Infrastructure as Code
- scripts/ - Utility scripts
5. CodeQL Workflow Update
.github/workflows/codeql.yml
- github/codeql-action/init@v3
- github/codeql-action/analyze@v3
- github/codeql-action/upload-sarif@v3
Validation Results ✓
YAML Validation:
Copilot Instructions:
Task References:
File References:
Defense-Only Guardrails:
Consistency Checks:
How This Addresses the Issue
The issue requested setting up Copilot instructions with these requirements:
Agent Operating Rules ✅
File Anchors ✅
Default Task Order ✅
CI/CD & Quality Gates ✅
Environment & Secrets ✅
K8s/GKE Runtime Rules ✅
Argo Orchestration ✅
Definition of Done ✅
Failure Handling ✅
One-Shot Bootstrap Prompt ✅
Additional Updates
CodeQL Workflow Modernization:
Next Steps
With Copilot instructions complete, agents can now:
Note: Code scanning must be enabled in repository settings for CodeQL analysis to run and upload results. This is a GitHub repository configuration that requires admin access.
Testing Performed
Status: Ready for review and merge ✓
Original prompt
This section details the original issue you should resolve.
<issue_title>✨ Set up Copilot instructions</issue_title>
<issue_description>Copilot Coding Agent — Repository Instructions (Onboard this repo)
Repo: CVE Matter-Analysis OS (defense-only)
Language/stack: Python 3.11, CUDA (optional), Docker, K8s (GKE/gVisor), Argo, Terraform, GitHub Actions
Mission: Build and maintain the CVE matter-analysis pipeline with Bayesian evidence, positional alignment, reverse-adversarial refractors, and a black-box arbiter.
Non-negotiables: Private repo; defense-only; no cryptographic-breaking, no offensive payloads.
Act from files, not guesswork. Prefer repo artifacts over improvisation.
Defense-only guardrails. Refuse crypto-breaking or exploit authoring.
Idempotent PRs. Every change must be reproducible via scripts/CI.
Citations & logs. Reference edited files; emit structured logs where applicable.
Never commit secrets. Read keys via env/Secrets. Fail closed if missing.
.copilot/AGENT_GUIDE.md — operating guide
.copilot/tasks/ — atomic tasks with acceptance criteria
config/matter.yaml — mode=cve, thresholds, alignment, refractors, arbiter
src/ — modules: ingest/, alignment/, models/, refractors/, evaluation/, orchestrate/
k8s/ — CRD+webhook, runtime classes, GPU job specs, policies
argo/ — WorkflowTemplate, CronWorkflow, Events
terraform/ — GKE cluster, GPU node pool, GCS capsules bucket
prompts/ & capsules/ — LID-LIFT prompt/capsule artifacts (link-pack)
NVD ingest
Edit: src/ingest/nvd_client.py, tests/test_ingest.py
Criteria: Delta sync, ETag/backoff, JSONL out, tests pass
Positional alignment
Edit: src/alignment/procrustes.py, src/alignment/cca.py, tests/test_alignment.py
Criteria: alignment_R2 ≥ 0.85, re-register on fail
Arbiter (stacked) + Pareto knee
Edit: src/models/arbiter.py
Criteria: choose knee over {accuracy,f1,robust_auc_ε} vs {latency_ms,energy_J,cost_$}
Refractor grid (ε & axes)
Edit: src/refractors/epsilon.py, src/refractors/shifts.py
Criteria: grid runs; outputs robust AUC & CVaR@95
Evidence & decision
Edit: src/evaluation/evidence.py
Criteria: BIC/WAIC log-evidence; Bayes factor + Jeffreys class
Legendary prompt sync (Notion)
Run: scripts/register_prompt_notion.sh
Criteria: SHA in Notion matches prompts/legendary_lidlift_v14.md
Capsule publish
Run: scripts/publish_capsules.sh (on tag)
Criteria: capsules/*.json reachable at configured bucket/domain
GPU enablement (optional)
Files: terraform/gpu.tf, k8s/gpu-job.yaml, docker/Dockerfile (CUDA)
Criteria: GPU pod requests succeed; tests marked @gpu pass
Admission webhook & triggers
Files: k8s/crd-policytrigger.yaml, k8s/deploy-webhook.yaml, k8s/webhook-config.yaml
Criteria: Pods labeled macrosegment: code receive sedation init; suspicious binaries rejected
Argo ε-sweep
Files: argo/workflowtemplate-tensor-macrosegments.yaml, argo/cronworkflow-nightly.yaml
Criteria: nightly sweep executes and stores artifacts
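The "Evidence & decision" task above asks for BIC-based log-evidence, a Bayes factor, and a Jeffreys class. The following is a minimal sketch of that chain, not the contents of src/evaluation/evidence.py. It assumes the standard approximation ln BF ≈ (BIC_B − BIC_A) / 2 and the usual half-decade Jeffreys thresholds on log10(BF); the function names are hypothetical.

```python
import math

# Upper bounds on log10(Bayes factor) for each Jeffreys category.
# Labels follow the common textbook wording; the repo may use different ones.
JEFFREYS_SCALE = [
    (0.5, "barely worth mentioning"),
    (1.0, "substantial"),
    (1.5, "strong"),
    (2.0, "very strong"),
]


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def log_bayes_factor(bic_a: float, bic_b: float) -> float:
    """Approximate ln BF for model A over model B from their BIC values."""
    return (bic_b - bic_a) / 2.0


def jeffreys_class(ln_bf: float) -> str:
    """Map ln BF (evidence for model A) onto a Jeffreys category."""
    log10_bf = ln_bf / math.log(10.0)
    if log10_bf < 0:
        return "negative (favours model B)"
    for upper, label in JEFFREYS_SCALE:
        if log10_bf < upper:
            return label
    return "decisive"
```

WAIC is not covered here; it needs pointwise posterior log-likelihoods rather than a single maximised likelihood.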
CI: lint (flake8), tests (pytest), Python 3.11
Security: CodeQL (Python), Trivy image scan (fail on HIGH/CRITICAL)
Branch protection: required status checks, at least 1 review, linear history
No-secret check: block PRs containing keys or tokens
Container: build Docker image; rootless, read-only filesystem, no privilege escalation
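The no-secret check listed above could be approximated by scanning only the added lines of a unified diff. This is a rough sketch under stated assumptions: the three patterns are illustrative rather than exhaustive, and find_secrets is a hypothetical helper, not something this repository is known to contain. A real gate would more likely rely on a maintained secret scanner.

```python
import re

# Illustrative patterns only; they will miss many real secret formats.
SECRET_PATTERNS = {
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "assigned_secret": re.compile(
        r"(?i)(api[_-]?key|token|secret)\w*\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
}


def find_secrets(diff_text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each added diff line that looks like a secret."""
    hits = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect lines the PR adds; skip context, removals, and "+++" file headers.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

A CI step could fail the job whenever the returned list is non-empty, printing line numbers and pattern names but never the matched text.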
Use GitHub Secrets / Environments only:
NVD_API_KEY, GCP_PROJECT_ID, GCP_SA_JSON (if needed), NOTION_API_TOKEN, NOTION_PROMPTS_DB_ID
Local/dev runs must read from env; CI injects via env: or secrets:.
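Reading keys from the environment and failing closed when one is missing might look like the sketch below. The variable names come from the list above; treating GCP_SA_JSON as optional (it is marked "if needed") and the names load_secrets and MissingSecretError are assumptions made for illustration.

```python
import os

# Names taken from the issue text; GCP_SA_JSON is left out because it is only needed sometimes.
REQUIRED_VARS = ("NVD_API_KEY", "GCP_PROJECT_ID", "NOTION_API_TOKEN", "NOTION_PROMPTS_DB_ID")


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent, so the caller fails closed."""


def load_secrets(environ=os.environ, required=REQUIRED_VARS) -> dict[str, str]:
    """Return the required secrets, or raise if any is unset or empty."""
    missing = [name for name in required if not environ.get(name)]
    if missing:
        # Report variable names only; never echo secret values.
        raise MissingSecretError("missing required env vars: " + ", ".join(missing))
    return {name: environ[name] for name in required}
```

Passing the mapping in as a parameter keeps the function testable without touching the real process environment.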
Workload Identity: map KSA→GSA; no node-level keys
RuntimeClass (gVisor) for default pods; GPUs limited to macrosegment: code
AdmissionWebhook applies: inject-sleep-init, reject-suspicious-binary, freeze-image-pull as configured by PolicyTrigger CRDs
Run ε-grid with tensor-macrosegments WorkflowTemplate; CronWorkflow nightly at 03:00 UTC
Label all compute pods macrosegment: code for webhook policies
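The runtime and labeling rules above can be expressed as a pre-submit check on a Pod manifest loaded as a dictionary. This sketch makes several assumptions: the RuntimeClass is named gvisor, "default pods" means pods that request no GPU, and GPUs are requested through the nvidia.com/gpu resource limit. The function name is hypothetical, and the admission webhook in k8s/ may enforce these rules differently.

```python
def pod_policy_violations(pod: dict) -> list[str]:
    """Check a Pod manifest (as a dict) against the runtime rules; return readable violations."""
    labels = pod.get("metadata", {}).get("labels", {})
    spec = pod.get("spec", {})
    is_code_segment = labels.get("macrosegment") == "code"

    # A pod "wants a GPU" if any container sets a limit on the NVIDIA GPU resource.
    wants_gpu = any(
        "nvidia.com/gpu" in container.get("resources", {}).get("limits", {})
        for container in spec.get("containers", [])
    )

    violations = []
    if wants_gpu and not is_code_segment:
        violations.append("GPU requested without label macrosegment=code")
    if not wants_gpu and spec.get("runtimeClassName") != "gvisor":
        violations.append("non-GPU pod must use the gVisor RuntimeClass")
    return violations
```

An empty list means the manifest passes both checks. Init containers are not inspected in this sketch.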
Tests + linters pass; CI security gates green
No secrets in diff; changelog & docs updated
For K8s/Argo changes: kubectl diff or dry-run manifests attached
For Terraform: terraform plan output attached (redacted)
If a task fails due to missing tools/perm: print exact simulated steps and mark SIMULATED in the PR body.
If prompt/spec is unclear: re-template the failing prompt and propose a single follow-up question as code comments, not as commits.
If latency/energy budgets breach: propose rollback to last stable config and open a separate “perf-budget” issue.
Build/maintain a private, defense-only CVE Matter-Analysis OS: NVD ingest → positional alignment (Procrustes/CCA) → stacked arbiter (Pareto ...