Skip to content

Add GitHub Copilot repository instructions and infrastructure config#8

Merged
Igor Holt (igor-holt) merged 3 commits into
mainfrom
copilot/setup-copilot-instructions
Nov 17, 2025
Merged

Add GitHub Copilot repository instructions and infrastructure config#8
Igor Holt (igor-holt) merged 3 commits into
mainfrom
copilot/setup-copilot-instructions

Conversation

Copy link
Copy Markdown

Copilot AI commented Nov 16, 2025

Task: Set up Copilot instructions ✓

Successfully set up GitHub Copilot repository instructions to onboard the CVE Matter-Analysis OS repository.

Completion Status

  • Create .github/copilot/instructions.md with comprehensive repository instructions
  • Ensure instructions align with existing .copilot/AGENT_GUIDE.md and task definitions
  • Include all required sections from the issue (15 sections)
  • Create missing infrastructure directories (config/, k8s/, argo/, terraform/, scripts/)
  • Create config/matter.yaml configuration file
  • Verify all references and links in the instructions
  • Validate YAML syntax and cross-references
  • Test setup completeness
  • Update CodeQL workflow to use v3 actions

Files Created (11 files, 1,538 lines)

1. GitHub Copilot Instructions

.github/copilot/instructions.md (478 lines)

  • ✅ Section 1: Agent Operating Rules
  • ✅ Section 2: File Anchors
  • ✅ Section 3: Default Task Order (Tasks 010-090)
  • ✅ Section 4: CI/CD & Quality Gates
  • ✅ Section 5: Environment & Secrets
  • ✅ Section 6: Kubernetes/GKE Runtime Rules
  • ✅ Section 7: Argo Orchestration
  • ✅ Section 8: Definition of Done
  • ✅ Section 9: Failure Handling
  • ✅ Section 10: One-Shot Bootstrap Prompt
  • ✅ Section 11: Technology Stack Reference
  • ✅ Section 12: Code Style and Standards
  • ✅ Section 13: Common Commands
  • ✅ Section 14: Troubleshooting
  • ✅ Section 15: Additional Resources

Key Features:

  • Defense-only guardrails (explicitly prohibits offensive work)
  • File-anchored approach (references all key documentation)
  • Sequential task workflow (010-090 with acceptance criteria)
  • Security-first practices (no secrets, scanning required)
  • Comprehensive troubleshooting guidance

2. Configuration File

config/matter.yaml (392 lines, 22 top-level sections)

  • Complete CVE Matter-Analysis OS configuration
  • Covers: ingest, alignment, arbiter, refractors, evidence
  • Includes: K8s, Argo, Terraform, monitoring, logging
  • Defines: performance budgets, resource limits, security settings
  • Validated: YAML syntax passes yaml.safe_load()

Configuration Sections:

  • Mode & System Identification
  • NVD Ingestion (API, rate limiting, delta sync)
  • Positional Alignment (Procrustes, CCA, thresholds)
  • Stacked Arbiter (Pareto optimization, ensemble models)
  • Epsilon-Refractors (shift detection, risk metrics)
  • Bayesian Evidence (WAIC/BIC, inference parameters)
  • Orchestration (H-MOC, tensor macrosegments)
  • Logging & Monitoring (structured logs, alerting)
  • Security & Compliance (defense-only enforcement)
  • Kubernetes Runtime (gVisor, admission webhooks)
  • Argo Workflows (ε-sweep schedule, artifacts)
  • Infrastructure (GCP, GKE, storage buckets)
  • Notion Integration & Capsule Publishing
  • Feature Flags & Data Retention
  • Performance Budgets (latency, energy, cost)

3. Infrastructure Documentation

k8s/README.md (88 lines)

  • Kubernetes manifests overview
  • CRD, deployments, services, runtime classes
  • Admission webhook documentation
  • Security considerations (gVisor, Workload Identity)

argo/README.md (131 lines)

  • Argo Workflows overview
  • Tensor macrosegments workflow
  • CronWorkflow scheduling (03:00 UTC nightly)
  • Artifact management (GCS storage)

terraform/README.md (229 lines)

  • Terraform infrastructure as code
  • GKE cluster configuration
  • GPU node pool setup
  • Storage buckets, networking, IAM
  • Cost optimization strategies

scripts/README.md (220 lines)

  • Utility scripts documentation
  • Notion sync, capsule publishing
  • Development, deployment, maintenance scripts
  • Script template and security guidelines

4. Directory Structure

Created with .gitkeep files to track in git:

  • config/ - System configuration files
  • k8s/ - Kubernetes manifests
  • argo/ - Argo Workflows
  • terraform/ - Infrastructure as Code
  • scripts/ - Utility scripts

5. CodeQL Workflow Update

.github/workflows/codeql.yml

  • Updated all CodeQL actions from v2 to v3
  • github/codeql-action/init@v3
  • github/codeql-action/analyze@v3
  • github/codeql-action/upload-sarif@v3
  • Validated YAML syntax

Validation Results ✓

YAML Validation:

  • ✅ config/matter.yaml: Valid YAML with 22 top-level keys
  • ✅ .github/workflows/codeql.yml: Valid YAML

Copilot Instructions:

  • ✅ All 10 required sections present
  • ✅ All 5 additional sections included
  • ✅ 478 lines, 16,306 bytes

Task References:

  • ✅ Task 010: NVD Ingest
  • ✅ Task 020: Positional Alignment
  • ✅ Task 030: Arbiter + Pareto Knee
  • ✅ Task 040: Refractor Grid
  • ✅ Task 050: Evidence & Decision
  • ✅ Task 060: Notion Sync
  • ✅ Task 070: Capsule Publish
  • ✅ Task 080: GPU Enablement
  • ✅ Task 090: Admission Webhook & Triggers

File References:

  • ✅ .copilot/AGENT_GUIDE.md
  • ✅ config/matter.yaml
  • ✅ SECURITY.md
  • ✅ README.md
  • ✅ prompts/legendary_lidlift_v14.md

Defense-Only Guardrails:

  • ✅ "defense-only" mentioned 6 times
  • ✅ "offensive" explicitly prohibited
  • ✅ "exploit" generation forbidden
  • ✅ "crypto-breaking" disallowed

Consistency Checks:

  • ✅ Aligned with .copilot/AGENT_GUIDE.md
  • ✅ Consistent with README.md mission
  • ✅ Compatible with CI/CD workflows
  • ✅ Follows SECURITY.md policies

How This Addresses the Issue

The issue requested setting up Copilot instructions with these requirements:

  1. Agent Operating Rules

    • Act from files (file-anchored approach documented)
    • Defense-only guardrails (explicitly enforced)
    • Idempotent PRs (reproducibility required)
    • Citations & logs (structured logging specified)
    • Never commit secrets (environment variables required)
  2. File Anchors

    • All key files documented with descriptions
    • Clear navigation paths provided
    • Module structure explained
  3. Default Task Order

    • All 9+ tasks documented with files and criteria
    • Sequential execution order specified
    • References to .copilot/tasks/ provided
  4. CI/CD & Quality Gates

    • Lint, test, coverage requirements
    • Security scanning (CodeQL v3, Trivy)
    • Branch protection rules
    • Container requirements
  5. Environment & Secrets

    • Required secrets enumerated
    • GitHub Secrets usage mandated
    • Secret detection enabled
  6. K8s/GKE Runtime Rules

    • Workload Identity configuration
    • gVisor runtime class requirements
    • Admission webhook policies
  7. Argo Orchestration

    • ε-grid workflow documented
    • CronWorkflow schedule (03:00 UTC)
    • Pod labeling requirements
  8. Definition of Done

    • Testing & quality checklist
    • Security requirements
    • Documentation standards
    • Infrastructure validation
  9. Failure Handling

    • Missing tools/permissions guidance
    • Unclear prompt handling
    • Performance budget breach procedures
    • Security issue reporting
  10. One-Shot Bootstrap Prompt

    • Copy-paste prompt provided
    • All key requirements listed
    • Guardrails included

Additional Updates

CodeQL Workflow Modernization:

  • Updated to use latest CodeQL action version (v3)
  • Ensures compatibility with current GitHub security features
  • Maintains existing security scanning capabilities

Next Steps

With Copilot instructions complete, agents can now:

  1. Navigate the repository effectively
  2. Follow the sequential task workflow (010-090)
  3. Understand security and defense-only requirements
  4. Use proper development and deployment practices
  5. Create idempotent, reproducible changes

Note: Code scanning must be enabled in repository settings for CodeQL analysis to run and upload results. This is a GitHub repository configuration that requires admin access.

Testing Performed

  • ✅ YAML syntax validation (config/matter.yaml)
  • ✅ YAML syntax validation (.github/workflows/codeql.yml)
  • ✅ Cross-reference validation (all file paths verified)
  • ✅ Task reference validation (010-090 present)
  • ✅ Defense-only guardrail validation (keywords present)
  • ✅ Consistency validation (aligned with existing docs)
  • ✅ Git operations (files committed successfully)

Status: Ready for review and merge ✓

Original prompt

This section details on the original issue you should resolve

<issue_title>✨ Set up Copilot instructions</issue_title>
<issue_description>Copilot Coding Agent — Repository Instructions (Onboard this repo)
Repo: CVE Matter-Analysis OS (defense-only)
Language/stack: Python 3.11, CUDA (optional), Docker, K8s (GKE/gVisor), Argo, Terraform, GitHub Actions
Mission: Build and maintain the CVE matter-analysis pipeline with Bayesian evidence, positional alignment, reverse-adversarial refractors, and a black-box arbiter.
Non-negotiables: Private repo; defense-only; no cryptographic-breaking, no offensive payloads.

  1. Agent Operating Rules
    Act from files, not guesswork. Prefer repo artifacts over improvisation.

Defense-only guardrails. Refuse crypto-breaking or exploit authoring.

Idempotent PRs. Every change must be reproducible via scripts/CI.

Citations & logs. Reference edited files; emit structured logs where applicable.

Never commit secrets. Read keys via env/Secrets. Fail closed if missing.

  1. File Anchors (read these first)
    .copilot/AGENT_GUIDE.md — operating guide

.copilot/tasks/ — atomic tasks with acceptance criteria

config/matter.yaml — mode=cve, thresholds, alignment, refractors, arbiter

src/ — modules: ingest/, alignment/, models/, refractors/, evaluation/, orchestrate/

k8s/ — CRD+webhook, runtime classes, GPU job specs, policies

argo/ — WorkflowTemplate, CronWorkflow, Events

terraform/ — GKE cluster, GPU node pool, GCS capsules bucket

prompts/ & capsules/ — LID-LIFT prompt/capsule artifacts (link-pack)

  1. Default Task Order (finish each before moving on)
    NVD ingest

Edit: src/ingest/nvd_client.py, tests/test_ingest.py

Criteria: Delta sync, ETag/backoff, JSONL out, tests pass

Positional alignment

Edit: src/alignment/procrustes.py, src/alignment/cca.py, tests/test_alignment.py

Criteria: alignment_R2 ≥ 0.85, re-register on fail

Arbiter (stacked) + Pareto knee

Edit: src/models/arbiter.py

Criteria: choose knee over {accuracy,f1,robust_auc_ε} vs {latency_ms,energy_J,cost_$}

Refractor grid (ε & axes)

Edit: src/refractors/epsilon.py, src/refractors/shifts.py

Criteria: grid runs; outputs robust AUC & CVaR@95

Evidence & decision

Edit: src/evaluation/evidence.py

Criteria: BIC/WAIC log-evidence; Bayes factor + Jeffreys class

Legendary prompt sync (Notion)

Run: scripts/register_prompt_notion.sh

Criteria: SHA in Notion matches prompts/legendary_lidlift_v14.md

Capsule publish

Run: scripts/publish_capsules.sh (on tag)

Criteria: capsules/*.json reachable at configured bucket/domain

GPU enablement (optional)

Files: terraform/gpu.tf, k8s/gpu-job.yaml, docker/Dockerfile (CUDA)

Criteria: GPU pod requests succeed; tests marked Marco Hutter (@gpu) pass

Admission webhook & triggers

Files: k8s/crd-policytrigger.yaml, k8s/deploy-webhook.yaml, k8s/webhook-config.yaml

Criteria: Pods labeled macrosegment: code receive sedation init; suspicious binaries rejected

Argo ε-sweep

Files: argo/workflowtemplate-tensor-macrosegments.yaml, argo/cronworkflow-nightly.yaml

Criteria: nightly sweep executes and stores artifacts

  1. CI/CD & Quality Gates (must pass on every PR)
    CI: lint (flake8), tests (pytest), Python 3.11

Security: CodeQL (Python), Trivy image scan (fail on HIGH/CRITICAL)

Branch protection: required status checks, at least 1 review, linear history

No-secret check: block PRs containing keys or tokens

Container: build Docker image; rootless, read-only filesystem, no privilege escalation

  1. Environment & Secrets
    Use GitHub Secrets / Environments only:

NVD_API_KEY, GCP_PROJECT_ID, GCP_SA_JSON (if needed), NOTION_API_TOKEN, NOTION_PROMPTS_DB_ID

Local/dev runs must read from env; CI injects via env: or secrets:.

  1. K8s/GKE Runtime Rules
    Workload Identity: map KSA→GSA; no node-level keys

RuntimeClass (gVisor) for default pods; GPUs limited to macrosegment: code

AdmissionWebhook applies: inject-sleep-init, reject-suspicious-binary, freeze-image-pull as configured by PolicyTrigger CRDs

  1. Argo Orchestration
    Run ε-grid with tensor-macrosegments WorkflowTemplate; CronWorkflow nightly at 03:00 UTC

Label all compute pods macrosegment: code for webhook policies

  1. Definition of Done (per task/PR)
    Tests + linters pass; CI security gates green

No secrets in diff; changelog & docs updated

For K8s/Argo changes: kubectl diff or dry-run manifests attached

For Terraform: terraform plan output attached (redacted)

  1. Failure Handling (what Copilot must do)
    If a task fails due to missing tools/perm: print exact simulated steps and mark SIMULATED in the PR body.

If prompt/spec is unclear: re-template the failing prompt and propose a single follow-up question as code comments, not as commits.

If latency/energy budgets breach: propose rollback to last stable config and open a separate “perf-budget” issue.

  1. One-shot bootstrap prompt (for Copilot PR generator)
    Build/maintain a private, defense-only CVE Matter-Analysis OS: NVD ingest → positional alignment (Procrustes/CCA) → stacked arbiter (Pareto ...

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI self-assigned this Nov 16, 2025
- Create .github/copilot/instructions.md with comprehensive repository instructions
- Create config/matter.yaml with full system configuration
- Create infrastructure directories: k8s/, argo/, terraform/, scripts/
- Add README.md files documenting each infrastructure component
- Add .gitkeep files to track empty directories

Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
Copilot AI changed the title [WIP] Set up Copilot instructions for CVE Matter-Analysis Add GitHub Copilot repository instructions and infrastructure config Nov 16, 2025
@igor-holt
Copy link
Copy Markdown
Member

Update .github/workflows/codeql.yml to use github/codeql-action@v3 in all steps.
Enable code scanning in your repository settings. Copilot

Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Nov 16, 2025

Update .github/workflows/codeql.yml to use github/codeql-action@v3 in all steps.
Enable code scanning in your repository settings. Copilot

Updated all CodeQL action references to v3 in commit f9dcc86.

Changes made:

  • github/codeql-action/init@v2@v3
  • github/codeql-action/analyze@v2@v3
  • github/codeql-action/upload-sarif@v2@v3

Note: Code scanning must be enabled in repository settings (Security → Code security and analysis → Code scanning) to allow CodeQL to upload results. This requires repository admin access.

@igor-holt Igor Holt (igor-holt) marked this pull request as ready for review November 17, 2025 09:47
Copilot AI review requested due to automatic review settings November 17, 2025 09:47
@igor-holt Igor Holt (igor-holt) merged commit c26b225 into main Nov 17, 2025
14 of 16 checks passed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR successfully sets up GitHub Copilot repository instructions and supporting infrastructure documentation for the CVE Matter-Analysis OS. The implementation provides comprehensive guidance for AI agents working on this defense-only vulnerability analysis pipeline, along with detailed documentation for Kubernetes, Argo Workflows, Terraform, and utility scripts.

Key changes:

  • Created comprehensive Copilot instructions with 15 sections covering agent operating rules, task workflow, CI/CD gates, and troubleshooting
  • Added detailed configuration file (config/matter.yaml) with 22 top-level sections defining system behavior
  • Established infrastructure documentation framework (k8s/, argo/, terraform/, scripts/) with README files

Reviewed Changes

Copilot reviewed 7 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
.github/copilot/instructions.md Comprehensive 478-line guide covering agent rules, task order, quality gates, and development standards
config/matter.yaml Complete 392-line YAML configuration defining pipeline behavior, thresholds, and infrastructure settings
terraform/README.md Infrastructure as Code documentation covering GKE cluster, GPU nodes, and GCS storage setup
k8s/README.md Kubernetes manifests documentation for deployments, CRDs, admission webhooks, and runtime classes
argo/README.md Argo Workflows documentation for orchestrating epsilon-sweep pipeline with nightly scheduling
scripts/README.md Utility scripts documentation covering deployment, Notion sync, capsule publishing, and maintenance
.github/workflows/codeql.yml Updated CodeQL actions from v2 to v3 for security scanning
.gitkeep files Created empty directories (config/, k8s/, argo/, terraform/, scripts/) for version control
Comments suppressed due to low confidence (1)

.github/workflows/codeql.yml:72

  • [nitpick] The separate "Upload SARIF results" step (lines 67-72) may be redundant. The github/codeql-action/analyze@v3 action automatically uploads results to GitHub Code Scanning. This separate upload step is typically only needed when generating SARIF files independently or when the analyze action fails. Consider removing this step unless there's a specific reason to keep it, as it may cause confusion or duplicate uploads.
      - name: Upload SARIF results
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: ../results
        continue-on-error: true

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread terraform/README.md

GPU node pool configuration (Task 080):
- **Machine Type**: n1-standard-4 (or custom)
- **Accelerator**: NVIDIA Tesla T4 (or T4, V100, A100)
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text "(or T4, V100, A100)" is redundant since "NVIDIA Tesla T4" already includes "T4". Consider changing to "(or V100, A100)" or "(e.g., T4, V100, A100)" to avoid redundancy.

Suggested change
- **Accelerator**: NVIDIA Tesla T4 (or T4, V100, A100)
- **Accelerator**: NVIDIA Tesla T4 (e.g., T4, V100, A100)

Copilot uses AI. Check for mistakes.
- **Python 3.11+**: Primary language
- **NumPy, SciPy, scikit-learn**: Numerical computing and ML
- **Requests**: HTTP client for NVD API
- **StructLog**: Structured logging
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent capitalization of library name. The correct package name is structlog (all lowercase), as shown in requirements.txt. Should be "structlog" not "StructLog".

Suggested change
- **StructLog**: Structured logging
- **structlog**: Structured logging

Copilot uses AI. Check for mistakes.

### Monitoring & Observability
- **Prometheus**: Metrics collection
- **Structlog**: Structured logging (JSON)
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent capitalization of library name. The correct package name is structlog (all lowercase), as shown in requirements.txt. Should be "structlog" not "Structlog".

Suggested change
- **Structlog**: Structured logging (JSON)
- **structlog**: Structured logging (JSON)

Copilot uses AI. Check for mistakes.
Comment thread config/matter.yaml
artifacts:
repository:
type: gcs # Options: gcs, s3, minio
bucket: "cve-analysis-artifacts"
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent bucket naming. Line 282 specifies the Argo artifacts bucket as cve-analysis-artifacts, but line 310 specifies the infrastructure artifacts bucket as cve-artifacts. These should refer to the same bucket. Consider using cve-artifacts consistently (matching the naming pattern of cve-capsules and cve-logs), or update both to use cve-analysis-artifacts.

Suggested change
bucket: "cve-analysis-artifacts"
bucket: "cve-artifacts"

Copilot uses AI. Check for mistakes.
Comment thread terraform/README.md
Comment on lines +112 to +114
capsules_bucket_name = "cve-capsules-bucket"
artifacts_bucket_name = "cve-artifacts-bucket"
logs_bucket_name = "cve-logs-bucket"
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent bucket naming between terraform/README.md and config/matter.yaml. The terraform example uses bucket names with -bucket suffix (e.g., cve-capsules-bucket), but config/matter.yaml lines 309-311 use names without the suffix (e.g., cve-capsules). These should be aligned for consistency.

Suggested change
capsules_bucket_name = "cve-capsules-bucket"
artifacts_bucket_name = "cve-artifacts-bucket"
logs_bucket_name = "cve-logs-bucket"
capsules_bucket_name = "cve-capsules"
artifacts_bucket_name = "cve-artifacts"
logs_bucket_name = "cve-logs"

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

✨ Set up Copilot instructions

3 participants