The SOC-as-Code Framework is an automated, research-grade system for validating, testing, classifying, and governing cybersecurity detection rules (Sigma/YARA). This project treats detection rules as version-controlled software artifacts and applies CI/CD practices to ensure correctness, maintainability, and measurable detection quality.
The system includes a universal log generator, a full Sigma evaluator, a rule validator, a classification and scoring engine, and diagnostic tooling—all integrated with GitHub Actions for automated governance.
- Python 3.x — Core implementation language.
- Universal Synthetic Log Generator — Multi-platform log simulation engine (Windows, Linux, AWS, Azure, Okta, OneLogin, M365, Google Workspace, Proxy, Network, OpenCanary, etc.).
- Sigma Rule Engine (SOCSimulator) — Full pattern evaluator with support for Sigma modifiers.
- JSON/YAML Processing — For rule parsing, report generation, and classification.
- GitHub Actions — CI/CD pipeline for automated validation.
- Regex, Pattern Matching, and Nested Field Evaluation — For deep rule matching accuracy.
soc-as-code/
├── .github/
│ └── workflows/
│ └── validate-rules.yml # CI/CD pipeline for rule validation
│
├── rules/
│ ├── sigma/ # Sigma rule definitions
│ └── yara/ # YARA rule definitions
│
├── validator/
│ ├── validate_rules.py # Core rule validator
│ ├── generate_logs.py # Synthetic log generator
│ ├── generate_report.py # Markdown summary report builder
│ ├── compare_and_classify.py # Rule scoring + classification engine
│ ├── check_results.py # CI interpretation + pass/fail logic
│ ├── diagnose_rules.py # Local diagnostics for debugging rules
│ ├── test_single_rule.py # Full local pipeline test for a single rule
│ └── __init__.py
│
├── test.py # SOCSimulator: Sigma evaluator + alert generation
├── requirements.txt # Python dependencies
└── README.md # Project documentation
- Python 3.9 or higher
- pip + virtual environment recommended
- GitHub Actions (optional, for automated CI/CD)
- YAML-compatible Sigma rules & optional YARA rules
Dependencies
Install all required packages using:
pip install -r requirements.txt
Pinned in requirements.txt:
pyyaml>=6.0
yara-python>=4.3.1
flask>=2.3.0
reportlab>=4.0.0
requests>=2.31.0

Libraries include:
- pyyaml — Rule parsing
- regex — Pattern interpretation
- json — Log data processing
- datetime — Report metadata
- pathlib — Consistent file handling
Generates positive (matching) and negative (non-matching) synthetic logs for each rule.
Supports more than 50 log source types, including:
- Windows process/file events
- Linux process activity
- AWS CloudTrail
- Azure ActivityLogs / PIM
- Okta, OneLogin
- Microsoft 365
- Google Workspace
- Proxy/Web logs
- Network telemetry
- OpenCanary honeypot events
Special logic:
New rules (e.g., IDs starting with SIG-900) intentionally produce zero synthetic logs to prevent score manipulation.
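This skip logic can be sketched as follows. Note this is an illustrative sketch only: the `SIG-900` prefix check is taken from the project description, but the `generate_logs_for_rule` helper and its signature are assumptions, not the generator's actual API.

```python
# Illustrative sketch: brand-new rules get zero synthetic logs so a freshly
# submitted rule cannot inflate its own detection score.
NEW_RULE_PREFIX = "SIG-900"  # assumed ID convention for new rules

def generate_logs_for_rule(rule: dict, count: int = 10) -> list:
    """Return synthetic logs for a rule, or an empty list for new rules."""
    rule_id = str(rule.get("id", ""))
    if rule_id.startswith(NEW_RULE_PREFIX):
        return []  # intentionally no logs for brand-new rules
    # A real generator would emit platform-specific positive/negative logs.
    return [{"rule_id": rule_id, "sample": i} for i in range(count)]

print(len(generate_logs_for_rule({"id": "SIG-900-001"})))  # 0
print(len(generate_logs_for_rule({"id": "SIG-100-042"})))  # 10
```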
A complete Sigma detection engine implementing:
- Field modifiers: `|contains`, `|startswith`, `|endswith`, `|re`, `|base64`, `|all`, `|exists`, `|gt`, `|lt`, etc.
- Wildcards & regex-like expressions
- Nested field matching (`actor.email`, `process.name`, etc.)
- Boolean detection conditions
- Multi-selection merging
This enables academically reproducible rule evaluation.
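A minimal sketch of how modifier evaluation of this kind can work is shown below. The `get_nested` and `match_field` helpers are hypothetical names covering only a subset of modifiers; the project's actual engine lives in test.py (SOCSimulator).

```python
import re

def get_nested(log: dict, dotted: str):
    """Resolve a dotted field path like 'actor.email' in a nested log dict."""
    cur = log
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur

def match_field(log: dict, field_spec: str, expected) -> bool:
    """Evaluate one Sigma field with an optional modifier (subset sketch)."""
    field, _, modifier = field_spec.partition("|")
    value = get_nested(log, field)
    if modifier == "exists":
        return (value is not None) == bool(expected)
    if value is None:
        return False
    text = str(value)
    if modifier == "contains":
        return str(expected) in text
    if modifier == "startswith":
        return text.startswith(str(expected))
    if modifier == "endswith":
        return text.endswith(str(expected))
    if modifier == "re":
        return re.search(str(expected), text) is not None
    if modifier == "gt":
        return float(value) > float(expected)
    if modifier == "lt":
        return float(value) < float(expected)
    return text == str(expected)  # no modifier: exact match

log = {"process": {"name": "powershell.exe"}, "actor": {"email": "a@b.com"}}
print(match_field(log, "process.name|endswith", ".exe"))   # True
print(match_field(log, "actor.email|contains", "@b.com"))  # True
```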
Processes each rule by:
- Loading synthetic logs
- Executing Sigma matching logic
- Recording match data
- Saving structured results (`detections.json`, `validation_results.json`)
- Providing metadata for downstream scoring
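The per-rule validation loop can be sketched roughly as follows. The `validate_rule` signature, the toy substring matcher, and the output shape are assumptions for illustration; validate_rules.py implements the real logic.

```python
import json
import tempfile
from pathlib import Path

def validate_rule(rule: dict, logs: list, matcher) -> dict:
    """Run a matcher over synthetic logs and record structured results."""
    hits = [log for log in logs if matcher(rule, log)]
    return {
        "rule_id": rule.get("id"),
        "total_logs": len(logs),
        "matches": len(hits),
        "matched_logs": hits,
    }

# Toy matcher: the rule "matches" when its needle appears in the log message.
matcher = lambda rule, log: rule["needle"] in log.get("message", "")
rule = {"id": "SIG-100-001", "needle": "mimikatz"}
logs = [{"message": "ran mimikatz.exe"}, {"message": "benign activity"}]

result = validate_rule(rule, logs, matcher)
out = Path(tempfile.gettempdir()) / "detections.json"  # structured results
out.write_text(json.dumps([result], indent=2))
print(result["matches"])  # 1
```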
Compares:
- Baseline detections (previous known-good state)
- Current detections (introduced by new rule)
Extracts rule identifiers from YAML:
- ID
- Title
- Filename
Each rule receives a 0–100 score based on:
- True-positive improvement
- False-positive regression
- Precision delta
- Detection consistency
- Identifier correctness
Grades:
- EXCELLENT – major positive impact
- GOOD – improves detection quality
- NEUTRAL – no major effect
- CONCERNING – potential issues
- BAD – harmful or faulty rule
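A simplified sketch of how a 0–100 score might map onto these grade bands is shown below. The weights and thresholds here are illustrative assumptions, not the values used by compare_and_classify.py.

```python
def classify(score: float) -> str:
    """Map a 0-100 rule score to a grade band (illustrative thresholds)."""
    if score >= 85:
        return "EXCELLENT"
    if score >= 70:
        return "GOOD"
    if score >= 50:
        return "NEUTRAL"
    if score >= 30:
        return "CONCERNING"
    return "BAD"

def score_rule(tp_gain: int, fp_regression: int, precision_delta: float) -> float:
    """Combine impact signals into a 0-100 score (assumed weights)."""
    score = 50.0                     # neutral baseline
    score += 10 * tp_gain            # reward new true positives
    score -= 15 * fp_regression      # penalize false-positive regressions
    score += 100 * precision_delta   # reward overall precision improvement
    return max(0.0, min(100.0, score))

s = score_rule(tp_gain=3, fp_regression=0, precision_delta=0.05)
print(s, classify(s))  # 85.0 EXCELLENT
```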
Produces:
- `classification_report.json`
- Human-readable Markdown summary (`summary.md`)
- `diagnose_rules.py`: shows why a rule failed to match logs, including expected fields, actual fields, and suggested fixes.
- `test_single_rule.py`: runs the entire pipeline for one rule, from log generation → validation → classification → report summary.
These tools dramatically simplify research workflows and debugging experiments.
- Clone this repository
git clone https://github.com/rahul07890-dev/soc-as-code.git
cd soc-as-code

- Create and activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate

- Install dependencies

pip install -r requirements.txt

- Run local validation

python validator/validate_rules.py --rules rules/sigma --synthetic synthetic_logs/ --mode current

- Continuous Integration (GitHub Actions)
The project uses GitHub Actions to run the validation pipeline on every push and pull request. The included workflow (.github/workflows/validate-rules.yml) performs the following stages:
- Checkout repository and set up Python
- Install dependencies (pip install -r requirements.txt)
- Generate synthetic logs (unless a SKIP_GENERATE flag is set)
- Run the rule validator against rules/
- Run comparison & classification when a baseline is available
- Upload classification_report.json, validation_results.json, and synthetic_logs as workflow artifacts
Required repository secrets / environment variables:
- BASELINE_ARTIFACT_URL (optional) — URL or path to baseline results if using external baseline storage
- FAIL_ON_BAD_RULES (boolean) — whether PRs should fail when BAD rules are detected
- Any platform-integrations secrets (if you add real-log ingestion)
Behavior:
- A PR introducing BAD rules will fail the check and return actionable reports as artifacts.
- CONCERNING rules raise warnings but do not necessarily fail the check, depending on configuration.
- Artifacts contain both machine-readable (.json) and human-readable (.md) reports for triage.
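The pass/fail interpretation performed by check_results.py might look roughly like this. The report shape and the `check_results` signature are assumptions based on the classification output described above, not the script's actual interface.

```python
def check_results(report: dict, fail_on_bad: bool = True) -> int:
    """Interpret classification grades and return a CI exit code (sketch)."""
    grades = [r.get("grade") for r in report.get("rules", [])]
    if "BAD" in grades and fail_on_bad:
        print("FAIL: BAD rules detected")
        return 1
    if "CONCERNING" in grades:
        print("WARN: CONCERNING rules present")
    return 0

report = {"rules": [{"id": "SIG-100-001", "grade": "GOOD"},
                    {"id": "SIG-100-002", "grade": "CONCERNING"}]}
print(check_results(report))                         # 0 (warns but passes)
print(check_results({"rules": [{"grade": "BAD"}]}))  # 1 (fails the check)
```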
How to enable:
- Push the repo to GitHub (the workflow file is already included).
- Add any required repository secrets in Settings → Secrets.
- Open a PR — the workflow will run automatically and attach artifacts to the run.
- Automated validation of Sigma & YARA rules
- Synthetic log generation for realistic test scenarios
- Precision-based scoring and classification
- CI/CD pipeline with automated pass/fail
- Diagnostic tooling for debugging rule quality
- Baseline drift detection
- Human-readable and machine-readable reporting
- Add real log ingestion for hybrid precision evaluation
- Enhance ATT&CK technique validation and mapping
- Expand OneLogin/Okta/M365 schema coverage
- Improve anomaly scoring and statistical baselining
- Add Sigma→SIEM translator validation (Elastic, Splunk, Sentinel)
- Add machine learning–assisted rule tuning
Pull requests are welcome. For major changes, please open an issue to discuss your proposal, experiment, or research direction.
This project is released under the MIT License.
- Sigma HQ community
- Open-source detection engineering ecosystem
- Security researchers contributing to rule standardization
- Academic research in SOC automation and evaluative frameworks
SOC-as-Code transforms detection engineering into a structured, testable, and research-ready discipline.