The SOC-as-Code Framework is an automated, research-grade system for validating, testing, classifying, and governing cybersecurity detection rules (Sigma/YARA). This project treats detection rules as version-controlled software artifacts and applies CI/CD practices to ensure correctness, maintainability, and measurable detection quality.
The system includes a universal log generator, a full Sigma evaluator, a rule validator, a classification and scoring engine, and diagnostic tooling—all integrated with GitHub Actions for automated governance.
- Python 3.x — Core implementation language.
- Universal Synthetic Log Generator — Multi-platform log simulation engine (Windows, Linux, AWS, Azure, Okta, OneLogin, M365, Google Workspace, Proxy, Network, OpenCanary, etc.).
- Sigma Rule Engine (SOCSimulator) — Full pattern evaluator with support for Sigma modifiers.
- JSON/YAML Processing — For rule parsing, report generation, and classification.
- GitHub Actions — CI/CD pipeline for automated validation.
- Regex, Pattern Matching, and Nested Field Evaluation — For deep rule matching accuracy.
soc-as-code/
├── .github/
│ └── workflows/
│ └── validate-rules.yml # CI/CD pipeline for rule validation
│
├── rules/
│ ├── sigma/ # Sigma rule definitions
│ └── yara/ # YARA rule definitions
│
├── validator/
│ ├── validate_rules.py # Core rule validator
│ ├── generate_logs.py # Synthetic log generator
│ ├── generate_report.py # Markdown summary report builder
│ ├── compare_and_classify.py # Rule scoring + classification engine
│ ├── check_results.py # CI interpretation + pass/fail logic
│ ├── diagnose_rules.py # Local diagnostics for debugging rules
│ ├── test_single_rule.py # Full local pipeline test for a single rule
│ └── __init__.py
│
├── test.py # SOCSimulator: Sigma evaluator + alert generation
├── requirements.txt # Python dependencies
└── README.md # Project documentation
- Python 3.9 or higher
- pip + virtual environment recommended
- GitHub Actions (optional, for automated CI/CD)
- YAML-compatible Sigma rules & optional YARA rules
Dependencies
Install all required packages using:
pip install -r requirements.txt
Pinned in requirements.txt:
pyyaml>=6.0
yara-python>=4.3.1
flask>=2.3.0
reportlab>=4.0.0
requests>=2.31.0

Libraries include:
- pyyaml — Rule parsing
- regex — Pattern interpretation
- json — Log data processing
- datetime — Report metadata
- pathlib — Consistent file handling
Generates positive (matching) and negative (non-matching) synthetic logs for each rule.
Supports more than 50 log source types, including:
- Windows process/file events
- Linux process activity
- AWS CloudTrail
- Azure ActivityLogs / PIM
- Okta, OneLogin
- Microsoft 365
- Google Workspace
- Proxy/Web logs
- Network telemetry
- OpenCanary honeypot events
Special logic:
New rules (e.g., IDs starting with SIG-900) intentionally produce zero synthetic logs to prevent score manipulation.
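This skip logic can be sketched as follows. Note this is an illustrative sketch only: the `SIG-900` prefix check is taken from the project description, but the `generate_logs_for_rule` helper and its signature are assumptions, not the generator's actual API.

```python
# Illustrative sketch: brand-new rules get zero synthetic logs so a freshly
# submitted rule cannot inflate its own detection score.
NEW_RULE_PREFIX = "SIG-900"  # assumed ID convention for new rules

def generate_logs_for_rule(rule: dict, count: int = 10) -> list:
    """Return synthetic logs for a rule, or an empty list for new rules."""
    rule_id = str(rule.get("id", ""))
    if rule_id.startswith(NEW_RULE_PREFIX):
        return []  # intentionally no logs for brand-new rules
    # A real generator would emit platform-specific positive/negative logs.
    return [{"rule_id": rule_id, "sample": i} for i in range(count)]

print(len(generate_logs_for_rule({"id": "SIG-900-001"})))  # 0
print(len(generate_logs_for_rule({"id": "SIG-100-042"})))  # 10
```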
A complete Sigma detection engine implementing:
- Field modifiers: `|contains`, `|startswith`, `|endswith`, `|re`, `|base64`, `|all`, `|exists`, `|gt`, `|lt`, etc.
- Wildcards & regex-like expressions
- Nested field matching (`actor.email`, `process.name`, etc.)
- Boolean detection conditions
- Multi-selection merging
This enables academically reproducible rule evaluation.
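A minimal sketch of how modifier evaluation of this kind can work is shown below. The `get_nested` and `match_field` helpers are hypothetical names covering only a subset of modifiers; the project's actual engine lives in test.py (SOCSimulator).

```python
import re

def get_nested(log: dict, dotted: str):
    """Resolve a dotted field path like 'actor.email' in a nested log dict."""
    cur = log
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur

def match_field(log: dict, field_spec: str, expected) -> bool:
    """Evaluate one Sigma field with an optional modifier (subset sketch)."""
    field, _, modifier = field_spec.partition("|")
    value = get_nested(log, field)
    if modifier == "exists":
        return (value is not None) == bool(expected)
    if value is None:
        return False
    text = str(value)
    if modifier == "contains":
        return str(expected) in text
    if modifier == "startswith":
        return text.startswith(str(expected))
    if modifier == "endswith":
        return text.endswith(str(expected))
    if modifier == "re":
        return re.search(str(expected), text) is not None
    if modifier == "gt":
        return float(value) > float(expected)
    if modifier == "lt":
        return float(value) < float(expected)
    return text == str(expected)  # no modifier: exact match

log = {"process": {"name": "powershell.exe"}, "actor": {"email": "a@b.com"}}
print(match_field(log, "process.name|endswith", ".exe"))   # True
print(match_field(log, "actor.email|contains", "@b.com"))  # True
```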
Processes each rule by:
- Loading synthetic logs
- Executing Sigma matching logic
- Recording match data
- Saving structured results (`detections.json`, `validation_results.json`)
- Providing metadata for downstream scoring
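The per-rule validation loop can be sketched roughly as follows. The `validate_rule` signature, the toy substring matcher, and the output shape are assumptions for illustration; validate_rules.py implements the real logic.

```python
import json
import tempfile
from pathlib import Path

def validate_rule(rule: dict, logs: list, matcher) -> dict:
    """Run a matcher over synthetic logs and record structured results."""
    hits = [log for log in logs if matcher(rule, log)]
    return {
        "rule_id": rule.get("id"),
        "total_logs": len(logs),
        "matches": len(hits),
        "matched_logs": hits,
    }

# Toy matcher: the rule "matches" when its needle appears in the log message.
matcher = lambda rule, log: rule["needle"] in log.get("message", "")
rule = {"id": "SIG-100-001", "needle": "mimikatz"}
logs = [{"message": "ran mimikatz.exe"}, {"message": "benign activity"}]

result = validate_rule(rule, logs, matcher)
out = Path(tempfile.gettempdir()) / "detections.json"  # structured results
out.write_text(json.dumps([result], indent=2))
print(result["matches"])  # 1
```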
Compares:
- Baseline detections (previous known-good state)
- Current detections (introduced by new rule)
Extracts rule identifiers from YAML:
- ID
- Title
- Filename
Each rule receives a 0–100 score based on:
- True-positive improvement
- False-positive regression
- Precision delta
- Detection consistency
- Identifier correctness
Grades:
- EXCELLENT – major positive impact
- GOOD – improves detection quality
- NEUTRAL – no major effect
- CONCERNING – potential issues
- BAD – harmful or faulty rule
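A simplified sketch of how a 0–100 score might map onto these grade bands is shown below. The weights and thresholds here are illustrative assumptions, not the values used by compare_and_classify.py.

```python
def classify(score: float) -> str:
    """Map a 0-100 rule score to a grade band (illustrative thresholds)."""
    if score >= 85:
        return "EXCELLENT"
    if score >= 70:
        return "GOOD"
    if score >= 50:
        return "NEUTRAL"
    if score >= 30:
        return "CONCERNING"
    return "BAD"

def score_rule(tp_gain: int, fp_regression: int, precision_delta: float) -> float:
    """Combine impact signals into a 0-100 score (assumed weights)."""
    score = 50.0                     # neutral baseline
    score += 10 * tp_gain            # reward new true positives
    score -= 15 * fp_regression      # penalize false-positive regressions
    score += 100 * precision_delta   # reward overall precision improvement
    return max(0.0, min(100.0, score))

s = score_rule(tp_gain=3, fp_regression=0, precision_delta=0.05)
print(s, classify(s))  # 85.0 EXCELLENT
```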
Produces:
- `classification_report.json`
- Human-readable Markdown summary (`summary.md`)
- `diagnose_rules.py`: shows why a rule failed to match logs, including expected fields, actual fields, and suggested fixes.
- `test_single_rule.py`: runs the entire pipeline for one rule, from log generation → validation → classification → report summary.
These tools dramatically simplify research workflows and debugging experiments.
- Clone this repository
git clone https://github.com/rahul07890-dev/soc-as-code.git
cd soc-as-code

- Create and activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate

- Install dependencies

pip install -r requirements.txt

- Run local validation

python validator/validate_rules.py --rules rules/sigma --synthetic synthetic_logs/ --mode current

- Continuous Integration (GitHub Actions)
The project uses GitHub Actions to run the validation pipeline on every push and pull request. The included workflow (.github/workflows/validate-rules.yml) performs the following stages:
- Checkout repository and set up Python
- Install dependencies (pip install -r requirements.txt)
- Generate synthetic logs (unless a SKIP_GENERATE flag is set)
- Run the rule validator against rules/
- Run comparison & classification when a baseline is available
- Upload classification_report.json, validation_results.json, and synthetic_logs as workflow artifacts
Required repository secrets / environment variables:
- BASELINE_ARTIFACT_URL (optional) — URL or path to baseline results if using external baseline storage
- FAIL_ON_BAD_RULES (boolean) — whether PRs should fail when BAD rules are detected
- Any platform-integrations secrets (if you add real-log ingestion)
Behavior:
- A PR introducing BAD rules will fail the check and return actionable reports as artifacts.
- CONCERNING rules raise warnings but do not necessarily fail the check, depending on configuration.
- Artifacts contain both machine-readable (.json) and human-readable (.md) reports for triage.
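The pass/fail interpretation performed by check_results.py might look roughly like this. The report shape and the `check_results` signature are assumptions based on the classification output described above, not the script's actual interface.

```python
def check_results(report: dict, fail_on_bad: bool = True) -> int:
    """Interpret classification grades and return a CI exit code (sketch)."""
    grades = [r.get("grade") for r in report.get("rules", [])]
    if "BAD" in grades and fail_on_bad:
        print("FAIL: BAD rules detected")
        return 1
    if "CONCERNING" in grades:
        print("WARN: CONCERNING rules present")
    return 0

report = {"rules": [{"id": "SIG-100-001", "grade": "GOOD"},
                    {"id": "SIG-100-002", "grade": "CONCERNING"}]}
print(check_results(report))                         # 0 (warns but passes)
print(check_results({"rules": [{"grade": "BAD"}]}))  # 1 (fails the check)
```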
How to enable:
- Push the repo to GitHub (the workflow file is already included).
- Add any required repository secrets in Settings → Secrets.
- Open a PR — the workflow will run automatically and attach artifacts to the run.
- Automated validation of Sigma & YARA rules
- Synthetic log generation for realistic test scenarios
- Precision-based scoring and classification
- CI/CD pipeline with automated pass/fail
- Diagnostic tooling for debugging rule quality
- Baseline drift detection
- Human-readable and machine-readable reporting
- Add real log ingestion for hybrid precision evaluation
- Enhance ATT&CK technique validation and mapping
- Expand OneLogin/Okta/M365 schema coverage
- Improve anomaly scoring and statistical baselining
- Add Sigma→SIEM translator validation (Elastic, Splunk, Sentinel)
- Add machine learning–assisted rule tuning
Pull requests are welcome. For major changes, please open an issue to discuss your proposal, experiment, or research direction.
This project is released under the MIT License.
- Sigma HQ community
- Open-source detection engineering ecosystem
- Security researchers contributing to rule standardization
- Academic research in SOC automation and evaluative frameworks
SOC-as-Code transforms detection engineering into a structured, testable, and research-ready discipline.