
SOC-as-Code Framework

Overview

The SOC-as-Code Framework is an automated, research-grade system for validating, testing, classifying, and governing cybersecurity detection rules (Sigma/YARA). This project treats detection rules as version-controlled software artifacts and applies CI/CD practices to ensure correctness, maintainability, and measurable detection quality.

The system includes a universal log generator, a full Sigma evaluator, a rule validator, a classification and scoring engine, and diagnostic tooling—all integrated with GitHub Actions for automated governance.


Technologies Used

  • Python 3.x — Core implementation language.
  • Universal Synthetic Log Generator — Multi-platform log simulation engine (Windows, Linux, AWS, Azure, Okta, OneLogin, M365, Google Workspace, Proxy, Network, OpenCanary, etc.).
  • Sigma Rule Engine (SOCSimulator) — Full pattern evaluator with support for Sigma modifiers.
  • JSON/YAML Processing — For rule parsing, report generation, and classification.
  • GitHub Actions — CI/CD pipeline for automated validation.
  • Regex, Pattern Matching, and Nested Field Evaluation — For deep rule matching accuracy.

Directory Structure

soc-as-code/
├── .github/
│   └── workflows/
│       └── validate-rules.yml          # CI/CD pipeline for rule validation
│
├── rules/
│   ├── sigma/                          # Sigma rule definitions
│   └── yara/                           # YARA rule definitions
│
├── validator/
│   ├── validate_rules.py               # Core rule validator
│   ├── generate_logs.py                # Synthetic log generator
│   ├── generate_report.py              # Markdown summary report builder
│   ├── compare_and_classify.py         # Rule scoring + classification engine
│   ├── check_results.py                # CI interpretation + pass/fail logic
│   ├── diagnose_rules.py               # Local diagnostics for debugging rules
│   ├── test_single_rule.py             # Full local pipeline test for a single rule
│   └── __init__.py
│
├── test.py                             # SOCSimulator: Sigma evaluator + alert generation
├── requirements.txt                    # Python dependencies
└── README.md                           # Project documentation

Requirements

  • Python 3.9 or higher
  • pip + virtual environment recommended
  • GitHub Actions (optional, for automated CI/CD)
  • YAML-compatible Sigma rules & optional YARA rules

Dependencies

Install all required packages using:

pip install -r requirements.txt

Dependencies:

pyyaml>=6.0
yara-python>=4.3.1
flask>=2.3.0
reportlab>=4.0.0
requests>=2.31.0

Key libraries:

  • pyyaml — Sigma rule parsing
  • yara-python — YARA rule compilation and matching
  • re, json (standard library) — Pattern interpretation and log data processing
  • datetime, pathlib (standard library) — Report metadata and consistent file handling
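As a minimal illustration of the pyyaml-based rule parsing, the sketch below loads an invented Sigma rule (the rule content is hypothetical, not taken from this repository):

```python
# Hypothetical example: parsing a Sigma rule string with pyyaml.
import yaml

SIGMA_RULE = r"""
title: Suspicious PowerShell Download
id: SIG-001
logsource:
  product: windows
  category: process_creation
detection:
  selection:
    Image|endswith: '\powershell.exe'
    CommandLine|contains: 'DownloadString'
  condition: selection
"""

rule = yaml.safe_load(SIGMA_RULE)
print(rule["title"])                   # Suspicious PowerShell Download
print(rule["detection"]["condition"])  # selection
```

`yaml.safe_load` returns plain dicts and lists, which is what the validator and classifier can then traverse for identifiers and detection logic.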

How It Works

Core Components


1. Universal Synthetic Log Generator (generate_logs.py)

Generates positive (matching) and negative (non-matching) synthetic logs for each rule.

Supports more than 50 log source types, including:

  • Windows process/file events
  • Linux process activity
  • AWS CloudTrail
  • Azure ActivityLogs / PIM
  • Okta, OneLogin
  • Microsoft 365
  • Google Workspace
  • Proxy/Web logs
  • Network telemetry
  • OpenCanary honeypot events

Special logic: New rules (e.g., IDs starting with SIG-900) intentionally produce zero synthetic logs to prevent score manipulation.
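A minimal sketch of the generator's contract, including the zero-log rule for new IDs described above (the function name, fields, and log shapes are invented for illustration and are not the repository's implementation):

```python
# Hypothetical sketch: emit one matching and one non-matching synthetic log.
import json

def generate_pair(rule_id: str):
    """Return a positive and a negative synthetic log for a rule."""
    # New rules (e.g. IDs starting with SIG-900) intentionally get
    # zero synthetic logs to prevent score manipulation.
    if rule_id.startswith("SIG-900"):
        return []
    positive = {"rule_id": rule_id, "process.name": "powershell.exe",
                "expected_match": True}
    negative = {"rule_id": rule_id, "process.name": "notepad.exe",
                "expected_match": False}
    return [positive, negative]

logs = generate_pair("SIG-001")
print(json.dumps(logs, indent=2))
print(generate_pair("SIG-900-new"))  # []
```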


2. Sigma Rule Evaluator (SOCSimulator, in test.py)

A complete Sigma detection engine implementing:

  • Field modifiers: |contains, |startswith, |endswith, |re, |base64, |all, |exists, |gt, |lt, etc.
  • Wildcards & regex-like expressions
  • Nested field matching (actor.email, process.name, etc.)
  • Boolean detection conditions
  • Multi-selection merging

This enables academically reproducible rule evaluation.
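To make the modifier and nested-field semantics concrete, here is a toy evaluator for a few of the modifiers listed above (assumed semantics, written from the Sigma specification rather than copied from SOCSimulator in test.py):

```python
# Toy Sigma-style field matcher: nested field access plus a few modifiers.

def get_nested(log: dict, dotted: str):
    """Resolve 'actor.email'-style dotted paths in a nested dict."""
    cur = log
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur

def match_field(log: dict, field_expr: str, expected: str) -> bool:
    """Evaluate 'field|modifier' expressions against one log record."""
    field, _, modifier = field_expr.partition("|")
    value = get_nested(log, field)
    if value is None:
        return False
    value = str(value)
    if modifier == "contains":
        return expected in value
    if modifier == "startswith":
        return value.startswith(expected)
    if modifier == "endswith":
        return value.endswith(expected)
    return value == expected  # no modifier: exact match

log = {"actor": {"email": "alice@example.com"}, "process": {"name": "cmd.exe"}}
print(match_field(log, "actor.email|endswith", "@example.com"))  # True
print(match_field(log, "process.name|contains", "powershell"))   # False
```

The real engine additionally handles `|re`, `|base64`, numeric comparisons, wildcards, and multi-selection conditions.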


3. Rule Validator (validate_rules.py)

Processes each rule by:

  1. Loading synthetic logs
  2. Executing Sigma matching logic
  3. Recording match data
  4. Saving structured results (detections.json, validation_results.json)
  5. Providing metadata for downstream scoring
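The five steps above can be sketched as a single loop (a hedged illustration: the rule and log shapes are invented, though the output filename matches the validation_results.json named above):

```python
# Hypothetical sketch of the validate-and-record loop.
import json
import tempfile
from pathlib import Path

def validate(rules, logs, out_dir):
    """Run each rule's match logic over the logs and save structured results."""
    results = []
    for rule in rules:
        matches = [log for log in logs if rule["predicate"](log)]
        results.append({"rule_id": rule["id"], "matches": len(matches)})
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "validation_results.json").write_text(json.dumps(results, indent=2))
    return results

rules = [{"id": "SIG-001",
          "predicate": lambda log: "powershell" in log.get("cmd", "")}]
logs = [{"cmd": "powershell -enc SQBFAFgA"}, {"cmd": "notepad.exe"}]
results = validate(rules, logs, tempfile.mkdtemp())
print(results)  # [{'rule_id': 'SIG-001', 'matches': 1}]
```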

4. Classification & Scoring Engine (compare_and_classify.py)

Compares:

  • Baseline detections (previous known-good state)
  • Current detections (introduced by new rule)

Extracts rule identifiers from YAML:

  • ID
  • Title
  • Filename

Each rule receives a 0–100 score based on:

  • True-positive improvement
  • False-positive regression
  • Precision delta
  • Detection consistency
  • Identifier correctness

Grades:

  • EXCELLENT – major positive impact
  • GOOD – improves detection quality
  • NEUTRAL – no major effect
  • CONCERNING – potential issues
  • BAD – harmful or faulty rule

Produces:

  • classification_report.json
  • Human-readable Markdown summary (summary.md)
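One hedged way the 0–100 score and the grade bands above could fit together (the weights and thresholds here are invented for illustration, not taken from compare_and_classify.py):

```python
# Toy scoring and grading: thresholds and weights are hypothetical.

def score(tp_gain: int, fp_gain: int) -> int:
    """Start neutral at 50; reward new true positives, penalize new FPs."""
    return max(0, min(100, 50 + 10 * tp_gain - 15 * fp_gain))

def grade(score_value: int) -> str:
    if score_value >= 90:
        return "EXCELLENT"
    if score_value >= 75:
        return "GOOD"
    if score_value >= 50:
        return "NEUTRAL"
    if score_value >= 25:
        return "CONCERNING"
    return "BAD"

print(grade(score(4, 0)))  # EXCELLENT  (score 90)
print(grade(score(1, 0)))  # NEUTRAL    (score 60)
print(grade(score(0, 3)))  # BAD        (score 5)
```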

5. Diagnostic Tools

  • diagnose_rules.py — Shows why a rule did not match its logs, listing expected fields, actual fields, and suggested fixes.

  • test_single_rule.py — Runs the entire pipeline for one rule: log generation → validation → classification → report summary.

These tools dramatically simplify research workflows and debugging experiments.
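The core of such a diagnostic is an expected-versus-actual field diff; the sketch below shows the idea (the field names are hypothetical and this is not the diagnose_rules.py code):

```python
# Toy field diff: which fields a rule expects vs. what a log actually has.

def diagnose(rule_fields, log: dict):
    """Report rule fields missing from a log and those present."""
    expected = set(rule_fields)
    actual = set(log)
    return {"missing_in_log": sorted(expected - actual),
            "present": sorted(expected & actual)}

rule_fields = ["process.name", "CommandLine"]
log = {"process.name": "cmd.exe", "User": "alice"}
print(diagnose(rule_fields, log))
# {'missing_in_log': ['CommandLine'], 'present': ['process.name']}
```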


Setup Instructions

  1. Clone this repository

git clone https://github.com/rahul07890-dev/soc-as-code.git
cd soc-as-code

  2. Create and activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate

  3. Install dependencies

pip install -r requirements.txt

  4. Run local validation

python validator/validate_rules.py --rules rules/sigma --synthetic synthetic_logs/ --mode current

  5. Continuous Integration (GitHub Actions)

The project uses GitHub Actions to run the validation pipeline on every push and pull request. The included workflow (.github/workflows/validate-rules.yml) performs the following stages:

  • Checkout repository and set up Python
  • Install dependencies (pip install -r requirements.txt)
  • Generate synthetic logs (unless a SKIP_GENERATE flag is set)
  • Run the rule validator against rules/
  • Run comparison & classification when a baseline is available
  • Upload classification_report.json, validation_results.json, and synthetic_logs as workflow artifacts

Required repository secrets / environment variables:

  • BASELINE_ARTIFACT_URL (optional) — URL or path to baseline results if using external baseline storage
  • FAIL_ON_BAD_RULES (boolean) — whether PRs should fail when BAD rules are detected
  • Any platform-integrations secrets (if you add real-log ingestion)

Behavior:

  • A PR introducing BAD rules will fail the check and return actionable reports as artifacts.
  • CONCERNING rules raise warnings; whether they fail the check depends on FAIL_ON_BAD_RULES.
  • Artifacts contain both machine-readable (.json) and human-readable (.md) reports for triage.
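The pass/fail behavior above can be sketched as a small exit-code helper driven by the FAIL_ON_BAD_RULES flag (the variable name matches the one listed above, but the logic is an assumed illustration, not copied from check_results.py):

```python
# Hypothetical sketch of check_results-style CI pass/fail interpretation.
import os

def ci_outcome(grades: list) -> int:
    """Return a process exit code: 0 = pass, 1 = fail the check."""
    fail_on_bad = os.environ.get("FAIL_ON_BAD_RULES", "true").lower() == "true"
    if fail_on_bad and "BAD" in grades:
        return 1
    if "CONCERNING" in grades:
        print("warning: CONCERNING rules detected")  # warn but pass
    return 0

os.environ["FAIL_ON_BAD_RULES"] = "true"
print(ci_outcome(["GOOD", "BAD"]))  # 1
os.environ["FAIL_ON_BAD_RULES"] = "false"
print(ci_outcome(["GOOD", "BAD"]))  # 0
```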

How to enable:

  • Push the repo to GitHub (the workflow file is already included).
  • Add any required repository secrets in Settings → Secrets.
  • Open a PR — the workflow will run automatically and attach artifacts to the run.

Features

  • Automated validation of Sigma & YARA rules
  • Synthetic log generation for realistic test scenarios
  • Precision-based scoring and classification
  • CI/CD pipeline with automated pass/fail
  • Diagnostic tooling for debugging rule quality
  • Baseline drift detection
  • Human-readable and machine-readable reporting

TODO / Improvements

  • Add real log ingestion for hybrid precision evaluation
  • Enhance ATT&CK technique validation and mapping
  • Expand OneLogin/Okta/M365 schema coverage
  • Improve anomaly scoring and statistical baselining
  • Add Sigma→SIEM translator validation (Elastic, Splunk, Sentinel)
  • Add machine learning–assisted rule tuning

Contribution

Pull requests are welcome. For major changes, please open an issue to discuss your proposal, experiment, or research direction.


License

This project is released under the MIT License (or add your preferred license).


Acknowledgments

  • Sigma HQ community
  • Open-source detection engineering ecosystem
  • Security researchers contributing to rule standardization
  • Academic research in SOC automation and evaluative frameworks

SOC-as-Code transforms detection engineering into a structured, testable, and research-ready discipline.
