Project: SecretScout - Rust/WASM Port of gitleaks-action Methodology: SPARC (Specification, Pseudocode, Architecture, Refinement, Completion) Phase: ARCHITECTURE Date: October 16, 2025 Version: 1.0 Status: Ready for Review
- Executive Summary
- Architectural Overview
- System Context
- Architectural Layers
- Component Catalog
- Deployment Architecture
- Data Flow Architecture
- Integration Architecture
- Security Architecture
- Architectural Decisions
- Quality Attributes
- Architectural Constraints
This document defines the high-level system architecture for SecretScout, a Rust-based reimplementation of gitleaks-action that compiles to both native binaries and WebAssembly (WASM). The architecture balances functional parity with the original Node.js implementation while leveraging Rust's performance, safety, and cross-platform capabilities.
- Hybrid Architecture: JavaScript wrapper + WASM core + external process orchestration
- Layered Design: Presentation (GitHub Actions) → Application (WASM) → Infrastructure (JavaScript glue)
- Event-Driven: Responds to 4 GitHub event types with specialized handling
- Pipeline Pattern: Linear flow from event → scan → parse → report → exit
- Sandboxed Execution: WASM isolation with capability-based security
- WASM Core Logic: All business logic in Rust/WASM for performance and safety
- JavaScript I/O Layer: File system, network, and process operations in Node.js
- Stateless Execution: No persistent state; each run is independent
- External Binary Dependency: Delegate secret detection to gitleaks CLI
- GitHub Actions Native: Optimized for GitHub's workflow environment
┌────────────────────────────────────────────────────────────────────────────┐
│ GITHUB ACTIONS RUNTIME │
│ (Node.js 20/24 Host) │
└───────────────────────────────┬────────────────────────────────────────────┘
│
│ action.yml
│ inputs/outputs
│
┌───────────┴────────────┐
│ PRESENTATION LAYER │
│ (JavaScript Wrapper) │
│ dist/index.js │
│ │
│ • Parse action inputs │
│ • Load WASM module │
│ • File I/O operations │
│ • Process spawning │
│ • HTTP requests │
│ • Error marshalling │
└───────────┬────────────┘
│
│ wasm-bindgen
│ FFI boundary
│
┌───────────┴────────────┐
│ APPLICATION LAYER │
│ (WASM Core Logic) │
│ secretscout_bg.wasm │
│ │
│ ┌──────────────────┐ │
│ │ Event Router │ │
│ └────────┬─────────┘ │
│ │ │
│ ┌─────┴─────┐ │
│ │ │ │
│ ┌──▼──┐ ┌──▼──┐ │
│ │SARIF│ │ PR │ │
│ │Parse│ │Comm.│ │
│ └──┬──┘ └──┬──┘ │
│ │ │ │
│ └─────┬─────┘ │
│ │ │
│ ┌─────▼─────┐ │
│ │ Summary │ │
│ │ Generator │ │
│ └───────────┘ │
└───────────┬────────────┘
│
│ async calls
│ (via JS callbacks)
│
┌───────────────────────┴────────────────────────┐
│ │
┌───────▼────────┐ ┌─────────▼─────────┐ ┌───▼──────┐
│ GITHUB API │ │ GITLEAKS BINARY │ │ FILE │
│ │ │ │ │ SYSTEM │
│ • User lookup │ │ • Secret scanning │ │ │
│ • PR commits │ │ • SARIF output │ │ • Config │
│ • PR comments │ │ • Exit codes │ │ • Events │
│ • Latest │ │ 0: Clean │ │ • SARIF │
│ releases │ │ 1: Error │ │ │
└────────────────┘ │ 2: Secrets │ └──────────┘
└───────────────────┘
Primary Pattern: Pipes and Filters
- Linear pipeline from input (event) to output (exit code)
- Each component transforms data and passes to next stage
- Clear separation of concerns between filters
Secondary Pattern: Event-Driven
- Different processing paths based on GitHub event type
- Asynchronous operations with callbacks
- Non-blocking I/O operations
Tertiary Pattern: Layered
- Clear separation between presentation, application, and infrastructure
- Dependencies flow downward (no circular dependencies)
- Each layer has distinct responsibilities
┌──────────────────┐
│ │
│ GitHub Actions │
│ Platform │
│ │
└────────┬─────────┘
│
│ triggers workflow
│
┌─────────────────▼────────────────┐
│ │
│ SecretScout │
│ (Rust/WASM Action) │
│ │
└───┬─────────────┬────────────┬───┘
│ │ │
┌─────────────┘ │ └─────────────┐
│ │ │
│ │ │
┌─────────▼─────────┐ ┌─────────▼─────────┐ ┌─────────▼────────┐
│ │ │ │ │ │
│ GitHub REST │ │ Gitleaks │ │ Repository │
│ API │ │ Binary (CLI) │ │ File System │
│ │ │ │ │ │
│ • User info │ │ • Scan commits │ │ • Source code │
│ • PR metadata │ │ • Detect secrets │ │ • Config files │
│ • Comments │ │ • Generate SARIF │ │ • Event JSON │
└───────────────────┘ └───────────────────┘ └──────────────────┘
-
GitHub Actions Platform
- Provides: Node.js runtime, environment variables, workflow context
- Consumes: Exit codes, job summaries, artifacts, log output
- Interaction: Synchronous execution model
-
GitHub REST API
- Provides: User metadata, PR commits, existing comments
- Consumes: Review comments, artifact uploads
- Interaction: Asynchronous HTTP with rate limiting
-
Gitleaks Binary
- Provides: Secret detection, SARIF reports
- Consumes: Git repository, configuration files, CLI arguments
- Interaction: Synchronous process execution
-
Repository File System
- Provides: Source code, configuration, event payloads
- Consumes: SARIF output files
- Interaction: Synchronous file I/O
Inside the System:
- Event routing and orchestration
- SARIF parsing and transformation
- PR comment content generation
- Job summary HTML generation
- Configuration management
- Binary download and caching
- Error handling and reporting
Outside the System:
- Secret detection algorithms (delegated to gitleaks)
- Git operations (delegated to gitleaks)
- GitHub Actions runtime (host environment)
- HTTP client implementation (Node.js fetch/reqwest)
- File system operations (Node.js fs module)
Technology: Node.js 20/24, JavaScript ES2022
Location: dist/index.js
Responsibilities:
- Parse GitHub Actions inputs from
action.yml - Load and initialize WASM module
- Perform all file system operations (read event JSON, SARIF files)
- Execute HTTP requests (GitHub API, binary downloads)
- Spawn and manage child processes (gitleaks binary)
- Marshal data between JavaScript and WASM
- Write job summaries to
GITHUB_STEP_SUMMARY - Set action outputs via
core.setOutput() - Handle uncaught WASM errors
- Exit with appropriate status code
Key Characteristics:
- Stateless (no persistent state between calls)
- Thin layer (minimal business logic)
- Synchronous interface to GitHub Actions
- Asynchronous interface to WASM core
Dependencies:
@actions/core- GitHub Actions toolkit@actions/github- GitHub API client@actions/artifact- Artifact upload@actions/tool-cache- Binary cachingsecretscout.js- WASM bindings (generated)
Technology: Rust 2021, wasm32-unknown-unknown target
Location: dist/secretscout_bg.wasm
Responsibilities:
- Route events to appropriate handlers (push, PR, workflow_dispatch, schedule)
- Parse and validate SARIF JSON structure
- Extract secrets, locations, and metadata from SARIF
- Generate fingerprint strings (
{commit}:{file}:{rule}:{line}) - Generate PR comment content (Markdown with structured data)
- Generate job summary HTML tables
- Validate configuration values
- Implement retry logic with exponential backoff
- Handle errors with detailed context
Key Characteristics:
- Pure functional core (no side effects)
- Memory-safe (Rust's ownership system)
- Sandboxed execution (WASM security model)
- Deterministic (same inputs → same outputs)
- Cross-platform (universal binary)
Dependencies (compiled into WASM):
serde+serde_json- JSON parsingwasm-bindgen- JavaScript interopthiserror- Error types- Minimal dependencies (size optimization)
Components:
-
GitHub API Client (JavaScript + Rust types)
- User account type lookup
- PR commit fetching with pagination
- PR comment posting with deduplication
- Latest release queries
- Rate limit tracking and backoff
-
Binary Management (JavaScript execution + Rust orchestration)
- Version resolution (default, override, "latest")
- Platform/architecture detection
- Cache key generation
- HTTP download with progress
- Archive extraction (tar.gz, zip)
- PATH modification
- Process spawning with argument construction
-
File System Abstraction (JavaScript implementation)
- Read event JSON payloads
- Read SARIF output files
- Write job summary files
- Discover configuration files
- Validate paths (security)
Key Characteristics:
- Platform-specific implementations
- Handles all I/O operations
- Manages external process lifecycle
- Implements retry and error recovery
Purpose: Dispatch events to specialized handlers based on GitHub event type
Responsibilities:
- Parse
GITHUB_EVENT_NAMEenvironment variable - Load event JSON from
GITHUB_EVENT_PATH - Validate event structure
- Route to:
HandlePushEvent(),HandlePullRequestEvent(),HandleWorkflowDispatchEvent(), orHandleScheduleEvent() - Return scan configuration (base ref, head ref, log-opts)
Interfaces:
- Input: Event type string, event JSON payload
- Output:
ScanConfigurationstruct or error
Dependencies: Configuration module, SARIF parser
Key Algorithms:
- Event type validation (enum matching)
- Commit range determination (first to last)
- BASE_REF override logic
- Single commit optimization (log-opts=-1)
Error Handling: Fail fast on unsupported event types or malformed JSON
Purpose: Download, cache, and execute gitleaks binary
Responsibilities:
- Resolve gitleaks version (default 8.24.3, override, or "latest" from API)
- Detect platform (linux, darwin, windows) and architecture (x64, arm64)
- Generate cache key:
gitleaks-cache-{version}-{platform}-{arch} - Download binary from GitHub releases if not cached
- Extract archive (platform-specific: tar.gz vs zip)
- Add binary to PATH
- Construct CLI arguments with event-specific log-opts
- Execute gitleaks as child process
- Capture stdout, stderr, and exit code
- Return execution results
Interfaces:
- Input:
ScanConfiguration, gitleaks version - Output: Exit code (0/1/2), SARIF file path
Dependencies: HTTP client (JavaScript), cache API (JavaScript), process spawner (JavaScript)
Key Algorithms:
- Cache hit/miss detection
- URL construction:
https://github.com/zricethezav/gitleaks/releases/download/v{version}/gitleaks_{version}_{platform}_{arch}.{ext} - Argument array construction based on event type
- Exit code interpretation
Error Handling:
- Cache failures → Download fresh (non-fatal)
- Download failures → Fatal error
- Extraction failures → Fatal error
- Gitleaks exit code 1 → Fatal error
- Gitleaks exit code 2 → Continue (secrets found)
Purpose: Parse SARIF v2 JSON output from gitleaks into structured data
Responsibilities:
- Read SARIF file (via JavaScript callback)
- Deserialize JSON with serde
- Validate SARIF structure (runs[0].results[])
- Extract for each result:
ruleId(detection rule name)partialFingerprints(commit, author, email, date)locations[0].physicalLocation(file path, line number)
- Generate fingerprint string:
{commitSha}:{filePath}:{ruleId}:{startLine} - Return
Vec<DetectedSecret>
Interfaces:
- Input: SARIF file path (JavaScript reads file)
- Output:
Vec<DetectedSecret>orSARIFParseError
Dependencies: serde_json, file reader (JavaScript callback)
Data Structures:
struct DetectedSecret {
rule_id: String,
commit_sha: String,
author: String,
email: String,
date: String,
file_path: String,
start_line: u32,
fingerprint: String,
}Key Algorithms:
- Null-safe JSON traversal
- Fingerprint generation
- Default value substitution for missing fields
Error Handling:
- File not found → Fatal error
- Invalid JSON → Fatal error
- Missing required fields → Use defaults where possible
- Empty results array → Valid (no secrets found)
Purpose: Create inline review comments on pull requests
Responsibilities:
- Generate comment body Markdown
- Determine comment placement (file, line, commit, side)
- Fetch existing PR comments via GitHub API (JavaScript)
- Build deduplication hash map
- Check for duplicate comments (same body, path, line)
- Post new comments via GitHub API (JavaScript)
- Handle API errors gracefully (non-fatal)
- Add @mentions if
GITLEAKS_NOTIFY_USER_LISTset
Interfaces:
- Input:
Vec<DetectedSecret>, PR number, GITHUB_TOKEN - Output: Comment count (posted), errors (warnings)
Dependencies: GitHub API client (JavaScript), string utilities (WASM)
Comment Format:
🛑 **Gitleaks detected:** {rule_id}
**Commit:** {short_sha}
**Fingerprint:** {commit}:{file}:{rule}:{line}
To ignore this secret, add the fingerprint to your .gitleaksignore file.
cc @user1, @user2Key Algorithms:
- Markdown generation with escaping
- Deduplication map (HashMap<(body, path, line), bool>)
- Retry with exponential backoff for transient failures
- Rate limit tracking and pre-emptive backoff
Error Handling:
- API failures → Log warning, continue (secrets still in summary/artifacts)
- Rate limits → Retry with backoff
- 422 Large Diff → Skip comment, log info
- Line not in diff → Skip comment
Purpose: Generate GitHub Actions job summary HTML/Markdown
Responsibilities:
- Determine summary type based on exit code:
- Exit 0 → Success message
- Exit 2 → HTML table with findings
- Exit 1 → Error message
- Generate HTML table with columns: Rule ID, Commit, Secret URL, Line, Author, Date, Email, File
- Generate hyperlinks to repository (commit URLs, file URLs with line anchors)
- Escape HTML entities (XSS prevention)
- Return summary string for JavaScript to write to
GITHUB_STEP_SUMMARY
Interfaces:
- Input: Exit code,
Vec<DetectedSecret>, repository URL - Output: Markdown/HTML string
Dependencies: String formatting utilities
Summary Types:
- Success:
## No leaks detected ✅ - Secrets: HTML table with all findings
- Error:
## ❌ Gitleaks exited with error. Exit code [1]
URL Patterns:
- Commit:
{repo_url}/commit/{commitSha} - Secret:
{repo_url}/blob/{commitSha}/{filePath}#L{startLine} - File:
{repo_url}/blob/{commitSha}/{filePath}
Key Algorithms:
- HTML table generation
- URL construction with encoding
- HTML entity escaping (prevent XSS)
- Conditional rendering based on exit code
Error Handling: Should never fail (defensive programming with fallbacks)
Purpose: Load and validate all configuration from environment variables
Responsibilities:
- Parse 14 environment variables
- Validate required vs optional variables
- Implement special boolean parsing logic:
"false"→ false"0"→ false- Everything else → true (including empty string)
- Discover configuration file:
- Priority 1:
GITLEAKS_CONFIG(explicit) - Priority 2:
gitleaks.toml(auto-detect) - Priority 3: None (use gitleaks defaults)
- Priority 1:
- Validate paths (security checks)
- Return
Configurationstruct
Interfaces:
- Input: Environment variable map (from JavaScript)
- Output:
Configurationstruct orConfigurationError
Data Structures:
struct Configuration {
github_token: String,
github_workspace: String,
github_event_path: String,
github_event_name: String,
github_repository: String,
github_repository_owner: String,
gitleaks_version: String,
gitleaks_config: Option<String>,
gitleaks_license: Option<String>,
enable_summary: bool,
enable_upload_artifact: bool,
enable_comments: bool,
notify_user_list: Vec<String>,
base_ref: Option<String>,
}Key Algorithms:
- Boolean parsing with backward compatibility
- Config file discovery (priority order)
- Path validation (traversal prevention)
- Canonical path resolution
Error Handling:
- Missing required env var → Fatal error with clear message
- Invalid path → Fatal error
- Missing optional env var → Use default
Purpose: Interact with GitHub REST API
Responsibilities:
- Fetch user account type (
GET /users/{username}) - Fetch latest gitleaks release (
GET /repos/zricethezav/gitleaks/releases/latest) - Fetch PR commits with pagination (
GET /repos/{owner}/{repo}/pulls/{number}/commits) - Fetch PR comments (
GET /repos/{owner}/{repo}/pulls/{number}/comments) - Post PR comment (
POST /repos/{owner}/{repo}/pulls/{number}/comments) - Implement retry with exponential backoff (3 retries, 1s initial, 60s max)
- Track rate limits (workflow token: 1,000 req/hr)
- Respect
Retry-Afterheader
Interfaces:
- Input: Endpoint, method, body, GITHUB_TOKEN
- Output: Response body or
APIError
Dependencies: HTTP client (JavaScript fetch or reqwest), retry logic (WASM)
Retry Strategy:
- Retry on: 429, 500, 502, 503, 504, network errors, timeouts
- Don't retry on: 401, 403, 404, 400 (bad input)
- Multiplier: 2.0 (exponential)
- Jitter: 10% (prevent thundering herd)
Key Algorithms:
- Exponential backoff with jitter
- Rate limit tracking (atomic counters)
- Response pagination (follow
Linkheader) - Error classification (retry vs fail)
Error Handling:
- Authentication failures → Fatal (401, 403)
- Not found → Context-dependent (404)
- Rate limits → Retry with backoff (429)
- Server errors → Retry with backoff (5xx)
- Network errors → Retry with backoff
Purpose: Validate Keygen.sh licenses for organization accounts
Status: Currently disabled, retained for future re-enablement
Responsibilities:
- Determine account type (Organization vs User)
- Skip validation for personal accounts
- Validate license via Keygen API for organizations
- Handle responses: VALID, TOO_MANY_MACHINES, NO_MACHINE
- Activate repository (associate with license)
Interfaces:
- Input: Repository owner, GITLEAKS_LICENSE
- Output: Validation result or error
Key Algorithms:
- Account type detection with fallback
- License validation with activation retry
- Fingerprint generation (owner/repo)
Error Handling:
- License validation failure → Fatal error
- API unavailable → Fatal error
- Missing license (organizations) → Fatal error
Purpose: Centralized error handling and logging
Responsibilities:
- Define error types for all modules
- Implement error context propagation
- Format user-friendly error messages
- Determine fatal vs non-fatal errors
- Log errors to GitHub Actions (::error::, ::warning::)
Error Categories:
- Fatal: Exit immediately with code 1
- Unsupported event type
- Missing required configuration
- SARIF parse failure
- Gitleaks exit code 1
- Non-Fatal: Log warning, continue
- PR comment failures
- Cache failures
- Latest release fetch failure
Interfaces:
- Input: Error type, context
- Output: Formatted error message, logging side effect
Key Algorithms:
- Error type enumeration (thiserror)
- Context propagation (anyhow)
- Log level determination
- Secret masking in error messages
Type: GitHub Action (JavaScript Action with WASM core)
Deployment Units:
-
Action Metadata:
action.yml- Defines inputs, outputs, runtime (Node.js 20)
- Specifies entry point (
dist/index.js)
-
JavaScript Wrapper:
dist/index.js- Bundled with dependencies (no node_modules in repository)
- Includes WASM loader code
- Platform-agnostic (Node.js handles platform differences)
-
WASM Module:
dist/secretscout_bg.wasm- Compiled Rust code
- Universal binary (works on all platforms)
- Size: ~500KB (optimized)
-
WASM Bindings:
dist/secretscout.js- Generated by wasm-bindgen
- Provides JavaScript interface to WASM
Distribution:
- Primary: GitHub repository (git clone/checkout)
- Versioning: Git tags (v3.0.0, v3.1.0, etc.)
- User Consumption:
steps: - uses: gitleaks/gitleaks-action@v3
Host: GitHub Actions Runner
- OS: Ubuntu 22.04, Ubuntu 24.04, macOS 13/14, Windows Server 2022
- Runtime: Node.js 20 (current), Node.js 24 (future)
- Memory: 7 GB available (use ~100 MB peak)
- Disk: Sufficient for gitleaks binary (~20 MB) + caching
Execution Lifecycle:
- GitHub Actions clones action repository
- Node.js executes
dist/index.js - JavaScript loads WASM module
- JavaScript calls WASM entry point with event data
- WASM processes and returns results
- JavaScript performs I/O (API calls, file writes, process spawn)
- Process exits with appropriate code (0/1/2)
Resource Requirements:
- CPU: Minimal (most time spent in gitleaks binary)
- Memory: <200 MB peak (including Node.js runtime)
- Disk: <100 MB (gitleaks binary + cache)
- Network: <50 MB download (gitleaks binary, first run)
Build Steps:
- Compile Rust to WASM:
wasm-pack build --target nodejs --release
- Optimize WASM:
wasm-opt -Oz dist/secretscout_bg.wasm -o dist/secretscout_bg.wasm
- Create JavaScript wrapper (
dist/index.js) - Test on all platforms (Ubuntu, macOS, Windows)
- Commit
dist/directory to repository - Tag release (e.g., v3.0.0)
- Create GitHub release with notes
Release Artifacts:
- Git tag (e.g.,
v3.0.0) - GitHub release (changelog, notes)
dist/directory in repository (ready to use)
Versioning Strategy:
- Major: v3 (breaking changes from v2)
- Minor: v3.1 (new features, backward compatible)
- Patch: v3.1.1 (bug fixes)
Users Pin:
- Recommended:
@v3(major version, auto-updates) - Conservative:
@v3.1.2(exact version) - Bleeding edge:
@main(not recommended)
1. GitHub Event Trigger
│
│ Event JSON (push/PR/workflow_dispatch/schedule)
│
▼
2. JavaScript Wrapper
│
│ Parse action.yml inputs
│ Read environment variables
│
▼
3. WASM: Configuration Manager
│
│ Configuration struct
│
▼
4. WASM: Event Router
│
│ ScanConfiguration (base ref, head ref, log-opts)
│
▼
5. JavaScript: Binary Manager
│
│ Download/cache gitleaks binary
│ Construct CLI arguments
│
▼
6. External Process: Gitleaks Binary
│
│ Execute scan
│ Generate results.sarif
│ Exit with code (0/2)
│
▼
7. JavaScript: Read SARIF File
│
│ SARIF JSON string
│
▼
8. WASM: SARIF Parser
│
│ Vec<DetectedSecret>
│
▼
9. WASM: PR Comment Generator (if PR event)
│
│ Comment Markdown strings
│
▼
10. JavaScript: GitHub API (post comments)
│
│ HTTP POST responses
│
▼
11. WASM: Job Summary Generator
│
│ Summary HTML/Markdown
│
▼
12. JavaScript: Write to GITHUB_STEP_SUMMARY
│
▼
13. JavaScript: Set Action Outputs
│
▼
14. Exit with code (0 = success, 1 = error, 2 = secrets found)
┌─────────────────────────────────────────────────────────────────┐
│ 1. GitHub Triggers PR Event (opened, synchronize) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 2. JavaScript: Read event JSON from GITHUB_EVENT_PATH │
│ { │
│ "pull_request": { │
│ "number": 123, │
│ "head": { "sha": "abc123" }, │
│ "base": { "sha": "def456" } │
│ } │
│ } │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 3. WASM: Event Router │
│ - Detect event type: "pull_request" │
│ - Extract PR number: 123 │
│ - Call HandlePullRequestEvent() │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 4. JavaScript: GitHub API - Fetch PR Commits │
│ GET /repos/{owner}/{repo}/pulls/123/commits │
│ Response: [ │
│ { "sha": "abc123", ... }, │
│ { "sha": "def456", ... } │
│ ] │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 5. WASM: Determine Scan Range │
│ - base_ref = first commit SHA (abc123) │
│ - head_ref = last commit SHA (def456) │
│ - log_opts = "--no-merges --first-parent abc123^..def456" │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 6. JavaScript: Execute Gitleaks │
│ gitleaks detect --redact -v --exit-code=2 \ │
│ --report-format=sarif --report-path=results.sarif \ │
│ --log-level=debug \ │
│ --log-opts="--no-merges --first-parent abc123^..def456" │
│ Exit code: 2 (secrets found) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 7. WASM: Parse SARIF │
│ [ │
│ DetectedSecret { │
│ rule_id: "aws-access-token", │
│ commit_sha: "abc123", │
│ file_path: "src/config.js", │
│ start_line: 42, │
│ fingerprint: "abc123:src/config.js:aws-access-token:42" │
│ } │
│ ] │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 8. JavaScript: GitHub API - Fetch Existing Comments │
│ GET /repos/{owner}/{repo}/pulls/123/comments │
│ (For deduplication) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 9. WASM: Generate Comment Body │
│ "🛑 **Gitleaks detected:** aws-access-token\n\n │
│ **Commit:** abc123\n │
│ **Fingerprint:** abc123:src/config.js:aws-access-token:42" │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 10. JavaScript: GitHub API - Post Comment │
│ POST /repos/{owner}/{repo}/pulls/123/comments │
│ { │
│ "body": "🛑 **Gitleaks detected:** ...", │
│ "commit_id": "abc123", │
│ "path": "src/config.js", │
│ "line": 42, │
│ "side": "RIGHT" │
│ } │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 11. WASM: Generate Job Summary (HTML Table) │
│ ## 🛑 Gitleaks detected secrets 🛑 │
│ <table>...</table> │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 12. JavaScript: Write Summary + Upload Artifact │
│ - Write to GITHUB_STEP_SUMMARY │
│ - Upload results.sarif as artifact │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 13. Exit with code 1 (failure, because secrets were found) │
└─────────────────────────────────────────────────────────────────┘
Persistent Storage: None (stateless execution)
Temporary Storage:
- SARIF File:
{GITHUB_WORKSPACE}/results.sarif(deleted after run) - Gitleaks Binary: GitHub Actions cache (persistent across runs)
- WASM Module: Loaded into memory (not cached between runs)
Cache Strategy:
- Gitleaks Binary: Cache key =
gitleaks-cache-{version}-{platform}-{arch} - TTL: 7 days (GitHub Actions default)
- Invalidation: Version change, manual cache clear
Integration Type: Host runtime environment
Interfaces:
- action.yml: Metadata (inputs, outputs, runtime)
- Environment Variables: Configuration source
- GITHUB_STEP_SUMMARY: Job summary output
- core.setOutput(): Action outputs
- Process Exit Code: Success/failure signal
Data Flow:
- Actions platform → Environment variables → Application
- Application → GITHUB_STEP_SUMMARY file → Actions platform
- Application → Exit code → Actions platform
Error Handling:
- Uncaught exceptions → Exit code 1
- Exit code 0 → Success (green checkmark)
- Exit code 1 → Failure (red X)
- Exit code 2 → Treated as failure (secrets found)
Integration Type: RESTful HTTP API
Endpoints:
GET /users/{username}- Account type detectionGET /repos/zricethezav/gitleaks/releases/latest- Latest versionGET /repos/{owner}/{repo}/pulls/{number}/commits- PR commitsGET /repos/{owner}/{repo}/pulls/{number}/comments- PR commentsPOST /repos/{owner}/{repo}/pulls/{number}/comments- Post comment
Authentication:
- Header:
Authorization: Bearer {GITHUB_TOKEN} - Token type: Workflow token (automatic) or PAT (manual)
Rate Limiting:
- Workflow token: 1,000 requests/hour
- Personal token: 5,000 requests/hour
- Strategy: Track requests, pre-emptive backoff when <100 remaining
Retry Strategy:
- Transient errors (429, 5xx): Exponential backoff (3 retries)
- Client errors (4xx): No retry
- Network errors: Retry with backoff
Data Flow:
- Application → HTTP client (JavaScript) → GitHub API
- GitHub API → HTTP response → Application
Integration Type: External process execution
Execution Model:
- Spawn child process
- Pass arguments via CLI
- Capture stdout, stderr
- Wait for exit code
- Read output file (results.sarif)
Argument Construction:
- Base:
detect --redact -v --exit-code=2 --report-format=sarif --report-path=results.sarif --log-level=debug - Event-specific:
--log-opts="..." - Optional:
--config={path}
Exit Code Mapping:
- 0 → No secrets found (success)
- 1 → Error occurred (failure)
- 2 → Secrets found (process results, then failure)
Data Flow:
- Application → Process spawner (JavaScript) → Gitleaks binary
- Gitleaks binary → SARIF file → Application
- Gitleaks binary → Exit code → Application
Integration Type: Synchronous file I/O (via JavaScript)
Operations:
- Read: Event JSON, SARIF files, configuration files
- Write: Job summary (GITHUB_STEP_SUMMARY), SARIF files
- Discover: Auto-detect gitleaks.toml
Security:
- Path validation (no
..traversal) - Constrain to GITHUB_WORKSPACE
- Verify file existence and permissions
Data Flow:
- Application → File reader (JavaScript) → File system
- File system → File contents → Application
Pattern 1: Request-Response (Synchronous)
- Used for: File I/O, process execution
- Characteristics: Blocking, sequential
- Example: Read event JSON → Parse → Route
Pattern 2: Async Request-Response
- Used for: HTTP API calls
- Characteristics: Non-blocking, concurrent
- Example: Fetch PR commits while downloading binary
Pattern 3: Callback (WASM ↔ JavaScript)
- Used for: WASM requesting I/O operations
- Characteristics: Asynchronous, bidirectional
- Example: WASM requests GitHub API call → JavaScript executes → Returns response to WASM
Pattern 4: Fire and Forget
- Used for: Logging, artifact uploads
- Characteristics: Non-blocking, no response needed
- Example: Upload artifact to GitHub Actions
Threat Model: Malicious input data, compromised dependencies
Mitigation:
- WASM runs in isolated linear memory (no access to process memory)
- No direct file system access (capability-based security)
- No direct network access (must call through JavaScript)
- No process spawning (must call through JavaScript)
- Stack overflow protection (WASM runtime)
- Type safety at module boundary (wasm-bindgen)
Attack Surface Reduction:
- WASM module cannot:
- Read arbitrary files
- Make arbitrary network requests
- Execute arbitrary code
- Access environment variables directly
- Spawn child processes
Limitations:
- WASM does not protect against logic bugs
- WASM does not prevent denial of service (resource exhaustion)
- JavaScript layer is not sandboxed (Node.js permissions)
Threat Model: Injection attacks, path traversal, malicious JSON
Mitigation:
-
Path Validation:
fn validate_path(path: &str, workspace: &str) -> Result<PathBuf> { // Reject paths with .. if path.contains("..") { return Err(PathTraversalError); } // Resolve to canonical path let canonical = fs::canonicalize(path)?; // Ensure within workspace if !canonical.starts_with(workspace) { return Err(OutsideWorkspaceError); } Ok(canonical) }
-
Git Reference Validation:
- Commit SHAs: Validate 40-character hex
- Branch names: Reject shell metacharacters (
;,|,&, etc.) - Sanitize before passing to gitleaks CLI
-
JSON Validation:
- Schema validation with serde
- Reject malformed JSON
- Validate required fields exist
- Use safe deserialization (no
unsafe)
-
Environment Variable Validation:
- Boolean: Only accept
"true","false","0","1" - Version: Validate semantic versioning format
- Repository: Validate
owner/repoformat
- Boolean: Only accept
Threat Model: Secret exposure in logs, outputs, error messages
Mitigation:
-
Input Secrets:
- Never log GITHUB_TOKEN value
- Never log GITLEAKS_LICENSE value
- Mask in error messages:
Error: Invalid token '***'
-
Detected Secrets:
- Always use gitleaks
--redactflag - SARIF output contains redacted values only
- PR comments never include actual secret values
- Job summaries never include actual secret values
- Always use gitleaks
-
GitHub Actions Secret Masking:
- Use
::add-mask::command if handling secrets in JavaScript - Ensure WASM module doesn't log sensitive data
- Use
Secret Redaction Example:
Detected: AWS_ACCESS_KEY_ID
Value (redacted): AKIA****************
Location: src/config.js:42
Threat Model: Compromised dependencies, vulnerabilities
Mitigation:
-
Vulnerability Scanning:
- Run
cargo auditin CI on every build - Fail build on high/critical vulnerabilities
- Document accepted low/medium vulnerabilities
- Run
-
Supply Chain Integrity:
- Commit Cargo.lock to repository
- Pin dependency versions
- Use
cargo verify-projectto check integrity - Regularly update dependencies (Dependabot)
-
License Compliance:
- Run
cargo denyto check licenses - Ensure all dependencies use permissive licenses (MIT, Apache)
- Document any copyleft dependencies
- Run
-
Minimal Dependencies:
- Use
default-features = falseto avoid bloat - Avoid unnecessary dependencies
- Prefer standard library over external crates
- Use
Example Cargo.toml:
[dependencies]
serde = { version = "1.0", default-features = false, features = ["derive"] }
serde_json = { version = "1.0", default-features = false, features = ["alloc"] }Threat Model: Unauthorized API access, token theft
Mitigation:
-
GitHub Token Handling:
- Token stored in environment variable (GITHUB_TOKEN)
- Never logged or printed
- Automatically provided by GitHub Actions
- Scoped to repository (no access to other repos)
- Expires after job completion
-
License Key Handling (when enabled):
- Stored in environment variable (GITLEAKS_LICENSE)
- Never logged or printed
- Validated via Keygen.sh API
- Scoped to repository fingerprint
-
API Authentication:
- All GitHub API calls include
Authorization: Bearer {token} - Token validation by GitHub (not by action)
- 401/403 errors → Fatal failure
- All GitHub API calls include
Threat Model: Error messages revealing sensitive information
Mitigation:
-
Error Message Sanitization:
- Remove sensitive data from error messages
- Use generic messages for external users
- Detailed messages for debugging (in logs only)
-
Stack Trace Filtering:
- Don't expose internal paths in production
- Sanitize file paths (relative to workspace)
-
Example:
// Bad: Exposes token Err(format!("API call failed with token {}", token)) // Good: Masks token Err(format!("API call failed with token '***'"))
Input Validation:
- All file paths validated (no
..traversal) - All git references validated (no shell injection)
- All JSON validated with schema
- All environment variables validated
Secrets Management:
- GITHUB_TOKEN never logged
- GITLEAKS_LICENSE never logged
- Detected secrets always redacted (gitleaks
--redact) - Error messages sanitized
Dependency Security:
-
cargo auditpasses (no high/critical vulns) -
cargo denypasses (license compliance) - Cargo.lock committed to repository
- Dependencies use
default-features = false
Authentication:
- All GitHub API calls authenticated
- Tokens scoped appropriately
- 401/403 errors handled (fatal)
Error Handling:
- Error messages sanitized
- Stack traces filtered
- No information disclosure
Decision: Use JavaScript for I/O operations and WASM for business logic
Rationale:
- WASM Strengths: Memory safety, performance, cross-platform, sandboxing
- WASM Limitations: No file I/O, no network, no process spawning
- JavaScript Strengths: Native GitHub Actions support, I/O operations, ecosystem
- JavaScript Limitations: Memory safety, performance
Alternatives Considered:
- Pure JavaScript: Simple but loses Rust benefits (safety, performance)
- Pure Rust Native: Requires cross-compilation for all platforms
- Docker Container: Slower startup, larger distribution size
Trade-offs:
- Pros: Best of both worlds, universal binary (WASM), sandboxed execution
- Cons: FFI boundary overhead, complexity of two languages
Status: Accepted
Decision: Delegate secret detection to external gitleaks binary instead of reimplementing
Rationale:
- Gitleaks is mature, well-tested, and actively maintained
- Reimplementing secret detection would be significant effort
- Gitleaks team is expert in this domain
- Allows independent evolution of detection rules
Alternatives Considered:
- Reimplement in Rust: High effort, duplication, maintenance burden
- Embed gitleaks as library: Not feasible (different languages)
Trade-offs:
- Pros: Leverage existing expertise, faster development, lower maintenance
- Cons: External dependency, download overhead, version compatibility
Mitigation:
- Cache gitleaks binary (reduces download overhead)
- Pin default version (stability)
- Allow version override (flexibility)
Status: Accepted
Decision: No persistent state between runs; each execution is independent
Rationale:
- GitHub Actions model is stateless (each job is isolated)
- Simplifies implementation (no database, no state management)
- Easier to reason about (deterministic)
- No cache invalidation issues
Alternatives Considered:
- Persistent Database: Store historical results for deduplication
- File-Based Cache: Store scan results between runs
Trade-offs:
- Pros: Simple, deterministic, no cache invalidation bugs
- Cons: Cannot deduplicate across runs, re-scan on every trigger
Status: Accepted
Decision: Route events to specialized handlers based on event type (push, PR, workflow_dispatch, schedule)
Rationale:
- Each event type has different requirements (PR needs comments, push doesn't)
- Specialized handlers simplify logic (no complex conditionals)
- Clear separation of concerns
- Easier to test (mock one event type at a time)
Alternatives Considered:
- Generic Handler: Single handler with conditionals for event type
- Polymorphic Handlers: Abstract base class with overrides
Trade-offs:
- Pros: Clear, maintainable, testable
- Cons: Some code duplication across handlers
Status: Accepted
Decision: Use SARIF v2 as the data format for secret detection results
Rationale:
- Gitleaks native output format
- Industry standard for security findings (OASIS)
- Rich metadata (locations, fingerprints, rules)
- Supported by GitHub Code Scanning
Alternatives Considered:
- Custom JSON Format: Would require gitleaks modification
- Plain Text: Difficult to parse, no structure
Trade-offs:
- Pros: Standard format, rich metadata, GitHub integration
- Cons: Complex schema (deep nesting)
Status: Accepted
Decision: Post inline review comments on pull requests at exact secret locations
Rationale:
- Best user experience (see error at exact location)
- Actionable feedback (fingerprint for .gitleaksignore)
- Aligns with GitHub code review workflow
- Prevents secrets from being merged
Alternatives Considered:
- Job Summary Only: Less visibility, no direct feedback
- PR Description Comment: Not line-specific, easy to miss
Trade-offs:
- Pros: Excellent UX, prevents merges, actionable
- Cons: API rate limits, large diff errors (HTTP 422)
Mitigation:
- Deduplication prevents spam
- Non-fatal errors (secrets still in summary/artifacts)
- Rate limit tracking and backoff
Status: Accepted
Decision: Implement exponential backoff with jitter for transient API failures
Rationale:
- Transient failures are common (network blips, rate limits)
- Exponential backoff prevents overwhelming the server
- Jitter prevents thundering herd (synchronized retries)
- Industry best practice
Configuration:
- Default: 3 retries
- Initial delay: 1 second
- Multiplier: 2.0 (exponential)
- Max delay: 60 seconds
- Jitter: 10%
Alternatives Considered:
- Linear Backoff: Less effective, can still overwhelm server
- No Retries: Brittle, fails on transient errors
Trade-offs:
- Pros: Resilient, self-healing, best practice
- Cons: Increased latency on failures
Status: Accepted
Decision: Aggressively optimize WASM binary size (target: <500 KB)
Rationale:
- Faster distribution (smaller git clone)
- Faster load time (network transfer, parsing)
- Better user experience (responsive)
- Enables npm distribution (size limits)
Optimization Techniques:
opt-level = 'z'(optimize for size)lto = true(link-time optimization)codegen-units = 1(maximum optimization)strip = true(remove debug symbols)wasm-opt -Oz(post-process optimization)default-features = false(minimal dependencies)
Alternatives Considered:
- Optimize for Speed: Larger binary (1-2 MB)
- No Optimization: Debug build (5-10 MB)
Trade-offs:
- Pros: Small binary, fast distribution, better UX
- Cons: Longer build times, harder debugging (in release mode)
Status: Accepted
Decision: Maintain 100% backward compatibility with gitleaks-action v2.x
Rationale:
- Ease migration (no workflow changes required)
- User trust (predictable behavior)
- Drop-in replacement (users can switch with confidence)
- Avoid ecosystem fragmentation
Compatibility Requirements:
- Same environment variables
- Same boolean parsing logic (quirks and all)
- Same output formats (SARIF, comments, summaries)
- Same exit codes (0, 1, 2)
- Same error messages (where practical)
Alternatives Considered:
- Clean Slate: Redesign API, breaking changes
- Partial Compatibility: Support new API, deprecate old
Trade-offs:
- Pros: Easy migration, user trust, no fragmentation
- Cons: Inherits v2 quirks (e.g., boolean parsing)
Status: Accepted
Decision: Implement license validation but keep it disabled by default (feature flag)
Rationale:
- Original implementation had license validation (via Keygen.sh)
- Currently disabled due to payment processing issues
- Retain code for future re-enablement
- No impact on users (disabled)
Status: Accepted (disabled)
Future Considerations:
- Re-enable if Keygen service restored
- Provide fallback mechanism (grace period)
- Document clearly in user-facing docs
Build Performance:
- Target: Cold build ≤5 min, cached build ≤2 min
- Strategy: Rust cache (Swatinem/rust-cache), sccache, parallel compilation
- Measurement: CI pipeline duration
Runtime Performance:
- Target: Total overhead ≤2 seconds (excluding gitleaks scan)
- Breakdown:
- WASM load: ≤50ms
- Event parsing: ≤10ms
- SARIF parsing: ≤100ms (10 findings)
- GitHub API: ≤500ms per request
- Measurement: Instrumented logging, benchmarks
Binary Size:
- Target: WASM ≤500 KB (uncompressed), ≤200 KB (gzip)
- Strategy:
opt-level='z', LTO,wasm-opt -Oz, minimal dependencies - Measurement: File size after build
Memory Usage:
- Target: Peak ≤200 MB (including Node.js runtime)
- Strategy: Streaming JSON parsing, bounded collections, no memory leaks
- Measurement: Process memory monitoring
Availability:
- Target: 99.9% success rate (excluding secrets found)
- Strategy: Retry with backoff, graceful degradation, comprehensive error handling
- Measurement: Success rate in CI/CD pipelines
Fault Tolerance:
- Transient Failures: Retry with exponential backoff (API, network)
- Non-Critical Failures: Log warning, continue (PR comments, cache)
- Critical Failures: Fail fast with clear error message (authentication, SARIF parse)
Error Recovery:
- Cache failures → Download fresh
- API failures (non-critical) → Continue (secrets still in summary)
- Large diff errors → Skip comment, log info
Confidentiality:
- Secrets never logged or exposed
- Tokens masked in error messages
- WASM sandboxing prevents unauthorized access
Integrity:
- Input validation prevents injection attacks
- Dependency pinning prevents supply chain attacks
- Cargo.lock ensures reproducible builds
Availability:
- Rate limit tracking prevents exhaustion
- Resource limits prevent denial of service
- Retry logic prevents transient failures
Modularity:
- Clear separation of concerns (event routing, SARIF parsing, etc.)
- Each module has single responsibility
- Minimal coupling between modules
Testability:
- Pure functions (no side effects) easy to test
- Mocking interfaces for external dependencies
- 70+ unit tests, 25+ integration tests
Documentation:
- Comprehensive pseudocode (12,674 lines)
- Inline code comments
- API documentation (rustdoc)
- User-facing README
Horizontal Scalability: N/A (each run is independent, no shared state)
Vertical Scalability:
- Memory usage scales with SARIF size (streaming parsing mitigates)
- GitHub API rate limits are primary constraint (1,000 req/hr)
- Large repositories → Longer gitleaks scan (not action overhead)
Platform Compatibility:
- Ubuntu 22.04, 24.04 (linux/x64)
- macOS 13, 14 (darwin/x64, darwin/arm64)
- Windows Server 2022 (windows/x64)
Runtime Compatibility:
- Node.js 20 (current)
- Node.js 24 (future, fall 2025)
Backward Compatibility:
- 100% compatible with gitleaks-action v2.x
- Same inputs, outputs, behavior
-
WASM Cannot Perform I/O:
- Limitation: No file system, network, or process access
- Mitigation: JavaScript layer handles all I/O
-
GitHub Actions Runtime:
- Constraint: Node.js 20/24 only
- Mitigation: Target
nodejsfor wasm-pack
-
GitHub API Rate Limits:
- Constraint: 1,000 requests/hour (workflow token)
- Mitigation: Minimize API calls, track rate limits, backoff
-
External Gitleaks Dependency:
- Constraint: Requires gitleaks binary for scanning
- Mitigation: Download and cache binary, allow version override
-
Backward Compatibility:
- Constraint: Must maintain v2.x compatibility
- Mitigation: Preserve all interfaces, quirks, and behaviors
-
Open Source License:
- Constraint: All dependencies must be permissively licensed
- Mitigation: Use MIT/Apache licenses, avoid GPL
-
License Validation Disabled:
- Constraint: Keygen service payment issues
- Mitigation: Feature toggle (disabled), retain code for future
-
No Persistent Storage:
- Constraint: GitHub Actions is stateless
- Mitigation: Stateless architecture, no database
The SecretScout architecture is a hybrid JavaScript + WASM design that leverages the strengths of both technologies:
- JavaScript handles I/O operations (file system, network, process spawning)
- WASM handles business logic (event routing, SARIF parsing, comment generation)
- Gitleaks binary handles secret detection (external dependency)
This architecture achieves:
- Functional Parity: 100% compatibility with gitleaks-action v2.x
- Performance: <2 second overhead, <500 KB binary size
- Security: WASM sandboxing, input validation, secrets management
- Reliability: Retry logic, graceful degradation, comprehensive error handling
- Maintainability: Modular design, extensive documentation, testability
The architecture is implementation-ready and provides a solid foundation for the Rust implementation phase.
Architecture Phase Status: ✅ COMPLETE
Document Version: 1.0 Date: October 16, 2025 Architect: Lead System Architect (Claude Code) Review Status: Ready for approval
Next Steps (not included in this phase):
- Review and approve architecture
- Proceed to implementation (Refinement phase)
- Set up Rust project structure based on architecture
- Implement components following architectural guidelines