hkevin01/java-to-python-test-suite

Java to Python Test Suite

Verification-first test infrastructure for secure, dependency-aware Java to Python translation services.


Executive Summary

This project uses a Python + pytest stack because it maximizes test expressiveness, async API coverage, and security-focused validation in one cohesive framework. The suite is intentionally built around comparison and traceability: each major requirement is represented in marker groups, assertion patterns, dependency ordering tests, and visual models.

Why This Stack, How It Is Used, and Benefits

| Technology | Why Used | How Used in This Suite | Benefit Over Alternatives |
| --- | --- | --- | --- |
| Python 3.11+ | Fast iteration and excellent testing ecosystem | Executes all test layers and fixture logic | Lower friction than Java/JUnit for mixed async + security test authoring |
| pytest | Marker-based structure and fixture system | Separates unit/integration/correctness/negative/adversarial pipelines | Better parametrization and fixture ergonomics than unittest |
| pytest-asyncio | Native async compatibility | Runs async endpoint tests without custom event-loop wrappers | Cleaner than ad-hoc loop management |
| httpx + ASGITransport | In-process API contract testing | Calls API endpoints with dependency overrides and mock backends | Faster and more deterministic than external server + requests |
| cryptography + PyJWT | Realistic auth-path verification | Generates RSA keys and signs test JWTs at runtime | Stronger coverage than static token-only tests |
| javalang | Java structure awareness in validation workflows | Supports parser-oriented assertions in unit tests | More reliable than regex-only Java parsing checks |

Important

The suite verifies not only correctness, but also translation safety and dependency order requirements, including base-class-before-subclass guarantees through topological sorting tests.

Overview

This repository is a dedicated test harness for a Java-to-Python translation service. It validates parser behavior, method/type fidelity, API contract integrity, authorization controls, guardrail enforcement, and adversarial resilience. It is designed for teams that need reproducible quality and security checks before releasing translation features.

Important

The suite assumes an external orchestrator source path and environment variables are available as configured in conftest.py.

Core value for this project:

  • Confirms required behavior with explicit assertions (not heuristic checks only).
  • Compares expected ordering and output properties against actual responses.
  • Detects failures in dependency ordering and cycle handling early.
  • Verifies that translation order favors reusable base components before dependents.

(back to top)

Requirements to Validation Mapping

| Requirement | Implementation Focus | Evidence in Test Suite | Outcome Verified |
| --- | --- | --- | --- |
| Parse Java artifacts safely | Parser and class-info extraction paths | tests/unit/test_java_parsing.py | AST/data extraction is stable for normal and malformed inputs |
| Build dependency graph correctly | Intra-project edge construction | tests/unit/test_dependency_graph.py | No self-loops, no JDK noise, valid class map |
| Sort translation order by dependency | Topological ordering logic | tests/unit/test_topological_sort.py | Dependencies appear before dependent classes |
| Translate base classes before subclasses | Ordering invariant in project translation plan | tests/unit/test_topological_sort.py and tests/integration/test_project_translate_api.py | Base abstractions precede concrete subclasses/services |
| Detect cycles without dropping files | Cycle fallback behavior | tests/adversarial/test_circular_dependencies.py and unit cycle tests | had_cycle is true and all files remain represented |
| Block unsafe or manipulative input | Input guardrails | tests/adversarial/test_prompt_injection.py and tests/unit/test_guardrails.py | Injection/secret patterns rejected before model path |
| Enforce RBAC and policy boundaries | JWT + permission checks | tests/negative/test_rbac_enforcement.py | Unauthorized roles/actions are denied |

(back to top)

Requirements Verification and Validation

This suite applies both verification and validation:

  • Verification asks: are we building the system right against explicit requirements?
  • Validation asks: are we building the right behavior for secure translation operations?

V&V Strategy Matrix

| Requirement Area | Verification Method | Validation Method | Pass Criteria | Primary Evidence |
| --- | --- | --- | --- | --- |
| Dependency graph correctness | Unit assertions on graph edges and node invariants | Integration checks of API dependency output | No self-loops, no missing files, dependency-first order | tests/unit/test_dependency_graph.py, tests/integration/test_project_translate_api.py |
| Topological ordering (base before subclass) | Unit invariant checks for order index relationships | Project-level translate API response order checks | For every edge A depends on B, index(B) < index(A) | tests/unit/test_topological_sort.py, tests/integration/test_project_translate_api.py |
| Cycle detection robustness | Unit and adversarial cycle test scenarios | End-to-end circular project request handling | had_cycle true on cyclic input, all files retained in output | tests/adversarial/test_circular_dependencies.py, tests/unit/test_topological_sort.py |
| Security guardrails | Unit and adversarial pattern blocking tests | API-level blocked request behavior checks | Injection and credential patterns rejected before unsafe processing | tests/unit/test_guardrails.py, tests/adversarial/test_prompt_injection.py |
| RBAC and auth correctness | Negative role/permission tests | Unauthorized API paths return denied responses | Role permissions enforced with no privilege escalation | tests/negative/test_rbac_enforcement.py, integration auth tests |
| Output structure fidelity | Correctness tests over syntax/import/signatures | Workflow-level usage consistency checks | Outputs remain parseable and structurally aligned to expectations | tests/correctness/*.py |

Verification Pipeline

flowchart TD
  A[Requirements] --> B[Unit Verification]
  B --> C[Integration Verification]
  C --> D[Negative and Adversarial Verification]
  D --> E[Validation Against Runtime Behaviors]
  E --> F[Release Confidence Decision]

Validation Acceptance Gates

| Gate | Scope | Command Pattern | Minimum Acceptance |
| --- | --- | --- | --- |
| Gate 1 | Core logic verification | pytest -m unit -q | All dependency/order/parser tests pass |
| Gate 2 | API contract verification | pytest -m integration -q | Endpoint contract fields and ordering checks pass |
| Gate 3 | Security validation | pytest -m negative -q && pytest -m adversarial -q | RBAC, injection, and egress/model policy checks pass |
| Gate 4 | Output quality validation | pytest -m correctness -q | Output syntax/structure/import quality checks pass |
| Gate 5 | Full-system confidence | pytest -q | No regressions across all marker groups |
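
The gates can also be chained in a small driver that stops at the first failure. The following is a minimal sketch rather than part of the suite: the pytest commands are taken from the table above, while `GATES`, `default_runner`, and `run_gates` are hypothetical names introduced for illustration. The runner is injectable so the control flow can be exercised without invoking pytest.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

# Commands copied from the gate table; the grouping structure is illustrative.
GATES: List[Tuple[str, List[List[str]]]] = [
    ("Gate 1", [["pytest", "-m", "unit", "-q"]]),
    ("Gate 2", [["pytest", "-m", "integration", "-q"]]),
    ("Gate 3", [["pytest", "-m", "negative", "-q"],
                ["pytest", "-m", "adversarial", "-q"]]),
    ("Gate 4", [["pytest", "-m", "correctness", "-q"]]),
    ("Gate 5", [["pytest", "-q"]]),
]


def default_runner(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_gates(runner: Callable[[Sequence[str]], int] = default_runner) -> Optional[str]:
    """Run gates in order; return the first failing gate name, or None if all pass."""
    for name, commands in GATES:
        for cmd in commands:
            if runner(cmd) != 0:
                return name
    return None
```

Calling `run_gates()` with no arguments executes the real commands; passing a stub runner makes the gate logic itself testable.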

Traceability Notes

  • Requirement-to-test traceability is explicit through marker groups and targeted modules.
  • Visualization-to-requirement traceability is captured by architecture, object model, and dependency diagrams.
  • Algorithm-to-requirement traceability is captured by Kahn ordering assertions that enforce base-before-subclass translation.

Note

V&V is strongest when failures are triaged by marker group first, then by requirement area, so remediation stays requirement-focused rather than only test-focused.

(back to top)

Architecture

flowchart LR
  A[Fixture Corpus: Java Inputs] --> B[Pytest Marker Groups]
  B --> C[Unit Validations]
  B --> D[Integration Endpoint Contracts]
  B --> E[Negative Security and RBAC]
  B --> F[Adversarial Guardrail Tests]
  C --> G[Dependency Graph + Topological Sort Validation]
  D --> H[Translate and Translate-Project API Behavior]
  E --> H
  F --> H
  G --> I[Confidence in Ordering and Requirements]
  H --> I

Architecture intent:

  • Marker groups isolate concerns so each risk area is testable independently.
  • Unit tests validate deterministic algorithmic behavior (graph and order).
  • Integration tests confirm API contract fields like dependency_order and had_cycle.
  • Security suites ensure unsafe requests fail fast and auditable paths stay intact.

(back to top)

Object Model

classDiagram
  class FileEntry {
    +string filename
    +string source
    +class_info
    +set dependencies
    +int order
  }

  class ProjectTranslationPlan {
    +list ordered_files
    +map class_map
    +bool had_cycle
  }

  class JavaClassInfo {
    +string name
    +bool is_interface
    +bool is_abstract
    +set imports
    +set methods
  }

  ProjectTranslationPlan "1" --> "many" FileEntry : contains
  FileEntry --> JavaClassInfo : parsed_from

How this model helps:

  • Makes ordering state explicit (order, dependencies, had_cycle).
  • Supports comparison between parsed structure and output expectations.
  • Enables requirement-level assertions that are easy to reason about in tests.
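
For readers who prefer code to diagrams, the class diagram can be approximated with Python dataclasses. This is a sketch under assumptions, not the project's actual definitions: the field names come from the diagram, but the element types (for example `class_map` mapping class names to `FileEntry` objects) and the `-1` sentinel for an unassigned `order` are guesses made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class JavaClassInfo:
    name: str
    is_interface: bool = False
    is_abstract: bool = False
    imports: Set[str] = field(default_factory=set)
    methods: Set[str] = field(default_factory=set)


@dataclass
class FileEntry:
    filename: str
    source: str
    class_info: Optional[JavaClassInfo] = None
    dependencies: Set[str] = field(default_factory=set)
    order: int = -1  # assumed sentinel meaning "not yet ordered"


@dataclass
class ProjectTranslationPlan:
    ordered_files: List[FileEntry] = field(default_factory=list)
    class_map: Dict[str, FileEntry] = field(default_factory=dict)  # assumed: class name -> entry
    had_cycle: bool = False


# Example plan with one base model and one dependent service.
order_entry = FileEntry("Order.java", "public class Order {}", JavaClassInfo("Order"), order=0)
service_entry = FileEntry(
    "OrderService.java", "public class OrderService {}",
    JavaClassInfo("OrderService"), dependencies={"Order"}, order=1,
)
plan = ProjectTranslationPlan(
    ordered_files=[order_entry, service_entry],
    class_map={"Order": order_entry, "OrderService": service_entry},
)
```

Keeping `order`, `dependencies`, and `had_cycle` as plain fields is what lets tests assert on ordering state directly.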

(back to top)

Dependency Graph and Topological Sort

The translation planner builds a directed dependency graph where each node is a class/file and edges represent prerequisite relationships (for example, subclass depends on base class).

flowchart TD
  A[AbstractProcessor] --> B[PaymentProcessor]
  C[Order] --> D[OrderService]
  E[IRepository] --> F[OrderRepository]
  C --> F

The expected translation order is dependency-first:

  1. Base abstractions and interfaces.
  2. Core domain models.
  3. Concrete implementations and services.

This is why tests verify examples such as Order before OrderService and AbstractProcessor before PaymentProcessor.
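
As a rough illustration of how such a graph could be built, the sketch below keeps only intra-project edges and drops self-references, which mirrors the "no self-loops, no JDK noise" outcome listed in the requirements mapping. The function name `build_dependency_graph` and the input shape (class name mapped to the set of type names it references) are assumptions for this example, not the planner's real interface.

```python
from typing import Dict, Set


def build_dependency_graph(references: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each project class to the project classes it depends on."""
    project_classes = set(references)
    graph: Dict[str, Set[str]] = {}
    for cls, refs in references.items():
        # Keep only references to other classes in this project:
        # external names (e.g. java.util.List) and self-references are dropped.
        graph[cls] = {ref for ref in refs if ref in project_classes and ref != cls}
    return graph


references = {
    "AbstractProcessor": set(),
    "PaymentProcessor": {"AbstractProcessor", "java.math.BigDecimal"},
    "Order": {"java.util.List", "Order"},
    "OrderService": {"Order", "java.util.Map"},
    "IRepository": set(),
    "OrderRepository": {"IRepository", "Order"},
}
graph = build_dependency_graph(references)
```

The resulting `graph` has the same edges as the flowchart above, expressed as "class depends on" sets.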

Dependency ordering checkpoints used by the suite

| Ordering Check | Why It Matters | Test Evidence |
| --- | --- | --- |
| Order before OrderService | Service methods require model definitions first | Unit and integration ordering assertions |
| AbstractProcessor before PaymentProcessor | Subclass translation needs base contract context | Unit topological ordering assertions |
| IRepository before OrderRepository | Interface constraints should be available before implementation | Unit topological ordering assertions |
| Cycle path still returns all files | Production robustness under imperfect source graphs | Circular dependency adversarial/unit tests |

(back to top)

Why Kahn's Algorithm Matters Here

What is Kahn's Algorithm? (Layman's Explanation)

Imagine you have a to-do list with dependencies:

  • Task A: "Learn Python" (must do first)
  • Task B: "Build a web app" (depends on Task A - you need Python knowledge)
  • Task C: "Deploy the app" (depends on Task B - you need a working app to deploy)

You can't do Task B until Task A is done. You can't do Task C until Task B is done. Kahn's algorithm automatically figures out the correct order to do tasks when there are many interdependencies.

In our case, we have Java classes instead of tasks:

  • Order.java (no dependencies - do first)
  • OrderService.java (depends on Order)
  • OrderRepository.java (depends on both Order and OrderService)

Kahn's algorithm ensures Order.java is translated to Python before OrderService.java, which is translated before OrderRepository.java.

How Kahn's Algorithm Works (Step by Step)

Step 1: Count Prerequisites (In-Degree)

For each class, count how many other classes it needs:

Order:             0 dependencies (no prerequisites)
OrderService:      1 dependency (depends on Order)
OrderRepository:   2 dependencies (depends on Order and OrderService)

Step 2: Find Classes with Zero Prerequisites

Start with classes that don't depend on anything:

Queue = [Order]  (has 0 dependencies)

Step 3: Process One Class at a Time

  • Take Order from the queue
  • Tell all classes that depend on Order: "Order is done!"
  • OrderService loses one dependency (Order is now satisfied)
  • OrderRepository loses one dependency (Order is now satisfied)
  • Check if any class now has zero dependencies:
    • OrderService: 1 - 1 = 0 dependencies left → Add to queue!
    • OrderRepository: 2 - 1 = 1 dependency left (still waiting on OrderService)

Processed = [Order]
Queue = [OrderService]

Step 4: Repeat

  • Take OrderService from the queue
  • Tell OrderRepository: "OrderService is done!"
  • OrderRepository: 1 - 1 = 0 dependencies left (its Order dependency was already satisfied in Step 3) → Add to queue!

Processed = [Order, OrderService]
Queue = [OrderRepository]

  • Take OrderRepository from the queue
  • No classes depend on it

Processed = [Order, OrderService, OrderRepository]
Queue = []  (empty - we're done!)

Step 5: Detect Cycles

If some classes remain with unmet dependencies after processing everything, there's a circular dependency (cycle):

  • A depends on B
  • B depends on C
  • C depends on A (creates a circle!)

These classes can't be properly ordered, but the algorithm includes them anyway so you're aware of the problem.
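
Python's standard library offers a quick way to see cycle detection in action. The sketch below uses `graphlib.TopologicalSorter`, which is a different tool from the project's own sorter: it raises `CycleError` instead of returning a complete list with a flag, so the `try_sort` wrapper (a hypothetical helper) only demonstrates the detection half of the behavior described above.

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps to the set of nodes it depends on.
acyclic = {
    "Order": set(),
    "OrderService": {"Order"},
    "OrderRepository": {"Order", "OrderService"},
}
cyclic = {"A": {"B"}, "B": {"C"}, "C": {"A"}}


def try_sort(graph):
    """Return (order, had_cycle); order is None when a cycle blocks sorting."""
    try:
        return list(TopologicalSorter(graph).static_order()), False
    except CycleError:
        return None, True


acyclic_result = try_sort(acyclic)
cyclic_result = try_sort(cyclic)
```

For the acyclic chain there is only one valid order, so `acyclic_result` is `(["Order", "OrderService", "OrderRepository"], False)`, while `cyclic_result` is `(None, True)`.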

Algorithm Pseudo-Code

function KahnSort(graph):
    // Count how many dependencies each node has
    for each node in graph:
        in_degree[node] = count of nodes it depends on
    
    // Find who depends on whom (reverse lookup)
    for each edge (A depends on B):
        dependents[B].add(A)
    
    // Start with nodes that have no dependencies
    queue = [all nodes where in_degree = 0]
    result = []
    
    // Process nodes in order
    while queue is not empty:
        current = queue.pop_front()   // FIFO: take the oldest ready node
        result.add(current)
        
        // For each node that depends on current:
        for each dependent in dependents[current]:
            in_degree[dependent] -= 1
            if in_degree[dependent] == 0:
                queue.add(dependent)
    
    // Check for cycles
    if result.size < graph.size:
        had_cycle = TRUE
        // Add remaining nodes (they're in a cycle)
        result.add(remaining nodes)
    
    return (result, had_cycle)
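
The pseudo-code translates fairly directly into Python. The version below is a sketch, not the code in tools/project_translator.py: it assumes the graph is a dict mapping each node to the set of nodes it depends on, that every dependency is itself a key in the dict, and it sorts ties alphabetically so the output is reproducible (a choice made for this example).

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def kahn_sort(graph: Dict[str, Set[str]]) -> Tuple[List[str], bool]:
    """Return (order, had_cycle) for a node -> dependencies mapping."""
    # In-degree here means "number of unmet dependencies".
    in_degree = {node: len(deps) for node, deps in graph.items()}

    # Reverse lookup: which nodes are waiting on each node.
    dependents: Dict[str, List[str]] = {node: [] for node in graph}
    for node, deps in graph.items():
        for dep in deps:
            dependents[dep].append(node)

    queue = deque(sorted(n for n, degree in in_degree.items() if degree == 0))
    result: List[str] = []
    while queue:
        current = queue.popleft()
        result.append(current)
        for dependent in sorted(dependents[current]):
            in_degree[dependent] -= 1
            if in_degree[dependent] == 0:
                queue.append(dependent)

    # Nodes never released from the queue are part of (or blocked by) a cycle.
    had_cycle = len(result) < len(graph)
    if had_cycle:
        placed = set(result)
        result.extend(sorted(n for n in graph if n not in placed))
    return result, had_cycle


order, had_cycle = kahn_sort({
    "Order": set(),
    "OrderService": {"Order"},
    "OrderRepository": {"Order", "OrderService"},
})
```

With the three-class example from the walkthrough, `order` is `["Order", "OrderService", "OrderRepository"]` and `had_cycle` is `False`. On cyclic input the function still returns every node, with the unsortable ones appended at the end.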

Real-World Code Example from This Project

When we have Java files:

// Order.java
public class Order { ... }

// OrderService.java
public class OrderService {
    private Order order;  // depends on Order!
    ...
}

// OrderRepository.java
public interface OrderRepository {
    Order findById(String id);  // depends on Order!
}

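For these files, Kahn's algorithm can output [Order, OrderService, OrderRepository]. OrderService and OrderRepository each depend only on Order in this snippet, so their relative position is a tie; the guarantee is that Order comes first.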

This guarantees:

  • Order is translated first
  • OrderService can reference Order class (exists in Python)
  • OrderRepository can reference Order class (exists in Python)

Why Not Just Random Order?

If we translated OrderService before Order:

class OrderService:
    def __init__(self):
        self.order = Order()  # NameError when this runs: Order is not defined yet

This fails with a NameError as soon as OrderService is instantiated, because Order doesn't exist yet. Kahn's algorithm prevents this.

High-level behavior (The original formulation):

  1. Compute in-degree for each node.
  2. Start with nodes that have in-degree 0 (no unmet dependencies).
  3. Remove processed nodes and decrement neighbors' in-degree.
  4. Continue until all nodes are processed.
  5. If nodes remain with non-zero in-degree, a cycle exists.

In this test suite, that behavior directly supports translation correctness:

  • Guarantees dependency-first ordering for base classes and shared contracts.
  • Prevents subclass-first generation that can create invalid imports/signatures.
  • Detects cycles early while still preserving a complete output list for diagnostics.

sequenceDiagram
  participant Graph as Dependency Graph
  participant Kahn as Kahn Sort (In-Degree)
  participant Planner as Translation Planner
  Graph->>Kahn: Nodes + dependency edges
  Kahn->>Kahn: 1. Compute in-degree (# dependencies per node)
  Kahn->>Kahn: 2. Find nodes with in-degree = 0
  Kahn->>Kahn: 3. Process each in order, decrement neighbors
  Kahn->>Kahn: 4. Continue until queue empty
  Kahn->>Kahn: 5. Check if nodes remain (cycle detection)
  Kahn-->>Planner: dependency_order list
  Kahn-->>Planner: had_cycle flag
  Planner-->>Planner: translate base classes before subclasses

Tip

Kahn's approach is deterministic and testable: each assertion can verify that every dependency index is lower than its dependent index. The algorithm guarantees: if class B must be translated before class A (A depends on B), then index(B) < index(A) in the output list.
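
The invariant in this tip can be checked with a few lines of Python. The helper below is a hypothetical sketch (the suite's real assertions live in tests/unit/test_topological_sort.py and may look different). It assumes every class named in the graph also appears in the order list.

```python
from typing import Dict, List, Set, Tuple


def ordering_violations(
    order: List[str], graph: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return (dependency, dependent) pairs where the dependency does not come first."""
    index = {name: position for position, name in enumerate(order)}
    return [
        (dep, node)
        for node, deps in graph.items()
        for dep in sorted(deps)
        if index[dep] >= index[node]
    ]


graph = {
    "Order": set(),
    "OrderService": {"Order"},
    "OrderRepository": {"Order", "OrderService"},
}
good = ordering_violations(["Order", "OrderService", "OrderRepository"], graph)
bad = ordering_violations(["OrderService", "Order", "OrderRepository"], graph)
```

Here `good` is an empty list, and `bad` is `[("Order", "OrderService")]`. Returning the offending pairs rather than a bare boolean tends to make test failures easier to read.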

(back to top)

Visualization as a Verification Tool

Visualizations in this README are not decorative. They reduce ambiguity when comparing implemented function behavior against requirements.

| Visualization | Confirms | Comparison Benefit |
| --- | --- | --- |
| Architecture flowchart | End-to-end validation pipeline | Quickly spots missing validation layers |
| Object model diagram | Data structures and relationships | Confirms required fields exist for assertions |
| Dependency graph diagram | Expected dependency direction | Makes ordering mistakes obvious during review |
| Kahn sequence diagram | Algorithm steps and outputs | Aligns function behavior with requirement statements |

How this helps requirement comparison:

  • Requirement text says dependency-first translation.
  • Graph + sequence diagrams show exactly how dependency-first behavior is enforced.
  • Unit tests then compare actual order indices to required invariants.
  • Integration tests compare API dependency_order to expected file precedence.

(back to top)

Technology Stack Decision Matrix

| Stack Part | Chosen Option | Alternative | Why Chosen for This Project | Practical Benefit |
| --- | --- | --- | --- | --- |
| Test framework | pytest | unittest | Marker groups and fixture composition scale better for layered suites | Faster targeted runs and cleaner test organization |
| Async testing | pytest-asyncio | custom loop management | Native async test support without boilerplate | Lower maintenance and fewer flaky async tests |
| API client | httpx + ASGITransport | requests + live server | In-process execution keeps integration tests deterministic | Better speed and less CI networking variability |
| Auth validation | cryptography + PyJWT | static token strings | Runtime key/signature generation tests real verification paths | Higher confidence in RBAC behavior |
| Java structure parsing | javalang | regex parsing | Structural parsing avoids brittle text matching | More robust dependency and class extraction checks |

Technology usage map by test concern

| Test Concern | Main Technology | Role |
| --- | --- | --- |
| Parser and graph correctness | pytest + javalang | Validates class extraction and dependency edges |
| Endpoint behavior | pytest-asyncio + httpx | Exercises translate endpoints and payload contracts |
| RBAC and token handling | cryptography + PyJWT | Generates realistic signed JWTs for role checks |
| Guardrails and adversarial handling | pytest markers + fixtures | Enforces injection/secret blocking expectations |

(back to top)

Testing & Quality Assurance Tool Integration Matrix

This test suite can be enhanced through integration with specialized testing, analysis, and verification tools. Below are recommended integrations organized by capability:

Static Code Analysis Tools (Top-to-Bottom Requirements Verification)

| Tool | Purpose | Integration Point | Validates | Python Support | Cost Model |
| --- | --- | --- | --- | --- | --- |
| Klocwork (Perforce) | SAST: security, quality, reliability | Pre-commit hooks, CI/CD pipeline | Security vulnerabilities, code defects, reliability issues | ✅ Yes | Enterprise/Commercial |
| SonarQube | Code quality & maintainability | Post-test analysis, quality gates | Code quality, technical debt, duplication, test coverage | ✅ Yes | Open-source/Commercial |
| Checkmarx (SAST) | Enterprise security scanning | Pipeline integration, compliance | Deep vulnerability analysis, compliance standards, OWASP | ✅ Yes | Enterprise/Commercial |
| Coverity (Synopsys) | Deep static analysis | Build integration, incremental analysis | Memory/security issues, race conditions | ✅ Yes | Enterprise/Commercial |
| Bandit | Python security scanning | Pre-commit, CI integration | Python security issues, hardcoded secrets | ✅ Yes (Python-specific) | Open-source |
| ESLint/Pylint | Linting & style | Git hooks, pre-flight checks | Code style, suspicious patterns, imports | ✅ Yes (Pylint) | Open-source |

Why multiple tools? Each excels in different domains:

  • Klocwork for security-first orgs needing compliance-grade SAST
  • SonarQube for quality gates and technical debt tracking
  • Checkmarx when regulatory/enterprise security is primary
  • Bandit/Pylint for lightweight pre-commit gating

Test Execution & Measurement Tools

| Tool | Purpose | Integration Point | Metrics Collected | Use Case | Cost |
| --- | --- | --- | --- | --- | --- |
| pytest (current) | Unit/integration test framework | Direct test runner | Pass/fail, execution time | Core test execution | Open-source |
| pytest-cov | Code coverage measurement | Coverage plugin, post-test | Line/branch coverage % | Verify guardrails touch all code paths | Open-source |
| Codecov | Coverage tracking & trending | CI upload, GitHub integration | Coverage trends, PR diffs | Long-term quality visibility | Free/Pro |
| Datadog | Continuous testing & monitoring | API instrumentation | Test performance, flakiness | Detect regression patterns | Commercial |
| LoadRunner | Performance and load testing | Scheduled pipeline stage, release gate | Response times, throughput, error rate, SLA compliance | Validate API under expected translation volume | Commercial |

Recommended first addition: pytest-cov to verify that guardrail code paths (input_guard, output_guard, provider_lock) are fully exercised.

Mutation Testing (Test Quality Verification)

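| Tool | Purpose | How It Works | Value for This Suite | Python Support |
| --- | --- | --- | --- | --- |
| mutmut | Mutation testing for Python | Modifies Python source, reruns tests | Verifies guardrail tests catch real bugs | ✅ Yes (Python-specific) |
| Stryker | Mutation testing framework | Modifies code, reruns tests | Relevant only for non-Python components | No (targets JavaScript/TypeScript, C#, and Scala) |
| PIT | Bytecode mutation (Java/JVM) | Mutates compiled bytecode | Relevant only for Java-side code, not this Python harness | No (JVM only) |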

Application to this suite: Run mutation tests on guardrails code (input_guard, output_guard, provider_lock) to ensure rejection logic is properly tested.

Dependency & Supply Chain Security

| Tool | Purpose | Scans | Integration | Python Support |
| --- | --- | --- | --- | --- |
| Snyk | Dependency vulnerability scanning | requirements.txt, package manifests | Pre-commit, PR checks, CI | ✅ Yes |
| OWASP Dependency-Check | Known vulnerability database | Dependencies, transitive | CLI, Maven/Gradle, CI | ✅ Yes |
| Black Duck (Synopsys) | License/composition analysis | Codebases, dependencies | CI pipeline, compliance | ✅ Yes |
| pip-audit | Python package auditing | pip requirements | GitHub Actions, pre-commit | ✅ Yes (Python-specific) |

Why this matters: fastapi, pytest, javalang, and cryptography dependencies must remain secure. Snyk + pip-audit provide light/fast scanning; Black Duck for enterprise compliance.

Requirements Verification & Traceability Tools

| Tool | Function | Integration | Traceability | Compliance |
| --- | --- | --- | --- | --- |
| Azure DevOps Test Plans | Requirements↔Tests mapping | Work items, test suites | Bi-directional links | CMMI/ISO ready |
| Jira Xray | Test management within Jira | Issues, test runs, coverage | Requirement→Test→Result | Regulatory (FDA, etc.) |
| TestRail | Standalone test management | API, CI integration | Test case traceability | SOC 2, HIPAA compatible |
| ReqIF Editor | Requirements interchange format | File-based traceability | Spec→Design→Test | Automotive (ASIL) standard |

Current project: README.md serves as living requirements. For regulated environments, migrate to one of the tools above to create a formal traceability matrix.

DevOps & CI/CD Integration Points

| Pipeline Stage | Tool Category | Recommended Tool | What It Checks |
| --- | --- | --- | --- |
| Pre-commit | Linting + Security | Bandit, Pylint, pre-commit hooks | Fast rejection of obvious issues |
| Build | Static Analysis | Klocwork, SonarQube scanner | Deep security & quality analysis |
| Test | Execution + Coverage | pytest + pytest-cov | Functional correctness, coverage % |
| Mutation | Test Quality | mutmut | Are tests strong enough? |
| Dependency Scan | Supply Chain | Snyk + pip-audit | Known vulnerabilities in deps |
| Compliance | Reporting | SonarQube/Checkmarx dashboards | Meet quality gates, audit trail |

Top-to-Bottom Requirements Verification Example Flow

graph TD
    A[Requirements<br/>README.md] -->|Defined as test markers| B[Test Suite<br/>387 tests]
    B -->|Run on every commit| C[pytest<br/>Unit/Int/Correctness]
    C -->|Coverage tracked| D[pytest-cov<br/>Code coverage %]
    D -->|Trending| E[Codecov<br/>Historical view]
    C -->|Mutation test| F[mutmut<br/>Test quality]
    F -->|Validates| G{Tests strong<br/>enough?}
    G -->|Yes| H[SonarQube<br/>Quality gates]
    C -->|Security scan| I[Klocwork/Checkmarx<br/>Vulnerability detection]
    I -->|Verify| J[Zero high-risk<br/>findings]
    K[requirements.txt] -->|Supply chain scan| L[Snyk/pip-audit<br/>Dependency check]
    H -->|Release gate| M[Deploy<br/>with confidence]
    J -->|Security approval| M
    L -->|No vulns found| M
    style A fill:#e1f5ff
    style M fill:#c8e6c9

This flow ensures:

  1. Requirements are explicit (README)
  2. Tests verify requirements (pytest suite)
  3. Tests are strong (mutation testing)
  4. Code is secure (static analysis + SAST)
  5. Dependencies are safe (supply chain scanning)
  6. Quality gates passed (SonarQube)

(back to top)

Integration Implementation Patterns

1. Code Coverage with pytest-cov

Add coverage measurement to verify all guardrail code is exercised:

# Run tests with coverage
pytest --cov=guardrails --cov=core --cov-report=html --cov-report=term

# Verify minimum coverage threshold
pytest --cov=guardrails --cov-fail-under=90

In CI/CD (GitHub Actions example):

- name: Run tests with coverage
  run: pytest --cov=guardrails --cov=core --cov-report=xml

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v3
  with:
    files: ./coverage.xml

Why this matters: Guardrails (input_guard.py, output_guard.py) must have zero uncovered branches to ensure all security checks are tested.

2. Security Scanning with Bandit (Lightweight Pre-commit)

Add Python security scanning before commit:

# Install Bandit
pip install bandit

# Scan project
bandit -r guardrails/ core/ api/ tools/ -f json -o bandit-report.json

# Fail on medium+ severity
bandit -r . -ll  # -ll = medium level and above

Pre-commit hook (.pre-commit-config.yaml):

repos:
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.5
    hooks:
      - id: bandit
        args: ['-ll']  # Medium severity minimum
        exclude: tests/

Focus areas: Detect hardcoded secrets, SQL injection patterns, insecure random usage in guardrails and auth modules.

3. Dependency Vulnerability Scanning

Quick setup with pip-audit (Python-specific):

# Install pip-audit
pip install pip-audit

# Check dependencies
pip-audit --desc  # Show vulnerability descriptions

# In CI, fail the job on findings (pip-audit exits non-zero when it reports any known vulnerability)
pip-audit -r requirements.txt

GitHub Actions integration:

- name: Check dependencies for vulnerabilities
  run: pip-audit -r requirements.txt

Critical dependencies to monitor:

  • fastapi (API framework)
  • cryptography (JWT/RBAC)
  • javalang (Java parsing)
  • pydantic (data validation)

4. Static Code Quality with SonarQube (Optional, Enterprise)

For organizations with SonarQube instance:

# Install the SonarScanner CLI first (SonarSource distributes it as a standalone download and as a Docker image)

# Run analysis (requires sonar.projectKey, sonar.host.url, sonar.login)
sonar-scanner \
  -Dsonar.projectKey=java-to-python \
  -Dsonar.host.url=https://sonarqube.company.com \
  -Dsonar.login=$SONAR_TOKEN

Quality gate conditions:

  • Coverage > 80%
  • Duplicated lines < 5%
  • Code smells < 10
  • No critical issues

5. Mutation Testing with mutmut (Test Validation)

Verify that tests catch real bugs by mutating code:

# Install mutmut, a Python mutation testing tool (the flags below match the mutmut 2.x command line)
pip install mutmut

# Run mutation tests on guardrails
mutmut run --paths-to-mutate=guardrails

# Generate HTML report
mutmut html

Example: Test that input_guard.py rejection logic is properly tested:

mutmut run --paths-to-mutate=guardrails/input_guard.py \
  --tests-dir=tests/adversarial

Success criteria: > 80% mutation score (tests kill > 80% of mutants)

6. Compliance Reporting & Traceability (Regulated Environments)

For organizations requiring formal verification:

Current state (README-based):

README.md
├── Requirements section
├── Test suite breakdown
├── Unit/Integration/Correctness/Negative/Adversarial breakdown
└── Maps to test files

Migrate to (TestRail example):

  1. Create test plan in TestRail
  2. Link each test case to requirement ID
  3. Run tests via API
  4. Auto-generate compliance report

# Example: Link test to requirement
# TestRail API: Create test case run with requirement traceability
POST /api/v2/add_result_for_case/1/123
{
    "status_id": 1,  # passed
    "comment": "Verifies Req-002: Dependency ordering",
    "custom_requirement_id": "REQ-002"
}

7. LoadRunner Performance Integration

LoadRunner fits this project as the dedicated non-functional gate for the FastAPI endpoints:

| Endpoint | Suggested LoadRunner Transaction | Default SLA | Primary Assertion | Current Project Hook |
| --- | --- | --- | --- | --- |
| /api/v1/translate | translate | 250 ms | Median and p95 stay within SLA | Audit log writes loadrunner transaction summary |
| /api/v1/translate-project | translate_project | 500 ms | Multi-file requests stay below release threshold | Audit log writes per-request performance budget status |
| /api/v1/translate-requirements | translate_requirements | 250 ms | Requirements scaffolding stays responsive | Audit log writes Six Sigma-style CTQ metrics |

This repository now exposes LoadRunner-friendly transaction metadata in audit records:

{
  "action": "translate",
  "latency_ms": 83.2,
  "performance_budget_ms": 250,
  "performance_status": "within_control",
  "loadrunner": {
    "transaction": "translate",
    "response_time_ms": 83.2,
    "sla_ms": 250,
    "passed": true
  }
}

That makes it straightforward to compare internal audit data with external LoadRunner runs and to use the same transaction names in performance dashboards.
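
To make the record shape concrete, here is a minimal sketch of how such an entry could be assembled from a measured latency. The field names and default SLA values come from the JSON example and the table above; the function `build_audit_record`, the `DEFAULT_SLA_MS` dict, and the two-state status rule are assumptions for illustration (the real service also reports a "warning" state whose threshold is not documented here).

```python
from typing import Optional

# Default SLAs per transaction, taken from the endpoint table above.
DEFAULT_SLA_MS = {
    "translate": 250,
    "translate_project": 500,
    "translate_requirements": 250,
}


def build_audit_record(action: str, latency_ms: float,
                       budget_ms: Optional[float] = None) -> dict:
    """Build an audit entry carrying LoadRunner-style transaction metadata."""
    sla = DEFAULT_SLA_MS[action] if budget_ms is None else budget_ms
    passed = latency_ms <= sla
    return {
        "action": action,
        "latency_ms": latency_ms,
        "performance_budget_ms": sla,
        # Assumed rule: only pass/breach is modeled in this sketch.
        "performance_status": "within_control" if passed else "breach",
        "loadrunner": {
            "transaction": action,
            "response_time_ms": latency_ms,
            "sla_ms": sla,
            "passed": passed,
        },
    }


record = build_audit_record("translate", 83.2)
```

With these inputs, `record` matches the example record shown earlier in this section.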

7.1 Release Dashboard Endpoint

The service now includes a small read-only release dashboard endpoint at /api/v1/audit-report.

It aggregates the JSONL audit log into a single release-oriented summary:

| Dashboard Section | Aggregates | Why It Matters For Release Decisions |
| --- | --- | --- |
| summary | Total requests, ok requests, blocked requests, unique actions | Quick go/no-go snapshot |
| actions | Per-endpoint request count, average latency, p95 latency, LoadRunner pass rate | Shows which endpoint is drifting |
| performance | Global average latency, p95 latency, performance status counts | Highlights SLA breaches and warning trends |
| quality | CTQ pass rates, average DPMO, sigma-band counts, control-state counts | Converts raw audit events into process-quality signals |

Example usage:

curl -H "Authorization: Bearer <token>" http://localhost:8000/api/v1/audit-report

Example response shape:

{
  "summary": {
    "total_requests": 24,
    "ok_requests": 21,
    "blocked_requests": 3,
    "unique_actions": 3
  },
  "actions": {
    "translate": {
      "requests": 12,
      "avg_latency_ms": 85.4,
      "p95_latency_ms": 140.2,
      "loadrunner_pass_rate": 1.0
    }
  },
  "performance": {
    "avg_latency_ms": 91.7,
    "p95_latency_ms": 151.6,
    "performance_status_counts": {
      "within_control": 22,
      "warning": 1,
      "breach": 1
    },
    "loadrunner_pass_rate": 0.958
  },
  "quality": {
    "ctq_metrics": {
      "reliability": {
        "pass_count": 23,
        "total": 24,
        "pass_rate": 0.958
      }
    },
    "avg_dpmo": 13888.889,
    "sigma_band_counts": {
      "good": 20,
      "watch": 4
    },
    "control_state_counts": {
      "in_control": 21,
      "watch": 2,
      "out_of_control": 1
    }
  }
}
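
The aggregation behind the summary and actions sections can be sketched in a few lines. This is an illustrative approximation, not the endpoint's implementation: the `blocked` field on each audit line is a hypothetical name, the percentile uses the nearest-rank definition (one of several common choices), and the performance and quality sections are omitted.

```python
import json
import math
from collections import defaultdict
from typing import List


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def summarize(jsonl_text: str) -> dict:
    """Aggregate JSONL audit lines into summary and per-action latency stats."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    latencies = defaultdict(list)
    for rec in records:
        latencies[rec["action"]].append(rec["latency_ms"])
    # "blocked" is an assumed field name for requests rejected by guardrails.
    blocked = sum(1 for rec in records if rec.get("blocked", False))
    return {
        "summary": {
            "total_requests": len(records),
            "ok_requests": len(records) - blocked,
            "blocked_requests": blocked,
            "unique_actions": len(latencies),
        },
        "actions": {
            action: {
                "requests": len(values),
                "avg_latency_ms": round(sum(values) / len(values), 1),
                "p95_latency_ms": p95(values),
            }
            for action, values in latencies.items()
        },
    }


log = "\n".join([
    '{"action": "translate", "latency_ms": 80.0}',
    '{"action": "translate", "latency_ms": 100.0}',
    '{"action": "translate_project", "latency_ms": 300.0, "blocked": true}',
])
report = summarize(log)
```

For this three-line log, `report["summary"]` counts 3 total, 2 ok, and 1 blocked request across 2 unique actions.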

8. CI/CD Pipeline with All Tools (Complete Setup)

Recommended GitHub Actions workflow:

name: End-to-End Quality & Security

on: [push, pull_request]

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
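      - uses: actions/checkout@v3

      # Python toolchain and project dependencies (assumes a requirements.txt at the repository root)
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt pytest-cov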
      
      # Linting & style
      - name: Lint with Pylint
        run: |
          pip install pylint
          pylint guardrails/ core/ api/ tools/ --fail-under=9.0
      
      # Security scanning
      - name: Bandit security scan
        run: |
          pip install bandit
          bandit -r . -ll --exclude tests/
      
      # Dependency audit
      - name: Check dependencies
        run: |
          pip install pip-audit
          pip-audit -r requirements.txt
      
      # Test execution
      - name: Run tests
        run: |
          pip install -r requirements.txt pytest-cov
          pytest --cov=guardrails --cov=core --cov-report=xml

      # Performance regression gate
      - name: Run LoadRunner suite
        if: env.LOADRUNNER_SCENARIO_ID != ''
        run: |
          echo "Trigger LoadRunner scenario $LOADRUNNER_SCENARIO_ID against /api/v1 endpoints"
      
      # Coverage upload
      - name: Upload coverage
        uses: codecov/codecov-action@v3
      
      # Mutation testing (optional, slower)
      # The CLI flags below are from mutmut 2.x, so the version is pinned.
      - name: Mutation test guardrails
        run: |
          pip install "mutmut<3"
          mutmut run --paths-to-mutate=guardrails --tests-dir=tests
      
      # Quality gate (SonarQube)
      # sonar-scanner is a standalone CLI (not a pip package) and must already be on the runner PATH.
      - name: SonarQube analysis
        if: env.SONAR_HOST_URL != ''
        run: |
          sonar-scanner -Dsonar.host.url=${{ secrets.SONAR_HOST_URL }} \
                       -Dsonar.login=${{ secrets.SONAR_TOKEN }}

Testing Algorithm Matrix

Algorithm / Technique What It Does Where It Appears In This Project Why It Improves Confidence
Topological sorting (Kahn) Orders dependent nodes safely tools/project_translator.py, tests/unit/test_topological_sort.py Prevents subclass-before-base translation defects
Boundary value analysis Hits min/max and edge inputs tests/adversarial/test_boundary_conditions.py Finds off-by-one and empty-input failures quickly
Equivalence partitioning Tests one representative per input class Guardrail and malformed-input tests Keeps coverage broad without exploding test count
Decision-table testing Covers combinations of conditions and outcomes RBAC and forbidden-pattern tests Ensures policy combinations do not create gaps
State-transition testing Verifies behavior across state changes Audit trail blocked/allowed request scenarios Confirms system reacts correctly as request status changes
Cycle detection Detects unsortable dependency graphs tests/adversarial/test_circular_dependencies.py Verifies graceful degradation on invalid project graphs
Mutation testing Injects fake bugs to measure test strength Documented via mutmut / Stryker integration path Confirms tests fail when logic is wrong
Load testing Measures latency and throughput under concurrency LoadRunner integration and audit metrics Protects release readiness under realistic traffic
Risk-based prioritization Focuses effort on highest-risk paths Negative, adversarial, and auth tests Keeps security-critical paths heavily defended
Pairwise / combinatorial sampling Reduces huge input combinations to meaningful pairs Recommended next step for API option matrices Expands coverage efficiently for future input flags

Six Sigma and Process Quality Matrix

Six Sigma Idea Meaning In Plain Terms Project Implementation Evidence / Metric
CTQ (Critical to Quality) The small set of outcomes that must go right Audit records now track latency, reliability, safety, traceability ctq_metrics in audit log
DMAIC Define, Measure, Analyze, Improve, Control loop README traceability + tests + audit metrics + quality gates Requirements tables, tests, and audit trail
DPMO Defects per million opportunities Quality snapshot computes DPMO per request six_sigma.dpmo in audit log
Control state Is the process stable or drifting? Requests classified as in_control, watch, or out_of_control six_sigma.control_state
Performance control limits Expected latency window before escalation Per-endpoint SLA budgets in env and audit metrics performance_budget_ms, performance_status
FMEA mindset Rank likely failures before release Negative/adversarial suites focus on auth, injection, model lock, egress Security-focused test groups
Voice of customer / CTQ translation Convert user needs into measurable gates README requirement tables map behavior to tests and tooling Traceability matrices
Continuous improvement Use data from each run to tighten the process Audit + coverage + static analysis + performance gates CI pipeline and audit summaries
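The DPMO row can be made concrete with a short calculation. This is a sketch of the standard formula, not the project's actual implementation. The inputs (1 defect, 24 requests, 3 opportunities per request) are assumptions chosen because they happen to reproduce the `avg_dpmo` value in the example audit response. The sigma conversion uses the conventional 1.5-sigma shift.

```python
from statistics import NormalDist


def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    total = units * opportunities_per_unit
    if total <= 0:
        raise ValueError("units and opportunities_per_unit must be positive")
    return defects / total * 1_000_000


def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Short-term sigma level using the conventional 1.5-sigma shift."""
    if not 0 < dpmo_value < 1_000_000:
        raise ValueError("dpmo_value must be strictly between 0 and 1,000,000")
    yield_fraction = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(yield_fraction) + shift


# Assumed inputs: 1 defect across 24 requests with 3 opportunities each.
example = dpmo(defects=1, units=24, opportunities_per_unit=3)
print(round(example, 3), round(sigma_level(example), 2))
```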

(back to top)

Scientific and Computer Science Algorithm Catalog

This section catalogs established computer science and mathematical algorithms that apply directly to the Java-to-Python translation pipeline, audit trail, guardrails, and quality metrics implemented in this project. Each algorithm is linked to the project area it improves.

Graph Theory and Dependency Analysis Algorithms

Algorithm What It Is Most Common Use Why It Should Be Used How It Helps This Project When Not To Use
Kahn's (implemented) In-degree based topological sort for DAGs Build order resolution, dependency scheduling Deterministic ordering and clear cycle detection when no zero in-degree node remains Already used to order Java classes before translation so base classes are processed before dependents Not for weighted path problems or graphs that are not DAG-like
Tarjan's SCC One-pass DFS algorithm that finds all strongly connected components Cycle grouping in directed graphs, compilers, package analyzers Linear-time cycle group discovery and reverse-topological SCC output Can report all dependency cycles at once with grouped diagnostics for project translation failures Not needed for tiny graphs where simple cycle-exists checks are enough
Kosaraju's SCC Two-pass DFS SCC algorithm over graph and reversed graph SCC extraction when implementation simplicity is preferred Easy to reason about and verify for correctness Alternate SCC implementation for cross-validating cycle group results from Tarjan Less ideal when memory access to reverse graph is costly or graph is streaming
DFS/BFS Fundamental graph traversals for depth or level exploration Reachability, component discovery, shortest unweighted paths (BFS) Foundational and fast, useful in almost every graph pipeline DFS supports dependency walk and cycle heuristics; BFS can identify translation batches by level Not enough alone when you need weighted optimization, SCC grouping, or formal ordering guarantees
Dijkstra Shortest-path algorithm for non-negative weighted graphs Routing, minimum cost path, critical path scoring Finds best path under weighted constraints efficiently Can prioritize translation sequence by cost/risk weights (complexity, blast radius, module criticality) Not for negative edge weights, where Bellman-Ford style methods are required
Floyd-Warshall Dynamic programming for all-pairs shortest paths Dense graph all-pairs analysis, transitive reachability Gives full matrix visibility into every pair relationship Useful for full dependency impact maps and change blast-radius analysis Avoid on large sparse graphs due to cubic cost
Union-Find Disjoint-set structure with union/find operations Connectivity checks, incremental grouping, Kruskal-like workflows Very fast near constant-time merges and membership checks Can speed incremental dependency ingestion and fast connectivity sanity checks before deeper analysis Not suitable for directed SCC semantics or ordered traversal outputs
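For readers unfamiliar with the Kahn's row above, here is a minimal sketch of in-degree based ordering with cycle reporting. It is an independent illustration, not the code in tools/project_translator.py. The function name `kahn_order` and the sample class names are assumptions.

```python
from collections import deque


def kahn_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order nodes so every node appears after the nodes it depends on.

    `dependencies` maps a class name to the classes it depends on.
    Raises ValueError when a cycle prevents a complete ordering.
    """
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes |= deps

    in_degree = {node: 0 for node in nodes}
    dependents: dict[str, list[str]] = {node: [] for node in nodes}
    for node, deps in dependencies.items():
        for dep in deps:
            in_degree[node] += 1
            dependents[dep].append(node)

    # Sorted seeding keeps the output deterministic across runs.
    ready = deque(sorted(n for n, d in in_degree.items() if d == 0))
    order: list[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in sorted(dependents[current]):
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)

    # Nodes that never reached in-degree zero are part of (or behind) a cycle.
    if len(order) != len(nodes):
        stuck = sorted(nodes - set(order))
        raise ValueError(f"dependency cycle involving: {stuck}")
    return order


# Hypothetical class hierarchy: base classes must come first.
classes = {"Dog": {"Animal"}, "Puppy": {"Dog"}, "Animal": set()}
print(kahn_order(classes))
```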

Code Analysis and Transformation

Algorithm What It Is Most Common Use Why It Should Be Used How It Helps This Project When Not To Use
AST Traversal (implemented) Tree walk over parsed syntax nodes Compilers, linters, refactoring, static analyzers Preserves structural meaning better than regex parsing Already powers Java structure extraction for classes/imports/method signatures Not for runtime behavior reasoning without control/data flow context
Tree Edit Distance (Zhang-Shasha) Minimum edit cost between two trees AST diffing, clone analysis, migration similarity checks Captures structural differences not visible in plain text diff Can score Java vs translated Python AST fidelity for stronger parity evidence Avoid for very large trees in hot paths due to higher compute cost
CFG Graph model of possible execution paths in a function/method Dead code detection, path analysis, coverage planning Exposes branch structure and reachability explicitly Can verify translated Python keeps equivalent branch reachability vs Java Not needed for simple straight-line code with no branching
Data-Flow Analysis Tracks definitions, uses, and propagation of values/types Compiler optimization, bug finding, security checks Detects misuse and propagation mistakes early Can validate Java type/variable semantics survive mapping into Python Avoid when analysis precision cost exceeds value for trivial modules
Program Slicing Extracts statements relevant to a variable/output criterion Debugging, comprehension, targeted verification Reduces analysis scope and noise Isolates only code affecting a translated output to speed parity root-cause analysis Not ideal when holistic system interactions are the real issue
Taint Analysis (implemented conceptually) Marks untrusted input and tracks flow to sensitive sinks Security validation, injection prevention Directly maps to security risk pathways Supports guardrail hardening by tracing untrusted request data through translation pipeline Not useful when all inputs are already trusted and isolated
Hindley-Milner Type Inference Unification-based static type inference Functional languages, inferred typing systems Improves correctness with less manual annotation Could auto-suggest Python type hints from Java source semantics Not a fit where dynamic/runtime types dominate behavior
Abstract Interpretation Sound approximation of program states over abstract domains Static verification and bug class elimination Can prove classes of errors without executing code Can add formal assurance on translated output safety properties Avoid where exact concrete behavior is mandatory and approximation is too coarse

Pattern Matching and Security

Algorithm What It Is Most Common Use Why It Should Be Used How It Helps This Project When Not To Use
Aho-Corasick Trie + failure-link automaton for multi-pattern search IDS signatures, malware scanning, keyword dictionaries Finds all patterns in one pass efficiently Can replace sequential guardrail regex checks with one multi-pattern scanner for injection/secrets Not ideal for complex contextual patterns better handled by full parsers or regex engines
Rabin-Karp Rolling-hash string matching approach Plagiarism/clone detection, multiple substring checks Fast average matching and convenient window hashing Can detect repeated risky snippets or clone patterns across translated outputs Avoid when hash collision handling overhead or exact single-pattern speed is critical
Boyer-Moore Heuristic skip-based exact pattern matcher Fast exact search in large text Often sublinear average performance for single pattern Useful for fast scanning of one high-priority forbidden token/signature Not for many patterns at once; Aho-Corasick is better there
Bloom Filter Probabilistic membership structure with false positives only Caching, prefiltering, dedupe prechecks Very memory-efficient and fast precheck stage Can fast-reject obviously safe payloads before expensive deep scans Not for workflows requiring zero false positives and exact membership
Levenshtein Distance Edit-distance metric between strings Fuzzy matching, near-duplicate detection, typo tolerance Quantifies similarity robustly Can score translation drift and flag suspiciously divergent output from expected behavior/text Avoid for strict semantic equivalence judgments without structural context
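To show what a single-pass multi-pattern scan looks like, here is a compact Aho-Corasick sketch. The class name, method names, and sample patterns are illustrative assumptions. The project's guardrails may use a different mechanism.

```python
from collections import deque


class AhoCorasick:
    """Multi-pattern matcher: trie plus failure links, one pass over the text."""

    def __init__(self, patterns: list[str]) -> None:
        self.goto: list[dict[str, int]] = [{}]
        self.fail: list[int] = [0]
        self.out: list[list[str]] = [[]]
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern: str) -> None:
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self) -> None:
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                fallback = self.fail[state]
                while fallback and ch not in self.goto[fallback]:
                    fallback = self.fail[fallback]
                self.fail[nxt] = self.goto[fallback].get(ch, 0)
                # Inherit matches that end at the failure target.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find_all(self, text: str) -> list[tuple[int, str]]:
        """Return (start_index, pattern) for every occurrence in `text`."""
        hits: list[tuple[int, str]] = []
        state = 0
        for index, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                hits.append((index - len(pattern) + 1, pattern))
        return hits


# Hypothetical forbidden tokens scanned in one pass.
scanner = AhoCorasick(["DROP TABLE", "eval(", "TABLE"])
print(scanner.find_all("x; DROP TABLE users; eval(payload)"))
```

The automaton is built once and reused for every request, which is where the benefit over running each pattern separately comes from.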

Formal Verification and Correctness

Algorithm What It Is Most Common Use Why It Should Be Used How It Helps This Project When Not To Use
Model Checking Exhaustive state-space verification against temporal properties Protocol verification, safety-critical policy checks Finds counterexamples rigorously Can prove RBAC and policy-lock invariants over request state transitions Avoid for very large unconstrained state spaces without abstraction
Symbolic Execution Executes paths with symbolic values and constraints Path discovery, bug finding, test generation Reaches edge paths hard to hit with manual tests Can generate adversarial API vectors to stress translation and guardrails Not ideal when path explosion makes runtime impractical
Concolic Testing Concrete execution guided by symbolic constraints Automated test input generation Practical compromise between full symbolic and random testing Can expand coverage for translation endpoints with targeted boundary/path inputs Avoid when harness constraints are too expensive to maintain
Hoare Logic Pre/postcondition proof framework for program correctness Formal specs and proof-oriented correctness Sharp contractual reasoning around invariants Can specify and verify required behavior for dependency ordering and policy checks Not needed where lightweight testing already provides enough assurance
Property-Based Testing Randomized input generation checked against invariants Invariant testing and edge-case exploration Finds surprising cases that example-based tests miss Can stress graph ordering and parity invariants over large random input spaces Avoid when properties are weakly defined or nondeterministic outputs are expected
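The property-based row can be illustrated without extra dependencies. The sketch below uses the standard library's `random` module rather than a dedicated library such as Hypothesis, and `graphlib.TopologicalSorter` stands in for the project's own sorter. Both substitutions are assumptions made to keep the example self-contained.

```python
import random
from graphlib import TopologicalSorter


def random_dag(rng: random.Random, size: int, edge_prob: float) -> dict[int, set[int]]:
    """Build a random DAG as {node: set of prerequisite nodes}.

    Prerequisites are always lower-numbered than the node that needs them,
    which guarantees the graph has no cycles.
    """
    return {
        node: {dep for dep in range(node) if rng.random() < edge_prob}
        for node in range(size)
    }


def respects_dependencies(order: list[int], graph: dict[int, set[int]]) -> bool:
    """Invariant: every prerequisite appears before the node that needs it."""
    position = {node: index for index, node in enumerate(order)}
    return all(
        position[dep] < position[node]
        for node, deps in graph.items()
        for dep in deps
    )


def check_ordering_property(trials: int = 200, seed: int = 0) -> int:
    """Run the invariant over many random graphs; return the number checked."""
    rng = random.Random(seed)
    for _ in range(trials):
        graph = random_dag(rng, size=rng.randint(1, 12), edge_prob=0.3)
        order = list(TopologicalSorter(graph).static_order())
        assert sorted(order) == sorted(graph), "ordering must contain every node once"
        assert respects_dependencies(order, graph), f"violated for {graph}"
    return trials


print(check_ordering_property())
```

A fixed seed keeps failures reproducible, which matters when a counterexample has to be turned into a regression test.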

Software Metrics

Algorithm What It Is Most Common Use Why It Should Be Used How It Helps This Project When Not To Use
McCabe Cyclomatic Complexity Branch/path complexity metric from control flow Test planning and maintainability risk scoring Correlates complexity with defect and testing effort Can drive risk-based test intensity on translated functions/classes Not as a sole quality signal without context
Halstead Metrics Operator/operand based software volume and effort metrics Productivity and maintainability analysis Gives a language-agnostic complexity lens Can compare source vs translated code inflation and detect complexity bloat Avoid as hard pass/fail gates in isolation
Maintainability Index Composite maintainability score from complexity/volume/LOC Portfolio-level code health tracking Easy high-level signal for triage Can prioritize translated files for manual review when score degrades Not reliable for very small files or generated code alone
Fan-In/Fan-Out Counts inbound and outbound dependency edges Architecture coupling analysis Highlights hotspots and blast-radius risk Can prioritize high fan-in classes for stricter parity and regression checks Not needed for tiny low-coupling modules
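As one way to apply the McCabe row to translated Python output, the sketch below approximates cyclomatic complexity with the standard `ast` module. It counts decision points over a whole source string and is a simplification, not a replacement for a dedicated metrics tool. The sample function is hypothetical.

```python
import ast

# Node types that each add one decision point.
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two short-circuit branches.
            complexity += len(node.values) - 1
    return complexity


# Hypothetical translated function used only as input text.
sample = '''
def price(base, multiplier, premium):
    if base < 0 or multiplier < 0:
        raise ValueError("negative input")
    total = base * multiplier
    if premium:
        total = total * 3 // 2
    return total
'''
print(cyclomatic_complexity(sample))
```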

Audit and Statistical Process Control

Algorithm What It Is Most Common Use Why It Should Be Used How It Helps This Project When Not To Use
Shewhart Control Charts (implemented baseline) Control limits over time-series process metrics Manufacturing and ops stability monitoring Fast detection of obvious out-of-control behavior Already aligns to audit control-state tracking for latency/quality drift Less sensitive to small gradual drifts
CUSUM Cumulative drift detector versus target mean Early shift detection in process monitoring Detects subtle persistent changes earlier than Shewhart Can alert on slow latency degradation before SLA breach Not for highly non-stationary streams without segmentation
EWMA Exponentially weighted moving average trend estimator Smoothed monitoring and anomaly trend tracking Balances noise reduction with responsiveness Can provide cleaner quality/latency trendlines in audit dashboards Avoid if abrupt shifts are the only concern and lag is unacceptable
Z-Score Anomaly Detection Standard deviation based outlier scoring Basic anomaly and quality outlier flags Simple, interpretable, low implementation cost Can flag suspicious request records for investigation in near real-time Not for heavy-tailed or non-Gaussian distributions without robust variants
Isolation Forest Tree-ensemble unsupervised anomaly detector Fraud, operations anomalies, multivariate outliers Captures nonlinear multivariate anomalies well Can detect odd combinations of role, latency, block-rate, and payload characteristics Avoid for tiny datasets where model instability is high
Bayesian Inference Posterior probability updating with evidence Risk forecasting, decision support under uncertainty Integrates prior knowledge and new evidence rigorously Can estimate release risk from test outcomes plus historical defects Not needed when deterministic thresholds are sufficient
Fisher's Exact Test Exact significance test for contingency tables Small sample proportion comparisons Reliable p-values for low-count events Can test whether blocked-request spikes are statistically significant Avoid for large-sample cases where simpler approximations are fine
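The CUSUM row describes catching slow drift that no single sample would flag. A minimal one-sided sketch follows. The target, slack, and threshold values and the latency series are made-up illustration data, not project settings.

```python
def cusum_alarms(
    samples: list[float], target: float, slack: float, threshold: float
) -> list[int]:
    """One-sided upper CUSUM: indexes where cumulative upward drift exceeds `threshold`.

    `slack` is the per-sample deviation tolerated before drift accumulates.
    The statistic resets to zero after each alarm.
    """
    alarms: list[int] = []
    cumulative = 0.0
    for index, value in enumerate(samples):
        cumulative = max(0.0, cumulative + (value - target - slack))
        if cumulative > threshold:
            alarms.append(index)
            cumulative = 0.0
    return alarms


# Latency creeps upward without any single dramatic spike.
latencies_ms = [85, 88, 84, 90, 96, 99, 101, 104, 98, 103]
print(cusum_alarms(latencies_ms, target=86.0, slack=4.0, threshold=30.0))
```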

Test Coverage and Combinatorial

Algorithm What It Is Most Common Use Why It Should Be Used How It Helps This Project When Not To Use
IPOG Covering-array generator for t-way combinations Combinatorial API/config test design Large coverage gains with far fewer cases than full Cartesian products Can systematically cover role x endpoint x payload combinations with manageable test counts Not necessary for very small parameter spaces
MC/DC Coverage Criterion requiring each condition independently affect outcome Safety-critical software verification Strong decision-logic assurance with efficient test sets Can harden guardrail and RBAC condition logic validation Avoid as universal requirement for low-risk modules due to overhead
Coverage-Guided Fuzzing Mutation fuzzing guided by code coverage feedback Security hardening and crash discovery Efficiently discovers deep parser/validation edge cases Can stress translation endpoints with malformed/adversarial Java inputs Not ideal where deterministic reproducibility and strict runtime budgets dominate
N-version/Differential Testing Compare outputs across independent implementations Compiler/runtime verification and migration confidence Great at finding semantic mismatches Can compare legacy Java oracle against translated Python outputs continuously Not useful if all compared implementations share same defect source
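To make the combinatorial idea tangible, here is a greedy all-pairs selector. Note that this is a simple greedy heuristic over the full Cartesian product, not IPOG itself, so it only suits small parameter spaces. The role, endpoint, and payload values are hypothetical.

```python
from itertools import combinations, product


def pairwise_cases(parameters: dict[str, list[str]]) -> list[dict[str, str]]:
    """Greedily pick full combinations until every value pair is covered."""
    names = list(parameters)
    candidates = [dict(zip(names, values)) for values in product(*parameters.values())]

    def pairs_of(case: dict[str, str]) -> set:
        # Every (parameter, value) pairing that this single case exercises.
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    uncovered = set().union(*(pairs_of(case) for case in candidates))
    chosen: list[dict[str, str]] = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda case: len(pairs_of(case) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


# Hypothetical option matrix: 3 x 2 x 3 = 18 full combinations.
api_options = {
    "role": ["admin", "developer", "viewer"],
    "endpoint": ["translate", "audit-report"],
    "payload": ["valid", "malformed", "injection"],
}
cases = pairwise_cases(api_options)
print(len(cases), "cases cover every pair")
```

The returned dictionaries can be fed directly to a parametrized test as keyword arguments.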

(back to top)

Legacy Java-to-Python Function Parity (Proof Tests)

The suite now includes proof-style parity tests that run the same function behavior in both legacy Java and translated Python and assert identical outputs for shared input vectors.

What Is A Vector, Vectoring, And A Vector Runner?

In this repository, a vector means one structured test case: input values plus the expected output.

Example vector concept:

  • Input: base=5, multiplier=10, premium=true
  • Expected output: 75

That single row is one vector. A vector file is a list of many such rows (normal, edge, and negative scenarios).

Vectoring is the testing approach where both runtimes (legacy Java and translated Python) are driven from that same shared vector dataset instead of hardcoded test values in multiple places.

Why vectoring is useful:

  • Single source of truth for migration parity expectations
  • Less duplicated test data across languages
  • Easier reviews and audits of behavioral requirements
  • Faster updates when business rules change

Vector Runner in this project:

  • LegacyCalculatorVectorRunner.java reads the shared JSON vectors
  • Executes the legacy Java function for each vector
  • Emits per-case output (id, actual, expected) for parity checks

This is how we prove output equivalence:

  1. Define vectors in shared JSON/CSV fixture files
  2. Run legacy Java against those vectors
  3. Run translated Python against those same vectors
  4. Assert Java output equals Python output for each vector id

This pattern gives an explicit migration proof: same inputs, same outputs, across runtimes.
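The Python side of that flow can be sketched as follows. The vector fields (id, input, expected) follow the description above. The inline JSON, the `calculate` function, and its 1.5x premium rule are hypothetical stand-ins chosen to match the example vector, not the contents of the real fixture files.

```python
import json
from typing import Callable

# Hypothetical vector file content; the real fixture may differ.
VECTORS_JSON = """
[
  {"id": "premium-basic", "input": {"base": 5, "multiplier": 10, "premium": true}, "expected": 75},
  {"id": "standard-basic", "input": {"base": 5, "multiplier": 10, "premium": false}, "expected": 50},
  {"id": "zero-base", "input": {"base": 0, "multiplier": 10, "premium": true}, "expected": 0}
]
"""


def calculate(base: int, multiplier: int, premium: bool) -> int:
    """Hypothetical stand-in for the translated Python function."""
    total = base * multiplier
    return total * 3 // 2 if premium else total


def run_vectors(vectors: list[dict], func: Callable[..., int]) -> dict[str, int]:
    """Execute `func` once per vector and key the results by vector id."""
    return {vector["id"]: func(**vector["input"]) for vector in vectors}


def find_mismatches(vectors: list[dict], actual: dict[str, int]) -> list[str]:
    """Return ids whose actual output differs from the expected output."""
    return [v["id"] for v in vectors if actual[v["id"]] != v["expected"]]


vectors = json.loads(VECTORS_JSON)
python_results = run_vectors(vectors, calculate)
print(find_mismatches(vectors, python_results))
```

Results captured from the Java runner can be loaded into the same id-keyed dictionary shape, so the cross-language check reduces to comparing two dictionaries.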

Proof Test What It Verifies Location
Java fixture expected-value test Legacy Java behavior is stable and explicit tests/correctness/test_legacy_java_python_equivalence.py
Python fixture expected-value test Translated Python behavior matches intended outputs tests/correctness/test_legacy_java_python_equivalence.py
Cross-language equivalence test Java output == Python output for the same inputs tests/correctness/test_legacy_java_python_equivalence.py

Fixture sources:

  • fixtures/java/simple/LegacyCalculator.java
  • fixtures/java/simple/LegacyCalculatorVectorRunner.java
  • fixtures/expected_python/legacy_calculator.py
  • fixtures/vectors/legacy_calculator_vectors.json
  • fixtures/vectors/legacy_calculator_vectors.csv

Shared Vector Baseline (Single Source Of Truth)

Asset Runtime Consumer Purpose Status
legacy_calculator_vectors.json Python parity tests + Java vector runner Canonical vector source (id, input, expected) Implemented
legacy_calculator_vectors.csv Optional import/export interoperability Spreadsheet-friendly mirror for manual review Implemented
LegacyCalculatorVectorRunner.java Java runtime Reads shared JSON vectors and evaluates legacy function Implemented
test_legacy_java_python_equivalence.py pytest Parameterized cross-runtime parity assertions Implemented

Tools That Already Support This Testing Pattern

Tool How It Helps With Java-to-Python Parity Typical Use
pytest parameterized tests Reuse the same vectors for both runtimes Core parity assertions (implemented)
JUnit 5 parameterized tests Capture legacy Java oracle outputs Legacy baseline generation (recommended next)
ApprovalTests Golden-master snapshot comparisons Regression lock for legacy outputs (recommended next)
JSON/CSV test vectors Runtime-agnostic shared inputs/outputs Single source of truth for parity data (implemented)
Testcontainers Reproducible Java runtime execution Stable local runtime parity in isolated containers (recommended next)

Practical recommendation: keep a shared vector file and run both Java and Python against it, treating Java output as the initial oracle during migration.

Zero-Trust Solutions Matrix

Zero-Trust Control What It Means Project Implementation Evidence
Verify identity on every request No implicit trust by network location JWT verification + RBAC dependency checks in API routes tests/negative/test_rbac_enforcement.py
Explicit policy decision per request Each request must be allow/deny evaluated Input guardrails, model lock, egress policy lock, blocked audit path tests/negative/test_model_blocking.py, tests/negative/test_egress_blocking.py, tests/adversarial/test_prompt_injection.py
Least privilege access Users only get required capabilities Role-permission mapping with permission-scoped endpoints core/auth.py, tests/negative/test_rbac_enforcement.py
Continuous verification Runtime signals prove controls remain active Audit report includes zero-trust rates, quality attestations, deny rate /api/v1/audit-report zero-trust section
Assume breach + contain blast radius Treat unsafe inputs as hostile by default Block injection/secret payloads and sanitize audit records guardrails/input_guard.py, guardrails/output_guard.py, tests/integration/test_audit_trail.py

The release dashboard now includes a dedicated zero_trust section with:

  • posture
  • identity_verification_rate
  • policy_decision_rate
  • continuous_verification_rate
  • policy_deny_rate

This makes zero-trust status measurable release-over-release instead of purely descriptive.

Requirements-to-Implementation Mapping

Requirement (README) Test (pytest) Security Check Quality Gate Coverage
"Guarantee base-before-subclass order" test_topological_sort.py (16 tests) Klocwork scan SonarQube: no high issues 95%+ on project_translator.py
"Detect circular dependencies" test_circular_dependencies.py (4 tests) Bandit: no unsafe loops No tech debt on Kahn logic 100% cycle path
"Block injection patterns" test_prompt_injection.py (5 tests) Klocwork CWE-89, CWE-95 SonarQube security hotspots 100% on input_guard patterns
"Redact secrets from output" test_forbidden_patterns.py (4 tests) Bandit hardcoding check No credential leak in logs 100% on output_guard.redact()
"Enforce RBAC via JWT" test_rbac_enforcement.py (4 tests) Checkmarx token validation Crypto best practices 100% on auth.py verify_token
"Policy lock for models/egress" test_model_blocking.py (3 tests) Klocwork: whitelist bypass No bypass paths 100% on provider_lock.py

This table is the top-to-bottom traceability matrix: each requirement has a test, security validation, and quality gate.

(back to top)

pie title Test File Distribution by Marker Group
  "unit" : 5
  "integration" : 4
  "correctness" : 4
  "negative" : 4
  "adversarial" : 4
Marker Group Purpose Key Benefit
unit Algorithmic correctness for parsing/graph/order Fast feedback on core logic
integration API request/response and contract validation Catches wiring and schema regressions
correctness Python output structure and signature quality Protects translation fidelity
negative Policy and access-control enforcement Prevents unsafe execution paths
adversarial Injection and malformed input hardening Reduces attack-surface risk

(back to top)

Tool Compliance for Top Secret SCI/SCIF Regulated Environments

Classification Criteria:

  • 🟢 APPROVED: Tool is explicitly approved for classified/SCI work, has required security certifications (ISO 27001, FedRAMP, etc.), commonly used in defense/government sectors, or is open-source with minimal attack surface.
  • 🟡 CONDITIONAL: Tool can be used with specific restrictions (on-prem deployment only, special licensing, restricted data flow, etc.).
  • 🔴 NOT APPROVED: Tool lacks required certifications, uses unapproved cloud storage, transmits classified data externally, or has known security concerns for SCI environments.

Caution

All tools flagged as NOT APPROVED or CONDITIONAL must be reviewed by your security/compliance officer before use. Do not deploy tools flagged as NOT APPROVED in SCI/SCIF environments. CONDITIONAL tools require explicit variance/waiver documentation.

Core Runtime & Test Framework Dependencies

Tool Version Purpose SCI/SCIF Status Restrictions/Notes
Python 3.11+ Runtime interpreter 🟢 APPROVED Open-source, widely used in government. Requires system-level deployment controls.
pytest 8.0+ Test framework 🟢 APPROVED Open-source, MIT license. Standard in Python security testing. No external data transmission.
pytest-asyncio 0.23+ Async test support 🟢 APPROVED Open-source, Apache 2.0 license. Minimal attack surface.
httpx 0.27+ HTTP client for API testing 🟢 APPROVED Open-source, BSD license. Used for in-process API testing only (no external calls).
FastAPI 0.136+ Web framework 🟡 CONDITIONAL Open-source, MIT license. Requires hardened deployment configuration for SCI. Ensure all dependencies are audited. On-prem deployment only.
cryptography 42.0+ Cryptographic library 🟢 APPROVED Open-source, dual Apache/BSD license. NIST-standard algorithms. Actively maintained.
PyJWT 2.8+ JWT signing/verification 🟢 APPROVED Open-source, MIT license. Minimal, focused functionality.
Pydantic 2.9+ Data validation 🟢 APPROVED Open-source, MIT license. No external validation calls. Widely adopted in security projects.
javalang 0.13+ Java parser 🟢 APPROVED Open-source, MIT license. Local parsing only, no network access.

Static Analysis & Security Scanning Tools (SAST/SCA)

Tool Purpose SCI/SCIF Status Restrictions/Notes Recommended?
Klocwork (Perforce) SAST - vulnerabilities, code quality 🟢 APPROVED Enterprise tool explicitly used by aerospace/defense. ISO 27001 certified. TÜV SÜD certified. Commercial license required. ✅ YES - Preferred for classified environments
SonarQube Code quality & maintainability 🟡 CONDITIONAL On-prem deployment: APPROVED. Cloud (SonarCloud): NOT APPROVED. Requires air-gapped or internal-only instance. ⚠️ On-prem only
Checkmarx (SAST) Enterprise vulnerability scanning 🟢 APPROVED Explicitly targets government/defense. Supports on-prem. Commercial license required. ✅ YES - Enterprise-grade SAST
Coverity (Synopsys) Deep static analysis 🟢 APPROVED Defense/aerospace standard tool. Commercial license required. Supports on-prem deployment. ✅ YES - Advanced static analysis
Bandit Python-specific security scanning 🟢 APPROVED Open-source, Apache 2.0 license. Lightweight, local execution only. ✅ YES - Lightweight pre-commit check
Pylint Python linting & style 🟢 APPROVED Open-source, GPL license. No external calls. Standard in Python ecosystem. ✅ YES - Pre-commit linting
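pip-audit Python dependency vulnerability scanning 🟢 APPROVED Open-source, Apache 2.0 license. Runs locally but queries a vulnerability service (PyPI by default) for advisory data, so air-gapped use needs an internal mirror or a pre-approved data path. ✅ YES - Lightweight dependency audit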
OWASP Dependency-Check Dependency vulnerability scanner 🟢 APPROVED Open-source, Apache 2.0 license. Can run air-gapped with offline DB. ✅ YES - Comprehensive SCA
Black Duck (Synopsys) License & composition analysis 🟡 CONDITIONAL Commercial tool with on-prem option. Requires licensing agreement for classified use. ⚠️ On-prem with variance
Snyk Dependency scanning SaaS 🔴 NOT APPROVED Cloud-based SaaS. Data transmission to external service prohibited for SCI. Unapproved for classified use. ❌ NO - Do not use

Testing & Performance Measurement Tools

Tool Purpose SCI/SCIF Status Restrictions/Notes Recommended?
pytest-cov Code coverage measurement 🟢 APPROVED Open-source, MIT license. Local execution only. Generates coverage reports. ✅ YES - Essential for V&V
Codecov Coverage tracking SaaS 🔴 NOT APPROVED Cloud-based service. Transmits coverage data to external servers. Not approved for SCI environments. ❌ NO - Do not use
Datadog APM & monitoring SaaS 🔴 NOT APPROVED Cloud SaaS. Continuous data transmission to external servers. Classified data cannot be sent to Datadog. ❌ NO - Do not use
LoadRunner (Micro Focus/OpenText) Performance & load testing 🟡 CONDITIONAL On-prem/self-hosted: APPROVED with proper security hardening. Cloud version: NOT APPROVED. Commercial license required. ⚠️ On-prem only
Stryker Mutation testing (JavaScript/TypeScript, C#, Scala) 🟢 APPROVED Open-source, Apache 2.0 license. Runs locally, no external calls. Does not target Python or Java; use mutmut for Python and PIT for Java. ✅ YES - Test quality verification
PIT Mutation testing for Java bytecode 🟢 APPROVED Open-source, Apache 2.0 license. Local execution only. ✅ YES - For Java parity testing

Requirements Traceability & Test Management Tools

Tool Purpose SCI/SCIF Status Restrictions/Notes Recommended?
TestRail Test management & traceability 🟡 CONDITIONAL Self-hosted/on-prem: APPROVED with proper security controls. Cloud version: NOT APPROVED. Proprietary, commercial license. ⚠️ On-prem with security review
Jira Xray Test management within Jira 🟡 CONDITIONAL On-prem Jira: APPROVED. Cloud Jira: NOT APPROVED for SCI data. Proprietary plugin, commercial license. ⚠️ On-prem only
Azure DevOps Test Plans Requirements & test traceability 🟡 CONDITIONAL On-prem: APPROVED (requires Azure DevOps Server). Cloud (Azure DevOps Services, dev.azure.com): NOT APPROVED for SCI. ⚠️ On-prem only
ReqIF Editor Requirements interchange format 🟢 APPROVED Open-source, EPL license. Local file-based tool, no external connections. ✅ YES - For requirements management

DevOps & Continuous Integration (CI/CD)

Tool Purpose SCI/SCIF Status Restrictions/Notes Recommended?
GitHub Actions Cloud-hosted CI/CD 🔴 NOT APPROVED Cloud-hosted service. Builds and artifacts transmitted to GitHub servers. Not approved for SCI code/data. ❌ NO - Use on-prem CI/CD
Jenkins On-prem CI/CD automation 🟢 APPROVED Open-source, MIT license. Can be air-gapped or on-prem only. Widely used in government. ✅ YES - Preferred CI/CD for SCI
GitLab CI (Cloud) Cloud-hosted CI/CD 🔴 NOT APPROVED Cloud-hosted. Not approved for SCI code transmission. ❌ NO - Use on-prem option
GitLab CI (Self-Hosted) Self-hosted CI/CD 🟡 CONDITIONAL On-prem deployment: APPROVED with proper air-gapping. Open-source core (Community Edition); Enterprise Edition features are proprietary. ⚠️ On-prem with security review

Summary: Compliance Status by Category

Category Approved Count Conditional Count Not Approved Count Recommendation
Core Dependencies 8/9 1 0 Use all core deps. Harden FastAPI deployment.
Static Analysis (SAST/SCA) 7/10 2 1 Use Klocwork, Checkmarx, Coverity as primary SAST. Avoid Snyk cloud.
Testing & Performance 3/6 1 2 Use pytest-cov and mutation testing. Avoid Codecov/Datadog cloud.
Requirements & Test Mgmt 1/4 3 0 Use ReqIF or on-prem TestRail/Jira. Avoid cloud services.
CI/CD 1/4 1 2 Use Jenkins on-prem. Avoid GitHub Actions and cloud CI.
TOTAL 20/33 8/33 5/33 Buildable with APPROVED tools. CONDITIONAL tools need variance.

Deployment Guidelines for SCI/SCIF Environments

For APPROVED Tools:

  • No additional review needed.
  • Deploy using standard security hardening practices.
  • Ensure all infrastructure is on-prem and air-gapped from external networks.

For CONDITIONAL Tools:

  • Requires security/compliance officer review and variance documentation.
  • Must be deployed on-prem (not cloud).
  • Ensure all data remains within security boundary.
  • Document any external dependencies or data transmission.

For NOT APPROVED Tools:

  • DO NOT DEPLOY in SCI/SCIF environments.
  • Seek alternative APPROVED tools.
  • Escalate to program security office if no alternative exists.

Migration Recommendations

If you are currently using NOT APPROVED tools:

| Current Tool | Reason Not Approved | APPROVED Alternative |
| --- | --- | --- |
| Codecov | Cloud SaaS, external data transmission | Use local pytest-cov + local artifact storage |
| Datadog | Cloud SaaS, continuous monitoring | Use on-prem ELK, Grafana, or Prometheus stack |
| Snyk | Cloud SaaS, external scanning | Use OWASP Dependency-Check (on-prem) + Bandit |
| GitHub Actions | Cloud CI/CD | Use Jenkins on-prem or GitLab self-hosted |
| SonarCloud | Cloud SaaS | Use SonarQube on-prem instance |
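
One way to catch these tools early is to scan a requirements file for their client packages. The sketch below is illustrative only: the package names in `ALTERNATIVES` are assumptions about what a cloud client might be called, and `flag_requirements` is a hypothetical helper. GitHub Actions and SonarCloud are omitted because they are normally configured outside `requirements.txt`.

```python
import re

# Assumed package names mapped to the alternatives from the table above.
ALTERNATIVES = {
    "codecov": "pytest-cov with local artifact storage",
    "datadog": "on-prem ELK, Grafana, or Prometheus",
    "snyk": "OWASP Dependency-Check (on-prem) + Bandit",
}


def flag_requirements(lines):
    """Return (package, alternative) pairs for flagged requirement lines."""
    findings = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # The package name ends at the first version specifier, extra, or marker.
        name = re.split(r"[\s<>=!~\[;]", line, maxsplit=1)[0].lower()
        if name in ALTERNATIVES:
            findings.append((name, ALTERNATIVES[name]))
    return findings
```

Passing `open("requirements.txt")` as `lines` would scan the project file. An empty result only means none of the assumed names appear, not that the environment is compliant.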

(back to top)

Setup and Installation

Prerequisites:

  • Python 3.11+
  • Access to the orchestrator source path expected by conftest.py

Install:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Optional local env file:

cp .env.example .env

Quick validation:

pytest --collect-only -q
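
Before running the collection check, the prerequisites can be verified from Python. This is a minimal sketch under stated assumptions: `preflight` is a hypothetical helper, and it assumes `requirements.txt` sits in the directory passed as `root`. It does not check the orchestrator source path, because that location is defined by `conftest.py`.

```python
import sys
from pathlib import Path

MIN_PYTHON = (3, 11)  # from the prerequisites above


def preflight(version=sys.version_info, prefix=sys.prefix,
              base_prefix=sys.base_prefix, root="."):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version[:2]) < MIN_PYTHON:
        problems.append("Python 3.11+ required")
    # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
    if prefix == base_prefix:
        problems.append("no virtual environment active")
    if not (Path(root) / "requirements.txt").is_file():
        problems.append("requirements.txt not found")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

The parameters default to the running interpreter, so calling `preflight()` with no arguments checks the current environment.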

(back to top)

Usage

Run full suite:

pytest -q

Run by concern:

pytest -m unit -q
pytest -m integration -q
pytest -m correctness -q
pytest -m negative -q
pytest -m adversarial -q
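
To run every marker group in sequence and collect the exit codes, the commands above can be driven from a short script. This is a sketch, not a tool shipped with the suite: `marker_command` and `run_groups` are hypothetical names, and the script invokes pytest as `python -m pytest` in the active interpreter.

```python
import subprocess
import sys

# Marker groups listed above.
MARKERS = ("unit", "integration", "correctness", "negative", "adversarial")


def marker_command(marker, keyword=None):
    """Build the argv for one marker group, optionally narrowed with -k."""
    cmd = [sys.executable, "-m", "pytest", "-m", marker, "-q"]
    if keyword:
        cmd += ["-k", keyword]
    return cmd


def run_groups(markers=MARKERS, runner=subprocess.run):
    """Run each group in order and return {marker: exit code}."""
    results = {}
    for marker in markers:
        results[marker] = runner(marker_command(marker)).returncode
    return results
```

An exit code of 0 means the group passed. pytest returns 5 when no tests were collected, which usually points at a missing or misspelled marker.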

Focused debugging flow:

  1. Install dependencies.
  2. Run the relevant marker group.
  3. Use -k to isolate failing behavior.
  4. Re-run the same slice to confirm regression closure.

pytest -m integration -k dependency_order -q
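
For orientation, the kind of invariant a `dependency_order` test protects (base classes translated before their subclasses) can be sketched with the standard library's `graphlib`. The function names and the class graph below are hypothetical and do not reproduce the suite's actual tests.

```python
from graphlib import TopologicalSorter


def translation_order(parents):
    """Order classes so every base class precedes its subclasses.

    `parents` maps each class name to the set of classes it extends.
    """
    return list(TopologicalSorter(parents).static_order())


def order_violations(order, parents):
    """Return (subclass, base) pairs where the base appears after the subclass."""
    position = {name: index for index, name in enumerate(order)}
    return [
        (child, base)
        for child, bases in parents.items()
        for base in bases
        if position[base] > position[child]
    ]


# Hypothetical inheritance chain: Puppy extends Dog, Dog extends Animal.
PARENTS = {"Dog": {"Animal"}, "Puppy": {"Dog"}, "Animal": set()}
```

A test could then assert that `order_violations(translation_order(PARENTS), PARENTS)` is empty. A cyclic graph makes `static_order()` raise `graphlib.CycleError`, which is a natural case for a negative test.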

Tip

For long local runs, use Ctrl+C to stop gracefully and keep the latest failure summary.

(back to top)

Roadmap

gantt
  title Verification Roadmap
  dateFormat  YYYY-MM-DD
  section Core
  Parser and ordering guarantees            :done, r1, 2026-01-01, 2026-02-20
  section Security
  RBAC and guardrail hardening              :active, r2, 2026-02-21, 2026-05-30
  section Expansion
  Coverage growth and mutation checks       :r3, 2026-06-01, 2026-09-01
| Phase | Goals | Target | Status |
| --- | --- | --- | --- |
| Core | Preserve dependency and translation order correctness | Q1 2026 | Complete |
| Security | Broaden adversarial and RBAC scenarios | Q2 2026 | In progress |
| Expansion | Add mutation testing and richer fixture corpora | Q3 2026 | Planned |

(back to top)

Contributing

See CONTRIBUTING.md for workflow and test expectations.

Quality checklist for pull requests
  • Add or update tests for each behavior change.
  • Preserve dependency-order invariants in project translation paths.
  • Keep fixtures deterministic and security-safe.
  • Run targeted marker groups plus a full suite pass before opening a PR.
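
As an illustration of the determinism item, generated fixture data can draw from a privately seeded generator so that every run sees the same inputs. This is a minimal sketch: `sample_class_names` is a hypothetical helper, and it produces only synthetic identifiers, so no real credentials or source code end up in fixtures.

```python
import random
import string


def sample_class_names(count, seed=0):
    """Return `count` synthetic Java-style class names, stable for a given seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    names = []
    for _ in range(count):
        head = rng.choice(string.ascii_uppercase)
        tail = "".join(rng.choices(string.ascii_lowercase, k=7))
        names.append(head + tail)
    return names
```

Two calls with the same seed return identical lists, so a failure seen in CI can be reproduced locally with the same inputs.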

(back to top)

License

This project is licensed under the MIT License. See LICENSE for details.

(back to top)
