Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
79 commits
Select commit Hold shift + click to select a range
56f66af
refactor(worker): restructure monolithic jobs.py into modular archite…
bencap Jan 7, 2026
22b7853
feat: Add comprehensive job traceability system database schema
bencap Jan 7, 2026
5de8fb4
fix(logging): simplify context saving logic to overwrite existing map…
bencap Jan 7, 2026
ad2e7fb
tests: add TransactionSpy class for mocking database transaction meth…
bencap Jan 12, 2026
9a3171e
feat: add BaseManager class with transaction handling and rollback fe…
bencap Jan 12, 2026
2e05a7e
feat: Job manager class, supporting utilities, and unit tests
bencap Jan 12, 2026
dc72637
feat: Pipeline manager class, supporting utilities, and unit tests
bencap Jan 14, 2026
d2c53fc
feat: add function to check if job dependencies are reachable
bencap Jan 16, 2026
bdb7964
feat: add markers for test categorization in pytest
bencap Jan 16, 2026
39d89c9
fix: mock job manager returning in fixture rather than yielding
bencap Jan 17, 2026
a1b254b
fix: enhance error logging for job and pipeline state transitions
bencap Jan 17, 2026
8c79577
fix: re-order imports in job manager test file
bencap Jan 17, 2026
411dc52
fix: use conftest_optional import structure in worker test module
bencap Jan 17, 2026
7631090
feat: Add decorators for job and pipeline management
bencap Jan 20, 2026
213f966
feat: use context for logging in job manager
bencap Jan 20, 2026
abfa82d
feat: decorator for job run record guarantees
bencap Jan 21, 2026
4379467
feat: add test mode support to job and pipeline decorators
bencap Jan 21, 2026
a98d7e7
fix: simplify exc handling in job management decorator
bencap Jan 21, 2026
45e166a
feat: allow pipelines to be started by decorated jobs
bencap Jan 22, 2026
4e9b22b
tests: unit tests for worker manager utilities
bencap Jan 22, 2026
0b78253
feat: add network test marker and control socket access in pytest
bencap Jan 22, 2026
79c0df4
Refactor test setup by replacing `setup_worker_db` with `with_populat…
bencap Jan 22, 2026
4509899
wip: refactor jobs to use job management system
bencap Jan 22, 2026
53f6722
refactor: reduce mocking of database across worker tests
bencap Jan 23, 2026
4919dca
refactor: simplify job definition in job management tests
bencap Jan 23, 2026
5ebe2a5
refactor: simplify job definition in job management tests
bencap Jan 23, 2026
2cabfb5
refactor: centralize decorator test mode flag fixture
bencap Jan 23, 2026
bfb0f7a
feat: enhance pipeline start logic with controllable coordination
bencap Jan 24, 2026
cb9e164
feat: logic fixups and comprehensive test cases for variant processin…
bencap Jan 24, 2026
dbe770f
feat: add start_pipeline job and related tests for pipeline management
bencap Jan 24, 2026
65f11bc
feat: gnomAD managed job tests and enhancements
bencap Jan 25, 2026
65c8c36
feat: uniprot managed job tests and enhancements
bencap Jan 27, 2026
0130963
feat: clingen managed job enhancements
bencap Jan 28, 2026
2c6b6c9
fixup(variant creation)
bencap Jan 28, 2026
9b66f51
feat: implement job and pipeline factories with definitions and tests
bencap Jan 28, 2026
38e028b
feat: integrate PipelineFactory for variant creation and update proce…
bencap Jan 28, 2026
e866136
feat: add context manager for database session management
bencap Jan 28, 2026
7764008
feat: use session context manager in worker decorators rather than in…
bencap Jan 28, 2026
3569ae6
refactor: streamline context handling in job and pipeline decorators
bencap Jan 28, 2026
b5691b6
feat: add new job definitions for score set annotation pipeline
bencap Jan 29, 2026
8dc3051
feat: implement AnnotationStatusManager for managing variant annotati…
bencap Jan 29, 2026
48c4928
feat: add annotation status tracking to jobs
bencap Jan 29, 2026
d5d9339
feat: streamline job results and exception handling in tests
bencap Jan 29, 2026
08b97fe
feat: less prescriptive status messages in complete job functions
bencap Jan 29, 2026
8a34bfc
fix: ensure exception info is always present for failed jobs in job m…
bencap Jan 29, 2026
c3b5c0a
fix: move Athena engine fixture to optional conftest for core depende…
bencap Jan 29, 2026
9614184
feat: add standalone context creation for worker lifecycle management
bencap Jan 29, 2026
a75295d
feat: add asyncclick dependency and update environment script to use it
bencap Jan 29, 2026
3d32baf
feat: add standalone job definitions and update lifecycle context for…
bencap Jan 29, 2026
4c6e61a
feat: refactor populate_mapped_variant_data to use async and job subm…
bencap Jan 29, 2026
2d64a8d
chore: test cleanup
bencap Jan 29, 2026
c44726b
docs: minimal developer docs via copilot for worker jobs
bencap Jan 29, 2026
5fc19a4
fix: mypy typing
bencap Jan 29, 2026
722ca72
fix: test attempting to connect via socket to athena
bencap Jan 29, 2026
5ab1215
feat: add Slack error notifications to job/pipeline decorators
bencap Jan 29, 2026
947e78c
fix: update TODO comments for clarity and specificity in UniProt and …
bencap Jan 29, 2026
ed48980
feat: make Redis client optional in managers and add error handling f…
bencap Jan 29, 2026
0e916ac
feat: implement create_job_dependency method in JobFactory with valid…
bencap Jan 29, 2026
fe9742c
feat: refactor UniProt ID mapping script to use async commands and jo…
bencap Jan 29, 2026
adce263
feat: refactor link_gnomad_variants script to use async commands and …
bencap Jan 30, 2026
24efdeb
feat: refactor clingen_car_submission script to use async commands an…
bencap Jan 30, 2026
4861214
feat: refactor clingen_ldh_submission script to streamline job submis…
bencap Jan 30, 2026
6442a42
feat: clinvar clinical control refresh job + script
bencap Jan 30, 2026
29adafc
feat: update annotation type handling to use enum directly and switch…
bencap Feb 4, 2026
fcbcf32
feat: add functions to retrieve associated ClinVar Allele IDs and enh…
bencap Feb 4, 2026
7c9f11f
refactor: remove redundant fixture for setting up sample variants in …
bencap Feb 4, 2026
050838e
chore: add TODO for caching ClinVar control data to improve performance
bencap Feb 4, 2026
ecaf1f0
feat: add multiple refresh job definitions for ClinVar controls with …
bencap Feb 4, 2026
a5c6437
feat: enhance test workflow to run fast tests on pull requests and fu…
bencap Feb 4, 2026
bddba7a
feat: add Redis caching for ClinGen API requests to reduce redundant …
bencap Feb 17, 2026
4ea63d5
feat: add commit option to job progress and status update methods for…
bencap Feb 17, 2026
c34741c
feat: implement stalled job cleanup with unified retry handling
bencap Feb 17, 2026
7fbcbbe
fix: correct type annotations in cleanup.py
bencap Feb 17, 2026
6ec194f
wip: standardize job result contracts
bencap Mar 2, 2026
dd150b1
ai: update instruction files for testing guidance
bencap Mar 12, 2026
5c388e1
feat(mapping): populate hgvs_assay_level when creating mapped variants
bencap Apr 15, 2026
12ba7e1
fix: update down_revision to correct previous migration reference
bencap Apr 16, 2026
dafc4b0
fix(types): update JobExecutionOutcome home to avoid circular imports
bencap Apr 16, 2026
bccaff7
build(deps): upgrade pytest-postgresql from ~5.0.0 to ~7.0.0
bencap Apr 16, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
102 changes: 102 additions & 0 deletions .github/instructions/copilot-instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -140,3 +140,105 @@ poetry run python -m mavedb.scripts.<script_name>
- [server_main.py](src/mavedb/server_main.py) — App setup and dependency injection
- [authentication.py](src/mavedb/lib/authentication.py) — Auth patterns
- [conftest.py](tests/conftest.py) — Test fixtures and database setup

### Naming Conventions
- **Variables & functions**: `snake_case` (e.g., `score_set_id`, `create_variants_for_score_set`)
- **Classes**: `PascalCase` (e.g., `ScoreSet`, `UserData`, `ProcessingState`)
- **Constants**: `UPPER_SNAKE_CASE` (e.g., `MAPPING_QUEUE_NAME`, `DEFAULT_LDH_SUBMISSION_BATCH_SIZE`)
- **Enum values**: `snake_case` (e.g., `ProcessingState.success`, `MappingState.incomplete`)
- **Database tables**: `snake_case` with descriptive association table names (e.g., `scoreset_contributors`, `experiment_set_doi_identifiers`)
- **API endpoints**: kebab-case paths (e.g., `/score-sets`, `/experiment-sets`)

### Documentation Conventions
*For general Python documentation standards, see `.github/instructions/python.instructions.md`. The following are MaveDB-specific additions:*

- **Algorithm explanations**: Include comments explaining complex logic, especially URN generation and bioinformatics operations
- **Design decisions**: Comment on why certain architectural choices were made
- **External dependencies**: Explain purpose of external bioinformatics libraries (HGVS, SeqRepo, etc.)
- **Bioinformatics context**: Document biological reasoning behind genomic data processing patterns

### Commenting Guidelines
**Core Principle: Write self-explanatory code. Comment only to explain WHY, not WHAT.**

**✅ WRITE Comments For:**
- **Complex bioinformatics algorithms**: Variant mapping algorithms, external service interactions
- **Business logic**: Why specific validation rules exist, regulatory requirements
- **External API constraints**: Rate limits, data format requirements
- **Non-obvious calculations**: Score normalization, statistical methods
- **Configuration values**: Why specific timeouts, batch sizes, or thresholds were chosen

**❌ AVOID Comments For:**
- **Obvious operations**: Variable assignments, simple loops, basic conditionals
- **Redundant descriptions**: Comments that repeat what the code clearly shows
- **Outdated information**: Comments that don't match current implementation

### Error Handling Conventions
- **Structured logging**: Always use `logger` with `extra=logging_context()` for correlation IDs
- **HTTP exceptions**: Use FastAPI `HTTPException` with appropriate status codes and descriptive messages
- **Custom exceptions**: Define domain-specific exceptions in `src/mavedb/lib/exceptions.py`
- **Worker job errors**: Send Slack notifications via `send_slack_error()` and log with full context
- **Validation errors**: Use Pydantic validators and raise `ValueError` with clear messages

### Code Style and Organization Conventions
*For general Python style conventions, see `.github/instructions/python.instructions.md`. The following are MaveDB-specific patterns:*

- **Async patterns**: Use `async def` for I/O operations, regular functions for CPU-bound work
- **Database operations**: Use SQLAlchemy 2.0 style with `session.scalars(select(...)).one()`
- **Pydantic models**: Separate request/response models with clear inheritance hierarchies
- **Bioinformatics data flow**: Structure code to clearly show genomic data transformations

### Testing Conventions
*For testing philosophy, mocking boundaries, and conventions see `.github/instructions/testing.instructions.md`. For general Python testing standards, see `.github/instructions/python.instructions.md`. The following are MaveDB-specific patterns:*

- **Test function naming**: Use descriptive names that reflect bioinformatics operations (e.g., `test_cannot_publish_score_set_without_variants`)
- **Fixtures**: Use `conftest.py` for shared fixtures, especially database and worker setup
- **Mocking**: Mock only at system boundaries (external services, Redis/ARQ, Slack). Do not mock internal helpers or `update_progress`
- **Constants**: Define test data including genomic sequences and variants in `tests/helpers/constants.py`
- **Integration testing**: Test full bioinformatics workflows including external service interactions

## Codebase Conventions

### URN Validation
- Use regex patterns from `src/mavedb/lib/validation/urn_re.py`
- Validate URNs in Pydantic models with `@field_validator`
- URN generation logic in `src/mavedb/lib/urns.py` and `temp_urns.py`

### Worker Jobs (ARQ/Redis)
- **Job definitions**: All background jobs in `src/mavedb/worker/jobs.py`
- **Settings**: Worker configuration in `src/mavedb/worker/settings.py` with function registry and cron jobs
- **Job patterns**:
- Use `setup_job_state()` for logging context with correlation IDs
- Implement exponential backoff with `enqueue_job_with_backoff()`
- Handle database sessions within job context
- Send Slack notifications on failures via `send_slack_error()`
- **Key job types**:
- `create_variants_for_score_set` - Process uploaded CSV data
- `map_variants_for_score_set` - External variant mapping via VRS
- `submit_score_set_mappings_to_*` - Submit to external annotation services
- **Enqueueing**: Use `ArqRedis.enqueue_job()` from routers with correlation ID for request tracing

### View Models (Pydantic)
- **Base model** (`src/mavedb/view_models/base/base.py`) converts empty strings to None and uses camelCase aliases
- **Inheritance patterns**: `Base` → `Create` → `Modify` → `Saved` model hierarchy
- **Field validation**: Use `@field_validator` for single fields, `@model_validator(mode="after")` for cross-field validation
- **URN validation**: Validate URNs with regex patterns from `urn_re.py` in field validators
- **Transform functions**: Use functions in `validation/transform.py` for complex data transformations
- **Separate models**: Request (`Create`, `Modify`) vs response (`Saved`) models with different field requirements

### External Integrations
- **HGVS/SeqRepo** for genomic sequence operations
- **DCD Mapping** for variant mapping and VRS transformation
- **CDOT** for transcript/genomic coordinate conversion
- **GA4GH VRS** for variant representation standardization
- **ClinGen services** for allele registry and linked data hub submissions

## Key Files to Reference
- `src/mavedb/models/score_set.py` - Primary data model patterns
- `src/mavedb/routers/score_sets.py` - Complex router with worker integration
- `src/mavedb/worker/jobs.py` - Background processing patterns
- `src/mavedb/view_models/score_set.py` - Pydantic model hierarchy examples
- `src/mavedb/server_main.py` - Application setup and dependency injection
- `src/mavedb/data_providers/services.py` - External service integration patterns
- `src/mavedb/lib/authentication.py` - Authentication and authorization patterns
- `tests/conftest.py` - Test fixtures and database setup
- `docker-compose-dev.yml` - Service architecture and dependencies
185 changes: 76 additions & 109 deletions .github/instructions/testing.instructions.md
Original file line number Diff line number Diff line change
@@ -1,121 +1,88 @@
---
description: 'MaveDB testing conventions — fixtures, mocking, test data patterns'
description: 'Testing philosophy and conventions for the MaveDB API'
applyTo: 'tests/**/*.py'
---

# Testing Conventions for MaveDB
# Testing Conventions

## Test Infrastructure
## Outcome-Based Testing

### Database
- **pytest-postgresql** provides ephemeral PostgreSQL instances per test session
- Database schema is created from SQLAlchemy models via `Base.metadata.create_all()`
- Each test gets a clean transaction that rolls back after completion
- Core fixtures live in `tests/conftest.py`
Test what code does (return values, DB state, external boundary calls), not how it does it (internal method calls, message strings, call sequences). Tests should survive internal refactoring without changes.

### Network Isolation
- **pytest-socket** blocks real network calls in tests
- External services (HGVS, SeqRepo, DCD Mapping, ClinGen) must be mocked
**Assert on:**
- Return values and response objects
- DB state changes (query for created/updated/deleted records)
- External boundary calls (see Mocking Boundaries below)

## Fixtures
**Do not assert on:**
- Internal function invocations (e.g., that a helper was called with specific args)
- Call counts or call sequences on internal methods
- Log or progress message strings

## Mocking Boundaries

Only mock at system boundaries — the edges where your code talks to something external:
- External services (APIs, third-party clients)
- Infrastructure (Redis/ARQ, Slack, email)
- Network I/O (`run_in_executor`, HTTP clients)
- File I/O (S3, local filesystem in tests)

Do NOT mock internal helpers, validators, or data transforms. Test through them.

## Unit vs Integration Test Responsibilities

**Unit tests:** Edge cases, error paths, invalid inputs, boundary conditions. Use mocked external services.

**Integration tests:** Happy paths, end-to-end workflows, DB state verification. Use real DB with test fixtures.

### Two-Tier conftest
- `tests/conftest.py` — Core fixtures: database session, auth overrides, user contexts, API client
- `tests/<module>/conftest.py` — Module-specific fixtures for that test directory

### Auth Fixtures
Four pre-configured user contexts:
- **Default user** — standard authenticated user (test ORCID)
- **Anonymous user** — unauthenticated
- **Extra user** — second authenticated user (for permission tests)
- **Admin user** — user with admin role

### DependencyOverrider
Switch auth context mid-test using the `DependencyOverrider` context manager:
```python
with DependencyOverrider(app, {get_current_user: lambda: admin_user}):
response = client.get("/api/v1/score-sets/private-urn")
assert response.status_code == 200
```

## Test Data Constants

All test constants live in `tests/helpers/constants.py` with naming conventions:

| Prefix | Purpose | Example |
|--------|---------|---------|
| `VALID_*` | Valid input values | `VALID_ACCESSION`, `VALID_GENE_NAME` |
| `TEST_*` | Complete test objects (dicts) | `TEST_SCORE_SET`, `TEST_EXPERIMENT` |
| `TEST_MINIMAL_*` | Minimal valid objects | `TEST_MINIMAL_SCORE_SET` |
| `SAVED_*` | Expected shapes after save | `SAVED_SCORE_SET` |
| `*_RESPONSE` | Expected API response shapes | `SCORE_SET_RESPONSE` |
## Assertion Best Practices

- Use `session.refresh()` before asserting on modified ORM objects
- Add custom assertion messages to complex assertions where the failure message wouldn't immediately clarify what went wrong
- Include negative assertions where appropriate (verify unwanted records don't exist)
- Don't add messages to trivially clear assertions like `assert len(variants) == 0`

## Test Naming

Use descriptive names that reflect the operation and expected outcome:
```python
def test_cannot_publish_score_set_without_variants(): ...
def test_admin_can_view_private_score_set(): ...
def test_create_experiment_with_invalid_urn_returns_422(): ...
```

## Mocking External Services

Always mock external bioinformatics services:
```python
from unittest.mock import patch

@patch("mavedb.data_providers.services.cdot_rest")
@patch("mavedb.worker.jobs.map_variants_for_score_set")
def test_publish_enqueues_mapping(mock_map, mock_cdot, client, db):
...
```

Common mock targets:
- `mavedb.data_providers.services.cdot_rest`
- `mavedb.worker.jobs.*` (individual job functions)
- `mavedb.lib.authentication.get_current_user`
- HGVS/SeqRepo data providers

## Helper Factories

Use factory functions in test helpers to create test objects:
```python
from tests.helpers.constants import TEST_SCORE_SET

def create_score_set(client, payload=TEST_SCORE_SET):
response = client.post("/api/v1/score-sets/", json=payload)
assert response.status_code == 201
return response.json()
```

## Testing Patterns

### Permission Testing
Test both allowed and denied access for each role:
```python
def test_owner_can_update_draft(client, db):
...

def test_non_owner_cannot_update_draft(client, db):
with DependencyOverrider(app, {get_current_user: lambda: other_user}):
response = client.put(f"/api/v1/score-sets/{urn}", json=update_data)
assert response.status_code == 404 # 404, not 403
```

### Worker Job Testing
Test job logic directly, not through the API:
```python
async def test_create_variants_processes_csv(db, score_set):
ctx = {"db": db}
await create_variants_for_score_set(ctx, score_set.id, "test-correlation-id")
assert score_set.num_variants > 0
```

### Schema Validation
Verify that response shapes match view models:
```python
def test_score_set_response_has_record_type(client):
response = client.get(f"/api/v1/score-sets/{urn}")
assert response.json()["recordType"] == "score_set"
```
Use the pattern: `test_<function_name>_<condition>_<expected_outcome>`

Examples:
- `test_submit_to_car_when_disabled_skips_submission`
- `test_create_score_set_returns_422_when_missing_target`

Apply to tests being modified; don't rename all tests at once.

## Parametrization

Use `@pytest.mark.parametrize` with descriptive `ids` when the same logic is tested across multiple states. Prefer parametrization over copy-pasting near-identical tests.

## Fixtures

- Keep fixtures minimal and composable
- Define fixtures in the most specific `conftest.py` where they're needed
- Don't duplicate fixtures across test classes — lift shared ones to the nearest common conftest
- Use factory fixtures when tests need variants of the same object

---

# Worker-Specific Conventions

The following conventions apply specifically to `tests/worker/`.

## Job Test Assertions

- Assert on `JobExecutionOutcome.status` and `.data` for every job test
- Assert on DB state changes for the domain objects the job modifies
- For external service jobs: assert boundary calls (ClinGen CAR/LDH, UniProt, gnomAD/Athena, S3, ClinVar)

## Let `update_progress` Run Unpatched

`update_progress()` calls `session.commit()` as a checkpoint. This is production behavior and should execute in tests. Letting it run means tests verify that checkpoint commits don't break state or interfere with final outcomes. Don't patch it, don't mock it, don't assert on its messages.

## TransactionSpy Usage

**USE in manager/decorator tests** (e.g., `test_job_manager.py`, `test_pipeline_manager.py`): The commit/rollback boundary IS the contract here. If someone removes a commit, data silently won't persist in production. DB state checks alone can't catch this because the test session may auto-commit on teardown.

**USE `mock_database_flush_failure` / `mock_database_rollback_failure`**: These simulate DB errors that are genuinely hard to reproduce otherwise. Valuable for testing error recovery paths in infrastructure code.

**DO NOT USE in job-level tests** (e.g., `test_clingen.py`, `test_cleanup.py`, `test_creation.py`): The job's contract is "variants were created" or "stalled jobs were retried," not "session.commit() was called." Use DB state queries instead.
31 changes: 26 additions & 5 deletions .github/workflows/run-tests-on-push.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
name: Run Tests (On Push)
name: Run Tests
on:
push:
# Run all tests on main, fast tests on other branches

env:
LOG_CONFIG: test
Expand Down Expand Up @@ -50,7 +51,12 @@ jobs:
- run: pip install --upgrade pip
- run: pip install poetry
- run: poetry install --with dev
- run: poetry run pytest tests/
- name: Run fast tests on non-main branches
if: github.event_name == 'push' && github.ref != 'refs/heads/main'
run: poetry run pytest tests/ -m "not network and not slow"
- name: Run full tests on main
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
run: poetry run pytest tests/

run-tests-3_11:
runs-on: ubuntu-latest
Expand All @@ -66,7 +72,12 @@ jobs:
- run: pip install --upgrade pip
- run: pip install poetry
- run: poetry install --with dev --extras server
- run: poetry run pytest tests/ --show-capture=stdout --cov=src
- name: Run fast tests on non-main branches
if: github.ref != 'refs/heads/main'
run: poetry run pytest tests/ -m "not network and not slow" --show-capture=stdout
- name: Run all tests with coverage on main branch
if: github.ref == 'refs/heads/main'
run: poetry run pytest tests/ --show-capture=stdout --cov=src

run-tests-3_12-core-dependencies:
runs-on: ubuntu-latest
Expand All @@ -80,7 +91,12 @@ jobs:
- run: pip install --upgrade pip
- run: pip install poetry
- run: poetry install --with dev
- run: poetry run pytest tests/
- name: Run fast tests on non-main branches
if: github.ref != 'refs/heads/main'
run: poetry run pytest tests/ -m "not network and not slow"
- name: Run all tests on main branch
if: github.ref == 'refs/heads/main'
run: poetry run pytest tests/

run-tests-3_12:
runs-on: ubuntu-latest
Expand All @@ -96,4 +112,9 @@ jobs:
- run: pip install --upgrade pip
- run: pip install poetry
- run: poetry install --with dev --extras server
- run: poetry run pytest tests/ --show-capture=stdout --cov=src
- name: Run fast tests on non-main branches
if: github.ref != 'refs/heads/main'
run: poetry run pytest tests/ -m "not network and not slow" --show-capture=stdout
- name: Run all tests with coverage on main branch
if: github.ref == 'refs/heads/main'
run: poetry run pytest tests/ --show-capture=stdout --cov=src
Loading
Loading