Contributing to the WILDS WDL Library

Thank you for your interest in contributing to the WILDS WDL Library! This document provides guidelines for contributing modules, pipelines, and improvements to our centralized collection of bioinformatics WDL infrastructure.

Table of Contents

Getting Started
Repository Structure
Types of Contributions
Module Development Guidelines
Pipeline Development Guidelines
Testing Requirements
Documentation Standards
Documentation Website
Pull Request Process
Code of Conduct

Getting Started

Before contributing code changes, please:

Fork the repository to your GitHub account
Set up your development environment with the required tools:
- For local testing:
  - sprocket (recommended)
  - miniWDL
  - uv for automated local testing
- Docker Desktop for container execution
Make code changes and push them to your fork
Submit a pull request (PR) to merge your contributions into the main branch of the original repo
- The title of your PR should briefly describe the change.
- If your contribution resolves an issue, the body of your PR should contain Fixes #issue-number

Repository Structure

The WILDS WDL Library follows a two-tier architecture:

Modules: Collection of tasks that use a given tool
Pipelines: Analysis workflows that import and combine module tasks (ranging from basic examples to production-ready pipelines)

```
wilds-wdl-library/
├── modules/
│   └── ww-toolname/
│       ├── ww-toolname.wdl
│       ├── testrun.wdl
│       └── README.md
├── pipelines/
│   └── ww-pipeline-name/
│       ├── ww-pipeline-name.wdl
│       ├── inputs.json
│       └── README.md
└── .github/
    └── workflows/     # CI/CD automation
```

Types of Contributions

1. Bug Reports and Issues

Use the GitHub Issues page
Provide detailed information about the problem
Include error messages, info about input files, and steps to reproduce
Tag issues appropriately (bug, enhancement, question, etc.)

2. Documentation Improvements

Fix typos, improve clarity, or add missing information
Enhance README files with better examples

3. Module Contributions

Focus on one high-utility bioinformatics tool
Follow standardized module structure
Include comprehensive testing and validation

4. Pipeline Contributions

Combine existing modules into analysis workflows
Range from basic educational examples (2-3 modules) to advanced production pipelines (10+ modules)
Document complexity level in the README
Provide educational and/or production value for the community

Module Development Guidelines

See our ww-template module for an example.

The module folder must contain:

ww-toolname.wdl - Main WDL file containing task definitions for the tool
testrun.wdl - Test workflow demonstrating module functionality (must be named testrun.wdl)
README.md - Comprehensive documentation

The module folder may optionally contain:

Custom scripts (e.g., .R, .py, .sh) - If your task requires a custom script that isn't part of the container image, place it directly in the module directory alongside the WDL files. The script can be fetched at runtime using curl or wget in the task's command block.

Your main WDL file (ww-toolname.wdl) must include:

Version declaration: Use WDL version 1.0
Task definitions: Individual tasks with proper resource requirements
Metadata documentation: Describe properties of tasks (e.g. inputs, outputs) using meta and parameter_meta blocks

Your test workflow file (testrun.wdl) must include:

Version declaration: Use WDL version 1.0
Module imports: Import the module being tested and the ww-testdata module using GitHub URLs
Sample struct definition: Define a struct for organizing sample inputs if needed
Test workflow: A toolname_example workflow that demonstrates all tasks (must follow the naming convention {module}_example where {module} is the tool name, e.g., star_example for ww-star)
Auto-downloading of test data: Use the ww-testdata module to automatically provision test data
Validation task (optional): Consider including a validation task to verify output correctness
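
The {module}_example naming convention can be sketched as a small shell helper. The function name example_workflow_name is made up for this illustration, and the handling of hyphens inside a tool name is an assumption (WDL identifiers cannot contain hyphens, so they are mapped to underscores here).

```shell
# Derive the expected test workflow name from a module directory name,
# e.g. ww-star -> star_example.
example_workflow_name() {
  # Drop the ww- prefix, then (assumption) turn any remaining hyphens into
  # underscores and append the _example suffix.
  printf '%s\n' "${1#ww-}" | sed 's/-/_/g; s/$/_example/'
}

example_workflow_name ww-star   # prints star_example
```

A helper like this could be used to confirm that testrun.wdl declares the workflow name that the test tooling will look for.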

Parameter preferences:

Use descriptive parameter names
Include optional parameters with sensible defaults
Support both single samples and batch processing where applicable

Docker image preferences:

Use images from the WILDS Docker Library when available
If creating new images, follow WILDS container standards and consider contributing to the WILDS Docker Library.
Specify exact image versions (avoid latest tags)
Document image dependencies in the README
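
One way to spot unpinned images before opening a PR is a quick grep over the runtime blocks. The sketch below is a hypothetical helper (find_unpinned_images is not part of the repository tooling) and it assumes images are written as quoted string literals on a docker: line; images passed in through variables, or registries with a port but no tag, are not detected.

```shell
# Print docker: lines whose image uses the "latest" tag or has no tag at all.
# Returns 0 if at least one such line is found, 1 otherwise (grep semantics).
find_unpinned_images() {
  grep -nE '^[[:space:]]*docker:' "$1" \
    | grep -E ':latest"|docker:[[:space:]]*"[^":]*"'
}
```

For example, `find_unpinned_images ww-toolname.wdl` prints the offending lines with their line numbers, or nothing if every literal image carries an explicit version tag.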

Pipeline Development Guidelines

Pipelines should:

Combine existing modules from the library
Demonstrate realistic analysis workflows
Serve as educational templates and/or production-ready analyses
Use publicly available test data
Document their complexity level (Basic, Intermediate, or Advanced)

Complexity Levels:

| Level | Modules | Typical Runtime | Description |
|-------|---------|-----------------|-------------|
| Basic | 2-3 | < 30 minutes | Simple integrations ideal for learning |
| Intermediate | 4-6 | 1-4 hours | Multi-step analyses for common use cases |
| Advanced | 10+ | > 4 hours | Comprehensive production pipelines |

Prefer Existing Modules

Pipelines should primarily combine existing modules rather than define new tasks. If you need new functionality, consider contributing it as a module first.

Pipeline inputs.json

Each pipeline should include an inputs.json file that serves as an example for users. This file demonstrates the expected input structure and helps users understand what values they need to provide when running the pipeline. Your inputs.json should:

Use dummy/placeholder paths for file inputs (e.g., "/path/to/your/sample.fastq.gz")
Include common or recommended values for non-file parameters
Document all required inputs with realistic example values
Use the pipeline's README to provide descriptions and guidance for each input parameter

Note: GitHub Action tests use the ww-testdata module to automatically download test data, so your inputs.json does not need to reference actual test files for CI purposes.
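
A minimal sketch of such a file is shown below, written from a shell heredoc. The workflow name (example_pipeline) and every input name and value are hypothetical; the only convention relied on is the usual `<workflow>.<input>` key format for WDL input JSON.

```shell
# Write an example inputs.json with placeholder paths and typical values.
# The workflow name (example_pipeline) and its inputs are hypothetical.
cat > inputs.json <<'EOF'
{
  "example_pipeline.sample_name": "sample01",
  "example_pipeline.fastq_r1": "/path/to/your/sample_R1.fastq.gz",
  "example_pipeline.fastq_r2": "/path/to/your/sample_R2.fastq.gz",
  "example_pipeline.cpu_cores": 4,
  "example_pipeline.memory_gb": 16
}
EOF

# Optional sanity check that the file parses as JSON (only runs if python3 exists).
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool inputs.json >/dev/null && echo "inputs.json is valid JSON"
fi
```

Because JSON has no comments, explanations of each key belong in the pipeline's README rather than in the file itself.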

Platform-Specific Configurations (Optional)

Pipelines may include optional platform-specific configuration directories for execution on cloud platforms or workflow management systems:

Location: Place platform configs in a subdirectory within the pipeline (e.g., pipelines/ww-example/.cirro/)
Naming convention: Use dotfile directory names (.cirro/, .terra/, etc.) to indicate platform
Standalone principle: Keep all pipeline-related files (WDL, inputs, platform configs) in the pipeline directory
Documentation: Document platform configurations in the pipeline's README with links to platform documentation
Examples:
- .cirro/ for Cirro platform (config documentation)
- .terra/ for Terra workspace configurations
- Other platform-specific directories as needed

Platform configurations are entirely optional and should not be required to run the pipeline with standard WDL executors (Cromwell, miniWDL, Sprocket).

Cirro Configuration Validation: Pipelines with .cirro/ directories are automatically validated in CI. The validation checks that all required files are present (preprocess.py, process-form.json, process-input.json, process-output.json, process-compute.config), JSON files are valid, and preprocess.py has no syntax errors. You can run this locally with make lint_cirro.
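
The file-presence part of that check can be approximated with a few lines of shell. This is only a sketch of the idea, not the implementation behind make lint_cirro; the function name check_cirro_files is invented here, and the JSON and Python syntax checks are left out.

```shell
# Report any required Cirro files missing from a pipeline's .cirro/ directory.
# Returns 0 when all five files are present, 1 otherwise.
check_cirro_files() {
  cirro_dir="$1"
  missing=0
  for f in preprocess.py process-form.json process-input.json \
           process-output.json process-compute.config; do
    if [ ! -f "$cirro_dir/$f" ]; then
      echo "missing: $cirro_dir/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_cirro_files pipelines/ww-example/.cirro` before pushing gives a quick signal, but the CI validation remains the authority.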

Testing Requirements

Local Tests

Make sure you have these installed:

sprocket (recommended)
miniWDL
uv for automated testing with our Makefile
Docker Desktop for container execution

Option 1: Manual Testing

Test your WDL manually by navigating to the module directory:

```shell
cd modules/ww-toolname

# Linting with miniwdl (check both main module and test workflow)
miniwdl check ww-toolname.wdl
miniwdl check testrun.wdl

# Linting with sprocket (ignoring things we don't care about)
sprocket lint \
  -e TodoComment \
  -e ContainerUri \
  -e TrailingComma \
  -e CommentWhitespace \
  -e UnusedInput \
  ww-toolname.wdl

sprocket lint \
  -e TodoComment \
  -e ContainerUri \
  -e TrailingComma \
  -e CommentWhitespace \
  -e UnusedInput \
  testrun.wdl

# Test running (use testrun.wdl for execution tests)
sprocket run testrun.wdl --entrypoint toolname_example
miniwdl run testrun.wdl
```

Option 2: Automated Testing with Makefile (Recommended)

Use our automated Makefile from the repository root for easier testing:

```shell
# Test a specific module (replace ww-toolname with your module name)
make lint MODULE=ww-toolname          # Run all linting checks
make lint_sprocket MODULE=ww-toolname # Run only sprocket linting
make lint_miniwdl MODULE=ww-toolname  # Run only miniwdl linting
make run_sprocket MODULE=ww-toolname  # Run sprocket with proper entrypoint
make run_miniwdl MODULE=ww-toolname   # Run miniwdl

# Test all modules
make lint    # Lint all modules
make run     # Run all modules with both sprocket and miniwdl
```

The Makefile automatically handles:

Proper entrypoint naming for sprocket ({module}_example)
Module discovery and validation
Dependency checking (sprocket, uv, etc.)
Consistent test execution across all modules

Test Data

Use the ww-testdata module for standardized test datasets
If you need additional test datasets, add them to the ww-testdata module as well
Include small, representative test files in your examples

Automated Tests

All contributions must pass our automated testing pipeline, which runs on PRs via GitHub Actions:

Multi-executor validation: Tests with Cromwell, miniWDL, and Sprocket
Container verification: All Docker images must be accessible and functional
Syntax validation: WDL syntax and structure validation
Integration testing: Cross-module compatibility testing
Cirro validation: Validates .cirro/ configurations for pipelines that include them

Documentation Website

The WILDS WDL Library includes an automatically generated documentation website that provides comprehensive technical documentation for all modules and pipelines. Understanding how this documentation works is important for contributors.

How Documentation is Generated

The documentation website is built using Sprocket and automatically deployed to GitHub Pages. The documentation is generated from:

README files: Each module and pipeline directory contains a README.md that becomes the documentation homepage for that component
WDL files: Task descriptions, inputs, outputs, and metadata are automatically extracted from WDL files
Main README: The repository's root README.md serves as the documentation site homepage

Automatic Deployment

Documentation is automatically built and deployed when changes are merged to the main branch:

The build-docs.yml GitHub Actions workflow triggers on push to main
The workflow runs the make_preambles.py script to prepare WDL files
Sprocket generates static HTML documentation
The postprocess_docs.py script applies final formatting
Documentation is deployed to GitHub Pages at the repository's documentation URL

Important: You don't need to build or commit documentation files - they are generated automatically in CI/CD.

Previewing Documentation Locally

Before submitting a PR, you can preview how your changes will appear on the documentation website using the provided Makefile targets:

Build and Preview Documentation

```shell
# Build documentation locally (mirrors the CI/CD process)
make docs-preview

# Serve the documentation on http://localhost:8000
make docs-serve

# Or do both in one command
make docs
```

The docs-preview target will:

Check for uncommitted changes and warn you (docs are built from your last commit)
Safely stash any uncommitted work
Run the same build process as the GitHub Actions workflow
Generate documentation in the docs/ directory
Restore your uncommitted changes when finished
Clean up all temporary build files

Note: The docs/ directory is gitignored and should never be committed to the repository.

What Gets Built

When you run make docs-preview, the build process:

Prepends each module's README to its WDL file for better documentation context
Converts GitHub import URLs to relative paths for local navigation
Generates comprehensive HTML documentation for all tasks, workflows, and components
Applies custom styling and post-processing
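
To make the import-rewriting step concrete, here is a rough sed-based sketch. It is not the repository's build script, and it rests on two assumptions that are labeled in the comments: the shape of the GitHub import URL, and that the importing file sits one level below modules/. The function name relativize_imports is hypothetical.

```shell
# Rewrite GitHub raw import URLs to sibling-relative paths and print the result.
# Assumption: imports look like
#   import "https://raw.githubusercontent.com/<org>/<repo>/<ref>/modules/ww-x/ww-x.wdl" as ww_x
# Assumption: the importing file lives in modules/<module>/, so "../" reaches modules/.
relativize_imports() {
  sed -E 's#"https://raw\.githubusercontent\.com/[^"]*/modules/#"../#' "$1"
}
```

The function writes to standard output and leaves the input file untouched, so it can be tried on any testrun.wdl without risk. A pipeline file would need a different relative prefix, which this sketch does not handle.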

Documentation Best Practices

When contributing, ensure your documentation is clear and complete:

README files: Write clear, user-focused descriptions of what your module/pipeline does
Task metadata: Use meta blocks to document task purpose, authors, and other high-level information
Parameter metadata: Use parameter_meta blocks to describe all inputs and outputs
Examples: Include usage examples in README files
Preview locally: Always run make docs-preview before submitting a PR to verify how your documentation will appear

Troubleshooting Documentation Builds

If you encounter issues with local documentation builds:

Ensure you have the required dependencies installed (sprocket, uv, Python 3.13)
Check that you're running the command from the repository root
Review error messages - they often indicate issues with WDL syntax or README formatting

For questions about documentation, please contact wilds@fredhutch.org.

Pull Request Process

After meeting the requirements above, submit a PR to merge the changes from your fork into main.

Create descriptive PR title:
- Examples: Add BWA alignment module, Add RNA-seq analysis pipeline
Fill out PR template: Provide detailed information about your contribution
Link related issues: Reference any GitHub issues your PR addresses
Request reviews: Tag Emma Bishop (@emjbishop) or Taylor Firman (@tefirman)

Review Criteria

Your PR will be evaluated on:

Functionality: Does it work as intended?
Testing: Are tests comprehensive and passing?
Documentation: Is documentation clear and complete?
Standards compliance: Does it follow WILDS conventions?
Code quality: Is the WDL code well-structured and readable?
Uniqueness: Does it avoid duplicating existing functionality in the library?

Help for new contributors

New contributors are welcome! If you're new to WDL or bioinformatics workflows:

Review our WDL 101 course materials
Check out existing modules for examples
Don't hesitate to ask questions in issues or via email. If you have a uw.edu or fredhutch.org email address, you can also ask questions in our fh-data Slack workspace
Consider starting with documentation contributions

For further questions, contact the Fred Hutch Office of the Chief Data Officer (OCDO) at wilds@fredhutch.org.

Code of Conduct

By participating in this project, you agree to abide by our code of conduct:

Be respectful: Treat all community members with respect and kindness
Be collaborative: Work together constructively and help others learn
Be inclusive: Welcome contributors from all backgrounds and experience levels
Be patient: Remember that everyone is learning and growing

Reporting Issues

If you experience or witness unacceptable behavior, please report it to wilds@fredhutch.org.

License

By contributing to this project, you agree that your contributions will be licensed under the MIT License. See the LICENSE file for details.

Thank you for contributing to WILDS! Your contributions help advance reproducible bioinformatics research for the entire community.
