A centralized collection of bioinformatics WDL infrastructure providing reusable, well-tested components that can be combined to create powerful computational pipelines for genomics research.
The WILDS WDL Library consolidates bioinformatics workflows into a single, well-organized repository that serves as both a collection of production-ready tools and a demonstration of WDL best practices. Rather than maintaining separate repositories for each workflow, this library promotes modularity and reusability through a two-tier architecture.
The library is organized into two complementary levels:
**Modules**: Tool-specific collections of reusable WDL tasks with comprehensive testing.
- Purpose: Foundational building blocks for larger workflows
- Content: Individual bioinformatics tools (STAR, BWA, GATK, etc.)
- Testing: Unit tests ensure each task functions correctly over time
- Usage: Import tasks into custom workflows or run demonstration workflows
**Pipelines**: Complete analysis workflows combining multiple modules.
- Purpose: Functional pipelines ranging from educational examples to production-ready analyses
- Content: Multiple modules combined into analysis workflows of varying complexity
- Complexity Levels: Basic (2-3 modules), Intermediate (4-6 modules), Advanced (10+ modules)
- Testing: Integration tests verify modules work together seamlessly
- Usage: Templates for common workflows, learning examples, or production analyses
Thanks to GitHub URL imports, you can download and run any pipeline without cloning the entire repository:

```shell
# Download a pipeline and its example inputs
# Option 1: Use curl from the command line
curl -O https://raw.githubusercontent.com/getwilds/wilds-wdl-library/main/pipelines/ww-sra-star/ww-sra-star.wdl
curl -O https://raw.githubusercontent.com/getwilds/wilds-wdl-library/main/pipelines/ww-sra-star/inputs.json

# Option 2: Download directly from GitHub by navigating to the file and clicking the download button

# Modify inputs.json as necessary for your data, then run via the command line or PROOF's point-and-click interface
sprocket run ww-sra-star.wdl inputs.json
```

If you want to explore multiple components or contribute:
```shell
# Clone the repository
git clone https://github.com/getwilds/wilds-wdl-library.git
cd wilds-wdl-library

# Run a module test workflow (no inputs needed)
cd modules/ww-star
sprocket run testrun.wdl

# Run a pipeline (modify inputs.json as necessary)
cd ../../pipelines/ww-sra-star
sprocket run ww-sra-star.wdl inputs.json
```

Individual modules can also be imported directly into your own workflows:

```wdl
import "https://raw.githubusercontent.com/getwilds/wilds-wdl-library/refs/heads/main/modules/ww-sra/ww-sra.wdl" as sra_tasks
import "https://raw.githubusercontent.com/getwilds/wilds-wdl-library/refs/heads/main/modules/ww-star/ww-star.wdl" as star_tasks

workflow my_analysis {
  call sra_tasks.fastqdump { input: sra_id = "SRR12345678" }
  call star_tasks.star_align_two_pass {
    input: sample_data = { "name": "sample1", "r1": fastqdump.r1_end, "r2": fastqdump.r2_end }
  }
}
```

WILDS pipelines use GitHub URLs for imports, providing several advantages:
- No local cloning required: Use modules directly without downloading the repository
- Version control: Pin to specific commits or tags for reproducibility
- Easy updates: Switch between versions by changing the URL
- Modular usage: Import only the modules you need
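For example, pinning a module import to a release instead of the `main` branch is just a change to the URL's ref segment. The `v1.0.0` tag below is hypothetical and shown only to illustrate the pattern; substitute a real tag or full commit SHA from the repository:

```wdl
# Branch-based import (tracks main, so resolved content may change over time):
# import "https://raw.githubusercontent.com/getwilds/wilds-wdl-library/refs/heads/main/modules/ww-star/ww-star.wdl" as star_tasks

# Pinned import for reproducibility ("v1.0.0" is a hypothetical release tag):
import "https://raw.githubusercontent.com/getwilds/wilds-wdl-library/v1.0.0/modules/ww-star/ww-star.wdl" as star_tasks
```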
All components are tested with multiple WDL executors:
- Sprocket: Modern WDL executor with enhanced features
- Cromwell: Production-grade workflow engine
- miniWDL: Lightweight local execution
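As a rough sketch of what that looks like in practice, the same workflow/inputs pair can be launched from each engine's command line. Exact flags can differ by version, so treat these invocations as illustrative and consult each tool's help output:

```shell
# Sprocket (as used throughout the examples above)
sprocket run ww-sra-star.wdl inputs.json

# miniWDL: lightweight local execution
miniwdl run ww-sra-star.wdl -i inputs.json

# Cromwell: jar-based invocation of the workflow engine
java -jar cromwell.jar run ww-sra-star.wdl --inputs inputs.json
```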
Fred Hutch researchers can use PROOF to submit workflows directly to the on-premise HPC cluster. PROOF offers a user-friendly interface for researchers unfamiliar with command-line tools while still leveraging institutional computing resources.
Cromwell Configuration: PROOF users can customize workflow execution using Cromwell options. See cromwell-options.json for example configurations including call caching, output directories, and more. For detailed information, refer to the Cromwell workflow options documentation.
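As a sketch of what such an options file can contain (the keys below are standard Cromwell workflow options; the output directory is a placeholder to replace with your own path):

```json
{
  "write_to_cache": true,
  "read_from_cache": true,
  "final_workflow_outputs_dir": "/path/to/your/outputs",
  "use_relative_output_paths": true
}
```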
Platform-Specific Configurations: Some pipelines include optional platform-specific configurations (e.g., .cirro/ directories) for execution on cloud platforms like Cirro. These configurations are self-contained within each pipeline directory.
- Continuous Integration: All components tested on every pull request
- Multi-Executor Validation: Ensures compatibility across different WDL engines
- Real Data Testing: Uses authentic bioinformatics datasets for validation
- Scheduled Monitoring: Weekly checks detect infrastructure changes
- Standardized Structure: Consistent organization across all components
- Container Management: Versioned, tested Docker images from the WILDS Docker Library
- Documentation Standards: Comprehensive README files and inline documentation
- Version Control: Semantic versioning and careful dependency management
We welcome contributions at all levels:

**New modules**:
- Focus on high-utility bioinformatics tools
- Follow the standard module structure
- Include comprehensive tests and validation
- Provide detailed documentation

**New pipelines**:
- Combine existing modules (prefer existing modules over new tasks)
- Demonstrate common analysis patterns
- Include realistic test datasets
- Document complexity level and integration approaches

**Documentation**:
- Enhance existing README files
- Add usage examples and tutorials
- Improve inline code documentation
- Contribute to the WILDS documentation site
See our Contributing Guidelines for detailed information.
- Expanding the module collection with high-priority tools (GATK variant calling, additional alignment tools)
- Adding new pipelines across all complexity levels
- Enhancing testing infrastructure and validation
- Additional advanced pipelines for publication-ready analyses
- Enhanced integration with Fred Hutch computing infrastructure
- Community-contributed modules and pipelines
- Advanced documentation and tutorial content
- Issues and Bug Reports: GitHub Issues
- General Questions: Contact the Fred Hutch Office of the Chief Data Officer (OCDO) at wilds@fredhutch.org
- Documentation: Contributing Guidelines
- Fred Hutch Users: Scientific Computing Wiki
- WILDS Docker Library: Container images used by WDL workflows
- WILDS Documentation: Comprehensive guides and best practices
- Fred Hutch SciWiki: Institutional computing resources and tutorials
Distributed under the MIT License. See LICENSE for details.
