Contributing to the WILDS Docker Library

Thank you for your interest in contributing to the WILDS Docker Library! This document provides guidelines for contributing Docker images, improvements, and documentation to our centralized collection of bioinformatics container infrastructure.

Getting Started

Before contributing code changes, please:

  1. Fork the repository to your GitHub account

  2. Set up your development environment with the required tools (Docker, plus hadolint if you plan to lint with the Makefile)

  3. Make code changes and push them to your fork

  4. Submit a pull request (PR) to merge your contributions into the main branch of the original repo

    • The title of your PR should briefly describe the change
    • If your contribution resolves an issue, the body of your PR should contain Fixes #issue-number

Repository Structure

The WILDS Docker Library organizes each bioinformatics tool in its own directory:

wilds-docker-library/
├── toolname/
│   ├── Dockerfile_X.Y.Z        # Specific version
│   ├── Dockerfile_latest       # Most recent version
│   ├── README.md               # Tool documentation
│   └── CVEs_*.md               # Security vulnerability reports
├── template/
│   └── Dockerfile_template     # Template for new contributions
└── .github/
    └── workflows/              # CI/CD automation

Key conventions:

  • Each tool has its own directory (use lowercase names)
  • Dockerfiles use the naming pattern Dockerfile_VERSION (e.g., Dockerfile_1.19, Dockerfile_latest)
  • Every tool directory must include a comprehensive README.md
  • Vulnerability reports are auto-generated by CI/CD workflows
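
The conventions above can be checked mechanically before opening a PR. The following is only a minimal sketch and is not part of the repository: the function name `check_tool_dir` is hypothetical, and it covers just the three conventions a contributor controls (lowercase directory name, at least one `Dockerfile_VERSION` file, and a `README.md`).

```shell
#!/bin/sh
# Hypothetical helper (not part of the WILDS repo): checks one tool directory
# against the layout conventions listed above.
check_tool_dir() {
    dir=${1%/}
    name=$(basename "$dir")
    status=0

    # Directory names should be lowercase.
    case "$name" in
        *[[:upper:]]*) echo "FAIL: '$name' contains uppercase letters"; status=1 ;;
    esac

    # At least one Dockerfile_VERSION file must exist.
    found=0
    for f in "$dir"/Dockerfile_*; do
        if [ -f "$f" ]; then found=1; fi
    done
    if [ "$found" -eq 0 ]; then
        echo "FAIL: no Dockerfile_VERSION file in $dir"; status=1
    fi

    # Every tool directory needs a README.md.
    if [ ! -f "$dir/README.md" ]; then
        echo "FAIL: missing $dir/README.md"; status=1
    fi

    if [ "$status" -eq 0 ]; then echo "OK: $name"; fi
    return "$status"
}
```

Run from the repository root, `check_tool_dir toolname` prints `OK: toolname` when all three checks pass and one `FAIL:` line per problem otherwise.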

Types of Contributions

1. Bug Reports and Issues

  • Use the GitHub Issues page
  • Provide detailed information about the problem
  • Include error messages, Docker/system versions, and steps to reproduce
  • Tag issues appropriately (bug, enhancement, question, etc.)

2. Documentation Improvements

  • Fix typos, improve clarity, or add missing information
  • Enhance README files with better examples
  • Update outdated version information or usage instructions

3. New Docker Images

  • Add new images for bioinformatics tools
  • Follow our standardized Dockerfile structure and naming conventions
  • Include comprehensive testing and documentation

4. Version Updates

  • Add new versions of existing images for updated tools
  • Update existing Dockerfiles to address security vulnerabilities
  • Ensure backward compatibility or document breaking changes

5. Image Optimization

  • Reduce image sizes while maintaining functionality
  • Improve build times and layer caching
  • Add multi-platform support (linux/amd64 and linux/arm64)

Docker Image Development Guidelines

See our template Dockerfile (template/Dockerfile_template) for a comprehensive reference.

Creating a New Tool Directory

  1. Create a directory named after the tool (use lowercase):

    mkdir toolname
    cd toolname
  2. Copy the template:

    cp ../template/Dockerfile_template Dockerfile_VERSION
  3. Customize the Dockerfile following the template's guidance

Dockerfile Requirements

Your Dockerfile must include:

  • Appropriate base image: Choose the minimal base that supports your tool (Ubuntu, Miniforge, Bioconductor, etc.)
  • Complete metadata labels: All required OCI labels with accurate information
  • Shell configuration: SHELL ["/bin/bash", "-o", "pipefail", "-c"] for better error handling
  • Pinned versions: All dependencies should use explicit versions for reproducibility
  • Smoke test: A simple RUN command to verify the tool installed correctly
  • Cleanup commands: Remove temporary files and caches to minimize image size

Required labels (update all fields for your tool):

LABEL org.opencontainers.image.title="toolname"
LABEL org.opencontainers.image.description="Docker image for TOOLNAME in FH DaSL's WILDS"
LABEL org.opencontainers.image.version="X.Y.Z"
LABEL org.opencontainers.image.authors="wilds@fredhutch.org"
LABEL org.opencontainers.image.url="https://ocdo.fredhutch.org/"
LABEL org.opencontainers.image.documentation="https://getwilds.org/"
LABEL org.opencontainers.image.source="https://github.com/getwilds/wilds-docker-library"
LABEL org.opencontainers.image.licenses="MIT"
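
A quick way to confirm that none of the required labels was forgotten is to grep the Dockerfile for each one. The sketch below is illustrative only: `check_dockerfile_metadata` is a hypothetical name, not a repository script, and it assumes one `LABEL` instruction per line as in the list above (a multi-line `LABEL` would be reported as missing).

```shell
#!/bin/sh
# Hypothetical pre-PR check (not part of the WILDS repo): reports any required
# OCI label, or the pipefail SHELL instruction, that is missing from a Dockerfile.
check_dockerfile_metadata() {
    dockerfile=$1
    missing=0

    for key in title description version authors url documentation source licenses; do
        if ! grep -q "^LABEL org\.opencontainers\.image\.$key=" "$dockerfile"; then
            echo "missing label: org.opencontainers.image.$key"
            missing=1
        fi
    done

    # The exact SHELL instruction required above, matched as a fixed string.
    if ! grep -qF 'SHELL ["/bin/bash", "-o", "pipefail", "-c"]' "$dockerfile"; then
        echo "missing: SHELL with pipefail"
        missing=1
    fi

    return "$missing"
}
```

For example, `check_dockerfile_metadata toolname/Dockerfile_latest` prints nothing and returns 0 when every label and the `SHELL` line are present.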

Dockerfile Best Practices

Image Size:

  • Target size: A few hundred MB (2GB maximum)
  • Combine RUN commands to reduce layers
  • Remove build dependencies after compilation
  • Clean package manager caches (rm -rf /var/lib/apt/lists/*, mamba clean -afy, etc.)
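
To compare a built image against the size target above, the byte count reported by Docker can be checked with a small helper. This is a sketch under stated assumptions: `within_size_limit` is a hypothetical name, and it treats the 2GB ceiling as 2,000,000,000 bytes, which the guidelines do not specify.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when an image size in bytes is within the 2GB
# ceiling. Treating 2GB as 2,000,000,000 bytes is an assumption.
within_size_limit() {
    bytes=$1
    limit=2000000000
    mb=$((bytes / 1000000))
    if [ "$bytes" -le "$limit" ]; then
        echo "OK: ${mb} MB"
    else
        echo "TOO LARGE: ${mb} MB exceeds 2000 MB"
        return 1
    fi
}
```

With a locally built image, the byte count can be supplied by `docker image inspect`, for example `within_size_limit "$(docker image inspect --format '{{.Size}}' test-toolname:VERSION)"`.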

Reproducibility:

  • Pin ALL software versions explicitly
  • Avoid latest tags in downloads and dependencies
  • Run apt-cache policy <package> inside the base image to find the current security-patched version of each system package, then pin that version
  • Document the exact source URLs and versions
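
As a rough illustration of the apt-cache policy step, the candidate version can be extracted from the command's output and turned into a pinned install argument. The helper names `candidate_version` and `pin_spec` are hypothetical, and the sketch assumes the usual `Candidate: <version>` line in the output.

```shell
#!/bin/sh
# Hypothetical helper: reads `apt-cache policy <package>` output on stdin and
# prints the candidate version, i.e. the version apt would install right now.
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Turns that output into a pinned install argument of the form "package=version".
# Fails when no installable candidate is reported.
pin_spec() {
    pkg=$1
    version=$(candidate_version)
    [ -n "$version" ] && [ "$version" != "(none)" ] || return 1
    printf '%s=%s\n' "$pkg" "$version"
}
```

One possible use, assuming an `ubuntu:24.04` base image and the `curl` package purely as examples: `docker run --rm ubuntu:24.04 sh -c 'apt-get update -qq && apt-cache policy curl' | pin_spec curl`. The printed `curl=<version>` string can then be passed to `apt-get install` in the Dockerfile.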

Security:

  • Never include secrets, credentials, or sensitive data
  • Use minimal base images when possible
  • Follow the principle of least privilege
  • Let automated workflows scan for vulnerabilities

Platform Support:

  • Build for multi-platform (linux/amd64 and linux/arm64) by default
  • Our CI/CD automatically attempts multi-platform builds
  • If your tool has platform restrictions, document them clearly in the README
  • See existing AMD64-only images (BWA, DESeq2, HISAT2, etc.) for examples

Tool Focus:

  • One primary tool per image (maximum 1-2 closely related tools)
  • Include only necessary dependencies
  • If building a complex workflow, consider multiple separate images

Naming Conventions

Dockerfile naming:

  • Dockerfile_X.Y.Z for specific versions (e.g., Dockerfile_1.19)
  • Dockerfile_latest for the most current version
  • The version in the filename determines the Docker tag automatically

Tool directory naming:

  • Use lowercase
  • Use hyphens for multi-word tools (e.g., sra-tools, combine-counts)
  • Match the common name of the tool

Image tagging (handled automatically by CI/CD):

  • getwilds/toolname:X.Y.Z (from Dockerfile_X.Y.Z)
  • getwilds/toolname:latest (from Dockerfile_latest)
  • Images are pushed to both DockerHub and GitHub Container Registry
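
The filename-to-tag rule can be expressed in a few lines of shell. This is an illustration of the naming convention only; the CI/CD workflows may implement it differently, and `image_tag_for` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical illustration of the naming rule above; the real CI logic may differ.
# Maps a path such as "samtools/Dockerfile_1.19" to "getwilds/samtools:1.19".
image_tag_for() {
    path=$1
    tool=$(basename "$(dirname "$path")")
    file=$(basename "$path")

    case "$file" in
        # Everything after the "Dockerfile_" prefix becomes the tag.
        Dockerfile_?*) version=${file#Dockerfile_} ;;
        *) echo "not a Dockerfile_VERSION file: $file" >&2; return 1 ;;
    esac

    printf 'getwilds/%s:%s\n' "$tool" "$version"
}
```

For example, `image_tag_for samtools/Dockerfile_1.19` prints `getwilds/samtools:1.19`, and `image_tag_for sra-tools/Dockerfile_latest` prints `getwilds/sra-tools:latest`.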

Testing Requirements

Local Testing (Required Before PR)

Before submitting a pull request, test your Docker images locally. You can test manually or use our automated Makefile (recommended).

Option 1: Automated Testing with Makefile (Recommended)

The repository includes a Makefile that automates linting and building for standardized testing. You'll need hadolint installed for linting.

Quick start - see all available commands:

make help

Test a specific image:

# Lint Dockerfiles in a specific tool directory
make lint IMAGE=toolname

# Build for AMD64 only
make build_amd64 IMAGE=toolname

# Build for ARM64 only
make build_arm64 IMAGE=toolname

# Build for both architectures
make build IMAGE=toolname

# Full validation: lint + build for both architectures
make validate IMAGE=toolname

# Clean up built images
make clean IMAGE=toolname

Test all images in the repository:

# Lint all Dockerfiles
make lint

# Build all images for both architectures
make build

# Full validation of all images
make validate

# Clean up all built images
make clean

Notes about the Makefile:

  • The Makefile automatically handles multi-platform builds (AMD64 and ARM64)
  • ARM64 builds skip tools listed in amd64_only_tools.txt
  • Images are tagged as getwilds/toolname:version-amd64 or getwilds/toolname:version-arm64
  • The template directory is automatically skipped
  • When building all images (IMAGE=*), the Makefile automatically prunes build cache and removes images after building to save disk space
  • Built images are labeled with built-by=makefile for easy cleanup
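
The ARM64 skip rule can be pictured as a lookup against amd64_only_tools.txt. The sketch below is not the Makefile's actual logic: `platforms_for` is a hypothetical name, and it assumes the file holds one tool name per line, which is not confirmed here.

```shell
#!/bin/sh
# Hypothetical sketch of the ARM64 skip rule. Assumes amd64_only_tools.txt
# holds one tool name per line; the real file format may differ.
platforms_for() {
    tool=$1
    list=${2:-amd64_only_tools.txt}

    # -F: fixed string, -x: match the whole line, -q: quiet.
    if [ -f "$list" ] && grep -Fxq -- "$tool" "$list"; then
        echo "linux/amd64"
    else
        echo "linux/amd64,linux/arm64"
    fi
}
```

The result can be passed straight to buildx, for example `docker buildx build --platform "$(platforms_for toolname)" -f Dockerfile_VERSION .` from inside the tool directory (with the list path given as the second argument if it is not in the current directory).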

Option 2: Manual Testing

1. Build the image:

cd toolname
docker build -t test-toolname:VERSION -f Dockerfile_VERSION .

2. Verify functionality:

# Test basic functionality (adjust command for your tool)
docker run --rm test-toolname:VERSION toolname --version

# Test with real data (recommended)
docker run --rm -v /path/to/test-data:/data test-toolname:VERSION \
  toolname [appropriate-test-command]

3. Check image size:

docker images test-toolname:VERSION
# Should be a few hundred MB, max 2GB

4. Run security scan (if Docker Scout available):

docker scout cves test-toolname:VERSION

5. Test multi-platform build (optional but recommended):

docker buildx build --platform linux/amd64,linux/arm64 -t test-toolname:VERSION -f Dockerfile_VERSION .

Automated Testing

All contributions must pass our automated CI/CD pipeline, which runs via GitHub Actions:

  • Dockerfile linting: Validates Dockerfile syntax and best practices
  • Multi-platform builds: Attempts to build for both linux/amd64 and linux/arm64
  • Security scanning: Runs Docker Scout to identify vulnerabilities
  • Image publishing: Pushes successfully built images to registries
  • Documentation sync: Updates DockerHub descriptions from README files

The workflows run automatically when:

  • Dockerfiles are modified in a pull request or a push to main
  • The monthly scheduled scan fires (first day of each month)
  • A maintainer triggers a manual workflow dispatch

Documentation Standards

README Requirements

Each tool directory must include a README.md with:

1. Header section:

  • Tool name and description
  • Links to official tool documentation
  • Brief overview of what the tool does

2. Available versions:

  • Table or list of all supported versions
  • Indicate which is the latest version

3. Platform availability (if applicable):

  • Note if the image is AMD64-only or has platform restrictions
  • Explain why (e.g., "Contains x86-specific optimizations")

4. Image locations:

  • DockerHub: docker pull getwilds/toolname:VERSION
  • GHCR: docker pull ghcr.io/getwilds/toolname:VERSION

5. Usage examples:

  • Basic command examples with Docker
  • Basic command examples with Apptainer/Singularity
  • Real-world usage scenarios when possible

6. Installed components:

  • List all major tools and versions in the image
  • Note any additional utilities or dependencies

7. Security information:

  • Link to or mention the vulnerability reports in the directory
  • Note when last scanned

8. Contributing/Support:

  • Link to the contributing guidelines (CONTRIBUTING.md)

README Template Structure

# Tool Name

Brief description of what this tool does.

[Link to official documentation](https://example.com)

## Available Versions

| Tag | Tool Version | Image Size |
|-----|--------------|------------|
| latest | X.Y.Z | XXX MB |
| X.Y.Z | X.Y.Z | XXX MB |

## Platform Availability

Available for: linux/amd64, linux/arm64
(Or note restrictions if AMD64-only)

## Usage

### Docker
```bash
docker pull getwilds/toolname:latest
docker run --rm getwilds/toolname:latest toolname --version
```

### Apptainer/Singularity
```bash
apptainer pull docker://getwilds/toolname:latest
apptainer run toolname_latest.sif toolname --version
```

## Installed Components

- Tool Name: vX.Y.Z
- Dependency1: vA.B.C

## Security

Vulnerability reports are available in this directory as `CVEs_*.md` files.
Images are scanned monthly and on each build.

## Contributing

See the [CONTRIBUTING.md](../.github/CONTRIBUTING.md) for guidelines.
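
A README written from the template above can be checked for its main sections with a short script. This is only a sketch: `check_readme_sections` is a hypothetical name, the heading list mirrors the template, and "Platform Availability" is left out because it applies only to some images.

```shell
#!/bin/sh
# Hypothetical check (not part of the WILDS repo): verifies that a README
# contains the second-level headings used in the template above.
check_readme_sections() {
    readme=$1
    missing=0
    # One required heading per line; read line by line so spaces are preserved.
    while IFS= read -r heading; do
        # -F: fixed string, -x: the heading must be the whole line.
        if ! grep -Fxq -- "## $heading" "$readme"; then
            echo "missing section: ## $heading"
            missing=1
        fi
    done <<'EOF'
Available Versions
Usage
Installed Components
Security
Contributing
EOF
    return "$missing"
}
```

For example, `check_readme_sections toolname/README.md` prints one line per missing heading and returns a non-zero status if any are absent.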

Pull Request Process

After meeting the requirements above, submit a PR to merge the changes from your fork into the main branch of the original repository.

Before Submitting

  1. Test locally: Build and run your Docker image successfully
  2. Review checklist: Complete the Dockerfile best practices checklist in the template
  3. Update documentation: Ensure README.md is complete and accurate
  4. Clean up: Remove any test images, temporary files, or debugging code

PR Submission

  1. Create descriptive PR title:

    • New images: Add [ToolName] Docker image (vX.Y.Z)
    • Updates: Update [ToolName] to vX.Y.Z
    • Fixes: Fix [brief description] in [ToolName]
  2. Fill out PR template:

    • Describe what changed and why
    • List any new dependencies or breaking changes
    • Include testing performed
  3. Link related issues: Reference any GitHub issues your PR addresses

  4. Request reviews: Tag Emma Bishop (@emjbishop) or Taylor Firman (@tefirman)

Review Criteria

Your PR will be evaluated on:

  • Functionality: Does the image work as intended?
  • Testing: Have you tested the build and basic functionality?
  • Documentation: Is the README clear, complete, and accurate?
  • Standards compliance: Does it follow WILDS conventions and best practices?
  • Image quality: Is it appropriately sized and optimized?
  • Security: Are there any obvious security concerns?
  • Uniqueness: Does it avoid duplicating existing functionality?

After Submission

Once your PR is submitted:

  1. Automated workflows will build and test your images
  2. Security scans will run and generate reports
  3. Reviewers will provide feedback
  4. Address any requested changes
  5. Once approved and merged, images are automatically published to DockerHub and GHCR

Help for New Contributors

New contributors are welcome! If you're new to Docker or bioinformatics containers:

  • Start with our template Dockerfile which has extensive comments and examples
  • Review existing tool directories for real-world examples:
    • Simple compiled tool: samtools/
    • Java application: picard/
    • R/Bioconductor: deseq2/
    • Python environment: scanpy/
  • Check out Docker's best practices guide
  • Don't hesitate to ask questions in issues or via email
  • If you have a uw.edu or fredhutch.org email, you can also ask questions in our fh-data Slack workspace
  • Consider starting with documentation contributions or version updates before adding entirely new tools

For further questions, contact the Fred Hutch Data Science Lab at wilds@fredhutch.org.

Code of Conduct

By participating in this project, you agree to abide by our Code of Conduct:

  • Be respectful: Treat all community members with respect and kindness
  • Be collaborative: Work together constructively and help others learn
  • Be inclusive: Welcome contributors from all backgrounds and experience levels
  • Be patient: Remember that everyone is learning and growing

Reporting Issues

If you experience or witness unacceptable behavior, please report it to wilds@fredhutch.org.

License

By contributing to this project, you agree that your contributions will be licensed under the MIT License. See the LICENSE file for details.


Thank you for contributing to WILDS! Your contributions help advance reproducible bioinformatics research for the entire community.