Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.
This ToolUniverse skill enables systematic identification of causal variants from GWAS data using:
- Statistical fine-mapping (SuSiE, FINEMAP, etc.) to compute posterior probabilities
- Locus-to-gene (L2G) predictions to link variants to their likely causal genes
- Functional annotations from GWAS Catalog and Open Targets Genetics
- Integration of multiple data sources for comprehensive variant prioritization
from python_implementation import prioritize_causal_variants
# Prioritize variants in TCF7L2 for diabetes
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
print(result.get_summary())
# Output:
# Query Gene: TCF7L2
# Credible Sets Found: 8
# Top Causal Genes:
# - TCF7L2 (L2G score: 0.863)

- SKILL.md: Complete skill documentation with concepts, workflows, and examples
- QUICK_START.md: 5-minute getting started guide
- SKILL_TESTING_REPORT.md: Comprehensive testing results (100% pass rate)
Rank variants by posterior probability of being causal:
result = prioritize_causal_variants("APOE", "alzheimer")
for cs in result.credible_sets:
    print(f"{cs.trait}: {cs.lead_variant.rs_ids[0]}")
    print(f" Method: {cs.finemapping_method}")
    # Guard against credible sets that have no L2G gene predictions
    print(f" Top gene: {cs.l2g_genes[0] if cs.l2g_genes else 'No gene'}")

Get all fine-mapped loci from a GWAS:
from python_implementation import get_credible_sets_for_study
credible_sets = get_credible_sets_for_study("GCST90029024")
print(f"Found {len(credible_sets)} independent loci")

Find relevant GWAS studies:
from python_implementation import search_gwas_studies_for_disease
studies = search_gwas_studies_for_disease("type 2 diabetes", "MONDO_0005148")
for study in studies:
    print(f"{study['id']}: {study.get('nSamples', 'N/A')} samples")

Get experimental validation suggestions:
suggestions = result.get_validation_suggestions()
# Suggests: CRISPR assays, eQTL analysis, colocalization, replication

- Locus Prioritization: "Which variant at this locus is causal?"
- Gene Discovery: "Which genes does this variant affect?"
- Study Exploration: "What are all the T2D risk loci?"
- Validation Planning: "How should we experimentally validate this?"
- Meta-Analysis: "Compare fine-mapping across multiple studies"
✓ 100% Test Pass Rate (10/10 comprehensive tests)
Tested with real-world examples:
- APOE and Alzheimer's disease
- TCF7L2 and type 2 diabetes
- FTO and obesity
See SKILL_TESTING_REPORT.md for details.
- Get variant info and credible sets
- Study-level credible set queries
- Disease-based study search
- L2G predictions
- SNP search by gene/rsID
- Association queries
- Study metadata
No API keys required - all data is public.
pip install tooluniverse

- python_implementation.py: Main Python SDK implementation
- SKILL.md: Complete documentation
- QUICK_START.md: Quick reference
- test_skill_comprehensive.py: Test suite (10 tests)
- SKILL_TESTING_REPORT.md: Testing report
- README.md: This file

- Python 3.8+
- ToolUniverse >= 1.0.0
- No API keys needed
from python_implementation import (
    search_gwas_studies_for_disease,
    get_credible_sets_for_study,
    prioritize_causal_variants,
)
# Step 1: Find T2D studies
studies = search_gwas_studies_for_disease("type 2 diabetes", "MONDO_0005148")
largest = max(studies, key=lambda s: s.get('nSamples', 0) or 0)
# Step 2: Get all loci
credible_sets = get_credible_sets_for_study(largest['id'])
# Step 3: Prioritize TCF7L2 variants
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
# Step 4: Generate report
print(result.get_summary())
for suggestion in result.get_validation_suggestions():
    print(suggestion)

# Start with a known variant
result = prioritize_causal_variants("rs429358") # APOE4
# Check all traits
print(f"Associated with {len(set(result.associated_traits))} traits")
# Find credible sets
for cs in result.credible_sets:
    print(f"{cs.trait}: {cs.l2g_genes[0] if cs.l2g_genes else 'No gene'}")

A credible set is the smallest set of variants that contains the causal variant with a stated probability, typically 95% or 99%. Each variant in the set has a posterior probability of causality.
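As a rough illustration of how a credible set can be assembled from per-variant posterior probabilities, the sketch below ranks variants and keeps adding them until the requested coverage is reached. The function name `build_credible_set`, the variant labels, and the probability values are hypothetical and are not part of this skill's API or its results.

```python
from typing import Dict, List, Tuple


def build_credible_set(
    pips: Dict[str, float], coverage: float = 0.95
) -> List[Tuple[str, float]]:
    """Return the smallest set of variants whose summed posterior reaches `coverage`.

    Hypothetical helper for illustration; not a function of this skill.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")

    # Rank variants from most to least likely to be causal.
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)

    selected: List[Tuple[str, float]] = []
    cumulative = 0.0
    for variant, pip in ranked:
        selected.append((variant, pip))
        cumulative += pip
        if cumulative >= coverage:
            break
    return selected


# Made-up posterior probabilities for one locus (illustrative only).
example_pips = {
    "variant_a": 0.62,
    "variant_b": 0.25,
    "variant_c": 0.09,
    "variant_d": 0.04,
}
credible_set = build_credible_set(example_pips, coverage=0.95)
print([variant for variant, _ in credible_set])
```

If the posteriors at a locus sum to less than the requested coverage, this sketch simply returns every variant; a real pipeline would likely flag that case.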
The posterior probability (often reported as a posterior inclusion probability, PIP) is the probability that a specific variant is causal, given the GWAS data and the LD structure. Higher values indicate stronger candidates.
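To make the idea concrete, here is a minimal sketch of one simple way to turn summary statistics into posterior probabilities: Wakefield's approximate Bayes factor, combined under the assumption of exactly one causal variant per locus and equal prior weight for every variant. This is a deliberately simplified stand-in and not what SuSiE or FINEMAP do internally, since those methods model LD and allow multiple causal variants. The function names, the prior standard deviation of 0.2, and the effect sizes are assumptions chosen for illustration.

```python
import math
from typing import Dict, Tuple


def log_approx_bayes_factor(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log of Wakefield's approximate Bayes factor (alternative vs. null).

    `prior_sd` is the assumed standard deviation of the true effect size;
    0.2 is an illustrative default, not a recommendation.
    """
    z = beta / se
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # prior variance of the true effect
    r = w / (v + w)
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def single_causal_pips(stats: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Posterior probabilities assuming exactly one causal variant at the locus."""
    log_bfs = {v: log_approx_bayes_factor(b, s) for v, (b, s) in stats.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_bfs.values())
    weights = {v: math.exp(lbf - top) for v, lbf in log_bfs.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


# Made-up (beta, standard error) pairs for three variants at one locus.
example_stats = {
    "variant_a": (0.10, 0.02),
    "variant_b": (0.08, 0.02),
    "variant_c": (0.02, 0.02),
}
pips = single_causal_pips(example_stats)
for variant, pip in sorted(pips.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{variant}: {pip:.3f}")
```

Because the probabilities are normalized within the locus, they always sum to 1 under this single-causal-variant assumption, which is one reason the approach can be overconfident where several independent signals exist.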
The locus-to-gene (L2G) score ranges from 0 to 1 and integrates distance, eQTLs, chromatin interactions, and functional annotations. Higher scores indicate stronger gene-variant links.
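One way such scores might be used downstream is to pick the best-supported gene at a locus and discard weak predictions. The sketch below assumes the scores are available as a plain gene-to-score mapping; the helper name `top_l2g_gene`, the 0.5 cutoff, and the placeholder genes `GENE_B` and `GENE_C` are hypothetical. Only the TCF7L2 score of 0.863 comes from the example output earlier in this README.

```python
from typing import Dict, Optional, Tuple


def top_l2g_gene(
    scores: Dict[str, float], min_score: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return the highest-scoring gene, or None if nothing passes `min_score`.

    Hypothetical helper; the 0.5 cutoff is an assumed, illustrative threshold.
    """
    if not scores:
        return None
    gene, score = max(scores.items(), key=lambda kv: kv[1])
    return (gene, score) if score >= min_score else None


# TCF7L2 score taken from the README example; other entries are placeholders.
example_scores = {"TCF7L2": 0.863, "GENE_B": 0.12, "GENE_C": 0.05}
print(top_l2g_gene(example_scores))
```

Returning `None` rather than a low-scoring gene keeps weak predictions from being reported as if they were confident assignments.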
- SuSiE: Handles multiple causal variants per locus
- FINEMAP: Fast Bayesian stochastic search
- PAINTOR: Integrates functional annotations
Results validated against known biology:
- rs429358 (APOE4) → High L2G for APOE, Alzheimer's association ✓
- rs7903146 (TCF7L2) → Strong diabetes association, TCF7L2 top gene ✓
- rs9939609 (FTO) → BMI/obesity traits, intronic variant ✓
This skill follows the ToolUniverse skill creation workflow:
- Domain analysis
- Tool testing
- Implementation
- Documentation
- Comprehensive testing
MIT License - see ToolUniverse main repository
If you use this skill, please cite:
- Open Targets Genetics: Ghoussaini et al. (2021) Nucleic Acids Research
- GWAS Catalog: Sollis et al. (2023) Nucleic Acids Research
- SuSiE: Wang et al. (2020) JRSS-B
- L2G method: Mountjoy et al. (2021) Nature Genetics
- Documentation: See SKILL.md
- Issues: Create GitHub issue in ToolUniverse repository
- Questions: Check QUICK_START.md
- Initial release
- 10 comprehensive tests (100% pass rate)
- Support for gene and variant queries
- Study-level analysis
- Validation suggestions
- Complete documentation