Commit 7608055

Add FermiLink skills for gwaslab
1 parent 6e2527a commit 7608055

File tree

33 files changed

+1915
-0
lines changed

skills/.compile_report.json

Lines changed: 472 additions & 0 deletions
Large diffs are not rendered by default.

skills/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
.evidence/
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
---
2+
name: gwaslab-advanced-topics
3+
description: Use this skill for consolidated low-frequency GWASLab topics (reference-data management, VCF/FASTA/GTF IO, reserved headers, ancestry inference, update logs, and utility docs) that were merged to avoid one-doc routing fragmentation.
4+
---
5+
6+
# gwaslab: Advanced Topics
7+
8+
## High-Signal Playbook
9+
### Route Conditions
10+
- Use this skill for specialized references/utilities not covered by core workflow skills.
11+
- Route to `gwaslab-simulation-workflows` for end-to-end harmonization pipelines.
12+
- Route to `gwaslab-inputs-and-modeling` for generic format/header mapping.
13+
14+
### Canonical Workflow
15+
1. Open the exact topic doc first (`Download`, `Reference`, VCF/FASTA/GTF, `InferAncestry`, `reserved_header`).
16+
2. Run the smallest reproducible check using local test/reference data.
17+
3. Escalate to `references/source_map.md` only for implementation-level behavior.
18+
19+
### Minimal Checks
20+
```python
import gwaslab as gl

# List the reference datasets GWASLab knows about, then resolve one key to a local path.
print(gl.check_available_ref(show_all=False, verbose=False))
print(gl.get_path("1kg_eas_hg19"))
```
26+
27+
```bash
pytest -q test/test_read_fasta_chromosome_formats.py test/test_read_gtf_chromosome_formats.py
```
30+
31+
### Validation Checkpoints
32+
- Reference key lookup returns a non-empty path or a clear missing-reference message.
33+
- FASTA/GTF/VCF readers load expected chromosome naming styles.
34+
- Reserved-header checks match expected column contracts.
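The first checkpoint above can be wrapped in a small helper. This is a minimal sketch, not part of GWASLab: `describe_reference` is a hypothetical name, the `lookup` argument stands in for a callable such as `gl.get_path`, and the path in `fake_paths` is a placeholder. How `gl.get_path` signals a missing key is not stated here, so the sketch assumes that either an exception or an empty return value means "missing".

```python
from typing import Callable, Optional


def describe_reference(key: str, lookup: Callable[[str], Optional[str]]) -> str:
    """Return the resolved path for `key`, or a clear missing-reference message."""
    try:
        path = lookup(key)
    except Exception as exc:  # assumption: a failed lookup may raise
        return f"missing reference '{key}': {exc}"
    if not path:  # assumption: a failed lookup may also return None, False, or ""
        return f"missing reference '{key}': lookup returned no path"
    return str(path)


# Stand-in lookup for demonstration (placeholder path); in practice pass gl.get_path.
fake_paths = {"1kg_eas_hg19": "/data/ref/1kg_eas_hg19.vcf.gz"}
print(describe_reference("1kg_eas_hg19", fake_paths.get))
print(describe_reference("not_a_key", fake_paths.get))
```

Passing the lookup as an argument keeps the helper testable without any downloaded reference data.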
35+
36+
## Scope
37+
- Consolidated routing for specialized topics that each had narrow single-doc coverage.
38+
- Keeps uncommon but important reference and utility content discoverable without fragmented routing.
39+
40+
## Route the request
41+
- Reference asset catalogs/download paths/configuration -> `docs/Download.md`, `docs/Download_reference.md`, `docs/Reference.md`, `docs/CommonData.md`
42+
- File-format internals (VCF/FASTA/GTF) -> `docs/VCF.md`, `docs/FASTA.md`, `docs/GTF.md`
43+
- Data model/header contracts -> `docs/SumstatsObject.md`, `docs/reserved_header.md`
44+
- Population and panel utilities -> `docs/InferAncestry.md`, `docs/Hapmap3.md`
45+
- Lead/novel utility details -> `docs/utility_get_lead_novel.md`
46+
- Release and method reference context -> `docs/UpdateLogs.md`, `docs/PaperReference.md`
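The routing list above can be expressed as a lookup table. The sketch below is illustrative only: the doc paths are copied from the list, while the keyword tuples and the `route` function are hypothetical choices made for this example, not something the skill defines.

```python
from typing import Dict, List, Tuple

# Keywords (hypothetical) -> documentation paths (copied from the routing list).
ROUTES: Dict[Tuple[str, ...], List[str]] = {
    ("download", "reference", "configuration"): [
        "docs/Download.md", "docs/Download_reference.md",
        "docs/Reference.md", "docs/CommonData.md",
    ],
    ("vcf", "fasta", "gtf"): ["docs/VCF.md", "docs/FASTA.md", "docs/GTF.md"],
    ("header", "sumstats object"): ["docs/SumstatsObject.md", "docs/reserved_header.md"],
    ("ancestry", "hapmap3"): ["docs/InferAncestry.md", "docs/Hapmap3.md"],
    ("lead", "novel"): ["docs/utility_get_lead_novel.md"],
    ("update log", "release", "paper"): ["docs/UpdateLogs.md", "docs/PaperReference.md"],
}


def route(request: str) -> List[str]:
    """Return docs whose keywords appear in the request (case-insensitive)."""
    text = request.lower()
    docs: List[str] = []
    for keywords, paths in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            docs.extend(p for p in paths if p not in docs)
    return docs


print(route("How do I read a bgzipped VCF?"))
print(route("Which headers are reserved?"))
```

Substring matching is deliberately crude; an empty result means the request should fall back to `references/doc_map.md`.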
47+
48+
## Workflow
49+
- Start with the exact doc matching the specialized request.
50+
- Use `references/doc_map.md` for full inventory and headings.
51+
- Escalate to `references/source_map.md` only when behavior details are not explicit in docs.
52+
- Cite exact documentation paths.
53+
54+
## Source entry points for unresolved issues
55+
- `src/gwaslab/bd/bd_download.py`
56+
- `src/gwaslab/bd/bd_common_data.py`
57+
- `src/gwaslab/bd/bd_get_hapmap3.py`
58+
- `src/gwaslab/io/io_vcf.py`
59+
- `src/gwaslab/io/io_fasta.py`
60+
- `src/gwaslab/io/io_gtf.py`
61+
- `src/gwaslab/qc/qc_reserved_headers.py`
62+
- `src/gwaslab/util/util_ex_infer_ancestry.py`
63+
- `src/gwaslab/util/util_in_get_sig.py`
64+
- `src/gwaslab/g_Sumstats.py`
65+
- Prefer targeted search: `rg -n "<symbol_or_keyword>" src/gwaslab`
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
# gwaslab documentation map: Advanced Topics
2+
3+
Generated from documentation roots:
4+
- `docs`
5+
- `examples`
6+
- `test`
7+
8+
Total docs grouped in this topic: 14
9+
10+
## File inventory
11+
- `docs/CommonData.md` | title: Commonly used data in GWASLab | headings: Commonly used data in GWASLab; Chromosome notation conversion dictionary; Full chromosome list
12+
- `docs/Download.md` | title: Download reference data | headings: Download reference data; Downloading and file management system; Configurations
13+
- `docs/Download_reference.md` | title: Download reference data | headings: Download reference data; Check available reference data; Check downloaded reference data
14+
- `docs/FASTA.md` | title: FASTA I/O in GWASLab | headings: FASTA I/O in GWASLab; Supported File Formats; Reading FASTA Files
15+
- `docs/GTF.md` | title: GTF I/O in GWASLab | headings: GTF I/O in GWASLab; Supported File Formats; Reading GTF Files
16+
- `docs/Hapmap3.md` | title: Hapmap3 SNPs in GWASLab | headings: Hapmap3 SNPs in GWASLab
17+
- `docs/InferAncestry.md` | title: Infer Ancestry | headings: Infer Ancestry; How It Works; .infer_ancestry()
18+
- `docs/Reference.md` | title: Reference data for handling Sumstats | headings: Reference data for handling Sumstats; Reference genome sequence for variant allele alignment; Processed Reference files for harmonization
19+
- `docs/reserved_header.md` | title: Reserved headers | headings: Reserved headers
20+
- `docs/SumstatsObject.md` | title: Sumstats Object in GWASLab | headings: Sumstats Object in GWASLab; gl.Sumstats(); Options
21+
- `docs/PaperReference.md` | title: Academic Paper References | headings: Academic Paper References; GWASLab; Methods Implemented
22+
- `docs/UpdateLogs.md` | title: Update Logs | headings: Update Logs; v4.1.1 20260208; v4.1.0 20260205
23+
- `docs/utility_get_lead_novel.md` | title: Lead and novel variants | headings: Lead and novel variants; Load sample data; Get lead variants
24+
- `docs/VCF.md` | title: VCF I/O in GWASLab | headings: VCF I/O in GWASLab; Supported File Formats; VCF File Detection and Chromosome Handling
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
# gwaslab source map: Advanced Topics
2+
3+
Generated from source roots:
4+
- `src`
5+
6+
Use this map only after exhausting the topic docs in `doc_map.md`.
7+
8+
## Topic query tokens
9+
- `download_ref`
10+
- `check_available_ref`
11+
- `get_path`
12+
- `vcf`
13+
- `fasta`
14+
- `gtf`
15+
- `reserved_header`
16+
- `infer_ancestry`
17+
- `hapmap3`
18+
- `get_lead`
19+
- `get_novel`
20+
21+
## Fast source navigation
22+
- `rg -n "<symbol_or_keyword>" src/gwaslab`
23+
- `rg -n "^(def|class)\s+" src/gwaslab/bd/bd_download.py src/gwaslab/io/io_vcf.py src/gwaslab/io/io_fasta.py src/gwaslab/io/io_gtf.py`
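Where `rg` is not installed, the second command above can be approximated in Python. This is a rough sketch under that assumption: `find_definitions` and `list_definitions` are hypothetical helper names, and only top-level `def`/`class` lines are matched, mirroring the `^(def|class)\s+` pattern.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

DEF_RE = re.compile(r"^(def|class)\s+(\w+)")


def find_definitions(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, name) for each top-level def/class in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DEF_RE.match(line)
        if match:
            hits.append((lineno, match.group(2)))
    return hits


def list_definitions(paths: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Apply find_definitions to each existing file; missing files are skipped."""
    results = []
    for path in paths:
        file = Path(path)
        if not file.is_file():
            continue
        for lineno, name in find_definitions(file.read_text(encoding="utf-8")):
            results.append((str(file), lineno, name))
    return results


# Paths copied from the rg command above; nothing prints if they are absent.
for path, lineno, name in list_definitions([
    "src/gwaslab/bd/bd_download.py", "src/gwaslab/io/io_vcf.py",
    "src/gwaslab/io/io_fasta.py", "src/gwaslab/io/io_gtf.py",
]):
    print(f"{path}:{lineno}:{name}")
```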
24+
25+
## Suggested source entry points
26+
- `src/gwaslab/bd/bd_download.py` | key functions: `check_available_ref`, `check_downloaded_ref`, `get_path`, `download_ref`
27+
- `src/gwaslab/bd/bd_common_data.py` | key functions: `get_chr_to_NC`, `get_recombination_rate`, `get_chain`, `get_format_dict`
28+
- `src/gwaslab/bd/bd_get_hapmap3.py` | key function: `_get_hapmap3`
29+
- `src/gwaslab/io/io_vcf.py` | key functions: `is_vcf_file`, `check_vcf_chr_prefix`, `_get_ld_matrix_from_vcf`
30+
- `src/gwaslab/io/io_fasta.py` | key functions: `parse_fasta`, `load_fasta_auto`, `load_fasta_filtered`, `get_fasta_record`
31+
- `src/gwaslab/io/io_gtf.py` | key functions: `read_gtf`, `get_gtf`, `gtf_to_protein_coding`, `gtf_to_all_gene`
32+
- `src/gwaslab/qc/qc_reserved_headers.py` | key functions: `get_default_sanity_ranges`, `_get_headers`, `_check_overlap_with_reserved_keys`
33+
- `src/gwaslab/util/util_ex_infer_ancestry.py` | key functions: `_infer_ancestry`, `calculate_fst`
34+
- `src/gwaslab/util/util_in_get_sig.py` | key functions: `_get_sig`, `_get_novel`, `_check_novel_set`
35+
- `src/gwaslab/g_Sumstats.py` | key methods: `infer_ancestry`, `filter_hapmap3`, `get_lead`, `get_novel`
36+
37+
## Function-level behavior checks
38+
- `pytest -q test/test_path_manager.py test/test_infer_build_and_hapmap3.py`
39+
- `pytest -q test/test_read_fasta_chromosome_formats.py test/test_read_gtf_chromosome_formats.py`
40+
- `pytest -q test/test_get_lead_top.py test/test_get_novel.py`
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
1+
---
2+
name: gwaslab-analysis-and-output
3+
description: Use this skill for GWASLab visualization and reporting outputs, including mqq/regional/stacked/miami workflows, lead-variant annotations, and export/report patterns.
4+
---
5+
6+
# gwaslab: Analysis and Output
7+
8+
## High-Signal Playbook
9+
### Route Conditions
10+
- Use this skill for plot generation, output tuning, and report-ready artifacts.
11+
- Route to `gwaslab-getting-started` if data is not yet loadable/QCed.
12+
- Route to `gwaslab-simulation-workflows` for harmonization or clumping prerequisites.
13+
- Route to `gwaslab-inputs-and-modeling` for format-conversion-only requests.
14+
15+
### Triage Questions
16+
- Which figure is required: `mqq`, `regional`, `stacked`, `miami`, correlation heatmap?
17+
- Are input objects already QCed and build-tagged?
18+
- Is region-specific LD annotation needed (VCF/recombination/GTF)?
19+
- Are there one or two cohorts (for Miami/stacked comparisons)?
20+
- Is the issue styling/parameter tuning or plotting failure?
21+
- What final artifact is needed: figure only, lead table, or formatted output package?
22+
23+
### Canonical Workflow
24+
1. Confirm input data quality and build before plotting (`docs/tutorial_v4.md`, `docs/visualization_mqq.md`).
25+
2. Generate a baseline `plot_mqq()` and set `skip` and `cut` to control y-axis scale and plotting performance (`docs/visualization_mqq.md`).
26+
3. For locus detail, use `plot_mqq(mode="r", region=...)` with optional LD/gene tracks (`docs/visualization_regional.md`).
27+
4. For comparisons, use `gl.plot_stacked_mqq()` or `gl.plot_miami2()` with `Sumstats` objects (`docs/visualization_stacked_mqq.md`, `docs/visualization_miami2.md`).
28+
5. Extract/annotate lead variants with `get_lead()` and export with `to_format()` (`docs/utility_get_lead_novel.md`, `docs/Format.md`).
29+
6. Cross-check known plotting edge cases in `KnownIssues` when outputs look inconsistent (`docs/KnownIssues.md`).
30+
31+
### Minimal Working Example
32+
```python
import gwaslab as gl

# Load the sumstats and run basic QC before plotting.
s = gl.Sumstats("sumstats.tsv.gz", fmt="auto", build="19")
s.basic_check(verbose=False)

# Genome-wide Manhattan/Q-Q plot, then a regional plot for a single locus.
s.plot_mqq(skip=2, cut=20)
s.plot_mqq(mode="r", region=(7, 156538803, 157538803))
# Extract lead variants and export the checked data.
lead = s.get_lead(sig_level=5e-8)
s.to_format("analysis_ready", fmt="gwaslab")
```
43+
44+
### Pitfalls and Fixes
45+
- `gl.plot_miami2()` called with file paths: pass `Sumstats` objects for `path1` and `path2` (`docs/visualization_miami2.md`).
46+
- Regional plots missing LD context: provide reference VCF and matching build (`docs/visualization_regional.md`).
47+
- Plotting full datasets is too slow/noisy: increase `skip` and use `cut`/`mode` adjustments (`docs/visualization_mqq.md`).
48+
- Unexpected lead extraction in older versions: upgrade to the latest package version and check the `KnownIssues` notes (`docs/KnownIssues.md`).
49+
- Export formatting confusion after plotting: use explicit `to_format()` options and naming rules (`docs/Format.md`).
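To see why `skip` and `cut` help with slow or noisy plots, the pure-Python sketch below mimics their effect on point heights. It rests on assumptions: that `skip` drops variants whose -log10(P) falls below the threshold, and that `cut` compresses values above the threshold by a shrink factor (the `cutfactor` argument and its default of 10 are part of this sketch). Confirm the exact semantics in `docs/visualization_mqq.md` before relying on it; `apply_skip_cut` is a hypothetical helper, not a GWASLab function.

```python
import math
from typing import Iterable, List


def apply_skip_cut(pvalues: Iterable[float], skip: float = 0.0,
                   cut: float = 0.0, cutfactor: float = 10.0) -> List[float]:
    """Return plotted -log10(P) heights after skip filtering and cut shrinkage.

    Assumes every p-value satisfies 0 < p <= 1.
    """
    heights = []
    for p in pvalues:
        mlog10p = -math.log10(p)
        if mlog10p < skip:
            continue  # assumed skip behavior: weak signals are not drawn at all
        if cut > 0 and mlog10p > cut:
            # assumed cut behavior: compress the part of the peak above `cut`
            mlog10p = cut + (mlog10p - cut) / cutfactor
        heights.append(mlog10p)
    return heights


print(apply_skip_cut([0.5, 1e-3, 1e-8, 1e-120], skip=2, cut=20))
```

Under these assumptions, raising `skip` reduces the number of points drawn, while `cut` keeps one extreme peak from flattening the rest of the plot.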
50+
51+
### Convergence and Validation Checks
52+
- Figures render with expected panel/layout mode and no build mismatch warnings.
53+
- Lead-variant counts are consistent with chosen significance/window settings.
54+
- Annotation labels and highlighted loci map to expected SNPs/genes.
55+
- Exported tables/files reopen cleanly and preserve plotted variant identifiers.
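The second check, that lead-variant counts track the significance and window settings, can be reasoned about with a toy model. The sketch below is a simplified greedy selection written for illustration; it is not GWASLab's `get_lead()` implementation, and `Variant`, `pick_leads`, and `window_kb` are hypothetical names. The variants are made-up toy data.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    snpid: str
    chrom: int
    pos: int
    p: float


def pick_leads(variants: Iterable[Variant], sig_level: float = 5e-8,
               window_kb: int = 500) -> List[Variant]:
    """Greedy sketch: take the strongest variant, drop neighbours within the window."""
    window = window_kb * 1000
    leads: List[Variant] = []
    significant = sorted((v for v in variants if v.p < sig_level), key=lambda v: v.p)
    for v in significant:
        if all(v.chrom != lead.chrom or abs(v.pos - lead.pos) > window for lead in leads):
            leads.append(v)
    return sorted(leads, key=lambda v: (v.chrom, v.pos))


toy = [
    Variant("rs1", 1, 1_000_000, 1e-10),
    Variant("rs2", 1, 1_200_000, 1e-12),  # same locus as rs1, stronger signal
    Variant("rs3", 1, 5_000_000, 1e-9),
    Variant("rs4", 2, 1_000_000, 1e-6),   # not genome-wide significant
]
print([v.snpid for v in pick_leads(toy)])
print([v.snpid for v in pick_leads(toy, window_kb=5000)])
```

Widening the window merges nearby loci, so a lower lead count after increasing the window is expected rather than a sign of failure.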
56+
57+
## Scope
58+
- Visualization and output generation from QC-ready GWAS summary statistics.
59+
- Includes parameter tuning and output packaging.
60+
61+
## Primary documentation references
62+
- `docs/visualization_mqq.md`
63+
- `docs/visualization_regional.md`
64+
- `docs/visualization_stacked_mqq.md`
65+
- `docs/visualization_miami2.md`
66+
- `docs/visualization_plot_genetic_correlation.md`
67+
- `docs/Format.md`
68+
- `docs/format_load_save.md`
69+
- `docs/KnownIssues.md`
70+
71+
## Workflow
72+
- Start with docs examples for the requested plot mode.
73+
- Use `references/doc_map.md` for wider option inventory.
74+
- Escalate to source maps only when parameter behavior is unclear.
75+
76+
## Source entry points for unresolved issues
77+
- `src/gwaslab/viz/viz_plot_mqqplot.py`
78+
- `src/gwaslab/viz/viz_plot_regional2.py`
79+
- `src/gwaslab/viz/viz_plot_stackedregional.py`
80+
- `src/gwaslab/viz/viz_plot_stackedpanel.py`
81+
- `src/gwaslab/viz/viz_plot_miamiplot2.py`
82+
- `src/gwaslab/viz/viz_plot_rg_heatmap.py`
83+
- `src/gwaslab/viz/viz_aux_quickfix.py`
84+
- `src/gwaslab/viz/viz_aux_save_figure.py`
85+
- `src/gwaslab/util/util_in_get_sig.py`
86+
- `src/gwaslab/io/io_to_formats.py`
87+
- Prefer targeted search: `rg -n "<symbol_or_keyword>" src/gwaslab/viz src/gwaslab/util src/gwaslab/io`
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
# gwaslab documentation map: Analysis and Output
2+
3+
Generated from documentation roots:
4+
- `docs`
5+
- `examples`
6+
- `test`
7+
8+
Total docs grouped in this topic: 12
9+
10+
## File inventory
11+
- `docs/Visualization.md` | title: Manhattan plot and QQ plot | headings: .plot_mqq(); options
12+
- `docs/visualization_mqq.md` | title: Manhattan and Q-Q plot | headings: Load data into Sumstats Object; .plot_mqq()
13+
- `docs/visualization_regional.md` | title: Regional plot | headings: region mode; LD annotation
14+
- `docs/visualization_stacked_mqq.md` | title: Stacked Manhattan and regional plot | headings: gl.plot_stacked_mqq()
15+
- `docs/visualization_miami2.md` | title: Miami plot | headings: gl.plot_miami2(); options
16+
- `docs/visualization_plot_genetic_correlation.md` | title: Correlation heatmap | headings: Full heatmap; selected traits
17+
- `docs/EffectSize.md` | title: Comparing effect sizes | headings: gl.compare_effect(); scatter options
18+
- `docs/Format.md` | title: Output sumstats in certain formats | headings: .to_format(); options
19+
- `docs/format_load_save.md` | title: Input and output sumstats | headings: output handling
20+
- `docs/utility_get_lead_novel.md` | title: Lead and novel variants | headings: get_lead; get_novel
21+
- `docs/KnownIssues.md` | title: Known issues | headings: plotting and lead extraction caveats
22+
- `docs/Gallery.md` | title: GWASLab Gallery | headings: Manhattan/Q-Q; regional; other examples
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
# gwaslab source map: Analysis and Output
2+
3+
Generated from source roots:
4+
- `src`
5+
6+
Use this map only after exhausting the topic docs in `doc_map.md`.
7+
8+
## Topic query tokens
9+
- `plot_mqq`
10+
- `plot_region`
11+
- `plot_stacked_mqq`
12+
- `plot_miami2`
13+
- `plot_rg`
14+
- `compare_effect`
15+
- `get_lead`
16+
- `get_novel`
17+
- `to_format`
18+
19+
## Fast source navigation
20+
- `rg -n "<symbol_or_keyword>" src/gwaslab/viz src/gwaslab/util src/gwaslab/io`
21+
- `rg -n "^\s+def\s+(plot_mqq|plot_region|get_lead|get_novel|to_format)" src/gwaslab/g_Sumstats.py`
22+
23+
## Suggested source entry points
24+
- `src/gwaslab/g_Sumstats.py` | key methods: `plot_mqq`, `plot_region`, `get_lead`, `get_novel`, `to_format`
25+
- `src/gwaslab/viz/viz_plot_mqqplot.py` | key function: `_mqqplot`
26+
- `src/gwaslab/viz/viz_plot_regional2.py` | key functions: `_plot_regional`, `process_vcf`, `process_ld`, `process_gtf`
27+
- `src/gwaslab/viz/viz_plot_stackedregional.py` | key function: `plot_stacked_mqq`
28+
- `src/gwaslab/viz/viz_plot_miamiplot2.py` | key function: `plot_miami2`
29+
- `src/gwaslab/viz/viz_plot_rg_heatmap.py` | key function: `plot_rg`
30+
- `src/gwaslab/viz/viz_plot_compare_effect.py` | key function: `compare_effect`
31+
- `src/gwaslab/viz/viz_aux_save_figure.py` | key functions: `save_figure`, `get_default_path`
32+
- `src/gwaslab/util/util_in_get_sig.py` | key functions: `_get_sig`, `_get_novel`, `_anno_gene`
33+
- `src/gwaslab/io/io_to_formats.py` | key functions: `_to_format`, `_write_tabular`
34+
- `src/gwaslab/view/view_report.py` | key function: `generate_qc_report`
35+
36+
## Function-level behavior checks
37+
- `pytest -q test/test_viz_mqqplot.py test/test_viz_related_plots.py test/test_viz_panel.py`
38+
- `pytest -q test/test_get_lead_top.py test/test_get_novel.py`
39+
- `pytest -q test/test_read_ldsc_plot_rg.py test/test_viz_compare_effect.py`
40+
- `pytest -q test/test_io_to_formats.py test/test_report.py`
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
---
2+
name: gwaslab-api-and-scripting
3+
description: Use this skill for GWASLab Sumstats API lifecycle questions, command-line automation, and Python/CLI interoperability in scripted pipelines.
4+
---
5+
6+
# gwaslab: API and Scripting
7+
8+
## High-Signal Playbook
9+
### Route Conditions
10+
- Use this skill for API/CLI method mapping, batch automation, and script reproducibility.
11+
- Route to `gwaslab-getting-started` for first-run basics.
12+
- Route to `gwaslab-inputs-and-modeling` for schema and export-only questions.
13+
- Route to `gwaslab-simulation-workflows` for harmonization/clumping/downstream pipeline design.
14+
15+
### Canonical Workflow
16+
1. Load with `gl.Sumstats(...)` and run `basic_check()` when needed (`docs/SumstatsObject.md`, `docs/CLI.md`).
17+
2. Apply analysis/action methods (`harmonize`, `get_lead`, `to_format`) in deterministic order.
18+
3. Convert the same flow to CLI for batch reproducibility (`docs/CLIWorkflowExamples.md`).
19+
4. Validate API/CLI parity on a small subset before full runs.
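Step 3 can be scripted by building the CLI command as an argument list. The sketch below uses only the flags that appear in this skill's own CLI example (`--input`, `--fmt`, `--qc`, `--remove`, `--remove-dup`, `--out`, `--to-fmt`); `build_qc_command` is a hypothetical helper, and the commands are printed rather than executed.

```python
import shlex
from typing import List


def build_qc_command(input_path: str, out_prefix: str,
                     fmt: str = "gwaslab", to_fmt: str = "gwaslab") -> List[str]:
    """Assemble the QC command as an argv list (safe to pass to subprocess.run)."""
    return [
        "gwaslab", "--input", input_path, "--fmt", fmt,
        "--qc", "--remove", "--remove-dup",
        "--out", out_prefix, "--to-fmt", to_fmt,
    ]


# One entry per input file; extend the list for a batch run.
inputs = ["test/raw/dirty_sumstats.tsv"]
for path in inputs:
    cmd = build_qc_command(path, out_prefix="tmp_api_cli_check")
    print(shlex.join(cmd))
    # To execute: import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell-quoting problems when input paths contain spaces.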
20+
21+
### Minimal API/CLI Parity Check
22+
```python
import gwaslab as gl

# Load the test file, run QC with removal of bad and duplicated variants, then get leads.
s = gl.Sumstats("test/raw/dirty_sumstats.tsv", fmt="gwaslab")
s.basic_check(remove=True, remove_dup=True, verbose=False)
out = s.get_lead(sig_level=5e-8)
# Row counts to compare against the CLI run.
print(len(s.data), len(out))
```
30+
31+
```bash
gwaslab --input test/raw/dirty_sumstats.tsv --fmt gwaslab --qc --remove --remove-dup --out tmp_api_cli_check --to-fmt gwaslab
```
34+
35+
### Validation Checkpoints
36+
- API and CLI both complete without parser/runtime errors.
37+
- Expected core columns remain after `basic_check()`.
38+
- Output file from CLI exists and reloads via `gl.Sumstats(..., fmt="gwaslab")`.
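The column and output-file checkpoints can be partly automated without loading GWASLab. The sketch below makes several assumptions: the output is tab-delimited, the exact output file name is unknown (so files are matched by the `--out` prefix), and `CORE_COLUMNS` is a guess at a useful core set that should be adjusted against `docs/reserved_header.md`. The demonstration runs on a throwaway file, not real CLI output.

```python
import glob
import gzip
import os
import tempfile
from typing import Iterable, List

# Assumption: adjust this tuple to the columns your workflow actually requires.
CORE_COLUMNS = ("SNPID", "CHR", "POS", "EA", "NEA", "P")


def read_header(path: str) -> List[str]:
    """Read the first line of a tab-delimited file, handling .gz transparently."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return handle.readline().rstrip("\r\n").split("\t")


def missing_columns(path: str, required: Iterable[str] = CORE_COLUMNS) -> List[str]:
    """Return the required columns that are absent from the file header."""
    header = set(read_header(path))
    return [column for column in required if column not in header]


# Demonstration on a temporary file; point the glob at the real --out prefix instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "tmp_api_cli_check.tsv")
    with open(demo, "w", encoding="utf-8") as handle:
        handle.write("SNPID\tCHR\tPOS\tEA\tNEA\tBETA\n")
    outputs = sorted(glob.glob(os.path.join(tmp, "tmp_api_cli_check*")))
    report = {os.path.basename(p): missing_columns(p) for p in outputs}

print(report)
```

An empty `outputs` list means the CLI wrote nothing under that prefix; a non-empty list in `report` names the columns that were lost.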
39+
40+
## Scope
41+
- Programmatic `gl.Sumstats` lifecycle: load, QC/harmonize, analysis methods, and output.
42+
- CLI command construction and translation between API and CLI workflows.
43+
- Script-level reproducibility and batch processing patterns.
44+
45+
## Primary documentation references
46+
- `docs/SumstatsObject.md`
47+
- `docs/CLI.md`
48+
- `docs/CLIWorkflowExamples.md`
49+
- `docs/format_load_save.md`
50+
- `docs/tutorial_v4.md`
51+
52+
## Workflow
53+
- Start with docs above and map requested operation to API or CLI primitive.
54+
- Prefer API for in-memory chaining/custom logic; prefer CLI for reproducible batch jobs.
55+
- Use examples in `docs/CLIWorkflowExamples.md` for multi-step scripting templates.
56+
- If docs are insufficient, inspect `references/doc_map.md`, then `references/source_map.md`.
57+
- Cite exact doc paths.
58+
59+
## API lifecycle anchors
60+
- `gl.Sumstats(...)` -> load/normalize input schema.
61+
- `basic_check(...)` / `harmonize(...)` -> core data-processing stages.
62+
- `get_lead(...)`, `plot_mqq(...)`, `clump(...)` -> analysis utilities.
63+
- `to_format(...)` / `to_pickle(...)` -> persistence and interchange.
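The lifecycle anchors above can be driven from a declarative step list so that the order is fixed and logged. This is a sketch of one possible pattern, not a GWASLab feature: `run_steps`, `PIPELINE`, and `FakeSumstats` are hypothetical, and the fake object only records calls so the example runs without GWASLab installed. The method names and arguments are taken from the examples in these skills.

```python
from typing import Any, Dict, List, Tuple

Step = Tuple[str, Tuple[Any, ...], Dict[str, Any]]

# Method name, positional args, keyword args; executed strictly in this order.
PIPELINE: List[Step] = [
    ("basic_check", (), {"verbose": False}),
    ("get_lead", (), {"sig_level": 5e-8}),
    ("to_format", ("analysis_ready",), {"fmt": "gwaslab"}),
]


def run_steps(obj: Any, steps: List[Step]) -> List[str]:
    """Call each named method on `obj` in order and return the executed names."""
    executed = []
    for name, args, kwargs in steps:
        method = getattr(obj, name, None)
        if not callable(method):
            raise AttributeError(f"object has no callable method '{name}'")
        method(*args, **kwargs)
        executed.append(name)
    return executed


class FakeSumstats:
    """Stand-in that records calls; replace with a real gl.Sumstats object."""

    def __init__(self) -> None:
        self.calls: List[Step] = []

    def basic_check(self, *args: Any, **kwargs: Any) -> None:
        self.calls.append(("basic_check", args, kwargs))

    def get_lead(self, *args: Any, **kwargs: Any) -> None:
        self.calls.append(("get_lead", args, kwargs))

    def to_format(self, *args: Any, **kwargs: Any) -> None:
        self.calls.append(("to_format", args, kwargs))


fake = FakeSumstats()
print(run_steps(fake, PIPELINE))
```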
64+
65+
## Source entry points for unresolved issues
66+
- `src/gwaslab/g_Sumstats.py`
67+
- `src/gwaslab/CLI/cli.py`
68+
- `src/gwaslab/__init__.py`
69+
- `src/gwaslab/io/io_preformat_input.py`
70+
- `src/gwaslab/io/io_to_formats.py`
71+
- `src/gwaslab/hm/hm_harmonize_sumstats.py`
72+
- `src/gwaslab/util/util_in_fill_data.py`
73+
- Prefer targeted search: `rg -n "<symbol_or_keyword>" src/gwaslab`
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
# gwaslab documentation map: API and Scripting
2+
3+
Generated from documentation roots:
4+
- `docs`
5+
- `examples`
6+
- `test`
7+
8+
Total docs grouped in this topic: 5
9+
10+
## File inventory
11+
- `docs/SumstatsObject.md` | title: Sumstats Object in GWASLab | headings: Sumstats Object in GWASLab; gl.Sumstats(); Options
12+
- `docs/CLI.md` | title: Command Line Interface (CLI) | headings: Command Line Interface (CLI); Basic Usage; Quick Examples
13+
- `docs/CLIWorkflowExamples.md` | title: CLI Workflow Examples | headings: CLI Workflow Examples; Example 1: Basic QC Pipeline; Example 2: Harmonization Pipeline
14+
- `docs/format_load_save.md` | title: Input and output sumstats | headings: Input and output sumstats; Input; Loading data
15+
- `docs/tutorial_v4.md` | title: Tutorial for gwaslab 4.0.0 | headings: Tutorial for gwaslab 4.0.0; Loading data into gwaslab Sumstats; Harmonization
