Add API for iterating over variants#234
Conversation
I think the filename is the correct choice here as this is going to be part of the external API, and I'd rather not have an explicit API dependency on Zarr (at least initially).
Agreed
This bit I don't like - why not just a list of strings? I don't think the extra bcftools sample selection syntax is particularly powerful, and it's easy to replicate with some extra API functions if we need to. |
I've now done this. I've also added some tests for the edge case where
I was about to change to a list of strings, but there is actually an extra complication due to the |
|
LGTM. As concerns the samples being formatted as a string or array, it seems the functionality for parsing the samples string is already in place ( |
Yes, it's used in the CLI where you can pass a bcftools-style string such as
I'm suggesting we start with the single string one, but could add more later if needed (e.g. a |
|
I don't like the way that this implies we have to do things like this is an anti-pattern IMO, and I don't think we should be entirely constrained by the CLI syntax when designing the Python API. How about we use parameter type checking to decide the appropriate action? So, for |
Sounds good - I've implemented this in the latest commit. |
Fixes #112
While writing this I noticed a few inconsistencies with
variant_chunk_iter:variant_chunk_itertakes arootZarr group, whereasvariant_iterin this PR takes a filename. Any preferences?variant_regionsandvariant_targets(invariant_chunk_iter) to justregionsandtargets(as I've done forvariant_iterin this PR).variant_chunk_itertakes asamples_selectionarray, whereasvariant_iterin this PR takes a bcftools-stylesamplesstring. That's probably OK as sometimes we want to parse samples before iterating (for query and view commands for example). We could easily support bothsamplesandsamples_selection(mutually exclusive) in both functions if needed.