All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Deprecate functions `n_ambiguous`, `n_gaps` and `n_certain`. Instead, use the equivalent methods `count(f, seq)` with the appropriate function `f`.
- Deprecate method `Base.count(::Function, ::BioSequence, ::BioSequence)`, and the other methods of `count` which are subtypes of this.
- Deprecate use of functions `matches` and `mismatches` where the input seqs have different lengths.
- Optimise `count(==(biosymbol), biosequence)`
- Optimise construction of `LongSequence` nucleotide sequences from sequences with a different bit-number (e.g. two-bit seqs from four-bit seqs)
- Add functions `bioseq` and `guess_alphabet` to easily construct a biosequence of an unknown alphabet from e.g. a string.
- Relax requirement of `decode`, such that it no longer needs to check for invalid data. Note that this change is not breaking, since it is not possible for correctly-implemented `Alphabet` and `BioSequence` to store invalid data.
- Dropped support for Julia versions older than 1.10.0
- Added a 'Recipes' page to the documentation
- Add new genetic code: `blepharisma_macronuclear_genetic_code`
- Improve documentation of sequence count methods and sequence string literals
- Various performance improvements to counting, `ExactSearchQuery` and `ispalindromic`
- Translation of sequences with ambiguous symbols is now improved. `translate` no longer relies on heuristics but uses an algorithm that always returns exactly the right amino acid in the face of ambiguous nucleotides.
- Attempting to translate a nucleotide sequence with gap symbols now throws an error (#278, see #277)
- Migrate from SnoopPrecompile to PrecompileTools (#273)
- Improve error when mis-encoding `LongDNA` from byte-like inputs (#267)
- Remove references to internal `Random.GLOBAL_RNG` (#265)
- Fix bug in converting `LongSubSeq` to `LongSequence` (#261)
- Add `iterate` method for `Alphabets` (#233)
- Add SnoopPrecompile workload and dependency on SnoopPrecompile (#257)
- Add `rand!([::AbstractRNG], ::LongSequence, [::Sampler])` methods
- It is now possible to `join` BioSymbols into a BioSequence.
- Add `findall` methods to `BioSequence`
Release has been yanked from General Registry
- Removed `unsafe_setindex!`. Instead, use normal setindex with `@inbounds`.
- Removed minhashing functionality - see package MinHash.jl
- Removed composition functionality - see package Kmers.jl
- Removed ReferenceSequence functionality
- Removed demultiplexer functionality
- Removed kmer functionality - this is moved to Kmers.jl
- Removed VoidAlphabet and CharAlphabet
- Removed ConditionIterator
- Added type `LongSubSeq`, a view into a `LongSequence`.
- Added method `translate!(::LongAminoAcidSeq, ::LongNucleotideSeq; kwargs...)`
- Added method `join(::Type{T<:BioSequence}, it)` to join an iterable of biosequences to a new instance of T.
- Added method `join!(s::BioSequence, it)`, an in-place version of `join`
- `LongSequence` is no longer copy-on-write. For views, use `LongSubSeq`.
- Renamed `LongAminoAcidSeq` -> `LongAA`, `LongNucleotideSeq` -> `LongNuc`, `LongRNASeq` -> `LongRNA` and `LongDNASeq` -> `LongDNA`
- The interface for `Alphabet` and `BioSequence` is now more clearly defined, documented, and tested.
- The constructor `LongSequence{A}(::Integer)` has been removed in favor of `LongSequence{A}(undef, ::Integer)`.
- Biological sequences can no longer be converted to/from strings and vectors.
- Updated the element and substring search API to conform to `Base.find*` patterns.
- Fixed syntax errors where functions were marked with `@inbounds` instead of `@inline`.
- New subtypes of Random.Sampler, SamplerUniform and SamplerWeighted.
- Random `LongSequence`s can now be created with `randseq`, optionally using a sampler to specify element distribution.
- All random `LongSequence` generator methods take an optional AbstractRNG argument.
- Add methods to `randseq` to optimize random generation of `NucleicAcid` or `AminoAcid` `LongSequence`s.
- BioGenerics is now a dependency - replaces BioCore.
- A `SkipmerFactory` iterator that allows iteration over the Skipmers in a nucleotide sequence. A Skipmer is a `Mer` (see changed below) that is generated using a certain cyclic nucleotide sampling pattern. See this paper for more details.
- A `BigMer` parametric primitive type has been added, which has the same functionality as `Mer` (see changed section), but uses 128 bits instead of 64.
- An abstract parametric type called `AbstractMer` has been added to unify `Mer` and `BigMer`.
- Generators of bit-parallel iteration code have been introduced to help developers write bit-parallel implementations of some methods. Counting GC content, matches and mismatches have been migrated to use these generators.
- Added `occursin` methods for exact matching.
- The abstract `Sequence` type is now called `BioSequence{A}`.
- The type previously called `BioSequence{A}` is now `LongSequence{A}`.
- `Kmers` are now a parametric primitive type: `Mer{A<:NucleicAcidAlphabet{2},K}`.
- `unsafe_setindex!` has been made systematic for all `setindex` methods as a way of bypassing all bound checking and `orphan!` calls.
- Kmer string literals have been updated. They are now `mer""` string literals, and they have a flag to enforce the type of `Mer`, e.g.: `mer"ATCG"dna`, `mer"AUCG"rna`
- No longer use an old version of Twiddle and deprecated functions.
- Using `Base.count` with certain functions and sequence combinations dispatches to highly optimized bit-parallel implementations, falling back to a naive counting loop by default for all other predicate-sequence combinations.
- No more implicit conversion from strings to biological sequences. The `Base.convert` methods have been renamed to `Base.parse` methods.
- The FASTQ module.
- The FASTA module.
- The TwoBit module.
- The ABIF module.
- BioCore is no longer a dependency.
- Automa is no longer a dependency.
- Automatic conversion of `LongDNASeq` to `LongRNASeq` when translating sequences.
- Add `alternative_start` keyword argument to translate().
- Add abstract type for kmer iterators.
- 🐎 Faster kmer iteration.
- Fixed indexing in ABIF records.
1.0.0 - 2018-08-23
- Issue and PR templates.
- Code of Conduct and Contributing files.
- A changelog file.
- Support for julia v0.7 and v1.0.
- ❗ Support for julia v0.6.
0.8.3 - 2018-02-28
- Fix the `sequence` method so that the sequence type can be specified, allowing type-stable efficient code generation.
0.8.2 - 2018-02-19
- A bug fix for `FASTA.Record` writing where the width parameter of a `FASTA.Writer` is less than or equal to zero.
0.8.1 - 2017-11-10
- Update documentation generation.
- Fixes to type definition keywords.
- Bit-parallel GC counting.
0.8.0 - 2017-08-16
- Position weight matrix search functionality.
- A generalised composition method.
- `typemin` and `typemax` methods for `Kmer` types.
- `MinHash` function now generalised to `Reader` types.
- Updates to doc tests.
0.7.0 - 2017-07-28
- Support for julia v0.6 only.
- ❗ Dropped support for julia v0.5.
0.6.3 - 2017-07-06
- Iterators.jl is no longer used as a dependency in favour of Itertools.jl.
0.6.1 - 2017-06-20
- Bug-fix for site-counting algorithm.
0.6.0 - 2017-06-14
- ⬆️ Compatibility with julia v0.6.
- The `ungap` and `ungap!` methods, which are shorthand for filtering gaps from biological sequences.
- Bug fixes for Kmer iteration that were caused by gaps in 4-bit encoded sequences.
0.5.0 - 2017-06-07
- All files pertaining to the old Bio.Seq module.