cookbook/02-alignments.md
+61 -29 (61 additions & 29 deletions)
@@ -5,52 +5,83 @@ rss_descr = "Align a gene against a reference genome using BioAlignments.jl"
 
 # Pairwise Alignment
 
-On the most basic level, aligners take two sequences and use algorithms to try and "line them up"
+On the most basic level, aligners take two sequences and use algorithms to try to "line them up"
 and look for regions of similarity.
 
-Pairwise alignment differs from multiple sequence alignment (MSA) because.
-it only aligns two sequences, while MSA's align three or more.
-In a pairwise alignment, there is one reference sequence, and one query sequence,
+Pairwise alignment differs from multiple sequence alignment (MSA) because
+it only aligns two sequences, while MSAs align three or more.
+In a pairwise alignment, there is one reference sequence and one query sequence,
 though this may not always be specified by the user.
 
 
 ### Running the Alignment
-There are two main parameters for determining how we wish to perform our alignment:
+There are two main parameters for determining how we want to perform our alignment:
 the alignment type and score/cost model.
 
-The alignment type specifies the alignment range (is the alignment local or global?)
+The alignment type specifies the alignment range (local vs global alignment)
 and the score/cost model explains how to score insertions and deletions.
 
 #### Alignment Types
 Currently, four types of alignments are supported:
-- GlobalAlignment: global-to-global alignment
+- `GlobalAlignment`: global-to-global alignment
 - Aligns sequences end-to-end
 - Best for sequences that are already very similar
-- SemiGlobalAlignment: local-to-global alignment
-- a modification of global alignment that allows the user to specify that gaps will be penalty-free at the beginning of one of the sequences and/or at the end of one of the sequences (more information can be found [here](https://www.cs.cmu.edu/~durand/03-711/2023/Lectures/20231001_semi-global.pdf)).
-- LocalAlignment: local-to-local alignment
+- All of the query is aligned to all of the reference
+- `SemiGlobalAlignment`: local-to-global alignment
+- A modification of global alignment that allows the user to specify that gaps are penalty-free at the beginning of one of the sequences and/or at the end of one of the sequences (more information can be found [here](https://www.cs.cmu.edu/~durand/03-711/2023/Lectures/20231001_semi-global.pdf)).
+- `LocalAlignment`: local-to-local alignment
 - Identifies high-similarity, conserved sub-regions within divergent sequences
 - Can occur anywhere in the alignment matrix
-- OverlapAlignment: end-free alignment
-- a modification of global alignment where gaps at the beginning or end of sequences are permitted
+- Maps the query sequence to the most similar region on the reference
+- `OverlapAlignment`: end-free alignment
+- A modification of global alignment where gaps at the beginning or end of sequences are permitted
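
As a rough illustration of how the four alignment types behave on the same input, the sketch below scores one pair of sequences under each type. The sequences and the scoring values (`match=5`, `mismatch=-4`, `gap_open=-5`, `gap_extend=-3`) are made up for this example and are not part of the cookbook.

```julia
using BioSequences, BioAlignments

# Illustrative inputs (assumptions): a short query and a longer reference.
query = dna"ACGTTGCA"
reference = dna"TTACGTAGCATT"

# Arbitrary example scores; tune these for real data.
scoremodel = AffineGapScoreModel(match=5, mismatch=-4, gap_open=-5, gap_extend=-3)

alignment_types = (GlobalAlignment(), SemiGlobalAlignment(), LocalAlignment(), OverlapAlignment())

# Score the same pair of sequences under each alignment type.
scores = Dict(
    nameof(typeof(t)) => score(pairalign(t, query, reference, scoremodel))
    for t in alignment_types
)

for (name, s) in scores
    println(name, ": ", s)
end
```

With the same score model, the local score should never be lower than the global score, because local alignment is free to drop poorly matching ends.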
 
-Alignment type can also be a distance of two sequences:
--EditDistance
--LevenshteinDistance
--HammingDistance
+The alignment type should be selected based on what is already known about the sequences the user is comparing:
+- Are the two sequences very similar, and are we looking for only a couple of small differences?
+- Is the query expected to be a nearly exact match within the reference?
+- Are we looking at two sequences from wildly divergent organisms?
 
-The alignment type should be selected based on what is already known about the sequences the user is comparing
-(Are they very similar and we're looking for a couple of small differences?
-Are we expecting the query to be a nearly exact match within the reference?).
-and what you may be optimizing for
-(Speed for a quick and dirty analysis?
-Or do we want to use more resources to do a fine-grained comparison?).
 
-Now that we have a good understanding of how `pairalign` works,
+### Cost Model
+
+The cost model provides a way to calculate penalties for differences between the two sequences,
+and the aligner then finds the alignment that minimizes the total penalty.
+`AffineGapScoreModel` is the scoring model currently supported by `BioAlignments.jl`.
+It imposes an affine gap penalty for insertions and deletions,
+which means that it penalizes opening a gap more than extending one.
+This aligns (pun intended!!) with the biological principle that creating a gap is a rare event,
+while extending an already existing gap is less so.
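
To make the affine penalty concrete, here is a small arithmetic sketch in plain Julia. The helper `affine_gap_penalty` is hypothetical (it is not a BioAlignments function), and it assumes the convention that a gap of length `k` costs `gap_open + k * gap_extend`; the exact convention used by a given aligner should be checked in its documentation.

```julia
# Hypothetical helper (not part of BioAlignments): penalty of one gap of length k,
# assuming the convention penalty = gap_open + k * gap_extend.
affine_gap_penalty(k; gap_open=5, gap_extend=1) = gap_open + k * gap_extend

one_long_gap = affine_gap_penalty(3)          # a single 3-base gap
three_short_gaps = 3 * affine_gap_penalty(1)  # three separate 1-base gaps

println("one gap of length 3:    ", one_long_gap)
println("three gaps of length 1: ", three_short_gaps)
```

Under these example values, the single long gap is cheaper than three scattered gaps covering the same number of bases, which is the behavior the affine model is meant to encourage.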
+
+A user can also define their own `CostModel` instead of using `AffineGapScoreModel`.
+This allows the user to define their own scoring scheme for penalizing insertions, deletions, and substitutions.
+
+After the cost model is defined, a distance metric is used to quantify and minimize the "cost" (difference) between the two sequences.
+
+These distance metrics are currently supported:
+- `EditDistance`
+- `LevenshteinDistance`
+- `HammingDistance`
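
A minimal sketch of the three distance metrics, using toy sequences and example cost values chosen only for illustration. It assumes that `EditDistance` is paired with a user-defined `CostModel`, while `LevenshteinDistance` and `HammingDistance` are called without one.

```julia
using BioSequences, BioAlignments

a = dna"ACGT"
b = dna"AGT"    # `a` with the C deleted
c = dna"AGGT"   # `a` with the C replaced by G (same length as `a`)

# Edit distance with user-chosen costs (example values).
costmodel = CostModel(match=0, mismatch=1, insertion=1, deletion=1)
edit = pairalign(EditDistance(), a, b, costmodel)

# Levenshtein distance: unit cost for each mismatch, insertion, and deletion.
lev = pairalign(LevenshteinDistance(), a, b)

# Hamming distance: counts mismatches only, so both sequences must have equal length.
ham = pairalign(HammingDistance(), a, c)

println("edit: ", distance(edit), ", levenshtein: ", distance(lev), ", hamming: ", distance(ham))
```

With unit costs, the edit distance and the Levenshtein distance agree; changing the values in `CostModel` is what makes `EditDistance` diverge from it.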
+
+This is a complicated topic, and more information can be found in the BioAlignments documentation about the cost model [here](https://biojulia.dev/BioAlignments.jl/stable/pairalign/).
+
+Just like the alignment type, the cost model should be selected based on what the user is optimizing for
+and what is known about the two sequences.
+
+
+### Calling BioAlignments to Run the Alignment
+
+Now that we have a good understanding of how `pairalign` works, let's run an example!
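
One possible shape for such an example is sketched below: it runs a single global alignment and inspects the result. The toy sequences and the gap scores are assumptions for illustration and are not taken from the cookbook.

```julia
using BioSequences, BioAlignments

# Toy sequences (assumptions for illustration only).
query = dna"AGGTAG"
reference = dna"ATTG"

# EDNAFULL is a DNA substitution matrix shipped with BioAlignments;
# the gap scores here are arbitrary example values.
scoremodel = AffineGapScoreModel(EDNAFULL, gap_open=-5, gap_extend=-1)

res = pairalign(GlobalAlignment(), query, reference, scoremodel)

println("score: ", score(res))

aln = alignment(res)
println(aln)  # prints the aligned query/reference pair, including gaps
println("matches: ", count_matches(aln), ", mismatches: ", count_mismatches(aln))
println("insertions: ", count_insertions(aln), ", deletions: ", count_deletions(aln))
```

Swapping `GlobalAlignment()` for another alignment type, or changing the gap scores, is a quick way to see how the choices discussed above change the result.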