Implementation of classic sequence alignment algorithms in Python for DNA sequence comparison.
This project implements the two fundamental algorithms for biological sequence alignment:
- Needleman-Wunsch (Global Alignment) - Aligns entire sequences end-to-end
- Smith-Waterman (Local Alignment) - Finds the best matching subsequences
Features:
- Read sequences from FASTA files
- Global alignment with the Needleman-Wunsch algorithm
- Local alignment with the Smith-Waterman algorithm
- Backtracking to extract optimal alignments
- Score matrix visualization
- Validated against the EMBOSS Needle and Water tools
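
As an illustration of the FASTA-reading step, here is a minimal sketch of a parser. The function name `read_fasta` and the record names `seq1` and `seq2` are assumptions made for this example; the actual reader in `alignment.py` and the headers in `sequences.fasta` may differ.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text from a file-like object into {identifier: sequence}."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The identifier is the first word after '>'; fall back to a
            # generated name if the header is empty (hypothetical convention).
            words = line[1:].split()
            name = words[0] if words else f"record{len(records) + 1}"
            records[name] = []
        elif name is not None:
            # Sequence data may span several lines; collect the pieces.
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


# In-memory example; for a real file, pass the handle from open("sequences.fasta").
example = StringIO(">seq1\nATGCGTAC\nGTTAGC\n>seq2\nATGCCGTCGTTAGG\n")
sequences = read_fasta(example)
print(sequences["seq1"])  # ATGCGTACGTTAGC
```

Joining the pieces at the end avoids repeated string concatenation when a sequence is wrapped over many lines.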
sequence-alignment-bioinformatics/
├── alignment.py       # Main Python script with all algorithms
├── sequences.fasta    # Sample DNA sequences in FASTA format
└── README.md          # This file
Requirements:
- Python 3.x
- NumPy
Install the dependency:

pip install numpy

Run the script:

python alignment.py

Sample output (the script prints its labels in French):

Séquence 1: ATGCGTACGTTAGC
Séquence 2: ATGCCGTCGTTAGG
============================================================
ALIGNEMENT GLOBAL (Needleman-Wunsch)
============================================================
Score final: 10
=== Alignement Global ===
Seq1: ATGCGTACGTTAGC
Seq2: ATGCCGTCGTTAGG
============================================================
ALIGNEMENT LOCAL (Smith-Waterman)
============================================================
Score maximal: 10
=== Alignement Local ===
Seq1: ATGCGTACGTTAG
Seq2: ATGCCGTCGTTAG
Default scoring scheme:
| Parameter | Value |
|---|---|
| Match | +1 |
| Mismatch | 0 |
| Gap | -1 |
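
To make the scoring scheme concrete, the sketch below scores the two sample sequences position by position without any gaps. The helper `ungapped_score` is hypothetical and is not part of `alignment.py`; it only shows how the match and mismatch values add up.

```python
def ungapped_score(seq1, seq2, match=1, mismatch=0):
    """Score two equal-length sequences column by column, without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(match if a == b else mismatch for a, b in zip(seq1, seq2))


s1 = "ATGCGTACGTTAGC"
s2 = "ATGCCGTCGTTAGG"

# 10 matching columns and 4 mismatching columns.
print(ungapped_score(s1, s2))                        # 10
print(ungapped_score(s1, s2, match=2, mismatch=-1))  # 16
```

With the default scheme the result equals the number of matching columns, since mismatches contribute 0.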
You can modify these parameters in the function calls:
matrix = score_matrix_alignement_global(seq1, seq2, match=2, mismatch=-1, gap=-2)

Needleman-Wunsch (global alignment):
- Initializes the first row and column with cumulative gap penalties
- Fills the matrix using dynamic programming
- Backtracks from the bottom-right corner
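
The three steps above can be sketched as follows. This is a minimal illustration, not the code from `alignment.py`: the function name `needleman_wunsch` is an assumption, and it uses plain Python lists instead of NumPy so that it runs without dependencies.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=0, gap=-1):
    """Return (score, aligned_seq1, aligned_seq2) for a global alignment."""
    n, m = len(seq1), len(seq2)

    def sub(i, j):
        # Substitution score for seq1[i-1] against seq2[j-1].
        return match if seq1[i - 1] == seq2[j - 1] else mismatch

    # Step 1: borders hold cumulative gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Step 2: each cell is the best of diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Step 3: backtrack from the bottom-right corner to the top-left.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out1)), "".join(reversed(out2))


nw_score, nw_a1, nw_a2 = needleman_wunsch("ATGCGTACGTTAGC", "ATGCCGTCGTTAGG")
print(nw_score)  # 10
print(nw_a1)
print(nw_a2)
```

Several alignments can share the optimal score; this sketch prefers the diagonal move during backtracking, so a different tie-breaking order may return a different but equally scored alignment.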
Smith-Waterman (local alignment):
- Initializes the first row and column with zeros
- Clamps every cell at a minimum score of 0 (scores never go negative)
- Backtracks from the cell with the maximum score until reaching a score of 0
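
A comparable sketch for the local case is shown below. As before, the name `smith_waterman` and the use of plain lists are assumptions for illustration and may not match `alignment.py`.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=0, gap=-1):
    """Return (score, aligned_sub1, aligned_sub2) for a best local alignment."""
    n, m = len(seq1), len(seq2)

    def sub(i, j):
        # Substitution score for seq1[i-1] against seq2[j-1].
        return match if seq1[i - 1] == seq2[j - 1] else mismatch

    # Borders stay at zero, so an alignment may start anywhere.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The extra 0 option clamps the score and restarts the alignment.
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Backtrack from the maximum cell until a zero score is reached.
    out1, out2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1

    return best, "".join(reversed(out1)), "".join(reversed(out2))


sw_score, sw_a1, sw_a2 = smith_waterman("ATGCGTACGTTAGC", "ATGCCGTCGTTAGG")
print(sw_score)  # 10
print(sw_a1)
print(sw_a2)
```

The only differences from the global version are the zero borders, the `0` option in the recurrence, and the start and stop conditions of the backtracking loop.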
Results validated against professional tools:
- EMBOSS Needle (Global)
- EMBOSS Water (Local)
Author: EL ALEM YOUSSEF
This project is for educational purposes - Bioinformatics TP3.