
SvichkarevAnatoly/Bioinformatics-DNA-Motifs-Search



Bioinformatics-DNA-Motifs-Search

Requirements

Quickstart (for Linux)

Assuming git and Python 2.7 are installed:

git clone https://github.com/jhkorhonen/MOODS.git
cd MOODS/src
make
cd ../python
python setup.py install

pip install biopython

git clone https://github.com/bbcf/bbcflib.git
cd bbcflib/
python setup.py install
cd path/to/Bioinformatics-DNA-Motifs-Search/src/utils

The src/utils directory contains the command-line utilities.

Usage

usage: bed_center_extender.py [-h] [-l LENGTH] [-o OUTFILE] bedfile

Extend each interval of the specified file around its center to the same length

positional arguments:
  bedfile               file with bed format intervals

optional arguments:
  -h, --help            show this help message and exit
  -l LENGTH, --length LENGTH
                        common extended length. If not specified, intervals
                        are extended to the length of the longest interval in
                        the input file.
  -o OUTFILE, --output OUTFILE
                        output file with extended bed format intervals
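The center extension described above can be sketched in a few lines of Python. This is only an illustration, not the code of bed_center_extender.py: the function names `extend_to_length` and `extend_intervals`, the integer rounding of the center, and the clamping of the start at 0 are all assumptions.

```python
def extend_to_length(start, end, length):
    """Return (start, end) of a `length`-bp interval centered on the input.

    Assumption: the center is the integer midpoint and the start is clamped at 0.
    """
    center = (start + end) // 2
    new_start = max(0, center - length // 2)
    return new_start, new_start + length


def extend_intervals(intervals, length=None):
    """Extend every (chrom, start, end) tuple to a common length.

    If `length` is None, the length of the longest input interval is used,
    mirroring the default described for the -l option.
    """
    if length is None:
        length = max(end - start for _, start, end in intervals)
    return [(chrom,) + extend_to_length(start, end, length)
            for chrom, start, end in intervals]


if __name__ == "__main__":
    for row in extend_intervals([("chr1", 100, 200), ("chr1", 500, 520)]):
        print("\t".join(str(field) for field in row))
```

With no explicit length, both intervals in the example are brought to 100 bp, the length of the longest one.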

usage: pattern_matching.py [-h] [-o [OUTPUT]] [-tf TF [TF ...]] [-th THRESHOLD] [-rc] [-b] [-e] fasta pwm

Matching position frequency matrices (PFM) against DNA sequences

positional arguments:
  fasta                 fasta file with DNA sequences
  pwm                   file with position weight matrices (PWM)

optional arguments:
  -h, --help            show this help message and exit
  -o [OUTPUT], --output [OUTPUT]
                        output file with matching results. Default stdout.
  -tf TF [TF ...], --factor TF [TF ...]
                        transcription factor name in pwm file. Default
                        matching with all tf in pwm file.
  -th THRESHOLD, --threshold THRESHOLD
                        The parameter threshold split for better control on
                        what parts of the scoring are used. Default 0.7.
  -rc, --reverse-complement
                        Scans against reverse complement sequence in addition
                        to the input sequence. Hits on reverse complement are
                        reported at position [position - sequence_length] in
                        complement of input sequence, which is always
                        negative. The actual hit site for any hit is always
                        seq[pos, pos + matrix_length]. Default False.
  -e, --excel           For saving results in easy paste to excel format.
                        Default human readable format.
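The negative positions reported with -rc can be mapped back to ordinary coordinates. The sketch below follows one reading of the help text above: a negative position equals the forward-strand start minus the sequence length. The helper name `hit_site` is hypothetical and is not part of pattern_matching.py or MOODS.

```python
def hit_site(seq, pos, matrix_length):
    """Return (forward_start, strand, site) for a reported hit position.

    Assumption: a negative `pos` marks a reverse-complement hit and equals
    forward_start - len(seq), as the -rc help text suggests.
    """
    if pos < 0:
        start = pos + len(seq)
        strand = "-"
    else:
        start = pos
        strand = "+"
    # Slice with the non-negative start so a hit ending at the last base
    # is not lost (seq[-3:0] would be empty in Python).
    return start, strand, seq[start:start + matrix_length]


if __name__ == "__main__":
    sequence = "ACGTACGTAA"
    print(hit_site(sequence, 2, 3))
    print(hit_site(sequence, -4, 3))
```

The returned site is always read from the input strand; taking its reverse complement, if needed, is left to the caller.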

usage: excel_ucsc_ids_to_bed_order.py [-h] [-b [BED]] [-o [OUTPUT]] excel

Convert matching results in plain-text Excel format to the order of a bed file

positional arguments:
  excel                 text file with excel matching

optional arguments:
  -h, --help            show this help message and exit
  -b [BED], --bed [BED]
                        bed file with intervals. Identifiers are sorted into
                        the order of this bed file. Default: no reordering.
  -o [OUTPUT], --output [OUTPUT]
                        output file with formatted matching results. Default
                        stdout.

usage: pwm_generator.py [-h] [-o [OUTPUT]] seqs

Create position weight matrices (PWM) from DNA sequences

positional arguments:
  seqs                  file with DNA sequences of the same length. One
                        sequence per line

optional arguments:
  -h, --help            show this help message and exit
  -o [OUTPUT], --output [OUTPUT]
                        output file with PWM
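As an illustration of the input this tool expects, the sketch below counts bases per position over equal-length sequences. It produces raw counts only; whether pwm_generator.py applies pseudocounts, normalization, or log-odds scoring, and what output format it writes, is not stated here, so none of that is modeled. The function name `count_matrix` is hypothetical.

```python
def count_matrix(seqs, alphabet="ACGT"):
    """Count occurrences of each base at each position.

    `seqs` is an iterable of equal-length DNA strings (one per input line).
    Returns a dict mapping each base to a list of per-position counts.
    """
    seqs = [s.strip().upper() for s in seqs if s.strip()]
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")

    counts = dict((base, [0] * length) for base in alphabet)
    for s in seqs:
        for i, base in enumerate(s):
            if base not in counts:
                raise ValueError("unexpected symbol: %r" % base)
            counts[base][i] += 1
    return counts


if __name__ == "__main__":
    matrix = count_matrix(["ACGT", "ACGA", "TCGT"])
    for base in "ACGT":
        print(base, matrix[base])
```

Dividing each column by the number of sequences would turn these counts into per-position frequencies.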

usage: logo_generator.py [-h] (-s SEQS | -p PWM) -o OUTPUT

Create sequence logo from sequences or a PWM

optional arguments:
  -h, --help            show this help message and exit
  -s SEQS, --seqs SEQS  file with DNA sequences of the same length. One
                        sequence per line
  -p PWM, --pwm PWM     file with PWM.
  -o OUTPUT, --output OUTPUT
                        output file with logo in vector SVG format

About

Search for nucleotide motifs in DNA sequences.
