muhammed-ali/python-pdb-cath

This is a Python script that extracts relevant information from PDB and CATH files for each PDB entry and stores the result for each entry as an individual file in the .parquet format.
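As a rough illustration of the idea (not the code in this repository), the sketch below parses FASTA-formatted text into (header, sequence) pairs and shows how one record could be written to its own .parquet file. The function names, the column names, and the sample records are hypothetical; the Parquet step assumes pandas and a Parquet engine such as pyarrow are installed.

```python
from pathlib import Path


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_entry_parquet(entry_id, header, sequence, out_dir):
    """Write one record to <out_dir>/<entry_id>.parquet (hypothetical layout)."""
    import pandas as pd  # lazy import: the parser above needs only the stdlib

    frame = pd.DataFrame(
        [{"entry_id": entry_id, "header": header, "sequence": sequence}]
    )
    target = Path(out_dir) / f"{entry_id}.parquet"
    frame.to_parquet(target, index=False)
    return target


# Made-up sample text, used only to exercise the parser.
sample = ">entryA_1 some description\nMVLSE\nGEWQL\n>entryB_1 other description\nCGCA\n"
records = list(parse_fasta(sample))
```

Here the entry identifier would have to be derived from the header (for example, its first whitespace-delimited token); that choice is an assumption and may differ from what the script actually does.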

To run the data extraction script on sample input files, use the files provided in the "./python-pdb-cath-master/data/input/" directory. First create an "output" directory inside the "./python-pdb-cath-master/data/" directory, then run "python main.py" from the root directory of the project ("./python-pdb-cath-master/").
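If you prefer to create the output directory from Python rather than by hand, a minimal sketch could look like the following. The helper name ensure_output_dir is hypothetical and is not part of this project; it only builds the "data/output" path described above.

```python
from pathlib import Path


def ensure_output_dir(project_root):
    """Create <project_root>/data/output if it is missing and return its path."""
    out_dir = Path(project_root) / "data" / "output"
    # exist_ok=True makes the call safe to repeat on later runs.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling ensure_output_dir("./python-pdb-cath-master") once before running "python main.py" would cover the manual step.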

To apply the script to the original files downloaded from the FTP servers of PDB (http://ftp.ebi.ac.uk/pub/databases/fastafiles/pdb/pdbaa.gz) and CATH (ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/cath-classification-data/cath-domain-description-file.txt), comment out or uncomment the relevant commands on lines 7, 21, and 27 of the 'functions.py' script in the main directory of the project.
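The original PDB file is gzip-compressed while the CATH file is plain text, so code that reads both kinds of input needs to handle either case. The sketch below shows one way to do that with the standard library; open_text is a hypothetical helper and is not taken from 'functions.py'.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed file for reading as text."""
    path = Path(path)
    if path.suffix == ".gz":
        # "rt" makes gzip decode bytes to str, like a normal text file.
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

With a helper like this, the same reading loop could be used for both a downloaded "pdbaa.gz" and an uncompressed sample file.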
