
greninger-lab/phipseq


This is for the article: Immunization with full-length TprC variants induces a broad response to surface-exposed epitopes of the Treponema pallidum repeat protein family and is partially protective in the rabbit model of syphilis, Giacani et al. 10.1016/j.vaccine.2025.127406

Authors: Lorenzo Giacani a b, Emily Romeis a, Austin Haynes a 1, Barbara J. Molini a, Lauren C. Tantalo a, Linda H. Xu a, Aldo T. Trejos a b, Jessica Keane a 2, Zakriye Mohamed a 3, Thaddeus D. Armstrong c, Benjamin A. Wieland c, Quynh Phung c 4, Dariia Vyshenska c, Victoria L. Campbell a, Charmie Godornes a, David M. Koelle a b c d, Tara B. Reid a, Yang Wang f g, Anastassia A. Vorobieva f g, Anna Wald a c e h, Nicole A.P. Lieberman c, Alexander L. Greninger c e

Affiliations:

a - Department of Medicine, Division of Allergy & Infectious Diseases, University of Washington, 325 9th Ave, Seattle, WA 98104, USA
b - Department of Global Health, University of Washington, 3980 15th Ave NE, Seattle, WA 98105, USA
c - Department of Laboratory Medicine and Pathology, University of Washington, 825 Eastlake Ave E, Seattle, WA 98109, USA
d - Benaroya Research Institute, 1201 9th Ave, Seattle, WA 98101, USA
e - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
f - VIB-VUB Center for Structural Biology, VIB, Brussels, Belgium
g - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
h - Department of Epidemiology, University of Washington, 1959 NE Pacific St, Seattle, WA 98195, USA

Requirements for replicating the plots: the 7 scripts listed below, the data contained within the SRA database (SRR accessions are listed in the 'Table_SX_PhIP_Seq_Metadata.csv' supplemental file), the kallisto reference indexes (in the 'ref' folder), and the following dependencies:

In bash: sratoolkit, cutadapt, kallisto, cd-hit, and pepsyn
In R: the dependencies are listed at the very top of each script
In python: same as with R

Guidelines for running the scripts: Before running any scripts, the only folders needed in your working directory (e.g. '/Users/path/to/Rabbit_Vaccination_Data') are:

'ref' - should contain 'tp_oligos_kallisto.idx', which is the kallisto index reference, and 'tp_oligos.fasta', which can be used to rebuild tp_oligos_kallisto.idx if you are using a version of kallisto other than v13

    #Updating fasta to create new kallisto index
    kallisto index -i tp_oligos_kallisto.idx -t 8 tp_oligos.fasta

'TP_databases' - will contain a key for the Tp peptide display library called 'TP_key_with_corrected_min_max_loc.csv' (it has the gene ID (gene_id), aa sequence (peptide_seq), oligo sequence, gene location in genome (nt start site), and peptide location in protein (min_loc)). 'translation_key.csv' contains the amino acid sequences for proteins in the SS14 T. pallidum strain (NC_021508.1)

'metadata' - will contain 'Table_SX_PhIP_Seq_Metadata.csv', which holds the metadata for each sample: sample name, SRA accession, whether it is a rabbit or a control, etc.

'final_scripts' - contains the 7 scripts used in this analysis
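Before starting, it can help to confirm that the working directory matches the layout described above. The following is only a minimal sketch in Python: the folder and file names are taken from this README, while the function name `check_layout` and the idea of returning a list of missing paths are illustrative assumptions, not part of the repository.

```python
from pathlib import Path

# Folders and files this README says must exist before any script is run.
# The names are transcribed from the README; check_layout itself is a
# hypothetical helper and is not one of the repository's scripts.
REQUIRED = {
    "ref": ["tp_oligos_kallisto.idx", "tp_oligos.fasta"],
    "TP_databases": ["TP_key_with_corrected_min_max_loc.csv", "translation_key.csv"],
    "metadata": ["Table_SX_PhIP_Seq_Metadata.csv"],
    "final_scripts": [],  # script names are given under "Running the code"
}


def check_layout(workdir):
    """Return a sorted list of required folders/files missing under workdir."""
    root = Path(workdir)
    missing = []
    for folder, files in REQUIRED.items():
        if not (root / folder).is_dir():
            # A missing folder implies its files are missing too.
            missing.append(folder)
            continue
        for name in files:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return sorted(missing)
```

Calling `check_layout('/Users/path/to/Rabbit_Vaccination_Data')` returns an empty list when everything is in place, and otherwise the relative paths that still need to be added.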

Running the code: 7 scripts:

  1. Phipseq_PE_processing_loop.sh - written in bash. Takes the SRA accessions from Table_SX_PhIP_Seq_Metadata.csv (in 'metadata'), downloads them from the SRA database, converts them to fastq.gz, uses cutadapt to trim 33 bases from R1 and 30 bases from R2, then uses kallisto to align the reads to tp_oligos_kallisto.idx (in 'ref'). It will make the folders 'raw' (contains the raw .sra files), 'fastq_raw' (contains the raw files converted to fastq.gz), 'trim' (contains the trimmed fastq.gz files) and 'tp_abundance' (contains a folder called 'abundance' that holds all the .tsv files output by kallisto, named according to their SRA accession)

  2. Initialize_data.R - written in R. Takes the SRA accessions from Table_SX_PhIP_Seq_Metadata.csv (in 'metadata') and renames all files in ./tp_abundance/abundance according to the 'Sample' column. Once the files are renamed (inputs and controls will have the date appended to the end), this script creates the folders 'tprC', 'tprD2', 'tprK', and 'Controls' in ./tp_abundance, then moves all files into the appropriate folders (using the metadata sheet).

  3. DESeq_script_tprCDK.R - written in R. For each group of immunized rabbits (tprC, tprD2 or tprK), it will run DESeq2 on the 3 technical replicates of each rabbit (pre- and post-immunization replicates of a given rabbit are run separately), using the TP library inputs that were run with that group of immunized rabbits as the background. It will create all the necessary folders as long as you provide the path to the parent directory.
     - It will create the folder 'results', and inside it 'DESeq2', which will contain 3 folders called 'tprC', 'tprK', and 'tprD2'. As the names suggest, this file path stores all the DESeq2 results sorted by immunized rabbit group.
     - In 'results' it will also create 'slim_dfs' (this will contain 3 folders called 'tprC', 'tprD2' and 'tprK' to store modified DESeq2 dataframes, with the 'sig_by_kmer', 'Rabbit' and 'Imm' columns added and the normalized count columns removed) and 'sig_dfs' (this will contain 3 folders called 'tprC', 'tprD2' and 'tprK' to store dataframes of only significant peptides)

  4. significant_peptide_determination_tprCDK.R - written in R. This script will take the DESeq2 results for each rabbit and determine significant peptides. All you need to provide is a path to the 'results' folder created in step #3. It will output dataframes in the 'slim_dfs' and 'sig_dfs' folders. It will also create p-value threshold dataframes (using pre-immunization data) and files named 'bound_slim_{gene}_rabbits.csv', which are dataframes containing all data for a specific immunogen group (e.g. 'bound_slim_tprC_rabbits.csv' contains the data for both pre- and post-immunization of the 7 rabbits immunized with tprC)

  5. final_plots_tprCDK.R - written in R. This script includes all code necessary for figures (excluding pymol heatmaps and supplementary figures). Provide a path to 'results' and 'TP_databases', then you can run the code to create Figures 9 and 10, and Supp figure 4. A final section finds the max -Log10(p-Value) of each amino acid along tprC and tprD2 across all library peptides from tprD2- and tprC-immunized rabbits.

  6. SupplementalFigures.R - written in R. This script creates Supplementary figures S5, S6, and S7. You need to supply a file path.

  7. run_script_gradient.py - written in python. This is a python script because Python is the language that pymol (https://pymol.org/) interprets. It takes the max -Log10(p-Value) coverage dataframes from script #5 (in final_heatmaps) and plots the results on the model of your choice. Steps for running:

    1. Edit the script to include the correct pathname of the 'results' folder (right now there is a placeholder path)
    2. Open pymol and drag in the models for tprC and tprD2. They should be called tprc_nichols.cif and tprd2_ss14.cif (in TP_databases)
    3. Type - run /Users/path/to/Rabbit_Vaccination_Data/final_scripts/run_script_gradient.py
    4. Type - run_script(gene, model, zoom). 'gene' is the gene for which we have the max -Log10(p-Value) coverage dataframe and should be 'tprC' or 'tprD2'; 'model' is the name of the pymol model created by Alphafold 3 (https://alphafoldserver.com/welcome), either 'tprc_nichols' or 'tprd2_ss14'; 'zoom' sets the amount of zoom around the center of the model (use 70).

    run_script('tprC', 'tprc_nichols', 70)
    run_script('tprD2', 'tprd2_ss14', 70)

    5. Save the figure - png /Users/path/to/Rabbit_Vaccination_Data/results/photos/3Dheatmap_tprC.png, dpi = 400
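The max -Log10(p-Value) coverage that script #5 computes and script #7 plots can be illustrated with a short sketch. This is not the code from final_plots_tprCDK.R; it only shows the general idea of projecting peptide-level p-values onto protein residues. It assumes 1-based peptide start positions (as min_loc might be), a known peptide length, and that uncovered residues get a score of 0. The function name `max_neglog10_per_residue` and the tuple layout are hypothetical.

```python
import math


def max_neglog10_per_residue(peptides, protein_length):
    """Project peptide p-values onto residues, keeping the maximum per residue.

    peptides: iterable of (start, length, pvalue) tuples, where start is the
        1-based position of the peptide's first residue in the protein
        (an assumption about how positions are encoded).
    protein_length: number of residues in the protein.
    Returns a list of length protein_length; residues covered by no peptide
    keep the value 0.0.
    """
    coverage = [0.0] * protein_length
    for start, length, pvalue in peptides:
        # Floor the p-value so that a reported 0 does not break log10.
        score = -math.log10(max(pvalue, 1e-300))
        end = min(start + length - 1, protein_length)
        for pos in range(start, end + 1):
            coverage[pos - 1] = max(coverage[pos - 1], score)
    return coverage
```

Because library peptides overlap, a residue is usually covered by several peptides; taking the maximum means each residue is colored by its most significant overlapping peptide.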

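For readers who want to see what script #1 does for a single accession, the steps it describes (download, convert to fastq.gz, trim, quantify) can be sketched as a list of command lines. This is an illustration under stated assumptions, not the contents of Phipseq_PE_processing_loop.sh: it assumes the trimming is done from the 5' end of each read (cutadapt `-u`/`-U` with positive values), that paired files are named `<SRR>_1`/`<SRR>_2`, and that kallisto writes to a per-accession output folder. The function names `build_commands` and `as_shell` are hypothetical, and the renaming of kallisto output into 'tp_abundance/abundance' is left out.

```python
import shlex


def build_commands(srr, threads=8):
    """Return the commands for one SRA accession as argument lists.

    Nothing is executed here; the lists can be passed to subprocess.run
    or printed with as_shell() for inspection.
    """
    r1_raw = f"fastq_raw/{srr}_1.fastq.gz"
    r2_raw = f"fastq_raw/{srr}_2.fastq.gz"
    r1_trim = f"trim/{srr}_1.trim.fastq.gz"
    r2_trim = f"trim/{srr}_2.trim.fastq.gz"
    return [
        # Download the .sra file into 'raw'.
        ["prefetch", srr, "--output-directory", "raw"],
        # Convert to paired fastq files in 'fastq_raw', then compress them.
        ["fasterq-dump", f"raw/{srr}", "--split-files", "--outdir", "fastq_raw"],
        ["gzip", f"fastq_raw/{srr}_1.fastq", f"fastq_raw/{srr}_2.fastq"],
        # Trim 33 bases from R1 and 30 bases from R2 (5' end assumed).
        ["cutadapt", "-u", "33", "-U", "30",
         "-o", r1_trim, "-p", r2_trim, r1_raw, r2_raw],
        # Quantify against the oligo index in 'ref'.
        ["kallisto", "quant", "-i", "ref/tp_oligos_kallisto.idx",
         "-o", f"tp_abundance/{srr}", "-t", str(threads), r1_trim, r2_trim],
    ]


def as_shell(commands):
    """Render the argument lists as shell-quoted lines, one per command."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Printing `as_shell(build_commands(accession))` for an accession from the metadata table gives five lines that can be compared against the bash script before running anything.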
About

PhIP-Seq repository containing scripts and metadata files for the paper 'Immunization with full-length TprC variants induces a broad response to surface-exposed epitopes of the Treponema pallidum repeat protein family and is partially protective in the rabbit model of syphilis'
