lncRNA identification and Co-expression analyses

Overview

This repository provides a complete computational framework to identify long non-coding RNAs (lncRNAs) and putative regulatory interactions between lncRNAs and protein-coding genes by integrating:

Two lncRNA prediction tools (FEELnc and CPC2), whose consensus increases confidence in the predictions
Two differential expression analysis tools (DESeq2 and edgeR)
Co-expression network analysis (WGCNA)

Repository Structure

├── 01_preprocessing/
│   ├── run01_fastp_trimming.py
│   ├── run02_hisat_mapping.py
│   ├── run03_stringtie_assembly.sh
│   └── run04_gffcompare.sh
├── 02_lncRNA_identification/
│   └── run05_FEELnc_CPC2_lncRNAprediction.txt
├── 03_differential_expression_analysis/
│   ├── run06_featurecounts.sh
│   └── run07_EdgeR_DESEQ2_DEanalysis.ipynb
├── 04_coexpression_WGCNA_analysis/
│   └── run08_WGCNA_coexpanalysis.ipynb
├── 05_network_filtering/
│   ├── run09_coexp_data_filter.ipynb
│   ├── run10_genepairs_net_construction.py
│   ├── run11_quantilnetwork_to_cytoscape.py
│   ├── run12_edges_attributes1_pairsxtrait.py
│   ├── run13_edges_attributes2_to_cytoscape.ipynb
│   ├── run14_nodes_attributes_to_cytoscape.ipynb
│   └── run15_annotation_nodes_of_interest.ipynb
├── docs/
└── README.md

Note: The modules must be executed in order, since later steps use outputs of earlier ones.

Pipeline Overview

Step 01 - Preprocessing

`run01_fastp_trimming.py`

Input:
- Raw paired-end reads (forward and reverse)

Example:

your_path/sample1_1.fastq.gz    your_path/sample1_2.fastq.gz

Output:
- Trimmed reads
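A minimal sketch of how the fastp call for one sample could be assembled, assuming the `_1`/`_2` file naming shown above. The helper `fastp_command`, the `trimmed/` output folder, the `_paired1`/`_paired2` output names (borrowed from the run02 example) and the thread count are illustrative assumptions, not taken from `run01_fastp_trimming.py`.

```python
from pathlib import Path


def fastp_command(r1: str, r2: str, out_dir: str = "trimmed", threads: int = 4) -> list[str]:
    """Build a fastp command line for one paired-end sample (hypothetical helper)."""
    # Derive the sample name from the forward file: "sample1_1.fastq.gz" -> "sample1"
    name = Path(r1).name.removesuffix(".fastq.gz").removesuffix("_1")
    out = Path(out_dir)
    return [
        "fastp",
        "-i", r1,                                   # raw forward reads
        "-I", r2,                                   # raw reverse reads
        "-o", str(out / f"{name}_paired1.fq.gz"),   # trimmed forward reads
        "-O", str(out / f"{name}_paired2.fq.gz"),   # trimmed reverse reads
        "-h", str(out / f"{name}_fastp.html"),      # HTML QC report
        "-j", str(out / f"{name}_fastp.json"),      # JSON QC report (MultiQC can aggregate these)
        "-w", str(threads),                         # worker threads
    ]


cmd = fastp_command("your_path/sample1_1.fastq.gz", "your_path/sample1_2.fastq.gz")
print(" ".join(cmd))
# subprocess.run(cmd, check=True) would execute it once fastp is on PATH
```

Building an argument list rather than a single shell string avoids quoting problems with unusual file paths.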

`run02_hisat_mapping.py`

Input:
- Trimmed reads
- Reference genome index

Example:

../sample1_paired1.fq.gz    ../sample1_paired2.fq.gz

Build the reference genome index first:

hisat2-build reference_genome.fasta index_output

Output:
- .bam files
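A sketch of the alignment step for one sample, assuming the HISAT2 output is converted into a sorted BAM with samtools (samtools is not listed under Software and tools, so this conversion is an assumption). `--dta` is a real HISAT2 option intended for downstream transcript assembly; whether `run02_hisat_mapping.py` uses it is not stated. The helper name is hypothetical.

```python
import shlex


def hisat2_to_bam(index: str, r1: str, r2: str, sample: str, threads: int = 4) -> str:
    """Shell pipeline: HISAT2 alignment piped into samtools sort (hypothetical helper)."""
    hisat = [
        "hisat2", "-p", str(threads),
        "--dta",            # report alignments tailored for transcript assemblers (StringTie)
        "-x", index,        # index prefix created by hisat2-build
        "-1", r1, "-2", r2,
    ]
    # Without -S, HISAT2 writes SAM to stdout; samtools sort reads it from "-" (stdin)
    sort = ["samtools", "sort", "-@", str(threads), "-o", f"{sample}.sorted.bam", "-"]
    return shlex.join(hisat) + " | " + shlex.join(sort)


pipeline = hisat2_to_bam("index_output", "../sample1_paired1.fq.gz",
                         "../sample1_paired2.fq.gz", "sample1")
print(pipeline)
# subprocess.run(pipeline, shell=True, check=True) would execute it
```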

`run03_stringtie_assembly.sh`

Input:
- .bam files
- ref_genome.gtf

Output:
- stringtie_merged.gtf
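StringTie is typically run once per BAM file with the reference annotation as a guide, and the per-sample assemblies are then combined with `stringtie --merge`. The sketch below only builds those commands; the helper name, the per-sample GTF names and the list file `gtf_list.txt` are hypothetical, and `run03_stringtie_assembly.sh` may use different options.

```python
from pathlib import Path


def stringtie_commands(bams: list[str], ref_gtf: str, threads: int = 4):
    """Reference-guided assembly per sample, then one merge step (hypothetical helper).

    Returns the list of commands and the text of the GTF list file read by --merge.
    """
    commands, gtfs = [], []
    for bam in bams:
        gtf = Path(bam).name.split(".")[0] + ".gtf"   # "sample1.sorted.bam" -> "sample1.gtf"
        gtfs.append(gtf)
        # -G guides assembly with the reference annotation, -o names the per-sample GTF
        commands.append(["stringtie", bam, "-G", ref_gtf, "-o", gtf, "-p", str(threads)])
    merge_list = "\n".join(gtfs) + "\n"               # one GTF path per line
    commands.append(["stringtie", "--merge", "-G", ref_gtf,
                     "-o", "stringtie_merged.gtf", "gtf_list.txt"])
    return commands, merge_list


cmds, merge_list = stringtie_commands(["sample1.sorted.bam", "sample2.sorted.bam"], "ref_genome.gtf")
# Path("gtf_list.txt").write_text(merge_list) must happen before the merge command runs
for c in cmds:
    print(" ".join(c))
```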

`run04_gffcompare.sh`

Input:
- ref_genome.gtf
- stringtie_merged.gtf

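Output:
- Comparison of the assembled transcripts with the reference annotation (e.g. transcript class codes and summary statistics)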
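gffcompare writes an annotated GTF (`gffcmp.annotated.gtf` with the default `gffcmp` prefix) in which every transcript record carries a `class_code` attribute, for example `=` for a transcript whose intron chain matches a reference transcript and `u` for an unknown, intergenic transcript. A minimal standard-library sketch that tallies these codes; the toy records are invented for illustration.

```python
import re
from collections import Counter

ATTR = re.compile(r'(\S+) "([^"]*)"')   # key "value" pairs in GTF column 9


def class_code_counts(gtf_lines) -> Counter:
    """Tally gffcompare class codes over the transcript records of an annotated GTF."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue                      # skip exon lines and malformed records
        attrs = dict(ATTR.findall(fields[8]))
        counts[attrs.get("class_code", "NA")] += 1
    return counts


example = [
    'chr1\tStringTie\ttranscript\t100\t900\t.\t+\t.\ttranscript_id "MSTRG.1.1"; gene_id "MSTRG.1"; class_code "=";',
    'chr1\tStringTie\texon\t100\t900\t.\t+\t.\ttranscript_id "MSTRG.1.1"; gene_id "MSTRG.1"; exon_number "1";',
    'chr1\tStringTie\ttranscript\t5000\t7000\t.\t-\t.\ttranscript_id "MSTRG.2.1"; gene_id "MSTRG.2"; class_code "u";',
]
print(class_code_counts(example))
```

With a real file, pass an open handle instead of the list, e.g. `class_code_counts(open("gffcmp.annotated.gtf"))`.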

Step 02 - lncRNA Identification

`run05_FEELnc_CPC2_lncRNAprediction.txt`

Input:
- ref_genome.gtf
- stringtie_merged.gtf

Output:
- Common lncRNAs (transcripts predicted as lncRNAs by both FEELnc and CPC2)
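The consensus set can be expressed as the intersection of the transcript IDs accepted by each tool. A sketch assuming (i) the lncRNA GTF written by FEELnc_codpot, (ii) a CPC2 tab-separated result table with `#ID` and `label` columns in which non-coding transcripts are labelled `noncoding`, and (iii) CPC2 FASTA headers that match the GTF transcript IDs. Check the column names against the CPC2 version used; the toy records are invented.

```python
import csv
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def feelnc_ids(gtf_lines) -> set[str]:
    """Transcript IDs found in a FEELnc lncRNA GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        match = TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def cpc2_noncoding_ids(table_lines) -> set[str]:
    """IDs labelled 'noncoding' in a CPC2 result table (column names are assumed)."""
    reader = csv.DictReader(table_lines, delimiter="\t")
    return {row["#ID"] for row in reader if row["label"] == "noncoding"}


def common_lncrnas(feelnc: set[str], cpc2: set[str]) -> list[str]:
    """Consensus lncRNAs: transcripts predicted by both tools."""
    return sorted(feelnc & cpc2)


gtf = ['chr1\tFEELnc\texon\t5000\t7000\t.\t-\t.\tgene_id "MSTRG.2"; transcript_id "MSTRG.2.1";',
       'chr2\tFEELnc\texon\t100\t800\t.\t+\t.\tgene_id "MSTRG.9"; transcript_id "MSTRG.9.1";']
cpc2 = ["#ID\ttranscript_length\tpeptide_length\tFickett_score\tpI\tORF_integrity\tcoding_probability\tlabel",
        "MSTRG.2.1\t2001\t40\t0.31\t9.1\t-1\t0.02\tnoncoding",
        "MSTRG.9.1\t701\t180\t0.45\t6.3\t1\t0.97\tcoding"]
print(common_lncrnas(feelnc_ids(gtf), cpc2_noncoding_ids(cpc2)))
```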

Step 03 - Differential Expression Analysis

`run06_featurecounts.sh`

Input:
- .bam files

Output:
- count_matrix.tsv
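featureCounts writes one comment line starting with `#`, then a header whose first six columns are `Geneid`, `Chr`, `Start`, `End`, `Strand` and `Length`, followed by one count column per BAM file. A sketch of how such a file could be reduced to a plain gene-by-sample matrix; shortening BAM paths to sample names is an assumption about the layout of count_matrix.tsv, and the toy rows are invented.

```python
import csv
import io
from pathlib import Path


def featurecounts_to_matrix(lines, out) -> None:
    """Write a gene x sample count matrix from featureCounts output."""
    rows = csv.reader((line for line in lines if not line.startswith("#")), delimiter="\t")
    header = next(rows)
    # Columns 1-6 are Geneid, Chr, Start, End, Strand, Length; counts follow, one per BAM
    samples = [Path(p).name.split(".")[0] for p in header[6:]]
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Geneid", *samples])
    for row in rows:
        writer.writerow([row[0], *row[6:]])


raw = [
    "# Program:featureCounts; Command: (omitted in this toy example)",
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tbam/sample1.sorted.bam\tbam/sample2.sorted.bam",
    "MSTRG.1\tchr1\t100\t900\t+\t801\t52\t61",
]
buf = io.StringIO()
featurecounts_to_matrix(raw, buf)
print(buf.getvalue())
```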

`run07_EdgeR_DESEQ2_DEanalysis.ipynb`

Input:
- count_matrix.tsv
- metadata.txt

Output:
- Differentially expressed (DE) genes, PCA plots and heatmaps
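One common way to combine two DE tools is to keep genes called significant by both. A minimal sketch, assuming results exported with edgeR's `topTags` columns (`logFC`, `FDR`) and DESeq2's `results` columns (`log2FoldChange`, `padj`), plus a hypothetical `gene` ID column. The cut-offs (adjusted p < 0.05, |log2FC| >= 1) are illustrative and not necessarily those used in the notebook; the toy rows are invented.

```python
def significant(rows, lfc_col, padj_col, id_col="gene", alpha=0.05, min_lfc=1.0) -> set[str]:
    """IDs passing an adjusted p-value and |log2 fold change| cut-off."""
    hits = set()
    for row in rows:
        padj, lfc = row[padj_col], row[lfc_col]
        if padj in ("", "NA") or lfc in ("", "NA"):   # DESeq2 sets padj to NA for filtered genes
            continue
        if float(padj) < alpha and abs(float(lfc)) >= min_lfc:
            hits.add(row[id_col])
    return hits


def consensus_de(edger_rows, deseq2_rows, **cutoffs) -> set[str]:
    """Genes called DE by both edgeR (logFC/FDR) and DESeq2 (log2FoldChange/padj)."""
    edger = significant(edger_rows, "logFC", "FDR", **cutoffs)
    deseq2 = significant(deseq2_rows, "log2FoldChange", "padj", **cutoffs)
    return edger & deseq2


edger = [{"gene": "g1", "logFC": "2.1", "FDR": "0.001"},
         {"gene": "g2", "logFC": "0.4", "FDR": "0.01"},
         {"gene": "lnc1", "logFC": "-1.5", "FDR": "0.03"}]
deseq2 = [{"gene": "g1", "log2FoldChange": "1.9", "padj": "0.004"},
          {"gene": "g2", "log2FoldChange": "1.2", "padj": "0.02"},
          {"gene": "lnc1", "log2FoldChange": "-1.4", "padj": "NA"}]
print(consensus_de(edger, deseq2))
```

Taking the intersection trades sensitivity for robustness; the union is the opposite choice.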

Step 04 - Co-expression Analysis

`run08_WGCNA_coexpanalysis.ipynb`

Input:
- TMM.tsv (TMM-normalized expression matrix)
- metadata.txt
- binary_traits.tsv

Output:
- Network files and plots
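WGCNA builds its network by raising gene-gene correlations to a soft-thresholding power β, which suppresses weak correlations while keeping strong ones. A pure-Python sketch of the unsigned adjacency a_ij = |cor(x_i, x_j)|^β (the default `type` of WGCNA's `adjacency()`). β = 6 is only a placeholder, since in practice it is chosen with `pickSoftThreshold`, and the notebook may use a signed network instead. Topological overlap and module detection are not shown; the toy profiles are invented.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two expression profiles (same sample order)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)   # constant genes (zero variance) must be removed first


def unsigned_adjacency(expr: dict[str, list[float]], beta: int = 6) -> dict[tuple[str, str], float]:
    """Soft-thresholded adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair."""
    genes = sorted(expr)
    adjacency = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adjacency[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adjacency


expr = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1], "d": [1, 3, 2, 4]}
adj = unsigned_adjacency(expr)
print(adj[("a", "d")])   # correlation 0.8 shrinks to 0.8 ** 6
```

In an unsigned network, perfectly anti-correlated genes (`a` and `c`) are as strongly connected as perfectly correlated ones.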

Step 05 - Network Filtering

`run09_coexp_data_filter.ipynb`

Input:
- geneModuleMembership.csv
- PvalueModuleMembership.csv
- geneTraitSignificance_resistant.csv
- GeneSignificancePvalue_resistant.csv

Output:
- filtered_mm.tsv
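A sketch of the kind of filter this step can apply: keep genes that are both strongly connected to a module (module membership, MM) and associated with the trait (gene significance, GS). The values would come from geneModuleMembership.csv (MM), PvalueModuleMembership.csv (p.MM), geneTraitSignificance_resistant.csv (GS) and GeneSignificancePvalue_resistant.csv (p.GS). The record layout and the thresholds (|MM| >= 0.8, |GS| >= 0.2, p < 0.05) are hypothetical; the toy values are invented.

```python
def filter_module_genes(records, min_mm=0.8, min_gs=0.2, alpha=0.05) -> list[str]:
    """Keep genes with strong module membership (MM) and trait significance (GS).

    records: dict gene -> {"MM": ..., "p.MM": ..., "GS": ..., "p.GS": ...} for one module
    and one trait; the key names and thresholds are illustrative.
    """
    kept = []
    for gene, r in sorted(records.items()):
        if (abs(r["MM"]) >= min_mm and r["p.MM"] < alpha
                and abs(r["GS"]) >= min_gs and r["p.GS"] < alpha):
            kept.append(gene)
    return kept


records = {
    "g1": {"MM": 0.91, "p.MM": 1e-6, "GS": 0.55, "p.GS": 0.002},
    "g2": {"MM": 0.42, "p.MM": 0.03, "GS": 0.60, "p.GS": 0.001},
    "lnc1": {"MM": -0.86, "p.MM": 1e-5, "GS": -0.35, "p.GS": 0.04},
}
print(filter_module_genes(records))
```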

`run10_genepairs_net_construction.py`

Input:
- list_genes_of_interest.txt

Output:
- gene_list_pairs.txt
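A sketch of pair construction from a list of genes of interest, using `itertools.combinations` for unordered pairs. The optional `lncrnas` filter, which keeps only lncRNA to non-lncRNA pairs in line with the lncRNA/protein-coding focus of the pipeline, is an assumption about how `run10_genepairs_net_construction.py` builds gene_list_pairs.txt.

```python
from itertools import combinations


def gene_pairs(genes, lncrnas=None) -> list[tuple[str, str]]:
    """All unordered pairs of genes of interest; optionally only lncRNA-to-other pairs."""
    unique = sorted(set(genes))          # drop duplicates and fix a reproducible order
    pairs = combinations(unique, 2)
    if lncrnas is not None:
        # keep mixed pairs only: exactly one member is a lncRNA
        pairs = ((a, b) for a, b in pairs if (a in lncrnas) != (b in lncrnas))
    return list(pairs)


genes = ["g1", "g2", "lnc1", "g1"]
for a, b in gene_pairs(genes, lncrnas={"lnc1"}):
    print(f"{a}\t{b}")
```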

`run11_quantilnetwork_to_cytoscape.py`

Input:
- bigNet_edges.txt

Output:
- Filtered networks
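The `075` suffix of the network files used in run13 suggests a 0.75 quantile cut-off on edge weights; this is an inference from the file names only. A minimal sketch of such a filter on (fromNode, toNode, weight) tuples, such as those in edge files exported with WGCNA's `exportNetworkToCytoscape`. The quantile uses linear interpolation, which is the default rule of R's `quantile()`; the toy edges are invented.

```python
def quantile(values: list[float], q: float) -> float:
    """Quantile with linear interpolation (R's default type 7, also NumPy's default)."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo = int(pos)                      # floor, since pos >= 0
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def filter_edges(edges, q: float = 0.75):
    """edges: (fromNode, toNode, weight) tuples; keep those with weight >= the q-quantile."""
    cutoff = quantile([w for _, _, w in edges], q)
    return cutoff, [e for e in edges if e[2] >= cutoff]


edges = [("g1", "g2", 0.10), ("g1", "lnc1", 0.40), ("g2", "lnc1", 0.20),
         ("g2", "g3", 0.50), ("g3", "lnc1", 0.30)]
cutoff, kept = filter_edges(edges)
print(cutoff, kept)
```

A quantile cut-off adapts to each network's weight distribution, whereas a fixed weight threshold would keep very different numbers of edges per condition.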

`run12_edges_attributes1_pairsxtrait.py`

Input:
- filtered_network.txt
- gene_list_pairs.txt

Output:
- Weighted gene pairs
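A sketch of attaching weights to the gene pairs from run10: each pair is looked up in the filtered network regardless of edge direction, and pairs absent from the network are dropped. The trait-specific part suggested by the script name (`pairsxtrait`) is not modelled; the function name and toy data are hypothetical.

```python
def weighted_pairs(pairs, edges) -> list[tuple[str, str, float]]:
    """Attach the network weight to each gene pair, ignoring edge direction.

    Pairs that are absent from the filtered network are dropped.
    """
    weight = {frozenset((a, b)): w for a, b, w in edges}
    result = []
    for a, b in pairs:
        w = weight.get(frozenset((a, b)))
        if w is not None:
            result.append((a, b, w))
    return result


pairs = [("g1", "lnc1"), ("g2", "lnc1"), ("g3", "lnc1")]
network = [("lnc1", "g1", 0.40), ("g3", "lnc1", 0.30)]
print(weighted_pairs(pairs, network))
```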

`run13_edges_attributes2_to_cytoscape.ipynb`

Input:
- bignet_075_resistant.txt
- bignet_075_resistant24hpi.txt
- bignet_075_resistant60hpi.txt
- bignet_075_resistant84hpi.txt
- Gene pairs with weight information

Output:
- edges_attributes.tsv
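The condition-specific networks can be combined into a single Cytoscape edge table with one presence flag per network. A sketch, assuming each network is a list of gene pairs; the condition names mirror the input file names, while the `source`/`target` column names and the 0/1 encoding are hypothetical choices, and the toy edges are invented.

```python
def edge_presence_table(networks: dict[str, list[tuple[str, str]]]) -> list[dict]:
    """One row per undirected edge with a 0/1 column per network it appears in."""
    conditions = list(networks)
    seen: dict[tuple[str, str], set[str]] = {}
    for condition, edges in networks.items():
        for a, b in edges:
            key = (a, b) if a <= b else (b, a)          # ignore direction
            seen.setdefault(key, set()).add(condition)
    rows = []
    for (a, b), present in sorted(seen.items()):
        row = {"source": a, "target": b}
        row.update({c: int(c in present) for c in conditions})
        rows.append(row)
    return rows


networks = {
    "resistant": [("g1", "lnc1"), ("g2", "lnc1")],
    "resistant24hpi": [("lnc1", "g1")],
    "resistant60hpi": [],
    "resistant84hpi": [("g2", "lnc1")],
}
for row in edge_presence_table(networks):
    print(row)
```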

`run14_nodes_attributes_to_cytoscape.ipynb`

Input:
- filtered_mm.tsv
- genes_DE_run07.txt
- Any additional list of gene attributes

Output:
- node_attributes.tsv
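A sketch of merging per-gene information into one tab-separated node table for Cytoscape import. The `module`, `DE` and `type` columns and their values are hypothetical; in practice they would come from filtered_mm.tsv, genes_DE_run07.txt and the consensus lncRNA list from Step 02. The toy inputs are invented.

```python
import csv
import io


def node_attributes(modules: dict[str, str], de_genes: set[str], lncrnas: set[str]) -> list[dict]:
    """Merge per-gene attributes into one row per node."""
    return [
        {
            "node": gene,
            "module": modules[gene],                              # module colour per gene
            "DE": "yes" if gene in de_genes else "no",            # membership in the DE list
            "type": "lncRNA" if gene in lncrnas else "other",     # membership in the lncRNA list
        }
        for gene in sorted(modules)
    ]


rows = node_attributes({"g1": "turquoise", "lnc1": "turquoise", "g2": "blue"}, {"g1"}, {"lnc1"})
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["node", "module", "DE", "type"],
                        delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```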

`run15_annotation_nodes_of_interest.ipynb`

Input:
- annotation_file.tsv
- list_nodes.txt
- list_edges.txt

Output:
- annotated_genes.tsv
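A sketch of the final annotation step: collect the nodes of interest (listed nodes plus both endpoints of listed edges) and look each one up in the annotation table. It assumes tab-separated edge lines without a header and an annotation ID column named `gene_id`; both are hypothetical, as is the toy annotation.

```python
def nodes_of_interest(node_lines, edge_lines) -> list[str]:
    """Union of listed nodes and both endpoints of listed edges, in first-seen order."""
    ordered = {}
    for line in node_lines:
        if line.strip():
            ordered.setdefault(line.strip(), None)
    for line in edge_lines:                       # assumes no header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            for node in fields[:2]:
                ordered.setdefault(node, None)
    return list(ordered)


def annotate(nodes, annotation_rows, id_col="gene_id") -> list[dict]:
    """Annotation row for each node; nodes without annotation keep only their ID."""
    index = {row[id_col]: row for row in annotation_rows}
    return [index.get(node, {id_col: node}) for node in nodes]


annotation = [{"gene_id": "g1", "description": "example description"}]
nodes = nodes_of_interest(["g1\n", "lnc1\n"], ["g1\tg2\t0.4\n"])
print(annotate(nodes, annotation))
```

Keeping unannotated nodes in the output, rather than dropping them, makes it visible which lncRNAs lack any functional annotation.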

Software and tools

R v4.5.1
Python v3.10
FastQC v0.11.9 [1]
MultiQC v1.23 [2]
fastp v0.23.2 [3]
HISAT2 v2.2.1 [4]
StringTie v2.2.2 [5]
GffCompare v0.11.2 [6]
CPC2 v1.0.1 [7]
FEELnc v3 [8]
featureCounts v1.22.2 [9]
edgeR v4.6.3 [10]
DESeq2 v1.48.1 [11]
WGCNA v1.73 [12]

References

[1] Andrews S. FastQC: a quality control tool for high throughput sequence data. Available online: http://www.bioinformatics.babraham.ac.uk/projects/fastqc. 2010.
[2] Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32:3047–8. https://doi.org/10.1093/bioinformatics/btw354.
[3] Chen S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta. 2023;2. https://doi.org/10.1002/imt2.107.
[4] Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907–15. https://doi.org/10.1038/s41587-019-0201-4.
[5] Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–5. https://doi.org/10.1038/nbt.3122.
[6] Pertea M, Pertea G. GFF Utilities: GffRead and GffCompare. F1000Research. 2020;9. https://doi.org/10.12688/f1000research.23297.1.
[7] Kang YJ, Yang DC, Kong L, Hou M, Meng YQ, Wei L, et al. CPC2: A fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 2017;45:W12–6. https://doi.org/10.1093/nar/gkx428.
[8] Wucher V, Legeai F, Hédan B, Rizk G, Lagoutte L, Leeb T, et al. FEELnc: A tool for long non-coding RNA annotation and its application to the dog transcriptome. Nucleic Acids Res. 2017;45. https://doi.org/10.1093/nar/gkw1306.
[9] Liao Y, Smyth GK, Shi W. featureCounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30:923–30. https://doi.org/10.1093/bioinformatics/btt656.
[10] Robinson MD, McCarthy DJ, Smyth GK. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2009;26:139–40. https://doi.org/10.1093/bioinformatics/btp616.
[11] Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. https://doi.org/10.1186/s13059-014-0550-8.
[12] Langfelder P, Horvath S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9. https://doi.org/10.1186/1471-2105-9-559.
