Multi-cohort cerebrospinal fluid proteomics identifies robust molecular signatures across the Alzheimer disease continuum

Introduction

This repository contains the code for bioinformatics analyses described in the article "Multi-cohort cerebrospinal fluid proteomics identifies robust molecular signatures across the Alzheimer disease continuum".

This project analysed CSF proteomics data from the SomaScan 7K platform to identify proteins associated with Alzheimer disease (AD). The identified proteins were then used to build an AD-specific prediction model and to run pseudo-trajectory, biological pathway, and cell type enrichment analyses aimed at understanding the underlying AD biology.

Content

The code covers the following main analysis steps:

Data pre-processing: Proteomics data preparation and surrogate variable (SV) computation
Differential expression analysis (Discovery, Replication, and Meta-analyses)
AD prediction model development using Lasso regression
Survival analysis to identify individuals who will convert to AD
AD progression analysis to distinguish between slow and fast progressors
Pseudo-trajectory analysis to group/cluster proteins based on their expression across the AT continuum (A-T-, A+T-, A+T+)
Network and pathway enrichment analyses
Cell type enrichment analysis

Data

The proteomics data analysed in this study are available at:

ADNI: http://adni.loni.usc.edu/
Knight-ADRC: https://dss.niagads.org/ (Accession: ng00130)
FACE and Barcelona-1 cohorts: http://www.fundacioace.com/
PPMI: https://www.ppmi-info.org/
Stanford-ADRC: https://web.stanford.edu/group/adrc/cgi-bin/web-proj/datareq.php

Requirements

The code was written in R (version 4.3.0) and relies on multiple R and Bioconductor packages, including:

sva
clusterProfiler
scran
glmnet
nlme
pROC
igraph
survminer
mclust
Additional packages listed at the beginning of each R script
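
Before running the scripts, it can help to check which of the packages listed above are already installed. The following is a minimal sketch, not part of the repository: the helper `missing_packages` is a hypothetical name, and the CRAN/Bioconductor split reflects where these packages are normally distributed. It only reports what is missing and does not install anything.

```r
# Packages named in this README, grouped by their usual source.
cran_pkgs <- c("glmnet", "nlme", "pROC", "igraph", "survminer", "mclust")
bioc_pkgs <- c("sva", "clusterProfiler", "scran")

# Hypothetical helper: return the entries of `pkgs` that are not in `installed`.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

missing_cran <- missing_packages(cran_pkgs)
missing_bioc <- missing_packages(bioc_pkgs)

# Print the install command for anything that is missing.
if (length(missing_cran) > 0) {
  message("Missing CRAN packages. Install with: install.packages(c(",
          paste0('"', missing_cran, '"', collapse = ", "), "))")
}
if (length(missing_bioc) > 0) {
  message("Missing Bioconductor packages. Install with: BiocManager::install(c(",
          paste0('"', missing_bioc, '"', collapse = ", "), "))")
}
```

Each script may load further packages, so the library calls at the top of each script remain the authoritative list.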

License

The code is available under the MIT License.

Instructions

The code was tested on R 4.3.0 on Linux operating systems, but should be compatible with later versions of R installed on current Linux, Mac, or Windows systems.

To run the code, set the working directory containing the input data at the beginning of each R script. No other changes are needed; the scripts can otherwise be run as-is.
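
As an illustration of the working-directory step, the sketch below assumes a placeholder path (`path/to/input_data` is hypothetical, not a location used by the scripts) and only changes directory if that folder exists.

```r
# Hypothetical path: replace with the folder that holds the input data.
data_dir <- "path/to/input_data"

if (dir.exists(data_dir)) {
  setwd(data_dir)  # subsequent relative file paths resolve against this folder
  message("Working directory set to: ", getwd())
} else {
  message("Directory not found, working directory unchanged: ", data_dir)
}
```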

The scripts should be run in the following order:

data_preparation.R

differential_expression_analysis.R

prediction_modeling.R

survival_analysis.R

progression_analysis.R

clustering_pseudo_trajectories.R

network_and_pathway_analysis.R

cell_type_enrichment.R
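
The run order above can be sketched as a small driver. This is an illustrative assumption rather than part of the repository: `run_pipeline`, `script_dir`, and `dry_run` are hypothetical names, and the sketch presumes all scripts sit in one folder. By default it only lists the scripts in order; set `dry_run = FALSE` to execute them with `source()`.

```r
# Script names in the order given in this README.
scripts <- c(
  "data_preparation.R",
  "differential_expression_analysis.R",
  "prediction_modeling.R",
  "survival_analysis.R",
  "progression_analysis.R",
  "clustering_pseudo_trajectories.R",
  "network_and_pathway_analysis.R",
  "cell_type_enrichment.R"
)

# Hypothetical driver: list the scripts (dry run) or source them one by one.
run_pipeline <- function(script_dir = ".", dry_run = TRUE) {
  paths <- file.path(script_dir, scripts)
  for (p in paths) {
    if (dry_run) {
      message("Would run: ", p)
    } else {
      message("Running: ", p)
      source(p, local = new.env())  # each script gets its own environment
    }
  }
  invisible(paths)
}

run_pipeline()  # dry run: prints the order without executing anything
```

Sourcing each script in a fresh environment keeps objects from one step out of the next. If the scripts pass objects to each other in memory rather than through files, source them in the global environment instead.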
