Preprocessing and source-localization of the high-density electroencephalogram (EEG) data from the Penn Electrophysiology of Encoding and Retrieval Study, a large-scale open dataset (7,000+ 90-minute sessions across 300+ subjects) focused on verbal memory encoding.
├── data
│ ├── dataframes <- Dataframes (written to .csv) with single-trial info and ROI time-series.
│ ├── external <- Third-party data required for processing.
│ ├── interim <- Intermediate data created during processing.
│ ├── processed <- Cleaned *-raw.fif and *-epo.fif instances for each EEG session (MNE-python Raw and Epochs objects).
│ └── raw <- Raw BIDS-formatted data from OpenNeuro.
│
├── results <- Preprocessing report.
│
├── slurm <- Output of jobs submitted to the SLURM scheduler.
│ ├── output <- Successful jobs.
│ └── error <- Unsuccessful jobs.
│
├── src <- Source code.
│ ├── preprocessing <- Module with functions for preprocessing data.
│ │ ├── __init__.py
│ │ ├── config.py <- Dictionaries with preprocessing parameters.
│ │ ├── preprocess.py <- High-level preprocessing functions.
│ │ └── utils.py <- Low-level functions.
│ │
│ ├── 00_download_data.sh <- Shell script to download data (raw BIDS-formatted data and third-party data).
│ ├── 01_run_pipeline.py <- Python script to run the pipeline for each EEG session.
│ └── pipeline.py <- Preprocessing pipeline.
│
├── .env <- Environment variables.
├── .gitignore <- Files to be ignored by Git.
├── README.md <- Project overview.
└── requirements.txt <- Dependencies ('pip install -r requirements.txt').
This pipeline is set up to run on computer clusters scheduled with SLURM.
- Initialize the project:

  ```bash
  git clone https://github.com/JonahKember/pred_enc_preprocessing
  ```

- Create a `.env` file in the project directory and define environment variables:

  ```bash
  # Project directory.
  project_dir=/path

  # Account used to submit SLURM jobs to the scheduler.
  account=rrg-user
  ```
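The pipeline presumably reads these variables at runtime. As a minimal sketch, such a `KEY=VALUE` file can be parsed with the standard library alone (the `load_env` helper below is illustrative, not a function from this repository):

```python
from pathlib import Path

def load_env(path='.env'):
    """Parse simple KEY=VALUE lines from a .env file, skipping comments and blanks."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition('=')
        env[key.strip()] = value.strip()
    return env
```

For example, `load_env()['project_dir']` would return `/path` given the file above.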
- Create a virtual environment with the necessary dependencies:

  ```bash
  pip install virtualenv
  virtualenv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```
- Run the shell script `src/00_download_data.sh`, which downloads the raw BIDS-formatted data from OpenNeuro and the third-party data required for the pipeline:

  ```bash
  bash src/00_download_data.sh
  ```
- The full preprocessing pipeline is implemented through the script `src/preprocessing/01_run_pipeline.py`, which can be run through the terminal:

  ```bash
  # Create and run job scripts.
  python src/preprocessing/01_run_pipeline.py --create --run
  ```

  Arguments:

  - `--create`: write SBATCH scripts for each EEG session to `/jobs`.
  - `--run`: submit N jobs (specified in `job_params`, see below) to the scheduler.
  - `--report`: write a report with preprocessing information for each session.

  The amount of memory and time requested for each job is specified in the dictionary `job_params`, found in the config file of the preprocessing module (`src/preprocessing/config.py`):

  ```python
  job_params = {
      'stage': 'raw',
      'hours': 0,
      'minutes': 15,
      'mem_per_cpu': '16G',
      'n_jobs': 500
  }
  ```
  Parameters:

  - `stage` (str): preprocessing stage to run (one of `'raw'`, `'epochs'`, `'dataframe'`).
  - `hours` (int): number of hours to request for each job.
  - `minutes` (int): number of minutes to request for each job.
  - `mem_per_cpu` (str): amount of RAM to request for each job.
  - `n_jobs` (int): number of jobs to submit to the scheduler at once (`-1`: run all jobs).

  Notes:

  - Each job applies a distinct processing stage (`'raw'`, `'epochs'`, or `'dataframe'`) to a single EEG session.
  - SBATCH scripts are written to `/jobs` and are named `f'sub-{subject}_ses-{session}_{stage}'`.
  - Job outputs are written to `/slurm/output/` if successful, and to `/slurm/error/` if unsuccessful.
  - The pipeline is run in three separate stages to help conserve computational resources, as each stage has different RAM requirements.
  - ~2,500 jobs are created for each preprocessing stage. SLURM limits the number of queued jobs (e.g., 1,000), so jobs need to be submitted in chunks.
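To make the job-creation step concrete, the sketch below shows how `job_params` could translate into a job name and an SBATCH header, and how a job list could be split into scheduler-sized chunks. The helpers (`make_sbatch_header`, `chunk_jobs`) are hypothetical illustrations, not functions from this repository; only the naming convention is taken from the text above.

```python
def make_job_name(subject, session, stage):
    # Naming convention for the SBATCH scripts written to /jobs.
    return f'sub-{subject}_ses-{session}_{stage}'

def make_sbatch_header(job_name, job_params, account):
    # Translate job_params into #SBATCH directives (illustrative).
    return '\n'.join([
        '#!/bin/bash',
        f'#SBATCH --account={account}',
        f'#SBATCH --job-name={job_name}',
        f"#SBATCH --time={job_params['hours']:02d}:{job_params['minutes']:02d}:00",
        f"#SBATCH --mem-per-cpu={job_params['mem_per_cpu']}",
        f'#SBATCH --output=slurm/output/{job_name}.out',
        f'#SBATCH --error=slurm/error/{job_name}.err',
    ])

def chunk_jobs(jobs, n_jobs):
    # Submit at most n_jobs at a time (n_jobs == -1: everything in one chunk).
    if n_jobs == -1:
        return [jobs]
    return [jobs[i:i + n_jobs] for i in range(0, len(jobs), n_jobs)]
```

With ~2,500 sessions and a 1,000-job queue limit, `chunk_jobs` would yield chunks of 1,000, 1,000, and 500.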
- The parameters for the preprocessing pipeline are specified in the Python dictionary `params`, found in the config file of the preprocessing module (`src/preprocessing/config.py`):

  ```python
  params = {
      'overwrite': False,
      'h_freq_filter': 90,
      'l_freq_filter': .1,
      'notch_filter': 60,
      'epoch_tmin': -.5,
      'epoch_tmax': 3,
      'baseline': (-.5, 0),
      'downsample': 250
  }
  ```
  Parameters:

  - `overwrite` (bool): whether previous data should be overwritten.
  - `h_freq_filter` (int): high-frequency filter cut-off (Hz).
  - `l_freq_filter` (float): low-frequency filter cut-off (Hz).
  - `notch_filter` (int): frequency of line noise (Hz).
  - `epoch_tmin` (float): pre-stimulus time for epoching (seconds).
  - `epoch_tmax` (float): post-stimulus time for epoching (seconds).
  - `baseline` (float, float): minimum and maximum time for baseline correction (seconds).
  - `downsample` (int): rate for resampling (Hz).
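These parameters constrain one another: the baseline window must fall inside the epoch window, and the high-frequency cut-off must stay below the Nyquist frequency of the resampled data. A small sanity check can catch misconfiguration before any jobs are submitted (a sketch; `validate_params` is not part of the repository):

```python
def validate_params(params):
    """Basic consistency checks on the preprocessing parameters."""
    assert params['l_freq_filter'] < params['h_freq_filter'], \
        'Low-frequency cut-off must be below the high-frequency cut-off.'
    assert params['epoch_tmin'] < params['epoch_tmax'], \
        'Epoch start must precede epoch end.'
    b_min, b_max = params['baseline']
    assert params['epoch_tmin'] <= b_min < b_max <= params['epoch_tmax'], \
        'Baseline window must fall inside the epoch window.'
    # Nyquist: the high-frequency cut-off must survive downsampling.
    assert params['h_freq_filter'] < params['downsample'] / 2, \
        'High-frequency cut-off exceeds the Nyquist frequency after resampling.'
```

The defaults above pass these checks (e.g., 90 Hz is below the 125 Hz Nyquist frequency at a 250 Hz sampling rate).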
- The job scripts created through `src/preprocessing/01_run_pipeline.py` run the pipeline, `src/preprocessing/pipeline.py`, with the appropriate arguments specified.

- The `preprocess.py` file found within the preprocessing module includes a specific high-level function for each preprocessing stage:

  - `preprocess_raw(subject, session)`: cleans the raw EEG and returns an instance of an MNE-Python `Raw` object.
  - `preprocess_epochs(subject, session)`: epochs the raw data and returns an instance of an MNE-Python `Epochs` object.
  - `preprocess_dataframe(subject, session)`: extracts the single-trial source-localized time-series for each ROI in the left 'inferior_frontal' and left 'medial_temporal' cortices of the HCP_MMP1.0 atlas, then writes the ROI time-series (along with single-trial information) to a `.csv` file at `/data/dataframes/f'{subject}.csv'`.
- The `utils.py` file within the preprocessing module contains a set of useful low-level functions.
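The three-stage design can be pictured as a simple dispatch from the requested `stage` to the matching high-level function. The sketch below uses stand-in functions for illustration; the actual `pipeline.py` presumably imports the real ones from `preprocess.py`:

```python
# Stand-ins for the high-level functions in preprocess.py.
def preprocess_raw(subject, session):
    return f'raw: sub-{subject}_ses-{session}'

def preprocess_epochs(subject, session):
    return f'epochs: sub-{subject}_ses-{session}'

def preprocess_dataframe(subject, session):
    return f'dataframe: sub-{subject}_ses-{session}'

STAGES = {
    'raw': preprocess_raw,
    'epochs': preprocess_epochs,
    'dataframe': preprocess_dataframe,
}

def run_stage(stage, subject, session):
    # Dispatch to the function for the requested preprocessing stage.
    if stage not in STAGES:
        raise ValueError(f'Unknown stage: {stage!r} (expected one of {sorted(STAGES)})')
    return STAGES[stage](subject, session)
```

Keeping the mapping in one dictionary means each SBATCH job only needs to pass `--stage` plus the subject and session identifiers.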
Specific patterns of electrical activity are observed within areas of the left inferior-frontal and left medial-temporal cortices in the few seconds after the visual presentation of words. These electrical currents promote verbal memory encoding: (1) their time-series can be used to predict whether words will be subsequently recalled, and (2) their experimental manipulation (via closed-loop stimulation) can be used to enhance the probability of subsequent word recall.
Verbal memory (the ability to recall previously encountered words) declines considerably in older adults (i.e., aged 60+). It is unclear how degeneration of the neural mechanisms that underlie verbal memory encoding contributes to these deficits.
To address this problem, we:
- Identify features of the electrical currents generated in the left inferior-frontal and medial-temporal cortices that are predictive of verbal memory encoding (i.e., that discriminate subsequently remembered from forgotten words), using both a data-driven approach (focused on massive feature extraction) and a theory-driven approach (focused on dynamic shifts in the periodic and aperiodic components of the power spectral density).
- Test whether these features reliably differ in older adults (aged 60+).
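As a toy illustration of the theory-driven angle: the aperiodic (1/f) component of a power spectrum can be approximated by a linear fit in log-log space, where the 1/f exponent is the negative of the fitted slope. This is a minimal NumPy sketch on synthetic data; real analyses typically use a dedicated model such as specparam/FOOOF, which also accounts for periodic peaks.

```python
import numpy as np

def aperiodic_exponent(freqs, power):
    """Estimate the 1/f exponent via a linear fit of log-power vs. log-frequency."""
    slope, _ = np.polyfit(np.log10(freqs), np.log10(power), 1)
    return -slope

# Synthetic spectrum with a known 1/f^2 decay (exponent = 2).
freqs = np.linspace(1, 90, 200)
power = 1.0 / freqs**2
```

On this noiseless synthetic spectrum the fit recovers the exponent exactly; on real EEG spectra, periodic peaks (e.g., alpha) bias a plain linear fit, which is why peak-aware models are preferred.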
