SPC Data Explorer

An interactive dashboard for exploring Cell Painting phenotypic screening data, built with Dash and Plotly. It supports both the SPC (Spherical Phenotype Clustering) and CellProfiler analysis pipelines and presents their results through the same set of visualisations.

Overview

This application provides an interactive web-based interface for exploring high-content screening data from Cell Painting assays. It was developed to:

Visualise compound phenotypes in reduced dimensionality space (UMAP/t-SNE)
Compare analysis methods by switching between SPC and CellProfiler pipelines
Identify phenotypic neighbours through landmark-based distance analysis
Explore compound metadata including MOA, target annotations, and chemical structures

The dashboard integrates multiple data sources including morphological features, compound annotations, chemoproteomics data, and microscopy images to provide a comprehensive view of phenotypic screening results.

Features

Dual Pipeline Support

The application supports data from two distinct analysis pipelines:

SPC (Spherical Phenotype Clustering): A machine learning approach using ResNet-based feature extraction and cosine similarity metrics
CellProfiler: Traditional morphological profiling with standardised feature sets

Each pipeline has its own column naming conventions, which the app handles transparently through configurable column mappings.
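
As a rough illustration of what such a column mapping could look like, the sketch below renames CellProfiler-style columns to the SPC names listed under Input Data Format. The `CP_TO_SPC` dictionary and the `harmonise_columns` helper are hypothetical names chosen for this example; they are not taken from the app's own loader.

```python
# Hypothetical mapping from CellProfiler column names to their SPC equivalents.
# The column names come from the "Input Data Format" section of this README;
# the mapping itself is an assumption made for illustration.
CP_TO_SPC = {
    "Metadata_plate_barcode": "plate",
    "Metadata_well": "well",
    "Metadata_PP_ID": "PP_ID",
    "Metadata_PP_ID_uM": "PP_ID_uM",
    "Metadata_library": "library",
}


def harmonise_columns(record: dict, mapping: dict = CP_TO_SPC) -> dict:
    """Return a copy of `record` with mapped keys renamed; other keys are kept."""
    return {mapping.get(key, key): value for key, value in record.items()}


cp_row = {"Metadata_plate_barcode": "plate1", "Metadata_well": "A01", "UMAP1": 0.5}
print(harmonise_columns(cp_row))
```

Keeping the mapping in one dictionary means downstream plotting code only ever needs to know one set of column names.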

Interactive Visualisation

UMAP/t-SNE scatter plots for both SPC and CellProfiler datasets
Dynamic colour mapping by:
- Library source (GSK, JUMP, SGC, etc.)
- Mechanism of Action (MOA)
- Landmark proximity status
- Plate/well location
- Various distance metrics
Compound search with autocomplete supporting:
- PP_ID (e.g., PPXXXX@1.0)
- Treatment names (e.g., CompoundXXXX@0.1)
- MOA/gene names (e.g., UNG@0.1 (CR000023@0.1))
Visual highlighting of selected compounds on the plot
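
The autocomplete behaviour described above can be approximated with a case-insensitive substring match over the searchable labels. This is only a minimal sketch: `suggest` is a hypothetical helper, and the app's real search callbacks may rank or filter differently.

```python
def suggest(query: str, labels: list[str], limit: int = 10) -> list[str]:
    """Return up to `limit` labels containing `query`, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [label for label in labels if needle in label.lower()][:limit]


# Placeholder labels in the three formats listed above.
labels = ["PPXXXX@1.0", "CompoundXXXX@0.1", "UNG@0.1 (CR000023@0.1)"]
print(suggest("ung", labels))
```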

Microscopy Image Integration

Hover preview: See microscopy thumbnails instantly when hovering over data points
Click for details: Full compound information panel with larger image
Multiple scaling modes: Fixed (comparable across images) or auto-scaled (per-image optimisation)
Text overlays: Optional treatment or MOA labels on images
Multi-site support: Random site selection from available fields of view

Landmark Analysis

Reference compounds ("landmarks") with known mechanisms serve as anchors for phenotypic interpretation:

Distance calculations to three closest landmarks for each compound
Validity indicators showing whether each compound falls within the distance threshold of its nearest landmarks
Detailed landmark information including:
- MOA/target annotations
- PP_ID identifiers
- Cosine distances
- Broad Institute annotations
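
To make the landmark idea concrete, here is a small sketch that computes cosine distances from one profile to a set of landmark profiles and keeps the three closest. The function names, the toy two-dimensional vectors, and the `threshold` value of 0.2 are all assumptions for illustration; the upstream pipelines compute these distances on real feature vectors with their own cut-offs.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def closest_landmarks(profile, landmarks, k=3, threshold=0.2):
    """Return the k nearest landmarks as (name, distance, is_valid) tuples.

    `threshold` is a placeholder value, not the app's actual cut-off.
    """
    distances = [(name, cosine_distance(profile, vec)) for name, vec in landmarks.items()]
    distances.sort(key=lambda item: item[1])
    return [(name, dist, dist <= threshold) for name, dist in distances[:k]]


# Toy landmark profiles (hypothetical names and values).
landmarks = {
    "landmark_A": [1.0, 0.0],
    "landmark_B": [0.0, 1.0],
    "landmark_C": [1.0, 1.0],
    "landmark_D": [-1.0, 0.0],
}
for name, dist, valid in closest_landmarks([1.0, 0.1], landmarks):
    print(f"{name}: distance={dist:.3f}, within threshold={valid}")
```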

Rich Metadata Display

Hover and click interactions reveal comprehensive compound information:

Basic info: Treatment, plate, well, concentration, library
Annotations: MOA, target description, Broad annotation
Chemical structure: Rendered from SMILES using RDKit
Chemoproteomics: Protein targets from pulldown experiments
Gene descriptions: Functional annotations for target genes
Distance metrics: MAD cosine, variance, standard deviation measures

Data Sources

This visualisation app displays data generated by two upstream analysis pipelines:

SPC Analysis Pipeline

Repository: spc-cosine-analysis (TBD - update link)
Description: Spherical Phenotype Clustering using ResNet feature extraction and cosine similarity analysis
Output: Parquet files with UMAP/t-SNE coordinates, landmark distances, and unified metrics

CellProfiler Analysis Pipeline

Repository: cellprofiler_processing (TBD - update link)
Description: Traditional CellProfiler morphological profiling with MAD normalisation
Output: Parquet files with Metadata_ prefixed columns and landmark analysis results

Project Structure

spc-data-explorer/
└── scripts/
    ├── main.py                      # Application entry point
    ├── config_loader.py             # Configuration management (singleton pattern)
    ├── environment.yml              # Conda environment specification
    ├── requirements.txt             # Pip dependencies (alternative to conda)
    │
    ├── callbacks/
    │   ├── __init__.py
    │   ├── plot_callbacks.py        # Main scatter plot generation and updates
    │   ├── image_callbacks.py       # Hover/click image display with metadata
    │   ├── search_callbacks.py      # MOA-based compound search functionality
    │   ├── detailed_search_callbacks.py  # Advanced search with multiple criteria
    │   └── landmark_callbacks.py    # Landmark analysis modal and calculations
    │
    ├── components/
    │   ├── __init__.py
    │   ├── layout.py                # Dashboard layout structure
    │   ├── controls.py              # UI control panels (dropdowns, sliders)
    │   └── search.py                # Search component builders
    │
    ├── config/
    │   └── config_20251118_TEST_INPUTS.py  # RECOMMENDED: Latest config
    │
    ├── data/
    │   ├── __init__.py
    │   ├── loader.py                # Main data loading with column harmonisation
    │   └── landmark_loader.py       # Landmark data processing and validation
    │
    ├── utils/
    │   ├── __init__.py
    │   ├── color_utils.py           # Colour palette management for categories
    │   ├── image_utils.py           # Thumbnail finding and image processing
    │   └── smiles_utils.py          # Chemical structure rendering via RDKit
    │
    └── generate_thumbnails/
        ├── scripts/
        │   └── generate_thumbnails_perc_and_auto_thresh_V1.py  # Thumbnail generator
        └── submit/
            └── thumbnails_*.sh      # SLURM submission scripts for each dataset

Key Configuration Note

Use config_20251118_TEST_INPUTS.py as your starting point.

This is the latest configuration file that includes:

Separate loading logic for SPC and CellProfiler datasets
Correct column name mappings for both pipelines (e.g., plate vs Metadata_plate_barcode)
Hover column definitions for each data type
Plot type configurations for all four views (SPC UMAP/t-SNE, CP UMAP/t-SNE)

Thumbnail Generation

The generate_thumbnails/ directory contains scripts for creating RGB thumbnail images from multi-channel Cell Painting microscopy data. These thumbnails are displayed in the dashboard when hovering over or clicking on data points.

Overview

Cell Painting assays typically acquire 4-5 fluorescent channels per field of view. The thumbnail generator combines these channels into false-colour RGB thumbnails (500×500 pixels) suitable for quick visual inspection.

Scaling Modes

The script produces two versions of each thumbnail:

Mode	Directory	Description	Best For
Fixed	`fixed/`	Pre-defined intensity limits based on dataset-wide percentiles	Comparing phenotypes across treatments, identifying outliers
Auto	`auto/`	Per-image 1st-99th percentile scaling	Examining morphological details, dim images, QC checking
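
The difference between the two modes can be sketched as follows. This is not the thumbnail script's code: it works on a flat list of intensities to stay dependency-free (real image code would typically use NumPy arrays), and the helper names and the fixed limits of 0 and 1000 are assumptions.

```python
def percentile(values: list[float], pct: float) -> float:
    """Linear-interpolation percentile of `values` (pct in 0-100)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def rescale(pixels: list[float], low: float, high: float) -> list[int]:
    """Clip intensities to [low, high] and map them linearly onto 0-255."""
    span = max(high - low, 1e-12)  # avoid division by zero on flat images
    return [round(255 * (min(max(p, low), high) - low) / span) for p in pixels]


def scale_fixed(pixels, low, high):
    """Fixed mode: the same limits are applied to every image."""
    return rescale(pixels, low, high)


def scale_auto(pixels):
    """Auto mode: limits are the image's own 1st and 99th percentiles."""
    return rescale(pixels, percentile(pixels, 1), percentile(pixels, 99))


dim_image = [100, 110, 120, 130, 140]
print(scale_fixed(dim_image, low=0, high=1000))  # stays dark, comparable across images
print(scale_auto(dim_image))                     # stretched to the full 0-255 range
```

A dim image stays dim under fixed scaling, which is what makes intensities comparable between treatments, while auto scaling stretches it so that structure becomes visible.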

Channel Mapping

Fluorescent channels are mapped to RGB colours:

Blue: Nuclear stains (HOECHST 33342, DAPI)
Green: Alexa 488, FITC (ER, actin, cytoplasmic markers)
Red: Alexa 568, MitoTracker Deep Red, Cy5 (mitochondria, membrane)
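
One simple way to express this mapping in code is a lookup table from stain name to output colour. The stain-to-colour assignments below are taken from the list above; the `group_by_colour` helper, the case-insensitive matching, and the choice to skip unknown channels are assumptions for illustration.

```python
# Stain-to-colour assignments from the list above, keyed in upper case so that
# lookups can ignore capitalisation (an assumption of this sketch).
CHANNEL_COLOURS = {
    "HOECHST 33342": "blue",
    "DAPI": "blue",
    "ALEXA 488": "green",
    "FITC": "green",
    "ALEXA 568": "red",
    "MITOTRACKER DEEP RED": "red",
    "CY5": "red",
}


def group_by_colour(channel_names: list[str]) -> dict:
    """Group acquired channels by RGB colour; unrecognised channels are skipped."""
    groups = {"red": [], "green": [], "blue": []}
    for name in channel_names:
        colour = CHANNEL_COLOURS.get(name.strip().upper())
        if colour is not None:
            groups[colour].append(name)
    return groups


print(group_by_colour(["DAPI", "Alexa 488", "Cy5", "Alexa 568", "Brightfield"]))
```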

Usage

Basic usage:

python generate_thumbnails_perc_and_auto_thresh_V1.py \
    /path/to/max_projected_images \
    /path/to/output/thumbnails \
    --scaling both

Scan directories first (planning mode):

python generate_thumbnails_perc_and_auto_thresh_V1.py \
    /path/to/images \
    /path/to/thumbnails \
    --scan-only \
    --input-dirs /other/path1 /other/path2

SLURM Submission

For HPC environments, use the submission scripts in submit/:

sbatch thumbnails_20251020_HaCaT_HTC_V1_V2_cell_paint.sh

Example SLURM configuration:

#SBATCH --job-name=thumbnails
#SBATCH --time=168:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --partition=ncpu

Output Structure

thumbnails/
├── fixed/                    # Fixed intensity scaling
│   ├── {plate_barcode}/
│   │   ├── {plate}_{well}_{site}.png
│   │   └── ...
│   └── ...
└── auto/                     # Auto-scaled per image
    ├── {plate_barcode}/
    │   └── ...
    └── ...

Installation

Prerequisites

Python 3.11+
Conda (recommended) or pip
Access to data files (parquet) and thumbnail images

Setup with Conda (Recommended)

# Clone the repository
git clone https://github.com/YOUR_USERNAME/spc-data-explorer.git
cd spc-data-explorer/scripts

# Create environment from file
conda env create -f environment.yml

# Activate environment
conda activate spc_visualisation

Setup with Pip (Alternative)

# Clone the repository
git clone https://github.com/YOUR_USERNAME/spc-data-explorer.git
cd spc-data-explorer/scripts

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

RDKit Note

RDKit is required for chemical structure rendering. It's easiest to install via conda:

conda install -c conda-forge rdkit

Configuration

1. Create Your Configuration File

# Copy the recommended config
cp config/config_20251118_TEST_INPUTS.py config/config_myproject.py

2. Update Paths

Edit your configuration file to point to your data:

from pathlib import Path


class Config:
    # Data paths - update these!
    SPC_DATA_PATH = Path("/path/to/spc_analysis_output.parquet")
    CP_DATA_PATH = Path("/path/to/cellprofiler_output.parquet")

    # Thumbnail directory (should contain 'fixed/' and 'auto/' subdirectories)
    THUMBNAIL_DIRS = Path("/path/to/thumbnails")

3. Environment-Specific Paths

The template supports automatic path switching based on username:

import os
from pathlib import Path

# Choose the analysis directory based on who is running the app
if os.environ.get("USER", "") == "your_cluster_username":
    # Cluster paths (e.g., /nemo/...)
    ANALYSIS_DIR = Path("/nemo/path/to/analysis")
else:
    # Local paths (e.g., mounted volumes)
    ANALYSIS_DIR = Path("/Volumes/path/to/analysis")

Usage

Starting the Application

cd scripts/

# Interactive config selection (prompts you to choose)
python main.py

# Or specify config directly
python main.py --config config_myproject

# Or use environment variable
export SPC_CONFIG=config_myproject
python main.py

The app will start at http://127.0.0.1:8090 (or the port specified in your config).
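
The three ways of choosing a config shown above could be resolved along the following lines. This is a sketch only: `resolve_config_name` is a hypothetical function, and the precedence used here (command-line flag first, then `SPC_CONFIG`, then the interactive prompt) is an assumption rather than documented behaviour of `main.py`.

```python
import argparse
from typing import Optional


def resolve_config_name(argv: list[str], environ: dict) -> Optional[str]:
    """Return the config module name from --config, else SPC_CONFIG, else None."""
    parser = argparse.ArgumentParser(description="SPC Data Explorer")
    parser.add_argument("--config", default=None, help="config module name")
    args = parser.parse_args(argv)
    # Assumed precedence: explicit flag beats the environment variable.
    return args.config or environ.get("SPC_CONFIG") or None


print(resolve_config_name(["--config", "config_myproject"], {}))
print(resolve_config_name([], {"SPC_CONFIG": "config_myproject"}))
print(resolve_config_name([], {}))  # None: fall back to the interactive prompt
```

In a real entry point the arguments would come from `sys.argv[1:]` and `os.environ`; passing them in explicitly keeps the function easy to test.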

Dashboard Navigation

Select Plot Type: Choose from:
- SPC UMAP / t-SNE
- CellProfiler UMAP / t-SNE
- Custom axes
Colour By: Select metadata column for point colouring:
- Library, MOA, landmark status
- Plate, well location
- Distance metrics (continuous colour scales)
Search Compounds: Type to search by:
- Compound ID: PPXXXX
- Treatment: CompoundXXXX
- Gene/MOA: UNG → shows UNG@0.1 (CR000023@0.1)
Interact with Plot:
- Hover: See microscopy image preview + key metadata
- Click: Open detailed compound panel with full information
- Zoom/Pan: Standard Plotly interactions
Adjust Settings:
- Point size slider
- Image scaling mode (fixed/auto)
- Optional text labels on images

Input Data Format

SPC Dataset Required Columns

Column	Description
`UMAP1`, `UMAP2`	UMAP coordinates
`TSNE1`, `TSNE2`	t-SNE coordinates
`plate`, `well`	Plate and well identifiers
`treatment`	Treatment identifier
`PP_ID`, `PP_ID_uM`	Compound ID, and compound ID with concentration
`library`	Source library
`moa_first`, `moa_compound_uM`	Mechanism of action
`closest_landmark_*`	Landmark distance data

CellProfiler Dataset Required Columns

Column	Description
`UMAP1`, `UMAP2`	UMAP coordinates
`TSNE1`, `TSNE2`	t-SNE coordinates
`Metadata_plate_barcode`	Plate identifier
`Metadata_well`	Well identifier
`Metadata_PP_ID`, `Metadata_PP_ID_uM`	Compound identifiers
`Metadata_library`	Source library
`Metadata_annotated_target_first`	MOA/target
`closest_landmark_Metadata_*`	Landmark data
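
Before pointing the app at a new file, it can help to check that the required columns from the two tables above are present. The sketch below takes a list of column names (for example, the columns of a loaded parquet file) and reports what is missing; `missing_columns` and the constant names are hypothetical, and wildcard entries such as `closest_landmark_*` are treated as "at least one column with this prefix", which is an assumption.

```python
SPC_REQUIRED = [
    "UMAP1", "UMAP2", "TSNE1", "TSNE2", "plate", "well", "treatment",
    "PP_ID", "PP_ID_uM", "library", "moa_first", "moa_compound_uM",
]
SPC_REQUIRED_PREFIXES = ["closest_landmark_"]

CP_REQUIRED = [
    "UMAP1", "UMAP2", "TSNE1", "TSNE2", "Metadata_plate_barcode",
    "Metadata_well", "Metadata_PP_ID", "Metadata_PP_ID_uM",
    "Metadata_library", "Metadata_annotated_target_first",
]
CP_REQUIRED_PREFIXES = ["closest_landmark_Metadata_"]


def missing_columns(columns, required, required_prefixes=()):
    """List required names (and prefix patterns) that `columns` does not satisfy."""
    present = set(columns)
    missing = [name for name in required if name not in present]
    for prefix in required_prefixes:
        if not any(col.startswith(prefix) for col in columns):
            missing.append(prefix + "*")
    return missing


example_columns = ["UMAP1", "UMAP2", "plate", "well", "closest_landmark_distance"]
print(missing_columns(example_columns, SPC_REQUIRED, SPC_REQUIRED_PREFIXES))
```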

Thumbnail Directory Structure

thumbnails/
├── fixed/           # Fixed intensity scaling (comparable)
│   ├── plate1/
│   │   ├── plate1_A01_01.png
│   │   ├── plate1_A01_02.png
│   │   └── ...
│   └── plate2/
└── auto/            # Auto-scaled per image
    ├── plate1/
    └── plate2/
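
Given this layout, looking up a thumbnail for a plate and well amounts to globbing for `{plate}_{well}_*.png` under the chosen scaling mode and picking one site. The sketch below shows one way to do that with random site selection; `find_thumbnail` is a hypothetical helper and not the function used in `utils/image_utils.py`.

```python
import random
import tempfile
from pathlib import Path
from typing import Optional


def find_thumbnail(root: Path, plate: str, well: str, mode: str = "fixed") -> Optional[Path]:
    """Pick one site at random from {root}/{mode}/{plate}/{plate}_{well}_*.png."""
    if mode not in ("fixed", "auto"):
        raise ValueError(f"unknown scaling mode: {mode!r}")
    candidates = sorted((root / mode / plate).glob(f"{plate}_{well}_*.png"))
    return random.choice(candidates) if candidates else None


# Demonstrate on a throwaway directory that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    plate_dir = Path(tmp) / "fixed" / "plate1"
    plate_dir.mkdir(parents=True)
    for site in ("01", "02"):
        (plate_dir / f"plate1_A01_{site}.png").touch()
    print(find_thumbnail(Path(tmp), "plate1", "A01").name)  # one of the two sites
    print(find_thumbnail(Path(tmp), "plate1", "B02"))       # None: no such well
```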

Development

Adding New Colour Options

Edit utils/color_utils.py to add new colour column configurations:

color_columns = [
    ('new_column', False, 'Display Name', px.colors.qualitative.Set1),
    # (column_name, is_continuous, display_label, colour_palette)
]

Adding New Hover Fields

Update your config file's get_hover_columns() and get_hover_display() methods to include additional fields in the hover template.

Customising the Layout

The dashboard layout is defined in components/layout.py. Modify this file to add new panels or rearrange existing components.

Troubleshooting

Common Issues

"No data available" error

Check that your data paths in the config file are correct
Verify the parquet files exist and are readable
Ensure required columns are present in your data

Images not displaying

Verify thumbnail directory path is correct
Check that fixed/ and auto/ subdirectories exist
Confirm image naming convention: {plate}_{well}_{site}.png

Slow performance with large datasets

Consider filtering data before loading
Reduce the number of hover columns
Use server-side pagination for very large datasets

RDKit import errors

Install RDKit via conda: conda install -c conda-forge rdkit
If using pip, RDKit installation can be complex - conda is recommended
