Commit 6d7225b

tdmaarseveen authored and committed
Visualize learned embedding & identify batch effect with tSNE
1 parent 4249d90 commit 6d7225b

15 files changed, with 1677 additions and 696 deletions.

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1 +1,2 @@
 example_data/
+tsne/

README.md

Lines changed: 77 additions & 6 deletions
@@ -4,15 +4,86 @@
 ## Background
 Clustering techniques that use deep learned embeddings often outperform conventional clustering techniques such as k-means [[1](https://www.nature.com/articles/s41598-021-91297-x)]. However, when it comes to projecting new samples onto the learned embedding there is a lack of guidelines & tools. We built POODLE to facilitate the projection of new samples onto this product space. Samples are clustered one-by-one according to their orientation in the latent space.

-## Deep learning technique
+#### Deep learning technique
 We used the autoencoder architecture of MAUI as an example. However, one could also adopt a different deep learning architecture or even a factor analysis technique (like MOFA). Currently, this github repo does not provide examples for other techniques.

-## Robust to difference in dimensionality
-Poodle is flexible for situations where certain data is absent in the clinic, as one may build a shared product space and only project patients on the variables present in both sets. However, ensure that the key features are still included.
+#### Robust to difference in dimensionality
+Poodle is flexible for situations where certain data is absent in the clinic, as one may build a shared product space and only project patients on the variables present in both sets. However, ensure that the key features are still included. The more you diverge from the initial set of features, the more you'll lose the cluster essence.

-## How to start
-Start a notebook session on your device and open the following file :
+## Installation
+Once you have downloaded the github repo you can install the required packages by running:
+
+```sh
+$ pip install -r requirements.txt
+```
+
+## How does poodle work?
+Arguably, the best way to get familiar with the capabilities of poodle, is to start a notebook session on your device and open the following example:
 [Start here](examples/projecting_patients.ipynb)
+
+*For those that prefer to start right away, we have listed the essential functions down below:*
+
+## Functions in poodle
+
+#### Specify the structure of the data
+You need to specify the columns for each modality, in case you want to use different modalities. If your variables are all of the same type, one list will suffice.
+```python
+d_input = {'cat' : list(CATEGORIC_VARIABLES), 'num' : list(NUMERIC_VARIABLES)}
+```
+*Disclaimer*: Columns need to be present in both the original & new set. If there happens to be a discrepancy between the sets you need to learn a shared product space (see [this notebook](examples/projecting_patients.ipynb) for more info).
+
+
+#### Prepare patient projection
+```python
+# Before projecting a new patient, you need to update metadata with the new information.
+from poodle import utils as pup
+
+# create metadata
+df_meta = pup.getMetaDataPatient(df_cluster, list(df_cluster['pseudoId'].values), new_pat)
+```
+
+#### Project a patient onto the latent space
+```python
+# Now you can project the patient onto the learned embedding. You need to supply the following:
+# the model (i.e. maui), metadata, original latent space, modality information and sample data.
+
+# project & classify a new patient
+y, z = pup.predictPatientCluster(maui_model, df_meta, z_existent, d_input, sample)
+
+# Collect coordinates of newly added patients
+z_new.append(np.array(z)[-1])
+```
+Output:
+* `y`: the cluster probabilities for a new patient
+* `z`: the coordinates of the new patient (on the latent space)
+* `z_new`: the coordinates of all new patients
+
+## Visualization in poodle
+#### Check quality of replicate clusters vs shared product space
+```python
+from poodle import visualization as viz
+import pandas as pd
+
+# Import clustering probabilities of all new patients
+df_projection = pd.read_csv('../example_data/results/ClusterAssignment_NewPatients.csv', sep=',')
+
+# Plot both original & replicate distribution
+viz.plotQualityControl(df_cluster[['Cluster', 'pseudoId']], df_projection, z_existent, pd.DataFrame(z_new))
+```
+#### Show differences in spatial variance
+```python
+viz.plotSpatialVariation(l_new, l_old)
+```
+#### Map a specific patient
+```python
+viz.plotClusterMapping(df_meta, z, new_pat)
+```
+#### Show top 10 closest neighbours
+```python
+df_neighbours = pup.find_neighbours(df_meta, z, new_pat)
+viz.plot_neighbours(df_neighbours, new_pat)
+```
+

 ## WIP
-Be aware that this github repo is still a work in progress. We will update the readme as we make new additions to the tool. For example: we aim to add tSNE projection, baseline comparison and batch correction in the near future.
+Be aware that this github repo is still a work in progress. We will update the readme as we make new additions to the tool. For example: we aim to add tSNE projection, baseline comparison and batch correction in the near future.
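
The README's disclaimer that columns must be present in both the original and the new set can be checked before any projection is attempted. The sketch below is not part of poodle: `shared_modalities` is a hypothetical helper and the column names are invented for illustration. It assumes only that `d_input` maps a modality name to a list of column names, as in the README.

```python
import pandas as pd


def shared_modalities(df_original, df_new, d_input):
    """Hypothetical helper (not a poodle function): keep, per modality,
    only the columns that occur in both data sets."""
    shared = set(df_original.columns) & set(df_new.columns)
    d_shared = {}
    for modality, columns in d_input.items():
        kept = [c for c in columns if c in shared]
        dropped = [c for c in columns if c not in shared]
        if dropped:
            # Report the discrepancy so it is a conscious decision
            print(f"{modality}: dropping {dropped} (absent from one set)")
        if kept:
            d_shared[modality] = kept
    return d_shared


# Toy data with made-up column names
df_original = pd.DataFrame(
    {"sex": ["F", "M"], "smoker": ["y", "n"], "age": [54, 61], "crp": [3.2, 11.0]}
)
df_new = pd.DataFrame({"sex": ["M"], "age": [47], "crp": [5.1]})

d_input = {"cat": ["sex", "smoker"], "num": ["age", "crp"]}
d_shared = shared_modalities(df_original, df_new, d_input)
print(d_shared)
```

If a modality loses columns here, the README's advice applies: a shared product space would have to be learned on the remaining columns, and the key features should still be among them.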

examples/.ipynb_checkpoints/projecting_patients-checkpoint.ipynb

Lines changed: 575 additions & 335 deletions
Large diffs are not rendered by default.

examples/projecting_patients.ipynb

Lines changed: 575 additions & 335 deletions
Large diffs are not rendered by default.

figures/original/tsne_original.png

Binary file, 14.4 KB.

Further binary files changed (file names not shown); size changes: -1.44 KB, -3.06 KB, -3.78 KB, -367 Bytes, 28 KB.
