Commit 8dfc97b

yusuf1759 and tjduigna authored
Add unit testing to eval code (#38)
* fix: fix issues with non-existent directory
* fix: update eval.md
* chore: fix linting
* chore: fix linting
* update posebusters descriptions
* chore: cli short hand for eval scripts
* chore: elevate _reset_config to _config._clear
* test: stratification test passes

---------

Co-authored-by: Thomas Duignan <thomas.j.duignan@gmail.com>
1 parent 781709d commit 8dfc97b

20 files changed, +342 -160 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 reports/
+test_eval/
 tox_conda*

 # Byte-compiled / optimized / DLL files

column_descriptions/posebusters_checks.tsv

Lines changed: 123 additions & 124 deletions
Large diffs are not rendered by default.

docker-compose.yml

Lines changed: 1 addition & 1 deletion
@@ -26,4 +26,4 @@ services:
       - ./.coveragerc:/app/.coveragerc
       - ./reports/:/app/reports/
       - ./examples/:/app/examples/
-    command: /bin/bash -c "python -m pytest -v && cp .coverage reports/.coverage"
+    command: /bin/bash -c "python -m pytest -n auto -v && cp .coverage reports/.coverage"

docs/evaluation.md

Lines changed: 64 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -6,6 +6,11 @@ sd_hide_title: true
66

77
## Evaluating docking poses across a stratified test set
88

9+
The `plinder.eval` subpackage allows (1) assessing protein-ligand complex predictions against reference `plinder` systems, and
10+
(2) correlating the performance of these predictions against the level of similarity of each test system to the corresponding training set.
11+
12+
The output file from running the scripts `src/plinder/eval/docking/write_scores.py` and `src/plinder/eval/docking/stratify_test_set.py` generates the same evaluation metrics as the ones we have on the public leaderboard.
13+
914
The `plinder-eval` package allows
1015

1116
1. assessing protein-ligand complex predictions against reference `plinder` systems, and
@@ -26,17 +31,17 @@ leaderboard (coming soon).
2631
- `confidence`: Optional score associated with the pose
2732
- `ligand_file`: Path to the SDF file of the pose
2833

29-
`split.csv` with `system_id` and `split` columns mapping PLINDER systems to `train`, or `test`.
34+
`split.parquet` with, at a minimum, `system_id` and `split` columns mapping PLINDER systems to `train`, or `test`.
3035

3136
### Commands
3237

3338
#### Write scores
3439

3540
```bash
36-
python src/plinder/eval/docking/write_scores.py --prediction_file predictions.csv --data_dir PLINDER_DATA_DIR --output_dir scores --num_processes 64
41+
plinder_eval --prediction_file tests/test_data/eval/predictions.csv --data_dir tests/test_data/eval --output_dir test_eval/ --num_processes 8
3742
```
3843

39-
This calculates accuracy metrics for all predicted poses compared to the reference. JSON files of each pose are stored in `scores/scores` and the summary file across all poses is stored in `scores.parquet`.
44+
This calculates accuracy metrics for all predicted poses compared to the reference. JSON files of each pose are stored in `test_eval/scores` and the summary file across all poses is stored in `test_eval/scores.parquet`.
4045

4146
The predicted pose is compared to the reference system and the following ligand scores are calculated:
4247

@@ -69,24 +74,70 @@ For oligomeric complexes:
6974

7075
If `score_posebusters` is True, all posebusters checks are saved.
7176

77+
You can inspect the results at `test_eval/scores.parquet`
78+
79+
```python
80+
>>> import pandas as pd
81+
>>> df = pd.read_parquet("test_eval/scores.parquet")
82+
>>> df.T
83+
0 1
84+
model 1a3b__1__1.B__1.D 1ai5__1__1.A_1.B__1.D
85+
reference 1a3b__1__1.B__1.D 1ai5__1__1.A_1.B__1.D
86+
num_reference_ligands 1 1
87+
num_model_ligands 1 1
88+
num_reference_proteins 1 2
89+
num_model_proteins 1 2
90+
fraction_reference_ligands_mapped 1.0 1.0
91+
fraction_model_ligands_mapped 1.0 1.0
92+
lddt_pli_ave 0.889506 0.557841
93+
lddt_pli_wave 0.889506 0.557841
94+
lddt_pli_amd_ave 0.85815 0.510695
95+
lddt_pli_amd_wave 0.85815 0.510695
96+
scrmsd_ave 1.617184 3.665143
97+
scrmsd_wave 1.617184 3.665143
98+
rank 1 1
99+
```
100+
72101
#### Write test stratification data
73102

74103
(This command will not need to be run by a user, the `test_set.parquet` and `val_set.parquet` file will be provided with the split release)
75104

76105
```bash
77-
python src/plinder/eval/docking/stratify_test_set.py --split_file split.csv --data_dir PLINDER_DATA_DIR --output_dir test_data --num_processes 16
106+
plinder_stratify --split_file split.csv --data_dir PLINDER_DATA_DIR --output_dir test_data
78107
```
79108

80109
Makes `test_data/test_set.parquet` which
81110

82111
- Labels the maximum similarity of each test system to the training set across all the similarity metrics
83-
- Stratifies the test set based on training set similarity into `novel_pocket_pli`, `novel_pocket_ligand`, `novel_protein`, `novel_all`, and `not_novel`
84-
- Labels test systems with high quality
85-
86-
#### Write evaluation results
87-
88-
```bash
89-
python src/plinder/eval/docking/make_plots.py --score_file scores/scores.parquet --data_file test_data/test_set.parquet --output_dir results
112+
- Stratifies the test set based on training set similarity into `novel_pocket_pli`, `novel_ligand_pli`, `novel_protein`, `novel_ligand`, `novel_all` and `not_novel`
113+
- Labels test systems with high quality.
114+
115+
To inspect the result of the run, do:
116+
```python
117+
>>> import pandas as pd
118+
>>> df = pd.read_parquet("test_eval/test_set.parquet")
119+
>>> df.T
120+
0 1
121+
system_id 1a3b__1__1.B__1.D 1ai5__1__1.A_1.B__1.D
122+
pli_qcov 0.0 0.0
123+
protein_seqsim_qcov_weighted_sum 0.0 0.0
124+
protein_seqsim_weighted_sum 0.0 0.0
125+
protein_fident_qcov_weighted_sum 0.0 0.0
126+
protein_fident_weighted_sum 0.0 0.0
127+
protein_lddt_qcov_weighted_sum 0.0 0.0
128+
protein_lddt_weighted_sum 0.0 0.0
129+
protein_qcov_weighted_sum 0.0 0.0
130+
pocket_fident_qcov 0.0 0.0
131+
pocket_fident 0.0 0.0
132+
pocket_lddt_qcov 0.0 0.0
133+
pocket_lddt 0.0 0.0
134+
pocket_qcov 0.0 0.0
135+
tanimoto_similarity_max 0.0 0.0
136+
passes_quality False False
137+
novel_pocket_pli True True
138+
novel_pocket_ligand True True
139+
novel_protein True True
140+
novel_all True True
141+
not_novel False False
142+
>>>
90143
```
91-
92-
Writes out results.csv and plots of performance as a function of training set similarity across different similarity metrics.
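
Note: with the `make_plots.py` step dropped from the docs, relating accuracy to novelty is left to the reader. The two outputs appear to share system identifiers (`reference` in the scores table, `system_id` in the stratification table), so they can be joined. A minimal dependency-free sketch, using the two example systems printed in the docs; the 2 Å `scrmsd_wave` cutoff and the helper name `success_by_label` are assumptions for illustration, not part of `plinder.eval`. With pandas, the same join would be a `merge`.

```python
# Rows copied from the example outputs in docs/evaluation.md (subset of columns).
scores = [
    {"reference": "1a3b__1__1.B__1.D", "scrmsd_wave": 1.617184},
    {"reference": "1ai5__1__1.A_1.B__1.D", "scrmsd_wave": 3.665143},
]
test_set = [
    {"system_id": "1a3b__1__1.B__1.D", "novel_all": True},
    {"system_id": "1ai5__1__1.A_1.B__1.D", "novel_all": True},
]


def success_by_label(scores, test_set, label, cutoff=2.0):
    """Fraction of scored systems with scrmsd_wave below cutoff, per label value."""
    labels = {row["system_id"]: row[label] for row in test_set}
    hits, totals = {}, {}
    for row in scores:
        key = labels.get(row["reference"])
        if key is None:  # scored system missing from the stratification table
            continue
        totals[key] = totals.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + (row["scrmsd_wave"] < cutoff)
    return {key: hits[key] / totals[key] for key in totals}


rates = success_by_label(scores, test_set, "novel_all")
print(rates)
```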

flows/docker.py

Lines changed: 1 addition & 1 deletion
@@ -275,7 +275,7 @@ def test_image(
     cmd.append("test")
     if args is not None and len(args):
         cmd.extend(
-            split(f'''/bin/bash -c "python -m pytest -v {' '.join(args)} && cp .coverage reports/.coverage"''')
+            split(f'''/bin/bash -c "python -m pytest -n auto -v {' '.join(args)} && cp .coverage reports/.coverage"''')
         )
     Proc(cmd, env=env).execute()
     if push:
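
Note: the `-n auto` flag is provided by the pytest-xdist plugin, which distributes tests across the available CPUs, so the test image needs that plugin installed. A small sketch of how the command string is tokenized, assuming `split` in `flows/docker.py` is `shlex.split` (the import is not visible in this hunk) and using made-up pytest arguments:

```python
from shlex import split

args = ["tests/eval", "-k", "stratify"]  # hypothetical extra pytest arguments
joined = " ".join(args)
cmd = split(
    f'/bin/bash -c "python -m pytest -n auto -v {joined} && cp .coverage reports/.coverage"'
)
# The double-quoted part survives as a single argument for `bash -c`.
print(cmd)
```

The whole pytest invocation, including `&&`, stays in one token, so the shell inside the container interprets it rather than the caller.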

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -36,8 +36,8 @@ dependencies = [

 [project.scripts]
 plinder_download = "plinder.core.index.utils:download_plinder_cmd"
-plinder_eval = "plinder.eval.run:main"
-plinder_create_submission = "plinder.eval.create_submission:main"
+plinder_eval = "plinder.eval.docking.write_scores:main"
+plinder_stratify = "plinder.eval.docking.stratify_test_set:main"

 [project.optional-dependencies]
 lint = [
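
Note: each `[project.scripts]` entry maps a command name to a `module:function` target, so after reinstalling the package `plinder_eval` and `plinder_stratify` call the respective `main` functions, and `plinder_create_submission` is no longer installed. A rough sketch of how such a target string resolves, using `json:dumps` as a stand-in target and a hypothetical helper `resolve`; installers generate their own wrapper scripts, and this is not their actual code:

```python
from importlib import import_module


def resolve(target: str):
    """Turn a 'module:function' string into the callable it names."""
    module_name, _, attr = target.partition(":")
    return getattr(import_module(module_name), attr)


func = resolve("json:dumps")
print(func({"ok": True}))
```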

src/plinder/core/utils/config.py

Lines changed: 5 additions & 0 deletions
@@ -69,6 +69,11 @@ class _get_config:
     _packages: dict[str, set[str]] = {}
     _cfg = DictConfig({})

+    def _clear(self) -> None:
+        self._schema = {}
+        self._packages = {}
+        self._cfg = DictConfig({})
+
     def __call__(
         self,
         *,

src/plinder/eval/__init__.py

Lines changed: 14 additions & 0 deletions
@@ -1,2 +1,16 @@
 # Copyright (c) 2024, Plinder Development Team
 # Distributed under the terms of the Apache License 2.0
+from textwrap import dedent
+
+try:
+    import ost  # noqa
+except (ImportError, ModuleNotFoundError):
+    raise ImportError(
+        dedent(
+            """\
+            plinder.eval requires the OpenStructureToolkit >= 2.8.0 (ost) to be installed.
+            Please refer to the documentation for installation instructions and current limitations.
+            See the note here: https://github.com/plinder-org/plinder?tab=readme-ov-file#-getting-started
+            """
+        )
+    )
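
Note: this guard turns a missing optional dependency into an actionable message at import time. Since `ModuleNotFoundError` is a subclass of `ImportError`, catching `ImportError` alone would cover both. A generic sketch of the same pattern; the helper name `require` and the module name `not_a_real_module_xyz` are placeholders:

```python
from importlib import import_module
from textwrap import dedent


def require(module_name: str, hint: str):
    """Import module_name, or raise ImportError carrying an installation hint."""
    try:
        return import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        raise ImportError(dedent(hint)) from exc


try:
    require(
        "not_a_real_module_xyz",
        """\
        this feature requires not_a_real_module_xyz to be installed.
        Please refer to the documentation for installation instructions.
        """,
    )
except ImportError as err:
    message = str(err)
    print(message)
```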

src/plinder/eval/docking/stratify_test_set.py

Lines changed: 6 additions & 6 deletions
@@ -11,7 +11,7 @@

 from plinder.core.scores.protein import cross_similarity as protein_cross_similarity
 from plinder.core.utils.log import setup_logger
-from plinder.data.smallmolecules import mol2morgan_fp, tanimoto_maxsim_matrix
+from plinder.data import smallmolecules

 LOG = setup_logger(__name__)

@@ -96,14 +96,14 @@ def compute_ligand_max_similarities(
 ) -> None:
     if "fp" not in df.columns:
         smiles_fp_dict = {
-            smi: mol2morgan_fp(smi)
+            smi: smallmolecules.mol2morgan_fp(smi)
             for smi in df["ligand_rdkit_canonical_smiles"].drop_duplicates().to_list()
         }
         df["fp"] = df["ligand_rdkit_canonical_smiles"].map(smiles_fp_dict)

     df_test = df.loc[df["split"] == test_label][["system_id", "fp"]].copy()

-    df_test["tanimoto_similarity_max"] = tanimoto_maxsim_matrix(
+    df_test["tanimoto_similarity_max"] = smallmolecules.tanimoto_maxsim_matrix(
         df.loc[df["split"] == train_label]["fp"].to_list(),
         df_test["fp"].to_list(),
     )
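
Note: importing the module and calling `smallmolecules.mol2morgan_fp(...)` means the function is looked up on the module at call time. One plausible benefit, given that this commit adds unit tests, is easier patching; that motivation is an inference and is not stated in the commit. A sketch of the difference, using the standard library's `math` module as a stand-in:

```python
import math
from math import sqrt
from unittest import mock


def via_name(x: float) -> float:
    return sqrt(x)       # name bound once, when this module was imported


def via_module(x: float) -> float:
    return math.sqrt(x)  # attribute looked up on the module at each call


# Patch the attribute on the module, as a test might do.
with mock.patch.object(math, "sqrt", return_value=-1.0):
    from_name = via_name(4.0)
    from_module = via_module(4.0)

print(from_name, from_module)
```

Only the module-attribute call sees the patch; the directly imported name keeps pointing at the original function.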
@@ -359,9 +359,9 @@ def main() -> None:
     args = parser.parse_args()

     StratifiedTestSet.from_split(
-        split_file=args.split_file,
-        data_dir=args.data_dir,
-        output_dir=args.output_dir,
+        split_file=Path(args.split_file),
+        data_dir=Path(args.data_dir),
+        output_dir=Path(args.output_dir),
         train_label=args.train_label,
         test_label=args.test_label,
         overwrite=args.overwrite,
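
Note: `argparse` returns plain strings unless told otherwise, and `str` does not support the `/` path joining used elsewhere in this commit (for example `output_dir / "scores"`), which would explain the `Path(...)` wrappers. An alternative sketch, not what the commit does, converts at parse time with `type=Path`. The argument names mirror the CLI flags shown in `docs/evaluation.md`; the parser itself and the default value are illustrative.

```python
import argparse
from pathlib import Path

parser = argparse.ArgumentParser(description="illustrative parser, not plinder's")
parser.add_argument("--split_file", type=Path, required=True)
parser.add_argument("--data_dir", type=Path, required=True)
parser.add_argument("--output_dir", type=Path, default=Path("test_data"))

args = parser.parse_args(["--split_file", "split.csv", "--data_dir", "PLINDER_DATA_DIR"])

# Path objects support "/" joining; plain strings would raise TypeError here.
target = args.output_dir / "test_set.parquet"
print(target)
```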

src/plinder/eval/docking/write_scores.py

Lines changed: 2 additions & 0 deletions
@@ -161,6 +161,8 @@ def extract_and_score_test_set(
     predictions = pd.read_csv(prediction_file)
     test_systems = set(predictions["reference_system_id"])
     system_dir = output_dir / "test_systems"
+    system_dir.mkdir(parents=True, exist_ok=True)
+    (output_dir / "scores").mkdir(parents=True, exist_ok=True)
     if not overwrite:
         test_systems = test_systems - set(x.name for x in system_dir.iterdir())
     system_dir.mkdir(exist_ok=True)
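
Note: the first bullet of the commit message ("fix issues with non-existent directory") matches this hunk. `Path.iterdir()` raises `FileNotFoundError` when the directory is missing, so the directory has to exist before the `overwrite` check lists it. A self-contained sketch in a temporary directory, with `output_dir` standing in for the real argument:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp) / "test_eval"      # does not exist yet
    system_dir = output_dir / "test_systems"

    try:
        list(system_dir.iterdir())            # listing before the directory exists
        failed = False
    except FileNotFoundError:
        failed = True

    # parents=True also creates output_dir; exist_ok=True makes reruns harmless.
    system_dir.mkdir(parents=True, exist_ok=True)
    system_dir.mkdir(parents=True, exist_ok=True)
    existing = {p.name for p in system_dir.iterdir()}

print(failed, existing)
```

With the new `mkdir(parents=True, exist_ok=True)` call in place, the later `system_dir.mkdir(exist_ok=True)` context line looks redundant, though harmless.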
