
Commit f27eeb5

tjduigna, yusuf1759, OleinikovasV, and Ninjani authored
Groundwork for sphinx docs (#7)
* docs: add docs boilerplate
* chore: ignore generated docs
* docs: fix the intro hook to the README
* fix: push coverage on main branch
* chore: images hosted on github outside of git
* chore: add updated image
* chore: remove image
* chore: add image via github link
* chore: update readme overview image
* chore: update readme
* chore: update overview
* fix: put networkit back
* Doc updates (#6)

  * chore: add publications to readme, citation and notice files
  * docs: update eval readme
  * docs: update data readme
  * fix: extra comma
  * added benchmark + stratification description
  * updated test stratification stats
  * style: ruff

  ---------

  Co-authored-by: Ninjani <janani.durairaj@gmail.com>
  Co-authored-by: OleinikovasV <vladas@vant.ai>
  Co-authored-by: Thomas Duignan <thomas.j.duignan@gmail.com>

---------

Co-authored-by: Yusuf Adeshina <mr.adeshina.yusuf@gmail.com>
Co-authored-by: Vladas Oleinikovas <v.oleinikovas@gmail.com>
Co-authored-by: Ninjani <janani.durairaj@gmail.com>
Co-authored-by: OleinikovasV <vladas@vant.ai>
1 parent 4acea60 commit f27eeb5


18 files changed (+368, -47 lines)

.github/workflows/main.yaml

Lines changed: 11 additions & 1 deletion
@@ -38,5 +38,15 @@ jobs:
         if: steps.get-tag.outputs.bump != ''
         run: python flows/docker.py test --push
       - name: Save git tag
-        if: steps.get-tag.outputs.bump != 'skip'
+        if: steps.get-tag.outputs.bump != ''
         run: git push origin ${{ steps.get-tag.outputs.bump }}
+      - name: Copy and surgery coverage
+        if: steps.get-tag.outputs.bump != ''
+        run: |
+          cp reports/.coverage .
+          sqlite3 .coverage "update file set path='src/' || substr(path, 40);"
+      - name: Post coverage comment
+        if: steps.get-tag.outputs.bump != ''
+        uses: py-cov-action/python-coverage-comment-action@v3
+        with:
+          GITHUB_TOKEN: ${{ github.token }}
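
The "Copy and surgery coverage" step rewrites the source paths stored in the coverage data file so that they point at `src/` in the repository checkout. Coverage.py keeps its data in an SQLite database whose `file` table holds one `path` per measured file, and SQLite's `substr(path, 40)` is 1-indexed, so the update drops a fixed 39-character prefix. The sketch below replays that exact `UPDATE` on an in-memory database with Python's standard `sqlite3` module. The table is a simplified stand-in for coverage.py's schema, and the 39-character prefix and module paths are hypothetical; the real prefix depends on where the tests ran inside the container, which the diff does not show.

```python
import sqlite3

# substr(path, 40) starts at the 40th character (SQLite is 1-indexed),
# so exactly this many leading characters are discarded.
PREFIX_LEN = 39


def rewrite_paths(conn: sqlite3.Connection) -> list[str]:
    """Run the same UPDATE as the workflow step and return the new paths."""
    conn.execute("update file set path='src/' || substr(path, 40);")
    return [row[0] for row in conn.execute("select path from file order by id")]


# Simplified stand-in for coverage.py's `file` table.
conn = sqlite3.connect(":memory:")
conn.execute("create table file (id integer primary key, path text, unique (path))")

# Hypothetical install prefix of exactly 39 characters.
prefix = "/opt/conda/lib/python3.9/site-packages/"
assert len(prefix) == PREFIX_LEN

conn.executemany(
    "insert into file (path) values (?)",
    [(prefix + "plinder/core/__init__.py",), (prefix + "plinder/eval/__init__.py",)],
)
new_paths = rewrite_paths(conn)
print(new_paths)
```

Because the offset is hard-coded, the rewrite only works while the prefix length stays the same; a different Python version or install location in the test image would shift the cut point.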

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -88,6 +88,7 @@ instance/
 
 # Sphinx documentation
 docs/_build/
+docs/source/pinder*
 
 # PyBuilder
 .pybuilder/

CITATION.cff

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+authors:
+  - family-names: "Durairaj"
+    given-names: "Janani"
+  - family-names: "Adeshina"
+    given-names: "Yusuf"
+  - family-names: "Cao"
+    given-names: "Zhonglin"
+  - family-names: "Zhang"
+    given-names: "Xuejin"
+  - family-names: "Oleinikovas"
+    given-names: "Vladas"
+  - family-names: "Duignan"
+    given-names: "Thomas"
+  - family-names: "McClure"
+    given-names: "Zachary"
+  - family-names: "Robin"
+    given-names: "Xavier"
+  - family-names: "Rossi"
+    given-names: "Emanuele"
+  - family-names: "Zhou"
+    given-names: "Guoqing"
+  - family-names: "Veccham"
+    given-names: "Srimukh"
+  - family-names: "Isert"
+    given-names: "Clemens"
+  - family-names: "Peng"
+    given-names: "Yuxing"
+  - family-names: "Sundareson"
+    given-names: "Prabindh"
+  - family-names: "Akdel"
+    given-names: "Mehmet"
+  - family-names: "Corso"
+    given-names: "Gabriele"
+  - family-names: "Stärk"
+    given-names: "Hannes"
+  - family-names: "Carpenter"
+    given-names: "Zachary"
+  - family-names: "Bronstein"
+    given-names: "Michael"
+  - family-names: "Kucukbenli"
+    given-names: "Emine"
+  - family-names: "Schwede"
+    given-names: "Torsten"
+  - family-names: "Naef"
+    given-names: "Luca"
+title: "PLINDER: The Protein-Ligand Interactions Dataset and Evaluation Resource"
+doi: 10.1101/2024.07.17.603955
+version: 0.0.1
+date-released: 2024-07-17
+url: "https://github.com/plinder-org/plinder"
+preferred-citation:
+  type: conference-paper
+  authors:
+    - family-names: "Durairaj"
+      given-names: "Janani"
+    - family-names: "Adeshina"
+      given-names: "Yusuf"
+    - family-names: "Cao"
+      given-names: "Zhonglin"
+    - family-names: "Zhang"
+      given-names: "Xuejin"
+    - family-names: "Oleinikovas"
+      given-names: "Vladas"
+    - family-names: "Duignan"
+      given-names: "Thomas"
+    - family-names: "McClure"
+      given-names: "Zachary"
+    - family-names: "Robin"
+      given-names: "Xavier"
+    - family-names: "Rossi"
+      given-names: "Emanuele"
+    - family-names: "Zhou"
+      given-names: "Guoqing"
+    - family-names: "Veccham"
+      given-names: "Srimukh"
+    - family-names: "Isert"
+      given-names: "Clemens"
+    - family-names: "Peng"
+      given-names: "Yuxing"
+    - family-names: "Sundareson"
+      given-names: "Prabindh"
+    - family-names: "Akdel"
+      given-names: "Mehmet"
+    - family-names: "Corso"
+      given-names: "Gabriele"
+    - family-names: "Stärk"
+      given-names: "Hannes"
+    - family-names: "Carpenter"
+      given-names: "Zachary"
+    - family-names: "Bronstein"
+      given-names: "Michael"
+    - family-names: "Kucukbenli"
+      given-names: "Emine"
+    - family-names: "Schwede"
+      given-names: "Torsten"
+    - family-names: "Naef"
+      given-names: "Luca"
+  doi: "10.1101/2024.07.17.603955"
+  journal: "bioRxiv"
+  eventtitle: "Machine Learning for Life and Material Science, ICML 2024"
+  month: 7
+  title: "PLINDER: The Protein-Ligand Interactions Dataset and Evaluation Resource"
+  year: 2024
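
Since `CITATION.cff` is plain YAML, a few lines of Python can check it before release. The sketch below assumes the file has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`, which is a third-party package) and checks the keys the Citation File Format 1.2.0 schema requires: `cff-version`, `message`, `authors` and `title` at the top level, and `type`, `authors` and `title` for a reference such as `preferred-citation`. It also compares the two author lists, which is a property of this particular file rather than a schema rule. The helpers `missing_keys` and `author_names` are hypothetical, and the dictionary is an abbreviated stand-in with only two of the 22 authors.

```python
from typing import Any

# Keys required by the Citation File Format 1.2.0 schema.
REQUIRED_TOP_LEVEL = ("cff-version", "message", "authors", "title")
REQUIRED_REFERENCE = ("type", "authors", "title")


def missing_keys(data: dict[str, Any]) -> list[str]:
    """Return the dotted names of required CFF keys that are absent."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in data]
    ref = data.get("preferred-citation")
    if ref is not None:
        missing += [f"preferred-citation.{key}" for key in REQUIRED_REFERENCE if key not in ref]
    return missing


def author_names(authors: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Reduce author entries to (family-names, given-names) pairs."""
    return [(a["family-names"], a["given-names"]) for a in authors]


# Abbreviated stand-in for the parsed file.
TITLE = "PLINDER: The Protein-Ligand Interactions Dataset and Evaluation Resource"
authors = [
    {"family-names": "Durairaj", "given-names": "Janani"},
    {"family-names": "Naef", "given-names": "Luca"},
]
cff = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite it as below.",
    "authors": authors,
    "title": TITLE,
    "preferred-citation": {"type": "conference-paper", "authors": authors, "title": TITLE},
}

print(missing_keys(cff))  # an empty list means every required key is present
print(author_names(cff["authors"]) == author_names(cff["preferred-citation"]["authors"]))
```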

LICENSE.txt

Lines changed: 1 addition & 1 deletion
@@ -186,7 +186,7 @@
       same "printed page" as the copyright notice for easier
       identification within third-party archives.
 
-   Copyright 2024 VantAI, Inc.
+   Copyright 2024 Plinder Development Team
 
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.

NOTICE

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+PLINDER - The Protein Ligand INteractions Dataset and Evaluation Resource
+Copyright (c) 2024, Plinder Development Team
+
+The PLINDER project is a collaboration between the
+University of Basel, SIB Swiss Institute of Bioinformatics,
+VantAI, NVIDIA, and MIT CSAIL.
+
+If you find this software useful, please cite:
+
+Durairaj, Janani, Yusuf Adeshina, Zhonglin Cao, Xuejin Zhang, Vladas Oleinikovas, Thomas Duignan, Zachary McClure, et al. “PLINDER: The Protein-Ligand Interactions Dataset and Evaluation Resource.” bioRxiv, July 17, 2024, 2024.07.17.603955. https://doi.org/10.1101/2024.07.17.603955.

README.md

Lines changed: 44 additions & 27 deletions
@@ -1,4 +1,4 @@
-![plinder](./assets/plinder.png)
+![plinder](https://github.com/user-attachments/assets/05088c51-36c8-48c6-a7b2-8a69bd40fb44)
 
 <div align="center">
 <h1>The Protein Ligand INteractions Dataset and Evaluation Resource</h1>
@@ -8,8 +8,23 @@
 
 [![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/plinder-org/plinder/blob/master/LICENSE.txt)
 [![test](https://github.com/plinder-org/plinder/actions/workflows/pr.yaml/badge.svg)](https://github.com/plinder-org/plinder/actions/workflows/pr.yaml)
-[![Coverage badge](https://github.com/plinder-org/plinder/raw/python-coverage-comment-action-data/badge.svg)](https://github.com/plinder-org/plinder/tree/python-coverage-comment-action-data)
+[![coverage](https://github.com/plinder-org/plinder/raw/python-coverage-comment-action-data/badge.svg)](https://github.com/plinder-org/plinder/tree/python-coverage-comment-action-data)
 
+![overview](https://github.com/user-attachments/assets/39d251b1-8114-4242-b9fc-e0cce900d22f)
+
+# 📚 About
+
+**plinder**, short for **p**rotein **l**igand **in**teractions **d**ataset and **e**valuation **r**esource,
+is a dataset and resource for training and evaluation of protein-ligand docking algorithms.
+It is a comprehensive, annotated, high-quality dataset:
+
+- \> 400k PLI systems across > 11k SCOP domains and > 50k unique small molecules
+- 500+ annotations for each system, including protein and ligand properties, quality, matched molecular series and more
+- Automated curation pipeline to keep up with the PDB
+- 14 PLI metrics and over 20 billion similarity scores
+- Unbound \(_apo_\) and _predicted_ AlphaFold2 structures linked to _holo_ systems
+- `train-val-test` splits and the ability to tune splitting based on the learning task
+- Robust evaluation harness to simplify and standardize performance comparison between models
 
 # 📢 Notice
 
@@ -19,32 +34,16 @@ VantAI, NVIDIA, MIT CSAIL, and the community at large.
 If you find `plinder` useful,
 please see the citation file for details on how to cite.
 
-# 🚧 Under construction
-
-Please bear with us as we migrate the `plinder` project to
-open source as we work to share it with the world. There are
-some gaps in the code and documentation, which will be fixed
-as soon as possible. The dataset itself is complete, but the
-code to interact with some parts of the dataset is still under
-development.
-
-# 📚 About
-
-**plinder**, short for **p**rotein **l**igand **in**teractions **d**ataset and **e**valuation **r**esource,
-is a dataset and resource for training and evaluation of protein-ligand docking algorithms.
-
 # 👨‍💻 Getting Started
 
 Please use a virtual environment for the `plinder` project.
 We recommend the [miniforge](https://github.com/conda-forge/miniforge) environment manager.
 
-
 **NOTE**: We currently only support a Linux environment. `plinder`
 uses `openstructure` for some of its functionality and is available
 from the `aivant` conda channel using `conda install aivant::openstructure`, but it is only built targeting Linux architectures.
 For MacOS users, please see the relevant [docker](#package-publishing) resources below.
 
-
 ## Install plinder
 
 The `plinder` package can be obtained from GitHub:
@@ -60,8 +59,7 @@ Or with a development installation:
 cd plinder
 pip install -e '.[dev]'
 
-
-# ⬇️ Getting the dataset
+# ⬇️ Getting the dataset
 
 Using the `plinder.core` API, you can transparently and lazily
 download and interact with most of the components of the dataset.
@@ -109,21 +107,37 @@ with the dataset.
 
 ## 🏅 Gold standard benchmark sets
 
-Discuss stratification efforts
+As part of the `plinder` resource, we also provide train, validation and test splits that are curated to minimize information leakage based on protein-ligand interaction similarity. In addition, we have prioritized systems that have a linked experimental `apo` structure or a matched molecular series, to support realistic inference scenarios for hit discovery and optimization.
+Finally, particular care is taken with the test set, which is further prioritized to contain high-quality structures that provide unambiguous ground truths for performance benchmarking.
+
+![plinder](./assets/plinder_test_stratification.png)
+
+Moreover, as we anticipate this resource being used to benchmark a wide range of methods, including those that simultaneously predict protein structure (a.k.a. co-folding) or generate novel ligand structures, we further stratified the test set (by novel ligand, pocket, protein, or all) to cover a wide range of tasks.
+
+Our latest test split [#TODO] contains:
+
+| Novel   | # of systems | # of high-quality systems | Stratification criteria |
+|:--------|---------------:|------------------:|:---------------:|
+| pocket  | 5206 | 5203 | PLI shared < 50 _&_ Pocket shared lDDT < 0.5 |
+| ligand  | 2395 | 2395 | ECFP4 fingerprint similarity < 0.3 |
+| protein | 983  | 983  | Protein Seq. Sim. < 0.3 _&_ Protein lDDT > 0.7 |
+| all     | 268  | 268  | all of the above |
+| none    | 0    | 0    | none of the above |
+
 
 ## 🧪 Training set
 
 Discuss the splits
 
-## ⚖️ Evaluation harness
+## ⚖️ Evaluation harness
 
 See the [`plinder.eval`](#src/plinder-eval/plinder/eval/docking/README.md) docs for more details.
 
 ## 📦 Dataloader
 
 Dataloader is currently under construction.
 
-## ℹ️ Filters & Annotations
+## ℹ️ Filters & Annotations
 
 See the [`plinder.data`](#src/plinder-data/plinder/data/README.md) docs for more details.
 
@@ -135,7 +149,6 @@ We are currently working on the following:
 - Establishing a leaderboard
 - Improving the documentation and examples
 
-
 # 👨‍💻 Code organization
 
 This code is split into 4 sub-packages
@@ -147,16 +160,15 @@ This code is split into 4 sub-packages
 
 # 💽 Dataset Generation
 
-![Workflow](./assets/workflow.png)
+![workflow](https://github.com/user-attachments/assets/cde72643-5fdf-4998-8719-216d0cef2706)
 
 See the [End-to-end pipeline](#src/plinder-data/README.md) description for technical details about the dataset generation.
 
-
 # 📝 Examples & documentation
 
 Package documentation, including API documentation, [example notebooks](examples/), and supplementary guides, are made available.
 
-# ⚙️ Dev guide
+# ⚙️ Dev guide
 
 To develop and test changes to the source code, please use a development installation:
 
@@ -221,3 +233,8 @@ since the previous release:
 - If `bumpversion patch` is present in the commit message (or nothing is found), the patch version will be bumped
 
 **NOTE**: The CI workflow will use the __most recent__ match in the commit history to make its decision.
+
+# 📃 Publications
+Durairaj, Janani, Yusuf Adeshina, Zhonglin Cao, Xuejin Zhang, Vladas Oleinikovas, Thomas Duignan, Zachary McClure, Xavier Robin, Emanuele Rossi, Guoqing Zhou, Srimukh Prasad Veccham, Clemens Isert, Yuxing Peng, Prabindh Sundareson, Mehmet Akdel, Gabriele Corso, Hannes Stärk, Zachary Wayne Carpenter, Michael M. Bronstein, Emine Kucukbenli, Torsten Schwede, Luca Naef. 2024. “PLINDER: The Protein-Ligand Interactions Dataset and Evaluation Resource.”
+[bioRxiv](https://doi.org/10.1101/2024.07.17.603955)
+[ICML'24 ML4LMS](https://openreview.net/forum?id=7UvbaTrNbP)
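
The "ligand" row of the test-stratification table uses an ECFP4 fingerprint similarity cutoff of 0.3. The README does not say which similarity coefficient is used or which ligands a test system is compared against, so the sketch below makes two assumptions: it uses Tanimoto similarity (the common choice for ECFP fingerprints) on fingerprints represented as sets of on-bit indices, and it treats a test ligand as novel when its similarity to every training ligand is below the cutoff. Computing real ECFP4 bits would require a cheminformatics toolkit such as RDKit, so the bit sets here are made up, and `is_novel_ligand` is a hypothetical helper, not part of `plinder`.

```python
from collections.abc import Iterable

# Cutoff from the "ligand" row of the stratification table.
ECFP4_CUTOFF = 0.3


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| between two on-bit sets."""
    union = len(a | b)
    # Convention chosen for this sketch: two empty fingerprints score 0.0.
    return len(a & b) / union if union else 0.0


def is_novel_ligand(
    test_bits: set[int], train_bits: Iterable[set[int]], cutoff: float = ECFP4_CUTOFF
) -> bool:
    """True if the test ligand stays below the cutoff against every training ligand."""
    return all(tanimoto(test_bits, bits) < cutoff for bits in train_bits)


# Made-up on-bit sets standing in for ECFP4 fingerprints.
train = [{1, 2, 3, 4}, {10, 11, 12}]
print(tanimoto({1, 2, 3}, {2, 3, 4}))      # 2 shared / 4 total bits = 0.5
print(is_novel_ligand({1, 2, 3}, train))   # 0.75 against the first ligand, so not novel
print(is_novel_ligand({1, 20, 21, 22}, train))
```

Taking the maximum similarity over the whole training set (here via `all(...)`) is what makes the criterion strict: a single close analogue in training is enough to exclude a test ligand from the "novel ligand" stratum.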

assets/plinder.png

-86.3 KB
Binary file not shown.

assets/workflow.png

-602 KB
Binary file not shown.

docs/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = source
+BUILDDIR = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
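
Because of the catch-all `%: Makefile` rule, any target other than `help` (for example `make html`) is handed straight to `sphinx-build` in "make mode" (`-M`), with `source` as the input directory and `build` as the output directory. The sketch below mirrors that command construction in Python so the resulting invocation can be inspected. `sphinx_command` is a hypothetical helper, and splitting `SPHINXOPTS` and `O` with `shlex` is an assumption that approximates how the shell tokenizes the unquoted variables in the recipe.

```python
import shlex


def sphinx_command(
    target: str,
    sphinxopts: str = "",
    o: str = "",
    sphinxbuild: str = "sphinx-build",
    sourcedir: str = "source",
    builddir: str = "build",
) -> list[str]:
    """Build the argv that the Makefile recipe runs for `make <target>`."""
    # Mirrors: @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
    return [
        *shlex.split(sphinxbuild),
        "-M",
        target,
        sourcedir,
        builddir,
        *shlex.split(sphinxopts),
        *shlex.split(o),
    ]


print(sphinx_command("html"))
print(sphinx_command("html", sphinxopts="-W --keep-going"))
```

Running a bare `make` corresponds to `sphinx_command("help")`, since `help` is the first target. Passing the list to `subprocess.run(..., cwd="docs", check=True)` would execute the same build from the `docs/` directory, assuming Sphinx is installed.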
