RNA-KG v2.0: An ontology-based KG for representing interactions involving RNA molecules enriched with properties
RNA-KG is a knowledge graph encompassing biological knowledge about RNAs gathered from more than 90 public databases, integrating functional relationships with genes, proteins, chemicals, and ontologically grounded biomedical concepts. Relationships are characterized by standardized properties that capture the specific context (e.g., cell line, tissue, pathological state) in which they have been identified. In addition, nodes are enriched with detailed attributes, such as descriptions, synonyms, and molecular sequences sourced from platforms such as OBO ontologies, NCBI repositories, RNAcentral, and Ensembl. RNA-KG can be used both by directly exploring and visualizing the KG and by applying computational methods to analyze it and infer biomedical knowledge. RNA-KG is constantly maintained and updated with new experimental data. More details can be found in the RNA-KG v1.0 article and the RNA-KG v2.0 pre-print.
- Notebooks and pointers to processed data and ontologies to build the current release of RNA-KG.
- Code for reproducing experiments described in the RNA-KG v2.0 pre-print (Applications and Use Cases section).
- RNA-KG v2.0 website: https://RNA-KG.biodata.di.unimi.it
- Public Neo4j endpoint: https://neo4j.biodata.di.unimi.it (usr: rnakgv20, pwd: rnakgv20)
- Database dump: https://RNA-KG.biodata.di.unimi.it/rnakgv20.dump; raw nodes list: https://RNA-KG.biodata.di.unimi.it/nodes.csv; raw edges list: https://RNA-KG.biodata.di.unimi.it/edges.csv (raw data are available on Zenodo)
- RNA-KG v2.0 API docs: https://RNA-KG.biodata.di.unimi.it/api/v1/docs
- RNA-KG v2.0 pre-print: https://arxiv.org/abs/2508.07427; RNA-KG v1.0 paper: https://www.nature.com/articles/s41597-024-03673-7
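As a quick way to get oriented after downloading the raw edge list, the sketch below reads the header of a local CSV copy and tallies the values of one column. It is a minimal illustration, not part of the RNA-KG code base: the function names are hypothetical, the file is assumed to be comma-separated with a header row, and the column to tally must be taken from the real header, since the column names of edges.csv are not documented here.

```python
import csv
from collections import Counter


def read_header(path):
    """Return the column names found on the first row of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))


def count_by_column(path, column):
    """Count how many rows carry each distinct value of `column`.

    `column` is whatever name the real header uses (for example the
    relationship-type column); it is an input, not something this sketch
    knows in advance.
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise ValueError(
                f"column {column!r} not found; available: {reader.fieldnames}"
            )
        for row in reader:  # streamed row by row, the file is never fully loaded
            counts[row[column]] += 1
    return counts
```

A typical session would call `read_header("edges.csv")` first, pick the column of interest from the result, and then pass it to `count_by_column`.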
The current release can be generated via the four provided Jupyter Notebooks in the notebooks directory. main.py is used by the first notebook to process ontologies according to PheKnowLator's implementation of OWL-NETS.
Steps:
- Download, clean, and generate the ontology graph by merging the 11 ontologies describing the RNA-KG metagraph/schema.
- Define and generate lookup tables for normalizing entities according to standard identifiers.
- Process relationships from the 80 linked open data repositories, including edge properties.
- Add node properties and link them to ontology terms.
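The lookup-table step above can be pictured with the following sketch. It is only an illustration of the general idea (mapping source-specific names to one canonical identifier before edges are emitted) and does not reproduce the notebooks' actual logic. All function names and all identifiers such as `EX:RNA_0001` are made-up placeholders.

```python
def load_lookup(pairs):
    """Build an alias -> canonical identifier map from (alias, canonical) pairs.

    Aliases are matched case-insensitively and ignoring surrounding whitespace.
    """
    lookup = {}
    for alias, canonical in pairs:
        lookup[alias.strip().lower()] = canonical.strip()
    return lookup


def normalize_edges(edges, lookup):
    """Rewrite (source, relation, target) triples using canonical identifiers.

    Triples with at least one endpoint missing from the lookup table are
    returned separately so they can be inspected instead of silently dropped.
    """
    normalized, unmapped = [], []
    for source, relation, target in edges:
        src = lookup.get(source.strip().lower())
        dst = lookup.get(target.strip().lower())
        if src is None or dst is None:
            unmapped.append((source, relation, target))
        else:
            normalized.append((src, relation, dst))
    return normalized, unmapped


# Placeholder data: the names and identifiers below are invented for illustration.
lookup = load_lookup([
    ("rna-alpha", "EX:RNA_0001"),
    ("gene-beta", "EX:GENE_0002"),
])
edges = [
    ("RNA-Alpha", "interacts_with", "Gene-Beta"),  # both endpoints resolvable
    ("rna-gamma", "interacts_with", "gene-beta"),  # source has no mapping
]
normalized, unmapped = normalize_edges(edges, lookup)
```

Keeping the unmapped triples in a separate list makes it easy to see which source names still need an entry in the lookup table.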
Finally, import the generated CSV files into Neo4j:
sudo bin/neo4j-admin database import full "RNA-KGv2.0" \
--nodes=neo4j/import/nodes.csv \
--relationships=neo4j/import/edges.csv \
--overwrite-destination \
--verbose \
--skip-duplicate-nodes \
--skip-bad-relationships \
--multiline-fields=true
RNA-KG is a knowledge graph encompassing biological knowledge about RNAs gathered from 91 linked open data repositories and ontologies. Here, we list the integrated sources and ontologies. Special thanks to DrugBank for their interest in this project and for granting access to supplementary data!
- RNA-centered sources: miRBase; miRDB; miRNet; miRecords; EpimiR; HMDD; miR2Disease; TargetScan; SomamiR DB; TarBase; miRTarBase; SM2miR; TransmiR; PolymiRTS; dbDEMC; TAM; PuTmiR; miRPathDB; miRCancer; miRdSNP; miRandola; ICBP siRNA; Apta-Index; eSkip-Finder; Addgene; LncBook; LncRNADisease; LncExpDB; dbEssLnc; lncATLAS; NONCODE; Lnc2Cancer; LncRNAWiki; LncBase; TANRIC; Ribocentre; Rfam; ViroidDB; TBDB; RSwitch; tRFdb; tsRFun; MINTbase; snoDB; tRNAdb; GtRNAdb; piRBase; iPiDA-GCN; TarpiD; RNAInter; RNALocate; RNADisease; ncRDeathDB; cncRNADB; ViRBase; Vesiclepedia; DirectRMDB; Modomics; starBase2; microT; miRanda; RNAcentral; PhenomiR; circBase; Ensembl; RNAhybrid; POSTAR2
- Sources covering "general" biomedical knowledge: DisGeNET; GeneMANIA; The Human Protein Atlas; CTD; ClinVar; STRING; Reactome; HGNC; UniProtKB; DrugBank; The GO resource; COSMIC; GTEx
- Ontologies: Gene Ontology (GO); Mondo Disease Ontology (Mondo); Human Phenotype Ontology (HPO); Vaccine Ontology (VO); Chemical Entities of Biological Interest (ChEBI); Uber-anatomy ontology (Uberon); Cell Line Ontology (CLO); PRotein Ontology (PRO); Sequence Ontology (SO); Pathway Ontology (PW); Relations Ontology (RO)
For standardizing properties, we also considered: dbSNP; PubMed; NCI Thesaurus OBO Edition (NCIT); Chemical Methods Ontology (CHMO); Cellosaurus; Disease Ontology (DO); Online Mendelian Inheritance in Man (OMIM); NCBI organismal classification (NCBITaxon); Medical Subject Headings (MeSH)
Don't hesitate to contact us, especially if you believe a new data source should be integrated into RNA-KG. To get in touch with us, please create an issue or send us an email 📩.
- Application of Graph Representation Learning methods to analyze RNA-KG → article; code (link prediction pipeline)
- LLM+RNA-KG: validating RNA-related facts extracted from the literature via LLM by combining RNA-KG and graph ML → SPIREX
- Development of an RNA Ontology with a particular emphasis on non-coding RNA molecules.
- Development of graphical facilities for supporting the user in the data acquisition process and thus reducing the manual effort required for mapping the data available in the different data sources into RNA-KG.
This project is licensed under Apache License 2.0 - see the LICENSE.md file for details.
Please cite the following articles if RNA-KG was useful for your research:
@article{Cavalleri2024rnakg,
title="An ontology-based knowledge graph for representing interactions involving RNA molecules",
author="Emanuele Cavalleri and Alberto Cabri and Mauricio Soto-Gomez and Sara Bonfitto and Paolo Perlasca and Jessica Gliozzo and Tiffany J. Callahan and Justin Reese and Peter N Robinson and Elena Casiraghi and Giorgio Valentini and Marco Mesiti",
year="2024",
journal="Sci. Data",
publisher="Springer Science and Business Media LLC",
volume=11,
number=1,
pages="906",
month=aug,
copyright="https://creativecommons.org/licenses/by-nc-nd/4.0",
language="en"
}
@misc{Cavalleri2025rnakgv20,
title={RNA-KG v2.0: An RNA-centered Knowledge Graph with Properties},
author={Emanuele Cavalleri and Paolo Perlasca and Marco Mesiti},
year={2025},
eprint={2508.07427},
archivePrefix={arXiv},
primaryClass={cs.DB},
url={https://arxiv.org/abs/2508.07427},
}
