Collection of tutorials to build RNA-KG v2.0 from scratch and to reproduce experiments such as "context-aware" KG pruning, clustering, and link prediction.

AnacletoLAB/RNA-KG

RNA-KG v2.0: An ontology-based KG for representing interactions involving RNA molecules enriched with properties

RNA-KG is a knowledge graph encompassing biological knowledge about RNAs gathered from more than 90 public databases. It integrates functional relationships between RNAs and genes, proteins, chemicals, and ontologically grounded biomedical concepts. Relationships are characterized by standardized properties that capture the specific context (e.g., cell line, tissue, pathological state) in which they have been identified. In addition, the nodes are enriched with detailed attributes, such as descriptions, synonyms, and molecular sequences sourced from platforms such as OBO ontologies, NCBI repositories, RNAcentral, and Ensembl. RNA-KG can be used both by directly exploring and visualizing the KG and by applying computational methods to analyze and infer biomedical knowledge. RNA-KG is constantly maintained and updated with new experimental data. More details can be found in the RNA-KG v1.0 article and the RNA-KG v2.0 pre-print.
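
To make the idea of context properties on relationships concrete, the following is a minimal Python sketch of selecting edges by context. It is an illustration only, not the pruning procedure from the pre-print: the `Edge` class, the `filter_by_context` helper, the property names (`tissue`, `cell_line`), and all identifiers are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Edge:
    """A relationship with optional context properties (all names are illustrative)."""
    source: str
    relation: str
    target: str
    # Context in which the relationship was identified, e.g. tissue or cell line.
    context: Dict[str, str] = field(default_factory=dict)


def filter_by_context(edges: List[Edge], **required: str) -> List[Edge]:
    """Keep only edges whose context matches every required key/value pair."""
    return [
        edge
        for edge in edges
        if all(edge.context.get(key) == value for key, value in required.items())
    ]


# Placeholder data: identifiers and property values are made up for the example.
edges = [
    Edge("rna:A", "interacts_with", "gene:X", {"tissue": "liver", "cell_line": "line-1"}),
    Edge("rna:A", "interacts_with", "gene:Y", {"tissue": "brain"}),
    Edge("rna:B", "regulates", "protein:Z", {"tissue": "liver"}),
]

liver_edges = filter_by_context(edges, tissue="liver")
print([(edge.source, edge.target) for edge in liver_edges])
# prints [('rna:A', 'gene:X'), ('rna:B', 'protein:Z')]
```

Passing several keyword arguments narrows the selection further, since an edge must match all of them to be kept.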


Metagraph: excerpt of the RNA-KG schema


Sample: excerpt of an RNA-KG subgraph


What Does This Repository Provide?

  • Notebooks and pointers to processed data and ontologies to build the current release of RNA-KG.
  • Code for reproducing experiments described in the RNA-KG v2.0 pre-print (Applications and Use Cases section).

Releases



Generate RNA-KG current release

The current release can be generated via the four provided Jupyter Notebooks in the notebooks directory. main.py is used by the first notebook to process ontologies according to PheKnowLator's implementation of OWL-NETS.

Steps:

  1. Download, clean, and generate the ontology graph by merging the 11 ontologies describing the RNA-KG metagraph/schema.
  2. Define and generate lookup tables for normalizing entities according to standard identifiers.
  3. Process relationships from the 80 linked open data repositories, including edge properties.
  4. Add node properties and link them to ontology terms.
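
As a rough illustration of step 2, the sketch below shows how a lookup table can map source-specific names to standard identifiers before edges are emitted. It is not the code from the notebooks: the function names, the `STD:` identifiers, and the aliases are hypothetical, and the matching rule (case-insensitive, whitespace-trimmed) is an assumption.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


def build_lookup(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map every alias (lower-cased and trimmed) to its standard identifier."""
    return {alias.strip().lower(): standard_id for alias, standard_id in rows}


def normalize(name: str, table: Dict[str, str]) -> Optional[str]:
    """Return the standard identifier for a name, or None if it is unmapped."""
    return table.get(name.strip().lower())


def normalize_edges(
    edges: Iterable[Triple], table: Dict[str, str]
) -> Tuple[List[Triple], List[Triple]]:
    """Rewrite edge endpoints to standard identifiers.

    Edges with an unmapped endpoint are returned separately so they can be
    inspected instead of being silently dropped.
    """
    kept: List[Triple] = []
    unmapped: List[Triple] = []
    for source, relation, target in edges:
        std_source = normalize(source, table)
        std_target = normalize(target, table)
        if std_source is None or std_target is None:
            unmapped.append((source, relation, target))
        else:
            kept.append((std_source, relation, std_target))
    return kept, unmapped


# Placeholder aliases and identifiers, made up for the example.
lookup = build_lookup([
    ("alias-a", "STD:0001"),
    ("Alias-A (old name)", "STD:0001"),
    ("alias-b", "STD:0002"),
])

kept, unmapped = normalize_edges(
    [
        (" Alias-A ", "interacts_with", "alias-b"),
        ("alias-a (old name)", "interacts_with", "unknown-entity"),
    ],
    lookup,
)
print(kept)      # prints [('STD:0001', 'interacts_with', 'STD:0002')]
print(unmapped)  # prints [('alias-a (old name)', 'interacts_with', 'unknown-entity')]
```

Returning the unmapped edges alongside the normalized ones makes it easier to see which names still need an entry in the lookup table.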

Finally, import the generated CSV files into Neo4j:

sudo bin/neo4j-admin database import full "RNA-KGv2.0" \
    --nodes=neo4j/import/nodes.csv \
    --relationships=neo4j/import/edges.csv \
    --overwrite-destination \
    --verbose \
    --skip-duplicate-nodes \
    --skip-bad-relationships \
    --multiline-fields=true
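
For orientation, here is a minimal Python sketch of writing files in the header format that `neo4j-admin database import` reads: an `:ID` column and a `:LABEL` column for nodes, and `:START_ID`, `:END_ID`, and `:TYPE` columns for relationships. The property columns (`name`, `description`, `tissue`), the sample rows, and the helper names are hypothetical; the files produced by the notebooks may use different columns.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterable, Sequence, Tuple

# Header fields with special meaning for the importer, plus example property columns.
NODE_HEADER = ["id:ID", ":LABEL", "name", "description"]
EDGE_HEADER = [":START_ID", ":END_ID", ":TYPE", "tissue"]


def write_csv(path: Path, header: Sequence[str], rows: Iterable[Sequence[str]]) -> None:
    """Write a header row followed by data rows, quoting fields only when needed."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_MINIMAL)
        writer.writerow(header)
        writer.writerows(rows)


def export_graph(
    out_dir: str,
    nodes: Iterable[Sequence[str]],
    edges: Iterable[Sequence[str]],
) -> Tuple[Path, Path]:
    """Write nodes.csv and edges.csv into out_dir and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    nodes_path = out / "nodes.csv"
    edges_path = out / "edges.csv"
    write_csv(nodes_path, NODE_HEADER, nodes)
    write_csv(edges_path, EDGE_HEADER, edges)
    return nodes_path, edges_path


if __name__ == "__main__":
    # Placeholder rows; the second description spans two lines on purpose.
    nodes = [
        ("STD:0001", "RNA", "alias-a", "A short description."),
        ("STD:0002", "Gene", "alias-b", "First line.\nSecond line."),
    ]
    edges = [("STD:0001", "STD:0002", "interacts_with", "liver")]
    with tempfile.TemporaryDirectory() as tmp:
        nodes_path, edges_path = export_graph(tmp, nodes, edges)
        print(nodes_path.read_text(encoding="utf-8"))
        print(edges_path.read_text(encoding="utf-8"))
```

The two-line description is written as a single quoted field, which is the kind of value the `--multiline-fields=true` option in the command above lets the importer accept.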

List of integrated sources

RNA-KG is a knowledge graph encompassing biological knowledge about RNAs gathered from 91 linked open data repositories and ontologies. Here, we list the integrated sources and ontologies. Special thanks to DrugBank for liking this project and granting access to supplementary data!

For standardizing properties, we also considered: dbSNP; PubMed; NCI Thesaurus OBO Edition (NCIT); Chemical Methods Ontology (CHMO); Cellosaurus; Disease Ontology (DO); Online Mendelian Inheritance in Man (OMIM); NCBI organismal classification (NCBITaxon); Medical Subject Headings (MeSH).

Get In Touch or Get Involved

Contact Us

Don't hesitate to contact us, especially if you believe a new data source should be integrated into RNA-KG. To get in touch with us, please create an issue or send us an email 📩.

Related projects

  • Application of Graph Representation Learning methods to analyze RNA-KG → article; code (link prediction pipeline)
  • LLM+RNA-KG: validating RNA-related facts extracted from the literature via LLM by combining RNA-KG and graph ML → SPIREX

Future work

  • Development of an RNA Ontology with a particular emphasis on non-coding RNA molecules.
  • Development of graphical tools that support users during data acquisition, reducing the manual effort required to map data from the different sources into RNA-KG.

Attribution

Licensing

This project is licensed under the Apache License 2.0; see the LICENSE.md file for details.

Citing RNA-KG

Please cite the following articles if RNA-KG was useful for your research:

@article{Cavalleri2024rnakg,
    title="An ontology-based knowledge graph for representing interactions involving RNA molecules",
    author="Emanuele Cavalleri and Alberto Cabri and Mauricio Soto-Gomez and Sara Bonfitto and Paolo Perlasca and Jessica Gliozzo and Tiffany J. Callahan and Justin Reese and Peter N Robinson and Elena Casiraghi and Giorgio Valentini and Marco Mesiti",
    journal="Sci. Data",
    publisher="Springer Science and Business Media LLC",
    volume=11,
    number=1,
    pages="906",
    month=aug,
    year=2024,
    copyright="https://creativecommons.org/licenses/by-nc-nd/4.0",
    language="en"
}
@misc{Cavalleri2025rnakgv20,
  title={RNA-KG v2.0: An RNA-centered Knowledge Graph with Properties},
  author={Emanuele Cavalleri and Paolo Perlasca and Marco Mesiti},
  year={2025},
  eprint={2508.07427},
  archivePrefix={arXiv},
  primaryClass={cs.DB},
  url={https://arxiv.org/abs/2508.07427},
}
