Work in progress - please contact ejmctavish@ucmerced.edu or mtholder@ku.edu with any issues or questions.
In this tutorial we will walk through:
- Getting existing trees for arbitrary sets of taxa
- Visualizing conflict between estimates
- Getting date estimates for nodes
- Updating an existing phylogeny with new data
The Open Tree of Life (https://opentreeoflife.github.io/) is a project that unites phylogenetic inferences and taxonomy to provide a synthetic estimate of species relationships across the entire tree of life.

This tree currently includes 2.4 million tips, and is based on 1,330 published studies which include 112,890 unique tips.
The synthetic tree uses a combined taxonomy across a large number of taxonomy resources with evolutionary estimates from published phylogenetic studies. https://opentreeoflife.github.io/browse/
https://tree.opentreeoflife.org is our interactive tree viewer. You can browse by the synthetic tree and leave feedback.
Click on nodes to move through the tree. If you click the "Legend" button at the top, you will get an explanation of what information the visual elements of the tree convey.
You can reveal the "Properties panel" by clicking the "ⓘ Show Properties" button or the "ⓘ" link that appears when your mouse is over a node or branch in the tree.
The properties panel contains:
- links to the taxon in our reference taxonomy (OTT) and other taxonomies
- the ID of the node
- the count of how many tips in the tree descend from the node
- information about how to download a Newick representation of the subtree rooted at that node, and
- information about taxonomies and phylogenies that support or disagree with that node. (e.g. https://tree.opentreeoflife.org/opentree/opentree14.9@mrcaott3089ott32977/Corytophaninae--Leiocephalus)
Clicking on "ⓘ Hide Properties" will hide the panel so that you can see more of the tree.
If you have feedback about the relationships that you see, use the "Add Comment" button. Comments that are entered here are stored as issues in our feedback repository.
However, a comment alone is unlikely to change the tree. The relationships only change when someone uploads a tree that reflects the relationships they think should be there and that tree is added to synthesis.
https://tree.opentreeoflife.org/taxonomy/browse is our browser for the Open Tree Taxonomy. That taxonomy is an input into our full synthetic tree and is used to help us align tips in different trees that refer to the same taxon.
The taxonomy includes links to unique identifiers in other digitally available taxonomies, such as GBIF or NCBI.
We are working towards swapping to the Catalogue of Life taxonomy used by GBIF for easier updating and cross-compatibility.
Check out https://tree.opentreeoflife.org
Search for your favorite organism! Don't agree with the relationships? You can fix them by uploading new inferences.
You can download a subtree of interest directly from the website.
You will need to have git and python3 installed for this tutorial. See installation instructions here https://opentreeoflife.github.io/SSBworkshop2026/#setup
Get the tutorial folder
git clone https://github.com/McTavishLab/SSB2026
cd SSB2026
We will use wrappers available in the Python package OpenTree (McTavish et al. 2022) to make it easier to work with the Open Tree APIs.
Install the opentree python package in a virtual environment:
python3 -m venv venv-opentree
source venv-opentree/bin/activate
pip install opentree
pip install dendropy
pip install requests
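Before moving on, it can help to confirm that the three packages are importable from the active virtual environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this illustration and is not part of the tutorial scripts.

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The three packages installed above; an empty list means all of them were found.
print(missing_packages(["opentree", "dendropy", "requests"]))
```

If the printed list is not empty, check that the virtual environment is activated and rerun the corresponding `pip install` command.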
It is often useful to access the pruned subtree for just the taxa you are interested in. In order to do so, you need to map taxon names to unique identifiers. One of the key challenges of comparing trees across studies is minor differences in names and naming.
One solution is to map taxon names to unique identifiers using the Open Tree Taxonomic Name Resolution Service (TNRS). There are a few ways to use this service, including the API and the browser-based bulk name mapping tool: https://tree.opentreeoflife.org/curator/tnrs/
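For scripted workflows, the name mapping can also be requested from the TNRS API. The sketch below is an illustration rather than the tutorial's own method: it assumes the v3 `tnrs/match_names` endpoint, a `context_name` field for narrowing the search, and a response containing a `results` list whose entries hold the queried `name` and a `matches` list with a `taxon` record. Check these against the API documentation before relying on them. The helper names are hypothetical, and only the standard library is used so the block runs without extra packages.

```python
import json
import urllib.request

# Assumed endpoint; see the Open Tree API documentation.
TNRS_URL = "https://api.opentreeoflife.org/v3/tnrs/match_names"


def build_tnrs_payload(names, context_name=None):
    """Build the JSON body for a TNRS match_names request."""
    payload = {"names": list(names)}
    if context_name is not None:
        # Narrow the taxonomic context, as in the browser tool.
        payload["context_name"] = context_name
    return payload


def first_matches(response_dict):
    """Map each queried name to the OTT id of its first match (None if unmatched).

    Assumes the response layout described in the text above.
    """
    mapping = {}
    for result in response_dict.get("results", []):
        matches = result.get("matches", [])
        mapping[result["name"]] = matches[0]["taxon"]["ott_id"] if matches else None
    return mapping


def match_names(names, context_name=None):
    """POST the names to the TNRS service and return the parsed JSON response."""
    body = json.dumps(build_tnrs_payload(names, context_name)).encode("utf-8")
    request = urllib.request.Request(
        TNRS_URL, data=body, headers={"content-type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With network access, `first_matches(match_names(list_of_names, "Cnidarians"))` would give a name-to-id dictionary. Taking the first match blindly is a simplification; the browser tool lets you review synonyms and approximate matches before accepting them.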
We will look up a tree of the hydrozoan jellyfish species found around Woods Hole.
The names of the taxa you will search for in this tutorial were downloaded from GBIF (GBIF.org (27 May 2022) GBIF Occurrence Download https://doi.org/10.15468/dl.gcmn6n).
The names are in the file opentree_tutorial/data/WH_hydrozoan_names.txt
Try this
- Go to https://tree.opentreeoflife.org/curator/tnrs/
- Click on "add names", and upload the names file.
- In the mapping options section,
- select 'Cnidarians' to narrow down the possibilities and speed up mapping
- Click "Map selected names"
Exact matches will show up in green, and can be accepted by clicking "accept exact matches".
A few taxa may still show suggested names. Click through to the taxonomy, and you will see that the name from the GBIF data is listed as a synonym.
Once you have accepted names for each of the taxa, click "save nameset".
Make sure your mappings were saved! If you don't 'accept' matches, they don't download.
Download the zip file to your laptop (renaming it through the browser doesn't currently work). Extract the files. Take a look at the human-readable version (output/main.csv).
main.json contains the same data in a more computer-readable format.
Transfer the output/main.csv file to the tutorial folder on the cluster, and rename it to WH_jellies.csv
You can use the OpenTree APIs and taxon id numbers to get the tree for a subset of taxa directly from the command line.
For example:
curl -X POST https://api.opentreeoflife.org/v3/tree_of_life/induced_subtree -H "content-type:application/json" -d '{"ott_ids":[662625, 765195, 662618]}'
For more on the OpenTree APIs see https://github.com/OpenTreeOfLife/germinator/wiki/Open-Tree-of-Life-Web-APIs
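The same request can be made from Python. This is a minimal sketch that mirrors the curl call above using only the standard library; the function names are hypothetical, and the layout of the returned JSON should be checked against the API documentation.

```python
import json
import urllib.request

INDUCED_SUBTREE_URL = "https://api.opentreeoflife.org/v3/tree_of_life/induced_subtree"


def induced_subtree_body(ott_ids):
    """Encode the same JSON body that the curl example sends."""
    return json.dumps({"ott_ids": [int(i) for i in ott_ids]}).encode("utf-8")


def fetch_induced_subtree(ott_ids):
    """POST the ids to the service and return the parsed JSON response as a dict."""
    request = urllib.request.Request(
        INDUCED_SUBTREE_URL,
        data=induced_subtree_body(ott_ids),
        headers={"content-type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With network access, `fetch_induced_subtree([662625, 765195, 662618])` returns a dictionary; the tree itself is expected under a `newick` key, although that key name is an assumption here.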
It is often more convenient to manipulate both trees and names within a scripting language.
There is a python script in the tree_comparison_tutorial folder 'get_synth_subtree.py'.
This script uses the taxon ids in the file WH_jellies.csv you transferred from your computer to get a synthetic tree from the OpenTree APIs.
If you had trouble with that step you can use backup_output/WH_jellies.csv as the input file instead.
The argument 'output' sets the first part of the output filename.
python get_synth_subtree.py --input-file data/WH_jellies.csv --output WH_jellyfish_synth
(or if we skipped actually running the TNRS)
python get_synth_subtree.py --input-file backup_output/WH_jellies.csv --output WH_jellyfish_synth
This script will write two files to your current working directory: the tree in Newick format, 'WH_jellyfish_synth.tre', and 'WH_jellyfish_synth_citations.txt', which lists the citations of the published trees that went into generating that tree and support the relationships in it.
Open the synthetic subtree in figtree, and the citations in a text viewer.
Q Are any of the genera non-monophyletic? What one(s)?
Q Look at this genus/genera in the tree viewer (tree.opentreeoflife.org). What studies break the monophyly of each taxon?
Q Is there conflict among the input sources? Is there any tree from these studies that recovers either of these genera as monophyletic?
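The monophyly question can also be checked in code. The sketch below is a toy illustration, not part of the tutorial scripts: it uses a very small Newick parser instead of dendropy, and it assumes unquoted tip labels in which the genus is the text before the first underscore. The labels in the example are made up; the labels in your own tree file may be formatted differently, and the parser ignores internal node labels.

```python
import re


def parse_newick(newick):
    """Parse a simple, unquoted Newick string into nested (label, children) tuples."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    structural = {"(", ")", ",", ";"}
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
        label = ""
        if pos < len(tokens) and tokens[pos] not in structural:
            # Drop any branch length that follows a colon.
            label = tokens[pos].split(":")[0].strip()
            pos += 1
        return (label, children)

    return parse_node()


def tip_sets(node, found):
    """Return the tips below `node`, recording the tip set of every clade in `found`."""
    label, children = node
    if not children:
        tips = frozenset([label])
    else:
        tips = frozenset().union(*(tip_sets(child, found) for child in children))
    found.add(tips)
    return tips


def non_monophyletic_genera(newick):
    """List genera whose tips do not form exactly one clade in the rooted tree."""
    clades = set()
    all_tips = tip_sets(parse_newick(newick), clades)
    genera = {}
    for tip in all_tips:
        genera.setdefault(tip.split("_")[0], set()).add(tip)
    return sorted(g for g, members in genera.items() if frozenset(members) not in clades)


# Toy tree with made-up labels: GenusB is nested inside GenusA.
toy = "((GenusA_sp1,GenusA_sp2),((GenusB_sp1,GenusB_sp2),GenusA_sp3));"
print(non_monophyletic_genera(toy))  # ['GenusA']
```

A genus counts as monophyletic here only if some node in the tree has exactly that genus's tips as its descendants, so the result depends on which taxa are included in the subtree.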
Or you can search names in the search box at tree.opentreeoflife.org, and get the ott ids from there.
For example, to see the relationships between Python (ott:675102), e.g. Python regius, Podarcis (ott:937560), e.g. Podarcis muralis, and Anolis carolinensis (ott:705356), try running:
python get_synth_subtree.py --ott-ids 970153 675102 937560 --output lizards
The output tree will be written to lizards.tre.
Q Is there anything surprising about these relationships? (The answer probably depends on your pre-existing herp phylogeny knowledge :P)
You can search the corpus of trees based on taxon name or taxon id:
python find_trees.py "Homarus americanus" --property ot:ottTaxonName
python find_trees.py 937560 --property ot:ottId
This is beta functionality that we are in the process of adding to the OpenTree services.
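A search of this kind can be sketched directly against the API. The block below is not taken from find_trees.py and may differ from what that script does: it assumes the v3 `studies/find_trees` endpoint accepts a JSON body with `property` and `value` fields, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; see the Open Tree API documentation.
FIND_TREES_URL = "https://api.opentreeoflife.org/v3/studies/find_trees"


def find_trees_payload(value, search_property):
    """Build the JSON body: the property to search on and the value to match."""
    return {"property": search_property, "value": value}


def find_trees(value, search_property):
    """POST the search to the service and return the parsed JSON response."""
    body = json.dumps(find_trees_payload(value, search_property)).encode("utf-8")
    request = urllib.request.Request(
        FIND_TREES_URL, data=body, headers={"content-type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With network access, `find_trees("Homarus americanus", "ot:ottTaxonName")` corresponds to the first command above. The structure of the returned dictionary is not assumed here, so inspect it before writing code that depends on it.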
To estimate dates, we will use the Chronosynth API. The dates API is work-in-progress, and so it is not yet as user friendly as it will be.
A summary of the methods is here: https://github.com/OpenTreeOfLife/chronosynth/wiki/Chronosynth-methods-overview
There are some API docs here: https://github.com/OpenTreeOfLife/chronosynth/wiki/Draft-API-docs
Using the dates API, we can get the dates from input trees that align to individual nodes in the synth tree.
This is based on the same information you saw in the conflict viewer.
You can use a curl call to GET the current date information for a node.
You can query the study corpus based on either a higher taxon id (e.g. Python ott675102), or an internal node label from the synthesis tree (e.g. mrcaott1000311ott3643727).
If you take a look at the downloaded synth tree file lizards.tre, you can see the internal node labels. They also appear in the URL of the tree browser online.
The dates API will return the ages of internal nodes of input trees that align with that node.
e.g.
https://dates.opentreeoflife.org/v4/dates/synth_node_age/ott675102
or
curl -X GET https://dates.opentreeoflife.org/v4/dates/synth_node_age/ott675102
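The same GET request can be issued from Python. This is a minimal sketch using the standard library; the function names are hypothetical, and the URL pattern is copied from the examples above.

```python
import json
import urllib.request

DATES_BASE_URL = "https://dates.opentreeoflife.org/v4/dates/synth_node_age"


def synth_node_age_url(node_id):
    """Build the request URL for a taxon id (e.g. "ott675102") or a synth node label."""
    return "{}/{}".format(DATES_BASE_URL, node_id)


def fetch_synth_node_age(node_id):
    """GET the age information for one node and return the parsed JSON response."""
    with urllib.request.urlopen(synth_node_age_url(node_id)) as response:
        return json.load(response)


print(synth_node_age_url("ott675102"))
```

With network access, `fetch_synth_node_age("ott675102")` returns the response as a dictionary. Because the dates API is a work in progress, the fields it contains may change.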
To look at the node itself, you can navigate to
"https://tree.opentreeoflife.org/curator/study/view/{STUDY ID}?tab=home&tree={TREE ID}&node={NODE_ID}"
e.g. https://tree.opentreeoflife.org/curator/study/view/ot_307?tab=home&tree=tree2&node=node10778
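Filling in the URL template by hand gets tedious when there are many nodes. A small helper, sketched here with a hypothetical function name, builds the link from the three ids:

```python
from urllib.parse import urlencode

CURATOR_VIEW_URL = "https://tree.opentreeoflife.org/curator/study/view"


def curator_node_url(study_id, tree_id, node_id):
    """Fill in the study, tree, and node ids in the curator URL template."""
    query = urlencode({"tab": "home", "tree": tree_id, "node": node_id})
    return "{}/{}?{}".format(CURATOR_VIEW_URL, study_id, query)


# Reproduces the example link above.
print(curator_node_url("ot_307", "tree2", "node10778"))
```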
The python script get_dates.py in the opentree tutorial folder translates the short form citations into their full citation information. It outputs the tree if you input a list of ids, and a date file with age estimates and citations for nodes (look at the tree file to see which node is which).
python get_dates.py --ott-ids 970153 675102 937560 --output lizard_ages
Open 'lizard_ages_dates.txt'.
Q What are the maximum and minimum age estimates for the root of this three taxon tree?
Q Is there overlap between the age estimates for the root and for the internal node?
You can also gather node age data for a single taxon, as long as it is not a tip of the tree.
python get_dates.py --ott-ids 675102 --output python_ages
There is also an R package for gathering date information and estimating dated trees, Datelife, available online at datelife.opentreeoflife.org.
You can also get dates for arbitrary nodes in the synth tree, which are not associated with taxa.
curl -X GET https://dates.opentreeoflife.org/v4/dates/synth_node_age/mrcaott1000311ott3643727 | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 184 100 184 0 0 213 0 --:--:-- --:--:-- --:--:-- 213
{
"query": "mrcaott1000311ott3643727",
"synth_node_id": "mrcaott1000311ott3643727",
"ot:source_node_ages": [
{
"age": 9.325001,
"source_id": "ot_1592@tree1",
"source_node": "node22956"
}
]
}
This node https://tree.opentreeoflife.org/opentree/argus/opentree13.4@mrcaott1000311ott3643727 in the synthetic tree aligns with this node in this dated tree https://tree.opentreeoflife.org/curator/study/view/ot_1592?tab=home&tree=tree1&node=node22956
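When a node has many aligned source nodes, it is easier to summarize the response in code. The sketch below works on a response of the shape shown above; the helper names are hypothetical, and the field names are taken from that example output, which may change while the dates API is in development.

```python
def summarize_node_ages(response_dict):
    """Return (count, youngest, oldest) for the ages listed in a synth_node_age response."""
    ages = [entry["age"] for entry in response_dict.get("ot:source_node_ages", [])]
    if not ages:
        return (0, None, None)
    return (len(ages), min(ages), max(ages))


def split_source_id(source_id):
    """Split a source id such as "ot_1592@tree1" into (study_id, tree_id)."""
    study_id, tree_id = source_id.split("@", 1)
    return study_id, tree_id


# The response shown above, copied in as a Python dict.
example = {
    "query": "mrcaott1000311ott3643727",
    "synth_node_id": "mrcaott1000311ott3643727",
    "ot:source_node_ages": [
        {"age": 9.325001, "source_id": "ot_1592@tree1", "source_node": "node22956"}
    ],
}
print(summarize_node_ages(example))      # (1, 9.325001, 9.325001)
print(split_source_id("ot_1592@tree1"))  # ('ot_1592', 'tree1')
```

The study and tree ids recovered from `source_id`, together with `source_node`, are the three pieces needed to build the curator link to the dated source tree.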
How many nodes are in your tree? How many do you have date estimates for?
Let's choose a part of the synth tree where we have a lot of data!
https://tree.opentreeoflife.org/opentree/opentree13.4@ott109893/Cracidae
We can estimate a dated tree for Cracidae
import requests
import json
import dendropy
from opentree import OT
## Get Dated synth tree
url = 'https://dates.opentreeoflife.org/v4/dates/dated_tree'
## Here, we have several date estimates for each node,
## so we can run the summarization several times,
## choosing one at random each time
payload = {"node_id": 'ott109893',
           "select": 'random',
           "reps": 5}
resp = requests.post(url=url, data=json.dumps(payload))
resp_dict = resp.json()
## dendropy reads Newick strings through the 'data' keyword
treeset = dendropy.TreeList.get(data=";".join(resp_dict['dated_trees_newick_list']), schema="newick")
## The labels are all ott ids, which are convenient for data analysis but annoying for interpretability
## This uses an API call to translate them back to taxon names
for taxon in treeset.taxon_namespace:
    ottid = taxon.label
    output = OT.taxon_info(ottid)
    taxon.label = output.response_dict["name"] + "_" + ottid
treeset.write(path="labelled_treelist.tre", schema="newick")
## You can also download all the data from this link:
resp_dict['tar_file_download']
'dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_01_09_2026_14_05_29.tar.gz'
wget 'dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_01_09_2026_14_05_29.tar.gz'
tar -xzvf chrono_out_01_09_2026_14_05_29.tar.gz
This directory (awkwardly nested in a directory named 'tmp') contains the sampled ages for each of the 5 runs, the set of 5 trees in the file bladj.tre, and the short-form citations in date_cites.txt.
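The download and extraction can also be done from Python, which is convenient inside a script or notebook. This is a minimal sketch with hypothetical helper names; it assumes the link returned by the service lacks a scheme, as in the example above, and that https is the right one to add.

```python
import tarfile
import urllib.request


def with_scheme(url, scheme="https"):
    """Prefix the URL with a scheme if the service returned it without one."""
    if "://" in url:
        return url
    return "{}://{}".format(scheme, url)


def download_and_extract(url, archive_path, extract_dir="."):
    """Download the gzipped tar archive and unpack it into extract_dir."""
    urllib.request.urlretrieve(with_scheme(url), archive_path)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Only unpack archives from a source you trust.
        archive.extractall(path=extract_dir)
```

With network access, `download_and_extract(resp_dict['tar_file_download'], "chrono_out.tar.gz")` would fetch and unpack the archive for the current request; the local file name is arbitrary.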
We can also compare nodes in our custom synth tree to dated trees, and use them to infer dates for those nodes.
Read in your custom synth tree, and get dates for it
import dendropy
import requests
import json
custom_synth_dir = "snacktavish_aves_839319_example"
treepath = "{}/labelled_supertree/labelled_supertree.tre".format(custom_synth_dir)
custom_synth = dendropy.Tree.get_from_path(treepath, schema = "newick")
url = 'https://dates.opentreeoflife.org/v4/dates/dated_tree'
payload = { "newick" : custom_synth.as_string(schema="newick")}
## This step is slow!
resp = requests.post(url=url, data=json.dumps(payload))
resp_dict = resp.json()
dated_tree = dendropy.Tree.get(data=resp_dict['dated_trees_newick_list'][0], schema="newick")
dated_tree.write(path = "ottid_dated_tree.tre",schema= "newick")
##The labels on this tree are ott ids - we can translate them back to taxon names
labelled_treepath = "{}/labelled_supertree/labelled_supertree_ottnames.tre".format(custom_synth_dir)
labelled_custom_synth = dendropy.Tree.get_from_path(labelled_treepath, schema = "newick")
label_dict = dict()
for name in labelled_custom_synth.taxon_namespace:
    label_dict[name.label.split()[-1]] = name.label
for taxon in dated_tree.taxon_namespace:
    taxon.label = label_dict[taxon.label]
dated_tree.write(path="labelled_dated_tree.tre", schema="newick")
#To download the set of mapped ages you can get the tar file from
resp_dict['tar_file_download']
## Once you download and unzip
## The ages are in tmp/chrono_out_datetime/node_ages.json
Answers to the questions in this tutorial: https://github.com/snacktavish/TreeUpdatingComparison/blob/master/answers/OpenTree_exercise_answers.md (although some answers could change in the future, as the OpenTree synthetic tree is updated with new phylogenetic data)
If there is time, try one of the ideas below.
Make a list of taxa you are interested in and save it in a text file. (Scientific names only)
Resolve those names to Open Tree identifiers, and use get_synth_subtree.py to get a tree for your taxa of interest.
Take a look at the area of the synthetic tree that is interesting to you.
Do you have, or know of, a published tree that would do a better job on those relationships but isn't included in the synthetic tree?
Upload it to the main website https://tree.opentreeoflife.org/curator, and those phylogenetic inferences will be incorporated into later drafts of the synthetic tree!
There is a great package, rotl, that makes it easy to access and work with OpenTree data in R.
Try it out using either:
- the rotl tutorial at https://ropensci.org/tutorials/rotl_tutorial/
- the tutorial on linking data from OpenTree with species locations from GBIF at https://mctavishlab.github.io/BIO144/labs/rotl-rgbif.html
Complete bird trees taxonomically matched to large scale trait and location data sets.
https://github.com/eliotmiller/clootl/blob/master/examples/avonet.md
### Unifying geographic and phylogenetic data using python-opentree and Jupyter notebooks
Brain a bit tired? There are some fun visualizations of the OpenTree tree.
Take a look around OneZoom https://www.onezoom.org/ tree of life explorer
or this emoji hyperbolic tree https://glouwa.github.io/


