Comparing phylogenetic trees, getting a tree for your taxa, gathering date estimates

Work in progress - please contact ejmctavish@ucmerced.edu or mtholder@ku.edu with any issues or questions.

In this tutorial we will walk through:

  • Getting existing trees for arbitrary sets of taxa
  • Visualizing conflict between estimates
  • Getting date estimates for nodes
  • Updating an existing phylogeny with new data

The Open Tree of Life

The Open Tree of Life (https://opentreeoflife.github.io/) is a project that unites phylogenetic inferences and taxonomy to provide a synthetic estimate of species relationships across the entire tree of life.

This tree currently includes 2.4 million tips, and is based on 1,330 published studies which include 112,890 unique tips.

The synthetic tree uses a combined taxonomy across a large number of taxonomy resources with evolutionary estimates from published phylogenetic studies. https://opentreeoflife.github.io/browse/

The synthetic tree

Tree Browser

https://tree.opentreeoflife.org is our interactive tree viewer. You can browse the synthetic tree and leave feedback.

Navigation

Click on nodes to move through the tree. If you click the "Legend" button at the top, you will get an explanation of what information the visual elements of the tree convey.

Seeing more info about a node

You can reveal the "Properties panel" by clicking the "Show Properties" button or the link that appears when your mouse is over a node or branch in the tree.

The properties panel contains:

Clicking "Hide Properties" will hide the panel so that you can see more of the tree.

Feedback

If you have feedback about the relationships that you see, use the "Add Comment" button. Comments that are entered here are stored as issues in our feedback repository.

However, a comment on its own pretty much won't change anything. To change the relationships, upload a tree that reflects the relationships you think should be there and add it to synthesis.

Taxonomy Browser

https://tree.opentreeoflife.org/taxonomy/browse is our browser for the Open Tree Taxonomy. That taxonomy is an input into our full synthetic tree and is used to help us align tips in different trees that refer to the same taxon.

The taxonomy includes links to unique identifiers in other digitally available taxonomies, such as GBIF or NCBI.

We are working towards swapping to the Catalogue of Life taxonomy used by GBIF for easier updating and cross-compatibility.

Accessing data using the website

Check out https://tree.opentreeoflife.org

Search for your favorite organism! Don't agree with the relationships? You can fix them by uploading new inferences.

You can download a subtree of interest directly from the website.

Tutorial setup

You will need to have git and python3 installed for this tutorial. See installation instructions here https://opentreeoflife.github.io/SSBworkshop2026/#setup

Get the tutorial folder

git clone https://github.com/McTavishLab/SSB2026
cd SSB2026

We will use wrappers available in the Python package OpenTree (McTavish et al. 2022) to make it easier to work with the Open Tree APIs.

Install the opentree python package in a virtual environment:

python3 -m venv venv-opentree
source venv-opentree/bin/activate
pip install opentree
pip install dendropy
pip install requests

Getting a tree for your taxa

It is often useful to access the pruned subtree for just the taxa you are interested in. To do so, you need to map taxon names to unique identifiers. One of the key challenges of comparing trees across studies is minor differences in names and naming.

One solution is to map taxon names to unique identifiers using the Open Tree Taxonomic Name Resolution Service (TNRS). There are a few ways to use this service, including the API and the browser-based bulk name mapping tool: https://tree.opentreeoflife.org/curator/tnrs/
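
For the API route, a minimal sketch of a TNRS request in Python is shown below. It uses only the standard library and assumes the v3 `tnrs/match_names` endpoint with the `names`, `context_name`, and `do_approximate_matching` parameters. The helper names `build_tnrs_request` and `send` and the two example species names are illustrative and are not part of the tutorial scripts.

```python
import json
import urllib.request

# Assumed endpoint for the Open Tree TNRS (v3 API).
TNRS_URL = "https://api.opentreeoflife.org/v3/tnrs/match_names"


def build_tnrs_request(names, context_name=None, approximate=False):
    """Build (but do not send) a POST request for the TNRS match_names call."""
    payload = {"names": list(names), "do_approximate_matching": approximate}
    if context_name is not None:
        # A context such as 'Cnidarians' narrows the search.
        payload["context_name"] = context_name
    return urllib.request.Request(
        TNRS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Illustrative names only; in the tutorial they come from the names file.
request = build_tnrs_request(
    ["Obelia geniculata", "Clytia gracilis"], context_name="Cnidarians"
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Calling `send(request)` would return the parsed JSON reply. It is left out of the top-level code so that the block can be run offline to inspect the request body first.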

We will look up a tree of the hydrozoan jellyfish species found around Woods Hole.

The names of the taxa you will search for in this tutorial were downloaded from GBIF (GBIF.org (27 May 2022) GBIF Occurrence Download https://doi.org/10.15468/dl.gcmn6n).

The names are in the file opentree_tutorial/data/WH_hydrozoan_names.txt

Try this

  • Go to https://tree.opentreeoflife.org/curator/tnrs/
  • Click on "add names", and upload the names file.
  • In the mapping options section,
    • select 'Cnidarians' to narrow down the possibilities and speed up mapping
  • Click "Map selected names"

Exact matches will show up in green, and can be accepted by clicking "accept exact matches".

A few taxa may still show suggested names. Click through to the taxonomy, and you will see that the name from the GBIF data is listed as a synonym.

Once you have accepted names for each of the taxa, click "save nameset".

Make sure your mappings were saved! If you don't 'accept' matches, they don't download.

Download the zip file to your laptop (renaming it through the browser doesn't currently work). Extract the files. Take a look at the human-readable version (output/main.csv).

main.json contains the same data in a more computer-readable format.

Transfer the output/main.csv file to the tutorial folder on the cluster, and rename it to WH_jellies.csv

Using APIs

You can use the OpenTree APIs and taxon id numbers to get the tree for a subset of taxa directly from the command line.

For example:

curl -X POST https://api.opentreeoflife.org/v3/tree_of_life/induced_subtree -H "content-type:application/json" -d '{"ott_ids":[662625, 765195, 662618]}'

For more on the OpenTree APIs see https://github.com/OpenTreeOfLife/germinator/wiki/Open-Tree-of-Life-Web-APIs
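
The same call can be made from Python. The sketch below mirrors the curl example using only the standard library. The helper names are illustrative, and reading the tree from a `newick` key in the reply is an assumption about the response format rather than something shown in this tutorial.

```python
import json
import urllib.request

INDUCED_SUBTREE_URL = (
    "https://api.opentreeoflife.org/v3/tree_of_life/induced_subtree"
)


def build_induced_subtree_request(ott_ids):
    """Build (but do not send) the POST request used in the curl example."""
    payload = {"ott_ids": [int(ott_id) for ott_id in ott_ids]}
    return urllib.request.Request(
        INDUCED_SUBTREE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_newick(ott_ids):
    """Send the request and return the tree string (needs network access).

    Assumes the reply is a JSON object with a 'newick' key.
    """
    request = build_induced_subtree_request(ott_ids)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["newick"]


# Same three ids as in the curl example.
request = build_induced_subtree_request([662625, 765195, 662618])
print(request.data.decode("utf-8"))
```

`fetch_newick([662625, 765195, 662618])` would perform the actual call; it is not run here so that the block works without a network connection.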

It is often more convenient to manipulate both trees and names within a scripting language.

Getting a subtree

The tree_comparison_tutorial folder contains a Python script, 'get_synth_subtree.py'.

This script uses the taxon ids in the file WH_jellies.csv you transferred from your computer to get a synthetic tree from the OpenTree APIs. If you had trouble with that step you can use backup_output/WH_jellies.csv as the input file instead.

The argument 'output' sets the first part of the output filename.

python get_synth_subtree.py --input-file data/WH_jellies.csv --output WH_jellyfish_synth

(or if we skipped actually running the TNRS)

python get_synth_subtree.py --input-file backup_output/WH_jellies.csv --output WH_jellyfish_synth

This script will write two files to your current working directory: 'WH_jellyfish_synth.tre', the tree in Newick format, and 'WH_jellyfish_synth_citations.txt', the citations of the published trees that went into generating that tree and support the relationships in it.

Open the synthetic subtree in FigTree, and the citations in a text viewer.

Q Are any of the genera non-monophyletic? What one(s)?

Q Look at this genus/genera in the tree viewer (tree.opentreeoflife.org). What studies break the monophyly of each taxon?

Q Is there conflict among the input sources? Is there any tree from these studies that recovers either of these genera as monophyletic?

Comparing trees

You can also search for names in the search box at tree.opentreeoflife.org and get the OTT ids from there.

For example, to see the relationships between Python (ott:675102), e.g. Python regius, Podarcis (ott:937560), e.g. Podarcis muralis, and Anolis carolinensis (ott:705356), try running:

python get_synth_subtree.py --ott-ids 970153 675102 937560  --output lizards

The output tree will be written to lizards.tre.

Q Is there anything surprising about these relationships? (The answer probably depends on your pre-existing herp phylogeny knowledge :P)

Finding published trees that have your taxon or taxa of interest:

You can search the corpus of trees based on taxon name or taxon id:

python find_trees.py "Homarus americanus" --property ot:ottTaxonName
python find_trees.py 937560 --property ot:ottId

Dated trees

This is beta functionality that we are in the process of adding to the OpenTree services.

To estimate dates, we will use the Chronosynth API. The dates API is work-in-progress, and so it is not yet as user friendly as it will be.

A summary of the methods is here: https://github.com/OpenTreeOfLife/chronosynth/wiki/Chronosynth-methods-overview
There are some API docs here: https://github.com/OpenTreeOfLife/chronosynth/wiki/Draft-API-docs

Using the dates API, we can get dates that align to individual nodes in the synth tree. This is based on the same information you saw in the conflict viewer.
You can use a curl call to GET the current date information for a node.

You can query the study corpus based on either a higher taxon id (e.g. Python ott675102), or an internal node label from the synthesis tree (e.g. mrcaott1000311ott3643727).

If you take a look at the downloaded synth tree file lizards.tre, you can see the internal node labels. They are also in the URL of the tree browser online.
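
As a rough sketch, the internal node labels can also be pulled out of a Newick string with a regular expression and turned into dates API URLs. This assumes the labels are unquoted and directly follow a closing parenthesis. The example tree and its labels are made up for illustration, and the function names are not part of the tutorial scripts.

```python
import re

DATES_URL = "https://dates.opentreeoflife.org/v4/dates/synth_node_age/"

# An unquoted label that directly follows ')' is an internal node label.
# Numeric support values in the same position would also match.
INTERNAL_LABEL = re.compile(r"\)([^\s,;:()\[\]']+)")


def internal_node_labels(newick):
    """Return the internal node labels in the order they appear."""
    return INTERNAL_LABEL.findall(newick)


def dates_urls(newick):
    """Build one synth_node_age URL per internal node label."""
    return [DATES_URL + label for label in internal_node_labels(newick)]


# Hypothetical three-tip tree; real labels come from a file such as lizards.tre.
example = "((ott11:1.0,ott22:1.0)mrcaott11ott22:2.0,ott33:3.0)mrcaott11ott33;"

for url in dates_urls(example):
    print(url)
```

For anything beyond a quick look, a tree library such as dendropy is a safer way to read node labels, since a regular expression does not handle quoted labels.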

The dates API will return the ages of internal nodes of input trees that align with that node.

e.g.

https://dates.opentreeoflife.org/v4/dates/synth_node_age/ott675102

or

curl -X GET https://dates.opentreeoflife.org/v4/dates/synth_node_age/ott675102

To look at the node itself, you can navigate to "https://tree.opentreeoflife.org/curator/study/view/{STUDY ID}?tab=home&tree={TREE ID}&node={NODE_ID}"
e.g. https://tree.opentreeoflife.org/curator/study/view/ot_307?tab=home&tree=tree2&node=node10778

The python script get_dates.py in the opentree tutorial folder translates the short form citations into their full citation information. It outputs the tree if you input a list of ids, and a date file with age estimates and citations for nodes (look at the tree file to see which node is which).

python get_dates.py --ott-ids 970153 675102 937560 --output lizard_ages

Open 'lizard_ages_dates.txt'.

Q What are the maximum and minimum age estimates for the root of this three taxon tree?

Q Is there overlap between the age estimates for the root and for the internal node?

You can also gather node age data for a single taxon, as long as it is not a tip of the tree.

python get_dates.py --ott-ids 675102 --output python_ages

There is also an R package for gathering date information and estimating dated trees, Datelife, available online at datelife.opentreeoflife.org.

You can also get dates for arbitrary nodes in the synth tree, which are not associated with taxa.

curl -X GET https://dates.opentreeoflife.org/v4/dates/synth_node_age/mrcaott1000311ott3643727 | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   184  100   184    0     0    213      0 --:--:-- --:--:-- --:--:--   213
{
  "query": "mrcaott1000311ott3643727",
  "synth_node_id": "mrcaott1000311ott3643727",
  "ot:source_node_ages": [
    {
      "age": 9.325001,
      "source_id": "ot_1592@tree1",
      "source_node": "node22956"
    }
  ]
}

This node https://tree.opentreeoflife.org/opentree/argus/opentree13.4@mrcaott1000311ott3643727 in the synthetic tree aligns with this node in this dated tree https://tree.opentreeoflife.org/curator/study/view/ot_1592?tab=home&tree=tree1&node=node22956
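
The link between a dates API reply and the curator view can be sketched in a few lines of Python. The block below parses the example response shown above and fills in the study view URL template. It assumes every `source_id` has the form `study@tree`, as in that example; the function name is illustrative.

```python
import json

CURATOR_URL = (
    "https://tree.opentreeoflife.org/curator/study/view/"
    "{study}?tab=home&tree={tree}&node={node}"
)

# Response copied from the example above.
response_text = """
{
  "query": "mrcaott1000311ott3643727",
  "synth_node_id": "mrcaott1000311ott3643727",
  "ot:source_node_ages": [
    {"age": 9.325001, "source_id": "ot_1592@tree1", "source_node": "node22956"}
  ]
}
"""


def source_node_links(response_dict):
    """Return (age, curator URL) pairs, one per aligned source node."""
    links = []
    for entry in response_dict.get("ot:source_node_ages", []):
        # Assumes source_id looks like 'ot_1592@tree1'.
        study, tree = entry["source_id"].split("@", 1)
        url = CURATOR_URL.format(
            study=study, tree=tree, node=entry["source_node"]
        )
        links.append((entry["age"], url))
    return links


for age, url in source_node_links(json.loads(response_text)):
    print(age, url)
```

A node with several aligned source trees would yield several pairs, which makes it easy to compare the age estimates side by side.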

Evaluate your confidence in your tree.

How many nodes are in your tree? How many do you have date estimates for?

Dating a subtree

Let's choose a part of the synth tree where we have a lot of data!
https://tree.opentreeoflife.org/opentree/opentree13.4@ott109893/Cracidae
We can estimate a dated tree for Cracidae:

import requests
import json
import dendropy
from opentree import OT

## Get Dated synth tree
url     = 'https://dates.opentreeoflife.org/v4/dates/dated_tree'


## Here, we have several date estimates for each node,
## so we can run the summarization several times,
## choosing one at random each time
payload = { "node_id" : 'ott109893',
            "select":'random',
            "reps" : 5}

resp = requests.post(url=url, data=json.dumps(payload))

resp_dict = resp.json()

treeset = dendropy.TreeList.get(string=";".join(resp_dict['dated_trees_newick_list']), schema="newick")


## The labels are all OTT ids, which are convenient for data analysis but annoying for interpretability

## This uses an API call to translate them back to taxon names
for taxon in treeset.taxon_namespace:
    ottid = taxon.label
    output = OT.taxon_info(ottid)
    taxon.label = output.response_dict["name"] + "_" + ottid


treeset.write(path="labelled_treelist.tre", schema="newick")

## You can also download all the data from the link stored in the response:
print(resp_dict['tar_file_download'])
## e.g. 'dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_01_09_2026_14_05_29.tar.gz'

Then, from the command line, download and extract the archive:

wget 'dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_01_09_2026_14_05_29.tar.gz'
tar -xzvf chrono_out_01_09_2026_14_05_29.tar.gz

This directory (awkwardly nested in a directory named 'tmp') contains the sampled ages for each of the 5 runs, the set of 5 trees in the file bladj.tre, and the short form citations in date_cites.txt.

We can also compare nodes in our custom synth tree to dated trees, and use them to infer dates for those nodes.

Read in your custom synth tree, and get dates for it

import dendropy
import requests
import json

custom_synth_dir = "snacktavish_aves_839319_example"
treepath = "{}/labelled_supertree/labelled_supertree.tre".format(custom_synth_dir)

custom_synth = dendropy.Tree.get_from_path(treepath, schema = "newick")

url     = 'https://dates.opentreeoflife.org/v4/dates/dated_tree'

payload = { "newick" : custom_synth.as_string(schema="newick")}


## This step is slow!
resp = requests.post(url=url, data=json.dumps(payload))


resp_dict = resp.json()


dated_tree = dendropy.Tree.get(data=resp_dict['dated_trees_newick_list'][0], schema="newick")

dated_tree.write(path = "ottid_dated_tree.tre",schema= "newick")


##The labels on this tree are ott ids - we can translate them back to taxon names 

labelled_treepath = "{}/labelled_supertree/labelled_supertree_ottnames.tre".format(custom_synth_dir)


labelled_custom_synth = dendropy.Tree.get_from_path(labelled_treepath, schema = "newick")

## Map the last token of each label (used here as the ott id) to the full label
label_dict = dict()
for name in labelled_custom_synth.taxon_namespace:
    label_dict[name.label.split()[-1]] = name.label


for taxon in dated_tree.taxon_namespace:
    taxon.label = label_dict[taxon.label]

dated_tree.write(path = "labelled_dated_tree.tre",schema= "newick")


## To download the set of mapped ages, get the tar file from this link:
print(resp_dict['tar_file_download'])

## Once you download and extract it,
## the ages are in tmp/chrono_out_datetime/node_ages.json
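
The relabelling loop above raises a KeyError if a tip in the dated tree has no counterpart in the named tree. A minimal sketch of the same lookup with a fallback is shown below, in plain Python so it can be tried without dendropy. The labels and ott ids are hypothetical, and the two helper functions are not part of the tutorial scripts.

```python
def build_label_lookup(labels):
    """Map the last whitespace-separated token of each label to the full label."""
    return {label.split()[-1]: label for label in labels}


def relabel(ott_labels, lookup):
    """Translate labels, keeping any label that has no entry in the lookup."""
    return [lookup.get(label, label) for label in ott_labels]


# Hypothetical labels of the form '<name> <ott id>'.
named_labels = ["Crax rubra ott1111", "Ortalis vetula ott2222"]
lookup = build_label_lookup(named_labels)

# 'ott3333' has no match, so it is left unchanged instead of raising an error.
print(relabel(["ott1111", "ott2222", "ott3333"], lookup))
```

With dendropy, the same idea would replace `label_dict[taxon.label]` with `label_dict.get(taxon.label, taxon.label)` inside the loop.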

Answers to the questions

https://github.com/snacktavish/TreeUpdatingComparison/blob/master/answers/OpenTree_exercise_answers.md (although some answers could change in future, as the OpenTree synthetic tree is updated with new phylogenetic data)

Choose your own adventure!

If there is time, try one of the ideas below.

Get a synthetic tree

Make a list of taxa you are interested in and save it in a text file. (Scientific names only)

Resolve those names to Open Tree identifiers, and use get_synth_subtree.py to get a tree for your taxa of interest.

Contribute to OpenTree

Take a look at the area of the synthetic tree that is interesting to you.

Do you have, or know of, a published tree that would do a better job on those relationships but isn't included in the synthetic tree?

Upload it to the main website https://tree.opentreeoflife.org/curator, and those phylogenetic inferences will be incorporated into later drafts of the synthetic tree!

Unifying geographic and phylogenetic data using R/Rstudio

There is a great package, rotl, that makes it easy to access and work with OpenTree data in R.

Try it out using either:

  • Tutorial on rotl: https://ropensci.org/tutorials/rotl_tutorial/
  • Tutorial on linking data from OpenTree with species locations from GBIF: https://mctavishlab.github.io/BIO144/labs/rotl-rgbif.html

Working on birds? Check out clootl!

Complete bird trees taxonomically matched to large scale trait and location data sets.

https://github.com/eliotmiller/clootl/blob/master/examples/avonet.md

Unifying geographic and phylogenetic data using python-opentree and Jupyter notebooks

https://github.com/McTavishLab/jupyter_OpenTree_tutorials/blob/master/notebooks/DEMO_OpenTree.ipynb

Zoom around

Brain a bit tired? There are some fun visualizations of the OpenTree tree.

Take a look around the OneZoom tree of life explorer: https://www.onezoom.org/

or this emoji hyperbolic tree https://glouwa.github.io/