If you work with trees (phylogenetic or not) and regularly use Python, you have probably used or heard of one of the following packages: Bio.Phylo, DendroPy or ETE.
While each of these packages has its own strengths and weaknesses, I particularly like the ETE module. Here is why!
This post is based on a presentation I gave at monbug. I converted the IPython notebook to this markdown with nbconvert, as described by Christopher S. Corley on his blog. The config I used with nbconvert can be found here. The GitHub repository with all the original files for the presentation can be found here: monbug_ete. You can use nbviewer to view the notebook directly if you prefer.
What’s ETE?
ETE is a Python Environment for Tree Exploration created by Jaime Huerta-Cepas.
It’s a framework that assists in the manipulation of any type of hierarchical tree (i.e. reading, writing, visualisation, annotation, etc.). The latest version at the time of writing is ete3.
Installation
You can install ETE with pip (pip install ete3). Check this link for more details about optional/unmet dependencies: http://etetoolkit.org/download/
Quick introduction to the API
The authors provide a great in-depth tutorial on working with the tree data structure in ETE: http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html. I’m only going to give a light introduction to the API here, and I really recommend reading the official docs!
Let’s take a quick look at the tree data structures available in ete:
['ClusterTree', 'EvolTree', 'NexmlTree', 'PhyloTree', 'PhyloxmlTree', 'Tree']
As you can see, you have a basic tree data structure (Tree) and more specialized tree structures, like PhyloTree for phylogenetics.
=> ETE can read a tree from a string or a file
=> In ete, a tree is a Node. This implies that the root is a Node, and so are all its descendants.
            /-a
         /-|
      /-|   \-b
     |  |
   /-|   \-c
  |  |
--|   \-d
  |
  |   /-e
   \-|
      \-f
=> You can add information to nodes by adding features
The next step traverses the tree t1 and adds a feature sexiness to each leaf.
=> Features are just attributes.
            /-a, 8
         /-|
      /-|   \-b, 1
     |  |
   /-|   \-c, 9
  |  |
--|   \-d, 3
  |
  |   /-e, 9
   \-|
      \-f, 3
=> You can search by features
[Tree node 'a' (-0x7ffff810443aa570)]
[Tree node 'a' (-0x7ffff810443aa570)]
=> Here is a quick list of useful functions
SISTERS of a :
[Tree node 'b' (0x7efbbc55ab0)]
FIRST CHILD OF ROOT
         /-a
      /-|
   /-|   \-b
  |  |
--|   \-c
  |
   \-d
LCA (a, b) :
   /-a
--|
   \-b
RF DISTANCE between t1 and t2 :
0
Introduction to tree visualization with ete
Data : a random tree with random branches
- Tree rendering
- Tree Style
              /-G, 0.47936
     /, 0.11319
    |        |         /-F, 0.53403
              \, 0.52094
-, 1.0                 \-E, 0.89822
    |
    |         /-L, 0.27682
     \, 0.32620
             |         /-K, 0.50173
              \, 0.07320
                      |         /-J, 0.14208
                       \, 0.93141
                               |         /-I, 0.05555
                                \, 0.87512
                                         \-H, 0.81088
=> Trees can be saved as images. Supported formats are png, pdf and svg.
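The output format follows the file extension passed to render. The helper below is a small sketch of mine (save_tree_images is a hypothetical name, not part of ETE): it only calls tree.render, so it works with any ete3 tree once ETE's optional Qt-based drawing dependencies are installed.

```python
def save_tree_images(tree, basename, formats=("png", "pdf", "svg"), **render_kwargs):
    """Render `tree` once per format and return the list of file paths.

    `tree` is expected to be an ete3 tree; extra keyword arguments
    (for example w=600, units="px", tree_style=...) are passed to render().
    """
    paths = []
    for extension in formats:
        path = "{}.{}".format(basename, extension)
        tree.render(path, **render_kwargs)  # the extension selects the format
        paths.append(path)
    return paths
```

For the random tree t, save_tree_images(t, "random_tree", w=600, units="px") should write random_tree.png, random_tree.pdf and random_tree.svg.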
=> You can use TreeStyle to change how the tree is displayed
Let’s draw a circular tree now
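A sketch of the TreeStyle settings involved. The helper name make_circular and the default values are mine; the attributes it sets (mode, arc_start, arc_span, show_leaf_name, show_branch_length) are TreeStyle options. It takes the style as an argument so that the snippet itself does not have to import ETE's Qt-based drawing module.

```python
def make_circular(tree_style, arc_start=0, arc_span=360):
    """Switch an ete3 TreeStyle to circular mode and return it."""
    tree_style.mode = "c"             # "c" = circular, "r" = rectangular
    tree_style.arc_start = arc_start  # angle (degrees) where the drawing starts
    tree_style.arc_span = arc_span    # portion of the circle used by the tree
    tree_style.show_leaf_name = True
    tree_style.show_branch_length = True
    return tree_style
```

With ete3 and its drawing dependencies installed, ts = make_circular(TreeStyle(), arc_start=-180, arc_span=180) gives a half-circle layout, and t.render("circular_tree.png", tree_style=ts) or t.show(tree_style=ts) draws the tree with it.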
=> faces are wonderful
faces allow you to add graphical information to a node. A face can be a simple Text, an Image or something more informative such as a Chart or Sequence domains.
Here is the list of available faces :
['AttrFace', 'BarChartFace', 'CircleFace', 'DynamicItemFace', 'Face', 'ImgFace', 'OLD_SequenceFace', 'PieChartFace', 'ProfileFace', 'RandomFace', 'RectFace', 'SeqMotifFace', 'SequenceFace', 'SequencePlotFace', 'StackedBarFace', 'StaticItemFace', 'TextFace', 'TreeFace']
Faces can be added at different areas around a node.
With Faces, you can actually make things like this (treeception) :
It’s also possible to define a layout function that will determine how a node will be rendered. Let’s see how to do that and in which cases this could be useful with the next example.
Application 1 : Duplication|Loss history of a gene family
Data : genetree newick where I have already added a feature (states) :
- states = 1 ==> internal node with duplication
- states = 0 ==> internal node with speciation
      /-Dre_1, 0
   /, 0
  |  |   /-Cfa_1, 0
  |   \, 0
-, 1     \-Hsa_1, 0
  |
  |   /-Dre_2, 0
   \, 0
      \-Cfa_2, 0
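A layout function is a plain Python function that ETE calls on every node before drawing it. The sketch below uses the states feature to tell duplication and speciation nodes apart. The function name, colors and shapes are my own choices, not necessarily those of the original figure.

```python
def duplication_layout(node):
    """Style internal nodes by event type (states: 1 = duplication, 0 = speciation)."""
    if node.is_leaf():
        return
    # Features read from an annotated newick may be strings, so compare as text
    if str(getattr(node, "states", "0")) == "1":
        node.img_style["fgcolor"] = "red"    # duplication
        node.img_style["shape"] = "square"
    else:
        node.img_style["fgcolor"] = "blue"   # speciation
        node.img_style["shape"] = "circle"
    node.img_style["size"] = 8
```

To use it, set it as the layout_fn of a TreeStyle (ts = TreeStyle(); ts.layout_fn = duplication_layout) and pass that style to t.render(...) or t.show(...).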
Application 2 : Phylogenetic tree, protein sequence and information content
Data :
- An alignment
- A tree constructed using that alignment (actually, both were randomly generated)
>A
MAEIPDETIQQFMALT---SNIAVQYLSEFGDLNEALNSY
>B
MAEIPDATIQQFMALTNVSHNIAVQY--EFGDLNEALNSY
>C
MAEIPDATIQ----LTNVSHNIAVQYLSEFGDLNEALNSY
>D
MAEAPDETIQQFMALTNVSHNIAVQYLSEFGDLNEAL---
(A,(D,(B,C)));
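The post does not say how the information content was computed, so the sketch below makes an assumption: for each alignment column, it is log2(20) minus the Shannon entropy of the residues, with gaps ignored. It uses only the standard library and the alignment shown above.

```python
import math
from collections import Counter

FASTA = """>A
MAEIPDETIQQFMALT---SNIAVQYLSEFGDLNEALNSY
>B
MAEIPDATIQQFMALTNVSHNIAVQY--EFGDLNEALNSY
>C
MAEIPDATIQ----LTNVSHNIAVQYLSEFGDLNEALNSY
>D
MAEAPDETIQQFMALTNVSHNIAVQYLSEFGDLNEAL---
"""


def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def information_content(column, alphabet_size=20):
    """Bits of information in one column (assumed definition, gaps ignored)."""
    residues = [char for char in column if char != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(residues).values()
    )
    return math.log2(alphabet_size) - entropy


alignment = parse_fasta(FASTA)
columns = list(zip(*alignment.values()))
ic = [information_content(column) for column in columns]
print([round(value, 2) for value in ic])
```

Fully conserved columns get the maximum score and variable columns score lower. On the ETE side, PhyloTree accepts the alignment directly (PhyloTree("(A,(D,(B,C)));", alignment=FASTA, alg_format="fasta")), which attaches each sequence to the matching leaf.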
You can do a lot of things with ete if you take the time to learn how to use it. I didn’t have time to talk about ClusterNode, EvolNode or all the other great modules of ete, but I hope this post sparked your interest and was useful to you.
Also, READ THE DOCS.