Help - Open files
The option "Open..." in the file menu displays a dialog that can either be used to open TreeGraph 2 tree files (*.xtg) or to import other supported tree formats (e.g. Nexus). You can select any file in a supported format here to open it in the TreeGraph 2 editor.
This article describes this feature for the latest version of TreeGraph 2. For older versions the following articles are available: Open files (until 2.0.34)
Supported formats
TreeGraph 2 format
The TreeGraph 2 format with the file extension XTG (Extensible TreeGraph format) is the default tree format for TreeGraph 2. It can store an unlimited number of additional node/branch data columns as well as the TreeGraph 2 specific formats of the tree elements. Every XTG file contains only one tree.
Because XTG is an XML format it can easily be used by developers of other applications if necessary. (A formal definition of XTG can be found here.)
XTG before version 2.0.40
Note that TreeGraph versions 2.0.41 and older did not declare the XML namespace (http://bioinfweb.info/xmlns/xtg) in their generated XTG documents since it was not formally defined at that time. Later versions do so, but files with a declared namespace cannot be opened with older versions. (They simply show an empty document after loading.) Using the latest version of TreeGraph 2 is recommended. If you nevertheless need to open documents created with a later version in version 2.0.41 or older, you have to remove the namespace declarations manually from the <TreeGraphDocument> tag (e.g. with a text editor).
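The manual removal described above could also be scripted. The following is a minimal sketch, assuming that the namespace is declared as a default namespace (xmlns="...") directly on the <TreeGraphDocument> start tag. The function name strip_xtg_namespace is hypothetical and not part of TreeGraph 2.

```python
import re

XTG_NAMESPACE = "http://bioinfweb.info/xmlns/xtg"

# Matches a default namespace declaration pointing to the XTG namespace,
# e.g.  xmlns="http://bioinfweb.info/xmlns/xtg"  (single or double quotes).
_XMLNS_ATTRIBUTE = re.compile(
    r"\s+xmlns\s*=\s*([\"'])" + re.escape(XTG_NAMESPACE) + r"\1"
)


def strip_xtg_namespace(xml_text):
    """Return xml_text without the XTG namespace declaration on the root tag.

    Assumption: the document uses no namespace prefix for XTG elements.
    """
    def clean_tag(match):
        # Remove the declaration only inside the matched start tag.
        return _XMLNS_ATTRIBUTE.sub("", match.group(0))

    # Only the first <TreeGraphDocument ...> start tag is modified.
    return re.sub(r"<TreeGraphDocument\b[^>]*>", clean_tag, xml_text, count=1)
```

The function works on the file content as a string, so the result should be written to a copy of the file rather than over the original document.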
Newick format
Newick files are simply text files that consist of one or more tree descriptions in the Newick notation. In contrast to Nexus files they contain no further syntax elements or other information than the trees.
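As an illustration of "one or more tree descriptions" in a single file, the following sketch splits the content of a Newick file into its individual tree strings. It is not the parser used by TreeGraph 2. It assumes that each tree ends with a semicolon and that semicolons inside quoted names or inside comments in square brackets (not nested) do not terminate a tree. The function name split_newick_trees is hypothetical.

```python
def split_newick_trees(text):
    """Split Newick file content into tree strings, each ending with ';'."""
    trees = []
    current = []
    quote = None        # the quote character that is currently open, if any
    in_comment = False  # True while inside a [...] comment

    for ch in text:
        current.append(ch)
        if quote is not None:
            if ch == quote:
                quote = None
        elif in_comment:
            if ch == "]":
                in_comment = False
        elif ch in "'\"":
            quote = ch
        elif ch == "[":
            in_comment = True
        elif ch == ";":
            # A semicolon outside quotes and comments ends the current tree.
            trees.append("".join(current).strip())
            current = []
    return trees
```

For example, split_newick_trees("(A,(B,C));\n(A,(C,B));") returns a list with two tree strings.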
Nexus format
The Nexus format is widely used in phylogenetics and can contain trees in Newick notation and furthermore also information about taxa and phylogenetic data sets such as sequence alignments. Several common programs such as PAUP*, Mesquite and MacClade generate trees in this format. A Nexus file usually consists of different blocks which contain different types of information, of which the trees block is the only one relevant for TreeGraph 2. Just like Newick files, Nexus files can contain several trees. (Note that the phylogenetic Nexus format supported by TreeGraph 2 has nothing to do with the NeXus format used in particle physics.)
Nexus format with additional annotations
TreeGraph 2 is also able to read Nexus files which contain special node annotations as hot comments generated by BEAST or MrBayes. Since version 2.3.0 they are loaded into hidden branch data columns (before that, they were imported as hidden node data) and can e.g. be visualized as text labels by using the Copying node/branch data function. To perform further calculations based on the imported values, the Calculating node/branch data function can be used.
The Nexus parser of TreeGraph reads hot comments behind node names and, since version 2.3.0, also behind branch lengths. Hot comments can have the following forms:
Comment type | Numerical example | Character example |
---|---|---|
Unnamed | (A, (B, C[98.4])); | (A, (B, C[text])); |
Named | (A, (B, C[&prob=98.4])); | (A, (B, C[&info=text])); |
Multiple named | (A, (B, C[&prob=98.4,otherValue=18])); | (A, (B, C[&info=text,otherText=abc])); |
Named array | (A, (B, C[&prob_range={97.8,98.6}])); | (A, (B, C[&info={text1,text2}])); |
Forced parsing as textual value | n/a | (A, (B, C[&info="98.40"])); |
Unnamed comments are imported with the node/branch data ID unnamedNodeHotComment or unnamedBranchHotComment depending on their position in the Newick string (behind the node name or behind the branch length). Their content is always considered as a single value, i.e. a comment like (A, (B, C[text1,text2])); would not be imported as two columns with the values text1 and text2, but as a single column with the value text1,text2. Therefore multiple values should always be coded with named hot comments.
In named hot comments any combination of textual, numerical and array values is possible. Numerical values must always use a dot as the decimal separator and be written without thousands separators, because the comma is reserved for separating different values. Scientific notation like 9.84E+1 is supported.
If a single value is specified within quotation marks it is always parsed as a textual value, even if it could be parsed as a number. Quotation marks can be used in both named and unnamed comments.
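The rules above can be summarized in a short sketch. It is not the actual parser of TreeGraph 2, only an illustration under the following assumptions: the input is the text between the square brackets, values are returned in a dictionary keyed by the node/branch data ID, and arrays are returned as lists (how TreeGraph 2 stores array values internally is not modeled here). All function names are hypothetical.

```python
import re

# Dot as decimal separator, no thousands separators, optional exponent.
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def parse_scalar(token):
    """Parse a single value: quoted text stays text, numbers become floats."""
    token = token.strip()
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]  # quotation marks force a textual value
    if NUMBER.fullmatch(token):
        return float(token)
    return token


def split_top_level(text):
    """Split at commas that are neither inside {...} nor inside "..."."""
    parts, current, depth, quoted = [], [], 0, False
    for ch in text:
        if ch == '"':
            quoted = not quoted
        elif not quoted and ch == "{":
            depth += 1
        elif not quoted and ch == "}":
            depth -= 1
        if ch == "," and depth == 0 and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_value(token):
    """Parse a value of a named comment, which may be an array in braces."""
    token = token.strip()
    if token.startswith("{") and token.endswith("}"):
        return [parse_scalar(item) for item in split_top_level(token[1:-1])]
    return parse_scalar(token)


def parse_hot_comment(content, position="Node"):
    """Parse the text between '[' and ']'; position is 'Node' or 'Branch'."""
    if not content.startswith("&"):
        # Unnamed comments are always treated as one single value.
        return {"unnamed" + position + "HotComment": parse_scalar(content)}
    values = {}
    for part in split_top_level(content[1:]):
        name, _, value = part.partition("=")
        values[name.strip()] = parse_value(value)
    return values
```

With this sketch, parse_hot_comment("text1,text2") yields one value under unnamedNodeHotComment, while parse_hot_comment("&prob=98.4,otherValue=18") yields two separate numerical values.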
NeXML
NeXML is an XML format modeling alignment and tree data similar to the Nexus format, but in addition offers an RDF-based way to annotate data elements (including tree nodes and branches).
PhyloXML
PhyloXML is an XML format modeling phylogenetic trees including a set of predefined types of node and branch annotations that can be imported by TreeGraph (see below). Support for reading and also writing PhyloXML will be extended in future versions using JPhyloIO.
Branch length scale in imported files
If a Newick, Nexus or phyloXML file is imported, the branch length scale and the small interval of the scale bar are calculated automatically as follows:
- The branch length scale (distance per branch length unit) is set so that the average length of all branches (which have a defined length) is 4 mm.
- The small interval of the scale bar is calculated so that it is about 1 mm long. The equivalent of 1 mm in branch length units is rounded to the first significant digit. If e.g. 0.367 branch length units make up one millimeter, the length of the small interval is set to 0.4 branch length units. (This feature was not available before version 2.0.41.)
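The two calculations above can be sketched as follows. This is only an illustration of the arithmetic, not the code used by TreeGraph 2. It assumes that the scale is expressed in millimeters per branch length unit and that undefined branch lengths are represented by None. The function names are hypothetical.

```python
import math


def branch_length_scale(branch_lengths, target_mm=4.0):
    """Return mm per branch length unit so that the average length is target_mm."""
    defined = [length for length in branch_lengths if length is not None]
    if not defined:
        raise ValueError("No branch with a defined length.")
    average = sum(defined) / len(defined)
    if average <= 0:
        raise ValueError("The average branch length must be positive.")
    return target_mm / average


def round_to_first_significant_digit(value):
    """Round a positive value to one significant digit (0.367 becomes 0.4)."""
    factor = 10 ** math.floor(math.log10(value))
    return round(value / factor) * factor


def small_interval(scale_mm_per_unit):
    """Return the small scale bar interval (in branch length units) for about 1 mm."""
    units_per_mm = 1.0 / scale_mm_per_unit
    return round_to_first_significant_digit(units_per_mm)
```

Note that Python's round() rounds exact halves to the nearest even number, which may differ from the rounding mode used by TreeGraph 2 in such edge cases.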
Importing Nexus and Newick
If you want to import a Newick- or Nexus-file you have some more options than you have for opening an XTG-file because these file types support only two types of node/branch data: node names and branch lengths. The names of the terminal nodes are always imported as such into TreeGraph because they usually describe names of taxa. With respect to the internal node names and branch lengths stored in the Nexus- or Newick-file you can choose the branch/node data column in which you want these values to be stored.
Internal node names
Since the internal node names are often used to store support values, the open-dialog contains a combo box that allows selecting the type of node/branch data the internal node names of the Newick notation should be saved to. The following choices are possible:
- Node names - The internal nodes of the TreeGraph document will contain the imported internal node names.
- New text labels with the specified ID - For every internal node name of the Newick notation a text label will be created in the TreeGraph document which has the ID you specify in the input field right of the combo box.
- New hidden branch data with the specified ID - The internal node names will be stored in hidden branch data fields in the TreeGraph document.
- New hidden node data with the specified ID - The internal node names will be stored in hidden node data fields in the TreeGraph document.
Translate internal node names
Some Nexus files might contain a taxon table which specifies an ID for each taxon of the tree(s). The Newick string then only contains these IDs rather than the full taxon names. In these cases you can specify whether TreeGraph 2 should use this table only to translate the terminal node names or also the internal node names. Some programs use the internal node names to store support values, which might erroneously be translated based on the taxon table if you check this option (e.g. a support value of 99 would be replaced by the 99th taxon in the list).
Note that values in quotes (e.g. "99" instead of 99) will not be translated.
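The translation behavior described above can be illustrated with a small sketch. It assumes that the taxon table is available as a dictionary that maps IDs to taxon names and that quoted values are returned without their quotation marks. Both are assumptions made for this example, and translate_name is a hypothetical function, not part of TreeGraph 2.

```python
def translate_name(name, translate_table, is_internal, translate_internal=False):
    """Translate a node name from the Newick string using the taxon table.

    translate_internal corresponds to the option that controls whether
    internal node names are translated as well.
    """
    # Values in quotes are never translated (assumption: quotes are dropped).
    if len(name) >= 2 and name[0] == name[-1] and name[0] in "\"'":
        return name[1:-1]
    # Internal node names are only translated if the option is checked.
    if is_internal and not translate_internal:
        return name
    # Names without an entry in the table are kept unchanged.
    return translate_table.get(name, name)
```

With a table such as {"99": "Pan troglodytes"}, an internal node name 99 stays a support value as long as translate_internal is False, and "99" in quotes stays 99 in any case.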
Branch lengths
Below the combo box for the internal node names you will find a second combo box which allows you to select how the branch lengths of the imported document are stored. This choice applies to all branch lengths, including the lengths of branches that lead to terminal nodes. Here you have the following options:
- Branch lengths - The branch length data of the imported file will also be stored as branch lengths in the TreeGraph document.
- New text labels with the specified ID
- New hidden branch data with the specified ID
- New hidden node data with the specified ID
(Note that you can also move or copy the contents of different node/branch data columns later. See Copying node/branch data for details.)
This feature was not available before TreeGraph 2.0.23.
Files with several trees
If a Newick- or Nexus-file contains more than one tree you will be asked to select the single tree you want to import. The selection dialog will show you previews of all trees contained in the file you want to import.
Importing NeXML
Since version 2.11.1 TreeGraph 2 is able to import phylogenetic trees including node and branch annotations from NeXML. Metadata attached to nodes is imported into hidden node data columns and data attached to branches into hidden branch data columns using the RDF predicate (full URI) as the respective heading. Only literal metas and URLs linked by resource metas are currently imported. Other resource metas and their nested information are ignored. (In future versions of TreeGraph, its metadata model will be extended to fully support RDF-based annotations, which will allow using more of the metadata available in NeXML as well as writing to that format.)
Importing phyloXML
In contrast to the XTG format, PhyloXML predefines several node/branch data fields for special usage (e.g. taxonomic information, information about a molecular sequence associated with a node). When loading a PhyloXML file TreeGraph moves the values from that data structure to its own (consisting of node names, labels and hidden node/branch data).
PhyloXML documents can also contain several trees. If so you will be asked which tree to import just as you are when importing Newick or Nexus files (see above).
Translation of the PhyloXML elements
The following table describes how the data stored in a clade element of a phyloXML document is translated into the TreeGraph 2 data structure. (It is assumed that you are familiar with the PhyloXML format. A formal definition of this format can be found here.)
PhyloXML element under clade | TreeGraph 2 data structure |
---|---|
name | Node name and hidden node data with the ID phyloXML.name |
branch_length | Branch length |
confidence | Text label (The type attribute defines the label ID. Since only numeric confidence values are allowed in PhyloXML, non-numeric values are not imported. Note that, although theoretically allowed in PhyloXML, you should not use the same value for type more than once on the same node, because TreeGraph 2 treats the value as a unique ID.) |
width * | Branch width and hidden branch data with the ID phyloXML.branch_width (You can use the hidden branch data field with the Set distance values by node/branch data function if you want to rescale the branch widths or additionally apply them as node line widths.) |
color * | Branch color and Node line color |
node_id | This element is not imported. |
taxonomy.id | This element is not imported. |
taxonomy.code | Hidden node data with the ID phyloXML.taxonomy.code |
taxonomy.scientific_name | Hidden node data with the ID phyloXML.taxonomy.scientific_name and node name if the name element was not specified |
taxonomy.authority | Hidden node data with the ID phyloXML.taxonomy.authority |
taxonomy.common_name | Hidden node data with the ID phyloXML.taxonomy.common_name and node name if neither the name nor the taxonomy.scientific_name elements were specified |
taxonomy.synonym | Hidden node data with the ID phyloXML.taxonomy.synonym |
taxonomy.rank | Hidden node data with the ID phyloXML.taxonomy.rank |
taxonomy.uri | This element is not imported. |
sequence.type (attribute) | Hidden node data with the ID phyloXML.sequence.type |
sequence.id_source (attribute) | This element is not imported. |
sequence.id_ref (attribute) | This element is not imported. |
sequence.symbol | Hidden node data with the ID phyloXML.sequence.symbol |
sequence.accession | Hidden node data with the ID phyloXML.sequence.accession |
sequence.name | Hidden node data with the ID phyloXML.sequence.name and node name if neither the name nor the taxonomy.scientific_name nor the taxonomy.common_name elements were specified |
sequence.location | Hidden node data with the ID phyloXML.sequence.location |
sequence.mol_seq | This element is not imported. |
sequence.uri | This element is not imported. |
sequence.annotation | This element is not imported. |
sequence.domain_architecture | This element is not imported. |
events | This element is not imported. |
binary_characters | This element is not imported. |
distribution.desc | Hidden node data with the ID phyloXML.distribution.desc |
distribution.point | This element is not imported. |
distribution.polygon | This element is not imported. |
date.desc | Hidden node data with the ID phyloXML.date.desc |
date.value | Hidden node data with the ID phyloXML.date.value |
date.minimum | Hidden node data with the ID phyloXML.date.minimum |
date.maximum | Hidden node data with the ID phyloXML.date.maximum |
reference | This element is not imported. |
property | This element is not imported. |
* Note that the values for width and color are applied to all subelements unless they define their own values for these fields.
After having imported a phyloXML document you can of course move, copy or delete the imported node/branch data columns (e.g. by making a hidden node data column a text label column) using the Copying node/branch data or Calculating node/branch data functions amongst others.
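The fallback order for the node name described in the table (name, then taxonomy.scientific_name, then taxonomy.common_name, then sequence.name) can be sketched as follows. This is an illustration written with Python's standard xml.etree.ElementTree module, not the import code of TreeGraph 2. The function name node_name is hypothetical, and the sketch assumes that the document declares the PhyloXML namespace http://www.phyloxml.org.

```python
import xml.etree.ElementTree as ET

PHYLOXML_NS = {"phy": "http://www.phyloxml.org"}

# Candidate sources for the node name in order of priority.
NAME_SOURCES = [
    "phy:name",
    "phy:taxonomy/phy:scientific_name",
    "phy:taxonomy/phy:common_name",
    "phy:sequence/phy:name",
]


def node_name(clade):
    """Return the node name for a clade element or None if no source exists."""
    for path in NAME_SOURCES:
        element = clade.find(path, PHYLOXML_NS)
        if element is not None and element.text and element.text.strip():
            return element.text.strip()
    return None


# Usage with a minimal example document:
EXAMPLE = """<phyloxml xmlns="http://www.phyloxml.org"><phylogeny><clade>
  <clade><name>A</name>
    <taxonomy><scientific_name>X</scientific_name></taxonomy></clade>
  <clade><taxonomy><common_name>dog</common_name></taxonomy>
    <sequence><name>s</name></sequence></clade>
  <clade/>
</clade></phylogeny></phyloxml>"""

root_clade = ET.fromstring(EXAMPLE).find("phy:phylogeny/phy:clade", PHYLOXML_NS)
example_names = [node_name(c) for c in root_clade.findall("phy:clade", PHYLOXML_NS)]
```

In this example the first child clade is named by its name element, the second by its common name (because neither name nor scientific_name is present), and the third has no name at all.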
See also
- Adding support values
- Newick syntax errors
- XTG format
- Importing node/branch data
- Exporting trees as Newick/Nexus files
- File menu