Help - Open files

From TreeGraph help

The "Open..." option in the File menu displays a dialog that can be used either to open TreeGraph 2 tree files (*.xtg) or to import trees in other supported formats (e.g. Nexus). Any file in a supported format can be selected here and opened in the TreeGraph 2 editor.

This article describes this feature for the latest version of TreeGraph 2. For older versions the following articles are available:
Open files (until 2.0.34)


Supported formats

TreeGraph 2 format

The TreeGraph 2 format with the file extension XTG (Extensible TreeGraph format) is the default tree format of TreeGraph 2. It can store an unlimited number of additional node/branch data columns as well as the TreeGraph 2-specific formats of the tree elements. Every XTG-file contains exactly one tree.

Because XTG is an XML format, it can easily be used by developers of other applications if necessary. (A formal definition of XTG can be found here.)

XTG before version 2.0.40

Note that TreeGraph versions 2.0.41 and older did not declare the XML namespace (http://bioinfweb.info/xmlns/xtg) in the XTG documents they generated, since it had not been formally defined at that time. Later versions declare it, but files with a declared namespace cannot be opened with older versions (they simply show an empty document after loading). Using the latest version of TreeGraph 2 is recommended. If you nevertheless need to open documents created with a later version in version 2.0.41 or older, you have to remove the namespace declarations from the <TreeGraphDocument> tag manually (e.g. with a text editor).
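Removing the declarations can also be scripted. The following Python sketch is not part of TreeGraph: the function name strip_xtg_namespace is hypothetical, and it assumes that attribute values are enclosed in double quotes. Besides xmlns declarations it also removes any xsi:* attributes of the root tag (such as a schema location, if present), because these would otherwise refer to a prefix that is no longer declared. Only the opening <TreeGraphDocument> tag is changed.

```python
import re

# The opening <TreeGraphDocument ...> tag; its attributes may span several lines.
_ROOT_TAG = re.compile(r"<TreeGraphDocument\b[^>]*>")
# Namespace declarations (xmlns, xmlns:prefix) and xsi:* attributes,
# assuming double-quoted attribute values.
_NS_ATTRIBUTE = re.compile(r'\s+(?:xmlns(?::[\w.-]+)?|xsi:[\w.-]+)\s*=\s*"[^"]*"')


def strip_xtg_namespace(xtg_text):
    """Return xtg_text with namespace-related attributes removed from the root tag only."""
    return _ROOT_TAG.sub(lambda m: _NS_ATTRIBUTE.sub("", m.group(0)), xtg_text, count=1)
```

For example, read the XTG-file as text, pass it to strip_xtg_namespace and write the result to a copy, which can then be opened with the older version. The original file should be kept, since later versions expect the namespace.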

Newick format

Newick files are simply text files that consist of one or more tree descriptions in the Newick notation. Unlike Nexus files, they contain no syntax elements or information other than the trees.
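Since each tree description in a Newick file is terminated by a semicolon, a program reading such a file first has to split it into single trees. The following Python sketch illustrates this; it is not TreeGraph's own parser, and the function name split_newick_trees is hypothetical. It assumes single-quoted labels (with '' as an escaped quote) and [...] comments, inside which semicolons are ignored.

```python
def split_newick_trees(text):
    """Return the tree descriptions of a Newick file, each ending with ';'."""
    trees, start = [], 0
    in_quote = False  # inside a single-quoted label
    depth = 0         # nesting depth of [...] comments
    for i, ch in enumerate(text):
        if in_quote:
            if ch == "'":  # an escaped '' closes and immediately reopens the label
                in_quote = False
        elif depth > 0:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
        elif ch == "'":
            in_quote = True
        elif ch == "[":
            depth = 1
        elif ch == ";":  # end of a tree description outside labels and comments
            trees.append(text[start:i + 1].strip())
            start = i + 1
    return trees
```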

Nexus format

The Nexus format is widely used in phylogenetics and can contain not only trees in Newick notation but also information about taxa and phylogenetic data sets such as sequence alignments. Several common programs such as PAUP*, Mesquite and MacClade generate trees in this format. A Nexus file usually consists of several blocks containing different types of information, of which the trees-block is the only one relevant for TreeGraph 2. Just like Newick files, Nexus files can contain several trees. (Note that the phylogenetic Nexus format supported by TreeGraph 2 has nothing to do with the NeXus format used in particle physics.)
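To illustrate which part of a Nexus file matters here, the following Python sketch collects the TREE commands of all trees-blocks and skips every other block. It is a simplification and not TreeGraph's parser: the name read_nexus_trees is hypothetical, it assumes that no semicolons occur inside comments or quoted labels, and it ignores TRANSLATE tables.

```python
import re

# A trees-block up to its END; (or ENDBLOCK;) command, case-insensitive.
_TREES_BLOCK = re.compile(r"\bbegin\s+trees\s*;(.*?)\bend(?:block)?\s*;",
                          re.IGNORECASE | re.DOTALL)
# A TREE command: optional '*' (default tree marker), tree name, '=', Newick string.
_TREE_COMMAND = re.compile(r"\btree\s+\*?\s*([^=\s]+)\s*=\s*([^;]*;)", re.IGNORECASE)


def read_nexus_trees(text):
    """Return (name, Newick string) pairs from all trees-blocks, ignoring other blocks."""
    trees = []
    for block in _TREES_BLOCK.finditer(text):
        for name, newick in _TREE_COMMAND.findall(block.group(1)):
            trees.append((name, newick.strip()))
    return trees
```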

Nexus format with additional annotations

TreeGraph 2 can also read Nexus files containing special node annotations stored as hot comments, as generated by BEAST or MrBayes. Since version 2.3.0 these are loaded into hidden branch data columns (before, they were imported as hidden node data) and can, e.g., be visualized as text labels using the Copying node/branch data function. To perform further calculations based on the imported values, the Calculating node/branch data function can be used.

The Nexus parser of TreeGraph reads hot comments after node names and, since version 2.3.0, also after branch lengths. Hot comments can have the following forms:

Comment type | Numerical example | Character example
Unnamed | (A, (B, C[98.4])); | (A, (B, C[text]));
Named | (A, (B, C[&prob=98.4])); | (A, (B, C[&info=text]));
Multiple named | (A, (B, C[&prob=98.4,otherValue=18])); | (A, (B, C[&info=text,otherText=abc]));
Named array | (A, (B, C[&prob_range={97.8,98.6}])); | (A, (B, C[&info={text1,text2}]));
Forced parsing as textual value | n/a | (A, (B, C[&info="98.40"]));

Unnamed comments are imported with the node/branch data ID unnamedNodeHotComment or unnamedBranchHotComment, depending on their position in the Newick string (after the node name or after the branch length). Their content is always treated as a single value, i.e. a comment like (A, (B, C[text1,text2])); would not be imported as two columns with the values text1 and text2, but as a single column with the value text1,text2. Multiple values should therefore always be coded as named hot comments.
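The following Python sketch shows how the position of a hot comment can decide between the node and the branch case: a comment that follows a branch length (introduced by ':') belongs to the branch, otherwise to the node. It is not TreeGraph's implementation; the name locate_hot_comments is hypothetical, and it assumes that labels contain no brackets or quotes and that comments are not nested.

```python
def locate_hot_comments(newick):
    """Return (comment content, "node" or "branch") pairs in order of appearance."""
    found = []
    in_branch_length = False  # True after ':' until the next structural character
    i = 0
    while i < len(newick):
        ch = newick[i]
        if ch == "[":
            end = newick.index("]", i)
            found.append((newick[i + 1:end], "branch" if in_branch_length else "node"))
            i = end + 1
            continue
        if ch == ":":
            in_branch_length = True
        elif ch in "(),;":
            in_branch_length = False
        i += 1
    return found
```

For example, in (A:0.1,B:0.2[&rate=1.5])C[&prob=0.9]; the first comment belongs to the branch leading to B and the second to node C.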

In named hot comments any combination of textual, numerical and array values is possible. Numerical values must always use a dot as the decimal separator and must not contain thousands separators, because the comma is reserved for separating different values. Scientific notation such as 9.84E+1 is supported.

If a single value is enclosed in quotation marks, it is always parsed as a textual value, even if it could be parsed as a number. Quotation marks can be used in both named and unnamed comments.
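The parsing rules above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not TreeGraph's parser: parse_hot_comment and its helpers are hypothetical names, numbers are returned as floats, and unnamed content is kept as one value (quoted or numerical) without splitting it into an array.

```python
import re

# Decimal number with a dot as decimal separator, optionally in scientific notation.
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def _split_top_level(text):
    """Split at commas that are not inside {...} or double quotes."""
    parts, current, depth, in_quotes = [], [], 0, False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "{":
            depth += 1
        elif not in_quotes and ch == "}":
            depth -= 1
        if ch == "," and depth == 0 and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def _parse_value(raw, arrays=True):
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]                   # quoted: always textual
    if arrays and raw.startswith("{") and raw.endswith("}"):
        return [_parse_value(v) for v in _split_top_level(raw[1:-1])]
    if _NUMBER.fullmatch(raw):
        return float(raw)                  # e.g. 98.4 or 9.84E+1
    return raw                             # any other text


def parse_hot_comment(content, position="node"):
    """Parse the text between '[' and ']' into a {column ID: value} dict.

    position is "node" (after a node name) or "branch" (after a branch length).
    """
    if not content.startswith("&"):        # unnamed: the whole content is one value
        column = "unnamedNodeHotComment" if position == "node" else "unnamedBranchHotComment"
        return {column: _parse_value(content, arrays=False)}
    values = {}
    for entry in _split_top_level(content[1:]):
        name, _, value = entry.partition("=")
        values[name.strip()] = _parse_value(value)
    return values
```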

NeXML

NeXML is an XML format modeling alignment and tree data similar to the Nexus format, but in addition offers an RDF-based way to annotate data elements (including tree nodes and branches).

PhyloXML

PhyloXML is an XML format modeling phylogenetic trees, including a set of predefined types of node and branch annotations that can be imported by TreeGraph (see below). Support for reading and writing PhyloXML will be extended in future versions using JPhyloIO.

Branch length scale in imported files

When a Newick, Nexus or phyloXML file is imported, the branch length scale and the small interval of the scale bar are calculated automatically as follows:

  • The branch length scale (distance per branch length unit) is set so that the average length of all branches with a defined length is 4 mm.
  • The small interval of the scale bar is calculated so that it is about 1 mm long: the number of branch length units corresponding to 1 mm is rounded to its first significant digit. If, e.g., 0.367 branch length units make up one millimeter, the small interval is set to 0.4 branch length units. (This feature was not available before version 4.0.41.)
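Both rules can be expressed as a short calculation. The following Python sketch is an illustration only: the function name scale_bar_settings is hypothetical, undefined branch lengths are represented as None, and at least one positive branch length is assumed.

```python
import math


def scale_bar_settings(branch_lengths, mean_length_mm=4.0):
    """Return (mm per branch length unit, small scale bar interval in branch length units)."""
    defined = [length for length in branch_lengths if length is not None]
    if not defined or sum(defined) <= 0:
        raise ValueError("at least one positive branch length is required")
    # Scale so that the average defined branch length is drawn as mean_length_mm.
    mm_per_unit = mean_length_mm / (sum(defined) / len(defined))
    units_per_mm = 1.0 / mm_per_unit
    # Round the units per millimeter to the first significant digit, e.g. 0.367 -> 0.4.
    digits = -math.floor(math.log10(units_per_mm))
    return mm_per_unit, round(units_per_mm, digits)
```

For a single branch of length 1.468, one millimeter corresponds to about 0.367 branch length units, so the small interval becomes 0.4, matching the example above.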

Importing Nexus and Newick

If you want to import a Newick- or Nexus-file, you have more options than when opening an XTG-file, because these file types support only two types of node/branch data: node names and branch lengths. The names of terminal nodes are always imported as such into TreeGraph, because they usually describe taxon names. For the internal node names and the branch lengths stored in the Nexus- or Newick-file, you can choose the node/branch data column in which these values should be stored.

Internal node names

Since internal node names are often used to store support values, the open dialog contains a combo box for selecting the type of node/branch data in which the internal node names of the Newick notation should be stored. The following choices are possible:

  • Node names - The internal nodes of the TreeGraph document will contain the imported internal node names.
  • New text labels with the specified ID - For every internal node name of the Newick notation a text label will be created in the TreeGraph document with the ID you specify in the input field to the right of the combo box.
  • New hidden branch data with the specified ID - The internal node names will be stored in hidden branch data fields in the TreeGraph document.
  • New hidden node data with the specified ID - The internal node names will be stored in hidden node data fields in the TreeGraph document.

Translate internal node names

Some Nexus files contain a taxon table which specifies an ID for each taxon of the tree(s). The Newick string then only contains these IDs rather than the full taxon names. In these cases you can specify whether TreeGraph 2 should use this table to translate only the terminal node names or also the internal node names. Some programs use the internal node names to store support values, which might erroneously be translated based on the taxon table if you check this option (e.g. a support value of 99 would be replaced by the 99th taxon in the list).

Note that values in quotes (e.g. "99" instead of 99) will not be translated.
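The effect of this option can be sketched in Python as follows. translate_names and its parameters are hypothetical; the sketch only assumes that the taxon table maps IDs to taxon names. It leaves quoted labels untouched (including their quotes), because what TreeGraph does with the quotes themselves is not covered here.

```python
def translate_names(labels, translation, translate_internal=False):
    """Translate node labels using a Nexus taxon table.

    labels: list of (label, is_terminal) pairs; translation: dict ID -> taxon name.
    Quoted labels (e.g. '99' or "99") are returned unchanged.
    """
    result = []
    for label, is_terminal in labels:
        quoted = len(label) >= 2 and label[0] == label[-1] and label[0] in "'\""
        if not quoted and (is_terminal or translate_internal) and label in translation:
            label = translation[label]
        result.append(label)
    return result
```

With translate_internal=True, an internal support value of 99 is replaced by the taxon with the ID 99, which illustrates why this option should be left unchecked for such files.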

Branch lengths

Below the combo box for the internal node names you will find a second combo box which lets you select how the branch lengths of the imported document are stored. This choice applies to all branch lengths, including the lengths of branches leading to terminal nodes. The following options are available:

  • Branch lengths - The branch length data of the imported file will also be stored as branch lengths in the TreeGraph document.
  • New text labels with the specified ID
  • New hidden branch data with the specified ID
  • New hidden node data with the specified ID

Note that you can also move or copy the contents of node/branch data columns later (see Copying node/branch data for details).

This feature was not available before TreeGraph 2.0.23.

The "Select tree" dialog displays previews of all trees contained in a Newick- or Nexus-file.

Files with several trees

If a Newick- or Nexus-file contains more than one tree you will be asked to select a single tree you want to import. The selection dialog will show you previews of all trees contained in the file you want to import (see example on the right).

Importing NeXML

Since version 2.11.1 TreeGraph 2 is able to import phylogenetic trees including node and branch annotations from NeXML. Metadata attached to nodes is imported into hidden node data columns and metadata attached to branches into hidden branch data columns, using the RDF predicate (full URI) as the respective heading. Currently only literal metas and URLs linked by resource metas are imported; other resource metas and their nested information are ignored. (In future versions of TreeGraph, its metadata model will be extended to fully support RDF-based annotations, allowing more of the metadata available in NeXML to be used and enabling writing to that format.)
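The following Python sketch illustrates this mapping for readers processing NeXML themselves; it is not TreeGraph's importer (which is written in Java). The function name nexml_annotations is hypothetical, and it assumes NeXML 2009 namespaces, literal metas carrying their value in the content attribute, and meta elements attached directly to node or edge elements. CURIE predicates such as dc:description are expanded to full URIs using the namespace declarations of the document.

```python
import io
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/2009}"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def _expand(curie, prefixes):
    """Expand a CURIE such as 'dc:creator' to a full URI if its prefix is declared."""
    prefix, sep, local = curie.partition(":")
    return prefixes[prefix] + local if sep and prefix in prefixes else curie


def nexml_annotations(nexml_bytes):
    """Map ('node' | 'edge', id) to {predicate URI: value} for directly attached metas."""
    # ElementTree drops xmlns declarations, so collect them with start-ns events first.
    prefixes = dict(item for _, item in
                    ET.iterparse(io.BytesIO(nexml_bytes), events=("start-ns",)))
    root = ET.fromstring(nexml_bytes)
    result = {}
    for kind in ("node", "edge"):
        for element in root.iter(NEX + kind):
            columns = {}
            for meta in element.findall(NEX + "meta"):  # direct children only
                meta_type = meta.get(XSI_TYPE, "")
                if meta_type.endswith("LiteralMeta"):
                    predicate, value = meta.get("property"), meta.get("content")
                elif meta_type.endswith("ResourceMeta") and meta.get("href"):
                    predicate, value = meta.get("rel"), meta.get("href")
                else:
                    continue  # other resource metas (nested metadata) are skipped
                columns[_expand(predicate, prefixes)] = value
            result[(kind, element.get("id"))] = columns
    return result
```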

Importing phyloXML

In contrast to the XTG format, PhyloXML predefines several node/branch data fields for special purposes (e.g. taxonomic information or information about a molecular sequence associated with a node). When loading a PhyloXML file, TreeGraph moves the values from that data structure into its own (consisting of node names, labels and hidden node/branch data).

PhyloXML documents can also contain several trees. If so, you will be asked which tree to import, just as when importing Newick or Nexus files (see above).

Translation of the PhyloXML elements

The following table describes how the data stored in a clade element of a phyloXML document is translated into the TreeGraph 2 data structure. (It is assumed that you are familiar with the PhyloXML format. A formal definition of this format can be found here.)

PhyloXML element under clade: TreeGraph 2 data structure:
name Node name and hidden node data with the ID phyloXML.name
branch_length Branch length
confidence Text label (The type attribute defines the label ID. Since PhyloXML only allows numeric confidence values, non-numeric values are not imported. Note that although PhyloXML theoretically allows it, you should not use the same type value more than once on the same node, because TreeGraph 2 considers the value a unique ID.)
width* Branch width and hidden branch data with the ID phyloXML.branch_width
(You can use the hidden branch data field with the Set distance values by node/branch data function if you want to rescale the branch widths or additionally apply them as node line widths.)
color* Branch color and Node line color
node_id This element is not imported.
taxonomy.id This element is not imported.
taxonomy.code Hidden node data with the ID phyloXML.taxonomy.code
taxonomy.scientific_name Hidden node data with the ID phyloXML.taxonomy.scientific_name and node name if the name element was not specified
taxonomy.authority Hidden node data with the ID phyloXML.taxonomy.authority
taxonomy.common_name Hidden node data with the ID phyloXML.taxonomy.common_name and node name if neither the name nor the taxonomy.scientific_name elements were specified
taxonomy.synonym Hidden node data with the ID phyloXML.taxonomy.synonym
taxonomy.rank Hidden node data with the ID phyloXML.taxonomy.rank
taxonomy.uri This element is not imported.
sequence.type (attribute) Hidden node data with the ID phyloXML.sequence.type
sequence.id_source (attribute) This element is not imported.
sequence.id_ref (attribute) This element is not imported.
sequence.symbol Hidden node data with the ID phyloXML.sequence.symbol
sequence.accession Hidden node data with the ID phyloXML.sequence.accession
sequence.name Hidden node data with the ID phyloXML.sequence.name and node name if neither the name nor the taxonomy.scientific_name nor the taxonomy.common_name elements were specified
sequence.location Hidden node data with the ID phyloXML.sequence.location
sequence.mol_seq This element is not imported.
sequence.uri This element is not imported.
sequence.annotation This element is not imported.
sequence.domain_architecture This element is not imported.
events This element is not imported.
binary_characters This element is not imported.
distribution.desc Hidden node data with the ID phyloXML.distribution.desc
distribution.point This element is not imported.
distribution.polygon This element is not imported.
date.desc Hidden node data with the ID phyloXML.date.desc
date.value Hidden node data with the ID phyloXML.date.value
date.minimum Hidden node data with the ID phyloXML.date.minimum
date.maximum Hidden node data with the ID phyloXML.date.maximum
reference This element is not imported.
property This element is not imported.

* Note that the values for width and color are applied to all subelements unless these define their own values for these fields.
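The following Python sketch mirrors a subset of the table above (node name precedence, hidden node data IDs, and the inheritance of width and color) for readers who process phyloXML files themselves. It is not TreeGraph's implementation: import_clades and the dictionary keys it returns are hypothetical, only the first taxonomy and sequence element of a clade are considered, and color elements are assumed to contain red, green and blue children as defined in phyloXML.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.phyloxml.org}"

# Clade child paths whose text becomes hidden node data "phyloXML.<path with dots>".
HIDDEN_NODE_DATA = [
    "name", "taxonomy/code", "taxonomy/scientific_name", "taxonomy/authority",
    "taxonomy/common_name", "taxonomy/synonym", "taxonomy/rank",
    "sequence/symbol", "sequence/accession", "sequence/name", "sequence/location",
    "distribution/desc", "date/desc", "date/value", "date/minimum", "date/maximum",
]
# Sources for the node name, in order of precedence.
NAME_PRECEDENCE = ["name", "taxonomy/scientific_name", "taxonomy/common_name", "sequence/name"]


def _text(element, path):
    """Stripped text of the first element at a slash-separated path of phyloXML names."""
    found = element.find("/".join(NS + step for step in path.split("/")))
    if found is None or found.text is None:
        return None
    return found.text.strip()


def import_clades(clade, width=None, color=None):
    """Yield one dict per clade (pre-order); width and color are inherited by subclades."""
    own_width = _text(clade, "width")
    if own_width is not None:
        width = float(own_width)
    color_element = clade.find(NS + "color")
    if color_element is not None:
        color = tuple(int(_text(color_element, c)) for c in ("red", "green", "blue"))
    hidden = {}
    for path in HIDDEN_NODE_DATA:
        value = _text(clade, path)
        if value is not None:
            hidden["phyloXML." + path.replace("/", ".")] = value
    sequence = clade.find(NS + "sequence")
    if sequence is not None and sequence.get("type"):
        hidden["phyloXML.sequence.type"] = sequence.get("type")
    names = (_text(clade, path) for path in NAME_PRECEDENCE)
    yield {
        "node_name": next((n for n in names if n), None),
        "branch_width": width,
        "color": color,  # used as branch color and node line color
        "hidden_node_data": hidden,
    }
    for child in clade.findall(NS + "clade"):
        yield from import_clades(child, width, color)
```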

After having imported a phyloXML document, you can of course move, copy or delete the imported node/branch data columns (e.g. turn a hidden node data column into a text label column) using the Copying node/branch data or Calculating node/branch data functions, among others.
