Help - Calculating node/branch data

The Calculate node/branch data... function allows you to specify a mathematical expression which defines values for all lines of a node/branch data column. In this expression you can reference the current values of the specified column as well as the values of all other columns. Additionally, a set of predefined functions and constants (see below) can be used.

Note that this feature was not available before version 2.0.24. The Edit node/branch data-dialog which was present before this version offered a set of rescaling operations instead. It is recommended to use the latest version of TreeGraph 2.

This article describes this feature for the latest version of TreeGraph 2. For older versions the following articles are available:
Calculating node/branch data (until 2.3.0), Calculating node/branch data (until 2.12.0)


The dialog

The "Calculate node/branch data"-dialog

To access the Calculate node/branch data-dialog select Edit → Node/branch data → Calculate node/branch data... from the main menu.

Target column

Here you can select the node/branch data column that shall be calculated. There are two alternative options available:

Define a single target column

This option allows you to select an existing or new column that will be the target for all calculated values. The column which was selected in the document window is preselected here if it is editable.

Calculate target column ID for each line

Instead of selecting a single target column, this option allows you to calculate the target column using an expression. This way it is possible to select different target columns for different lines, e.g. depending on the values of other columns. (See below for an example application.)

In addition to the expression to calculate the target column, you can select the type of new column(s) to be generated. The result of the expression specified here might be an ID that does not yet exist on the current node. In such a case, the radio buttons here allow you to specify whether a new text label, a new hidden node data or a new hidden branch data shall be created. (Note that the type of already existing values that are overwritten will remain unchanged, regardless of the selection made here.)

Calculate value

You can type the expression used to calculate the value for each target cell into the text field. Details on what such expressions look like can be found below.

The following additional options are available:

Clear target column(s) before calculation

If this option is checked, all current values in columns that are calculated are deleted before the calculation starts. Note that it will not be possible to obtain any value using expressions like getParentValue(THIS) if that option is checked.

This option is meant for cases where the target column is calculated rather than statically defined, i.e. the option Calculate target column ID for each line is chosen. (With a single static target column, each line of the column is overwritten anyway.) In such cases, every column in which the value of at least one line is calculated is cleared completely before the calculation. (Note that a value can be calculated for only one column per line.)

Set value for cells that are empty after the calculation

After a calculation has been performed, some cells of the target column(s) may still be empty. With a single static target column, this can happen because the expression could not be evaluated in some lines where source data from other columns is missing. With calculated target column IDs, this always happens, since only one column per line is calculated. The value defined here is assigned to all empty cells of all affected columns after the calculation.

Note that values previously present in cells that are not calculated are only overwritten with the value specified here if the previous option to clear target columns is also selected.

Expressions

Accessing single values of node/branch data columns

TreeGraph 2 offers two functions to directly access other node/branch data values:

  • getValue: Returns the value of a node/branch data column in the same line as the cell that is currently calculated.
  • getParentValue: Returns a value in the specified column attached to the parent node of the current node (associated with the line that is currently calculated).

Both functions take one or two parameters. The first parameter always specifies the column to be referenced, either by its ID (if it is a hidden node/branch data or a label column) or by a special keyword (see below) that identifies the unique node name, node name or branch length column.

The second parameter is optional and defines a default value that is returned if the referenced column does not contain a value in the current line. If no default value is specified and the referenced column contains no value, the calculation for this line is aborted and the value stored in the target column will be empty.

Examples

Expression: Description:
getValue(NAME) Returns the node name in the current line. (Could be a numeric or a textual value.)
getValue(LENGTH) Returns the branch length in the current line.
getValue(LENGTH, 0) Returns the branch length in the current line or 0 if no branch length is specified.
getValue("id1") Returns the value of the column with the ID "id1" in the current line. (This could be a hidden node data-, a hidden branch data or a label column.)
getParentValue(THIS, 0) + 1 Adds one to the value in the same column attached to the parent node.
getParentValue("brLensSum", 0) + getValue(LENGTH) This expression calculates the overall branch length from the root to the current node saved on a column with the ID "brLensSum".
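The default-value behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch and not the way TreeGraph 2 is implemented: the Node class, the Abort exception and the calculate helper are hypothetical names, and each line is assumed to be a dictionary that maps column IDs to values.

```python
from typing import Any, Callable, Optional

_MISSING = object()  # marks "no default value was passed"


class Abort(Exception):
    """Raised when a referenced cell is empty and no default was given."""


class Node:
    """Hypothetical stand-in for one tree node (one line of the table)."""

    def __init__(self, data: dict, parent: Optional["Node"] = None):
        self.data = data
        self.parent = parent


def get_value(node: Node, column: str, default: Any = _MISSING) -> Any:
    # Models getValue(column) and getValue(column, default).
    if column in node.data:
        return node.data[column]
    if default is _MISSING:
        raise Abort(column)
    return default


def get_parent_value(node: Node, column: str, default: Any = _MISSING) -> Any:
    # Models getParentValue(column) and getParentValue(column, default).
    if node.parent is not None and column in node.parent.data:
        return node.parent.data[column]
    if default is _MISSING:
        raise Abort(column)
    return default


def calculate(node: Node, expression: Callable[[Node], Any]) -> Any:
    # An aborted calculation leaves the target cell empty (None here).
    try:
        return expression(node)
    except Abort:
        return None


root = Node({"LENGTH": 0.5})
child = Node({}, parent=root)  # this line has no branch length

with_default = calculate(child, lambda n: get_value(n, "LENGTH", 0))
without_default = calculate(child, lambda n: get_value(n, "LENGTH"))
parent_length = calculate(child, lambda n: get_parent_value(n, "LENGTH"))
root_parent = calculate(root, lambda n: get_parent_value(n, "LENGTH", 0))
```

In this model, with_default is 0, without_default stays empty, and the root falls back to the default because it has no parent.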

Checking node/branch data

The following two functions can be used to check whether a node/branch data column contains a value in the current line or its parent.

  • hasValue
  • hasParentValue

Each function takes one parameter that specifies the column to be checked (analogous to getValue). These functions can be helpful in combination with the if function (see below) in an expression like this:

if(hasValue("id1") || hasValue("id2"), getValue(NAME), "undefined")

Note that the functions getParentValue and hasParentValue were not available before version 2.4.0.
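To illustrate what the expression above evaluates to, the following Python sketch applies the same logic to three made-up lines. It is a model under stated assumptions: lines are plain dictionaries, an absent key stands for an empty cell, every line is assumed to have a node name, and the helper names has_value and name_if_any_value are hypothetical.

```python
def has_value(line: dict, column: str) -> bool:
    # Assumption: a missing key or None represents an empty cell.
    return line.get(column) is not None


def name_if_any_value(line: dict) -> str:
    # Mirrors if(hasValue("id1") || hasValue("id2"), getValue(NAME), "undefined").
    if has_value(line, "id1") or has_value(line, "id2"):
        return line["NAME"]
    return "undefined"


# Made-up example lines; only the first two carry a value in id1 or id2.
lines = [
    {"NAME": "A", "id1": 97},
    {"NAME": "B", "id2": 0.8},
    {"NAME": "C"},
]

results = [name_if_any_value(line) for line in lines]
```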

Special keywords

The following special keywords can be used with the getValue or hasValue functions.

  • THIS: References the column which shall be calculated by this expression. (E.g. getValue(THIS) * 2 would set the new value of this column as two times the old value.)
  • UNIQUE: References the unique node name column.
  • NAME: References the node name column.
  • LENGTH: References the branch length column.

Operators

The following list contains all valid operators. Each operator binds at least as strongly as all operators listed below it. The columns on the right indicate which operand types each operator accepts.

Name: Operator: Numeric: Textual: Boolean: Example:
Power ^ Yes No No 10^-2 (= 0.01)
Unary plus + Yes No No +4
Unary minus - Yes No No -4
Modulo % Yes No No 10 % 4 (= 2)
Division / Yes No No getValue(LENGTH) / 2
Multiplication * Yes No No getValue("id1") * 100
Addition + Yes Yes No getValue("id1") + 10 or "Two " + "words"
Less or equal <= Yes No No if(getValue(THIS) <= 100, getValue(THIS), 100)
Greater or equal >= Yes No No if(getValue(THIS) >= 100, "A", "B")
Less than < Yes No No if(getValue(THIS) < 100, getValue(THIS), 100)
Greater than > Yes No No if(getValue(THIS) > 100, "A", "B")
Equals = Yes Yes No if(getValue(THIS) = 100, getValue("id"), getValue("id2"))
Not equal != Yes Yes No if(getValue(THIS) != "empty", getValue("id1"), 0)
And && No No Yes if(getValue(THIS) < 100 && getValue(THIS) > 50, getValue("id1"), "")
Or || No No Yes if(hasValue("id1") || hasValue("id2"), getValue(NAME), "undefined")

Mathematical functions referencing other cells

Each of these functions calculates a value from a set of other values and has three forms:

  • Standard form: Takes a list of any numeric values and calculates the result from these. (Example: sum(10, 5, getValue("colA"), 12)).
  • Line form: Uses all lines (values from all nodes) of one node/branch data column to calculate the result. (Example: sumOfLines("colA") returns the sum of all values in colA.)
    • If any line of that column does not contain a numeric value, it is ignored.
  • Column form: Uses the values of different columns attached to the currently calculated node to determine the result. It takes a list of column IDs as parameters. (Example: sumOfColumns("colA", "colB", "colC") returns the sum of the values in the three specified columns in the current line.)
    • If any of the specified columns contains no numeric value, it is ignored in the calculation. (That is the advantage of using the column form instead of sum(getValue("colA"), getValue("colB"), getValue("colC")), which would fail if any of the three columns contains no or a non-numeric value.)
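The difference between the three forms can be sketched in Python using sum as the example. This is an illustrative model only: the table is assumed to be a list of dictionaries (one per line), and the function names sum_standard, sum_of_lines and sum_of_columns are hypothetical counterparts of sum, sumOfLines and sumOfColumns.

```python
from numbers import Number


def is_numeric(value) -> bool:
    # bool is excluded so that True/False are not counted as 1/0.
    return isinstance(value, Number) and not isinstance(value, bool)


def sum_standard(*values):
    # Standard form: every argument must already be numeric.
    return sum(values)


def sum_of_lines(table, column):
    # Line form: one column across all lines; non-numeric cells are ignored.
    return sum(line[column] for line in table if is_numeric(line.get(column)))


def sum_of_columns(line, *columns):
    # Column form: several columns in one line; non-numeric cells are ignored.
    return sum(line[c] for c in columns if is_numeric(line.get(c)))


# Made-up data: the second line has text in colA, the third lacks colB and colC.
table = [
    {"colA": 1.0, "colB": 2.0, "colC": 3.0},
    {"colA": "n/a", "colB": 4.0},
    {"colA": 5.0},
]

standard = sum_standard(10, 5, table[0]["colA"], 12)
all_lines = sum_of_lines(table, "colA")
second_line = sum_of_columns(table[1], "colA", "colB", "colC")
```

Here the line and column forms simply skip the textual and empty cells, whereas the standard form would fail if it were handed the textual value from the second line.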


Name: Definition: Description:
Sum sum(Numeric v1, Numeric v2, ...) Returns the sum of all passed parameters. Two or more parameters can be passed.
sumOfLines(ColumnReference r1) Returns the sum of all lines of the specified node/branch data column. Exactly one column reference (column ID or special keyword) must be passed.
sumOfColumns(ColumnReference r1, ColumnReference r2, ...) Returns the sum of the values of all specified node/branch data columns attached to the currently calculated node. One or more column references (column ID or special keyword) can be passed.
Product product(Numeric v1, Numeric v2, ...) Returns the product of all passed parameters. Two or more parameters can be passed.
productOfLines(ColumnReference r1) Returns the product of all lines of the specified node/branch data column. Exactly one column reference (column ID or special keyword) must be passed.
productOfColumns(ColumnReference r1, ColumnReference r2, ...) Returns the product of the values of all specified node/branch data columns attached to the currently calculated node. One or more column references (column ID or special keyword) can be passed.
Maximum value max(Numeric v1, Numeric v2, ...) Returns the maximum of all passed parameters. Two or more parameters can be passed.
maxOfLines(ColumnReference r1) Returns the maximum of all lines of the specified node/branch data column. Exactly one column reference (column ID or special keyword) must be passed.
maxOfColumns(ColumnReference r1, ColumnReference r2, ...) Returns the maximum of the values of all specified node/branch data columns attached to the currently calculated node. One or more column references (column ID or special keyword) can be passed.
Minimum value min(Numeric v1, Numeric v2, ...) Returns the minimum of all passed parameters. Two or more parameters can be passed.
minOfLines(ColumnReference r1) Returns the minimum of all lines of the specified node/branch data column. Exactly one column reference (column ID or special keyword) must be passed.
minOfColumns(ColumnReference r1, ColumnReference r2, ...) Returns the minimum of the values of all specified node/branch data columns attached to the currently calculated node. One or more column references (column ID or special keyword) can be passed.
Mean value mean(Numeric v1, Numeric v2, ...) Returns the mean of all passed parameters. Two or more parameters can be passed.
meanOfLines(ColumnReference r1) Returns the mean of all lines of the specified node/branch data column. Exactly one column reference (column ID or special keyword) must be passed.
meanOfColumns(ColumnReference r1, ColumnReference r2, ...) Returns the mean of the values of all specified node/branch data columns attached to the currently calculated node. One or more column references (column ID or special keyword) can be passed.

Note that none of these functions was available before version 2.0.47. mean and all line and column forms were not available before version 2.4.0.

Functions related to the topological position

The following functions can be used to check topological properties of the current node. They can be combined with the if function, e.g. to use different expressions for terminal nodes, internal nodes or the root.

  • isLeaf(): Returns 1 if the current node is a terminal node or 0 otherwise.
  • isRoot(): Returns 1 if the current node is the root of the tree or 0 otherwise.
  • indexInParent(): Returns the index the current node has in its parent. Note that the first node has the index 0 (not 1). If this function is called on the root node, it will return -1.
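The return values listed above can be reproduced with a small Python model of a tree. The Node class and the function names below are hypothetical and only mirror the documented behaviour of isLeaf(), isRoot() and indexInParent(); they are not taken from TreeGraph 2.

```python
class Node:
    """Hypothetical minimal tree node with an ordered list of children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def is_leaf(node):
    # 1 for terminal nodes, 0 otherwise.
    return 1 if not node.children else 0


def is_root(node):
    # 1 for the root of the tree, 0 otherwise.
    return 1 if node.parent is None else 0


def index_in_parent(node):
    # Zero-based position among the siblings, -1 for the root.
    if node.parent is None:
        return -1
    return node.parent.children.index(node)


root = Node("root")
first = root.add(Node("first"))
second = root.add(Node("second"))
tip = second.add(Node("tip"))
```

For this tree, index_in_parent(second) is 1 because counting starts at 0, and index_in_parent(root) is -1.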

Note that none of these functions were available before version 2.4.0.

Text functions

The functions described here can be used to manipulate or characterize textual values. They are especially useful for calculating target column IDs.

Note that none of these functions were available before version 2.13.0.

Name: Definition: Description:
To upper case toUpperCase(Textual text) Returns a text in upper case letters.

Example: toUpperCase(getValue(THIS)) would convert all textual values of a column to upper case.
To lower case toLowerCase(Textual text) Returns a text in lower case letters.
Subsequence subsequence(Textual text, Numeric startIndex) Returns a suffix of a textual value starting at the specified position (inclusive). The index of the first character is 0.

Example: subsequence("ABC", 1) would return "BC".
subsequence(Textual text, Numeric startIndex, Numeric endIndex) Returns a part of a textual value ranging from startIndex (inclusive) to endIndex (exclusive). The index of the first character is 0.

Example: subsequence("ABCDEF", 1, 3) would return "BC".
Contains text contains(Textual text, Textual part) Returns a Boolean value indicating whether part is contained in text.
Starts with startsWith(Textual text, Textual prefix) Returns a Boolean value indicating whether text starts with prefix. (Both values may also be identical.)
Ends with endsWith(Textual text, Textual suffix) Returns a Boolean value indicating whether text ends with suffix. (Both values may also be identical.)
First position of a text firstIndexOf(Textual text, Textual part) Returns the index of the first occurrence of part in text or -1 if it is not contained at all. The index of the first character is 0.

Example: firstIndexOf("ABC DBE DB", "BE") would return 5.
firstIndexOf(Textual text, Textual part, Numeric startIndex) Returns the index of the first occurrence of part in text at or after startIndex or -1 if it is not contained at all.

Example: firstIndexOf("ABC DBE DB", "B", 4) would return 5.
Last position of a text lastIndexOf(Textual text, Textual part) Returns the index of the last occurrence of part in text or -1 if it is not contained at all. The index of the first character is 0.

Example: lastIndexOf("ABC DBE DB", "D") would return 8.
lastIndexOf(Textual text, Textual part, Numeric startIndex) Returns the index of the last occurrence of part in text at or before startIndex or -1 if it is not contained at all.

Example: lastIndexOf("ABC DBE DB", "D", 6) would return 4.
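The index conventions used by these functions (zero-based positions, inclusive start, exclusive end) can be checked with Python equivalents. The function names below are hypothetical Python counterparts of subsequence, firstIndexOf and lastIndexOf, written only to reproduce the examples from the table.

```python
def subsequence(text, start, end=None):
    # start is inclusive, end is exclusive; without end the suffix is returned.
    return text[start:] if end is None else text[start:end]


def first_index_of(text, part, start=0):
    # First occurrence at or after start, or -1 if there is none.
    return text.find(part, start)


def last_index_of(text, part, start=None):
    # Last occurrence beginning at or before start, or -1 if there is none.
    if start is None:
        return text.rfind(part)
    # The end bound of rfind is exclusive, so it is extended by the length
    # of part to allow a match that begins exactly at start.
    return text.rfind(part, 0, start + len(part))


suffix = subsequence("ABC", 1)
middle = subsequence("ABCDEF", 1, 3)
first_be = first_index_of("ABC DBE DB", "BE")
first_b_from_4 = first_index_of("ABC DBE DB", "B", 4)
last_d = last_index_of("ABC DBE DB", "D")
last_d_up_to_6 = last_index_of("ABC DBE DB", "D", 6)
```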

General mathematical functions

Name: Definition: Description:
Sine sin(Numeric x) Calculates the sine of the passed numeric value.
Cosine cos(Numeric x) Calculates the cosine of the passed numeric value.
Tangent tan(Numeric x) Calculates the tangent of the passed numeric value.
Inverse sine asin(Numeric x) Calculates the inverse sine of the passed numeric value.
Inverse cosine acos(Numeric x) Calculates the inverse cosine of the passed numeric value.
Inverse tangent atan(Numeric x) Calculates the inverse tangent of the passed numeric value.
Inverse tangent atan2(Numeric x, Numeric y) Calculates the inverse tangent (arctangent) of y / x from the given values y and x. The result is between -π and π.
Hyperbolic sine sinh(Numeric x) Calculates the hyperbolic sine of the passed numeric value.
Hyperbolic cosine cosh(Numeric x) Calculates the hyperbolic cosine of the passed numeric value.
Hyperbolic tangent tanh(Numeric x) Calculates the hyperbolic tangent of the passed numeric value.
Inverse hyperbolic sine asinh(Numeric x) Calculates the inverse hyperbolic sine of the passed numeric value.
Inverse hyperbolic cosine acosh(Numeric x) Calculates the inverse hyperbolic cosine of the passed numeric value.
Inverse hyperbolic tangent atanh(Numeric x) Calculates the inverse hyperbolic tangent of the passed numeric value.
Natural logarithm ln(Numeric x) Calculates the natural logarithm of the passed numeric value.
Logarithm to the base 10 log(Numeric x) Calculates the logarithm to the base 10 of the passed numeric value.
Exponential function exp(Numeric x) Calculates the exponential function to the base e. It is the inverse function of the natural logarithm (ln).
Magnitude abs(Numeric x) Returns the absolute value of the passed numeric parameter.
Random number rand(Numeric x) Returns a random number between 0 and 1.
Square Root sqrt(Numeric x) Calculates the square root of the passed numeric parameter.
Alternative if(Boolean condition, Numeric or textual trueValue, Numeric or textual falseValue) Returns the second parameter if the condition is true or the third if not.
Number to text str(Numeric or textual x) Returns a textual representation of the passed parameter.

Constants

  • Euler's number: e
  • π: pi

Usage examples

Calculating node ages

This example shows how to use the getParentValue() function to sum up values along the tree to calculate the age of all nodes from the lengths of the branches leading to them.


CalcColumnExampleNodeAges1.png


The tree in the screenshot above contains branch lengths that represent the time that passed during the evolution along that branch in million years. The aim here is to attach a label to each node that displays its age in million years.

To achieve this we sum up the branch length from the root to each node and store the results in a hidden node data column with the expression shown below.


CalcColumnExampleNodeAges2.png


It is important here to specify the default value 0 in getParentValue(THIS, 0), because the root node is calculated first and the parent value is undefined there. If we omitted the default value, no result would be calculated for the root node. As a consequence, the parent value of its child nodes would also be undefined, and our expression would not calculate any values at all.


CalcColumnExampleNodeAges3.png


In the screenshot above we can see the results of our first calculation. The selected node Zea now has an age (measured from the root) of about 526 million years, which is the case for all terminal nodes. Our aim, however, was to attach information on how long ago the speciation events at the internal nodes took place. Therefore we calculate a second node/branch data column consisting of text labels using the expression below.


CalcColumnExampleNodeAges4.png


To convert the age measured from the root into the age measured from the tips, we first determine the whole time span covered by our tree by calculating the maximum node age using maxOfLines("ageFromRoot"). The age measured from the tips is then obtained by subtracting each node's age from the root from the overall time span, as shown in the expression above.

The resulting values are then displayed as text labels left of each node as shown in the tree below:


CalcColumnExampleNodeAges5.png

[Download resulting tree file]
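The two calculation steps of this example can be retraced with a short Python sketch. It is a model under stated assumptions, not TreeGraph's implementation: the Node class, the tiny tree with made-up branch lengths and the column name "age" are hypothetical, and the expressions quoted in the comments are reconstructed from the description above. Nodes are visited parent first, matching the statement that the root node is calculated first.

```python
class Node:
    """Hypothetical tree node with a branch length and a data dictionary."""

    def __init__(self, name, length, children=()):
        self.name = name
        self.length = length
        self.children = list(children)
        self.data = {}


def preorder(node, parent=None):
    # Yields (node, parent) pairs so that every parent precedes its children.
    yield node, parent
    for child in node.children:
        yield from preorder(child, node)


# Made-up ultrametric tree; branch lengths in million years.
tree = Node("root", 0.0, [
    Node("A", 30.0),
    Node("inner", 10.0, [Node("B", 20.0), Node("C", 20.0)]),
])

# Step 1, models getParentValue(THIS, 0) + getValue(LENGTH):
# the default 0 is used for the root, which has no parent.
for node, parent in preorder(tree):
    parent_age = parent.data["ageFromRoot"] if parent is not None else 0
    node.data["ageFromRoot"] = parent_age + node.length

# Step 2, models maxOfLines("ageFromRoot") - getValue("ageFromRoot").
time_span = max(n.data["ageFromRoot"] for n, _ in preorder(tree))
for node, _ in preorder(tree):
    node.data["age"] = time_span - node.data["ageFromRoot"]

ages = {n.name: n.data["age"] for n, _ in preorder(tree)}
```

In this made-up tree all terminal nodes end up with an age of 0, the internal node with 20 and the root with 30.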

Converting ancestral character states into probability columns

This example demonstrates a possible application of calculating the target column ID by an expression. A tree contains a hidden node data column with ancestral character states with the name "states" as shown in the screenshot below.


CalcColumnExampleTerminalStates1.png


The aim is to create one probability column for each state from the single existing column, which contains the state name in each line. Each of these probability columns shall contain the character state name in its heading (its node/branch data ID) and contain a probability of 1 in all lines where its state is present and 0 in all other lines. To achieve this, we calculate the ID of the target column from the value of "states" in each line as shown below. In addition, we set all other cells to 0.


CalcColumnExampleTerminalStates2.png


The screenshot below shows the result of the operation. The new hidden node data columns "prob_a" and "prob_b" could now act as source data columns for pie chart labels. (The probabilities for the states on the internal nodes could now be reconstructed in addition. TreeGraph 2 offers additional features for exchanging such data with BayesTraits. See Generating Bayes Traits input and Importing ancestral state probabilities from BayesTraits for details.)


CalcColumnExampleTerminalStates3.png
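The effect of this operation can be modelled in Python as follows. This is a sketch under assumptions: lines are plain dictionaries, the target column ID is assumed to be built by prepending "prob_" to the state name (inferred from the resulting column names "prob_a" and "prob_b"), and the last line without a state is made up to show how empty cells are filled.

```python
# Made-up lines; the last one has no entry in the "states" column.
lines = [{"states": "a"}, {"states": "b"}, {"states": "a"}, {}]

# Calculation: one target column per line, with the value 1.
affected_columns = set()
for line in lines:
    state = line.get("states")
    if state is None:
        continue  # no source value, so nothing is calculated for this line
    target_id = "prob_" + state  # calculated target column ID
    line[target_id] = 1
    affected_columns.add(target_id)

# Afterwards: all cells of the affected columns that are still empty get 0.
for line in lines:
    for column in affected_columns:
        line.setdefault(column, 0)

prob_a = [line["prob_a"] for line in lines]
prob_b = [line["prob_b"] for line in lines]
```

Each line receives a 1 in exactly one of the new columns, and the fill value 0 completes the remaining cells, including those of the line without a state.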
