Title: | Detect and Remove Outliers in Phylogenomics Datasets |
---|---|
Description: | Analyzis and filtering of phylogenomics datasets. It takes an input either a collection of gene trees (then transformed to matrices) or directly a collection of gene matrices and performs an iterative process to identify what species in what genes are outliers, and whose elimination significantly improves the concordance between the input matrices. The methods builds upon the Distatis approach (Abdi et al. (2005) <doi:10.1101/2021.09.08.459421>), a generalization of classical multidimensional scaling to multiple distance matrices. |
Authors: | Damien M. de Vienne [aut], Stéphane Dray [aut], Théo Tricou [aut], Aurélie Siberchicot [aut, cre] |
Maintainer: | Aurélie Siberchicot <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.9.12 |
Built: | 2024-11-22 05:08:09 UTC |
Source: | https://github.com/damiendevienne/phylter |
This dataset contains 125 carnivora gene trees for 53 carnivora species. It is a subset of the dataset used in Allio et al. (2021), obtained by randomly selecting 125 genes for which all 53 species were present.
carnivora
carnivora
A list of named trees of class multiPhylo
Allio, R., Tilak, M. K., Scornavacca, C., Avenant, N. L., Kitchener, A. C., Corre, E., ... & Delsuc, F. (2021). High-quality carnivoran genomes from roadkill samples enable comparative species delineation in aardwolf and bat-eared fox. Elife, 10, e63167.
data(carnivora) class(carnivora)
data(carnivora) class(carnivora)
Functions to detect outliers, in matrices or in arrays.
detect.outliers(X, k = 3, test.island = TRUE, normalizeby = "row") detect.outliers.array(arr, nbsp, k = 3)
detect.outliers(X, k = 3, test.island = TRUE, normalizeby = "row") detect.outliers.array(arr, nbsp, k = 3)
X |
2D matrix (gene x species) obtained with the |
k |
strength of outlier detection. High values (typically 3) correspond to less outliers detected than with lower ones (e.g. 1.5). |
test.island |
should islands of outliers be treated as such. If TRUE (the default), only the highest value in an island of outliers is removed. This prevents erroneous outliers due to hicthiking effect to be removed. |
normalizeby |
Should the 2D matrix be normalized prior to outlier detection, and how. Can be "row" (the default),"col" or "none". Normalization is done by dividing columns or rows by their median. |
arr |
Array of values, typically the weight of each gene matrix (alpha values). |
nbsp |
Number of species in the analysis |
These functions detect outliers either in matrices or in arrays. For the method to be adapted to skewed data, as is the case here, the outlier detection method used is the adjusted Tukey proposed by Hubert and Vandervieren (2008).
detect.outliers
: A matrix with outliers detected in the 2D matric. Each row x
contains the
gene (x[1]
) where the species (x[2]
) is an outlier.
detect.outliers.array
: An array listing the outliers detected (if any)
detect.outliers()
: detect outliers in 2D matrix
detect.outliers.array()
: detects outliers in 1D array
de Vienne D.M., Ollier S. et Aguileta G. (2012) Phylo-MCOA: A Fast and Efficient Method to Detect Outlier Genes and Species in Phylogenomics Using Multiple Co-inertia Analysis. Molecular Biology and Evolution 29 : 1587-1598.
Hubert, M. and Vandervieren, E. (2008). An adjusted boxplot for skewed distributions. Computational Statistics and Data Analysis, 52, 5186-5201.
# Get the initial gene x species matrix # from the carnivora dataset data(carnivora) mat <- phylter(carnivora, InitialOnly = TRUE, parallel = FALSE)$WR # detect outliers in this matrix outliers<-detect.outliers(mat) outliers$cells # matrix where each row represents one cell in the input matrix
# Get the initial gene x species matrix # from the carnivora dataset data(carnivora) mat <- phylter(carnivora, InitialOnly = TRUE, parallel = FALSE)$WR # detect outliers in this matrix outliers<-detect.outliers(mat) outliers$cells # matrix where each row represents one cell in the input matrix
Dist2WR
computes the 2WR matrix from the results obtaind with DistatisFast
(the fast version of distatis). For internal use mostly.
Dist2WR(Distatis)
Dist2WR(Distatis)
Distatis |
output of the fonction |
A matrix (gene=rows x species=col). Each cell represents a gene/species pair, whose value represents the distance between (i) the position of this species in this gene tree and (ii) the average position of this species in all other gene trees.
data(carnivora) matrices<-phylter(carnivora, InitialOnly=TRUE, parallel=FALSE)$matrices ds<-DistatisFast(matrices, parallel = FALSE) WR<-Dist2WR(ds) #returns the gene x species matrix
data(carnivora) matrices<-phylter(carnivora, InitialOnly=TRUE, parallel=FALSE)$matrices ds<-DistatisFast(matrices, parallel = FALSE) WR<-Dist2WR(ds) #returns the gene x species matrix
New implementation of the DISTATIS method for K matrices of dimension IxI. This version of Distatis is faster than the original one because only the minimum required number of eigenvalues and eigenvectors is calculated. The difference in speed is particularly visible when the number of matrices is large.
DistatisFast(matrices, factorskept = "auto", parallel = TRUE)
DistatisFast(matrices, factorskept = "auto", parallel = TRUE)
matrices |
A list of K distance matrices, all of the same dimension (IxI). |
factorskept |
Number of factors to keep for the computation of the factor. When "auto" (the default), a brokenstick model is used for the choice of the number of components to keep. |
parallel |
Should the matrix products computations be parallelized? Default to TRUE. scores of the observations. |
Returns a list containing:
F
: projected coordinates of species in the compromise
PartialF
: list of projected coordinates of species for each gene.
alpha
: array of length K of the weight associated to each matrix.
lambda
: array of length K of the normalization factors used for each matrix. lambda=1 always.
RVmat
: a KxK matrix with RV correlation coefficient computed between all
pairs of matrices.
compromise
: an IxI matrix representing the best compromise between all matrices. This matrix is the weighted average of all K matrices, using 'alpha' as
a weighting array.
quality
: the quality of the compromise. This value is between 0 and 1
matrices.dblcent
: matrices after double centering
describes how much of the variance of the K matrices is captured by the compromise.
Abdi, H., Valentin, D., O'Toole, A.J., & Edelman, B. (2005). DISTATIS: The analysis of multiple distance matrices. Proceedings of the IEEE Computer Society: International Conference on Computer Vision and Pattern Recognition_. (San Diego, CA, USA). pp. 42-47.
# Get a list of matrices # from the carnivora dataset data(carnivora) matrices <- phylter(carnivora, InitialOnly = TRUE, parallel = FALSE)$matrices # Perform a Distatis analysis on these matrices: distatis <- DistatisFast(matrices, parallel = FALSE) #distatis is a list with multiple elements: distatis$alpha #weigh of each matrix (how much it correlates with others) distatis$RVmat #RV matrix: correlation of each matrix with each other distatis$compromise # distance matrix with "average" pairwise distance between species in matrices # etc.
# Get a list of matrices # from the carnivora dataset data(carnivora) matrices <- phylter(carnivora, InitialOnly = TRUE, parallel = FALSE)$matrices # Perform a Distatis analysis on these matrices: distatis <- DistatisFast(matrices, parallel = FALSE) #distatis is a list with multiple elements: distatis$alpha #weigh of each matrix (how much it correlates with others) distatis$RVmat #RV matrix: correlation of each matrix with each other distatis$compromise # distance matrix with "average" pairwise distance between species in matrices # etc.
Impute missing data in a list of matrices. Matrices are first given the same dimension, then missing entries are filled by computing the average value in non-missing corresponding entries in all matrices.
impMean(matrices)
impMean(matrices)
matrices |
A list of distance matrices. |
Returns a list of matrices with same dimensions, with rows and columns in the same order and missing data (if any) imputed.
data(carnivora) matrices <- phylter(carnivora, InitialOnly = TRUE, parallel = FALSE)$matrices # remove n species randomly (n between 1 and 5) in each matrix (to mimic missing data) fun<-function(mat) { species2remove<-sample(1:nrow(mat),sample(1:5,1)) mat<-mat[-species2remove,-species2remove] return(mat) } matrices.missing<-lapply(matrices, fun) #check that all matrices have now different dimensions: lapply(matrices.missing, dim) # Impute data to get back to the same dimensions matrices.ok<-impMean(matrices.missing) lapply(matrices.ok, dim) #all dimensions are now identical. Missing data have been imputed.
data(carnivora) matrices <- phylter(carnivora, InitialOnly = TRUE, parallel = FALSE)$matrices # remove n species randomly (n between 1 and 5) in each matrix (to mimic missing data) fun<-function(mat) { species2remove<-sample(1:nrow(mat),sample(1:5,1)) mat<-mat[-species2remove,-species2remove] return(mat) } matrices.missing<-lapply(matrices, fun) #check that all matrices have now different dimensions: lapply(matrices.missing, dim) # Impute data to get back to the same dimensions matrices.ok<-impMean(matrices.missing) lapply(matrices.ok, dim) #all dimensions are now identical. Missing data have been imputed.
Computes the medcouple, a robust measure of skewness for univariate data. For multivariate data the medcouple is calculated on each column of the data matrix.
medcouple(x, do.reflect = NULL)
medcouple(x, do.reflect = NULL)
x |
An |
do.reflect |
Logical indicating whether the medcouple should also be computed on the reflected sample |
The medcouple is a robust measure of skewness yielding values between and
. For left- and right-skewed data the medcouple is negative and positive respectively.
The medcouple is defined as the median of the kernel function
evaluated over all couples
where
is smaller than the median of
x
and larger than the median of
x
. When there are multiple observations tied to the median, the kernel is defined separately as the denominator is not defined for these observations. Let denote the indices of the observations which are tied to the median. Then
is defined to equal
if
,
when
and
if
. To compute the medcouple an algorithm with time complexity
is applied. For details, see https://en.wikipedia.org/wiki/Medcouple.
For numerical accuracy it is advised, for small data sets, to compute the medcouple on both
x
and -x
. The final value of the medcouple may then be obtained as a linear combination of both calculations. This procedure is warranted by the properties of the medcouple. Indeed the medcouple of the distribution equals minus the medcouple of the reflected distribution
. Moreover the medcouple is location and scale invariant.
Note that missing values are not allowed.
mc A -vector containing the medcouple of each column of the data matrix
x
.
This function is extracted from the package mrfDepth - 07/2023
P. Segaert with original code from M. Maechler and G. Brys.
Brys G., Hubert M., Struyf A. (2004). A robust measure of skewness. Journal of Computational and Graphical Statistics, 13, 996–1017.
# Calculate the medcouple of univariate data sets. # For 2000 normally distributed values # the medcouple value is close to 0 because # data are not skewed x<-rnorm(2000) medcouple(x) # For 2000 values following a lognormal # distribution (mean 0,sd 1), medcouple is close to 1 # because values are right-skewed y<-rnorm(2000) medcouple(y) # Use the option do.reflect to increase expected accuracy. medcouple(y, do.reflect = TRUE)
# Calculate the medcouple of univariate data sets. # For 2000 normally distributed values # the medcouple value is close to 0 because # data are not skewed x<-rnorm(2000) medcouple(x) # For 2000 values following a lognormal # distribution (mean 0,sd 1), medcouple is close to 1 # because values are right-skewed y<-rnorm(2000) medcouple(y) # Use the option do.reflect to increase expected accuracy. medcouple(y, do.reflect = TRUE)
This function normalizes the 2WR matrix (or any 2D matrix) according to the species (rows) or to the genes (columns), by dividing each row or each column by its median.
normalize(mat, what = "none")
normalize(mat, what = "none")
mat |
A matrix |
what |
Character string indicating whether the matrix should be normalized and how. If what="none", the matrix is not normalized (the default), if what="species", the matrix is normalized so that the difference between species is increased, and if what="genes", the matrix is normalized so that the difference between genes is increased. Normalization consists in dividing either each row or each columns by its median. |
A normalized matrix
# random matrix x<-matrix(rnorm(270), nrow=9, ncol=14) # normalize by row x1<-normalize(x, "genes") # normalize by column x2<-normalize(x, "species")
# random matrix x<-matrix(rnorm(270), nrow=9, ncol=14) # normalize by row x1<-normalize(x, "genes") # normalize by column x2<-normalize(x, "species")
Detection and filtering out of outliers in a list of trees or a list of distance matrices.
phylter( X, bvalue = 0, distance = "patristic", k = 3, k2 = k, Norm = "median", Norm.cutoff = 0.001, gene.names = NULL, test.island = TRUE, verbose = TRUE, stop.criteria = 1e-05, InitialOnly = FALSE, normalizeby = "row", parallel = TRUE )
phylter( X, bvalue = 0, distance = "patristic", k = 3, k2 = k, Norm = "median", Norm.cutoff = 0.001, gene.names = NULL, test.island = TRUE, verbose = TRUE, stop.criteria = 1e-05, InitialOnly = FALSE, normalizeby = "row", parallel = TRUE )
X |
A list of phylogenetic trees (phylo object) or a list of distance matrices. Trees can have different number of leaves and matrices can have different dimensions. If this is the case, missing values are imputed. |
bvalue |
If X is a list of trees, nodes with a support below 'bvalue' will be collapsed prior to the outlier detection. |
distance |
If X is a list of trees, type of distance used to compute the pairwise matrices for each tree. Can be "patristic" (sum of branch lengths separating tips, the default) or nodal (number of nodes separating tips). The "nodal" option should only be used if all species are present in all genes. |
k |
Strength of outlier detection. The higher this value the less outliers detected (see details). |
k2 |
Same as k for complete gene outlier detection. To preserve complete genes from being discarded, k2 can be increased. By default, k2 = k (see above). |
Norm |
Should the matrices be normalized prior to the complete analysis and how. If "median" (the default), matrices are divided by their median, if "mean" they are divided by their mean, if "none", no normalization if performed. Normalizing ensures that fast-evolving (and slow-evolving) genes are not treated as outliers. Normalization by median is a better choice as it is less sensitive to outlier values. |
Norm.cutoff |
Value of the median (if |
gene.names |
List of gene names used to rename elements in X. If NULL (the default), 0 elements are named 1,2,..,length(X). |
test.island |
If TRUE (the default), only the highest value in an 'island' of outliers is considered an outlier. This prevents non-outliers hitchhiked by outliers to be considered outliers themselves. |
verbose |
If TRUE (the default), messages are written during the filtering process to get information of what is happening |
stop.criteria |
The optimisation stops when the gain in concordance between matrices between round |
InitialOnly |
Logical. If TRUE, only the Initial state of the data is computed. |
normalizeby |
Should the gene x species matrix be normalized prior to outlier detection, and how. |
parallel |
Should the computations be parallelized when possible? Default to TRUE. Note that the number of threads cannot be set by the user when 'parallel=TRUE'. It uses all available cores on the machine. |
A list of class 'phylter' with the 'Initial' (before filtering) and 'Final' (after filtering) states, or a list of class 'phylterinitial' only, if InitialOnly=TRUE. The function also returns the list of DiscardedGenes, if any.
data(carnivora) # using default paramaters res <- phylter(carnivora, parallel = FALSE) # perform the phylter analysis res # brief summary of the analysis res$DiscardedGenes # list of genes discarded prior to the analysis res$Initial # See all elements prior to the analysis res$Final # See all elements at the end of the analysis res$Final$Outliers # Print all outliers detected # Change the call to phylter to use nodal distances, instead of patristic: res <- phylter(carnivora, distance = "nodal")
data(carnivora) # using default paramaters res <- phylter(carnivora, parallel = FALSE) # perform the phylter analysis res # brief summary of the analysis res$DiscardedGenes # list of genes discarded prior to the analysis res$Initial # See all elements prior to the analysis res$Final # See all elements at the end of the analysis res$Final$Outliers # Print all outliers detected # Change the call to phylter to use nodal distances, instead of patristic: res <- phylter(carnivora, distance = "nodal")
These functions take objects of class phylter as input and display various plots summarizing the results obtained (see details).
## S3 method for class 'phylter' plot(x, what = "all", layout = 1, sorted = TRUE, ...) plot2WR( x, show.missing = TRUE, show.outliers = TRUE, transpose = FALSE, clust = FALSE ) plotDispersion(x) plotRV(x, what = "Initial", labelnames = TRUE, clust = FALSE) plotopti(x)
## S3 method for class 'phylter' plot(x, what = "all", layout = 1, sorted = TRUE, ...) plot2WR( x, show.missing = TRUE, show.outliers = TRUE, transpose = FALSE, clust = FALSE ) plotDispersion(x) plotRV(x, what = "Initial", labelnames = TRUE, clust = FALSE) plotopti(x)
x |
The object returned by the 'phylter()' function. |
what |
Specifies what to plot. If "species", a barplot will show how many genes each species is in, and what proportion of thoses were detected as outliers. If "genes", a barplot shows how many species contains each gene and how many of them has been detected as outliers. If "all" (the defaut), the two plots described above are displayed one after the other, prompting the user to press ENTER to display the next one. |
layout |
What layout to use. Do not change if you don't know what it is. |
sorted |
Logical. Should bars be sorted by the number of outliers detected. Default to TRUE |
... |
Additional arguments to be passed to plot and print functions. |
show.missing |
Logical. Should missing data be represented on the heatmap. If TRUE (the default), white dots show were these missing entries are in both the initial and final 2WR matrices. |
show.outliers |
Logical. Should outliers be represented on the heatmap. If TRUE (the default), yellow dots indicate outliers on the final 2WR matrix. |
transpose |
Logical. If TRUE, the two matrices are piled up instaed of being displayed side by side. Default to FALSE. |
clust |
Logical. Should the rows or/and columns of the matrices that are plotted be reorderd prior to plotting. Reordering is based on a hierarchical clustering. Default to FALSE. This is conveninent when the names of the genes are very long for instance. |
labelnames |
Logical. If TRUE, the names of labels are indicated on the heatmap. If FALSE they are removed. This is conveninent when the names of the genes are very long for instance. |
plot(x)
and plot.phylter(x)
plot the genes found in each species or species
found in each gene as barplots, highlighting the outliers detected.
plot2WR(x)
plots side by side the initial and the final gene x species (unreable for large datasets)
matrices (the 2WR matrices), highlighting missing data and detected outliers.
plotDispersion(x)
plots dispersion of data before and after phylter, on a 2D
space. Each dot represents a gene-species association.
plotRV(x)
plots the RV coefficient matrix that descibes all agains all correlations between gene matrices
plotopti(x)
plots the compromise matrix score at each step of the
optimization.
The desired plots are returned. Note that you might want to call the pdf(),
png(), jpeg(), or tiff() function first if you want to save the plot(s) to an
external file. You can also use the function write.phylter(pdfreport=TRUE)
to write
all possible plots in a single pdf file.
data(carnivora) # perform phylter analysis res <- phylter(carnivora, parallel = FALSE) # plot for each gene the number of outlier species plot(res, "genes") # plot for each species the number genes where it is outlier plot(res, "species") # plot the dispersion of data before and after the use of phylter plotDispersion(res)
data(carnivora) # perform phylter analysis res <- phylter(carnivora, parallel = FALSE) # plot for each gene the number of outlier species plot(res, "genes") # plot for each species the number genes where it is outlier plot(res, "species") # plot the dispersion of data before and after the use of phylter plotDispersion(res)
Prepare datasets for the phylter
function. Detects possible issues,
discards genes if necessary, imputes missing data if any, and reorders row- and col-names.
For internal usage mostly.
PreparePhylterData( X, bvalue = 0, distance = "patristic", Norm = "median", Norm.cutoff = 0.001, gene.names = NULL, verbose = TRUE )
PreparePhylterData( X, bvalue = 0, distance = "patristic", Norm = "median", Norm.cutoff = 0.001, gene.names = NULL, verbose = TRUE )
X |
A list of phylogenetic trees (phylo object) or a list of distance matrices. Trees can have different number of leaves and matrices can have different dimensions. If this is the case, missing values are imputed. |
bvalue |
If X is a list of trees, nodes with a support below 'bvalue' will be collapsed prior to the outlier detection. |
distance |
If X is a list of trees, type of distance used to compute the pairwise matrices for each tree. Can be "patristic" (sum of branch lengths separating tips, the default) or nodal (number of nodes separating tips). |
Norm |
Should the matrices be normalized and how. If "median", matrices are divided by their median, if "mean" they are divided by their mean, if "none", no normalization if performed. Normalizing ensures that fast-evolving (and slow-evolving) genes are not treated as outliers. Normalization by median is less sensitive to outlier values but can lead to errors if some matrices have a median value of 0. are not considered outliers. |
Norm.cutoff |
Value of the median (if Norm="median") or the mean (if Norm="mean") below which matrices are simply discarded from the analysis. This prevents dividing by 0, and getting rid of genes that contain mostly branches of length 0 and are therefore uninformative anyway. |
gene.names |
List of gene names used to rename elements in X. If NULL (the default), elements are named 1,2,..,length(X). |
verbose |
If TRUE (the default), messages are written during the filtering process to get information of what is happening |
A list of class 'phylter' with the 'Initial' (before filtering) and 'Final' (after filtering) states, or a list of class 'phylterinitial' only, if InitialOnly=TRUE.
data(carnivora) # transform trees to a named list of matrices with same dimensions # and identical row and column orders and names carnivora.clean<-PreparePhylterData(carnivora)
data(carnivora) # transform trees to a named list of matrices with same dimensions # and identical row and column orders and names carnivora.clean<-PreparePhylterData(carnivora)
Print on screen a simple description of the content of phylter objects
## S3 method for class 'phylter' print(x, ...)
## S3 method for class 'phylter' print(x, ...)
x |
Object returned by function 'phylter()'. |
... |
Additional arguments. |
Print formatting
data(carnivora) res <- phylter(carnivora, parallel = FALSE) print(res)
data(carnivora) res <- phylter(carnivora, parallel = FALSE) print(res)
Print on screen a simple description of the content of objects of class phylterfinal
.
## S3 method for class 'phylterfinal' print(x, ...)
## S3 method for class 'phylterfinal' print(x, ...)
x |
Object returned by function 'phylter()'. |
... |
Additional arguments. |
NA
data(carnivora) res <- phylter(carnivora, parallel = FALSE) print(res$Final)
data(carnivora) res <- phylter(carnivora, parallel = FALSE) print(res$Final)
Print on screen a simple description of the content of objects of class phylterinitial
.
## S3 method for class 'phylterinitial' print(x, ...)
## S3 method for class 'phylterinitial' print(x, ...)
x |
Object present in the |
... |
Additional arguments. |
NA
data(carnivora) res <- phylter(carnivora, parallel = FALSE) print(res$Initial)
data(carnivora) res <- phylter(carnivora, parallel = FALSE) print(res$Initial)
Prints on screen the summary of an object of class summmary.phylter
as returned by the summary.phylter()
function.
## S3 method for class 'summary.phylter' print(x, ...)
## S3 method for class 'summary.phylter' print(x, ...)
x |
Object returned by function 'summary.phylter()'. |
... |
Additional arguments. |
NA
data(carnivora) res <- phylter(carnivora, parallel = FALSE) summary <- summary(res) print(summary)
data(carnivora) res <- phylter(carnivora, parallel = FALSE) summary <- summary(res) print(summary)
Name or rename a list of gene trees or gene matrices genes. For internal use mostly.
rename.genes(X, gene.names = NULL)
rename.genes(X, gene.names = NULL)
X |
A list of trees or matrices |
gene.names |
List of names to assign to the elements of X. Must be of the same length as length(X). If NULL (the default) the object are numbered 1,2,...,length(X). |
X with name assigned to each element.
data(carnivora) # names before renaming names(carnivora) carnivora.renamed<-rename.genes(carnivora, gene.names=as.character(1:length(carnivora))) # names after renaming names(carnivora.renamed)
data(carnivora) # names before renaming names(carnivora) carnivora.renamed<-rename.genes(carnivora, gene.names=as.character(1:length(carnivora))) # names after renaming names(carnivora.renamed)
Simple (simplistic) simulation of trees with outliers.
simtrees(Ngn, Nsp, Nsp.out = 0, Ngn.out = 0, Nb.cell.outlier = 0, brlen.sd = 0, out.type="topology",bl.mult=2)
simtrees(Ngn, Nsp, Nsp.out = 0, Ngn.out = 0, Nb.cell.outlier = 0, brlen.sd = 0, out.type="topology",bl.mult=2)
Ngn |
Number of gene trees to simulate. |
Nsp |
Number of species (tips) per tree. |
Nsp.out |
Number of outlier species (also called rogue taxa). 0 = none. |
Ngn.out |
Number of outlier genes. 0 = none. |
Nb.cell.outlier |
Number of times one species in one tree is an outlier. 0 = none. The type of outlier is set by |
brlen.sd |
Heterogeneity of branch lengths in trees. A value with mean 0 and standard deviation equal to brlen.sd is added to each branch length. If the resulting branch length has a negative value, its absolute value is taken. |
out.type |
The type of cell outlier. Can be "topology" (the default), where outlier species
are simulated by branching them elsewhere (randomly) in the tree, "brlength", where terminal branches of outlier species are
multiplied by |
bl.mult |
Multiplier of terminal branches of outlier species when |
The simulation process is as follows: a first tree is generated with the rtree()
function
and is then duplicated and modified according to the parameters chosen by the user.
A list X containing a list of trees in multiPhylo
format (X$trees) and a list of outliers (X$outl).
# Very basic simulator, for debugging purpose mainly. # examples: 30 genes, 120 species, 2 outlier species, 3 outlier genes # 4 gene/species outliers of type "topology", branch length variance = 0.6. # The branch length multiplier is set to 2 but this is # ignored if out.type="topology" simu<-simtrees(30,120,2,3,4,0.6, "topology",2) trees<-simu$trees #list of trees outl<-simu$outl #list of simulated outliers and their type
# Very basic simulator, for debugging purpose mainly. # examples: 30 genes, 120 species, 2 outlier species, 3 outlier genes # 4 gene/species outliers of type "topology", branch length variance = 0.6. # The branch length multiplier is set to 2 but this is # ignored if out.type="topology" simu<-simtrees(30,120,2,3,4,0.6, "topology",2) trees<-simu$trees #list of trees outl<-simu$outl #list of simulated outliers and their type
Get the summary of an object of class phylter
.
## S3 method for class 'phylter' summary(object, ...)
## S3 method for class 'phylter' summary(object, ...)
object |
Object returned by function 'phylter()'. |
... |
Additional arguments affecting the summary produced. |
On object of class summmary.phylter
data(carnivora) res <- phylter(carnivora, parallel = FALSE) summary <- summary(res) class(summary)
data(carnivora) res <- phylter(carnivora, parallel = FALSE) summary <- summary(res) class(summary)
Transform a list of phylogenetic trees into a list of phylogenetic distance matrices.
trees2matrices(trees, distance = "patristic", bvalue = 0)
trees2matrices(trees, distance = "patristic", bvalue = 0)
trees |
A list of gene trees in format "multiphylo". |
distance |
A method to generate distance matrices. It could be "nodal" to establish that the distance between two species is the number of nodes that separate them. Or "patristic" (default) if the distance between two species is the sum of branch lengths separating them. The "nodal" option should only be used if all species are present in all trees (no missing data). |
bvalue |
This argument is only used if trees contain bootstrap values. It determines under what bootstrap values the nodes should be collapsed. Value 0 (the default) means that no nodes are collapsed. |
return a list of distance matrices
data(carnivora) matrices<-trees2matrices(carnivora)
data(carnivora) matrices<-trees2matrices(carnivora)
Write the summary of the phylter analysis in a txt file or as a pdf report.
write.phylter( x, file = "", include.discarded = TRUE, pdfreport = FALSE, pdfreport.file = "report.pdf" )
write.phylter( x, file = "", include.discarded = TRUE, pdfreport = FALSE, pdfreport.file = "report.pdf" )
x |
The object returned by the 'phylter()' function. |
file |
Name of the file where to write the text summary of the phylter output.
If |
include.discarded |
Logical. If TRUE (the default) the elements discarded before the analysis are still in the list of Outliers in the output. Useful for cleaning datasets after a phylter analysis. |
pdfreport |
Logical. Should a full report of the phylter analysis |
pdfreport.file |
If |
No return value
data(carnivora) res <- phylter(carnivora, parallel = FALSE) # write a full report to the standard output write.phylter(res)
data(carnivora) res <- phylter(carnivora, parallel = FALSE) # write a full report to the standard output write.phylter(res)