Phylogeny and Evolution

The first image we tend to associate with a phylogeny is that of a tree (the tree of life, the genealogical tree of a family, the tree of languages, and so on). What has emerged in the last decade, across different disciplines, is that this representation is too reductive in a large number of cases. In biology, horizontal gene transfer is widely recognized as a major mechanism of variation for many organisms. In linguistics, loans from one language to another play an important role in the evolution of a language. The identification of these phenomena calls for a change in perspective with respect to the old idea of phylogenesis. In particular, in many cases the evolution lacks a clear hierarchy, so that the phylogeny can no longer be represented as a tree and some kind of network has to be considered instead. This evidence exposes the gap between the analysis tools currently available and those that are needed. While well-established results are available for perfect phylogenies (i.e. evolutionary histories that can be associated with a tree topology), very little is known when a deviation from a tree-like structure has to be considered, despite the efforts in this direction.

Our activities can be grouped as follows.

New algorithms for phylogeny reconstruction. The aim is to provide methods that identify, and correctly take into account, deviations from perfect phylogenies. We have recently introduced a new Stochastic Local Search algorithm for distance-based phylogeny reconstruction: SBiX.
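For distance data, one standard way to make "deviation from a perfect phylogeny" concrete is the four-point condition: a distance matrix fits some tree exactly only if, for every four taxa, the two largest of the three pairwise sums are equal. The Python sketch below measures how far a matrix is from satisfying it. It is an illustration only and is not the SBiX algorithm; the function names and the toy matrices are assumptions made for this example.

```python
from itertools import combinations


def four_point_gap(d, i, j, k, l):
    """Gap between the two largest pairwise sums for one quadruple.

    The gap is zero when the four taxa can be placed on a tree whose
    path lengths reproduce the distances exactly.
    """
    sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
    return sums[2] - sums[1]


def max_four_point_gap(d):
    """Largest four-point gap over all quadruples of taxa (0.0 if fewer than 4)."""
    n = len(d)
    return max(
        (four_point_gap(d, *quad) for quad in combinations(range(n), 4)),
        default=0.0,
    )


# Hypothetical toy data: distances read off the tree ((A,B),(C,D))
# with all branch lengths equal to 1.
tree_like = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

# Same matrix, but A and C are made artificially closer, as a
# horizontal exchange between the two lineages might do.
with_exchange = [row[:] for row in tree_like]
with_exchange[0][2] = with_exchange[2][0] = 2

print(max_four_point_gap(tree_like))
print(max_four_point_gap(with_exchange))
```

A nonzero gap signals that no tree reproduces the distances exactly, but it does not say whether the cause is noise or a genuinely network-like history.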

Construction of benchmarks: a web experiment. The availability of valid benchmarks is of crucial importance for assessing the different methods used to reconstruct phylogenetic trees. The standard way of testing a proposed algorithm is to build models that generate artificial phylogenies, so that the algorithmic results can be directly compared with the true, known values of the observables of interest. This procedure has an intrinsic limitation: when dealing with real data sets, we do not know which model of evolution is suitable for them. To overcome this difficulty, we propose experiments that provide controlled but model-free data. Consider the phylogeny of texts. Manuscripts evolved through errors or intentional modifications introduced during copying. The idea is to reproduce this evolution in an observable way. Copystree is a first example in this direction. The advantage of this experiment is that all the relevant information is available, since every detail of the evolution is recorded. Further, the evolution is not driven by any force determined a priori, and diffusion over the web should allow us to collect a large amount of data.
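The model-based procedure described above can be sketched in a few lines of Python. The example grows a binary copying tree from a root text, with each copy introducing random character substitutions, and records the parent of every copy so that the true history is known. The substitution-only error model, the alphabet, and all names are assumptions made for this illustration; this is the kind of imposed model that the web experiment is meant to avoid, and it is not a description of Copystree.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def copy_with_errors(text, error_rate, rng):
    """Copy a text, replacing each character by a random one with probability error_rate."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < error_rate else ch for ch in text
    )


def simulate_copies(root_text, generations, error_rate, seed=0):
    """Grow a binary copying tree.

    Returns (texts, parent, leaves): texts[i] is the i-th manuscript,
    parent[i] is the index of the manuscript it was copied from
    (None for the root), and leaves lists the last generation.
    """
    rng = random.Random(seed)
    texts, parent, frontier = [root_text], [None], [0]
    for _generation in range(generations):
        next_frontier = []
        for node in frontier:
            for _copy in range(2):  # every manuscript is copied twice
                texts.append(copy_with_errors(texts[node], error_rate, rng))
                parent.append(node)
                next_frontier.append(len(texts) - 1)
        frontier = next_frontier
    return texts, parent, frontier


def hamming(a, b):
    """Number of positions at which two equal-length texts differ."""
    return sum(x != y for x, y in zip(a, b))


texts, parent, leaves = simulate_copies("in the beginning was the word", 3, 0.05)
# Pairwise distances between the surviving copies: the input a
# distance-based reconstruction method would receive.
distances = [[hamming(texts[a], texts[b]) for b in leaves] for a in leaves]
```

Because `parent` stores the true copying history, a tree reconstructed from `distances` can be compared against it directly.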

Biological applications. The underlying idea of phylogenetic analysis in biological problems is to establish present or future relations between the objects of interest from the knowledge of their evolution. As an example, proteins with similar evolutionary paths are likely to share the same functions. Similarly, an effective vaccination strategy depends closely on knowledge of the evolution of the virus or bacterium. This is an open and exciting challenge, which can now be faced thanks to the increasing amount of data available from high-throughput experiments. We focus on this problem, with special emphasis on the phylogenetic reconstruction of species in which a high rate of recombination makes standard methods ineffective. Indeed, the exchange of genetic material is one of the major mechanisms that both viruses and bacteria use to escape the host immune response. In such cases, as already stressed, a network rather than a tree is the correct way to represent their evolution, and how to deal with it is the question we intend to answer.

Applications in linguistics. Another interesting field of research concerns problems in linguistics. Here the phenomena described in biology translate straightforwardly: horizontal gene transfer and the exchange of genetic material are the analogues of word loans and of the cross-influence between languages. From this perspective, the same approach can be shared with biological problems. A distinguishing issue is the characterization of a suitable notion of distance between languages. The present standard is to identify a language with a list of words and to search for cognates, which requires systematic work by linguists. We have recently completed a survey of the accuracy of language tree reconstruction using both state-of-the-art databases and algorithms.
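As a rough illustration of a cognate-based distance, the Python sketch below represents each language as a word list in which every meaning carries a cognate-class label, and defines the distance between two languages as the fraction of shared meanings whose words belong to different classes. The language names, meanings, and class labels are invented for the example and carry no linguistic claim; this is one simple choice of distance among several in use, not necessarily the one adopted in the survey.

```python
def lexical_distance(cognates_a, cognates_b):
    """Fraction of shared meanings whose words are not cognate.

    Each argument maps a meaning to a cognate-class label; two words
    are treated as cognate when their labels are equal.
    """
    shared = [meaning for meaning in cognates_a if meaning in cognates_b]
    if not shared:
        raise ValueError("the two word lists have no meaning in common")
    differing = sum(cognates_a[m] != cognates_b[m] for m in shared)
    return differing / len(shared)


def distance_matrix(word_lists):
    """Pairwise lexical distances, in the order of the names in word_lists."""
    names = list(word_lists)
    return [
        [lexical_distance(word_lists[a], word_lists[b]) for b in names]
        for a in names
    ]


# Hypothetical cognate judgements for three made-up languages.
toy_lists = {
    "X": {"hand": 1, "water": 1, "stone": 1, "fire": 1},
    "Y": {"hand": 1, "water": 1, "stone": 2, "fire": 1},
    "Z": {"hand": 2, "water": 1, "stone": 3, "fire": 2},
}

matrix = distance_matrix(toy_lists)
```

The resulting matrix is the input expected by distance-based reconstruction methods. A loanword mistaken for a cognate lowers the distance between the two languages involved, which is how borrowing can distort a reconstructed tree.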

Publications

2013

Taggi, Lorenzo; Colaiori, Francesca; Loreto, Vittorio; Tria, Francesca

Dynamical correlations in the escape strategy of Influenza A virus (Journal Article)

EUROPHYSICS LETTERS, 101, 2013.

Tria, Francesca; Pompei, Simone; Loreto, Vittorio

Dynamically correlated mutations drive human Influenza A evolution (Journal Article)

SCIENTIFIC REPORTS, 3, 2013.

2012

Pompei, Simone; Loreto, Vittorio; Tria, Francesca

Phylogenetic Properties of RNA Viruses (Journal Article)

PLOS ONE, 7, pp. e44849-1–e44849-10, 2012.

@article{b,
title = {Phylogenetic Properties of RNA Viruses},
author = {Simone Pompei and Vittorio Loreto and Francesca Tria},
url = {http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0044849},
year = {2012},
date = {2012-01-01},
journal = {PLOS ONE},
volume = {7},
pages = {e44849-1--e44849-10},
abstract = {A new word, phylodynamics, was coined to emphasize the interconnection between phylogenetic properties, as observed for instance in a phylogenetic tree, and the epidemic dynamics of viruses, where selection, mediated by the host immune response, and transmission play a crucial role. The challenges faced when investigating the evolution of RNA viruses call for a virtuous loop of data collection, data analysis and modeling. This already resulted both in the collection of massive sequences databases and in the formulation of hypotheses on the main mechanisms driving qualitative differences observed in the (reconstructed) evolutionary patterns of different RNA viruses. Qualitatively, it has been observed that selection driven by the host immune response induces an uneven survival ability among co-existing strains. As a consequence, the imbalance level of the phylogenetic tree is manifestly more pronounced if compared to the case when the interaction with the host immune system does not play a central role in the evolutive dynamics. While many imbalance metrics have been introduced, reliable methods to discriminate in a quantitative way different level of imbalance are still lacking. In our work, we reconstruct and analyze the phylogenetic trees of six RNA viruses, with a special emphasis on the human Influenza A virus, due to its relevance for vaccine preparation as well as for the theoretical challenges it poses due to its peculiar evolutionary dynamics. We focus in particular on topological properties. We point out the limitation featured by standard imbalance metrics, and we introduce a new methodology with which we assign the correct imbalance level of the phylogenetic trees, in agreement with the phylodynamics of the viruses. 
Our thorough quantitative analysis allows for a deeper understanding of the evolutionary dynamics of the considered RNA viruses, which is crucial in order to provide a valuable framework for a quantitative assessment of theoretical predictions.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

2011

Pompei, Simone; Tria, Francesca; Loreto, Vittorio

On the accuracy of language trees (Journal Article)

PLOS ONE, 6 (6), 2011.

2010

Tria, Francesca; Caglioti, Emanuele; Loreto, Vittorio; Pagnani, Andrea

A Stochastic Local Search Algorithm for Distance-Based Phylogeny Reconstruction (Journal Article)

MOLECULAR BIOLOGY AND EVOLUTION, 27 (11), pp. 2587–2595, 2010.

Tria, Francesca; Caglioti, Emanuele; Loreto, Vittorio; Pagnani, Andrea

A stochastic local search approach to language tree reconstruction (Journal Article)

27, pp. 341–358, 2010.

Caglioti, Emanuele; Loreto, Vittorio; Pompei, Simone; Tria, Francesca

Distance-based Phylogenetic algorithms: new insights and applications (Journal Article)

MATHEMATICAL MODELS AND METHODS IN APPLIED SCIENCES, 20, pp. 1511–1532, 2010.

Tria, Francesca; Caglioti, Emanuele; Loreto, Vittorio; Pompei, Simone

A fast noise reduction driven distance-based phylogenetic algorithm (Inproceeding)

Hamid R. Arabnia, Quoc-Nam Tran, Rui Chang, Matthew He, Andy Marsh, Ashu Solo, Jack Yang (Eds.): Proceedings of BIOCOMP 2010, pp. 375–380, CSREA Press, 2010.
