The first image we usually associate with a phylogeny is that of a tree (the tree of life, the genealogical tree of a family, the tree of languages, and so on). What has emerged over the last decade, across different disciplines, is that this representation is too reductive in a large number of cases. In biology, horizontal gene transfer is widely recognized as a major mechanism of variation for many organisms. In linguistics, loans from one language to another play an important role in the evolution of a language. The identification of these phenomena calls for a change in perspective with respect to the old idea of phylogenesis. In particular, in many cases the evolution lacks a clear hierarchy, so that the phylogeny can no longer be represented as a tree and some kind of network has to be considered instead. This evidence exposes the gap between the analysis tools currently available and those that are needed. While well-established results are available for perfect phylogenies (i.e. evolutionary histories that can be associated with a tree topology), very little is known when a deviation from a tree-like structure has to be considered, despite the efforts in this direction.
Our activities can be organized as follows.
New algorithms for phylogeny reconstruction. We aim to provide methods that identify, and correctly take into account, deviations from perfect phylogenies. We have recently introduced SBiX, a new Stochastic Local Search algorithm for distance-based phylogeny reconstruction.
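As a minimal illustration of what a deviation from a perfect phylogeny means for distance data, the sketch below checks the classical four-point condition: distances that fit a tree exactly give, for every quadruple of taxa, three pairwise sums of which the two largest are equal. This is not the SBiX algorithm; the function name `four_point_deviation` and the toy distance tables are assumptions made for the example.

```python
from itertools import combinations


def four_point_deviation(dist, taxa):
    """Largest four-point violation over all quadruples of taxa.

    For each quadruple, the three pairing sums are sorted and the gap
    between the two largest is measured. The gap is 0 for every
    quadruple when the distances fit a tree exactly.
    """
    worst = 0.0
    for a, b, c, d in combinations(taxa, 4):
        sums = sorted([
            dist[a][b] + dist[c][d],
            dist[a][c] + dist[b][d],
            dist[a][d] + dist[b][c],
        ])
        worst = max(worst, sums[2] - sums[1])
    return worst


def symmetric(pairs):
    """Build a symmetric nested dict from {(x, y): distance} (helper for the example)."""
    dist = {}
    for (x, y), value in pairs.items():
        dist.setdefault(x, {})[y] = value
        dist.setdefault(y, {})[x] = value
    return dist


taxa = ["A", "B", "C", "D"]

# Toy distances read off the tree ((A,B),(C,D)): leaf edges 1, inner edge 2.
tree_like = symmetric({
    ("A", "B"): 2, ("C", "D"): 2,
    ("A", "C"): 4, ("A", "D"): 4,
    ("B", "C"): 4, ("B", "D"): 4,
})

# Same distances with A and C artificially brought closer together,
# a crude stand-in for a horizontal exchange between the two lineages.
perturbed = symmetric({
    ("A", "B"): 2, ("C", "D"): 2,
    ("A", "C"): 3, ("A", "D"): 4,
    ("B", "C"): 4, ("B", "D"): 4,
})

tree_gap = four_point_deviation(tree_like, taxa)
perturbed_gap = four_point_deviation(perturbed, taxa)
```

A gap of zero means the distances are compatible with a tree; a positive gap quantifies, for the worst quadruple, how far the data are from any tree.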
Construction of benchmarks: a web experiment. The availability of valid benchmarks is of crucial importance for assessing the different methods used to reconstruct phylogenetic trees. The standard way of testing a proposed algorithm is to build models that generate artificial phylogenies, so that the algorithmic results can be compared directly with the true, known values of the observables of interest. This procedure has an intrinsic limitation: when dealing with real data sets, we do not know which model of evolution is suitable for them. To overcome this difficulty, we propose experiments that provide controlled but model-free data. Consider the phylogeny of texts. Manuscripts evolved because of errors or intentional modifications introduced during copying. The idea is to reproduce this evolution in an observable way. Copystree is a first example in this direction. The advantage of this experiment is that all the relevant information is available, since we know every detail of the evolution. Furthermore, the evolution is not driven by any force determined a priori, and diffusion over the web should allow us to collect a large amount of data.
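For contrast with the model-free experiment, the standard model-based benchmark described above can be sketched as follows: a toy model in which texts are copied with random character errors while the true parent of every copy is recorded. This is an illustration only, not the Copystree experiment; the error model, the alphabet, and the names `copy_with_errors` and `simulate_copies` are assumptions.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def copy_with_errors(text, error_rate, rng):
    """Copy a text, replacing each character by a random one with probability error_rate."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < error_rate else ch
        for ch in text
    )


def simulate_copies(original, n_copies, error_rate, seed=0):
    """Grow an artificial phylogeny of texts.

    Each new text is a noisy copy of a uniformly chosen existing text.
    Returns the list of texts and, for each text, the index of its
    parent (None for the original), i.e. the true tree.
    """
    rng = random.Random(seed)
    texts = [original]
    parent = [None]
    for _ in range(n_copies):
        source = rng.randrange(len(texts))
        texts.append(copy_with_errors(texts[source], error_rate, rng))
        parent.append(source)
    return texts, parent


texts, parent = simulate_copies("in the beginning was the word", 5, 0.1, seed=1)
```

Because `parent` stores the true history, the tree inferred from `texts` by any reconstruction method can be compared against it directly. The limitation discussed above is visible in the code itself: the result is only as realistic as the assumed error model.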
Biological applications. The underlying idea of phylogenetic analysis in biological problems is to establish present or future relations between the objects of interest from the knowledge of their evolution. For example, proteins with similar evolutionary paths are likely to share the same functions. Likewise, an effective vaccination strategy depends closely on knowledge of the evolution of the virus or bacterium. This is an open and exciting challenge, which can now be tackled thanks to the increasing amount of data available from high-throughput experiments. We focus on this problem, with special emphasis on the phylogenetic reconstruction of species in which a high rate of recombination makes standard methods ineffective. Indeed, the exchange of genetic material is one of the major mechanisms that both viruses and bacteria use to escape the host immune response. In such cases, as already stressed, a network rather than a tree is the correct way to represent their evolution, and how to deal with it is the question we intend to answer.
Applications in linguistics. Another interesting field of research concerns problems in linguistics. Here the phenomena described in biology translate straightforwardly: horizontal gene transfer and the exchange of genetic material are the analogues of word loans and of cross-influence between languages. From this perspective, the same approach can be shared with biological problems. A distinguishing issue is the characterization of a suitable notion of distance between languages. The present standard is to identify a language with a list of words and to search for cognates. This requires systematic work by linguists. We have recently completed a survey of the accuracy of language tree reconstruction using both state-of-the-art databases and algorithms.
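One simple way to turn word lists with cognate judgements into a distance is to count, over the meanings two languages have in common, the fraction whose words are not cognate. The sketch below assumes that each language is given as a mapping from a meaning to a cognate-class label assigned by linguists; the function name `cognate_distance`, the languages X, Y, Z and their labels are invented toy data, and this is not necessarily the distance used in the survey mentioned above.

```python
def cognate_distance(lang_a, lang_b):
    """Fraction of shared meanings whose words are not cognate.

    Each language is a dict mapping a meaning to a cognate-class label;
    two words are treated as cognate when their labels coincide.
    """
    shared = [meaning for meaning in lang_a if meaning in lang_b]
    if not shared:
        raise ValueError("the two word lists have no meaning in common")
    non_cognate = sum(1 for m in shared if lang_a[m] != lang_b[m])
    return non_cognate / len(shared)


# Toy word lists: labels are arbitrary cognate-class identifiers.
lang_x = {"water": 1, "hand": 1, "stone": 1, "fire": 1}
lang_y = {"water": 1, "hand": 1, "stone": 2, "fire": 1}
lang_z = {"water": 2, "hand": 1, "stone": 2, "fire": 2}

d_xy = cognate_distance(lang_x, lang_y)  # 1 of 4 meanings differ
d_xz = cognate_distance(lang_x, lang_z)  # 3 of 4 meanings differ
d_yz = cognate_distance(lang_y, lang_z)  # 2 of 4 meanings differ
```

The resulting pairwise values form the distance matrix that a distance-based reconstruction method takes as input. A loan word shows up here as an unexpectedly shared label between otherwise distant languages.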
Dynamical correlations in the escape strategy of Influenza A virus (Journal Article)
EUROPHYSICS LETTERS, 101 , 2013.
Dynamically correlated mutations drive human Influenza A evolution (Journal Article)
SCIENTIFIC REPORTS, 3 , 2013.
Phylogenetic Properties of RNA Viruses (Journal Article)
PLOS ONE, 7 , pp. e44849-1–e44849-10, 2012.
On the accuracy of language trees (Journal Article)
PLOS ONE, 6(6) , 2011.
MOLECULAR BIOLOGY AND EVOLUTION, 27 (11) , pp. 2587–2595, 2010.
A stochastic local search approach to language tree reconstruction (Journal Article)
27 , pp. 341–358, 2010.
Distance-based Phylogenetic algorithms: new insights and applications (Journal Article)
MATHEMATICAL MODELS AND METHODS IN APPLIED SCIENCES, 20 , pp. 1511–1532, 2010.
Hamid R. Arabnia, Quoc-Nam Tran, Rui Chang, Matthew He, Andy Marsh, Ashu Solo, Jack Yang (Eds.): Proceedings of BIOCOMP 2010, pp. 375–380, CSREA Press, 2010.