Genomics and Biomathematics
Each activity is made up of a number of projects. Some of these projects are described here, with the project leader or leaders listed.
Activity: Making cutting-edge technologies available to New Zealand
Project: Analysing the massive increase in sequence data from the Next-Gen DNA sequencers
Project leader: David Penny
We are now in the middle of a revolution in the amount of DNA sequence data becoming available from the new technologies. Our mix of evolutionary understanding, mathematical background, and computing gives the AWC an important advantage for a wide range of problems: problems relating to the NZ biota, aiding other researchers who need to use the results of the new technologies, and understanding and/or controlling the new small RNAs that are being found all the time. For some simpler problems we have found that we can mix samples and use computers to separate the results, greatly increasing the rate of data production while reducing costs. We have found and identified more of the roles of small RNA molecules in eukaryote (nucleated) cells, and have produced two reviews and a book chapter on them. We now have an excellent basic theoretical and evolutionary understanding of small RNAs from information theory, origin of life analyses, properties of RNA viruses and relationships between different groups of organisms. Thus one section of our work is integrating the picture of these small RNAs in eukaryotes. Because these small RNAs are potentially very important in the control of pathogenesis, our overview extends from the theoretical understanding to the applied and practical.
Questions:
- Can we increase the information, and reduce costs, by combining samples in the Solexa runs?
- Can we automate the early processing of data from the Next-Gen sequencers?
- Is small RNA processing fundamental to all eukaryotes (nucleated organisms), or a later addition (especially in multicellular organisms)?
- Can we learn quickly whether the small RNAs are potential targets for controlling some pathogens?
Activity: Using new genetic knowledge for molecular ecology and evolution
Project: Statistical metagenomics
Project leaders: Dr David Bryant, Associate Professor Alexei Drummond, Dr Howard Ross
New short-read sequencing technologies work well when existing genomes (or closely-related genomes) are resequenced. When the sample consists of a genetically variable assemblage of organisms, and the purpose is to identify the composition of the sample, next generation methods also perform well. However, a significant challenge presents itself when the sample consists of genetic variants of the same species, as may be the case with a sample of virus from a host. In this case, there is no prior information on the genetic constitution of the sample. If the short read sequences are to be used for evolutionary inference, there are two possible options:
1. Estimate the constituent haplotypes
2. Use assembly-free methods to make these inferences.
Our project will look at both of these options. For (1), we have begun implementing the method described by Eriksson et al. (2008; PLoS Comput Biol 4(5): e1000074), but we find that the method can be improved. Once these improvements are made, we will be able to determine the extent to which haplotype estimation adds to the variation of our estimates of evolutionary rate, phylogenetic reconstruction, and other evolutionary parameters (e.g., selection parameters).
The alternative (2) to haplotype assembly is to use site frequencies and/or summary statistics to infer these same parameters. For instance, Liu and Fu (2008; Theoretical Population Biology 74:56-67) have developed a method to estimate evolutionary rates from serially sampled data using summary statistics of site-frequency data. For both (1) and (2), we will use simulated and real data to benchmark these methods against standard sequencing approaches.
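To make the idea of site-frequency summary statistics concrete, the following is a minimal sketch, not the Liu and Fu method itself. It computes three textbook quantities from a small alignment: the folded site frequency spectrum, Watterson's estimator (based on the number of segregating sites), and nucleotide diversity (mean pairwise differences). The toy `alignment` and all function names are hypothetical, and the sketch assumes full-length aligned sequences; with short reads, the per-site allele counts would instead come from read pileups.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy alignment: four sequences of equal length.
alignment = ["ACGTAC", "ACGTTC", "ACGAAC", "TCGTAC"]


def site_frequency_spectrum(seqs):
    """Folded SFS: maps k to the number of biallelic sites whose
    minor allele is carried by exactly k sequences."""
    sfs = Counter()
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) == 2:  # only biallelic sites are counted here
            sfs[min(counts.values())] += 1
    return dict(sfs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator: S divided by the harmonic number a_n."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences between sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


print(site_frequency_spectrum(alignment))
print(watterson_theta(alignment))
print(nucleotide_diversity(alignment))
```

None of these statistics requires reconstructing haplotypes in principle, since each depends only on allele counts per site (or on pairwise comparisons that can be rewritten in terms of those counts), which is what makes them candidates for assembly-free inference.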
Objectives
- To extend existing methods, or develop new methods, for the assembly of haplotypes from short-read sequences obtained from a genetically variable sample
- To estimate evolutionary parameters using these reconstructed haplotypes, and compare these estimates against those obtained with sequences collected by standard methods
- To develop assembly-free methods for the estimation of evolutionary parameters
Activity: Analytic and predictive modelling in ecology and evolution
Project: Webs without strings: networks for evolution and ecology
Project leaders: Associate Professor Charles Semple and Professor Mike Steel
This project will pioneer the development of new mathematical and computational approaches to represent, model and analyse data in evolution and ecology. In evolution, networks (directed acyclic graphs) provide a tool to describe reticulate evolution (such as the formation of hybrid species, horizontal gene transfer and recombination) that simpler phylogenetic tree models fail to capture. At a population level, networks can represent family (pedigree) relationships between individuals back in time, recombination of sequences, and haplotype relationships. Yet the analysis of these networks is in its infancy and there are many questions we would like to study (see below). In ecology, networks describe dependencies between species (or populations); familiar examples include food webs. If some species go extinct, this can cause a flow-on effect through an ecosystem, so it is useful to model the impact of possible future extinction (perhaps due to climate change or other human intervention) using stochastic models based on ecological dependency networks. We will extend simple 'field of bullets' models of species extinction (originally developed by Raup, and Nee and May) to incorporate ecological dependencies and to predict the likely loss of phylogenetic diversity under various scenarios of species extinction.
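As an illustration of the simple baseline model, the following sketch computes the expected phylogenetic diversity (PD) remaining on a rooted tree when each species survives independently with a given probability. It does not include the ecological dependencies that this project proposes to add. By linearity of expectation, each edge contributes its length times the probability that at least one species below it survives. The tree encoding, the branch lengths, the survival probabilities and the function name are all hypothetical, and PD is taken here in its rooted form (edges are counted back to the root).

```python
# Hypothetical rooted tree. A node is (content, branch_length), where
# content is a species name (leaf) or a list of child nodes (internal).
tree = (
    [
        ([("A", 1.0), ("B", 1.0)], 2.0),
        ("C", 3.0),
    ],
    0.0,  # the root has no edge above it
)

# Assumed independent survival probability for each species.
survival = {"A": 0.5, "B": 0.5, "C": 0.5}


def expected_pd(node, survival_prob):
    """Expected rooted PD under independent species extinction."""

    def visit(current):
        content, length = current
        if isinstance(content, str):
            # Leaf: everything below is lost if this species is lost.
            p_all_extinct = 1.0 - survival_prob[content]
            below = 0.0
        else:
            p_all_extinct = 1.0
            below = 0.0
            for child in content:
                child_pd, child_extinct = visit(child)
                below += child_pd
                p_all_extinct *= child_extinct
        # The edge above this node is retained if any descendant survives.
        return below + length * (1.0 - p_all_extinct), p_all_extinct

    total, _ = visit(node)
    return total


print(expected_pd(tree, survival))
```

Introducing ecological dependencies would break the independence assumption used in the product over children, so a model of that kind would presumably need either a different analytic treatment or simulation.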
Questions
- Are networks determined by the genomic data at their leaves, or by the trees that they display?
- Can one develop efficient algorithms to reconstruct networks from data? How can one quantify the extent of horizontal gene transfer?
- How should conflicting gene trees be used to build a species supernetwork?
- What effect do ecological constraints have on the expected loss of future biodiversity?