Software and Supplementary Materials

This page archives supplementary materials from previous AWC publications and lists software produced by AWC members that is available to download.

Index-free De Novo Assembly of Mixed Mitochondrial Genomes
The scripts are described in the paper "Index-free de novo assembly and deconvolution of mixed mitochondrial genomes" by McComish BJ, Hills SFK, Biggs P and Penny D (2010, submitted to Genome Biology and Evolution).

Contact Details
Bennet McComish, Massey University, Palmerston North. Phone: +64 6 356 9099 extn 2569
index_free_assembly.rar (21 KB)

NTRFinder 1.0
NTRFinder 1.0 finds Nested Tandem Repeats (NTRs). The program takes a FASTA file as input and displays the output in the text area of its main window. It was developed under Java JDK 1.5 and requires Java Runtime Environment 1.5 or later. (20,844 KB)
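To illustrate the kind of input NTRFinder works on, here is a minimal sketch that parses FASTA text and reports exact tandem repeats of a fixed period. It is not NTRFinder's algorithm: the real program also handles nested and approximate repeats, as described in the paper listed under "Supplementary Material for NTRFinder program". The function names `read_fasta` and `exact_tandem_repeats` are hypothetical.

```python
from typing import Dict, List, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].strip()
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def exact_tandem_repeats(seq: str, period: int, min_copies: int = 2) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for runs of at least min_copies exact,
    adjacent copies of a motif of length `period` (no mismatches, no nesting)."""
    hits = []
    i = 0
    while i + period <= len(seq):
        motif = seq[i:i + period]
        copies = 1
        # Extend the run while the next block of `period` bases repeats the motif.
        while seq[i + copies * period:i + (copies + 1) * period] == motif:
            copies += 1
        if copies >= min_copies:
            hits.append((i, motif, copies))
            i += copies * period  # skip past the whole run
        else:
            i += 1
    return hits
```

For example, `exact_tandem_repeats(read_fasta(">s1\nGGACGAC\nGACTT\n")["s1"], 3)` reports one run of three copies of `GAC` starting at position 1.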

Spectronet 1.27
Please uninstall any existing version of Spectronet before installing this version. This new version of Spectronet includes Closest Tree and cluster-powered Fast Hadamard Transform algorithms, the Treeness Triangle for visualising sequence information, and several bug fixes. Consult the in-program help for more information on the new features.
Spectronet127.exe (815 KB)
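A fast Hadamard transform multiplies a vector of length 2^n by a Sylvester Hadamard matrix in O(n · 2^n) operations instead of O(4^n). The sketch below shows only that core transform, checked against the explicit matrix product; how Spectronet indexes splits, uses the transform in spectral analysis, or distributes it across a cluster is not shown. The function names are hypothetical.

```python
from typing import List


def fast_hadamard(values: List[float]) -> List[float]:
    """Unnormalised fast Walsh-Hadamard transform.

    Computes H @ values, where H[i][j] = (-1) ** popcount(i & j),
    for a vector whose length is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly step
        h *= 2
    return out


def naive_hadamard(values: List[float]) -> List[float]:
    """O(n^2) reference: explicit product with the Sylvester Hadamard matrix."""
    n = len(values)
    return [sum(((-1) ** bin(i & j).count("1")) * values[j] for j in range(n))
            for i in range(n)]
```

Because H·H = nI, applying `fast_hadamard` twice and dividing by the vector length recovers the original vector, so the same routine also serves as the inverse transform.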

Min-Max Squeeze
This program implements the methods for finding lower bounds on the parsimony score of an alignment described in
Holland, B. R., K. T. Huber, D. Penny, and V. Moulton. 2005. The MinMax Squeeze: Guaranteeing a minimal tree for population data. Mol. Biol. Evol. 22:235-242.
and Pierson, M. J., R. Martinez-Arias, B. R. Holland, N. J. Gemmell, M. E. Hurles, and D. Penny. 2006. Deciphering Past Human Population Movements in Oceania: Provably Optimal Trees of 127 mtDNA Genomes. Mol. Biol. Evol. 23(10).
mms (364 KB)
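The principle behind a lower-bound "squeeze" can be sketched as follows. The parsimony score of any candidate tree is an upper bound on the optimum; if a lower bound valid for every tree equals that score, the candidate is provably optimal. This sketch uses only the simplest per-site bound (a site showing k distinct states needs at least k - 1 changes) together with Fitch parsimony on a rooted binary tree written as nested tuples. The bounds in the papers above are sharper, and all names here are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf name, or a pair of subtrees


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one site: (state set at this node, changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, lc = fitch_site(tree[0], states)
    right, rc = fitch_site(tree[1], states)
    common = left & right
    if common:
        return common, lc + rc
    return left | right, lc + rc + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Upper bound: the Fitch parsimony score of one candidate tree."""
    length = len(next(iter(alignment.values())))
    return sum(fitch_site(tree, {t: s[i] for t, s in alignment.items()})[1]
               for i in range(length))


def trivial_lower_bound(alignment: Dict[str, str]) -> int:
    """Any tree needs at least (k - 1) changes at a site showing k distinct states."""
    length = len(next(iter(alignment.values())))
    return sum(len({s[i] for s in alignment.values()}) - 1 for i in range(length))


def squeeze(tree: Tree, alignment: Dict[str, str]) -> Tuple[int, int, bool]:
    """Return (lower bound, upper bound, provably optimal?)."""
    lower = trivial_lower_bound(alignment)
    upper = parsimony_score(tree, alignment)
    return lower, upper, lower == upper
```

When the two bounds differ the result is inconclusive, which is where sharper lower bounds become important.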

Two-States Triplet Markov — 2STM
2STM estimates Markov matrices from 2-state character data for three sequences simultaneously. The program reads 4-state nucleotide data sets and outputs estimates of the three Markov matrices from the root to each taxon. It also assesses the variability of these estimates by bootstrapping and reports simple statistics, such as nucleotide composition in either 4-state or 2-state coding.

Executable Windows program, C code and example data can be obtained from this WinZip file (224 KB).
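The composition and bootstrap statistics mentioned above can be illustrated with a small sketch. It does not estimate the Markov matrices; it only computes nucleotide composition in 4-state and 2-state form and the bootstrap standard deviation of one frequency by resampling alignment columns. The purine/pyrimidine (RY) coding is an assumption for illustration and may not be the 2-state coding 2STM uses; the function names are hypothetical.

```python
import random
from collections import Counter
from statistics import stdev
from typing import Dict, List

# Assumed 2-state coding: purines (A, G) -> R, pyrimidines (C, T) -> Y.
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def composition(seqs: List[str], two_state: bool = False) -> Dict[str, float]:
    """Relative frequency of each state over all sequences, ignoring gaps and ambiguity codes."""
    counts: Counter = Counter()
    for seq in seqs:
        for ch in seq.upper():
            if ch in RY:
                counts[RY[ch] if two_state else ch] += 1
    total = sum(counts.values())
    return {state: n / total for state, n in sorted(counts.items())}


def bootstrap_composition(seqs: List[str], state: str, reps: int = 200, seed: int = 1) -> float:
    """Standard deviation of one state's frequency across bootstrap resamples of columns."""
    rng = random.Random(seed)
    n_sites = len(seqs[0])
    two_state = state in ("R", "Y")
    values = []
    for _ in range(reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]  # sample columns with replacement
        resampled = ["".join(seq[c] for c in cols) for seq in seqs]
        values.append(composition(resampled, two_state).get(state, 0.0))
    return stdev(values)
```

With three sequences `["ACGT", "AAGT", "ACGG"]`, the 4-state composition is A = G = 1/3 and C = T = 1/6, and the 2-state composition is R = 2/3, Y = 1/3.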

Contact Details
Michael Woodhams

Site Strip Search — For site-stripping analyses of nucleotide alignments
This script selects subsets of sites from a given alignment, chosen according to the homoplasy of each site. The resulting data set can be sent automatically to PAUP* or MrBayes for further analysis.
This beta version has been tested on Linux and Windows operating systems.
The Perl script can be downloaded as a zip or gzip archive. (79 KB)
site_strip_search.gz (79 KB)
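A site-stripping step can be sketched in a few lines, under the assumption that homoplasy is measured as the extra Fitch parsimony steps a site needs on a fixed rooted binary tree (observed changes minus the minimum, distinct states - 1). The script's own homoplasy measure and its hand-off to PAUP* or MrBayes are not reproduced here, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf name, or a pair of subtrees


def fitch_changes(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch parsimony for a single site on a rooted binary tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch_changes(tree[0], states), fitch_changes(tree[1], states)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)


def extra_steps(tree: Tree, alignment: Dict[str, str]) -> List[int]:
    """Homoplasy per site: observed changes minus the minimum possible (distinct states - 1)."""
    n_sites = len(next(iter(alignment.values())))
    result = []
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        _, changes = fitch_changes(tree, column)
        result.append(changes - (len(set(column.values())) - 1))
    return result


def strip_sites(tree: Tree, alignment: Dict[str, str], max_extra: int) -> Dict[str, str]:
    """Keep only the sites whose homoplasy does not exceed max_extra."""
    keep = [i for i, h in enumerate(extra_steps(tree, alignment)) if h <= max_extra]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}
```

Stripping at successively lower thresholds gives a series of alignments that can each be analysed to see how the inferred tree changes as homoplastic sites are removed.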

Contact Details
Warwick Allen

Quartet-Imputation Supernetworks
This program implements the methods described in
Holland, B. R., G. Conner, K. Huber, V. Moulton. 2006. Imputing supertrees and supernetworks from quartets. Systematic Biology (to appear).
and Holland, B. R., G. Conner, K. T. Huber, V. Moulton. 2006. Imputing supertrees and supernetworks from quartets (1-page abstract). In: 6th Workshop on Algorithms in Bioinformatics (WABI 2006), Eds B. Moret and P. Bücher, Lecture Notes in Bioinformatics 4175:162.

Contact Details
Barbara Holland

Genotyping Utilities Package
GenoTyper Rearranger (GTR) is a utility that converts AFLP data output by genotyping programs (currently ABI's GeneMapper and SoftGenetics' GeneMarker) into various formats, allowing easier display and manipulation.

AFLP Replicate Difference Calculator is a utility that calculates the difference in a parameter (for example, peak height) between replicates in a table of AFLP data. The script takes two inputs: the AFLP data and a table declaring which samples are replicates of which other samples.
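The calculation described above can be sketched as follows, assuming the AFLP data has already been read into a mapping from sample to {locus: peak height} and the replicate table into a list of sample pairs. The real utility is a Perl script whose input formats are described on the AFLP page; the table layout, the choice to treat a peak missing from one replicate as 0, and the function name are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: sample -> {locus (e.g. fragment size) -> peak height}
AflpTable = Dict[str, Dict[str, float]]


def replicate_differences(
    data: AflpTable, replicates: List[Tuple[str, str]]
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """For each declared replicate pair, the absolute difference in the parameter at every locus.

    A locus present in only one replicate is treated as 0 (no peak called) in the other.
    """
    result = {}
    for a, b in replicates:
        loci = sorted(set(data[a]) | set(data[b]))
        result[(a, b)] = {
            locus: abs(data[a].get(locus, 0.0) - data[b].get(locus, 0.0)) for locus in loci
        }
    return result
```

For instance, with `S1` = {100: 850, 152: 300} and its replicate `S1r` = {100: 800, 210: 120}, the differences are 50 at locus 100, 300 at locus 152 and 120 at locus 210.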

See the AFLP page for more details.

The Perl scripts can be downloaded as a zip archive or a gzip tarball. (116 KB)
quartet_source-1_0_tar.gz (36 KB)

Contact Details
Warwick Allen

LineageSpecificSeqgen
LineageSpecificSeqgen is an extension to the Seq-Gen program that allows generation of sequences with both changes in the proportion of variable sites and changes in the rate at which sites switch between being variable and invariable.
Ref: Shavit L, Penny D, Hendy MD, Holland BR: LineageSpecificSeqgen: generating sequence data with lineage-specific variation in the proportion of variable sites. BMC Evolutionary Biology 2008 (in press). (2,367 KB)
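The switching of sites between variable and invariable states can be pictured as a two-state continuous-time Markov process along each branch. The sketch below simulates only that on/off part, using a covarion-style parameterisation chosen for illustration: off to on at rate s·p and on to off at rate s·(1 - p), so that p is the stationary proportion of variable sites. Giving a branch its own p makes the proportion lineage-specific. This is not LineageSpecificSeqgen's code, the substitution process at variable sites is omitted, and the names are hypothetical.

```python
import random


def evolve_switch_state(variable: bool, branch_length: float, p_var: float,
                        switch_rate: float, rng: random.Random) -> bool:
    """Simulate one site's variable/invariable state along a branch.

    Two-state continuous-time Markov chain: invariable -> variable at rate
    switch_rate * p_var, variable -> invariable at rate switch_rate * (1 - p_var).
    """
    t = 0.0
    while True:
        rate = switch_rate * ((1.0 - p_var) if variable else p_var)
        if rate <= 0.0:
            return variable  # the site can no longer leave its current state
        t += rng.expovariate(rate)  # waiting time to the next switch
        if t > branch_length:
            return variable
        variable = not variable


def proportion_variable_after(n_sites: int, p_root: float, p_branch: float,
                              branch_length: float, switch_rate: float, seed: int = 0) -> float:
    """Fraction of variable sites at the end of a branch whose own proportion is p_branch."""
    rng = random.Random(seed)
    states = [rng.random() < p_root for _ in range(n_sites)]
    end = [evolve_switch_state(s, branch_length, p_branch, switch_rate, rng) for s in states]
    return sum(end) / n_sites
```

On a long branch with frequent switching, the proportion of variable sites moves from the ancestral value toward the branch's own p; with a switching rate of zero, every site keeps its ancestral state.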

Supplementary Material for Treeness Triangle Paper
White W.T.¹, Hills S.F.¹, Gaddam R.¹, Holland B.R.¹, Martin W.² and Penny D.¹ (2007) Treeness Triangles: Visualizing the loss of phylogenetic signal (submitted to Molecular Biology and Evolution)

Contact Details
1. Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
2. Institute of Botany III, University of Düsseldorf, Düsseldorf, Germany. Email:

File Content Format
TT_Suppl_Information_6-3-2007_DP.doc (260 KB) Supplementary tables and figures MS Word
(3,915 KB) The chloroplast data analyzed in the paper WinZIP
Readme How to set up and use the Treeness Triangle software HTML
(320 KB) Treeness Triangle software for Windows users WinZIP
treeness_triangle_mac_osx_10_3_9_tar.gz (561 KB) Treeness Triangle software for Mac OS X users tar gzip
treeness_triangle_source_tar.gz (51 KB) Source code for the Treeness Triangle software (also needed by users of Linux and other Unix operating systems) tar gzip

Supplementary Material for Oceanic Paper
Pierson MJ, Martinez-Arias R, Holland BR, Gemmell NJ, Hurles ME, Penny D. (2006) Deciphering Past Human Population Movements in Oceania: Provably Optimal Trees of 127 mtDNA Genomes. Mol Biol Evol. 2006 Jul 19.

Contact Details
Melanie Pierson. Email:

File Content Format
B4a.nex Data set Nexus
B5a.nex   Nexus
M7bcM22.nex   Nexus
M27M28.nex   Nexus
oceaniadataset.nex   Nexus
PwithR21.nex   Nexus
QwithM29.nex   Nexus
supplementary information
DOWNLOADS_Oceanic_supplementary.pdf (1,081 KB)

Supplementary Material for Hexapods Paper
Delsuc F, Phillips MJ and Penny D (2003). Comment on: "Hexapod origins: monophyletic or paraphyletic?". Science 301: 1482d.

Contact Details
Professor David Penny, Research Director, Professor of Theoretical Biology, Massey University, Palmerston North. Phone: +64 6 350 5033 Fax: +64 6 350 5626 Email:

File Content Format
arthro35-taxonomy.doc (25 KB) Taxonomy of 35-taxon dataset MS Word
arthro25-taxonomy.doc (24 KB) Taxonomy of 25-taxon dataset MS Word
arthro35-data 35-taxon dataset Nexus
arthro25-data 25-taxon dataset Nexus

Supplementary Material for NTRFinder program
NTRFinder: An Algorithm to Find Nested Tandem Repeats. A. A. Matroud, M. D. Hendy and C. P. Tuffley

Contact Details
Atheer Matroud, Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Private Bag 11 222. Email:

File Content Format

Using Ancestral Sequences to Uncover Potential Gene Homologs
L. J. Collins¹⁺, A. M. Poole¹˒² and D. Penny¹

Gene homologs between distantly related species can be difficult to identify. We test the idea that inferred ancestral sequences could aid in finding gene homologs. Ancestral sequences are inferred by aligning gene homologs on a known tree and estimating the most likely amino acid for each position at each node in that tree. BLAST and HMMER are used separately, and together with ancestral sequences, to search the genome sequence databases of Encephalitozoon cuniculi, Entamoeba histolytica and Giardia lamblia for RNase P protein homologs. The RNase P proteins Pop4, Pop1, Pop5 and Rpp21 have been reported in humans and at least two other eukaryotic species but have yet to be identified in the above genomes. Using ancestral sequence reconstruction (ASR) for these proteins, we successfully identified putative homologs from E. histolytica, G. lamblia and E. cuniculi. In some cases ASR outperformed BLAST and HMMER. Overall, including ancestral sequences in searches with BLAST and/or HMMER was the most successful approach, recovering four potential RNase P protein gene homologs from G. lamblia, which makes this a useful technique for early homolog identification.
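The step of "estimating the most likely amino acid for each position at each node" can be sketched as taking, for one node, the highest-probability residue at every position and writing the result as a FASTA query for BLAST or HMMER. The per-position probabilities are assumed to come from a separate reconstruction program (not shown); the function names and the optional masking of uncertain positions with `X` are hypothetical and not necessarily part of the authors' procedure.

```python
from typing import Dict, List


def ancestral_query(site_probs: List[Dict[str, float]], min_prob: float = 0.0) -> str:
    """Most probable amino acid at each position of one node.

    Positions whose best residue has probability below min_prob are written as 'X'.
    """
    residues = []
    for probs in site_probs:
        best = max(probs, key=probs.get)
        residues.append(best if probs[best] >= min_prob else "X")
    return "".join(residues)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as FASTA text, wrapping lines at `width` characters."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

For three positions with probabilities {M: 0.9, L: 0.1}, {K: 0.4, R: 0.35, Q: 0.25} and {P: 0.6, A: 0.4}, the reconstructed query is `MKP`, or `MXP` if positions below probability 0.5 are masked.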

Contact Details
1 Allan Wilson Centre for Molecular Ecology and Evolution, Institute of Molecular BioSciences, Massey University, Private Bag 11222, Palmerston North, New Zealand.
2 Department of Molecular Biology and Functional Genomics, Stockholm University, SE-106 91, Stockholm, Sweden. + Corresponding Author:

File Format
Supp_FiguresB.doc (601 KB) MS Word
Supp_Tables.doc (123 KB) MS Word

Avian Datasets and Supplementary Information
Contact Details
All queries to Gillian:

Kerryn E. Slack, Frédéric Delsuc, P.A. (Trish) McLenachan, Ulfur Arnason and David Penny (2006) Resolving the root of the avian mitogenomic tree by breaking up long branches. Molecular Phylogenetics and Evolution (in press)

File Content Coding Format Update
30b6r.12SLnt3ry 30 birds + all 6 reptile outgroups DNA (1+2 NT, 3 RY, S+L NT) Nexus 29/05/06
30b.12nt3rySLnt 30 birds DNA (1+2 NT, 3 RY, S+L NT) Nexus 29/05/06
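One reading of the coding column "1+2 NT, 3 RY" is that first and second codon positions are kept as nucleotides while third positions are recoded as purines (R) or pyrimidines (Y). The sketch below applies that recoding to an in-frame protein-coding alignment; this interpretation is our assumption, the "S+L NT" component is not sketched, and the function names are hypothetical.

```python
from typing import Dict

# Third-position RY coding: purines (A, G) -> R, pyrimidines (C, T) -> Y.
RY_CODE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def ry_code_third_positions(seq: str) -> str:
    """Recode codon position 3 as R/Y and keep positions 1 and 2 as nucleotides.

    Assumes the sequence starts at codon position 1. Characters other than
    A, C, G and T (gaps, N, ...) are left unchanged.
    """
    return "".join(
        RY_CODE.get(base, base) if i % 3 == 2 else base
        for i, base in enumerate(seq.upper())
    )


def recode_alignment(alignment: Dict[str, str]) -> Dict[str, str]:
    """Apply third-position RY coding to every sequence in an alignment."""
    return {taxon: ry_code_third_positions(seq) for taxon, seq in alignment.items()}
```

RY-coding third positions keeps the slower transversion signal at those sites while discarding the fast, saturation-prone transitions.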


K. E. Slack, C. M. Jones, T. Ando, G. L. (Abby) Harrison, E. Fordyce, U. Arnason and D. Penny (2006) Early penguin fossils, plus mitochondrial genomes, calibrate avian evolution. Molecular Biology and Evolution 23: 1144-1155.

File Content Coding Format Update
25b+6rept 25 birds + all 6 reptile outgroups DNA (1+2 NT, 3 RY, S+L NT) Nexus 16/11/05


Harrison G.L., McLenachan P.A., Phillips M.J., Slack K.E., Cooper A., and Penny, D. (2004) Four new avian mitochondrial genomes help get to basic evolutionary questions in the Late Cretaceous. Mol. Biol. Evol. 21(6):974-983.

File Content Coding Format Update
24b6r12n3rSLn 24 birds + all 6 reptile outgroups DNA (1+2 NT, 3 RY, S+L NT) Nexus 29/05/06
Bird_Taxa_List.doc (33 KB) MS Word 20/2/04
Revised_Bird_Annotations.doc (43 KB) MS Word 20/2/04
Bird_Tables_Headings.doc (23 KB) MS Word 20/2/04
Bird_Tables.xls (96 KB) MS Excel 20/2/04
Reptile_Taxa_List.doc (23 KB) MS Word 24/7/02
Revised_Reptile_Annotatns.doc (101 KB) MS Word 9/9/02
Reptile_Tables_Headings.doc (23 KB) MS Word 24/7/02
Reptile_Tables.xls (37 KB) MS Excel 24/7/02
(73 KB) WinZip 20/2/04
sup_inf_penguin_goose_tar.gz (70 KB) Tar gzip 20/2/04



Slack K. E., Janke A., Penny D. & Arnason U. (2003). Two new avian mitochondrial genomes (penguin and goose) and a summary of bird and reptile mitogenomic features. Gene 302: 43-52.

File Content Coding Format Update
19bird 19 birds Protein (Amino acid) PHYLIP 26/8/02
19 birds + 2 crocodilians Protein (Amino acid) PHYLIP 26/8/02
19 birds + 2 lizards Protein (Amino acid) PHYLIP 26/8/02
19 birds + 2 turtles Protein (Amino acid) PHYLIP 26/8/02
19 birds + all 6 reptile outgroups Protein (Amino acid) PHYLIP 29/05/06
(400 KB) ALL datasets WinZip 20/05/06
datasets_tar.gz (336 KB) Tar gzip 20/2/04

Recent Sequences
Contact Details

All queries to Matt Phillips Email:

Mammals (current) Nexus
Laurasiatherian AA Nexus
Laurasiatherian RNA Nexus
Laurasiatherian12 Nexus
Pika/Vole AA Nexus
Pika/Vole RNA Nexus


2006 Asian Institute in Statistical Genetics and Genomics at Jeju Island, Korea
(lectures by Matt Phillips and Barbara Holland)

SK_OverviewPhylogenetics.ppt (3,906 KB) Overview of phylogenetic methods and applications MS PowerPoint
SK_DistanceBasedMethods.ppt (379 KB) Distance-based methods for estimating phylogenetic trees MS PowerPoint
SK_Parsimony_and_search.ppt (606 KB) Parsimony and searching tree space MS PowerPoint
SK_MaximumLikelihood.ppt (2,107 KB) Maximum likelihood and model selection MS PowerPoint
SK_BayesianMethods.ppt (2,980 KB) Bayesian inference and molecular dating MS PowerPoint
SK_BtspConsensusSupertrees.ppt (193 KB) The bootstrap, consensus trees, and supertrees MS PowerPoint
SK_SplitsGraphs.ppt (489 KB) Exploring phylogenetic data with splits graphs MS PowerPoint
SK_DifficultProblems.ppt (3,559 KB) Difficult problems ... and solutions MS PowerPoint