Where did I come from?

Phylogenetic Relationships among Humans, again

Allan Wilson, with Linda Vigilant and other colleagues, estimated the phylogenetic relationships among the humans whom they sampled. Here you will be able to recreate their analysis to decide which of the competing hypotheses is supported. There are two parts: estimating the phylogenetic relationships among modern human populations, and estimating the age of the common ancestor of all modern humans and of all non-Africans.

Phylogenetic Relationships
The evolutionary relationships among people from different geographic regions will provide important evidence about the origins of modern humans. Here you have the opportunity to study the datasets of genetic sequences obtained by Wilson and his colleagues. Your analyses should help you to decide which hypothesis is a better explanation of human origins.

Get Geneious
Take a few minutes now to download and install Geneious, software for managing genetic sequences and constructing evolutionary trees. Spend a few minutes becoming familiar with navigating the program. Return to this window when you are ready to proceed. [Get Geneious]

Get FigTree
Take a few minutes now to download and install FigTree, software for viewing and manipulating phylogenetic trees. Return to this window when you are ready to proceed. [Get FigTree]

Get the Sequences
Two versions of the dataset are available. The large set, HumanDloop_large.zip (3 KB), contains 100 human sequences, while the small set, HumanDloop_small.zip (2 KB), contains a subset of 30 sequences. Decide whether you want the large or small dataset. Then right-click on the corresponding large or small sequence image, choose Save the Link As ... and save the file on your desktop computer. Extract the zip file before use.

Import the Human Sequences into Geneious
The sequences obtained by the Wilson group need to be imported into Geneious.

  • Make a working folder to store your analysis.
    • In Geneious, click on the Local folder in the left-hand Sources window.
    • In the File menu, choose New folder to make a new folder.
    • Give the folder a name.
  • Import the sequences.
    • Select your new folder.
    • In the File menu, choose Import ... From File ...
    • Select Auto-detect format and click OK.
    • Browse to find your file and import it.


The sequences are presented in an alignment. You can scroll through the samples or along the sequence to see how the samples differ from one another.

Build the tree
Now, you'll build the phylogenetic tree to identify the relationships among geographical regions.

  • In Geneious, open your working folder and select the alignment which you downloaded.
  • Click on the tree building tool (the tree icon).
  • In the dialog box that appears, choose the options shown below. As the Outgroup, choose the Chimpanzee sequence. 


The tree looks pretty complicated! [Have a peek.] How are you going to interpret this and compare it with the predicted trees?

Making the tree more readable
As we saw in the first study, there are some things that you can do to make the tree more manageable.

  • First you need to export the tree from Geneious and then import it into FigTree.
    • In Geneious, select the tree document.
    • From the File menu, choose Export : Selected Documents.
    • In the list of formats, choose Nexus.
    • Enter a filename, such as humantree and save the file on your desktop computer.
    • Open FigTree, and in the File menu choose Open.
    • Browse to your file humantree.nex and open it.
    • Note: if you have the professional version of Geneious, you can perform the following steps there instead of in FigTree.
  • Zoom in and Expand - make the details of the tree easier to read
  • Order nodes - arrange the tree by rotating clades (branches) to give it a tidy appearance
  • Colour the samples - select all of the samples from each region and give them a distinctive colour. You could colour the trees in at least two ways:
    1. Colour the samples from each major region (e.g., Africa, Asia, etc).
    2. Colour the samples from each region of Africa but leave the others black.

What different types of information do you get from the two strategies for colouring the tree?
These actions should bring your tree into a more readable form.

Interpret the Tree
Now you need to assess the tree, to decide whether it supports one or the other of the two hypotheses of human ancestry. Recall the differences in the trees predicted by the Multi-regional and Recent African Origin Hypotheses.

Here are some features of the tree which you need to consider:

  • Are there several deep divergences in the tree, or just one?
  • For the samples from each geographic region, how deep in the tree is the most recent common ancestor?
  • Do the samples from each geographic region have the same amount of genetic variation?
  • Are the African samples recently or anciently related to the samples from other regions?
  • For each non-African region, do the samples tend to descend from a single ancestor (or a closely related set of ancestors), or from multiple distinct lineages?

Are there major differences between this tree and the tree obtained in the first study?

Which hypothesis, Multi-regional or Recent African Origin, do you think is supported by this tree?

Consider the placement of African Americans in your tree. What does it tell us about their maternal ancestry? Were Wilson and his colleagues correct in using them as representatives of Africans in the first study? Does the placement of African Americans strengthen or weaken the support for either hypothesis?

If the Recent African Origin Hypothesis is correct, were all African populations represented among those who migrated out of the continent?

Age of Most Recent Common Ancestor
Recall that the Multi-regional Hypothesis predicts an ancient ancestor for all humans, occurring about a million years ago, while the Recent African Origin Hypothesis predicts a much more recent common ancestor.

Again, Wilson and colleagues estimated the age of the common ancestor of modern humans using the molecular clock. Certain types of genetic changes accumulate at an approximately constant rate. We can estimate that rate by comparing organisms which have been separated for a known period of time. When we observe genetic differences between two other organisms, we can estimate the date of their common ancestor as

Date of common ancestor = Observed genetic difference / Rate of Genetic change

Wilson had earlier estimated that the chimpanzee and human lineages diverged about 4 - 6 million years ago. He estimated that the genetic divergence in mtDNA between humans and chimpanzees was about 69.2%. Dividing that divergence by 6 and by 4 million years respectively, he estimated that genetic differences accumulated at a rate of 11.5 - 17.3% per million years.
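The arithmetic behind that range of rates can be checked with a short sketch. The function and variable names here are illustrative only and do not come from Wilson's study; the numbers are the ones quoted above.

```python
def clock_rate(divergence_percent, split_age_myr):
    """Rate of change, in percent divergence per million years."""
    return divergence_percent / split_age_myr


# Human-chimpanzee mtDNA divergence quoted in the text, in percent.
HUMAN_CHIMP_DIVERGENCE = 69.2

# An older split (6 million years) implies a slower clock,
# a younger split (4 million years) implies a faster one.
lo_rate = clock_rate(HUMAN_CHIMP_DIVERGENCE, 6.0)
hi_rate = clock_rate(HUMAN_CHIMP_DIVERGENCE, 4.0)

print(f"Lo Rate: {lo_rate:.1f}% per million years")
print(f"Hi Rate: {hi_rate:.1f}% per million years")
```

The Lo Rate and Hi Rate printed here correspond to the two ends of the range, 11.5% and 17.3% per million years.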

Look at your tree and try to find the common ancestor of all humans. Use Wilson's estimate of the rate of genetic change to infer the age of modern humans.

                         Divergence   Hi Rate   Lo Rate
  Clade of all humans    2.87%
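As a rough sketch of how the table can be filled in, the divergence is divided by each rate and converted from millions of years to years. The helper name is hypothetical; the rates and the 2.87% divergence are the values given above, and you should substitute the divergence from your own tree if it differs.

```python
def age_in_years(divergence_percent, rate_percent_per_myr):
    """Date of common ancestor = observed divergence / rate of change."""
    return divergence_percent / rate_percent_per_myr * 1_000_000


HI_RATE = 17.3  # percent per million years (fast clock)
LO_RATE = 11.5  # percent per million years (slow clock)
divergence_all_humans = 2.87  # percent, from the table above

# A faster clock needs less time to build up the same divergence,
# so the Hi Rate gives the younger age and the Lo Rate the older one.
youngest_age = age_in_years(divergence_all_humans, HI_RATE)
oldest_age = age_in_years(divergence_all_humans, LO_RATE)

print(f"Hi Rate: about {youngest_age:,.0f} years")
print(f"Lo Rate: about {oldest_age:,.0f} years")
```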


Is this age similar to the age obtained in the first study?

Which hypothesis, Multi-regional or Recent African Origin, do you think is supported by these ages or dates?

More on Out of Africa
How can we estimate the age of the common ancestor of non-Africans? In other words, if non-Africans are descended from a group of Africans who left that continent, then when did they do so? Wilson and colleagues did not explicitly address this question in this study. Have you any ideas on how we can estimate this date? Is there information in the tree that you estimated which might help?

Remember that the molecular clock is based on the idea that change occurs at a relatively constant rate. The amount of genetic divergence in a clade of the tree should increase with the age of the clade. Evolutionary biologists have identified many different biological and statistical reasons why this does not always hold, but the general trend remains.

So if the members of a clade on a tree have 75% as much genetic divergence as the members of a larger, enclosing clade, then as a rough estimate the smaller clade is 75% as old as the larger clade.

We can estimate the age of human departure from Africa as

(age of human ancestor) * (divergence in clade containing all non-Africans) / (divergence in clade containing all humans)

or, in symbols,

Age_nonAf = Age_allHuman * D_nonAf / D_allHuman
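The proportional scaling above can be sketched in a few lines. All of the input values below are made up for illustration (they reproduce the 75% example from the text) and are not results from the study; the function name is likewise hypothetical.

```python
def scaled_clade_age(age_all_human, d_non_af, d_all_human):
    """Age of the inner clade, assuming age scales with divergence."""
    if d_all_human <= 0:
        raise ValueError("divergence of the enclosing clade must be positive")
    return age_all_human * d_non_af / d_all_human


# Hypothetical inputs: an enclosing clade 200,000 years old with 3.2%
# divergence, and an inner clade with 2.4% divergence (75% as much).
example_age = scaled_clade_age(200_000, 2.4, 3.2)

print(f"Inner clade: about {example_age:,.0f} years")
```

Because the inner clade has 75% of the divergence, it comes out at 75% of the age of the enclosing clade.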

To get those estimates of divergence, let's return to the tree that you estimated in Geneious. Click on the node at the base of the clade containing all humans. This should select all of the human, but not the chimpanzee, sequences.


Now switch from the Tree View to the Alignment View.


Go to the Statistics panel on the right side of the tree. It will show you the Pairwise Percent Identity. This is a measure of the average similarity among the human sequences. So the average divergence is simply 100% - %Identity. In this example the average divergence is 100% - 96.8% = 3.2%.
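To see what this statistic measures, here is a minimal sketch that averages the percent identity over every pair of sequences in a toy alignment and subtracts it from 100%. It is not Geneious's own code: the treatment of gaps (columns with a gap in either sequence are skipped) is an assumption, and the three short sequences are invented.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent of compared alignment columns where two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: ignore any column where either sequence has a gap.
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def mean_pairwise_divergence(sequences):
    """Average divergence = 100% - mean pairwise percent identity."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    mean_identity = sum(percent_identity(a, b) for a, b in pairs) / len(pairs)
    return 100.0 - mean_identity


# Invented toy alignment: three sequences of ten bases each.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC"]
toy_divergence = mean_pairwise_divergence(toy_alignment)

print(f"Average divergence: {toy_divergence:.1f}%")
```

In the toy alignment the three pairs are 90%, 90% and 80% identical, so the average divergence is a little over 13%.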


Next you need to find the base of the clade containing all non-Africans. [This clade will also contain some Africans, of course. Why?] Find the Pairwise % Identity for this clade, and from it calculate the divergence.



Now return to the table of ages. Fill it in using the age of all humans estimated above and the divergences estimated from your tree, then estimate the date when humans may have left Africa.

                               Your Observed   Ratio of   Hi Rate   Lo Rate
  Clade of all humans          ________%
  Clade of all non-Africans    ________%


What is the estimated range of dates for the departure of humans from Africa?

Are these dates similar to those obtained in the first study?