A Refined Phylogeny for mtDNA Haplogroup J
This short report presents an updated version of the phylogeny for mtDNA Haplogroup J based on 253 full genome sequences plus 38 sequences that are complete in the coding region but incomplete in the control region.
preliminary phylogeny for Haplogroup J was presented in “The Subclades of mtDNA Haplogroup J” (
established an initial structure for Haplogroup J, based on 111 full genome sequences, the next step was to develop a broader perspective on the haplogroup, and the results of this work were presented in the article “A Comprehensive Analysis of mtDNA Haplogroup J” (
availability of mtDNA sequences continues to grow, with over 5400 human mitochondrial "full genome sequences" currently available in GenBank, of which over 200 sequences are Haplogroup J.
In addition, a number of full genome sequences were made available through the Haplogroup J testing program at Family Tree
Analysis of sequences and development of a matrix from which the phylogeny was inferred are as described in previous reports (
A formal definition of the clades is given in Table 1. Each clade name is enclosed in a box that is indented from the left to show the hierarchical level of that clade. It is followed by a list of the polymorphisms relative to the revised Cambridge Reference Sequence (Andrews et al 1999; MitoMap 2008). For the convenience of those who wish to use this table to estimate the clade of a sequence from control region test results, the control region polymorphisms are shown in bold and those from the HVR2 region are further italicized. Each polymorphism is shown as a numeric position preceded by a letter indicating the reference sequence allele and followed by a letter indicating the observed allele. Exceptions are insertions, for which there is no reference value, and deletions, for which a "d" suffix indicates the absence of a nucleotide at that position. For back mutations, the position number is followed by an "@" rather than a repetition of the reference value. Underscored polymorphisms have a significant homoplasic presence in other clades of Haplogroup J. Similarly, parentheses indicate that a given polymorphism is absent from a significant number of sequences, for example because of back mutations. For the convenience of those who may wish to evaluate the support for these definitions, the number of times that the indicated set of polymorphisms occurred in the database is shown near the first column.
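For readers who wish to process the polymorphism lists programmatically, the notation described above can be parsed roughly as follows. This is only a sketch under stated assumptions: the names `Polymorphism` and `parse_polymorphism` are hypothetical, and the insertion spelling (position, an optional ".n" index, then the inserted base or bases, as in "315.1C") is an assumed convention that may differ from the one used in Table 1.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Polymorphism:
    position: int
    ref: Optional[str]       # reference (rCRS) allele; None for insertions
    observed: Optional[str]  # observed allele; None for deletions
    kind: str                # substitution, insertion, deletion or back_mutation


# Optional reference letter, position, optional ".n" insertion index,
# then observed base(s), "d" (deletion) or "@" (back mutation).
_TOKEN = re.compile(r"^([ACGT])?(\d+)(?:\.(\d+))?([ACGT]+|d|@)$")


def parse_polymorphism(token: str) -> Polymorphism:
    """Parse one token such as T16092C, A2706@, A249d or 315.1C."""
    match = _TOKEN.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognized polymorphism: {token!r}")
    ref, pos, ins_index, suffix = match.groups()
    position = int(pos)

    if suffix == "@":
        # Back mutation: the observed allele has reverted to the reference.
        if ref is None or ins_index is not None:
            raise ValueError(f"malformed back mutation: {token!r}")
        return Polymorphism(position, ref, ref, "back_mutation")
    if suffix == "d":
        # Deletion: no nucleotide is observed at this position.
        if ins_index is not None:
            raise ValueError(f"malformed deletion: {token!r}")
        return Polymorphism(position, ref, None, "deletion")
    if ref is None:
        # Insertion: no reference value exists (assumed spelling, see above).
        return Polymorphism(position, None, suffix, "insertion")
    if ins_index is not None or len(suffix) != 1:
        raise ValueError(f"malformed substitution: {token!r}")
    return Polymorphism(position, ref, suffix, "substitution")
```

For example, `parse_polymorphism("G513A")` yields a substitution at position 513 with reference allele G and observed allele A, while `parse_polymorphism("A2706@")` yields a back mutation whose observed allele equals the reference.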
A Phylogeny for mtDNA Haplogroup J
This data is also presented here in the form of a two-part graphic. Figure 1 shows the overall structure of mtDNA Haplogroup J and the details of the various subclades, except for the J1c subclade, which is detailed within its context in Figure 2.
Figure 1 The phylogeny of mtDNA Haplogroup J in tree format.
Figure 2 Details of mtDNA subclade J1c, shown in context of overall Haplogroup J.
A matrix of the aligned and analyzed haplotypes used in the development of this phylogeny is available in the supplementary material. Note that selected columns of the matrix are lightly shaded to indicate those sequences that are complete in the coding region but not in the control region. Thus, empty cells that are shaded and correspond to control region polymorphisms should be considered as "not known" rather than "no polymorphism." This matrix, along with the table and figures, will be periodically updated in the supplementary data files as new information becomes available.
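Anyone processing the matrix programmatically may find it helpful to make the "not known" state explicit rather than treating every empty cell as absence. The following is a minimal sketch, not a description of the supplementary file format: the names `Cell`, `in_control_region` and `cell_state` are hypothetical, and the control region is assumed to span positions 16024 to 16569 and 1 to 576 of the rCRS, the boundaries commonly cited for it.

```python
from enum import Enum

RCRS_LENGTH = 16569  # length of the revised Cambridge Reference Sequence


class Cell(Enum):
    PRESENT = "polymorphism present"
    ABSENT = "no polymorphism"
    UNKNOWN = "not known"


def in_control_region(position: int) -> bool:
    """True if the position lies in the control region (assumed 16024-576)."""
    if not 1 <= position <= RCRS_LENGTH:
        raise ValueError(f"position out of range: {position}")
    return position <= 576 or position >= 16024


def cell_state(position: int, marked: bool, control_region_complete: bool) -> Cell:
    """Interpret one matrix cell for one sequence.

    marked: the cell records a polymorphism at this position.
    control_region_complete: False for the shaded columns, i.e. sequences
    that are complete in the coding region only.
    """
    if marked:
        return Cell.PRESENT
    if in_control_region(position) and not control_region_complete:
        # An empty shaded cell at a control region position carries no information.
        return Cell.UNKNOWN
    return Cell.ABSENT
```

With this convention, an empty cell at position 16069 is reported as `Cell.UNKNOWN` for a coding-region-only sequence and as `Cell.ABSENT` for a full genome sequence, so that incomplete sequences neither support nor contradict a control region marker.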
The purpose of this brief report is to make the updated phylogeny presented here freely available to all interested parties. However, the results must be considered a work in progress and further refinements may take place as new data is acquired and analyzed. In particular, some of the clade definitions at the end of limbs and branches must be considered tentative because they are based on small sample sizes and will be confirmed or restructured with further analysis incorporating additional data. Furthermore, the nomenclature is subject to change as a result of harmonization with other researchers. For example, an active effort is underway to harmonize this work with the updates to the tree of van Oven and Kayser (2009).
As of this writing, three clusters have been flagged for possible future definition as subclades. Each of these is clearly identifiable in the supplementary matrix, and all three are marked on the graphic version of the phylogeny. However, they are not included in Table 1, which contains the definitions of the subclades.
The first issue is the apparent further subdivision of clade J1c8 that is clearly visible in the supplementary matrix. Upon closer examination it was determined that some sequences were reported to have heteroplasmic results at T16092, whereas others reported simple substitutions. It is probable that the difference is caused by different testing and/or reporting standards. For this release, this polymorphism has simply been ignored. This is of little overall significance since this occurs at the extreme of the phylogeny.
The second candidate for refinement is the possible addition of a J1c10, based on an insertion at 16188 and comprising three sequences. Closer examination shows that two of these three sequences are identical and that the third differs from them at a single nucleotide position. Since they all came from the same study, it is possible that they are all from the same family and thus are not independent samples suitable for defining a clade. This clade is therefore held in abeyance pending confirmation from additional data.
The third is an unresolved reticulation at J2a1a. It appears that G513A and A3447G could be used to define a new branch, but so could T1850C together with the T insertion at position 310. However, these two potential definitions overlap substantially, making a clear definition impossible. This situation also occurs at the extreme of the tree and will likely resolve itself as additional data is gathered.
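The nature of such a conflict can be illustrated with a simple pairwise compatibility check: two candidate definitions can both mark branches of a single tree only if the sets of sequences carrying them are disjoint or nested, whereas a partial overlap signals homoplasy or reticulation. The sketch below illustrates that general criterion and is not the procedure used to build this phylogeny; the function name `classify_overlap` and the sample identifiers are hypothetical and are not taken from the supplementary matrix.

```python
def classify_overlap(carriers_a: set, carriers_b: set) -> str:
    """Compare the sets of sequences carrying two candidate clade definitions."""
    if not (carriers_a & carriers_b):
        return "disjoint"     # separate branches; compatible with one tree
    if carriers_a == carriers_b:
        return "identical"    # the two definitions mark the same branch
    if carriers_a <= carriers_b or carriers_b <= carriers_a:
        return "nested"       # one branch lies inside the other; compatible
    return "conflicting"      # partial overlap; no single tree fits both


# Hypothetical sample identifiers, for illustration only.
definition_1 = {"s1", "s2", "s3"}   # e.g. sequences carrying the first marker pair
definition_2 = {"s2", "s3", "s4"}   # e.g. sequences carrying the second marker pair
print(classify_overlap(definition_1, definition_2))
```

Here the two sets share some but not all members, so the result is "conflicting": the situation in which neither definition can be adopted without assuming a recurrent or back mutation in the other.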
I wish to thank Mannis van Oven for his thorough review of various forms of the phylogeny presented in this paper and for pointing out the back mutation at A2706 that I had missed. Not only have his comments improved this paper, but his collaboration has also brought our respective work into basic harmonization and established a firm basis for developing a worldwide consensus for the phylogeny of Haplogroup J.
Note: Corrections added 31 May 2009 and 15 July 2009.
Supplementary data is available at:
Home page for
page for J-mtDNA Project at Family Tree
Website that includes description of Greasemonkey scripts used to extract polymorphisms associated with full genome mtDNA sequences.
Home page for
Reference page for the revised Cambridge Reference Sequence provided by the MitoMap organization.
Search page for retrieving mtDNA sequences from GenBank.
A web page that is periodically updated to show the entire mtDNA phylogeny as it is developed. It is maintained by Mannis van Oven at
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2007) GenBank. Nuc Acids Res, 35:D21-D25 (Database Issue). The database is available at the following URL:
FTDNA (2008) Family Tree
A database that contains publicly available
J-mtDNA Project (2008) The J-mtDNA Project at Family Tree
Logan Ian (2009) Ian Logan website. See Web Resources.
MitoMap (2008) Revised Cambridge Reference Sequence (rCRS) of the Human Mitochondrial
Palanichamy M, Sun C, Agrawal S, Bandelt HJ, et al (2004) Phylogeny of mitochondrial DNA Macrohaplogroup N in India, based on complete sequencing: implications for the peopling of South Asia. Am J Hum Genet, 75:966-975.
van Oven M, Kayser M (2009) Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Human Mutation, 30:E386-E394. See also Phylotree.org under Web Resources.