A Comprehensive Analysis of mtDNA Haplogroup J
In the furtherance of a better understanding of human genetic origins and migration history, the Federal GenBank database was mined for all Haplogroup J full-genome mtDNA sequences plus additional sequences that are complete for the coding region. These data were used to develop a phylogeny for Haplogroup J using a matrix developed to show polymorphisms for each sequence organized within clades of the haplogroup. The diversity within clades was then used to compute estimates of the age of each clade. In the process, polymorphisms were analyzed to show their relationship to various genes as well as their relationship to selected medical conditions as reported in the literature. Finally, the literature was reviewed for relevant phylogeographic data toward the ultimate development of a comprehensive history for human mtDNA Haplogroup J.
analysis of mitochondrial
Until very recently, sequencing has typically been limited to the control region (displacement loop) of the mtDNA genome, which contains two hypervariable regions (i.e., regions with significantly higher mutation rates than the coding region) that provide relatively more information for a given length of sequence. It was soon discovered, however, that these hypervariable regions have significantly higher instances of back mutations and homoplasies (i.e., the occurrence of a given polymorphism in more than one haplogroup or even in more than one clade of the same haplogroup), thus leading to ambiguities and uncertainties in haplogroup assignment. Some studies then turned to selected markers from the coding region for the broad classification into haplogroups and used the results of sequencing the hypervariable regions to develop the clade structure within the haplogroup. This has been successful for many purposes, but can lead to errors in specific haplogroups. For example, although polymorphisms at nucleotide positions (nps) 16126 and 16069 are adequate for identifying Haplogroup J, and sequencing the complete control region can provide some substructure for J1, there are no polymorphisms within hypervariable region 1 (HVR1) that cleanly differentiate the J2 clades from those in J1 (Logan, 2008). Whether the purpose was to study a geographic region, a specific disease, or something else, scientists have now published a sufficient number of full-genome sequences to permit a multi-level phylogeny for Haplogroup J and to develop estimates of the ages of the various clades.
sequence data used in the current analysis was extracted from the GenBank
database maintained by the
Studies Cited and the Geographic Locations of the Haplogroup J Sequences Used in the Present Study
156 sequences selected, 111 are the same sequences used in the previous study (
Each of these sequences was parsed and a matrix was developed to include a column for each sequence and a row for each polymorphism identified. This matrix is the reference for both a detailed analysis of the polymorphisms (including a survey of medical relationships) and the refinement of the Haplogroup J phylogeny. These data were used to compute the average number of polymorphisms in each branch of the phylogeny and to estimate the age of the clades in the phylogeny.
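The matrix layout described above (one column per sequence, one row per polymorphism) can be sketched in a few lines of Python. This is only an illustration of the data structure, not the tooling actually used for the study; the sequence identifiers and the assignment of polymorphisms to sequences are hypothetical placeholders.

```python
import re

def position_key(label):
    """Sort key: the numeric position embedded in a label such as 'T150C' or '309.1C'."""
    match = re.search(r"\d+(?:\.\d+)?", label)
    if match is None:
        raise ValueError(f"no position found in {label!r}")
    return (float(match.group()), label)

def build_matrix(variants_by_sequence):
    """Return (rows, columns, matrix): one row per polymorphism, one column
    per sequence; a cell is 1 when that sequence carries that polymorphism."""
    columns = sorted(variants_by_sequence)
    all_variants = set().union(*variants_by_sequence.values())
    rows = sorted(all_variants, key=position_key)
    matrix = [[int(poly in variants_by_sequence[seq]) for seq in columns]
              for poly in rows]
    return rows, columns, matrix

def occurrence_counts(rows, matrix):
    """Number of sequences in which each polymorphism was observed."""
    return {poly: sum(row) for poly, row in zip(rows, matrix)}

# Hypothetical input: sequence IDs and their polymorphism lists are made up.
example = {
    "SEQ_A": {"C16069T", "T16126C", "T150C"},
    "SEQ_B": {"C16069T", "T16126C"},
    "SEQ_C": {"C16069T", "T16126C", "G5460A"},
}
rows, columns, matrix = build_matrix(example)
counts = occurrence_counts(rows, matrix)
singletons = [poly for poly, n in counts.items() if n == 1]
```

Row sums of such a matrix give the number of times each polymorphism was observed, which is the basis for tallying singletons and doubletons.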
certain limitations of this matrix and its origins should be noted. First, although the ethnic origins of donors
were generally from European populations or from those located in the western
or southern regions of
there was no uniformity in the
Third, there is no data about either the age or gender of the donor. As shown in the section on Medical Implications below, both factors are significant in analysis of certain diseases and in longevity studies.
Analysis of the Polymorphisms
development of a phylogeographic analysis is dependent on both good geographic
data and characterization of the
of 156 sequences of Haplogroup J identified 411 distinct polymorphisms, of
which 106 were observed three or more times.
The 243 singletons and 62 doubletons, representing almost three-fourths
of the total polymorphisms observed, are apparently rare within J. Although these rare polymorphisms are not
significant in defining the basal phylogeny, they are useful for inferring the
ages of the clades of that phylogeny, and as additional
base-pair length of the Cambridge Reference Sequence (
can be arranged into two major categories–those that involve point substitution
(transitions, transversions, and heteroplasmies) and those that affect the
length of the sequence (insertions and deletions, or indels). If a mutation of the first category occurs
within a gene, that mutation has the potential for making a change in the
protein for which the gene encodes and thus affecting the phenotype. However, since there is redundancy in the
genetic code, many of these mutations (referred to as synonymous mutations) do
not result in an amino acid substitution and, thus no change in the protein for
which the gene codes. A mutation of the
second category occurring within a gene results in a shift in the reading frame
which can cause a complete failure of the production of the prospective
protein. On the other hand, with the
possible exception of interfering with replication of the
of polymorphisms was developed from the results of comparing each sequence in
the reference database with the revised Cambridge Reference Sequence (Andrews,
1999). The sequences in the reference
database, as identified in Table 1, were extracted from GenBank (Benson
et al., 2007) and the polymorphisms were identified through the use of
A summary of the type of polymorphism versus its locus class is provided in Table 2. Note, however, that polymorphisms in the control region are probably under-reported since some of the sequences in the reference database were not complete in that region. Of the 411 polymorphisms detected, only 20 (about 5%) were insertions or deletions (indels), and these occurred primarily in the non-coding region, with a few occurring within the region that codes for the ribosomal RNA. That none occurred in either the genes or the transfer RNAs is probably because such indels would be deleterious and thus would not be passed along in the germ line.
Distributions of Types of Polymorphisms Across the Mitochondrial Genome
Of the 20 indels detected, one deletion and three insertions occurred within regions defining ribosomal RNA. Each of these is associated with a successive repeat sequence within that RNA, and thus the impact would be expected to be minimal. For example, at positions 2141 through 2149 of the revised Cambridge Reference Sequence (rCRS) there is a pattern of four AG repeats. The insertion shown as 2149.1A and 2149.2G simply extends the length of this repeat sequence to five repeats. All remaining indels occurred in non-coding regions, and all but three of these are also associated with repeat sequences. For example, at positions 514 through 523 of the rCRS there is a pattern of five CA repeats, CACACACACA. There are eight instances of C522 and A523 deletions, reducing the length to four repeats, but there are also two instances of 523.1C and 523.2A, extending the length to six, and one instance of 523.1C and 523.2C (see Hurst (2007) for further discussion of length heteroplasmies).
Most of the insertions observed were associated with repeats of a single nucleotide type, most commonly a C. For example, the 309.1C insertion was observed 48 times in the sample set of 118 full-genome sequences. This insertion relates to the well-known sequence from 303 through 315 of the rCRS, which consists of seven C repeats followed by a T and then five C repeats, CCCCCCCTCCCCC. The 309.1C indicates the insertion of a C after position 309; that is, insertion of a C somewhere before the T in the above sequence. Associated with the same sequence there were also insertions 309.2C, 310.1T and 315.1C.
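The indel notation used above can be made concrete with a short Python sketch that applies insertions (written as position.order plus base, e.g., 309.1C) and deletions to a reference segment. The function name and the representation of deletions as a collection of positions are assumptions made for illustration; the two reference segments are the ones quoted in the text.

```python
import re

def apply_indels(segment, start, insertions=(), deletions=()):
    """Apply indels to a reference segment whose first base sits at position `start`.

    insertions: labels such as '309.1C' (base C inserted after position 309)
    deletions:  reference positions to drop (hypothetical representation)
    """
    end = start + len(segment) - 1
    inserted = {}
    for label in insertions:
        m = re.fullmatch(r"(\d+)\.(\d+)([ACGT])", label)
        if m is None:
            raise ValueError(f"unrecognized insertion label {label!r}")
        pos, order, base = int(m.group(1)), int(m.group(2)), m.group(3)
        if not start <= pos <= end:
            raise ValueError(f"{label!r} lies outside positions {start}-{end}")
        inserted.setdefault(pos, {})[order] = base
    out = []
    for offset, base in enumerate(segment):
        pos = start + offset
        if pos not in deletions:
            out.append(base)
        # Inserted bases follow their anchor position in .1, .2, ... order.
        for order in sorted(inserted.get(pos, {})):
            out.append(inserted[pos][order])
    return "".join(out)

poly_c = apply_indels("CCCCCCCTCCCCC", 303, insertions=["309.1C"])
ca_short = apply_indels("CACACACACA", 514, deletions={522, 523})
ca_long = apply_indels("CACACACACA", 514, insertions=["523.1C", "523.2A"])
```

Under these assumptions, the 309.1C insertion lengthens the first C run from seven to eight, while the CA tract shrinks to four repeats or grows to six.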
substitutions, the vast majority (89% of the total) were simple transitions
where a purine was substituted for a purine or a pyrimidine was substituted for
a pyrimidine. A little over 4% of the
substitutions, however, were transversions (mostly singletons) where a purine
was substituted for a pyrimidine or vice versa.
Less than 2% were heteroplasmies – a polymorphism within a single
organism where the state at a given locus in some
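The distinction between transitions and transversions drawn above reduces to whether the two bases belong to the same chemical class. A minimal Python sketch follows; the function name is an assumption, and the A-to-C example is hypothetical rather than taken from the reference database.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Same class (purine<->purine or pyrimidine<->pyrimidine) is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

t150c = classify_substitution("T", "C")   # T150C, discussed in this paper
g5460a = classify_substitution("G", "A")  # G5460A, discussed in this paper
a_to_c = classify_substitution("A", "C")  # hypothetical example
```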
Table 3 shows how each of these polymorphism types was distributed throughout the various segments of the mitochondrial genome. Note that due to several small overlaps in segment definitions, the lengths of the segments add to slightly more than the 16569 base-pair length of the rCRS genome. As an indication of the variability of polymorphisms across the genome, the table also shows the polymorphism density, defined as the number of polymorphisms within a gene or region divided by the length of that sequence. Note that, considering the small numbers involved, the density of polymorphisms throughout the genes encoding proteins is fairly uniform, with an average of 2.1% compared to 8.6% for the control region. This four-to-one ratio is no doubt low because of the incompleteness of some of the available sequences as described above. The frequency of polymorphisms in the genes for ribosomal RNA is somewhat lower at 1.0%. The control region, which accounts for less than 7% of the mtDNA genome, produced over 23% of the polymorphisms.
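The density measure defined above is a simple ratio, sketched here in Python. The function name and the count and length in the first example are hypothetical and are not values from Table 3; the second calculation only re-derives the four-to-one ratio from the two densities quoted in the text.

```python
def polymorphism_density(n_polymorphisms, segment_length):
    """Polymorphisms per base pair for one gene or region."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return n_polymorphisms / segment_length

# Hypothetical segment: 50 polymorphic sites in a 1000 bp region.
example_density = polymorphism_density(50, 1000)

# Ratio of the control-region density (8.6%) to the protein-coding
# average (2.1%), both as quoted in the text.
control_to_coding = 0.086 / 0.021
```

With the quoted densities the ratio comes to about 4.1, in line with the four-to-one figure given above.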
Statistical Distribution of Polymorphisms for Various Regions of the Mitochondrial Genome
A single nucleotide
change within a sequence can cause deleterious or advantageous changes in the
performance of mitochondrial-coded products (e.g., proteins). Such changes can be inherited through the
gene line from mother to child or they may occur somatically within selected
tissues of the individual. Several
recent studies have shown correlation between the frequency of selective
mutations and a variety of diseases and longevity itself. Such correlation, however, does not
necessarily imply a cause and effect relationship. There are very complex relationships between
the workings of mitochondrial
Aging and longevity, as complex traits having a significant genetic component, likely depend on many nuclear gene variants interacting with mtDNA variability, both inherited and somatic. We also surmise that what we hypothesize for aging and longevity could have more general relevance and be extended to other complex traits, such as age-related diseases like cardiovascular diseases and diabetes . . .
Alzheimer’s Disease and Parkinson’s Disease.
The description of such nuclear and mitochondrial
Polymorphisms Observed in the Haplogroup J Reference Database that Have Been Reported as Associated with mtDNA-Related Diseases
study of the relationships between mtDNA polymorphisms and aging, De Benedictis
et al. (1999) found that 23% of a group of centenarians in northern
In a similar study of an Irish population (Ross et al., 2001), Haplogroup J was singled out for special study of longevity. No significant association was found when considering that haplogroup as a whole. However, when they separated the samples into two categories based on restriction fragment analysis, they found that one category had a much higher frequency of centenarians than the control group whereas the other had a much lower frequency. Then, in a later paper (Ross et al., 2003), using the same population, they looked specifically at Parkinson’s disease. They found that, of the 12% of the population that was diseased, 2% were in one J group whereas 10% were in the other J group. They called the first group J1 and the second J2 but, unfortunately, their subdivision cannot be correlated with the subclades of J found in the present study since the polymorphic restriction sites have not been identified or shown to correspond to any polymorphisms found in the reference database.
related study of the control region only, Zhang et al. (2003) looked at 207
subjects from Northern, Central, and
The somatic event(s) at or near position 150 transition may be part of a general remodeling of the mtDNA replication machinery, probably nuclearly controlled. This remodeling could accelerate mtDNA replication and compensate for the oxidative damage of mtDNA and its functional deterioration occurring in old age.
The current study found that T150C occurred exclusively in the J2 subclade of Haplogroup J and is thus a strong indicator of that subclade, although not definitive. The reason for this phenomenon has not been determined.
The latest available study to look at the relationship between longevity and Haplogroup J found no significance in the Ashkenazi Jewish centenarians relative to their control group (Shlush et al., 2008). Although they referenced the study by Zhang (2003), who pointed out the possible significance of the polymorphism 150C, they missed an opportunity for follow-up testing in their well-defined and well-understood study population. Unfortunately, 150 is not within the narrow range of the control region they sequenced (16024-16300). Similarly, they would need to acquire additional test data to assess the possible broader relationship between longevity and the J2 clade, for which 150C is an indicator.
The disease most commonly associated with mtDNA Haplogroup J is Leber’s Hereditary Optic Neuropathy (LHON), also known as Leber Optic Atrophy (LOA). This disease occurs about five times more frequently in Haplogroup J than it does in the general population (Torroni et al., 1997). LHON is a maternally inherited disease that presents itself in adolescence or adulthood and can lead to partial or total blindness (Wallace, 1988). Although some twenty-five mtDNA variants have been observed to be related, the primary mutations are G3460A, G11778A, and T14484C (Brown et al., 2002). One or another of these mutations is found in ninety percent of the families with LHON, although they rarely occur together (Johns Hopkins, 2008). Of the 156 sequences in the reference database, G11778A occurred four times (twice in J1c4 and twice in J1d), T14484C occurred twice (once in J1d and once in J2b1), and G3460A occurred once in J1c5. MitoMap (Ruiz-Pesini et al., 2007) also listed two reports of progressive dystonia as associated with LHON and specifically with G11778A. The insulin resistance associated with T4216C may just be due to that position being a point mutation for the super-haplogroup JT.
Within the Haplogroup J population, the polymorphism most commonly associated with either Parkinson’s or Alzheimer’s disease is G5460A, which, incidentally, is one of the two definitive coding region markers that define subclade J1b1. In addition, both Parkinson’s and Alzheimer’s are highly correlated with deterioration of mitochondrial performance, brought on by an increasing frequency of polymorphisms, many or most of which are in heteroplasmic form.
showed a relationship between the T11084C polymorphism and the disease MELAS
(mitochondrial myopathy, encephalopathy, lactic
acidosis, and stroke-like episodes). A
search of the associated bibliography showed only a weak statistical
association and that the most common polymorphism for the disease is at
position 3243, which was not observed in the reference database. Finally, T16189C has been reported as being
associated with various diseases including type 2 diabetes,
cardiomyopathy, and e
a major study currently underway in
A Refined Phylogeny
initial phylogeny for mtDNA haplogroup J was presented in an earlier paper (
As described in the earlier paper, this phylogeny was developed using a maximum parsimony approach ignoring insertions and deletions (see Analysis of the Polymorphisms above). In addition, the polymorphisms located at sites 16311 and 16519 were excluded from the analysis as being too variable to be useful. However, Hagelberg (2003) has suggested that 16311, and possibly 16519, could be the result of ancient recombination. No recent study has been found to support this hypothesis. Future research may ultimately show the utility of these polymorphisms.
The refined phylogeny is presented in graphic form in Figure 1. The supporting data is available in the supplementary files. Note that this chart includes polymorphisms that are in parentheses or are underlined to indicate special conditions. For example, the 185 and 228 shown as markers for J1c are both in parentheses because they appear to be subject to back mutations, with neither of them appearing in all samples for the J1c clade, nor either of them defining a proper subclade of J1c. However, of the 74 full-genome sequences that are classified as J1c, all but two include one or both of these markers and there is only one occurrence outside the J1c subclades. Similarly, the polymorphisms at 152 and 16193, shown in conjunction with subclades of J1c, appear to have originated more than once within the haplogroup. These and similar special markers are included as classification aids for cases that are not full-genome sequences but do have sequences from the control region.
Age of The Clades
the first uses of molecular biology to determine the age of the human species
was just over 40 years ago. Sarich and Wilson (1967) looked at the variations of serum
albumins (a blood protein) in humans and non-human primates and concluded that
the split between homo, chimpanzee, and gorilla was approximately 5 to 8
million years ago. For calibration, they
used the assumption that hominoids in general separated from the old world
monkeys 30 million years ago. Within a
decade of that study, techniques were sufficiently developed to analyze the
another decade was complete, excitement was aroused in the press and
anthropology community when Cann et al. (1987) used
mtDNA variations to propose that the current human population “stems from one
woman who is postulated to have lived about 200,000 years ago, probably in
Subsequently, Mishmar et al. (2003) used the 53 sequences of Ingman and Gyllensten, but added 48 from African, Asian, European, Siberian and North American populations, to conclude that there are significant differences between geographic populations caused by natural selection brought on by differences in climate and diet. Comparing the ratio of non-synonymous to synonymous mutations within the various genes, they found significant differences between tropical, temperate, and arctic-based populations. Based on estimated coalescence dates for various haplogroups, they estimated the mtDNA evolution rate to be 1.26 x 10^-8 substitutions per nucleotide per year.
alternate basis for calibration of substitution rates was demonstrated by Stoneking et al. (1992; 2005) by capitalizing on a founding
event to analyze the population of
described above estimated mutation rates based on evolutionary models with
calibration typically based on assumed date of separation between humans and
chimpanzees. Attempts have also been
made to compute mutation rates directly from pedigree data. Early divergence estimates were typically
obtained using family data developed for disease studies and consisting of very
small sample sizes relative to the rates being estimated. Nevertheless, the general conclusion was that
divergence rates for pedigree data were approximately an order of magnitude
higher than evolutionary rates (e.g., Howell et al., 2003). However, as described by
This is a
good point to note the imprecision of terminology between mutation rates and
substitution rates. Mutation rate has to
do with the actual change in a
The problem of calibration and the variability of mutation rates across the mitochondrial genome have been studied in some detail by Endicott and Ho (2008). Eventually we will be able to account for more of the variability in our analysis. In the meantime, the present work takes a very straightforward but simplified approach for computing the ages of clades of mtDNA Haplogroup J. A substitution rate of 1.7 x 10^-8 substitutions/site/year for the coding region was chosen as representative of the literature. Using 15447 for the number of base pairs in the coding region, this converts to 3808 years per substitution. For each clade, the mean length of the branches (i.e., the average number of substitutions observed back to the defining polymorphisms) is multiplied by this factor. The result is an estimate of the coalescence time, or Time to the Most Recent Common Ancestor (TMRCA), of the members of that clade. The results of these computations are given in Table 5 and shown on a time-scaled phylogeny in Figure 2. It should be noted that the standard deviation of length, and subsequently the range of age estimates, is related to the variability of the data; it is not a confidence interval relative to the estimated age.
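The arithmetic behind the age estimates can be checked with a short Python sketch. The rate and coding-region length are the values stated above; the function names and the example branch lengths are hypothetical and serve only to show the calculation.

```python
import statistics

SUBSTITUTION_RATE = 1.7e-8    # substitutions per site per year (from the text)
CODING_REGION_LENGTH = 15447  # base pairs in the coding region (from the text)

def years_per_substitution(rate=SUBSTITUTION_RATE, n_sites=CODING_REGION_LENGTH):
    """Average number of years for one substitution to occur in the region."""
    return 1.0 / (rate * n_sites)

def clade_age(branch_lengths):
    """Estimated TMRCA in years: mean branch length times years per substitution."""
    return statistics.mean(branch_lengths) * years_per_substitution()

factor = years_per_substitution()

# Hypothetical clade: substitution counts from three sequences back to the
# polymorphisms that define the clade.
example_age = clade_age([3, 4, 5])
```

Rounded to the nearest year, the conversion factor reproduces the 3808 years per substitution used in the text; a clade with a mean branch length of four substitutions would be dated to roughly 15,200 years under these assumptions.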
Estimated ages of the clades of mtDNA Haplogroup J
Figure 2. Estimated ages of the clades of mtDNA Haplogroup J
ages should be taken as indicating the approximate relative ages of the
clades. The astute reader will notice
anomalies within these ages. For
example, mechanistic computations produced an age for J2 and J2a that are
somewhat older than J as the complete clade.
This is an artifact of the ra
After describing caveats in their extensive review of status of mutation rates, Bandelt et al. (2006) concluded that the
. . . extreme form of weighting that only accepts the coding region but rejects the entire control region is at best provisional and certainly not recommended in the long run. An informed strategy would use rules to decode on a site-by-site basis and contrast synonymous with non-synonymous mutations.
The technology and data should be available to do such a study in the next few years. For example, data collected in association with the Genographic Project has been used to develop substitution rates for a few selected polymorphisms within the coding region (Rosset et al., 2008).
Origins and Migrations
general agreement that there have been three major movements in the peopling of
One approach to develop such details is the use of genetics and founder analysis to identify populations, date them through using substitution rates for calibration, and analyze the associated geographic data (Stoneking et al., 1992). Phylogeographic analysis, that is the geographic profile of clusters of haplotypes, can provide the basis for inferring geographic origins of selected populations, and probably migration paths. Such inferences take on additional importance in anthropology and population genetics when they are supported by studies from archaeology, climatology, ecology, and linguistics.
One of the earliest uses of the founder analysis approach was the work of Torroni et al. (1992), which concluded that the Amerind and Na-Dene populations of Native Americans were primarily from two independent migrations that probably occurred several thousand years apart. However, using the modern technique of Bayesian skyline plot analysis (Drummond et al., 2005), Mulligan et al. (2008) have developed a three-stage model for the peopling of the Americas; this was one long migration sequence that included three identifiable stages: (1) divergence of Amerind ancestors from the Asian gene pool, (2) a prolonged period of isolation, and (3) rapid expansion into the Americas with a large population increase.
al. (1997) demonstrated the potential of mtDNA founder analysis when they
analyzed data from nine distinct European and West Asian populations and
performed analyses to identify statistical similarities between them. Each population came from published samples
from a different research team that focused on a specific geographic area,
including a Basque, British, Sardinian, Swiss, Tuscan, Bulgarian, two different
Turkish, and a Middle Eastern region.
Although differences appeared to be quite low when compared to other
world populations (e.g.,
large-scale phylogeographic study of mtDNA in
much expanded study group, Richards et al. (2000) “formalized the procedure for
founder analysis, investigated the extent of confounding recurrent gene flow
between the putative source and derived populations, and developed criteria
that take into account the effects of both gene flow and recurrent
mutations." Among their results, they
refined the overall age of Haplogroup J to 42,400-53,700 years as determined
from the Near East samples and to 23,000-27,400 years as determined from
European samples. The corresponding ages
for Haplogroup T are 41,900-52,000 and 33,100-40,200 respectively. Although these two clades were apparently
contemporary in the
attempt to identify and describe the effects on mtDNA of “demographic phenomena
dating back to the Paleolithic, the Mesolithic, or the Neolithic” periods,
Simoni et al. (2000) collected 2619 mtDNA sequences for HVR1 distributed over
36 regions of Europe. Although the
sample size was relatively small in some regions, they developed an overall table
of frequencies for the major haplogroups in each of the regions. No occurrences of Haplogroup J were
identified in several regions such as
not yet available a comprehensive founder analysis for Haplogroups J or T
origin of Haplogroups J and T in the
Malyarchuk and his associates did a series of studies of Eastern European populations relating to the origin of the Slavs: Russians and Ukrainians (Malyarchuk and Derenko, 2001), Poles and Russians (Malyarchuk et al., 2002), Bosnians and Slovenians (Malyarchuk et al., 2003), and Czechs (Malyarchuk et al., 2006). In each of these studies they found that most of the mtDNA found belonged to western haplogroups (H, HV, J, T, U, N1, W, and X). Within this broad similarity, they did find heterogeneity between regions with a very broad north-south correlation between their test populations and the corresponding regions to the west. The overall frequencies of Haplogroups J and T found in each region are shown in Table 6.
of Haplogroups J and T within
and his associates also investigated the origin of the Roma (Gypsies) in
of extraction and analysis of mtDNA has progressed to the point where studies
a recent study was conducted to provide “a more complete characterization of the
mitochondrial genome variability of the Basques”
(Alfonso-Sanchez et al., 2008). They
sequenced HVR1 and HVR2 of 55 healthy men selected to be non-related based on
three-generation pedigree charts. The
most interesting result from that study was the high frequency of J, especially
J1c and J2a with frequencies of 10.9% and 3.6% respectively. This 14.5% total J is in sharp contrast to
the 2.4% commonly referenced for the Basques.
On the other hand, it is in line both with the results from ancient
et al. (2000) were cited above as the team that formalized founder analysis of
populations using mtDNA data.
Thirty-five team members were represented as co-authors of that paper
and the supplementary data they produced deserves a more detailed review. Their database (Macaulay, 2001) includes
results of HVR1 analysis of 4100 samples from 24 widely distributed regions of
the Near East and
For the present study sample sizes and counts for Haplogroups J and T were extracted from the Macaulay database for each geographical region, and the frequencies of the corresponding haplogroups were computed. The results are shown in Table 7. The average length for each of the haplogroups is also shown.
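The frequency computation described above is a per-region ratio of haplogroup count to sample size. A minimal Python sketch follows; the function name, region names, sample sizes, and counts are all hypothetical placeholders and are not values from the Macaulay database.

```python
def haplogroup_frequencies(sample_size, counts):
    """Percent frequency of each haplogroup within one regional sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return {hg: 100.0 * n / sample_size for hg, n in counts.items()}

# Hypothetical regional tallies: (sample size, {haplogroup: count}).
regions = {
    "Region X": (200, {"J": 18, "T": 16}),
    "Region Y": (50, {"J": 3, "T": 6}),
}
frequencies = {name: haplogroup_frequencies(size, counts)
               for name, (size, counts) in regions.items()}
```

Small regional samples, such as the second hypothetical region here, change by two percentage points with each additional observation, which is one reason to read regional frequencies cautiously.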
Summary of Geographic Analysis of Haplogroup Data Extracted from Macaulay (2001)
showing the frequency by regions for Haplogroups J and T are presented in Figures
3 and 4, respectively. However,
caution must be exercised to avoid reading too much into these maps. In addition to dealing with relatively small
numbers for some regions, many other factors must be taken into consideration
before actual migration paths can be drawn.
More specifically, the current frequency in any given region is affected
by many factors: population movements do not necessarily produce smooth
gradients, but may instead represent movements for relatively long distances in
an irregular manner; there are back migrations; a population may be decimated
by natural disasters or diseases; etc.
For example, a casual glance at the map for J might suggest that it
originated in what is now
Figure 3. Relative frequency (in percent) of Haplogroup J, derived from Macaulay (2001)
Figure 4. Relative frequency (in percent) of Haplogroup T, derived from Macaulay (2001)
This review of the literature concerning the origins of the clades is representative but is certainly not exhaustive. More work is required to integrate results, but more importantly, new research is required to provide more data and more complete data. There are several reasons for this.
First, studies have not kept up with the technology. For some geographical areas, the only results available are from RFLP analysis. In other studies the sequence data was limited to HVR1, sometimes complemented with RFLP typing and selective sequencing. Very few results are available for the entire mitochondrial genome.
Second, knowledge of the general phylogeny of Haplogroup J is still evolving and consensus has not yet been reached. The HVR1 motifs used by most of the available studies are not adequate for high-resolution classification of Haplogroup J, only for identifying a haplogroup as a whole. Most studies are not even distinguishing J2 from J1. Furthermore, errors have been identified, but not all later studies have recognized these errors, or at least have not taken them into consideration consistently.
Third, most basic research is geographically very limited in scope, but then comparisons are made with data from studies of other geographic areas--studies that may be inconsistent in purpose.
Fourth, global databases (such as
GenBank) are a great asset for comparing sequences, but are not structured to
capture context data beyond literature citations. Founder analysis, for example, requires
location. Supplementary databases are
needed to cross reference each
With time, the improvements will be made, but of course, the technology will have moved on. Nevertheless, the author, for one, expects to continue to review the literature for data relevant to a better understanding of Haplogroup J and to perform analyses toward that understanding, including refinement of the classification structure, development of expanded databases, and integration of the pieces into a global anthropology.
The work described in this paper is a work-in-progress. It provides a broad review of available data concerning mtDNA Haplogroup J and tries to contribute to the evolving knowledge by developing a phylogeny and associated age estimates. It must be noted, however, that the quality of the product is limited by the techniques employed. For example, as stated above, a single mutation rate cannot adequately represent the entire genome. Future analyses should consider the differences both across the various types of gene (e.g., coding for protein versus RNA) and between specific genes. For genes that encode proteins, the analysis should differentiate those polymorphisms that affect amino acid sequences from those that do not. Currently, neither the size of the database nor knowledge of the various mutation rates is adequate to take these issues into consideration.
Furthermore, as illustrated by the discussion of Origins and Migrations, just the tip of the iceberg has been addressed. Much work is needed to bring together and integrate the many ongoing relevant studies. For example, no attempt has yet been made to analyze population size growth for Haplogroup J. The potential for such analysis can be seen in the study by Atkinson et al. (2008). They employed the Bayesian skyline plot (BSP) with simulation (Drummond et al., 2005) to “simultaneously estimate a posterior probability distribution for the ancestral genealogy, branch lengths, substitution model parameters, and population parameters through time.” Such analyses can then be integrated with the archaeological record, legend, and recorded history to develop a more complete story of Haplogroup J.
New studies are required, and the data they need must be developed and integrated. A single project that is both focused on Haplogroup J (and T) and of broad geographic scope may not be feasible at this time. However, it is hoped that a consortium might develop to permit multiple researchers to contribute to an appropriately designed comprehensive project. The author currently administers a public discussion group and associated file exchange to further the cause. Interested persons may join through the link to the mtDNA Haplogroup J Project shown under Web Resources, below.
Supplementary data is available at:
Web Resources
mtDNA Haplogroup J Project
Human Mitochondrial Genome Database
References
Anderson S, Bankier AT, Barrell BG, de Bruijn MHL, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R, Young IG (1981) Sequence and organization of the human mitochondrial genome. Nature, 290:457-465.
Andrews RM, Hubacka I, Chinnery PF, Lightowlers RN, Turnbull DM, Howell N (1999) Reanalysis and revision of the
Bandelt HJ, Kong QP, Richards M, Macaulay V (2006) Estimates of mutation rates and coalescence times. In: Bandelt HJ, Macaulay V, Richards M (Eds.) Nucleic Acids and Molecular Biology, Vol. 18, Springer-Verlag.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2007) GenBank. Nucl Acids Res, 35:D21-D25 (Database Issue). The database is available at the following URL:
Carelli V, Achilli A, Valentino ML, Rengo C, Semino O, Pala M, Olivieri A, Mattiazzi M, Pallotti F, Carrara F, Zeviani M, Leuzzi V, Carducci C, Valle G, Simionati B, Mendieta L, Salomao S, Belfort R, Sadun AA, Torroni A (2006) Haplogroup effects and recombination of mitochondrial DNA: Novel clues from the analysis of Leber hereditary optic neuropathy pedigrees. Am J Hum Genet, 78:564-574.
Coble MD, Just RS, O’Callaghan JE, Letmanyi IH, Peterson CT, Irwin JA, Parsons TJ (2004) Single nucleotide polymorphisms over the entire mtDNA genome that increase the power of forensic testing in Caucasians. Int J Legal Med, 118:137-146.
Detjen K. A.,
Fraumene C, Belle EMS, Castri L, Sanna S, Mancosu G, Cosso M, Marras F, Barbujani G, Pirastu M, Angius A (2006) High resolution analysis and phylogenetic network construction using complete mtDNA sequences in Sardinian genetic isolates. Mol Biol Evol, 23:2101-2111.
Gasparre G, Porcelli AM, Bonora E, Pennisi LF, Toller M, Iommarini L, Ghelli A, Moretti M, Betts CM, Martinelli GN, Ceroni AR, Curcio F, Carelli V, Rugolo M, Tallini G, Romeo G (2007) Disruptive mitochondrial DNA mutations in complex I subunits are markers of oncocytic phenotype in thyroid tumors. Proc Nat Acad Sci (USA), 104(21):9001-9008.
Greenspan B (2007) Direct submission of Family Tree
Hartmann A, Thieme M, Nanduri LK, Stempfl T, Moehle C, Kivisild T, Oefner PJ (2008) Validation of microarray-based sequencing of 93 worldwide mitochondrial genomes. Unpublished.
Herrnstadt C, Elson JL, Fahy E, Preston G, Turnbull DM, Anderson C, Ghosh SS, Olefsky JM, Beal MF, Davis RE, Howell N (2002) Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences for the major African, Asian, and European haplogroups. Am J Hum Genet, 70:1152-1171. See also Elson (2007) for an update of the phylogeny.
. The database is available at .
Jobling MA, Hurles ME, Tyler-Smith C (2004) Human Evolutionary Genetics,
Logan Ian (2007) Mitochondrial
Macaulay V (2001) “Supplementary data from Richards et al. (2000),” available at http://www.stats.gla.ac.uk/~vincent/founder2000/index.html.
Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG, Hosseini S, Brandon M, Easley MK, Chen E, Brown MD, Sukernik RI, Olckers A, Wallace DC (2003) Natural selection shaped regional mtDNA variations in humans. Proc Nat Acad Sci (USA), 100:171-176.
Mitomap – A human mitochondrial genome database (2008), http://www.mitomap.org/
Palanichamy MG, Sun C, Agrawal S, Bandelt HJ, Kong QP, Khan F, Wang CY, Chaudhuri TK, Palla V, Zhang YP (2004) Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete sequencing: Implications for the peopling of South Asia. Am J Hum Genet, 75:966-975.
Parsons TJ (2005) Single nucleotide polymorphisms over the entire mtDNA genome that increase the forensic discrimination of common HV1/HV2 types in ‘Hispanics.’ Unpublished.
Pereira L, Goncalves J, Franco-Duarte R, Silva J, Rocha T, Arnold C, Richards M, Macaulay V (2006) No evidence for a mtDNA role in sperm motility: data from complete sequencing of asthenozoospermic males. Mol Biol Evol, 24:868-874.
Richards M, Corte-Real H, Forster P, Macaulay V, Wilkinson-Herbots H, Demaine A, Papiha S, Hedges R, Bandelt HJ, Sykes B (1996) Paleolithic and Neolithic lineages in the European mitochondrial gene pool. Am J Hum Genet, 59:185-203. See also the critique by L. L. Cavalli-Sforza and E. Minch (1997) in 61:247-251 and the authors’ reply in 61:251-254.
Ross OA, McCormack R, Maxwell LD, Dugrud RA, Quinn DJ, Barnett YA, Rea IM, El-Agnaf OMA, Gibson JM, Wallace A, Middleton D, Curran MD (2003) mt4216C variant in linkage with the mtDNA TJ cluster may confer a susceptibility to mitochondrial dysfunction resulting in an increased risk of Parkinson’s disease in the Irish. Exp Gerontol, 38:397-405.
Rosset S, Wells RS, Soria-Hernanz DF, Tyler-Smith C, Royyuru AK, Behar DM, Genographic
Maximum likelihood estimation of site-specific mutation rates in
Ruiz-Pesini E, Lott MT, Procaccio V, Poole JC, Brandon MC, Mishmar D, Yi C, Kreuziger J, Baldi P, Wallace DC (2007) An enhanced MITOMAP with a global mtDNA mutational phylogeny. Nucl Acids Res, 35:D823-D828.
A, Salvioli S, Raule N, Carpi M, Sevina F, Valensin S, Monti D, Bellizzi D, Passarino G, Rose G, Benedictic GD, Franceschi C
Simoni L, Calafell F, Pettener D, Bertranpetit J, Barbujani G (2000) Geographic patterns of mtDNA diversity in Europe. Am J Hum Genet, 66:262-278, and Erratum, Am J Hum Genet, 66:1785. See also the comments on the article, along with the authors’ response, in Torroni et al. (2000) Letter to the editor. Am J Hum Genet, 66:1173-1179.
K, Wilson AC (1986) Rate of sequence divergence estimated from restriction maps of mitochondrial DNAs
Torroni A, Schurr TG, Yang CC, Szathmary EJE, Williams RC, Schanfield MS, Troup GA, Knowler WC, Lawrence DN, Weiss KM, Wallace DC (1992) Native American mitochondrial DNA analysis indicates that the Amerind and Nadene populations were founded by two independent migrations. Genetics, 130:153-162.
DC, Singh G, Lott MT, Hodge JA, Shurr TG, Lezza
Wills C (1995) When did Eve live? An evolutionary detective story. Evolution, 49:593-607.
Zhang J, Asin-Cayuela J, Fish J, Michikawa Y, Bonafe M, Olivieri F, Passarino G, Benedictis GD, Franceschi C, Attardi G (2003) Strikingly higher frequency in centenarians and twins of mtDNA mutation causing remodeling of replication origin in leukocytes. Proc Nat Acad Sci (USA), 100:1116-1121.
Zsurka G, Schroder R, Kornblum C, Rudolph J, Wiesner RJ, Elger CE, Kunz WS (2004) Tissue dependent co-segregation of the novel pathogenic G12276A mitochondrial tRNALeu(CUN) mutation with the A185G D-loop polymorphism. J Med Genet, 41:e124.