The Continuing Hunt for Nuclear Mitochondrial DNA Sequences (NUMTs) in the Human Genome

 

Ian Logan

 

 

Abstract

 

Hunting for Nuclear Mitochondrial DNA sequences (NUMTs) has attracted the attention of many researchers in the last few years. Most studies have emphasised identifying the number of NUMTs in the human genome. The present study instead describes a process of matching the parts of a NUMT sequence that are similar to the tRNA coding sequences in mitochondrial DNA. Using this method the author reports the discovery of NUMTs that are common to the genomes of the human, the chimpanzee and the rhesus monkey. These NUMTs were therefore formed before the branching off of the rhesus monkey from the human evolutionary line.

 

 

 

Address for correspondence: Ian Logan

 

Received:  November 4, 2008; accepted:  February 18, 2009.

 

 

 

 

Introduction

 

The 46 chromosomes in the Human genome contain many hundreds of short sequences of bases that match sections of the DNA found in mitochondria (the mtDNA).  These chromosomal sequences are known as nuclear mitochondrial DNA sequences or more simply as NUMTs, which can be pronounced as “new‑mights.”

 

NUMTs are found in the chromosomes of most species (Richly, 2004), and a wide variety of species have been the subject of articles describing their NUMTs, including the domestic cat (Lopez, 1994; Antunes, 2007), and the ant (Martins, 2007).

 

A NUMT is formed by the incorporation of a fragment of the mtDNA into a chromosome. This type of event is very rare; but over a period of millions of years the number of times this has happened has become appreciable. The formation of a NUMT is essentially a random event and the fragment of mtDNA involved can be of any length, from just a few bases to many thousands of bases, and any of the chromosomes can be involved. In many ways NUMTs are considered to be “fossils” preserving the mtDNA sequence as it used to be at various times in our evolutionary past.

 

After formation a NUMT becomes an ordinary part of the chromosome and the integrity of its DNA is maintained by the chromosomal repair mechanisms, a process that is not available to mtDNA in the mitochondria. But, whereas the chromosomal repair mechanisms will tend to preserve a NUMT, its sequence may still be altered by several processes. The bases of a sequence are subject to a very low mutation rate. A NUMT may become split during the process of “recombination,” when parts are exchanged between chromosomes, or by an “intrusion” of another piece of DNA. Also, the part of a chromosome containing a NUMT may be duplicated completely, or in part, just once or many times.

 

As a result of these processes, the sequences of most NUMTs differ considerably from the sequence of modern mtDNA, and identifying NUMTs can be considered to be a bit of a “treasure hunt.” This has led to different researchers, unsurprisingly, coming to differing conclusions as to whether a particular part of a chromosome represents a NUMT; and, if so, just where that NUMT begins and ends.

 

It is possible, by comparing the sequence of bases in NUMTs against the sequence of modern mtDNA and counting the number of differences, to suggest a possible order for the formation of NUMTs. So when a sequence matches well against modern mtDNA the NUMT can be said to be of “recent origin”, say, with a date of formation within the last 10 million years. NUMT sequences that match less well will have a “distant origin”, ranging from 10 million to around 50 million years of age (Bensasson, 2003). This method of ageing NUMTs is, however, self-limiting: the older a NUMT is, the further the sequence of modern mtDNA will have diverged from it, and the more difficult it becomes to identify that part of a chromosome as being a NUMT.
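The ordering idea described above can be sketched in a few lines of Python. This is only an illustration, assuming that each NUMT has already been aligned to the matching stretch of modern mtDNA without gaps; the function names and the short sequences are invented for the example and are not taken from the study.

```python
def fraction_different(numt: str, mtdna: str) -> float:
    """Proportion of aligned positions at which the two sequences differ."""
    if len(numt) != len(mtdna):
        raise ValueError("sequences must be aligned to the same length")
    if not mtdna:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(numt.upper(), mtdna.upper()) if a != b)
    return differences / len(mtdna)


def order_by_divergence(numts: dict[str, str], mtdna: str) -> list[tuple[str, float]]:
    """Return (name, divergence) pairs, least diverged first."""
    scored = [(name, fraction_different(seq, mtdna)) for name, seq in numts.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy data, invented for illustration only.
modern = "ACGTACGTAC"
examples = {
    "numt_a": "ACGTACGTAC",  # identical to the modern sequence
    "numt_b": "ACGAACGTTC",  # 2 of 10 positions differ
    "numt_c": "ACGTACGTAA",  # 1 of 10 positions differs
}
ranking = order_by_divergence(examples, modern)
```

The least diverged sequence comes first in `ranking`, which corresponds to the suggestion that it is the most recently formed. Converting a divergence into an age in years would need a mutation rate, which this sketch does not attempt.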

 

The identification of NUMT sequences is of importance to the study of genetic genealogy for two reasons. Firstly, it allows suggestions to be made as to which mutations might have occurred in the human mtDNA before the time of “Mitochondrial Eve”; and secondly, during the sequencing of human mtDNA, laboratories need to take care not to amplify NUMT sequences and mistake them for mitochondrial DNA.

 

The study undertaken for this paper is not primarily concerned with the number of NUMTs and their positions in the human genome, something previously considered in detail by Mourier (2001), Tourmen (2002), Woischnik (2002), Hazkani-Covo (2003), Bensasson (2003), Mishmar (2004), Ricchetti (2004), Hazkani-Covo (2007), and most recently by Lascaro (2008).  But instead this study concentrates on what can be learnt from looking at the sequences themselves.

 

In particular, the study concentrates on the NUMT sequences that contain matching sequences to the coding sequences for the 22 transfer RNAs (tRNAs) found in modern mtDNA. In the mtDNA there is one tRNA sequence for each of 18 amino acids and two tRNA sequences for each of the remaining two amino acids, leucine and serine.

 

Each of the tRNAs can be represented as having a two-dimensional “cloverleaf” structure with stems and loops. Figure 1 shows the suggested structures for two of the tRNAs. All of the tRNAs have a similar structure, but the sequences are sufficiently different from each other that they are easily distinguished.

 

 

Figure 1  The two-dimensional structures for the tRNAs for isoleucine and cysteine.

 

 

Methods

 

Early studies of NUMTs relied on the actual sequencing of chromosomal sequences (for an example of this method, see Herrnstadt, 1999).  But with the publication of the Human Genome, and the genomes of several other species, it is now possible to identify NUMTs using computer search programs.

 

The genome sequences for the human - Homo sapiens sapiens, the chimpanzee - Pan troglodytes, and the Rhesus monkey - Macaca mulatta are to be found on the web site:  http://www.ncbi.nlm.nih.gov/mapview/.

 

For this study the genome sequences were examined for NUMTs using the Basic Local Alignment Search Tool, or BLAST, and in particular the “BLASTN: Compare Nucleotide Sequences” program (Altschul, 1990).

 

In most instances the searches were made on the reference only sequences as they are the sequences that have been shown to be common to the various assemblies and can be assigned to the different chromosomes.

 

At present reference only sequences  are available for:

 

Homo sapiens sapiens – build 36.3 – 368 sequences, covering 2,870,843,926 bases,

 

Pan troglodytes – build 2.1 – 32,296 sequences, covering 3,010,437,433 bases, and

 

Macaca mulatta – build 1.1 – 124,049 sequences, covering 3,011,952,279 bases.

 

The program BLASTN was used to compare nucleotide sequences.  Initially the program was used with its default values.  However, the default Expect value of 0.01 limits the program to reporting only close matches, while using an Expect value of 10 can allow chromosomal sequences that match less well to be reported.

 

In the Advanced options it is also possible to change the Word Size; reducing it makes the matching algorithm more sensitive to weaker matches. The default value is W11, and using the parameter at its limit of W4 can be useful, although this does make the program take much longer for each comparison.
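The searches in this study were made through the NCBI web pages, but the same two settings can be illustrated with the standalone BLAST+ `blastn` program. The following Python sketch only assembles the command line; it assumes that BLAST+ is installed locally, and the file name `trna_ala.fasta` and database name `human_reference` are hypothetical.

```python
import shlex


def build_blastn_command(query_fasta: str, database: str,
                         expect: float = 10, word_size: int = 4) -> list[str]:
    """Assemble a BLAST+ blastn command line with a relaxed Expect value
    and a small word size."""
    if word_size < 4:
        raise ValueError("blastn does not accept a word size below 4")
    return [
        "blastn",
        "-task", "blastn",             # traditional BLASTN, default word size 11
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(expect),        # the Expect value threshold
        "-word_size", str(word_size),  # the W parameter
        "-outfmt", "6",                # tabular output, one hit per line
    ]


# Hypothetical file and database names, for illustration only.
command = build_blastn_command("trna_ala.fasta", "human_reference",
                               expect=10, word_size=4)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a machine where `blastn` and a suitable database are available.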

 

Initially, the search string used with BLASTN was the whole sequence of the Cambridge Reference Sequence (CRS) (Anderson, 1981; Andrews, 1999), and this gave a general idea as to how many large and closely matched NUMTs exist in the human genome. But in practice, it is much better to use only small parts of the mtDNA sequence, and this study concentrates on using as search strings the areas of the mtDNA that code for the 22 transfer RNAs (tRNAs).

 

Table 1 gives the names of the amino acids, the locations of their corresponding tRNAs in the CRS, and the sequence of bases in the CRS for each of the 22 tRNAs.

 

 

Table 1

The 22 tRNA Coding Sequences in the CRS
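Cutting a search string out of a reference sequence by its locations can be sketched as follows. This is a minimal illustration that assumes locations are 1-based and inclusive; the function name and the toy sequence are invented, and no actual CRS locations are used. Some mitochondrial tRNA genes are read from the opposite strand, so an optional reverse complement is included, although BLASTN searches both strands in any case.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_region(sequence: str, start: int, end: int,
                   reverse_complement: bool = False) -> str:
    """Return the bases from start to end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("locations fall outside the sequence")
    region = sequence[start - 1:end]
    if reverse_complement:
        # Complement each base, then reverse the order.
        region = region.translate(COMPLEMENT)[::-1]
    return region


# Toy sequence, invented for illustration only.
toy = "AAACCCGGGTTT"
forward_example = extract_region(toy, 4, 9)
reverse_example = extract_region(toy, 1, 6, reverse_complement=True)
```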

 

 

 

Results

 

The results of the present study are given here in three sections.

 

NUMTs that match tRNA sequences.

NUMTs of “recent origin”

NUMTs of “distant origin”

 

NUMTs that Match tRNA Sequences

 

For each tRNA sequence in the CRS the BLAST search program has been used to find NUMTs that in part match against tRNA sequences.

 

As an example, Table 2 shows the results of searching the human genome for NUMTs that match the sequence for the tRNA for the amino acid alanine.  The table identifies 32 NUMTs that satisfy the search criteria.  The NUMTs vary from having part of their sequence matching exactly, to having a sequence  in which about a fifth of the bases have changed.  The table contains only those NUMTs with a sequence that covers the whole of the tRNA sequence.  There are other NUMT sequences which match partially, but for the purpose of this paper they have been excluded.

 

 

 

Table 2

NUMTs That Match the tRNA Sequence in CRS for Alanine
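The selection rule described above, keeping only those NUMTs whose match covers the whole of the tRNA sequence, can be sketched as a simple filter. The record layout, the names and the figures below are assumptions made for the example; they are not the contents of Table 2.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    chromosome: str
    query_start: int  # first matched position in the tRNA query (1-based)
    query_end: int    # last matched position in the tRNA query
    mismatches: int   # number of bases that differ from the query


def covers_whole_query(hit: Hit, query_length: int) -> bool:
    """True when the match runs from the first to the last base of the query."""
    return hit.query_start == 1 and hit.query_end == query_length


def full_length_hits(hits: list[Hit], query_length: int) -> list[Hit]:
    """Keep only the hits that cover the complete query sequence."""
    return [hit for hit in hits if covers_whole_query(hit, query_length)]


# Hypothetical hits against a hypothetical 70-base query.
QUERY_LENGTH = 70
hits = [
    Hit("chr1", 1, 70, 0),   # exact, full-length match
    Hit("chr5", 1, 70, 14),  # full length, a fifth of the bases changed
    Hit("chr9", 12, 70, 3),  # partial match, excluded
]
kept = full_length_hits(hits, QUERY_LENGTH)
changed = {hit.chromosome: hit.mismatches / QUERY_LENGTH for hit in kept}
```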

 

 

 

 

It was found that the BLAST program did not produce the complete set of matches in a single run when the modern mtDNA sequence was used as a search string. However, when these matches were in turn used as search strings it was possible to find further matches. This procedure was then repeated again and again until no more sequences were found.
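The repeated searching described above amounts to feeding each new match back in as a query until nothing new turns up. A minimal sketch is given below; the `search` argument stands in for a BLAST run and is an assumption of the example, as are the toy names.

```python
from collections import deque
from typing import Callable, Iterable


def iterative_search(seed: str, search: Callable[[str], Iterable[str]]) -> set[str]:
    """Repeat a search, using each new match as a query in turn,
    until no further matches are found."""
    found: set[str] = set()
    queue = deque([seed])
    while queue:
        query = queue.popleft()
        for match in search(query):
            if match != seed and match not in found:
                found.add(match)
                queue.append(match)  # use this match as a later query
    return found


# Toy stand-in for a search program: each query maps to the matches it returns.
toy_links = {
    "mt": ["n1", "n2"],  # found directly with the mtDNA sequence
    "n1": ["n2", "n3"],  # n3 is only found by searching with n1
    "n2": [],
    "n3": ["n1"],
}
result = iterative_search("mt", lambda query: toy_links.get(query, []))
```

Here `n3` is reached only in the second round, which mirrors the observation that a single run with the modern sequence does not find every match.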

 

For the tRNA for alanine there are 2 NUMTs with sequences that do not show any variation from the CRS and these can be considered to be of “recent origin” and are discussed in more detail later.  The other NUMTs are considered to be older and therefore in the range 10-50 million years of age.

 

Table 3 shows that a similar pattern of NUMTs was produced for the amino acid arginine. In this instance 27 NUMTs have been identified, but none is of “recent origin.”

 

 

 

Table 3

NUMTs That Match the tRNA Sequence in CRS for Arginine

 

 

 

 

NUMTs of “Recent Origin”

 

In the human genome there is only one large NUMT of “recent origin” and this was first identified by Herrnstadt (1999).   The NUMT was presumably formed after the split with the chimpanzee as it is only to be found in the human  genome, and is not in the genomes of either the chimpanzee or the rhesus monkey.   The hominid in whom this occurred lived prior to “Mitochondrial Eve,” since this NUMT is more divergent from CRS than is any modern human.  The NUMT is 5,841 bases in length and matches against the CRS from location 3915 to 9756.  Figure 2 shows that this NUMT matches against about 3/8 of the mtDNA and is located very close to the tip of chromosome 1.

 

 

 

Figure 2.  Formation of the “Herrnstadt” NUMT.  Initially, the mtDNA was only found in mitochondria, but the partial destruction of a mtDNA ring led to the passage of a fragment into the nucleus where it became incorporated into chromosome 1.
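The proportion of the mtDNA covered by the chromosome 1 NUMT can be checked with a short calculation. The locations are those given in the text; the figure of 16,569 bases for the length of the CRS is not stated above and is supplied here as an assumption of the sketch.

```python
CRS_LENGTH = 16569  # length of the CRS in bases (assumed, not given in the text)

start, end = 3915, 9756          # locations in the CRS quoted for this NUMT
span = end - start               # 5,841 bases, the length quoted in the text
fraction = span / CRS_LENGTH     # proportion of the mtDNA ring that is covered

print(f"{span} bases, {fraction:.1%} of the mtDNA")
```

On these figures the NUMT covers a little over a third of the mtDNA, which is in line with the approximate value of 3/8.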

 

 

 

 

 

Table 4 shows there are 85 differences between this NUMT and the CRS.  The differences result mostly from mutations in the mtDNA along the maternal line leading to modern humans, but a few may have occurred in the NUMT, and a few may have been present in the original mtDNA that was captured in the NUMT.  The differences from CRS are shown for the entire NUMT as a conventional mutation list in Table 4a.  Six of the mutations occurred in tRNA sequences and these are shown in Table 4b.

 

 

 

 

 

 

 

On chromosome 14 there is a second, but much smaller, NUMT of “recent origin.” This NUMT is 1,021 bases in length and matches against the CRS from location 5583 to 6606. Table 5a shows the 71 mutational differences between this NUMT and the CRS. The mutations that have occurred in the tRNAs are shown in Table 5b.

 

 

 

 

 

 

 

The recent paper by Hazkani-Covo and Covo (2008) gives a list of NUMTs of “recent origin” - most of which are very short in length and do not match against a complete tRNA sequence.  But for reasons that are not totally clear, the two NUMTs discussed above are not on the list.

 

NUMTs of “Distant Origin”

 

The sequence of bases in a NUMT of “recent origin” matches the CRS very well; but as described above there are very few NUMTs of that type.  The majority of  NUMTs are much older - possibly in the range of 10 - 50 million years of age.

 

Tables 2 and 3 show the details of NUMTs with sequences that match against the tRNAs of alanine and arginine; and it is possible to prepare a detailed analysis for any individual NUMT. However, some NUMTs are of particular interest, as it has been possible to show that they can be found in the genome of Homo sapiens and also in the genomes of the Chimpanzee, Pan troglodytes, and the Rhesus Monkey, Macaca mulatta. This fact suggests that these NUMTs were incorporated into the genome of an ancestor common to all three species.

 

The best example of this type of NUMT that is common to the Human, Chimpanzee and Rhesus Monkey has been found on Chromosome 21. This NUMT of length 1,851 bases corresponds to the part of the mtDNA containing the tRNAs for tryptophan, alanine, asparagine, cysteine and tyrosine. In the Chimpanzee, Pan troglodytes, the whole of the NUMT is also found on Chromosome 21. However, in the Rhesus Monkey, Macaca mulatta, where there is no Chromosome 21, it is found on Chromosome 3.

 

The sequence from the genome of Homo sapiens shows a considerable number of differences from the CRS.  Nevertheless, the three NUMT sequences from the genomes of Homo sapiens, Pan troglodytes and Macaca mulatta are almost identical to each other suggesting that they had a common formation.

 

The details of this NUMT are shown in Table 6.

 

 

Table 6

A NUMT of “Distant Origin” on Human Chromosome 21
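The argument for a common formation rests on positions where the three NUMT copies agree with each other but differ from the CRS. A minimal sketch of how such positions might be listed is given below, assuming the four sequences have already been aligned to the same length; the function name and the toy sequences are invented and are not the data of Table 6.

```python
def shared_differences(reference: str, copies: dict[str, str]) -> list[int]:
    """Return the 1-based positions where every copy carries the same base
    and that base differs from the reference."""
    lengths = {len(reference), *(len(seq) for seq in copies.values())}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    positions = []
    for index, ref_base in enumerate(reference):
        bases = {seq[index] for seq in copies.values()}
        if len(bases) == 1 and ref_base not in bases:
            positions.append(index + 1)
    return positions


# Toy sequences, invented for illustration only.
crs_part = "ACGTACGT"
numt_copies = {
    "human": "ACCTACTT",
    "chimpanzee": "ACCTACTT",
    "rhesus": "ACCTGCTT",  # position 5 differs in this copy only
}
shared = shared_differences(crs_part, numt_copies)
```

Positions that differ in one copy only, such as position 5 in the toy data, are left out because they are more likely to have arisen after the species separated.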

 

 

 

 

 

 

Whereas the NUMT on chromosome 21 has been found to be the largest NUMT that is common to the Human, Chimpanzee and Rhesus Monkey, there are several other smaller NUMTs of this type.

 

Table 7 gives the details of a further 5 NUMTs that are found on the Human chromosomes 3, 4, 8, and X.

 

 

 

 

 

 

 

Discussion

 

This paper has concentrated on identifying NUMTs in the human genome by using the BLAST program to find matches against tRNA sequences in modern mtDNA.  This technique has led to the identification of several NUMTs which are common to the genomes of the Human, the Chimpanzee and the Rhesus Monkey.  But developing these ideas has only been possible by considering the published findings in various papers that have appeared over the last few years.  Actual quotations from the papers are shown in italics.

 

The early researchers used a laboratory system which involved using bacterial clones, specially prepared primers and direct sequencing.  This method was very laborious, but nevertheless, was quite successful.

 

For example, Nomiyama (1985) used this system to identify 2 NUMTs, subsequently shown to be located on chromosome 3 (GenBank numbers X02226, M12298); and even then it was clear that NUMTs were old, as the author suggested these 2 NUMTs “were transferred from mitochondria into nuclei about 12 and 15 millions of years ago, respectively.”

 

Later Herrnstadt (1999) used a similar method to identify a NUMT on chromosome 1 (GenBank number AF134583). This NUMT was shown to have a length of 5,841 nucleotide bases. The authors were able to link the NUMT to “a very distal portion of Chromosome 1” and in their discussion they recognised that their NUMT was of very recent origin and said “it is estimated that this sequence was transferred to the nucleus during evolution long after the divergence of humans from other nonhuman primates.” Although only the single NUMT was identified in the study, the paper did suggest the possibility of there being other “hitherto unidentified numtDNA sequences.”

 

By 2001, the method of identifying NUMTs by searching Human DNA databases had begun to replace laboratory methods; and Mourier (2001) published “the first extensive analysis of NUMTs in the human nuclear genome.” This study found “296 numts ranging between 106 and 14,654 bp in size.” The paper is also important as it discusses the possibility of NUMTs being formed at different stages of mammalian evolution. However, whilst this paper is very useful, the identification of the NUMTs was based on early Human Genome Project data and it is now quite difficult to correlate the results with the latest analyses.

 

In 2002 a paper from France (Tourmen, 2002) suggested there were 286 NUMTs and stressed “Some pseudogenes [NUMTs] appeared highly modified, containing inversions, deletions, duplications, and displaced sequences.”

 

Later, a  paper from the USA (Woischnik, 2002) identified 612 NUMTs and showed that NUMTs can be found on every chromosome.

 

In 2003, a paper from Israel (Hazkani-Covo, 2003) discussed the features of 82 large NUMTs; and in particular the workers concluded “only about a third of all the numt repertoire in the human nuclear genome is due to insertions … the rest originated as duplications of preexisting numts.”

 

In a paper from the USA (Bensasson, 2003), 348 NUMTs with a length greater than 500 bp are discussed. The paper suggests an age of 25-40 million years for the majority of the NUMTs, and considers that “numts arose continuously over the last 58 million years.”

 

Mishmar (2004) was able to identify 247 NUMTs and discusses how it is possible, by looking for selected mutations, to determine if one NUMT is more ancient than another. The author suggests “nuclear mtDNA pseudogenes are genetic fossils that reflect our past.”

 

Later, Ricchetti (2004) was able to identify 211 NUMTs. The paper is also interesting as the author made the suggestion that “NUMT integrations preferentially target coding or regulatory sequences.”

 

The paper of Schmitz, et al. (2005) is rather different to the earlier papers as it discusses the evolutionary pathway of a pseudogene “which separated from the corresponding mitochondrial gene more than 40 mya [million years ago].” Their study concentrated on the larger of the “Nomiyama” NUMTs (GenBank number X02226). The authors suggest that “numt sequences provide a much more reliable base for dating [than] molecular dating based on primate mtDNA.”

 

More recently, Hazkani-Covo (2007) produced a survey of NUMTs common to both the human and the chimpanzee. But the researchers did not report any NUMTs found also in the rhesus monkey.

 

Lascaro (2008) has produced a compilation of the 90 longest NUMTs found in the human genome.  But in the present author’s opinion the actual figures given for the start and finishing points for the NUMTs are still inaccurate.  In particular, the data from Lascaro has not taken note of the parts of NUMT sequences that match to tRNA sequences and this has resulted in many of the NUMTs being reported as having lengths which are much less than they really are.  Nevertheless, Lascaro’s compilation is far more accurate than earlier attempts. 

 

Finally, Hazkani-Covo and Covo (2008) discuss just how NUMTs might be formed by the inclusion of mtDNA material following breaks in chromosomal DNA.

 

The present study reports the result of carefully matching the respective parts of NUMT sequences against the coding area of tRNA sequences in modern mtDNA.

 

This has shown that there are a few NUMTs of “recent origin”, that is, NUMTs formed since the branching off of the human evolutionary line from the rhesus monkey and the chimpanzee.

 

But more importantly the study has shown that there is a small number of NUMTs that are common to the genomes of the human, chimpanzee and the rhesus monkey.  These NUMTs have a date of formation which predates the branching of these primates from the human evolutionary line.

 

The study also shows that there is not as yet a consensus view as to which parts of the human genome are NUMTs, and thereby have an origin in the mitochondrial DNA.

 

However, the search for NUMTs continues and the results presented in this paper are based on an analysis of the genomes that are currently available.  There is a lot more yet to be discovered about NUMTs in the human genome.

 

References

 

Anderson S, Bankier AT, Barrell BG, de Bruijn MHL, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R, Young IG (1981)  Sequence and organization of the human mitochondrial genome.  Nature, 290:457-465.

 

Andrews RM, Hubacka I, Chinnery PF, Lightowlers RN, Turnbull DM, Howell N (1999)  Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA.  Nat Genet, 23:147.

 

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990)  Basic local alignment search tool.  J Mol Biol, 215:403-410.

 

Antunes A, Pontius J, Ramos MJ, O'Brien SJ, Johnson WE (2007)  Mitochondrial introgressions into the nuclear genome of the domestic cat.  J Hered, 98:414-420.

 

Bensasson D, Feldman MW, Petrov DA (2003)  Rates of DNA duplication and mitochondrial DNA insertion in the human genome.  J Mol Evol, 57:343-354.

 

Hazkani-Covo E, Sorek R, Graur D (2003)  Evolutionary dynamics of large Numts in the human genome: Rarity of independent insertions and abundance of post-insertion duplications.  J Mol Evol, 56:169-174.

 

Hazkani-Covo E, Graur D (2007)  A comparative analysis of numt evolution in human and chimpanzee.  Mol Biol Evol, 24:13-18.

 

Hazkani-Covo E, Covo S (2008)  Numt-mediated double-strand break repair mitigates deletions during primate genome evolution.  PLoS Genet, Oct;4(10).

 

Herrnstadt C, Clevenger W, Ghosh SS, Anderson C, Fahy E, Miller S, Howell N, Davis RE (1999)  A novel mitochondrial DNA-like sequence in the human nuclear genome.  Genomics, 60:67-77.

 

Lascaro D, Castellana S, Gasparre G, Romeo G, Saccone C, Attimonelli M (2008)  The RHNumtS compilation: features and bioinformatics approaches to locate and quantify human NumtS.  BMC Genomics, 9:267.

 

Lopez JV, Yuhki N, Masuda R, Modi W, O'Brien SJ (1994)  Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat.  J Mol Evol, 39:174-190. Erratum in: J Mol Evol, 39:544.

 

Martins J Jr, Solomon SE, Mikheyev AS, Mueller UG, Ortiz A, Bacci M Jr (2007)  Nuclear mitochondrial-like sequences in ants: evidence from Atta cephalotes (Formicidae: Attini).  Insect Mol Biol, 16:777-784.

 

Mishmar D, Ruiz-Pesini E, Brandon M, Wallace DC (2004)  Mitochondrial DNA-like sequences in the nucleus (NUMTs): insights into our African origins and the mechanism of foreign DNA integration.  Hum Mutat, 23:125-133.

 

Mourier T, Hansen AJ, Willerslev E, Arctander P (2001)  The Human Genome Project reveals a continuous transfer of large mitochondrial fragments to the nucleus.  Mol Biol Evol, 18:1833-1837.

 

Nomiyama H, Fukuda M, Wakasugi S, Tsuzuki T, Shimada K (1985)  Molecular structures of mitochondrial-DNA-like sequences in human nuclear DNA.  Nucleic Acids Res, 13:1649-1658.

 

Ricchetti M, Tekaia F, Dujon B (2004)  Continued colonization of the human genome by mitochondrial DNA.  PLoS Biol, 2:E273.

 

Richly E, Leister D (2004)  NUMTs in sequenced eukaryotic genomes.  Mol Biol Evol, 21:1081-1084.

 

Schmitz J, Piskurek O, Zischler H (2005)  Forty million years of independent evolution: a mitochondrial gene and its corresponding nuclear pseudogene in primates.  J Mol Evol, 61:1-11.

 

Tourmen Y, Baris O, Dessen P, Jacques C, Malthiery Y, Reynier P (2002)  Structure and chromosomal distribution of human mitochondrial pseudogenes.  Genomics, 80:71-77.

 

Woischnik M, Moraes CT (2002)  Pattern of organization of human mitochondrial pseudogenes in the nuclear genome.  Genome Res, 12:885-893.