Going Through a Phase:
Haplotyping the Female X Chromosomes
‘Satiable Curiosity is a column dedicated to the proposition that genetic genealogists are an untapped resource for resolving questions about
genealogists have relied primarily on analysis of mitochondrial
But the Y
and mtDNA are only a small fraction of our
testing of many hundreds of thousands of markers is now available to the
ordinary consumer from companies such as deCODEme and 23andMe. These markers are
primarily SNPs (Single Nucleotide Polymorphisms, a substitution of one base
A/C/G/T for another). But the analysis
is complicated by the fact that any given stretch of our
The difficulty is also compounded by the fact that we females can’t even separate out which of our two X chromosome results came from our father’s side of the family and which from our mother’s side. Males have it lucky, since they know their single X chromosome came from the mother’s side. The two alleles (alternative versions of a marker) comprising the female genotype are always listed in alphabetical order, as shown in Table 1, and we don’t know which alleles are on the same chromosome. This small set of SNP genotypes, located close together on my X chromosome, could represent a number of haplotypes (alleles located on the same chromosome).
Table 1. Reference SNP ID #, Chromosome (X), Base Position, Alleles
For the first two SNPs, I could have inherited four different haplotypes: A-G, or A-T, or G-G, or G-T from my father, with the leftover alleles coming from my mother. Adding the alleles from the third SNP would double the possibilities: the C could go with any of the four haplotypes, and likewise the T, so now we’re up to eight distinct haplotypes. The number doubles with each additional heterozygous SNP, making 2^4, or sixteen, possibilities for a haplotype composed of just these four markers.
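The doubling described above can be checked with a short Python sketch. The function name `possible_haplotype_pairs` and the two-letter genotype strings are illustrative assumptions, not the format of any company's raw data; the alleles used are the three SNPs named in the text (A/G, G/T, C/T).

```python
from itertools import product


def possible_haplotype_pairs(genotypes):
    """Enumerate every way to split unphased genotypes into two haplotypes.

    genotypes: list of 2-character strings such as "AG", one per SNP
    (a hypothetical input format chosen for this sketch).
    Returns a list of (haplotype_1, haplotype_2) tuples.
    """
    choices = []
    for g in genotypes:
        a, b = g[0], g[1]
        # A homozygous SNP can be assigned only one way;
        # a heterozygous SNP can be assigned two ways.
        choices.append([(a, b)] if a == b else [(a, b), (b, a)])

    pairs = []
    for combo in product(*choices):
        hap1 = "-".join(first for first, _ in combo)
        hap2 = "-".join(second for _, second in combo)
        pairs.append((hap1, hap2))
    return pairs


two_snps = possible_haplotype_pairs(["AG", "GT"])
three_snps = possible_haplotype_pairs(["AG", "GT", "CT"])
print(len(two_snps), len(three_snps))  # 4 8
print(sorted(h1 for h1, _ in two_snps))  # ['A-G', 'A-T', 'G-G', 'G-T']
```

Each tuple pairs one candidate haplotype with its "leftover" partner, so the count of tuples is 2 raised to the number of heterozygous SNPs; homozygous SNPs add nothing to the count.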
Figure 1. Descendancy chart for X chromosomes
Figure 2A shows graphically the actual overlap between my cousin and myself. Figure 2B and 2C compare me to two males of European ancestry, not known to be related to me. The narrowest green bands represent approximately one million bases where I am at least half-identical to them. I share some green bands with both unrelated males, although they occur at different positions on the X chromosome. These bands simply reflect different parts of the European gene pool in general. The two broad green bands in 2A, one at the tip of the short arm and the other near the end of the long arm, are clearly more extensive than those of the randomly selected persons.
Figure 2. deCODEme “Compare Me” ideograms. Colored bars represent segments that are at least half-identical.
By examining the raw data, I learned that the top green band in Figure 2A covered bases 214,201 to 21,962,112. This region contains a run of 249 consecutive SNPs, 86 of which are heterozygous. The theoretical number of distinct haplotypes would thus be 2^86, which rounds off to 77,371,252,455,336,300,000,000,000. Yet I can look at my male cousin’s haplotype and instantly convert my genotype into two haplotypes: one will match my cousin, and the other (which came from my mother’s side of the family) will have the leftovers. The process of deducing which alleles come from the same chromosome is called phasing. It is often performed by software programs that rely on a large number of population samples rather than on the pedigree analysis developed here. Table 2 adds genotype results for samples in Figure 2A and 2B and divides my genotype into two phased haplotypes, P1 and P2.
Table 2. Genotype Data for 2A (cousin), Ann, Phased Haplotype 1 (Matching 2A from the Paternal Side), Phased Haplotype 2 (Deduced for the Maternal Side), and an Unrelated Male 2B.
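The "match the cousin, keep the leftovers" step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name, the input format, and the sample alleles are all hypothetical (they are not the values from Table 2), and no-calls or genotyping errors are not handled beyond raising an error.

```python
def phase_with_known_haplotype(genotypes, known_haplotype):
    """Split unphased genotypes into two haplotypes using one known haplotype.

    genotypes: list of 2-character strings, e.g. ["AG", "GT"] (hypothetical format).
    known_haplotype: list of single alleles from a male relative's one X chromosome.
    Returns (matching, leftover) as two lists of alleles.
    Raises ValueError if the inputs differ in length or if the relative's
    allele does not appear in the genotype at some SNP.
    """
    if len(genotypes) != len(known_haplotype):
        raise ValueError("genotypes and known_haplotype must cover the same SNPs")

    matching, leftover = [], []
    for i, (g, allele) in enumerate(zip(genotypes, known_haplotype)):
        if allele == g[0]:
            other = g[1]
        elif allele == g[1]:
            other = g[0]
        else:
            # Not even half-identical here: the shared segment has ended,
            # or there is a genotyping error.
            raise ValueError(f"SNP {i}: allele {allele!r} not in genotype {g!r}")
        matching.append(allele)
        leftover.append(other)
    return matching, leftover


# Made-up data for illustration only.
my_genotypes = ["AG", "GT", "CT", "CC"]
cousin = ["A", "T", "C", "C"]
p1, p2 = phase_with_known_haplotype(my_genotypes, cousin)
print("-".join(p1))  # A-T-C-C
print("-".join(p2))  # G-G-T-C
```

With a known haplotype in hand, the 2^86 theoretical possibilities collapse to a single answer, because each heterozygous SNP is resolved by one comparison.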
The male in Figure 2B has an overlapping green band, covering bases 18,359,428 to 21,638,884. His haplotype should match one or the other of my phased haplotypes, and in fact it does. If I did not have data from my cousin, I could still deduce my phased haplotypes by comparison with 2B, although I would not know which haplotype came from the paternal side and which from the maternal side. It would be exceedingly improbable to match that many consecutive SNPs by chance; we both must have inherited the haplotype block from a common ancestor. By comparing myself with a number of males, related or not, I could eventually phase a goodly part of my X chromosome.
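Finding a candidate block to phase against, such as the one shared with 2B, amounts to scanning for a long run of consecutive SNPs where the male's single allele appears in the female genotype. The Python sketch below illustrates that idea only; it is not deCODEme's actual "Compare Me" algorithm, and the function name and sample data are hypothetical.

```python
def longest_half_identical_run(genotypes, haplotype):
    """Find the longest run of consecutive SNPs that are at least half-identical.

    genotypes: list of 2-character strings for the female (hypothetical format).
    haplotype: list of single alleles for the male, in the same SNP order.
    Returns (start_index, length) of the longest run; (0, 0) if there is none.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (g, allele) in enumerate(zip(genotypes, haplotype)):
        if allele in g:
            if run_len == 0:
                run_start = i  # a new run begins at this SNP
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a mismatch ends the current run
    return best_start, best_len


# Made-up data for illustration only.
female = ["AG", "CC", "GT", "AA", "CT", "GG"]
male = ["A", "T", "G", "A", "C", "G"]
print(longest_half_identical_run(female, male))  # (2, 4)
```

In practice the run length would be compared against a threshold (in SNPs or in bases) before treating the block as inherited from a common ancestor, since short runs are expected by chance.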
Although this column uses diagrams from deCODEme to illustrate the process visually, there is enough overlap with markers used by 23andMe to make it feasible to merge raw data from the two companies. 23andMe’s “Family Inheritance” feature is similar in principle to deCODEme’s “Compare Me,” but it does not highlight shared regions unless they are more extensive (about 10 million bases, enough to justify the “Family” aspect of the comparison). However, a lower threshold could be set for analyzing haplotype blocks in the raw data.
Male-to-male comparisons are also possible. Here the phase is already known, and the point of interest is whether they share extended regions of similarity, which would be evidence of descent from a common ancestor. Longer haplotype blocks would indicate more recent ancestry, while shorter haplotype blocks would be identical by descent from a more distant ancestor, perhaps even thousands of years ago. A collaborative project could perhaps develop a “dictionary” of haplotype blocks correlated with geographic information. A pilot study might pick a non-coding region of some optimum length and solicit data from people without raising concerns of revealing medically sensitive information. With the method described in this column, both males and females might be able to pool data, creating a larger sample size than used by many publications.
The term haplotype (the results from testing a set of markers located on a single chromosome) has actually been adapted from its original application. It was first used about 1969 in conjunction with the Human Leukocyte Antigen (HLA) system, a set of genes located close together on chromosome 6 and important for determining tissue compatibility for transplants. It was observed that if one member of a family had certain versions (alleles) for HLA-A and HLA-B, other members who matched the allele for HLA-A would almost invariably match the allele for HLA-B as well. This was evidence that the two alleles traveled together as a package on a single chromosome, whether inherited from the father or from the mother.
For an animated illustration of the different pathways of inheritance, see http://www.smgf.org/pages/animations.jspx