About the notation used for mitochondrial DNA

The DNA of each mitochondrion (plural: mitochondria), called mtDNA, comprises approximately 16,569 base pairs. The four bases are Adenine, Cytosine, Guanine and Thymine, hence the basic DNA alphabet A, C, G, T. Base pairs are always either A-T or C-G. The molecule is like a street with 16,569 home addresses on each side. Only one side is examined, since the other side is composed of the complementary bases. The whole genome of the mitochondrion is divided into 3 sections called regions:
Hyper Variable Region #1 (HVR1): from base address 16,001 to 16,569.
Hyper Variable Region #2 (HVR2): from base address 1 to 570.
Coding Region: from base address 571 to 16,000.
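The three address ranges above can be turned into a small lookup. The following is only a sketch in Python: the function name `region_of` and the constant `MTDNA_LENGTH` are my own illustrative choices, and the boundaries are simply those given in the list above.

```python
MTDNA_LENGTH = 16569  # number of base addresses, as stated in the text


def region_of(position: int) -> str:
    """Return the name of the region containing a base address (1-based)."""
    if not 1 <= position <= MTDNA_LENGTH:
        raise ValueError(f"address {position} is outside 1..{MTDNA_LENGTH}")
    if position >= 16001:
        return "HVR1"           # 16,001 to 16,569
    if position <= 570:
        return "HVR2"           # 1 to 570
    return "Coding Region"      # 571 to 16,000


if __name__ == "__main__":
    for address in (73, 571, 16000, 16129):
        print(address, region_of(address))
```

Note that the numbering wraps around: HVR1 occupies the highest addresses and HVR2 the lowest ones.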

Each of these regions can be examined by the laboratory in order to detect which base occupies which address, also called a locus or sometimes a position.

In order to know whether the base found at a given locus shows a mutation, it is compared to a reference sequence (a standard) which, by definition, does not comprise mutations. The reference used by FTDNA is the *Reconstructed Sapiens Reference Sequence* (RSRS), which is assumed to be the oldest mtDNA, the one that existed in the ‘first’ Homo sapiens women. More information on the RSRS at http://bit.ly/1K0abIv

There is an alternative system of reference, which is still the one mainly used in the biological, medical and forensic sciences. It is called the rCRS, for Revised Cambridge Reference Sequence, and is explained at http://bit.ly/1K0au5S

The notation used to represent a mutation at a given mtDNA locus is simple. Some mutations are presented in the next figure. They were taken from a personal page at FTDNA. Mutations are expressed as differences from the RSRS, the reference system used by default by FTDNA.

Figure 1. List of RSRS mutations found in the mtDNA of a member of the H7 mtgenome project at FTDNA

Thus, mutation A16129G was found in the hypervariable region 1 (HVR1). The number 16129 is the address of the locus on the mitochondrial strand considered. The prefix letter A, which stands for Adenine, is the base occurring in the reference system at this precise locus. The suffix letter G, which stands for Guanine, is the base found at the same locus in the mtDNA examined. Since at locus 16129 the mtDNA examined does not possess the same base as the reference system, a mutation is declared and expressed as A16129G.
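To make the three parts of the notation explicit, here is a minimal Python sketch that splits an expression such as A16129G into its reference base, locus and observed base. The names `Substitution` and `parse_substitution` are hypothetical, chosen for illustration. It handles only simple substitutions as described above.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    reference: str  # base found in the reference system (prefix letter)
    position: int   # address of the locus
    observed: str   # base found in the mtDNA examined (suffix letter)


# Prefix base, locus number, suffix base (the suffix may be lower case).
_SUBSTITUTION = re.compile(r"^([ACGT])(\d+)([ACGTacgt])$")


def parse_substitution(text: str) -> Substitution:
    """Split an expression such as 'A16129G' into its three parts."""
    match = _SUBSTITUTION.match(text.strip())
    if match is None:
        raise ValueError(f"not a substitution: {text!r}")
    reference, position, observed = match.groups()
    return Substitution(reference, int(position), observed.upper())


if __name__ == "__main__":
    print(parse_substitution("A16129G"))
```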

This kind of substitution of bases is called a *transition* when it occurs within the same category of bases: the replacement of the purine A by the other purine G or vice versa, or of the pyrimidine C by the other pyrimidine T or vice versa. Most mutations are of the transition type. More rarely, the substitution involves a base of the opposite group, for instance when the pyrimidine C replaces the purine A. This kind of substitution is called a *transversion*. For example, A16129c would express a transversion, since the purine A would have been replaced by the pyrimidine C. To put emphasis on this rare event, a small letter c is used instead of a capital C.
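The rule can be stated in a few lines of Python. This is a sketch only, and the function names `substitution_type` and `format_substitution` are my own. It classifies a substitution by checking whether both bases belong to the same group, and writes the suffix in lower case for a transversion, following the convention described above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(reference: str, observed: str) -> str:
    """Return 'transition', 'transversion' or 'no change'."""
    reference, observed = reference.upper(), observed.upper()
    for base in (reference, observed):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    if reference == observed:
        return "no change"
    same_group = (reference in PURINES) == (observed in PURINES)
    return "transition" if same_group else "transversion"


def format_substitution(reference: str, position: int, observed: str) -> str:
    """Write a substitution, with a lower-case suffix for a transversion."""
    kind = substitution_type(reference, observed)
    if kind == "no change":
        raise ValueError("identical bases: there is no mutation to write")
    suffix = observed.lower() if kind == "transversion" else observed.upper()
    return f"{reference.upper()}{position}{suffix}"


if __name__ == "__main__":
    print(substitution_type("A", "G"), format_substitution("A", 16129, "G"))
    print(substitution_type("A", "C"), format_substitution("A", 16129, "C"))
```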

The notation is different for insertions and deletions, collectively called indels. Insertions, as the name indicates, are mutations in which one or more additional bases are inserted after a given locus. Thus the expression 522.1A in the above example signifies that an additional Adenine (A) was found right after locus 522. Similarly, 522.2C means that a second base, this time a Cytosine (C), was inserted after locus 522, immediately following the first inserted base.
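The number after the dot is thus the rank of the inserted base, not a distance in addresses. A short Python sketch shows how the expressions 522.1A and 522.2C combine. The helper names `parse_insertion` and `inserted_run` are hypothetical.

```python
import re

# Locus after which the insertion occurs, rank of the inserted base, base.
_INSERTION = re.compile(r"^(\d+)\.(\d+)([ACGT])$")


def parse_insertion(text: str) -> tuple[int, int, str]:
    """Split an expression such as '522.1A' into (locus, rank, base)."""
    match = _INSERTION.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not an insertion: {text!r}")
    locus, rank, base = match.groups()
    return int(locus), int(rank), base


def inserted_run(notations: list[str], locus: int) -> str:
    """Return, in order, the bases inserted right after a given locus."""
    parsed = sorted(p for p in map(parse_insertion, notations) if p[0] == locus)
    return "".join(base for _, _, base in parsed)


if __name__ == "__main__":
    print(parse_insertion("522.1A"))
    print(inserted_run(["522.2C", "522.1A"], 522))
```

With both insertions present, the two extra bases A and C sit, in that order, between locus 522 and locus 523.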

Base deletion is another form of mutation. At a given locus, the base which should be present according to the reference system is missing because it has been deleted. The deletion is expressed by the address of the locus where it occurred, followed by the minus sign or by the three letters DEL. Thus 552- or 552DEL would indicate a deletion of the base normally found at locus 552.
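Both spellings of a deletion can be recognized by one pattern. The Python sketch below assumes exactly the two forms described above, a locus followed by a minus sign or by DEL. The function name `parse_deletion` is illustrative.

```python
import re

# A locus address followed by '-' or by the letters DEL (case ignored here).
_DELETION = re.compile(r"^(\d+)(?:-|DEL)$", re.IGNORECASE)


def parse_deletion(text: str) -> int:
    """Return the locus at which a base has been deleted."""
    match = _DELETION.match(text.strip())
    if match is None:
        raise ValueError(f"not a deletion: {text!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(parse_deletion("552-"), parse_deletion("552DEL"))
```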

Letters other than A, C, G, T are used to express a special condition called *heteroplasmy*. Heteroplasmy is the presence, in the same organism, of mitochondria having a mutation at a given locus together with other mitochondria which conform to the reference system (RSRS or rCRS) at that locus.

For example, heteroplasmy at locus 73 in the mitogenome of a tested person would be expressed as G73R. As can be seen in the next table, R stands for A or G. The tested person thus carries some mitochondria with A at locus 73 (G73A, a transition) and others with G (G73G, which is not a mutation). The next table shows the correspondence of the various letters used to express heteroplasmy.
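The letters in question are, I assume, the standard IUPAC nucleotide ambiguity codes. The Python sketch below lists them and expands an expression such as G73R into the bases it covers, marking which of them differ from the reference. The name `expand_call` is hypothetical, and letter case is ignored in this sketch.

```python
import re

# Standard IUPAC nucleotide codes: each letter stands for a set of bases.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

_CALL = re.compile(r"^([ACGT])(\d+)([A-Z])$")


def expand_call(text: str) -> list[tuple[str, bool]]:
    """Expand e.g. 'G73R' into (base, differs_from_reference) pairs."""
    match = _CALL.match(text.strip().upper())
    if match is None or match.group(3) not in IUPAC_CODES:
        raise ValueError(f"cannot read: {text!r}")
    reference, code = match.group(1), match.group(3)
    return [(base, base != reference) for base in IUPAC_CODES[code]]


if __name__ == "__main__":
    print(expand_call("G73R"))
```

For G73R the result pairs A with True (the transition G73A) and G with False (no mutation).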


As my reader can readily understand, a person showing heteroplasmy at a given locus may well match and share a common ancestress with another person not showing heteroplasmy at the same locus. However, a testing company may well not be in a position to find that these two persons' mtDNA match. But an expert in mtDNA can.
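One possible way to take heteroplasmy into account when comparing two results is to treat two letters as compatible whenever they share at least one base. The Python sketch below illustrates this idea only. It is my own assumption and not the procedure of any testing company, and the name `bases_compatible` is hypothetical.

```python
# Plain bases and the two-base heteroplasmy letters (standard IUPAC codes).
BASE_SETS = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def bases_compatible(first: str, second: str) -> bool:
    """True when the two letters have at least one base in common."""
    return bool(BASE_SETS[first.upper()] & BASE_SETS[second.upper()])


if __name__ == "__main__":
    # One person shows R (A or G) at a locus, the other shows plain A.
    print(bases_compatible("R", "A"))
    # R (A or G) and C have no base in common.
    print(bases_compatible("R", "C"))
```

A strict letter-by-letter comparison would report R and A as different, which is how a possible match can be missed.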

Jacques P. Beaugrand 2015-SEP-07