Poodle Diversity Project - Blog

  • Primer on Genetic Data #1

    There seems to be some confusion about STRs, SNPs, frequencies and the database. Since these are crucial concepts in understanding whether this test is valid, they have to be explained carefully.

    If you hear someone comparing 200,000 “markers” to the 33 loci available from the UC Davis Genetic Diversity test, as though the 200,000 markers offer more information, you can be sure they are completely confused and do not comprehend the science. This would be the difference between counting apples, and counting huge apple orchards, and then comparing 1,000 apples to 100 apple orchards and saying one obviously has more apples because 1,000 is bigger than 100.

    So I will explain in a few posts. First, most people are familiar with the image of DNA, a spiral ladder. Each rung on the ladder is made up of two bases - certain organic molecules - each half of the rung being one kind. There are only 4 of these bases that code DNA: guanine, adenine, cytosine and thymine, which are represented by the letters G, A, C, and T. (Guanine and adenine are purines; cytosine and thymine are pyrimidines.) Each rung on the ladder is either guanine and cytosine, or adenine and thymine. Those two pairs in various orders make up all DNA.
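
    The pairing rule above can be sketched in a few lines of Python. This is only an illustration: the names PAIRS, complement and rungs are made up for this post and do not come from any genetics software.

```python
# Pairing rule from the paragraph above: G pairs with C, A pairs with T.
# PAIRS, complement and rungs are hypothetical names used only for illustration.
PAIRS = {"G": "C", "C": "G", "A": "T", "T": "A"}


def complement(strand: str) -> str:
    """Return the partner base for each letter of `strand`, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())


def rungs(strand: str) -> list[str]:
    """Return each rung of the ladder as a two-letter base pair."""
    return [base + PAIRS[base] for base in strand.upper()]


if __name__ == "__main__":
    one_side = "GATTACA"          # one side of a very short ladder
    print(complement(one_side))   # the other side of the ladder
    print(rungs(one_side))        # the rungs, one base pair each
```

    Given one side of the ladder, the other side is fully determined, which is why a stretch of DNA can be written down as a single string of letters.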

    A single gene can be tens of thousands of base pairs long. The dog genome was mapped in 2003, and is made up of about 2.5 billion base pairs, found in 39 pairs of chromosomes, which are separate, compacted pieces of DNA. These 2.5 billion base pairs make up about 19,000 genes in total.

    The first whole genome sequencing of the dog - meaning a recording of every base pair in order - cost $30 million. It can be done now for about $7,000 per dog. Comparing the 2.5 billion base pairs of one dog to the 2.5 billion base pairs of another is quite a job, as we can imagine.

    Because there is so much genetic material, nearly all practical DNA testing involves testing only very small sections of the DNA. When looking for a disease gene, researchers first search the regions of the genome they suspect may be involved in the disease. They don't search all 2.5 billion base pairs.

    There are several practical ways of recording DNA, or genotyping it, but the two main ones are STRs (short tandem repeats) and SNPs (single nucleotide polymorphisms). SNPs identify single base pairs. STRs identify short repeating patterns at specific locations (loci) that serve as markers for genes. An SNP panel of, say, 189,000 SNPs sounds like a lot - but it’s 189,000 out of 2.5 billion base pairs, which is about 0.00756% of the DNA of a dog. The UC Davis Genetic Diversity Test tests 66 genes, in 33 gene pairs, and 66 out of 19,000 is about 0.347% of the genes in the genome. While in each case this is only a tiny amount of DNA, DNA is so specific to each individual that testing this amount is enough to tell a great deal about individuals, to compare them to other individuals, and, when done on a group, to develop a picture of their population structure.
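
    For anyone who wants to check the two percentages above, here is a minimal sketch of the arithmetic in Python. The figures are the rounded ones used in this post, and the variable names are just labels chosen for the example.

```python
# Rounded figures quoted in this post (not exact measurements).
GENOME_BASE_PAIRS = 2_500_000_000  # about 2.5 billion base pairs in the dog genome
GENOME_GENES = 19_000              # about 19,000 genes in the dog genome
SNP_PANEL_SIZE = 189_000           # example SNP panel size used in this post
STR_TEST_GENES = 66                # the post's count for the UC Davis test (33 pairs)


def percent_of(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


snp_share = percent_of(SNP_PANEL_SIZE, GENOME_BASE_PAIRS)
str_share = percent_of(STR_TEST_GENES, GENOME_GENES)

print(f"SNP panel: {snp_share:.5f}% of base pairs")  # 0.00756%
print(f"STR test:  {str_share:.3f}% of genes")       # 0.347%
```

    Note that the two percentages have different denominators: the first is a share of base pairs, the second a share of genes. That is the apples and orchards point from the start of this post.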