Poodle Diversity Project - Blog

  • Genetic Markers? Pedigrees? What should we believe?

    With recent progress on genetic marker analyses for purebred dogs, breeders get told a lot of different, conflicting information. There are those enthusiastically for the idea, and those staunchly against it. That's very normal for new things - it happens with all significant progress. In this case, there are those who have invested heavily in inbred lines who don't want to be told they've been doing it wrong.

    For all of those who aren't sure what to think about the conflicting information about microsatellite data vs. pedigrees, and about the value of genetic markers to assess diversity, and various other related rumors that continue to float about, this is for you.

    Some say pedigrees are most important, some say genetic markers are what matters, and some say using genetic diversity tests is brand new and untested. While these discussions may be new to the purebred dog world, they have long been hotly debated by researchers in population genetics, conservation and ecology. The pedigree-based calculation of the coefficient of inbreeding (COI) has been around since 1922, throughout the period of time when inbreeding in dogs came to be considered normal and even preferred. In 1989 Queller and Goodnight published their method in a paper called "Estimating Relatedness using Genetic Markers." Since then, various other mathematical ways of using genetic markers to estimate the heterozygosity of both an individual and a population have been developed. These have been in use and have improved significantly over the last 15 years, as have the speed and cost of genotyping and analyzing vast amounts of genetic data.

    The experts in the field are way ahead of the dog world on these discussions, and have already asked and answered a lot of the questions, comments, and opinions now passed around among us. Since a lot of those discussions have happened among population geneticists, dog breeders don't know about them. Here are some you might have had:

    • What markers are chosen and why? What do those markers do? What traits do they control? 
    • What's the difference in quality of information between different kinds of tests? Why do some tests say they have thousands or hundreds of thousands of markers and some of them have only 21 or 33 or 58? How many is enough?
    • How can a tiny sample of DNA possibly tell us about the rest of the DNA in a dog? 
    • How can a few hundred dogs from a single breed represent the whole breed? Don't you have to get DNA from all dogs for it to be accurate?
    • What about COI and other pedigree-based information? When genetic results are so different from the COI, how can you trust either?

    You can rest assured that people whose careers are based on studying these precise questions, i.e. actual population geneticists, have already hotly debated these. In fact, a definitive paper came out 6 years ago that answered many of these questions, and there have been hundreds of studies since that have built up a body of research, replication of results, and more comparisons. Some things became clear: for example, that inbreeding estimates derived from genetic markers are most accurate in inbred populations, and that more markers make for more accurate results. There are also studies that have shown that not all positive fitness traits are caused by more heterozygosity - some are due to specific genes. But many traits, often generalized ones like body weight or the ability to adapt to environmental stressors, are strongly connected to heterozygosity.

    I strongly suggest that readers NOT automatically believe those on Facebook who spout off theories, because many (and there are many) only sound like they understand and don't. If someone doesn't answer your questions, or can't or won't, there are many ways of finding out for yourself. Ask for documentation and don't take "because it's true" for an answer - not even from me. The truth is out there and it's not all so scientific that you need someone to translate it.

    First, to answer the questions above:
    • What markers are chosen and why? What do those markers do? What traits do they control?

    Genetic diversity tests do not look for causative markers for specific traits, nor do they track what traits those markers control. Breeders are used to DNA tests for specific genes or markers - ones that cause disease or don't cause disease. These diversity tests use loci, or specific places on the DNA, that have a number of different alleles found at them, not just a normal one or a mutation, like in the DNA tests for diseases.  These genetic diversity tests look for markers that, when viewed as a group, are most informative about the overall genetic diversity in a population or an animal. In the case of UC Davis and Genoscoper, both of which offer genetic marker tests for genetic diversity, they have confirmed that their methods offer reasonable estimates of DNA heterozygosity across the genome. Not only have hundreds of studies used this method with success to show levels of inbreeding in populations of various species, but more studies on dogs are replicating recent results.
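
    For readers who like to see the arithmetic, here is a minimal Python sketch of how heterozygosity can be summarized from a panel of loci that each have several alleles. It is only an illustration, not the actual method used by UC Davis or Genoscoper: the three-locus panel, the dog names, the allele labels and the function names are all made up for this example.

```python
from collections import Counter


def individual_heterozygosity(genotype):
    """Fraction of typed loci at which one dog carries two different alleles."""
    typed = [pair for pair in genotype.values() if pair is not None]
    if not typed:
        raise ValueError("no typed loci for this dog")
    return sum(a != b for a, b in typed) / len(typed)


def expected_heterozygosity(genotypes, locus):
    """1 - sum(p_i^2) at one locus, using allele frequencies in the sample."""
    counts = Counter()
    for genotype in genotypes:
        pair = genotype.get(locus)
        if pair is not None:
            counts.update(pair)  # each dog contributes two allele copies
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical panel: three loci, alleles labeled by made-up fragment lengths.
dogs = {
    "dog_a": {"L1": (201, 205), "L2": (150, 150), "L3": (98, 102)},
    "dog_b": {"L1": (201, 201), "L2": (150, 154), "L3": (98, 98)},
    "dog_c": {"L1": (205, 209), "L2": (150, 150), "L3": (98, 98)},
}

for name, genotype in dogs.items():
    print(name, round(individual_heterozygosity(genotype), 2))

print("L1 expected heterozygosity:",
      round(expected_heterozygosity(dogs.values(), "L1"), 3))
```

    In this toy data, dog_a is heterozygous at two of its three loci and the other two dogs at one of three. A real panel simply does the same counting over more loci and many more dogs.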

    • What's the difference in quality of information between different kinds of tests? Why do some tests say they have thousands or hundreds of thousands of markers and some of them have only 21 or 33 or 58? How many is enough?

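    STRs and SNPs are two different kinds of genetic markers used to genotype (record) DNA. STRs (short tandem repeats, also called microsatellites) are short stretches of DNA in which a small motif is repeated several times in a row; the number of repeats varies from dog to dog, so a single STR locus can have many different alleles. SNPs (single nucleotide polymorphisms) are differences at a single position among the tiny chemical units (nucleotide bases) that are the building blocks of DNA; two bases paired together are called a base pair. When you see pictures of DNA that look like a twirling ladder, each base pair makes up one rung on that ladder. There are about 2.8 billion base pairs in a single dog's DNA, which contains about 19,000 genes in total. That works out to roughly 147,000 base pairs of DNA for every gene, although only a small part of that DNA actually codes for proteins. The use of tens of STRs (21 or 33 or 58) offers about as much information as using hundreds of thousands of SNPs, say 170,000 or 220,000. Either way, it's a fraction of all the DNA.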

    The UC Davis method uses only STRs because they are cost effective, don't require blood draws, and can be turned around very quickly. However, they have compared their panels to extensive SNP panels, as well as larger STR panels, to see whether the smaller panels provide as much of the necessary information, and they do. Genoscoper uses a combination of SNPs and STRs to assess dogs. While some breeders have discussed whole genome testing as being what's "really" going to matter, analyzing 2.8 billion pieces of data is expensive, time consuming and impractical at the moment. Most importantly, it's not necessary for the purposes of genetic diversity testing. Samples like the panels used by UC Davis and Genoscoper are effective, just as sampling data is effective in many, many areas of science.

    • How can a tiny sample of DNA possibly tell us about the rest of the DNA in a dog?

    DNA samples have been able to identify individual humans and animals with certainty for decades, and they've been able to find obvious single locus disease genes. Every test should be appropriate for its purpose, and as discussed above, these estimates do an excellent job of assessing diversity. While there have been some debates on how well markers can estimate overall diversity, an ever growing body of research has shown that genetic markers and the methods of assessing them work very well under varied circumstances to estimate heterozygosity in both individuals and populations. Researchers now regularly use them to assess how different levels of heterozygosity affect certain "fitness" traits in a population. Usually they find that more heterozygous populations are better able to survive under certain circumstances. For some traits, heterozygosity has no effect.

    There have of course been improvements made in response to some early criticism, such as in selecting which loci to use, as well as how many are necessary. The mathematical methods to estimate overall heterozygosity have also improved. There are still a few researchers who advise caution, but more and more, researchers have simply changed their main focus from pedigree-based assessments to genomic assessments.

    • How can a few hundred dogs from a single breed represent the whole breed? Don't you have to get DNA from all dogs for it to be accurate?

    Depending on the size of a population and its species, it typically only takes a relatively small number of individuals to find the full set of alleles present in that population. In dogs, that number can vary depending on how popular a breed is (rare breeds might take fewer dogs) and how inbred that breed is. A very varied breed that is popular around the world would require more individuals, whereas a less well-known, highly inbred breed would require fewer.

    This is because there is a variety of possible alleles (versions of a gene or marker) at each locus on the DNA, and each breed has a limited number of them. For the first few dozen dogs, for instance, more and more new alleles are identified at each locus. Inbred breeds - ones with few founders, or ones that have had a significant genetic bottleneck sometime after the founders - have many fewer alleles, and the more diverse populations have more. Most breeds have some of the same markers or haplotypes at the same locus, but this doesn't mean they have been crossed to one another. Often they simply have the same ancestors from before breed development - which for many breeds was a very long time ago. No matter the makeup of the breed, once new alleles are no longer, or only very rarely, being found at each locus as new samples come in, the population structure is well known. From that point on, the changes are mostly in the frequencies of each allele (how often they appear in the population), and more samples will only change frequencies very slightly.
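
    The leveling-off described above can be pictured with a short Python sketch. It counts, at a single locus, how many distinct alleles have been seen after each new dog is sampled. The sample data and function names are made up for illustration, and a real analysis would look at many loci and many more dogs.

```python
def allele_discovery_curve(samples):
    """Cumulative number of distinct alleles seen after each sampled dog.

    `samples` is a list of (allele_1, allele_2) pairs at one locus,
    in the order the dogs were tested.
    """
    seen = set()
    curve = []
    for pair in samples:
        seen.update(pair)
        curve.append(len(seen))
    return curve


def samples_without_new_alleles(curve):
    """How many of the most recent samples added no new allele."""
    if not curve:
        return 0
    first_reached = curve.index(curve[-1])  # first dog at which the final count was hit
    return len(curve) - 1 - first_reached


# Hypothetical genotypes at one locus for seven dogs (alleles numbered 1-4).
samples = [(1, 2), (2, 3), (1, 1), (3, 4), (2, 2), (1, 4), (3, 3)]

curve = allele_discovery_curve(samples)
print(curve)                               # [2, 3, 3, 4, 4, 4, 4]
print(samples_without_new_alleles(curve))  # 3
```

    Once the curve stays flat across many new dogs, further samples mainly refine the allele frequencies rather than reveal new alleles.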

    • What about COI and other pedigree-based information? When genetic results are so different from the COI, how can you trust either?


    Pedigree analysis is still helpful for its historical data, none of which can be had by reading a sample of DNA. For that reason, knowing the actual qualities of the dogs in the recent pedigree will always be extremely important. However, simply knowing names of distant ancestors with no attached data is of little value, as are COIs for individuals or those based on fewer than 10 generations. Before genetic marker-based assessment was available, COI was better than guessing, but there have often been wild populations where no pedigrees were available, and some breeds have abysmal pedigree records or very closely guarded ones. In such circumstances, genetic marker assessments are excellent methods of showing relatedness. Without any pedigree information, modern assessment methods can quickly and easily identify close genetic relationships and accurately describe population structure.

    Pedigree-based inbreeding estimates are not only dependent on the quality and depth of the pedigrees, but also on how related the founders were. Every pedigree calculation method assumes founders were unrelated, and this is often untrue. However, there's more and more evidence that genetic marker data matches high quality pedigree data when looking at whole populations. This is great news both for populations that have good pedigree databases (only some do) and for populations that don't (most wild populations and even many domesticated species and breeds).
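
    To make the "founders are assumed unrelated" point concrete, here is a minimal Python sketch of a pedigree COI calculation using the standard kinship recursion. It is an illustration under stated assumptions, not the code of any particular pedigree program: the tiny pedigree, the dog names and the function names are hypothetical, and the line that returns 0.0 for an unknown parent is exactly where the founder assumption enters.

```python
from functools import lru_cache


def make_coi(pedigree):
    """Build a COI function for a pedigree given as {dog: (sire, dam)}.

    Founders are entered as (None, None) and are ASSUMED to be unrelated.
    """

    @lru_cache(maxsize=None)
    def depth(dog):
        # Longest path back to a founder; a dog is always deeper than its ancestors.
        if dog is None:
            return -1
        sire, dam = pedigree[dog]
        return 1 + max(depth(sire), depth(dam))

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0  # unknown parent: founder assumed unrelated to everyone
        if a == b:
            sire, dam = pedigree[a]
            return 0.5 * (1.0 + kinship(sire, dam))
        if depth(a) < depth(b):
            a, b = b, a  # always expand the dog that cannot be an ancestor of the other
        sire, dam = pedigree[a]
        return 0.5 * (kinship(sire, b) + kinship(dam, b))

    def coi(dog):
        # A dog's inbreeding coefficient is the kinship between its parents.
        sire, dam = pedigree[dog]
        return kinship(sire, dam)

    return coi


# Hypothetical pedigree: three founders, a full-sib mating and a half-sib mating.
pedigree = {
    "sire": (None, None),
    "dam": (None, None),
    "dam2": (None, None),
    "brother": ("sire", "dam"),
    "sister": ("sire", "dam"),
    "half_sister": ("sire", "dam2"),
    "pup": ("brother", "sister"),
    "half_sib_pup": ("brother", "half_sister"),
}

coi = make_coi(pedigree)
print(coi("pup"))           # 0.25  (full-sibling mating)
print(coi("half_sib_pup"))  # 0.125 (half-sibling mating)
```

    If the founders in a real breed were in fact related, every number coming out of a calculation like this one would be an underestimate, and no amount of pedigree depth can fix that.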

    Below are some excerpts from only some of the research on which I based the statements above. Reading even the summaries of these papers (called abstracts) will help you decide what to think about all the new advancements being made right now. If you still have questions, do some searches on pubmed.gov. You will be surprised at how much information is available!

    Excerpts

    When considering the methods used to correlate marker data and inbreeding, the authors of this somewhat older but comprehensive paper Heterozygosity-Fitness Correlations: Time for a Reappraisal   (Szulkin M, Bierne N, David P., Evolution. 2010 May;64(5):1202-17. doi: 10.1111/j.1558-5646.2010.00966.x. Epub 2010 Feb 9.
    https://web.natur.cuni.cz/~muncling/Szulkin2010Heterozygosity.pdf) from 2010 say,

    These theoretical expectations are highly concordant with observed correlations between heterozygosity and f(population inbreeding) when the latter can be estimated independently using pedigree data.

    I was very interested to see that, just like the studies coming out of UC Davis that have pedigree data, this recent study on Bullmastiffs, which used both microsatellite data and pedigrees, arrived at highly similar mean inbreeding coefficients from genetic markers in the genotyped population (0.35) and from the larger pedigree database of 16,739 dogs (0.39). (They cited the recent Standard Poodle paper, along with many other recent studies.)
    Comparative Analysis of Genome Diversity in Bullmastiff Dogs
    (Sally-Anne Mortlock, Mehar S. Khatkar, Peter Williamson, Published: January 29, 2016, DOI: 10.1371/journal.pone.0147941 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0147941)

    The molecular analysis overcomes limitations in calculations based on pedigree data, and takes into account recent breeding events plus the effects of past inbreeding, selection and genetic drift. Molecular data has the advantage of accurately measuring genetic heterogeneity between individuals used for breeding.


    From the Jan 2015 paper entitled  Relatedness in the post-genomic era: is it still useful?
    (Speed D, Balding DJ, Nat Rev Genet. 2015 Jan;16(1):33-44. doi: 10.1038/nrg3821. Epub 2014 Nov 18.  http://www.ncbi.nlm.nih.gov/pubmed/25404112)

    The classical theory of kinship coefficients based on lineage paths in pedigrees provides a mathematically beautiful structure that has historically been useful, but its weaknesses are apparent. Pedigree founders are typically assumed to be unrelated, but this is only realistic in certain settings, such as some designed breeding programmes or an isolated population created by a specific founding event. All pairs of individuals with no common ancestor in the pedigree have coancestry (θ) of zero, but in practice they can have important differences in genome similarity.

    In this paper, the researchers tested the accuracy of using over 35,000 SNPs compared to pedigree inbreeding coefficients: Measuring individual inbreeding in the age of genomics: marker-based measures are better than pedigrees (Kardos M, Luikart G, Allendorf FW., Heredity (Edinb). 2015 Jul;115(1):63-72. doi: 10.1038/hdy.2015.17. Epub 2015 Mar 18. http://www.ncbi.nlm.nih.gov/pubmed/26059970)

    We used computer simulations to test whether the realized proportion of the genome that is identical by descent (IBDG) is predicted better by the pedigree inbreeding coefficient (FP) or by genomic (marker-based) measures of inbreeding...  Our results demonstrate that IBDG can be more precisely estimated with large numbers of genetic markers than with pedigrees. We encourage researchers to adopt genomic marker-based measures of IBDG as thousands of loci can now be genotyped in any species.

    This next recent paper from New Zealand rightly urges caution and careful analysis of the methods used to assess heterozygosity in a population. It is possible to test methods by comparing known relationships to the genetic marker-based relatedness estimates. I was able to use the method studied to assess the Standard Poodle population from the recent study and confirm the usefulness of the method as suggested in the paper. The use and abuse of genetic marker-based estimates of relatedness and inbreeding (Helen R. Taylor, Ecol Evol. 2015 Aug; 5(15): 3140–3150. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4559056)

    Genetic marker-based estimators remain a popular tool for measuring relatedness (r xy) and inbreeding (F) coefficients at both the population and individual level. The performance of these estimators fluctuates with the number and variability of markers available, and the relatedness composition and demographic history of a population.

    Finally, there is this recent and important synopsis, entitled Pedigrees or markers: Which are better in estimating relatedness and inbreeding coefficient? 

    ( Wang, J., Theor Popul Biol. 2016 Feb;107:4-13. doi: 10.1016/j.tpb.2015.08.006. Epub 2015 Sep 3.
    http://www.ncbi.nlm.nih.gov/pubmed/26344786 )  Dr. Jinliang Wang is a leading researcher who also writes computer programs that assess inbreeding and relatedness in various ways. He wrote the program discussed in the paper above, and he sums it all up rather nicely in the abstract:

    Individual inbreeding coefficient (F) and pairwise relatedness (r) are fundamental parameters in population genetics and have important applications in diverse fields such as human medicine, forensics, plant and animal breeding, conservation and evolutionary biology. Traditionally, both parameters are calculated from pedigrees, but are now increasingly estimated from genetic marker data. Conceptually, a pedigree gives the expected F and r values, FP and rP, with the expectations being taken (hypothetically) over an infinite number of individuals with the same pedigree. In contrast, markers give the realised (actual) F and r values at the particular marker loci of the particular individuals, FM and rM. Both pedigree (FP, rP) and marker (FM, rM) estimates can be used as inferences of genomic inbreeding coefficients FG and genomic relatedness rG, which are the underlying quantities relevant to most applications (such as estimating inbreeding depression and heritability) of F and r. In the pre-genomic era, it was widely accepted that pedigrees are much better than markers in delineating FG and rG, and markers should better be used to validate, amend and construct pedigrees rather than to replace them. Is this still true in the genomic era when genome-wide dense SNPs are available? In this simulation study, I showed that genomic markers can yield much better estimates of FG and rG than pedigrees when they are numerous (say, 10^4 SNPs) under realistic situations (e.g. genome and population sizes). Pedigree estimates are especially poor for species with a small genome, where FG and rG are determined to a large extent by Mendelian segregations and may thus deviate substantially from their expectations (FP and rP). Simulations also confirmed that FM, when estimated from many SNPs, can be much more powerful than FP for detecting inbreeding depression in viability. However, I argue that pedigrees cannot be replaced completely by genomic SNPs, because the former allows for the calculation of more complicated IBD coefficients (involving more than 2 individuals, more than one locus, and more than 2 genes at a locus) for which the latter may have reduced capacity or limited power, and because the former has social and other significance for remote relationships which have little genetic significance and cannot be inferred reliably from markers.

    And here are lots more to read, if you are so inclined:

    A comparison of approaches to estimate the inbreeding coefficient and pairwise relatedness using genomic and pedigree data in a sheep population.
    Li MH, Strandén I, Tiirikka T, Sevón-Aimonen ML, Kantanen J.
    PLoS One. 2011;6(11):e26256. doi: 10.1371/journal.pone.0026256. Epub 2011 Nov 9.

    Improved estimation of inbreeding and kinship in pigs using optimized SNP panels.
    Lopes MS, Silva FF, Harlizius B, Duijvesteijn N, Lopes PS, Guimarães SE, Knol EF.
    BMC Genet. 2013 Sep 25;14:92. doi: 10.1186/1471-2156-14-92.

    The effect of rare alleles on estimated genomic relationships from whole genome sequence data.
    Eynard SE, Windig JJ, Leroy G, van Binsbergen R, Calus MP.
    BMC Genet. 2015 Mar 12;16:24. doi: 10.1186/s12863-015-0185-0.

    How many SNPs are enough?
    Smouse PE
    Mol Ecol. 2010 Apr;19(7):1265-6. doi: 10.1111/j.1365-294X.2010.04555.x.
    http://www.ncbi.nlm.nih.gov/pubmed/20456228

    On the use of large marker panels to estimate inbreeding and relatedness: empirical and simulation studies of a pedigreed zebra finch population typed at 771 SNPs.
    Santure AW, Stapley J, Ball AD, Birkhead TR, Burke T, Slate J.
    Mol Ecol. 2010 Apr;19(7):1439-51. doi: 10.1111/j.1365-294X.2010.04554.x. Epub 2010 Feb 10.
    http://www.ncbi.nlm.nih.gov/pubmed/20149098

    Heterozygosity-fitness correlations in zebra finches: microsatellite markers can be better than their reputation.
    Forstmeier W, Schielzeth H, Mueller JC, Ellegren H, Kempenaers B.
    Mol Ecol. 2012 Jul;21(13):3237-49. doi: 10.1111/j.1365-294X.2012.05593.x. Epub 2012 May 3.
    http://www.ncbi.nlm.nih.gov/pubmed/22554318

    Heterozygosity-fitness correlations and inbreeding depression in two critically endangered mammals.
    Ruiz-López MJ, Gañan N, Godoy JA, Del Olmo A, Garde J, Espeso G, Vargas A, Martinez F, Roldán ER, Gomendio M.
    Conserv Biol. 2012 Dec;26(6):1121-9. doi: 10.1111/j.1523-1739.2012.01916.x. Epub 2012 Aug 16.
    http://www.ncbi.nlm.nih.gov/pubmed/22897325

    The imprecision of heterozygosity-fitness correlations hinders the detection of inbreeding and inbreeding depression in a threatened species.
    Grueber CE, Waters JM, Jamieson IG.
    Mol Ecol. 2011 Jan;20(1):67-79. doi: 10.1111/j.1365-294X.2010.04930.x. Epub 2010 Nov 19.
    http://www.ncbi.nlm.nih.gov/pubmed/21087447

    Context-dependent associations between heterozygosity and immune variation in a wild carnivore.
    Brock PM, Goodman SJ, Hall AJ, Cruz M, Acevedo-Whitehouse K.
    BMC Evol Biol. 2015 Nov 4;15:242. doi: 10.1186/s12862-015-0519-6.
    http://www.ncbi.nlm.nih.gov/pubmed/26537228

    Direct and indirect causal effects of heterozygosity on fitness-related traits in Alpine ibex.
    Brambilla A, Biebach I, Bassano B, Bogliani G, von Hardenberg A.
    Proc Biol Sci. 2015 Jan 7;282(1798):20141873. doi: 10.1098/rspb.2014.1873.

    Heterozygosity at a single locus explains a large proportion of variation in two fitness-related traits in great tits: a general or a local effect?
    García-Navas V, Cáliz-Campal C, Ferrer ES, Sanz JJ, Ortego J.
    J Evol Biol. 2014 Dec;27(12):2807-19. doi: 10.1111/jeb.12539. Epub 2014 Nov 23.
    http://www.ncbi.nlm.nih.gov/pubmed/25370831

    Heterozygosity-fitness correlations in a wild mammal population: accounting for parental and environmental effects.
    Annavi G, Newman C, Buesching CD, Macdonald DW, Burke T, Dugdale HL.
    Ecol Evol. 2014 Jun;4(12):2594-609. doi: 10.1002/ece3.1112. Epub 2014 May 27.
    http://www.ncbi.nlm.nih.gov/pubmed/25360289

    Effects of inbreeding on fitness-related traits in a small isolated moose population.
    Haanes H, Markussen SS, Herfindal I, Røed KH, Solberg EJ, Heim M, Midthjell L, Sæther BE.
    Ecol Evol. 2013 Oct;3(12):4230-42. doi: 10.1002/ece3.819. Epub 2013 Sep 30.
    http://www.ncbi.nlm.nih.gov/pubmed/24324873

    Estimating genome-wide heterozygosity: effects of demographic history and marker type.
    Miller JM, Malenfant RM, David P, Davis CS, Poissant J, Hogg JT, Festa-Bianchet M, Coltman DW.
    Heredity (Edinb). 2014 Mar;112(3):240-7. doi: 10.1038/hdy.2013.99. Epub 2013 Oct 23.
    http://www.ncbi.nlm.nih.gov/pubmed/24149650

    Applications and implications of neutral versus non-neutral markers in molecular ecology.
    Kirk H, Freeland JR.
    Int J Mol Sci. 2011;12(6):3966-88. doi: 10.3390/ijms12063966. Epub 2011 Jun 14.
    http://www.ncbi.nlm.nih.gov/pubmed/21747718