Tuesday, July 4, 2017

Why Bayes' Law and Genetic Admixture Programs Cannot Accurately Assess African-Eurasian Admixture

Many geneticists claim they use Bayesian statistics to determine the phylogeography of African and Afro-American populations. "In probability theory and statistics, Bayes’ theorem (alternatively Bayes’ law or Bayes' rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event."

In genetics, researchers attempt to match individuals to their ancestry based on the genetic profile they carry. Researchers have identified a set of particular mtDNA and Y-chromosome haplogroups that are carried by the four major populations: Sub-Saharan Africans, Western and Eastern Eurasians, and Native Americans.

When people take a genetics test, they often self-identify the population they belong to. Today, researchers can identify an individual's ancestry by examining single nucleotide polymorphisms, or SNPs. The pattern of SNPs in an individual's genome indicates a person's ancestry.

If the pattern of one's SNPs can indicate an individual's ancestry (and ethnicity), then, using Bayes’ theorem, a person’s ethnicity can be used to assess the probability that they carry a particular haplogroup more accurately than an assessment made without knowledge of the person's ethnicity.

Admixture and Structure programs assume that there are four pristine ethnic populations or races: Sub-Saharan African (SSA), Western Eurasian (with a sub-clade in the Middle East), Eastern Eurasian and Native American. Because these races are considered pristine, each population is assigned a specific set of haplogroups, e.g., SSA mtDNA is assigned to the L haplogroups and SSA Y-DNA to haplogroups A and E. The problem with these assumptions is that SSA carry all the haplogroups associated with the Eurasians and Native Americans. Due to this, geneticists have to mask selected genes so they can get the results they want.

Mathematically, Bayes' Law is the following equation:

P(A|B) = P(B|A) × P(A) / P(B)

where P(A) and P(B) are the probabilities of observing A and B without regard to each other.
P(A|B), a conditional probability, is the probability of observing event A given that B is true.
P(B|A) is the probability of observing event B given that A is true.
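
The equation can be sketched in a few lines of Python. This is only a minimal illustration of the formula: the function name `bayes` and the three input probabilities are made-up values chosen for the arithmetic, not figures from any genetic study.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers, chosen only to show the arithmetic.
p_a = 0.3          # P(A): probability of A without regard to B
p_b = 0.5          # P(B): probability of B without regard to A
p_b_given_a = 0.8  # P(B|A): probability of B given that A is true

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(round(p_a_given_b, 2))  # 0.8 * 0.3 / 0.5
```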

If we apply this to genetic ancestry and admixture testing, we have the following equation:

P(ethnicity|SNPs) = P(SNPs|ethnicity) × P(ethnicity) / P(SNPs)

P(ethnicity|SNPs) is a conditional probability: the probability that an individual's ethnicity (A), that is, the individual's haplogroup or membership in a population, is true given the observed SNPs (B).

P(SNPs|ethnicity) is the probability of observing the given SNPs (B) given that the individual's haplogroup/ancestral component as a member of a given population (A) is true.

The equation might fail in determining the admixture between SSAs and Eurasians, because Africans carry all the genes found among Eurasians.
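
A small Python sketch shows the mechanics behind this point. It assumes two populations with equal priors and uses invented marker frequencies, so it is not real data. The function `posterior` and every number in it are hypothetical. When a marker is equally common in both populations, Bayes' theorem simply hands back the prior, and the marker tells us nothing about membership.

```python
def posterior(priors, likelihoods):
    """Return P(population | SNP) for each population using Bayes' theorem."""
    # Numerator of Bayes' theorem: P(SNP | population) * P(population)
    joint = {pop: priors[pop] * likelihoods[pop] for pop in priors}
    # Denominator: P(SNP), summed over all populations considered
    evidence = sum(joint.values())
    return {pop: joint[pop] / evidence for pop in joint}


# Hypothetical priors and marker frequencies, for illustration only.
priors = {"SSA": 0.5, "Eurasian": 0.5}
shared_marker = {"SSA": 0.4, "Eurasian": 0.4}    # same frequency in both
distinct_marker = {"SSA": 0.9, "Eurasian": 0.1}  # very different frequencies

post_shared = posterior(priors, shared_marker)
post_distinct = posterior(priors, distinct_marker)

print(post_shared)    # posterior equals the prior: the marker is uninformative
print(post_distinct)  # posterior moves strongly toward one population
```

The second case is what the programs assume; the first case is what happens when the same marker is carried by both populations.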

As a result, using Bayesian statistics in admixture programs may provide invalid results: P(ethnicity|SNPs) does not accurately predict the ancestral components carried by SSAs, because we carry the ancestral components carried by Eurasians and Native Americans.

We carry these haplogroups because Africans were the first anatomically modern humans to migrate into Eurasia and the Americas, carrying these genes with them.
