Wednesday, 26 October 2016

Family Tree DNA and Assassin's Creed The Movie

Family Tree DNA has teamed up with 20th Century Fox to offer a special DNA testing package which will be promoted with the forthcoming action adventure film Assassin's Creed.

For the duration of the promotion it will be possible to purchase a special Assassin’s Creed DNA Testing Bundle for $89 which includes a Family Finder test, a Warrior Gene test and a one-month premium subscription to Findmypast.

There is also a competition (what they have called a "sweepstakes") to win a trip for two to Las Vegas for an "Assassin’s Creed-themed adventure". The competition appears to be open worldwide, but note that the prize only includes domestic flights in the US, so if you were one of the lucky winners you would have to pay your own air fare to the US.

The film is released worldwide on 21st December but the tests are available with immediate effect and the competition has already started. Here's the promotion for the Assassin's Creed package.

The Warrior Gene is interesting because it's transmitted on the X-chromosome. At one time Family Tree DNA offered a standalone Warrior Gene test. Jobling et al comment on the Warrior Gene in their article In the blood: the myth and reality of genetic markers of identity (Ethnic and Racial Studies 2016 39(2): 142-161):
The enzyme monoamine oxidase A (MAOA) degrades a subset of neurotransmitters including serotonin, epinephrine, and norepinephrine – molecules that transmit information from one neuron to another. Adjacent to the MAOA gene is a region of DNA that controls how much enzyme is produced, and a common variant of the length of this region (called 3R) leads to reduced production of enzyme compared to other common versions (Sabol, Hu, and Hamer 1998). The gene lies on the X chromosome, so males, who have only one X, show the simplest relationship between the version of the gene they carry and its behavioural consequences. Men carrying the 3R version (the ‘warrior gene’) are more likely to respond aggressively to maltreatment or stress (Caspi et al. 2002). Despite charging almost 100 dollars for the ‘warrior gene’ test, the testing company calls the association between gene variant and behaviour a ‘factoid’, and best used as a ‘cocktail conversation starter’. Nonetheless we might wonder if the results of the test have any influence on the behaviour of people who are tested; the possible influence of the 3R variant was used in 2009 as part of a successful criminal defence in the USA (Brooks-Crozier 2011), and made the difference between thirty-two years’ imprisonment and the death penalty.
See also this excellent article by Adam Rutherford for the New Statesman on Why we can't blame "warrior genes" for violent crimes. (Thanks to Ann Turner for alerting me to this article.)

23andMe and AncestryDNA are already advertising on TV and, as DNA testing goes mainstream, it's important that Family Tree DNA promote their products on mass media to keep up with the competition. So whatever you might think about the Warrior Gene test it's good news that Family Tree DNA are now advertising in cinemas and actively promoting the Family Finder test. This will help to familiarise people with the company name, and perhaps introduce a new demographic to DNA testing who might not otherwise have considered buying a test.

To learn more about the Assassin's Creed package and the competition visit:

You need to scroll right down to the bottom of the page to find the information about the competition.

Here is the official press release from Family Tree DNA and 20th Century Fox.
Family Tree DNA and 20th Century Fox Team Up for Historical Adventure 
Genetic genealogy pioneers announce exciting partnership with the theatrical release of Assassin’s Creed. 
Houston, Texas — October 25, 2016:

In association with the upcoming theatrical release of the epic adventure film ASSASSIN’S CREED, in theaters December 21, Family Tree DNA is pleased to announce a new partnership with 20th Century Fox and Findmypast, which features the Assassin’s Creed DNA Testing Bundle and Assassin’s Creed Sweepstakes. 
Loosely based on the popular video game franchise of the same name, and starring award-winning actors Michael Fassbender and Marion Cotillard, the movie’s main character Callum Lynch—through a revolutionary technology called the Animus—travels deep into the past to discover that his genetic ancestor, Aguilar, was part of a mysterious secret organization, the Assassin’s, in 15th Century Spain. The action-adventure follows Callum as he relives Aguilar's memories in present day.

As pioneers in the direct-to-consumer DNA testing industry, Family Tree DNA was tapped by 20th Century Fox to be the exclusive testing partner for the film. The company’s premier suite of DNA tests along with the world’s most comprehensive matching database enable users to trace their lineage through time, explore ancestry and connect with relatives across the globe.

Family Tree DNA Director of Product Development, Michael Davila, noted that “The opportunity to partner with 20th Century Fox on the release of Assassin’s Creed is not only exciting but serendipitous. The storyline of Callum Lynch connecting to his ancestral past ties in completely with what our company does in helping people discover their origins and explore family history,” said Davila. 
“We are excited to be partnering with Family Tree DNA,” said Zachary Eller, Senior Vice President, Marketing Partnerships, 20th Century Fox. “They provide a fantastic opportunity to bring the central themes of Assassin’s Creed to a real world application by allowing consumers to actually discover their past.” 
With the purchase of the special limited-time Assassin’s Creed Bundle, customers will be mailed a sample collection kit which, when processed, will provide both Family Tree DNA’s signature Family Finder test and the Warrior Gene DNA test. They will also receive a free one-month premium subscription to Findmypast’s online genealogy service. 
According to Belinda Hanton, Global Head of Partnerships at Findmypast, “We are thrilled to be teaming up with Fox and Family Tree DNA to promote family history research and genetic genealogy. It’s partnerships like this that allow us to speak to completely new audiences and help spread the word that anyone can start exploring their heritage at the click of a mouse. The lives of our ancestors are not only recorded in historical records, but are also written in our DNA and it is now easier than ever before to unlock the incredible stories hidden in our families’ past.” 
Using a simple cheek swab and step-by-step instructions, users return the sample collection test kit by mail, in a provided envelope, directly to Family Tree DNA. Results typically take four to five weeks and are delivered through a private customer dashboard with email notification. Unlike other testing companies, Family Tree DNA results are kept completely confidential and secure privacy settings put users in control of how much information they choose to share.

Family Finder is an autosomal (non-sex) DNA test that finds matches within five generations and includes myOrigins, a powerful mapping tool that provides a detailed geographic and ethnic breakdown of personal genetic ancestry. The Warrior Gene test determines whether a person carries the Monoamine Oxidase A (MAOA) gene variant, dubbed the “Warrior Gene,” which some researchers say may cause certain carriers to engage in more risk-taking behaviors and be able to better assess their chances of success in critical situations.
Together with the Assassin’s Creed DNA Testing Bundle is the Assassin’s Creed Sweepstakes and a chance to win a Grand Prize trip for two to Las Vegas for an Assassin’s Creed-themed adventure. The experience includes a series of high-octane Assassin’s Creed-inspired activities like a master parkour class, nighttime zip lining and an electrifying sky jump from the tallest tower in the city.

Although no purchase is necessary to enter the contest, purchasing the Assassin’s Creed Bundle earns customers ten additional entries into the Sweepstakes for a greater chance to win a trip to Las Vegas as well as other prizes. Followers will also have the opportunity to earn bonus entries by sharing Sweepstakes social posts on their Facebook and Twitter pages. 
With the exclusive DNA Testing Bundle and Sweepstakes movie tie-in, Assassin’s Creed fans everywhere will be able to jump back in time, embrace their inner warriors and unlock their genetic memories.

“The partnership between Fox’s Assassin’s Creed and Family Tree DNA is a perfect fit,” Davila said. “Test-takers get to find out if they carry the “Warrior Gene” in their DNA, and while they’re at it, will be able to delve into the exciting world of genetic genealogy and discover their own family histories…all through DNA. Everyone has a story to tell…so it’s an absolute win-win scenario.”

Wednesday, 19 October 2016

My pick of the abstracts and posters from ASHG 2016

The American Society of Human Genetics is holding its annual conference from 18th to 22nd October in Vancouver, Canada. The Platform and Poster Abstracts are now available online. The research presented at this meeting gives a taste of some of the publications and developments to come in the next year or so. There are a number of abstracts that are of particular interest to genetic genealogists. In particular I note that AncestryDNA are presenting a number of interesting posters which hint at some new tools that might be on the way. I've highlighted below my picks from the conference programme.

23andMe will also be at the ASHG meeting. They have published a list of the abstracts for their presentations and posters on their blog, though none of the content is of direct interest to genetic genealogists. 

Platform Abstracts

Ultra-fine structural inference and population assignment using IBD network clustering and classifiers accurately assign sub-continental origins represented in a large admixed U.S. cohort.
E. Han, R. Curtis, P. Carbonetto, K. Noto, J. Byrnes, Y. Wang, J. Granka, A. Kermany, K. Rand, E. Elyashiv, H. Guturu, N. Myres, E. Hong, C. Ball, K. Chahine. DNA, LLC, San Francisco, CA.
Motivation & Objectives: Identifying the geographic origin of individuals using genetic data has broad application in forensics, human disease and evolution. There have been multiple methods proposed to achieve this goal, such as Principle Component Analysis (PCA), Spatial Ancestry Analysis (SPA) and Geographic Population Structure (GPS). However, most methods suffer from decreased prediction accuracy outside Europe and do not apply to the US population comprised of admixed immigrants. In this study, we describe a new method and demonstrate its accuracy in predicting geographic origins in the US post-European colonization or internationally for single origin and admixed samples. Methods: We use a database of over 1.5 million consented genotype samples collected from the US and internationally, along with samples from public databases such as POBI. We build a genetic network by estimating the amount of identity-by-descent (IBD) sharing between all individuals. By iteratively applying the Louvain method for community detection, we find a hierarchy of genetic clusters in the network. Levering user-generated pedigrees going back 6-8 generations, we annotate each cluster with birth locations that are enriched in historical time periods. The birth locations of these clusters are generally specific to locations in the US or internationally, allowing for concise geographical interpretation. Although community detection results assign samples to only one cluster, we use machine learning classification to assign samples to multiple clusters. Given this classification and enriched birth locations, we identify the likely geographic origins of each sample. Results: Our results include over 300 stable clusters, each comprised of more than 1000 samples. Some clusters correspond to narrow geographical regions, such as people descended from southern West Virginia in the 19th century, and others to broader groups, such as European Jews from Poland. 
By using the associated pedigrees, we demonstrate the accuracy of these predictions: over 95% of the assigned individuals have at least one known ancestor born in the enriched region defined by most clusters. Conclusion: By utilizing large-scale genetic data with associated pedigrees, we have developed the first method for predicting the geographic origin of individuals within the US or internationally with high accuracy. This approach can be used for ultra fine scale genetic ancestry mapping in any population.
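
To make the clustering step in this abstract a little more concrete, here is a minimal sketch of modularity, the score that the Louvain method tries to maximise when it groups a network into communities. The sample names and centimorgan values are invented for illustration, and this is not AncestryDNA's pipeline: it only scores a partition you hand it rather than searching for the best one.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of a weighted, undirected network.

    edges: dict mapping (sample_a, sample_b) -> shared IBD in cM (hypothetical values)
    community_of: dict mapping sample -> community label
    """
    degree = defaultdict(float)    # total edge weight attached to each sample
    internal = defaultdict(float)  # edge weight that stays inside each community
    total_weight = 0.0             # m, the sum of all edge weights

    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
        total_weight += weight
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += weight

    if total_weight == 0:
        return 0.0

    community_degree = defaultdict(float)
    for sample, k in degree.items():
        community_degree[community_of[sample]] += k

    two_m = 2.0 * total_weight
    # Q = sum over communities of (internal share) - (expected share under random wiring)
    return sum(
        2.0 * internal[c] / two_m - (community_degree[c] / two_m) ** 2
        for c in community_degree
    )


# Two tightly related trios joined by one weak match (made-up data).
ibd_cm = {
    ("A", "B"): 50, ("B", "C"): 50, ("A", "C"): 50,
    ("D", "E"): 50, ("E", "F"): 50, ("D", "F"): 50,
    ("C", "D"): 5,
}
two_clusters = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
one_cluster = {sample: 0 for sample in "ABCDEF"}
mixed_clusters = {"A": 1, "B": 2, "C": 1, "D": 2, "E": 1, "F": 2}

print(modularity(ibd_cm, two_clusters))
print(modularity(ibd_cm, one_cluster))
```

The partition that keeps each trio together scores higher than lumping everyone into one group or splitting the trios arbitrarily, which is the behaviour a community detection method relies on.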

A massively scalable phenotyping approach using social media for genetic studies.
J. Yuan1,2, A. Gordon1, D. Speyer1,2, D. Zielinski1, R. Aufrichtig1, J. Pickrell1,3, Y. Erlich1,2. 1) New York Genome Center, New York, NY; 2) Computer Science, Columbia University, New York, NY; 3) Biological Sciences, Columbia University, New York, NY.

While DNA sequencing is largely a tractable problem, massive phenotyping is still a challenge, especially for Internet-based studies. Traditional methods, such as physical exams, scale poorly for large numbers of individuals. Questionnaires are easier to collect, but administering lengthy or frequent questionnaires creates a negative experience for participants, leading to lower completion rates. Electronic health records are a great resource for phenotypes, but they exhibit large heterogeneity when collected from various resources and are subject to an array of confidentiality restrictions that complicate their collection. Recent studies have highlighted the value of obtaining digital phenotypes by interpreting the interactions of users with digital outlets as a reflection of underlying traits. In particular, these studies have shown that social media data enables the collection of various phenotypes including big five personality traits, sexual orientation, sleeping patterns, and even heart rate from regular user videos. The ubiquity of the data and its ease of collection through standard APIs enable a new methodology for large scale phenotypic collection. Here, we report our ongoing efforts to enable participants to donate their social-media data along with their genomes in order to understand the genetics of digital phenotypes. In our previous work, we developed DNA.Land, an online platform where users may register and securely contribute their Direct to Consumer genomic data, as well as receive reports of ancestry and shared relatives with other DNA.Land users. Since our launch in ASHG2015, we have obtained over 20,000 users, many of whom have been eager to share personal information such as family history. We are now building a new component in DNA.Land in which users can contribute their Facebook data for scientific studies.
We will present our IBM Watson-based system to predict traits from social media data and will describe the type of information DNA.Land users will receive. In addition, we will discuss the particular challenges in collecting this data with respect to both computational efforts and privacy concerns. Our approach is applicable for other types of large scale efforts such as the Precision Medicine Initiative and can easily scale to millions of people.

Poster Abstracts

Insights into the geographical distribution of genetic admixture of unrelated volunteer donors and recipients of stem-cell transplants.
A. Madbouly 1, K. Besse 1, Y. Wang 2, J. Byrnes 2, C. Ball 2, N. Myres 2, M. Maiers 1. 1) Bioinformatic Research, National Marrow Donor Program, Minneapolis, MN; 2), San Francisco, CA, USA.

Genetic ancestry of self-described groups may vary across geographic locations in the US, a phenomenon documented anecdotally but not thoroughly explored in the literature. We studied the genetic ancestry of 995 HLA matched donor/recipient (DR) pairs from the Be The Match® registry with a focus on regional ancestry differences among ethnic groups. We hypothesized that, along with historical events, donor/transplant center distribution and socioeconomic factors might influence the geographical spread of some genetic admixtures. We genotyped 995 DR pairs on the Illumina OmniExpress chip with approximately 730,000 SNPs. Self-reported race and ethnicity was collected for donors at the time of registry recruitment. Recipients’ race and ethnicity was recorded at the transplant hospital once at the time of diagnosis and again after transplant. The majority of the study cohort (94%) self-identified as European Caucasian (CAU). The rest identified as Hispanic (HIS) (3.5%), African-American (1%) and Asian or Pacific Islander (1.5%). Address zip code information was available for 99% of recipients but only 59% of donors. Genetic ancestry was estimated by applying the AncestryDNA ethnicity estimator pipeline, which provides a vector of 26 admixtures. Some admixtures were combined for the analysis due to small counts and minimal impact such as detailed African (AFR) admixtures. We then mapped the geographical distribution of European (EUR) and non-EUR genetic admixtures for self-reported CAU and non-CAU individuals, optimizing geographical regions for subject privacy. The main self-reported race groups showed average proportions of AFR and EUR admixtures compatible with Bryc and colleagues (2015). However, our results revealed larger Amerindian admixture in self-reported HIS, especially among recipients. When stratifying regionally, systematic differences emerged in admixture distribution among similar race groups mostly interpretable by historic events. 
Separating donors and recipients suggested possible additional influences, such as donor and transplant center geographical spread. Importantly, we observed differences in the distribution of non-majority admixtures such as increased AFR admixture in self-reported CAU donors (but not recipients) in some southern states suggesting a possible socioeconomic link. This work has the potential of guiding stem-cell donor registry strategies on volunteer donor recruitment and donor and transplant center planning.

Geographic and historic changes in runs of homozygosity among more than 1,000,000 individuals sheds light into the recent demographic history of US population.
A. Kermany, C. Ball, J. Byrnes, P. Carbonetto, K. Chahine, R. Curtis, E. Elyashiv, J. Granka, H. Guturu, E. Han, E. Hong, N. Myres, K. Noto, K. Rand, Y. Wang. DNA, LLC, San Francisco, CA.

Runs of Homozygosity (ROH) are indicators of segments of chromosomes identical by descent between parental haplotypes. Distribution of such runs along the chromosome contains information regarding the demographic history of the population under study, in particular it reveals trends in consanguinity. In this study, we analyze the distribution of runs of homozygosity – chromosomal locations, number of runs and lengths of runs - as well as estimated inbreeding coefficient (F) among more than 1,000,000 consented AncestryDNA customers. We report on observed variations in distribution of ROH based on geographic origins - inferred from the available pedigree data – admixture proportions as well as birth year cohort. In particular, we present our results on variations in the distribution of ROH within 19 communities within the US population - identified based on analysing a network of genetic matches in the database - and investigate differences in patterns of ROH between each group and comment on the inferred demographic history within each group.
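
For readers who want a feel for what detecting a run of homozygosity involves, here is a minimal sketch, assuming genotypes coded as 0, 1 or 2 copies of an allele (1 meaning heterozygous). The thresholds, the toy genotypes and the 10 Mb "genome" are my own illustrative assumptions and are not taken from the abstract. The last function computes one common ROH-based estimate of the inbreeding coefficient, the fraction of the genome lying in such runs.

```python
def find_roh(genotypes, positions, min_snps=3, min_length_bp=1_000_000):
    """Return (start_bp, end_bp) for each run of consecutive homozygous SNPs.

    genotypes: list of 0, 1 or 2; the value 1 marks a heterozygous call
    positions: base-pair position of each SNP, sorted ascending
    min_snps, min_length_bp: illustrative thresholds (assumptions)
    """
    runs = []
    run_start = None  # index of the first SNP in the current homozygous run
    # A trailing heterozygous sentinel closes any run that reaches the last SNP.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            first, last = positions[run_start], positions[i - 1]
            if i - run_start >= min_snps and last - first >= min_length_bp:
                runs.append((first, last))
            run_start = None
    return runs


def f_roh(runs, genome_length_bp):
    """Fraction of the analysed genome covered by the detected runs."""
    return sum(end - start for start, end in runs) / genome_length_bp


# Toy data: 13 SNPs on a hypothetical 10 Mb stretch.
genotypes = [0, 0, 2, 2, 0, 1, 2, 0, 1, 0, 0, 0, 0]
positions = [100_000, 400_000, 900_000, 1_500_000, 2_000_000, 2_300_000,
             2_600_000, 2_900_000, 3_200_000, 3_500_000, 4_000_000,
             4_800_000, 5_200_000]

runs = find_roh(genotypes, positions)
print(runs)
print(f_roh(runs, 10_000_000))
```

This sketch breaks a run at every heterozygous call. Production tools typically tolerate the occasional heterozygous or missing genotype inside a run to allow for genotyping error.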

Y-chromosomal sequencing and screening reveal both stability and migrations in North Eurasian populations.
O. Balanovsky 1,2, V. Zaporozhchenko 2,1, A. Agdzhoyan 1,2, I. Alborova 5, M. Kuznetsova 2, V. Urasin 3, M. Zhabagin 4, M. Chukhryaeva 2,1, Kh. Mustafin 5, C. Tyler-Smith 6, E. Balanovska 2. 1) Vavilov Institute of General Genetics, Moscow, Russian Federation; 2) Research Centre for Medical Genetics, Moscow, Russia; 3) YFull service, Moscow, Russia; 4) National Laboratory Astana, Nazarbayev University, Astana, Republic of Kazakhstan; 5) Moscow Institute of Physics and Technology (State University), Moscow, Russia; 6) The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

Y-chromosomal markers exhibit the highest interpopulation diversity in the genome and thus form one of the most informative tools for tracing population history. However, their information value depends on discovering SNPs which subdivide haplogroups with broad geographic distribution into branches revealing fine population structure. Progress in such discoveries has recently moved from a slow linear phase to a rapid exponential phase due to NGS. We applied this approach to the Y-chromosomal pool of North Eurasian populations and concentrated on haplogroups C, G1, G2, N1b, N1c, and R1b. We sequenced 181 Y-chromosomes (capturing 11 Mb from each sample), developed the NGSConv software for calling Y-chromosomal SNPs, and identified roughly 2,500 SNPs, most of which were new. Then we constructed phylogenetic trees and dated dozens of their branches using our estimates of the mutation rate. The last – but not the least – step included screening branch-defining SNPs in the entire Biobank of indigenous North Eurasian populations (led by prof. Elena Balanovska), which includes 26,000 samples from 260 populations. This screening resulted in frequency distribution maps of 29 branches of haplogroups R1b and C, thus increasing the phylogenetic resolution by an order of magnitude compared to the two initial haplogroups. For haplogroup R1b, we identified a previously unstudied “eastern” branch, R1b-GG400, found in East Europeans and West Asians and forming a brother clade to the “western” branch R1b-L51 found in West Europeans. The ancient samples from the Yamnaya archaeological culture are located on this eastern branch, showing that the paternal descendants of the Yamnaya population – in contrast to the published autosomal findings - still live in the Pontic steppe and were not an important source of paternal lineages in present-day West Europeans. 
For haplogroup C-M217 - the predominant paternal component in Central Asians - we found signals of simultaneous expansion in two independent branches. Both expansion times and gene geographic maps of the expanded lineages indicated the emergence of the Mongol Empire as the likely trigger. We conclude that simply discovering new SNP is not enough, but in combination with screening for the branch-defining SNPs in large biobanks of indigenous populations, it allows comprehensive reconstruction of male population history. The study was supported by the Russian Science Foundation grant 14-14-00827 to OB.

Admixture inference of African Americans and Latinos in the United States through time.
M.L. Spear 1, D.G. Torgerson 2, R.D. Hernandez 1,3,4. 1) Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA; 2) Department of Medicine, University of California, San Francisco, San Francisco, CA; 3) California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, CA; 4) Institute for Human Genetics, University of California, San Francisco, CA.

The study of admixed populations has provided important insights into medical genetics and population history. The genomes of admixed individuals are mosaics of segments originating from different ancestral populations. At the genome-wide level, the proportion of one’s genome deriving from each ancestral population is referred to as “global ancestry proportions”. However, modern statistical methods enable inference of the ancestry at individual SNPs within a genome, “local ancestry”, which allow us to reconstruct the mosaic pattern of ancestry tracts across an individual’s genome. Local ancestry inference is critical for the analysis of admixed genomes and has been widely studied in the fields of medical genetics and human demographic history. Local ancestry tracts can be used to infer migration histories but the question remains how these histories have shaped ancestry proportions over time, particularly in the United States, a “melting pot” country that has faced changing societal norms over the past century. It has yet to be determined how the length distribution of ancestry tracts in admixed individuals has changed over decades as well as how the variation in ancestry proportions across chromosomes and individuals may differ. Thus, we estimated local ancestry for 4,600 Latinos and 2,100 African Americans from the Genetic Epidemiology Research on Adult Health and Aging (GERA) dataset using RFMix. With these local ancestry tracts, we used TRACTS to compare the observed length of the ancestry tracts to predictions of different demographic models of migration scenarios. Individuals were grouped by 5-year birth year categories, and comparisons were made between the demographic models generated from each birth year category. Overall, the local ancestry tracts of African Americans and Latinos from the United States have provided insights into the change in complexity of their genetic structure throughout the 20th century.
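
The distinction this abstract draws between "local" and "global" ancestry can be sketched in a few lines. The tracts below are invented, and for simplicity they describe a single haploid copy of three hypothetical 100 cM chromosomes. Real local ancestry calls from a tool such as RFMix would need to be converted into this tuple format first, a step I have not shown.

```python
from collections import defaultdict


def global_ancestry(tracts):
    """Share of the total tract length assigned to each ancestry.

    tracts: list of (chromosome, start_cM, end_cM, ancestry_label)
    """
    length_by_ancestry = defaultdict(float)
    for _chrom, start, end, label in tracts:
        length_by_ancestry[label] += end - start
    total = sum(length_by_ancestry.values())
    return {label: length / total for label, length in length_by_ancestry.items()}


def mean_tract_length(tracts):
    """Average tract length in cM for each ancestry."""
    lengths = defaultdict(list)
    for _chrom, start, end, label in tracts:
        lengths[label].append(end - start)
    return {label: sum(values) / len(values) for label, values in lengths.items()}


# Hypothetical local ancestry tracts for one individual.
tracts = [
    ("1", 0, 60, "AFR"), ("1", 60, 100, "EUR"),
    ("2", 0, 30, "EUR"), ("2", 30, 100, "AFR"),
    ("3", 0, 50, "AFR"), ("3", 50, 70, "NAT"), ("3", 70, 100, "AFR"),
]

print(global_ancestry(tracts))
print(mean_tract_length(tracts))
```

Global proportions are simply the local tracts added up, while the tract lengths carry the timing information: recombination breaks tracts up every generation, so shorter tracts generally point to older admixture.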

Fine-scale population structure in France: Loire River as genetic barrier.
C. Dina 1,2, J. Giemza 1, M. Karakachoff 1,2, F. Simonet 1,2, K. Rouault 3, E. Charpentier 1,2, S. Lecointe 1,2, P. Lindenbaum 1, J. Violleau 1,2, H. Le Marec 1,2, C. Férec 3, S. Chatel 1,2, S. Hercberg 4, P. Galan 4, J-J. Schott 1,2, E. Génin 3, R. Redon 1,2. 1) Thorax Inst, INSERM-CNRS, Nantes, France; 2) CHU Nantes, Nantes University; 3) Inserm UMR 1078, CHRU Brest, University Bretagne Occidentale, EFS, Brest France; 4) Université Paris 13, Equipe de Recherche en Epidémiologie Nutritionnelle, Centre de Recherche en Epidémiologie et Statistiques, Inserm (U1153), Inra (U1125), Cnam, COMUE Sorbonne Paris Cité, F-93017, Bobigny, France.

Background The genetic structure of human populations varies throughout the world, being influenced by migration, admixture, natural selection and genetic drift. Human population structure has first been investigated at broad scales, between and within continents. Currently researchers focus on finer scales, examining genetic structure within countries. Characterising such genetic variation is of interest as it provides insight into demographical history and informs research on disease association studies, especially on rare variants. We here explored the genetic structure of a population living on the French territory (hereafter called French population) both on the whole territory and then on Western part where interesting stratification was identified.
Methods and Results We genotyped genome-wide: 2276 individuals with known department of origin from French Population (SU.VI.MAX study) using Illumina Chip; 456 individuals (PREGO study) from Western France Atlantic Coast, from Finistère to Vendée, with at least three of their grandparents born within a 15 kilometres distance using Axiom CEU Chip. With EEMS software we visualised areas with low effective migration rates - the migration barriers, which match with geographical features, with particularly strong barrier on the lower course of Loire in Western France. We then focused on the PREGO study and Principal Components analysis revealed that individuals from the same departments form clusters. In both datasets we observed a high correlation between geographical position and components (p-value < 2e-16). Many independent methods support the hypothesis that Loire River is a genetic barrier. The two groups of individuals, from north or south of Loire, are well differentiated along PC1 axis. ADMIXTURE estimated different ancestry proportions for the two groups. The first split of hierarchical clustering returned by fineSTRUCTURE, and the one based on normalized counts of identity-by-descent segments is between north and south of Loire.
Conclusion We here report genetic stratification at the level of continental French territory. The migration pattern is following the geographical structure. A specific pattern is noticed around the Loire River. We confirm both evidence for isolation by distance and existence of a genetic barrier, the Loire River. The discovered fine-scale population structure may have consequences in association analyses, especially for rare variants which tend to be geographically clustered.

Identification and characterization of common haplotypes found in a database of one million human genomes.
H. Guturu 1 , K. Noto 1 , J. Byrnes 1 , S. Song 1 , P. Carbonetto 1 , R.E. Curtis 2 , E. Elyashiv 1 , J.M. Granka 1 , E. Han 1 , E.L. Hong 1 , A.R. Kermany 1 , N.M. Myres 2 , K.A. Rand 1 , Y. Wang 1 , C.A. Ball 1 , K.G. Chahine 2 . 1) DNA, LLC, San Francisco, CA; 2) DNA, LLC, Lehi, UT.

Introduction: A common DNA-based method to detect relatives and ancestors (“cousins”) is to identify and match shared portions of chromosomes (haplotype blocks) between an individual and their potential relatives. Identifying and matching the shared haplotype blocks is challenging due to the non-uniform halving of genetic information that takes place during the meiosis events of each generation. As the number of generations increases, the average size of matching haplotype blocks shrink, due to successive chromosomal recombination. Additionally, genetic drift, flow and selection establish population structure that skews the distribution of frequency and size of some haplotype blocks. We aim to characterize haplotype blocks based on their frequency profiles and link haplotypes to ancestral communities (“genetic ethnicities”) and more recent admixed communities. Methods: Using a novel haplotype block matching algorithm, we identify haplotype blocks that occur frequently in a database of over one million samples genotyped by DNA, LLC. We review the frequency profiles of each haplotype, and associate them with metadata inferred from global and local estimated admixture ("genetic ethnicity") as well as aggregated family history data from public family trees associated with some of the genotypes.
Results: Common SNP windows have been characterized as identifying signatures of the gamut from ethnicities to more recent admixed communities resulting from migration. Further, we show that these signals of ethnic populations and communities can be used to improve the accuracy of identifying distant “cousin” matches by correcting for matches that are predominately generated due to more ancient signals of ancestry.
Conclusion: By linking common haplotype blocks to ancestral groups of varying age of origin, we can improve the accuracy of ancestor identification for the desired task – ancient haplotype blocks for ethnicity admixture detection to more recent haplotype blocks that reflect recent cousins. Additionally, our characterization of haplotype blocks by ancestral groups reveals interesting candidates for further study and interpretation of their functional implications in various ethnic and community groups.

Maps of effective migration as a summary of human genetic diversity. 
B. Peter, D. Petkova, M. Stephens, J. Novembre. University of Chicago, Chicago, IL.

A dominant pattern of genetic diversity in humans is that geographically proximal populations are generally more genetically similar to one another; however, there are exceptions to this rule. Persistent geographical features such as mountains, oceans, or deserts, have allowed excess genetic differences to accumulate in some regions more than others. Conversely, historical migrations and population movements have led to cases where exceptional levels of similarity persist across large geographic distances. To provide more insight into how genetic differentiation is distributed geographically in humans, we examine the fine-scale genetic structure of humans. We produce maps that represent the spatial structure of human genetic diversity using a recently developed, spatially explicit method (EEMS, Estimation of Effective Migration Surfaces). We apply EEMS on global, continental, and sub-continental scales, analyzing genetic data from 8,740 individuals from 469 geographically localized populations, obtained from 24 different source studies. In addition to the major, well-known barriers such as the Sahara, Himalayas and Mediterranean, we detect barriers that correlate with historic language group boundaries (boundaries of Slavic and Bantu speakers with their neighbors), mountain ranges (Zagros, Caucasus, Ural) and marine features (English Channel, Adriatic Sea, Wallace line). We also identify regions showing high connectivity despite having geographic separation (Britain and Scandanavia, Iceland and Denmark, among the Lesser Sunda Islands). Simultaneously, we find that levels of diversity vary more smoothly, decreasing gradually with distance from Africa. Overall, our results suggest that diversity patterns are consistent and primarily shaped by the signature of the Out-of-Africa expansion, but that migration rates are strongly influenced by geography and local events.

The African Genome Resource Project: Patrilineal and matrilineal inheritance through the Y chromosome and the mitochondrial genome.
F. Abascal, D. Gurdasani, T. Carstensen, M. Pollard, C. Pomilla, M. Sandhu on behalf of AGR investigators. Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

Background The Y chromosome and the mitochondrial genome are inherited from the paternal and maternal lines, respectively. The lack of recombination in the mitochondrial genome and in large part of the Y chromosome leads to evolution almost in isolation from the autosomal genome. As a result, the Y chromosome and the mitochondrial genome offer a unique perspective on human demographic processes. Y chromosome (Y-) and mitochondrial (mt-) haplogroups can be very informative about human origins, migrations and admixture, as well as about potential sex biases during these processes. Further characterisation of the diversity of Y- and mt-haplogroups within Africa is essential to understand human history. Here, we present the mitochondrial and Y chromosome diversity among ~5000 individuals from the African Genome Resource panel.
Methods We predicted the mt- and Y-haplogroups for 4,990 individuals and 2,399 males, respectively, representing diverse ethno-linguistic groups from Ethiopia, Uganda, South Africa, Egypt, and 5 African populations sequenced within the 1000 Genomes project. Mitochondrial and Y haplogroups were predicted with Haplogrep and YFitter, respectively. We called the mitochondrial genome and the Y chromosome for each sample and reconstructed their phylogenetic relationships with FastML.
Results We found evidence for Eurasian admixture among several sub-Saharan populations. Eurasian mt haplogroups appeared in 23% of the Ethiopians and 0.8% of the Ugandans. No Eurasian mt haplogroups were detected for the Zulu and Nama. We identified 13% of Ethiopians, 0.5% of Ugandans, and 43% of Nama/Khoe-San with Eurasian Y haplogroups. Eurasian admixture is prevalent in Ethiopia but it is not distributed homogeneously. Whereas the Gumuz show no Eurasian haplogroups, the Amhara show the highest frequencies. Within the Nama/Khoe-San there is not a single Eurasian mitochondrial haplogroup but up to 43% of Eurasian Y haplogroups, revealing a strong sex bias (p=1e-12). Consistent with previous reports, the oldest haplogroups are found in highest frequencies within the Khoe-San.
Conclusions We present the largest panel of mt and Y chromosome sequences across Africa, including highly diverse Khoe-San populations from South-Africa. Our findings suggest substantial variation in Y chromosome and mt haplogroups across Africa, and provide evidence for extensive Eurasian admixture among several populations across Africa.

Whole-genome sequence analyses provide new insights into the demographic history and local adaptation of African populations.
S. Fan 1, D.E. Kelly 1, M.H. Beltrame 1, M.E.B. Hansen 1, S. Mallick 2,3,4, T. Nyambo 5, S. Omar 6, D. Meskel 7, G. Belay 7, A. Froment 8, N. Patterson 3, D. Reich 2,3,4, S.A. Tishkoff 1,9. 1) Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA; 2) Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; 3) Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; 4) Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA; 5) Department of Biochemistry, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania; 6) Kenya Medical Research Institute, Center for Biotechnology Research and Development, Nairobi, Kenya; 7) Department of Biology, Addis Ababa University, Addis Ababa, Ethiopia; 8) UMR 208, IRD-MNHN, Musée de l'Homme, Paris, France; 9) Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA.

Africa is the origin of modern humans within the past 200,000 years. There are more than 2,000 ethnolinguistic groups in Africa, which encompass around one-third of the world’s languages. To infer the complex demographic history of African populations and adaptation to diverse environments, we sequenced the genomes of 94 individuals from 44 indigenous African populations using high coverage Illumina sequencing technology. Phylogenetic analysis confirms that the San lineage is basal to all other modern human population lineages. The location of other African populations in the phylogenetic tree correlates with geographical location, with the exception of the Central Africa rainforest hunter-gatherer (RHG) populations, who group with Southern African populations. We characterize ancient African population structure by inferring the effective population size and divergence time between populations. A common population bottleneck for all African populations was observed at ~200 thousand years ago (kya), corresponding with paleobiological evidence for modern human origins. Since then, the San and RHG populations have maintained the largest effective population size compared to other populations prior to 10 kya. Using MSMC analysis, we infer that the San population split from the RHG and the East African Khoesan-speaking Hadza and Sandawe hunter-gatherers within the past 66-82 kya, suggesting these populations could have originated from a historically more widespread population of hunter-gatherers. By contrast, the San diverged from all non-Khoesan speaking populations ~100-120 kya. The divergence times of Niger-Kordofanian, Nilo-Saharan and Afroasiatic speaking populations were within the past ~22 to 41 kya. In the RHG populations, the oldest divergence was found between Eastern and Western RHG at ~36-51 kya; the time of divergence of the western RHG populations was inferred to be ~12-18 kya.
Based on the ADMIXTURE analysis, Niger-Kordofanian and RHG populations were pooled for analyses of natural selection. We observed signatures of positive selection at genes involved in muscle development, bone synthesis, reproduction, immune function, energy metabolism, cell signaling, and neural development.

This work is supported by NIH grants 1R01DK104339-01, 1R01GM113657-01, and DP1 ES022577-04 to SAT. The sequencing was funded by the Simons Foundation (SFARI 280376) and the U.S. National Science Foundation (BCS-1032255) grants to DR.

The Genome Diversity in Africa Project: A deep catalogue of genetic diversity across Africa.
D. Gurdasani 1,2, J.P. Martinez 1, M.O. Pollard 1,2, T. Carstensen 1,2, C. Pomilla 1,2, GDAP Investigators 1,2 . 1) Wellcome Trust Sanger Institute, Cambridge, Cambridgeshire, United Kingdom; 2) Department of Medicine, University of Cambridge, Cambridge.

While recent efforts have greatly extended our understanding of genetic diversity in Africa, current sequence panels are limited in their capture of African genetic variation. Deeper sequencing with sampling of diverse indigenous populations is needed to capture diverse haplotypes across Africa. The Genome Diversity in Africa Project (GDAP) aims to characterise diversity from representative populations across all of Africa, including from several indigenous hunter-gatherer populations across the region. This would provide an important global resource to understand human genetic diversity and provide insight into population history and migrations across Africa in recent times. The project has completed sequencing of 575 samples across 23 populations in Africa, including populations from the Gambia, Ghana, Morocco, South Africa, Sudan, Chad, Kenya, South Africa, Uganda, Egypt and Ethiopia. Here, we present preliminary results from the project on 133 samples from 5 ethno-linguistic groups from Morocco, Ghana (Ashanti), Nigeria (Igbo), Kenya (Kalenjin) and South Africa (Zulu) sequenced on the Hiseq X platform (30x).
Methods Reads were mapped to the GRCh38 reference. Following quality control, variant sites were called using HaplotypeCaller v3.5 for each sample to generate gVCFs. GenotypeGVCFs was run across all samples for joint calling. VCFs were filtered using VQSR calibrated on DP, QD, FS, SOR, ReadPosRankSum and MQRankSum annotations. A tranche sensitivity threshold of 99.5% was applied for filtering of SNPs and 99% for indels. Only sites called in >90% of individuals were included.
Results We identified 25.1M SNPs and 2.9M indels among 133 individuals in the GDAP pilot phase, with 25% and 47% of SNPs and indels being novel (not in dbSNP141), respectively. A large proportion of variants per population were private, varying from 12-18%, being greatest among the Kalenjin and Zulu. We found the highest level of heterozygosity and genetic variation among the Zulu, consistent with reported Khoe-San admixture in this group.
Conclusions We present the pilot phase of the Genome Diversity in Africa Project, identifying a high level of diversity across 5 populations from Africa. Inclusion of indigenous population groups, such as the Hadza, Twa Pygmies, and Ju/’hoansi in the next phase will materially advance the understanding of genetic diversity across African populations, and provide an invaluable resource to researchers worldwide.
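To make the last filtering step in the Methods above more concrete ("only sites called in >90% of individuals were included"), here is a minimal Python sketch of a call-rate filter. It is only an illustration of the idea and not the GDAP pipeline: the in-memory data layout, the function names and the toy genotypes are all assumptions made for this example, and real projects would apply such a filter to VCF files with dedicated tools.

```python
from typing import Dict, List, Optional

# Hypothetical layout (an assumption for this sketch): each site maps to a
# list with one genotype string per individual; None marks a missing call.
Genotypes = Dict[str, List[Optional[str]]]


def call_rate(calls: List[Optional[str]]) -> float:
    """Return the fraction of individuals with a non-missing call at a site."""
    if not calls:
        return 0.0
    called = sum(1 for call in calls if call is not None)
    return called / len(calls)


def filter_by_call_rate(sites: Genotypes, min_rate: float = 0.90) -> Genotypes:
    """Keep only sites called in strictly more than min_rate of individuals."""
    return {
        site: calls
        for site, calls in sites.items()
        if call_rate(calls) > min_rate
    }


if __name__ == "__main__":
    # Toy data for ten individuals; site names and genotypes are invented.
    toy_sites: Genotypes = {
        "site_A": ["0/1"] * 10,              # called in 10 of 10
        "site_B": ["0/0"] * 9 + [None],      # called in 9 of 10
        "site_C": ["1/1"] * 8 + [None] * 2,  # called in 8 of 10
    }
    print(sorted(filter_by_call_rate(toy_sites)))
```

Note that the comparison is strict, following the ">90%" wording of the abstract, so a site called in exactly 90% of individuals is dropped in this sketch.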

High-coverage sequencing of the Human Genome Diversity Project (HGDP-CEPH) Panel.
S. McCarthy 1, A. Anders Bergström 1, Y. Xue 1, Q. Ayub 1, S. Mallick 2,3,4, M. Sandhu 1, D. Reich 2,3,4, R. Durbin 1, C. Tyler-Smith 1 . 1) Wellcome Trust Sanger Institute, Cambridge, United Kingdom; 2) Department of Genetics, Harvard Medical School, Boston, MA; 3) Broad Institute of Harvard and MIT, Cambridge, MA; 4) Howard Hughes Medical Institute, Boston, MA.

We discuss the completion of high coverage (>30x), whole-genome sequencing of all 952 core individuals in the Human Genome Diversity Panel (HGDP-CEPH), with the results being made available as an open access population data resource. This widely used panel contains samples from 52 populations spanning Africa, the Middle East, Europe, Asia, Oceania and the Americas, and previous genotype data from these samples have been an important reference resource for human genetic diversity. As seen in the 1000 Genomes Project, having fully open access data, unencumbered by managed access restrictions and other hurdles, is an invaluable driver for democratized data analysis and methods development. Building on previous sequencing efforts by the Simons Genome Diversity Project, we have completed sequencing of the panel and are making the data available via the ENA and the 1000 Genomes Project data management successor, the International Genome Sample Resource (IGSR). All data has moved to the new GRCh38 reference and we present preliminary results on the call set derived from this data. We have GATK HaplotypeCaller and fermikit primary calls, are making mpileup and freebayes calls, and will present an integrated call set that has been computationally phased, together with initial population genetic analyses. A small number of samples are being experimentally phased using 10X Genomics technology which will allow evaluation of phasing accuracy, and also unbiased use of haplotype-based analyses such as MSMC.

Fine-scale identity-by-descent and birth records in Finland provide insights into recent population history.
A.R. Martin 1,2, S. Kirminen 3, A.S. Havulinna 4, A. Sarin 3, A. Palotie 1,2,3, V. Salomaa 4, S. Ripatti 3, M. Pirinen 3, M.J. Daly 1,2 . 1) Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; 2) Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; 3) Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; 4) National Institute for Health and Welfare (THL), Helsinki, Finland.

Finland provides unique opportunities to investigate both population and medical genomics because of its adoption of unprecedented uniformity in national electronic health records, concerted coordination of research centers across the country, detailed historical records, as well as recent population bottlenecks that drove specific disease alleles to high frequency. We investigate recent population history (up to ~50 generations ago), particularly relevant to rare, disease-conferring alleles, using identity-by-descent (IBD) haplotype sharing in >10,000 Finns. We compare IBD sharing in Finland to nearby Scandinavian countries with considerably different population histories, including >8,000 Swedes and >30,000 Danes. We find drastically more sharing on average in Finns, including many long tracts. By leveraging fine-scale birth record data, we find a non-linear decay of pairwise IBD sharing with increasing distance across Finland. This arises from pockets of excess IBD sharing; e.g. pairs of individuals from northeast Finland share on average several-fold more of their genome IBD than pairs from southwest regions containing the major cities of Turku and Helsinki. We demonstrate inference of recent migration patterns from IBD sharing patterns. For example, high IBD sharing in northeast Finland radiates from north to south rather than to the west, indicating that migration is restricted near the Russian border. We also investigate recent effective population size changes across regions of Finland and find evidence supporting the distinction between early and late settlement areas. However, our results indicate a more continuous flow of migration than previously posited, with a minimum Ne occurring ~12 generations ago in the northernmost Lapland region and moving further back in time to the south, with a bottleneck detectable in the early settlement area ~40 generations ago.
Lastly, we leverage IBD sharing for genetic disease mapping and show that rare, functional haplotypes show more significant association via IBD mapping than single variants with linear mixed effect models.

Y-chromosomal composition of mediaeval and contemporary populations in Norway and adjacent Scandinavian countries: Y-STR haplotypes and the rare Y-haplogroup Q. 
B. Berger 1, S. Willuweit 2, H. Niederstätter 1, P. Kralj 1, L. Roewer 2, W. Parson 1,3. 1) Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria; 2) Department of Forensic Genetics, Institute of Legal Medicine and Forensic Sciences, Charité-Universitätsmedizin, Berlin, 13353, Germany; 3) Forensic Science Program, The Pennsylvania State University, PA, USA.

In the framework of the project “Immigration and mobility in mediaeval and post-mediaeval Norway” molecular genetic analyses were performed on 97 pre-modern human remains including genetic sexing and Y-chromosomal DNA typing. All samples were subjected to molecular genetic analyses of the sex using “Genderplex” consisting of two different regions of the amelogenin gene, SRY and four X-STR loci. From 90% of the extracted remains (n=87) sex assignment was possible. Of these, 49 (56.3%) brought a genetically male result. All of these DNA extracts were subjected to Y-STR analysis using Yfiler Plus PCR Amplification Kit (Thermo Fisher Scientific) and/or PowerPlex Y23 System (Promega). At least partial Y-STR profiles were obtained from all samples. A detailed comparison between mediaeval/post-mediaeval and contemporary Y-chromosomes was performed by searching the obtained haplotypes (HTs) in the Y Chromosome Haplotype Reference Database (YHRD), comprising 154,329 haplotypes from 991 populations in 129 countries at the time of query (Release 50). YHRD searches of the pre-modern haplotypes yielded full matches plus neighbor-matches differing at only one allele from the query HT. Matches are presented with geographical and ancestry information of the contemporary HTs. For samples without direct YHRD-matches, this information is provided through their neighbor HTs. AMOVA was performed using the YHRD online tool on pairwise RST values to create the corresponding MDS plots. The pre-modern HTs were grouped according to medieval and post-medieval origin and compared to contemporary populations from Scandinavian (Norwegian, Swedish and Danish), Northwest European, and Northeast European populations. Both pre-modern populations showed small genetic distances to contemporary Scandinavians and larger distances to Northeast Europeans with Northwest European populations in between.
As expected, an initial assessment of the Y-chromosomal haplogroups (HGs) showed that most of the samples were attributable to the main European HGs I1, R1a and R1b. However, one of the HTs seemed to be associated with HG-Q which is rare in Europe and hitherto little evaluated in this region. Network analysis was applied for detecting similar HTs in contemporary samples from Norway and adjacent Northern European countries stored in the YHRD. The outcomes of this survey should initiate a detailed SNP based HG-assessment of HG-Q candidate samples.
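The search described in the abstract above returns full matches plus "neighbor-matches" that differ from the query haplotype at only one allele. The Python sketch below illustrates that idea on toy data. It is not the YHRD search implementation: the function names, the sample identifiers and the repeat counts are invented for this example, and I have assumed that a "neighbor" is any haplotype differing at exactly one marker, whatever the size of the difference.

```python
from typing import Dict, List, Tuple

# A Y-STR haplotype as a mapping from marker name to repeat count.
Haplotype = Dict[str, int]


def count_differences(query: Haplotype, candidate: Haplotype) -> int:
    """Count the markers typed in both haplotypes whose repeat counts differ."""
    shared_markers = query.keys() & candidate.keys()
    return sum(1 for marker in shared_markers if query[marker] != candidate[marker])


def search(query: Haplotype, database: Dict[str, Haplotype]) -> Tuple[List[str], List[str]]:
    """Return (full_matches, neighbor_matches) as lists of sample identifiers.

    Assumption for this sketch: a candidate is only compared if it has been
    typed at every marker present in the query.
    """
    full_matches: List[str] = []
    neighbor_matches: List[str] = []
    for sample_id, haplotype in database.items():
        if not query.keys() <= haplotype.keys():
            continue  # skip candidates missing one of the query markers
        differences = count_differences(query, haplotype)
        if differences == 0:
            full_matches.append(sample_id)
        elif differences == 1:
            neighbor_matches.append(sample_id)
    return full_matches, neighbor_matches


if __name__ == "__main__":
    # Toy three-marker haplotypes; all values are invented for illustration.
    query = {"DYS19": 14, "DYS390": 23, "DYS391": 10}
    database = {
        "sample_1": {"DYS19": 14, "DYS390": 23, "DYS391": 10},
        "sample_2": {"DYS19": 14, "DYS390": 24, "DYS391": 10},
        "sample_3": {"DYS19": 15, "DYS390": 24, "DYS391": 10},
        "sample_4": {"DYS19": 14, "DYS390": 23},
    }
    print(search(query, database))
```

With only a handful of markers, many unrelated men can appear as full or neighbor matches, which is why the number of markers tested matters so much when interpreting results like these.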

Evidence for detailed historical European population structure from large-scale, diverse genetic polymorphism data.
P. Carbonetto 1, J. Byrnes 1, J.M. Granka 1, Y. Wang 1, K. Noto 1, E. Han 1, A.R. Kermany 1, K.A. Rand 1, E. Elyashiv 1, H. Guturu 1, N.M. Myres 2, E.L. Hong 1, R.E. Curtis 2, K.G. Chahine 2, C.A. Ball 1. 1) DNA, LLC, San Francisco, CA; 2) DNA, LLC, Lehi, UT.

Despite the recent surge of interest in ancient genomes, we show that there is still much to be elucidated about human demography from contemporary genomes. Here, we demonstrate the use of genealogical data to generate demographic insights from analysis of a large-scale, heterogeneous genetic data set. Specifically, we show that an unsupervised ADMIXTURE analysis of genotypes from 131,293 primarily US-born individuals, followed by a simple statistical analysis of the 3 million pedigree records linked to these genotype samples, yields novel insights into European genetic diversity. In contrast to principal component analysis (PCA), which is the most widely used approach to investigating European genetic diversity, we use ADMIXTURE to infer genetically differentiated source populations reflecting more distant historical time periods. Unsurprisingly, among European-origin individuals, admixture is pervasive. Despite this, our ADMIXTURE analysis with K = 12 ancestral populations identifies 5 stable, genetically differentiated groups within Europe (with putative historical counterparts in parentheses): Ashkenazi Jewish, Irish (Celts), Eastern Europeans (Slavs), Scandinavians (Nordics) and Iberians, featuring Basques and Sardinians. The genealogical data also allow us to provide a detailed portrait of the genetic composition of contemporary peoples across North America (e.g., Iberians in Cuba), and other parts of the world. This work suggests the potential for drawing more detailed connections between present-day and ancient genetic variation by leveraging large, heterogeneous genetic data sets.

Genomic insights into the population structure and history of the Irish Travellers.
E.H. Gilbert 1, S. Carmi 2, S. Ennis 3, J.F. Wilson 4,5, G.L. Cavalleri 1. 1) Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin, Leinster, Ireland; 2) Braun School of Public Health, The Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel; 3) School of Medicine and Medical Science, University College Dublin, Dublin, Ireland; 4) Centre for Global Health Research, Usher Institute for Population Health Sciences and Informatics, University of Edinburgh, Teviot Place, Edinburgh, Scotland; 5) MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, Scotland.

Aims: The Irish Travellers are a population with a history of nomadism. Consanguineous unions are common, and as a population they are socially and genetically isolated from the surrounding, “settled” Irish population. A previous low-resolution genetic analysis suggested a common Irish origin between the settled and the Traveller populations. What is not known, however, is the extent of population structure within the Irish Traveller population, the time of divergence from the general Irish population, and the extent of autozygosity.
Methods: We recruited Irish Travellers from across Ireland and the UK. To be included a participant had to have had at least three grandparents with a surname associated with the Irish Travellers. DNA was extracted from saliva samples, and genotypes were generated using the Illumina OmniExpress SNP genotyping platform. With this data, we investigated population structure using fineStructure, quantified the levels of autozygosity with PLINK, and estimated a time of divergence using a method based on Identity by Descent (IBD) segment identification.
Results: We merged, cleaned, and analysed data from 42 Irish Travellers, 2232 settled Irish, 2039 British, 143 Roma Gypsies, and 931 individuals from 57 world-wide populations. We confirm an Irish origin for the Irish Travellers, demonstrate evidence for population substructure within the population, confirm high levels of autozygosity consistent with a consanguineous population, and for the first time provide estimates for a date of divergence between the Irish Travellers and settled Irish.
Conclusion: Our findings have implications for disease mapping within Ireland, as well as on the social history of the Irish Traveller population.
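The abstract above mentions quantifying autozygosity but does not say which summary statistic was used. One widely used measure is F_ROH, the fraction of the autosomal genome that lies in runs of homozygosity (ROH). The Python sketch below shows how that fraction is calculated. The function name, the segment coordinates, the minimum segment length and the genome length are all assumptions chosen for illustration, and this should not be read as the authors' method.

```python
from typing import Iterable, Tuple

# One ROH segment as (start, end) positions in base pairs.
Segment = Tuple[int, int]


def f_roh(
    segments: Iterable[Segment],
    autosome_length_bp: int,
    min_length_bp: int = 0,
) -> float:
    """Return the fraction of the autosomal genome covered by ROH segments.

    Segments shorter than min_length_bp are ignored. The segments are
    assumed not to overlap one another.
    """
    if autosome_length_bp <= 0:
        raise ValueError("autosome_length_bp must be positive")
    total_roh = sum(
        end - start
        for start, end in segments
        if end - start >= min_length_bp
    )
    return total_roh / autosome_length_bp


if __name__ == "__main__":
    # Invented segments on a toy genome of 100 Mb.
    toy_segments = [(0, 1_000_000), (5_000_000, 7_000_000), (9_000_000, 9_200_000)]
    print(f_roh(toy_segments, autosome_length_bp=100_000_000, min_length_bp=500_000))
```

Higher values indicate that more of the genome was inherited identically from both parents, which is what would be expected in a population where consanguineous unions are common.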

Personal ancestry inference at the finest scale reveals more sub-structure in the UK.
D. Lawson, G. Weyenberg. Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom.

Chromosome Painting has revealed genetic differences within the UK at a very fine scale [1], with structured genetic variation within a single county in some cases (such as Cornwall & South Wales). However, in that work, it was not possible to genetically distinguish much of England, which appeared as a single homogeneous group. Here, we describe an extension to the Fine-STRUCTURE [2] clustering that can further distinguish ancestry even within England; for example, identifying regions such as Norfolk, the Midlands and the South as genetically distinct. The approach works by using the known county locations to craft genetic features to use in unsupervised clustering. Specifically, we group individuals by their geographic sampling location into reference donor populations. This forms an ancestry profile - which can be viewed as a careful choice of feature vector - that still allows unsupervised genetic clustering for all individuals. Further, we describe how this approach allows individuals to be described as an admixture of the inferred geographical clusters. This allows ancestral information to be recovered for individuals who are not purely represented by a single geographical location. This also allows us to characterise the genetic relationship between the inferred clusters, several of which represent drift that is most strongly represented by a particular geographical region (including Cornwall, Wales, Scotland and the North of England) and others of which represent characteristic admixture proportions between these ancestral drifted populations. Beyond improving resolution, this approach facilitates personal genomics because individuals can be represented in terms of the fixed reference panel. We demonstrate the utility of the approach by describing the ancestry of the UK10K participants in terms of the new, high resolution POBI clusters.
Previously, a similar analysis [3] without geographical information inferred little population structure in the UK from these samples, but now we have a rich representation of their population structure, including an assessment of admixture from outside the UK. This highlights the value in high quality fine-scale geographic sampling, which could now facilitate this level of ancestry identification for many other countries.

[1] Leslie et al 2015, Nature 519:309–314 [2] Lawson et al 2012, PLoS Genet. 8:e1002453 [3] UK10K Consortium 2015, Nature 526:82-90.

Chromosome painting for arbitrary sample collections.
G. Weyenberg, D. Lawson. Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom.

Haplotype-based methods have been demonstrated to be capable of detecting fine scale structure within human populations—to the point of distinguishing genetic variation at the sub-county level in the South West of England [1]. However, the aforementioned method implements an all-against-all analysis of sampled individuals, which is not suited to all applications, including personal genomics where samples are obtained individually or in small batches. Here, we describe an extension of the FineSTRUCTURE [2] method to allow for painting of individual samples against a panel of pre-calculated reference haplotype clusters, making the method computationally feasible for on-demand analysis of individuals. The choice of the reference panel also allows the user to tailor the analysis to emphasise targeted features of the data. For example, in the context of a personal ancestry imputation, panels may be constructed to focus on global-, continental-, or national-scale genetic features, and the low computational cost of painting an individual against a pre-computed panel makes sample-level exploratory analysis feasible. Another application of the panel-based painting is to use high-quality reference data to impute unknown geographical labels to samples where such information is either unavailable, or was collected at an undesirable resolution. To demonstrate the latter application, we analysed several populations with suspected Northern-European ancestry—including the Hapmap CEU and ASW populations, and the UK10K dataset—with respect to panels of Europeans and the high-resolution People of the British Isles (POBI) samples. These individuals are characterised in terms of an admixture of inferred clusters in the reference populations. Whilst many individuals were best described as a complex admixture that likely occurred over many generations, many others had a clear signal of geographically distinct ancestry.

[1] Leslie et al 2015, Nature 519:309–314 [2] Lawson et al 2012, PLoS Genet. 8:e1002453.

Local ancestry patterns inferred from one million genomes recapitulate fine-scale population history.
Y. Wang 1, K. Noto 1, J. Byrnes 1, R.E. Curtis 2, E. Han 1, E. Eyal 1, G. Harendra 1, P. Carbonetto 1, A.R Kermany 1, J.M. Granka 1, K.A. Rand 1, N.M. Natalie 2, E.L. Hong 1, C.A. Ball 1, K.G. Chahine 2 . 1) DNA, LLC, San Francisco, CA; 2) DNA, LLC, Lehi, UT.

In a country of immigrants, population structure is shaped by a long, ongoing history of immigration, followed by subsequent admixture and migration. All these events have left their footprints in the genomic landscape of current residents and make it possible for geneticists to reconstruct population history from genomic data. However, deciphering the signature of these forces requires accurate inference of genomic tracts that one individual inherits from ancestors of different origins. Previously, several methods have been developed for inferring local ancestry with varying levels of success. Unfortunately, none of these methods can be feasibly applied to a data set of one million genomes. Recently, our team presented Polly, a novel algorithm for estimating genome-wide ancestry proportions in admixed samples. Polly, built on a modified version of the BEAGLE haplotype model, relies on this model to achieve two things: First, to account for phasing uncertainty, and second, to provide a measure of distance between a query haplotype and a reference haplotype. Using haplotype models learned from hundreds of thousands of haplotypes and subsequently annotated with over eight thousand single-origin reference individuals, Polly performs ultra-fast inference of both global and local ancestry. In this study, we evaluate Polly's accuracy in predicting local ancestry using simulated admixed samples with known genomic composition. We assess the assignment accuracy, the switching pattern and the tract length distribution. Using a cross-validation experiment, we confirm that Polly makes highly accurate local ancestry estimates even at the subcontinental level. We further use Polly to analyze one million genomes from the United States and discover distinct local ancestry patterns among different ethnic groups and communities, especially among African Americans and Latino Americans. We map local ancestry estimates to individuals’ geographic locations.
Our results illustrate clear population structure arising from immigration routes, assortative mating and isolation by distance. We also find evidence that supports large scale domestic migration events, as exemplified by the Great Migration of African Americans following the abolition of slavery. Finally, we attempt to date known historical events from ancestry tract length distributions. Overall, our analysis demonstrates the power of combining local ancestry analysis with big data in studying fine-scale population history. 

Thursday, 13 October 2016

The DNA ancestry craze from The Stream

DNA testing was featured this week in a two-part TV programme on The Stream, an online social-media-oriented TV show hosted by the Al Jazeera media network. The guests on these programmes were: Julie Granka, a population geneticist with AncestryDNA, Joseph Graves, an evolutionary biologist at North Carolina A&T State University, and Kevin Jones Giddins, an AncestryDNA user who had found his mother through DNA testing. Questions were submitted via social media and there was a very good and balanced debate on the pros and cons of DNA testing, with a particularly valuable contribution from Professor Joseph Graves. He provided some interesting insights into the methodology used for admixture tests, i.e. the reports received from the testing companies that give you the percentages of DNA that you share with different populations. These tests cannot provide the granularity that the companies claim. They can distinguish between populations at the continental level but cannot generally provide meaningful breakdowns at the country level in Europe, Africa or elsewhere. (This might change in future. The new test from Living DNA provides regional breakdowns within the British Isles though we have not yet seen any customer results.)

The programmes mostly focused on results from AncestryDNA. It would have been helpful if they could have shown results from 23andMe and Family Tree DNA as well, and asked scientists from these two companies to contribute to the debate.

There was one matter of concern when one of the presenters, Malika, showed some results that her father had received from a company called African Ancestry back in 2008. You could see from the screenshots that her father had only taken very low-resolution tests (an HVR1 mtDNA test and a 9-marker Y-DNA test) yet the family were told by the company that they were descended from specific tribes in Chad, Cameroon and Ghana. When so few markers are used for a Y-DNA or mtDNA test, the genetic signature could potentially be shared with thousands of people from many different countries, and it is simply not possible to infer an origin in a specific country. Better resolution can be obtained from a full mtDNA sequence test or a comprehensive Y-chromosome sequencing test, but even then it's not generally possible to pinpoint a specific country of origin. An additional problem is that most of the people in the company databases are Americans and Europeans, and there simply aren't enough reference populations from Africa to make such inferences. Unfortunately these Y-DNA and mtDNA results were glossed over in the programme and the limitations of the African Ancestry tests weren't discussed.

However, these are minor quibbles in two otherwise very interesting programmes. It was also good for once to see more of an emphasis on using DNA testing for genealogical matches rather than a mistaken attempt to discover "where you're from". Each programme lasts for just 25 minutes, and they are both well worth watching. Part 1 covers The DNA ancestry craze and Part 2 looks at The science and security of DNA testing. You can click on the videos below to watch the programmes.

Tuesday, 27 September 2016

Living DNA – a new genetic ancestry test providing comparisons with the People of the British Isles dataset

Over five years ago I attended a one-day conference in Cardiff in Wales on Ancient Britons, Europe and Wales. At the conference Professor Sir Walter Bodmer presented the first results from the People of the British Isles (POBI) project which were hot off the press having only been analysed three weeks previously. The audience were completely blown away by the results. For the first time researchers had been able to detect regional differences between the people of Britain based on their DNA. Sir Walter shared a map with us and you could even see that some counties, such as Devon and Cornwall, stood out as distinct regions in their own right.

I don't have a photograph of the map that we saw that day in Cardiff but it was probably very little changed from the map that appeared in the published paper in Nature in 2015 and which you can see here on the Wellcome Trust website. I remember thinking at the time that this research opened up the tantalising possibility of being able to receive an admixture report from a genetic ancestry company which would allow you to compare your results with the POBI data and see how much of your ancestry came from Devon or Cornwall and other regions of Britain.

I am very pleased to report that that day has now arrived with the launch of a new DNA test from a company called Living DNA! This is the first genetic ancestry test to incorporate data from the POBI Project and to give customers the percentages of DNA that they share with people from different regions in Britain.

Disclosure. I should declare here that I have been involved in informal discussions with DNA Worldwide over the development of their product. I have not received any payment but, in return for my advice, I will receive a free Living DNA account. The company have had access to my raw data files from AncestryDNA, 23andMe and Family Tree DNA for testing purposes. Yesterday I had a Skype briefing about the new test from David Nicholson of Living DNA. I also received a preview of my own results, though I don't yet have access to my online account. There is still some tweaking going on behind the scenes and the results I've received are likely to be changed, so I will report on my results once I have my account.

The original POBI map, which formed the centrepiece of the Nature paper, featured 17 regions. For the new Living DNA test the POBI data has been re-analysed using improved methods and the data has been clustered into 21 areas across the British Isles.

The Living DNA test also includes a standard ancestry report providing admixture percentages. This analysis is based on 80 worldwide regions. (Note that the test does not currently report Jewish or Aboriginal ancestry though hopefully these reference populations will be added in the future.)

A unique feature of the test is the "back in time" map which allows you to see how your DNA matches with populations across a range of historical dates. Historical population statistics have been used for the mapping. All the existing tests on the market are trying to pick up on the small and very recent differences in our DNA but this means that the bigger picture of our universal relatedness gets lost. Recent studies have shown that you only have to go back about 5000 years or so to reach a point where we all share the same ancestors. I hope that this new test will go some way towards educating people about our common ancestry and the knowledge that there is no such thing as "race".
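The arithmetic behind our shared ancestry can be illustrated with a few lines of Python. This is only a back-of-the-envelope sketch and not the method used in the studies mentioned above, which rely on far more detailed models. It simply counts the positions in a pedigree, which double with every generation, and finds the point at which there are more positions than people. The population figure and the generation length are my own rough assumptions.

```python
def ancestor_slots(generations: int) -> int:
    """Positions in a pedigree n generations back: 2 parents, 4 grandparents, ..."""
    return 2 ** generations


def generations_until_slots_exceed(population: int) -> int:
    """Smallest number of generations back with more pedigree slots than people."""
    generations = 0
    while ancestor_slots(generations) <= population:
        generations += 1
    return generations


# Both values are illustrative assumptions, not figures from the studies.
ASSUMED_POPULATION = 7_400_000_000   # rough world population in 2016
ASSUMED_YEARS_PER_GENERATION = 30

g = generations_until_slots_exceed(ASSUMED_POPULATION)
print(f"{g} generations back (~{g * ASSUMED_YEARS_PER_GENERATION} years) "
      f"there are {ancestor_slots(g):,} pedigree slots")
```

Because past populations were much smaller than today's, the same real individuals must fill many of these slots over and over again, which is why everyone's family tree eventually overlaps with everyone else's.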

Here are some sample reports which have been provided by the company. The first map shows "the areas of the world where you share genetic ancestry in recent times" at the continental level.

This second map provides regional resolution.

On the regional map you can click through and see how your ancestry is broken down within the different regions. If you have ancestry from Britain you will be able to see your shared percentages of DNA in comparison with the local populations. The test has also been built so that people with mixed ancestry from other countries get more detail than ever before.

Here is the family ancestry visualisation.

The Living DNA test is currently available as a new standalone test but they are also planning to accept imported data from other companies for a small fee. It is hoped that this feature will be available by the end of the year. I understand that GedMatch are preparing to accept uploads of raw data from Living DNA. I hope that Family Tree DNA will also be able to accept the raw data as this test is likely to attract a lot of interest and we want to ensure that people who are interested in using their results for genealogical matching are able to do so.

The Living DNA test was offered to the public for the first time last weekend at the New Scientist Live show at the ExCeL arena in London. The test can be ordered from the Living DNA website and the kits will be shipped in mid-October.

Living DNA is a partnership between the British company DNA Worldwide and the European company Eurofins. Here are the company details from the website:
Living DNA is a collaboration of over 100 world-leading scientists, academic researchers and genetic experts. The team is led by DNA Worldwide Group, a DNA testing company, whose services are used by every court in the UK. The company is run by David Nicholson and Hannah Morden who saw an opportunity to show humanity that we are all made up of all of us, dissolving the concept of race. It was launched in 2016 after two years of intensive development but its parent company DNA Worldwide Group has been operating since 2004, and employs over 35 people from its head office in Somerset, UK.
Although Living DNA is a for-profit company they also intend to use their test for educational purposes to show how we are all connected to each other. They are working with an organisation called Show Racism the Red Card, a UK charity which works to combat racism. The company are working with a school in East London and ten schools around Europe. They also have PhD students working with them on different projects.

The following video provides further information about the company and the philosophy behind the test.

Who are Living DNA from Living DNA on Vimeo.

Technical details
Here are some of the technical details which I have gleaned from the website or which have been given to me by David Nicholson during my Skype briefing:
  • The tests are being done on a new custom Illumina Global Screening Array chip. Living DNA are the first company in the world to have access to this chip. This chip is a replacement for the Illumina OmniExpress chip which is in the process of being phased out. The new chip has been designed specifically for imputation and achieves a very high accuracy rate of 99.9%. (The datasheet for the Illumina chip can be found here.)
  • Medical SNPs are included on the chip and there is considerable overlap with the 23andMe chip. (I note from the Illumina press release that 23andMe have signed up to the Global Screening Array chip and I wonder if a v5 23andMe test might be in the pipeline.)
  • The following SNPs are on the Living DNA chip:
      • 638,000 autosomal SNPs
      • 17,800 X-chromosome SNPs
      • 22,500 Y-SNPs
      • 4,700 mtDNA SNPs
  • The Y-DNA SNPs have been selected from the 1000 Genomes Project and research papers, and checked and reviewed on the ISOGG SNP tree. The Y-SNP test has been developed in collaboration with Professor Mark Jobling and Pille Hallast at the University of Leicester.
  • The mtDNA SNPs have been selected by a team of scientists at Innsbruck University in Austria. They have used Phylotree and the latest tree builds to select the most useful mtDNA SNPs.
  • The autosomal component of the test has been developed by Daniel Lawson, Grady Weinberg, Daniel Falush, Garrett Hellenthal and Simon Myers. Lawson, Falush, Hellenthal and Myers were the team who developed the fineSTRUCTURE program used for the analysis of the POBI data and they were all authors on the POBI paper. The Living DNA test uses a refined version of that program upscaled for commercial use. fineSTRUCTURE is a haplotype-based method which takes advantage of the fact that SNPs travel together, and provides a more detailed analysis.
  • The autosomal DNA data is phased prior to processing.
  • White papers will be published soon with more information.
  • It will be possible to share results on the platform with friends and family and to manage multiple kits from a single account.
  • The test is done using a mouth swab. There is no liquid involved. This method has been chosen to facilitate international shipping. 
  • The developers have worked hard to ensure that the new chip is backwards compatible with other chips so that data can be accepted from all companies. The intention is to make the transfer process as easy as possible, preferably by interfacing with company APIs where available.
  • Customers will be able to download their raw data. Raw data will be imputable so that it is backwards compatible with older chips.
  • The test is available worldwide but prices will vary from country to country depending on local taxes and shipping rates. Negotiating global shipping rates and regulations is an extremely complicated process and prices are subject to change. There will be no hidden extras and the cost of shipping (including return shipping) will be included in the advertised price. The website currently shows prices for the UK, USA, Canada, Australia and Europe, and a standardised international price for other countries.
  • Participants will have the opportunity to opt into academic research studies.
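
To put the SNP counts listed above in proportion, here is a small Python tally. The four figures are the ones quoted for the Living DNA chip; the variable names and the percentage breakdown are my own additions for illustration.

```python
# SNP counts as quoted for the Living DNA chip.
snp_counts = {
    "autosomal": 638_000,
    "X-chromosome": 17_800,
    "Y-DNA": 22_500,
    "mtDNA": 4_700,
}

# Total number of SNPs across all four categories.
total = sum(snp_counts.values())

# Fraction of the chip devoted to each category.
shares = {category: count / total for category, count in snp_counts.items()}

for category, count in snp_counts.items():
    print(f"{category:>13}: {count:>7,} ({shares[category]:.1%})")
print(f"{'total':>13}: {total:>7,}")
```

The total comes to 683,000 SNPs, of which the great majority are autosomal.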
Future plans
  • It is planned to add in more data as it becomes available. It is hoped in particular to get some samples from Ireland.
  • Updated ancestry reports will be provided free of charge and there will be no subscription fees to receive the updates.
  • There are plans to incorporate ancient DNA data.
  • A chromosome browser view is in the pipeline.
  • There is a long feature list in development, and the company hope to receive feedback from the community and will develop the tools that people want.
  • There is a major focus on privacy.  The customer decides what happens to their data.
  • All the processing is done within the European Union.
  • Data will not be sold on to pharmaceutical companies.
  • Care has been taken to ensure that people understand the implications of testing.
  • Children under the age of 18 can take the test but will be required to re-consent once they reach the age of 18.
  • There will be an option to nominate a beneficiary.
  • For full details read the company's page on Privacy and Security.
This is a very exciting new test. It is a very pleasant change to have a test that specifically caters for the British and European market. The admixture tests currently provided by the big three genetic genealogy companies are primarily serving the very different needs of the American market. Access to the POBI dataset is what we've all been waiting for and I'm sure that everyone is going to be keen to see how their results compare. I'm also particularly pleased to see the back in time map which I hope will be a useful educational tool and will help to improve our understanding of our shared human ancestry. This is a very welcome addition to the genetic ancestry marketplace, and I look forward to seeing some interesting new developments in the future.

Update 10 October 2016
The Living DNA team are hoping to improve their Irish reference panel. To this end they would like to recruit people with four grandparents born within 80 km (50 miles) of each other in Ireland. If you have already tested with 23andMe, Family Tree DNA or AncestryDNA you can transfer your results free of charge (23andMe data is preferred if available). If not, you can benefit from a discounted rate for a new test. Here is the link to transfer your results:

Further reading
- See Chris Paton's blog for the official press release.
- For further commentary see John Reid's blog post More on the Living DNA genetic ancestry test and plans.
- Razib Khan provides some insights on admixture in his blog post on Scale of time and space in admixture.

© 2016 Debbie Kennett