Sunday 14 January 2018

A chromosome browser and a new matching algorithm at MyHeritage

There was a big update at MyHeritage on Thursday this week. They rolled out their updated matching algorithms and also introduced a new chromosome browser feature. MyHeritage have written an excellent blog post which explains the changes in more detail and also provides a good overview of the technicalities of DNA matching written in easy-to-understand language. You can read the article here:
All MyHeritage customers are currently automatically opted in to DNA matching. If, for any reason, you do not want to be notified of matches you can opt out in the My Privacy DNA settings.
I previously had 49 matches at MyHeritage. The new algorithms have allowed them to drop the threshold and report more distant matches. I now have a grand total of 1474 matches. Before the changeover I found that 72% of my matches did not match either of my parents. Previously I had to go through all my matches one by one and check whether or not they matched my parents. Now, if I click on my matches with my mum and dad, I can see the tally of the matches along with a list of all the matches I share with them. I now share 530 matches with my dad and 473 with my mum. This means that 1003 of my 1474 matches (68%) match my parents. The mismatch rate has been reduced to 32% which is a huge improvement. MyHeritage announced at the end of December that they had tested 1.08 million people so the number of matches is much more in line with what we might expect from such a large database. MyHeritage advised in November that the majority of their customers were in the US but that "sales in Europe are strong".
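The percentages above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes, as the tally above does, that no match is shared with both parents, so the two counts can simply be added.

```python
def parent_match_summary(total, with_dad, with_mum):
    """Return (matched, matched_pct, unmatched_pct) as whole-number percentages.

    Assumption: no match is shared with both parents, so the two
    tallies can be added without double counting.
    """
    matched = with_dad + with_mum
    matched_pct = round(100 * matched / total)
    return matched, matched_pct, 100 - matched_pct


# Figures taken from the post: 1474 matches, 530 shared with dad, 473 with mum.
matched, matched_pct, unmatched_pct = parent_match_summary(1474, 530, 473)
print(matched, matched_pct, unmatched_pct)
```

If any matches did turn out to be shared with both parents, the true mismatch rate would be somewhat higher than this simple sum suggests.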

There are some useful filters which can be used to sort your matches. Currently you can view matches that have family trees, shared surnames and Smart Matches.

I found that 1,255 of my 1474 matches (85%) have uploaded trees. However, no indication is given of the completeness of the trees, and I've noticed that some of the trees only contain a single person.

Two hundred and thirty-one of my matches have shared ancestral surnames. On a brief perusal, many of these are common surnames like Johnson and Williams, and the people I match with these surnames seem to be mostly in America and will likely have no connection with Berkshire or Devon where my ancestors with these surnames are to be found. I would suggest it's best to focus on shared matches with rarer surnames.

I like the way that MyHeritage displays country flags as this makes it much easier to identify people in the countries where you are most likely to find recent genetic cousins. Even better, it is possible to filter matches by country, as well as searching for matches by surname and full name. The menu can be found on your DNA Matches page.


Note that the country search box will only accept a single word so if you are searching for matches from Great Britain simply enter the word "Great". Similarly if you're trying to locate matches from New Zealand search for the word "New". I currently have 123 matches from Great Britain, 12 matches from Ireland, 62 matches from Australia, 16 matches from New Zealand, 41 matches from Canada and 867 matches from the USA. Many thanks to Louise Coakley for alerting me to this filter and for the tip about searching for matches from Great Britain and New Zealand.

MyHeritage have also added a chromosome browser so that you can see a visual display of your matches. You need to scroll right down to the bottom of the match page to locate the tool. Here's the chromosome browser view of my closest match from the UK.
If I click on the Advanced Options on the top right of the chromosome browser I can download the matching segment data. In this case my match shares three segments of DNA with me which are 13.07 cMs, 6.04 cMs and 6.14 cMs respectively in size.
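Once the segment sizes are to hand, it is easy to total them and pick out the longest one. The sketch below simply types in the three sizes quoted above. It does not read the downloaded file, because the layout of that file is not described here, and the function name is hypothetical.

```python
def summarise_segments(segments_cm):
    """Return (segment count, total cM, longest cM) for a list of segment sizes."""
    total_cm = round(sum(segments_cm), 2)
    return len(segments_cm), total_cm, max(segments_cm)


# Segment sizes in cM as quoted in the post for the closest UK match.
uk_match_segments = [13.07, 6.04, 6.14]
count, total_cm, longest_cm = summarise_segments(uk_match_segments)
print(count, total_cm, longest_cm)
```

Keeping the longest segment alongside the total is deliberate: a total made up of several small segments is generally less convincing than the same total carried on one long segment.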

I recognise the names of some people who match me at other companies. I've not done a proper check but my sense is that the people who match me as 3rd to 5th cousins at MyHeritage are assigned more distant relationships at Ancestry (4th to 6th cousins or 5th to 8th cousins). Given that I'm not able to make the genealogical connections with these people I suspect the AncestryDNA estimates are more appropriate.

There's also a facility to sort matches by shared DNA, largest segment, full name and most recent. Apart from my mum and dad, I currently have no matches closer than third to fifth cousin. My highest match is somebody in America who shares 31.9 cMs (0.4%) with me spread across four segments. However, the longest segment is only 12.8 cMs. This match only shares a total of 12.8 cMs (0.2%) with my dad. I can see that the remaining three segments this match shares with me that are not shared with my dad are all very small (6.49, 6.03 and 6.62 cMs respectively) so I would guess that these are false positive segments.
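The reasoning in that last example can be sketched in Python. The segment sizes are the ones quoted above. The True/False flag records whether the segment is also shared with my dad, and it assumes the 12.8 cM segment he shares is the same one as my longest segment. The function name is hypothetical, and this check says nothing about whether a segment might instead be shared with my mum.

```python
def split_by_parent(segments):
    """Split (cM, shared_with_parent) pairs into confirmed and unconfirmed totals.

    Returns (confirmed_cm, unconfirmed_cm, total_cm), each rounded to 2 places.
    """
    confirmed = sum(cm for cm, shared in segments if shared)
    unconfirmed = sum(cm for cm, shared in segments if not shared)
    total = sum(cm for cm, _ in segments)
    return round(confirmed, 2), round(unconfirmed, 2), round(total, 2)


# Sizes from the post; the flag is an assumption about which segment dad shares.
top_match = [(12.8, True), (6.49, False), (6.03, False), (6.62, False)]
confirmed_cm, unconfirmed_cm, total_cm = split_by_parent(top_match)
print(confirmed_cm, unconfirmed_cm, total_cm)
```

The four sizes add up to 31.94 cMs, which agrees with the reported total of 31.9 cMs, while only 12.8 cMs of that is backed up by a match with a parent.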

Partnership with FTDNA
MyHeritage use the Family Tree DNA labs in Houston, Texas, for their testing. If you've tested at MyHeritage you have the option of taking advantage of the free transfer to Family Tree DNA. The link can be found at the bottom of your DNA results page.
Further details of the transfer programme can be found here.

Similarly, if you've tested at FTDNA you can transfer your results free of charge to MyHeritage using the MyHeritage Upload link. Both companies have different databases and you will find people in both databases who have not tested elsewhere. You never know where you are going to get those all-important breakthrough matches so it's best to "fish in all the ponds".

Conclusion
MyHeritage have done an excellent job overhauling their matching algorithms. It is surprisingly difficult with current technology to identify distant matches, especially when results are being combined across different platforms. I think that MyHeritage are going about the matching in the right way and they are being very responsive to the feedback provided by genetic genealogists. I am sure we will see further improvements in the months and years to come. I look forward to receiving many more matches and to confirming my first relationship at MyHeritage DNA.

4 comments:

Unknown said...

Thank you, Debbie, for your detailed review!

Regarding filtering by country, click on the 'ALL >' drop-down arrow at the left of the search box and select 'Country', then enter your country of interest. I've been using this filter successfully over the last year, and it is a great way to find matches from a particular country. I've found some good results by limiting matches to Australia, New Zealand, Ireland or Great Britain. One note: To find your British matches, select the Country filter and just type 'Great' and press enter. For some reason it doesn't work if you enter two words, i.e. 'Great Britain', so just enter the first word. Same for New Zealand, just enter 'New'. Have fun! :)

Debbie Kennett said...

Thank you Louise, and especially for the tip about searching for GB and NZ matches. I hadn't realised that filter was there. I previously had such a high mismatch rate with my parents that I didn't think it was worth spending any time on my matches. I've now updated my blog post to include this information. If I missed the filter I'm sure others will have done so too. I've also included the stats for the number of matches from each country. Although my match list has huge numbers of Americans I'm pleased to see that I'm now getting quite a few matches from Britain, Ireland, Australia and NZ.

Philip Gammon said...

I agree that the new matching algorithm is a huge improvement. Still a few problems though. I’m managing four kits and have noticed that a large proportion of all of their DNA matches include a 20 cM matching segment adjacent to the centromere on chromosome 15. Despite the length of the segments they invariably contain 500 or fewer SNPs. I’m certain that they are just IBC as the segments don’t appear when the same pair of people are compared at FTDNA or on GEDmatch. The extra 20 cM on these matches distorts the match lists, promoting some DNA matches much higher than they should be and making their estimated relationship appear much closer. I have observed quite a few matches sharing two segments: a small one around 6 cM plus the 20 cM adjacent to the centromere on chromosome 15, giving them a total of 26 matching cM. In reality they only share the 6 cM (although that could be IBC too!) and would otherwise not appear on the match lists at all. I’m wondering if anyone else is observing these pseudo-segments on chromosome 15?

Debbie Kennett said...

Philip,

Thanks for sharing your experiences. Given that MyHeritage is now using phasing I think these segments will be identical by state and not IBC. Only a subset of IBS segments (ones where the alleles have been correctly assigned to the maternal and paternal chromosomes) will be IBD. That's particularly the case when the tests are only sampling a random selection of SNPs from across the genome rather than sequencing our genomes in their entirety. I find the term IBC confusing but it's normally used to describe false matches or pseudosegments that occur as a result of a lack of phasing.

However, if the SNP count is very small then there's a possibility of phase switch errors which could result in false positive matches. I suspect too that some of these small 6 cM segments might actually be even smaller segments that have accidentally been inferred to be a single segment. This again could be down to a low SNP count in this region or because the SNPs that would break up the segment aren't included on the chip.

If lots of people are matching on a single segment then this is usually an indication that it's not genealogically relevant. It's either a very old segment or alternatively, as you've suggested, it could be an IBS segment that probably would disappear with whole genome sequencing. It may be that MyHeritage will have to introduce a frequency filter in the same way that Ancestry have done with their Timber algorithm. These problem areas will be unique to each individual and you'll probably find that other people have these pile-ups in different places.