Friday, February 26, 2016

Comparing Match cMs At Different Sites

After a discussion in the ISOGG Facebook group, I decided to compare data from matches who have results in multiple places, including AncestryDNA, Family Tree DNA, 23andMe, and GEDmatch. I copied all of my mother's match names from these sites and sorted them alphabetically. Comparing with AncestryDNA testers this way turned out to be impossible, because most of them do not use their first and last names, so I could not use this method to find testers who were also in the other databases. Picking out the few who do use their own names would be too time consuming. Instead, I did a more scaled-down comparison using known cousins who have results in multiple places.
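The name-list approach above amounts to finding names that appear in more than one site's match list. Here is a minimal Python sketch of that idea, assuming each site's matches have been copied into a plain list of display names. The example names and usernames are hypothetical placeholders, and the normalization (lowercasing, collapsing spaces) is my own assumption. It also shows why AncestryDNA entries made up of initials or usernames never line up with anyone.

```python
"""Hypothetical sketch: find testers whose names appear on more than one site's match list.

Assumes each site's matches were copied into a list of display names.
All names below are placeholders, not real matches.
"""


def normalize(name: str) -> str:
    # Case- and whitespace-insensitive key, so "Jane  Doe" and "jane doe" collide.
    return " ".join(name.lower().split())


def names_on_multiple_sites(match_lists: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized name to the sites it appears on, keeping only names seen on 2+ sites."""
    seen: dict[str, set[str]] = {}
    for site, names in match_lists.items():
        for name in names:
            seen.setdefault(normalize(name), set()).add(site)
    # Sorted alphabetically, like the hand-sorted lists.
    return {name: sites for name, sites in sorted(seen.items()) if len(sites) > 1}


if __name__ == "__main__":
    matches = {
        "FTDNA": ["Jane Doe", "John Smith"],
        "23andMe": ["jane doe", "Mary Jones"],
        "GEDmatch": ["John  Smith", "Mary Jones"],
        # Initials and usernames never line up with full names on other sites.
        "AncestryDNA": ["J.D.", "genefan42"],
    }
    for name, sites in names_on_multiple_sites(matches).items():
        print(name, sorted(sites))
```

Exact string matching like this misses nicknames, maiden names, and typos, which is part of why a hand comparison of known cousins ended up being more practical.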

My results show that segment cMs are generally close when comparing Family Tree DNA, 23andMe, and GEDmatch. I did find one case where a segment's cM values were 10 cM apart between Family Tree DNA and GEDmatch. SNP totals at GEDmatch are often lower, so I now know to lower the minimum SNP threshold when comparing at GEDmatch. I'll use 500 SNPs from now on.
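To show why lowering the SNP threshold matters, here is a minimal Python sketch of a segment filter that requires both a minimum cM length and a minimum SNP count. The two segments are made-up placeholders, and the 700-SNP comparison value is an arbitrary higher threshold chosen for illustration. This is not GEDmatch's actual code.

```python
"""Hypothetical sketch: how a minimum-SNP threshold decides which segments are counted.

Segment values and the 700-SNP comparison threshold are illustrative placeholders.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chromosome: int
    cm: float   # genetic length in centimorgans
    snps: int   # number of SNPs in the matching block


def kept_segments(segments: list[Segment], min_cm: float, min_snps: int) -> list[Segment]:
    """Keep only segments that meet both the cM minimum and the SNP minimum."""
    return [s for s in segments if s.cm >= min_cm and s.snps >= min_snps]


if __name__ == "__main__":
    segments = [Segment(3, 12.4, 1800), Segment(9, 8.1, 620)]
    for threshold in (700, 500):
        kept = kept_segments(segments, min_cm=7.0, min_snps=threshold)
        print(threshold, [s.chromosome for s in kept])
```

In this example, the 8.1 cM segment on chromosome 9 is dropped at 700 SNPs and only counted at 500. That is the kind of segment that could make totals differ between sites.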


Since AncestryDNA doesn't share segment information, I couldn't compare individual segments. Instead I compared total cMs. I excluded segments under 7 cM from the Family Tree DNA totals. GEDmatch appears to always have the highest total cMs, and Ancestry always has the lowest. The average difference between AncestryDNA and the other sites is 17 cM. AncestryDNA phases the raw data and filters the matching results, which is the reason for the differences in total cMs.
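The arithmetic behind this comparison is simple. First, sum each match's segments, skipping those under 7 cM. Then average the per-match gap between Ancestry's total and another site's total. Here is a minimal Python sketch. All numbers in it are placeholders, not the values from my charts. It pairs matches by position, which assumes both lists cover the same cousins in the same order.

```python
"""Hypothetical sketch: total shared cM per match and the average gap to AncestryDNA.

All numbers are placeholders for illustration, not data from this comparison.
"""
from statistics import mean


def total_cm(segment_cms: list[float], min_cm: float = 7.0) -> float:
    """Sum segment lengths, skipping segments shorter than min_cm."""
    return sum(cm for cm in segment_cms if cm >= min_cm)


def average_gap(ancestry_totals: list[float], other_totals: list[float]) -> float:
    """Mean of (other site's total minus Ancestry's total), pairing matches by position."""
    if len(ancestry_totals) != len(other_totals):
        raise ValueError("each match needs a total from both sites")
    return mean(other - anc for anc, other in zip(ancestry_totals, other_totals))


if __name__ == "__main__":
    # Placeholder segment lengths (cM) for one match at Family Tree DNA.
    ftdna_segments = [15.2, 9.8, 6.5, 3.1]
    print("FTDNA total, segments >= 7 cM:", total_cm(ftdna_segments))
    # Placeholder totals for the same two matches at Ancestry and at another site.
    print("average gap:", average_gap([20, 35], [38, 50]))
```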

Most of these matches are predicted in about the same cousin range at Ancestry and at the other sites. The problem shows up in my first chart: 23andMe, Family Tree DNA, and GEDmatch all show the person on line 1 of chart one as a match. This person did test at Ancestry but isn't a match with my mother there, even though she is a confirmed 4th cousin. I hadn't noticed this until putting the comparison together. I'm noticing more matches at the other sites who don't match at Ancestry. I have at least 5 confirmed cousins who used to match at Ancestry but don't now, likely because of Timber. I'm not seeing this when looking at matches elsewhere. I'm sure some people don't match at Family Tree DNA but do match elsewhere because of the 20 cM requirement. I have not encountered that myself, because segments as small as 1 cM are included in the total.
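Spotting confirmed cousins who match elsewhere but not at Ancestry is a set difference. Here is a minimal Python sketch. It assumes each site's match list has already been reduced to a set of cousin labels that identify the same person across sites. The labels and the helper name `missing_at` are hypothetical.

```python
"""Hypothetical sketch: confirmed cousins who match at other sites but not at a given site.

Cousin labels are placeholders; linking the same person across sites is assumed done already.
"""


def missing_at(site: str, matches_by_site: dict[str, set[str]], confirmed: set[str]) -> set[str]:
    """Confirmed cousins found at any other site but absent from `site`'s match list."""
    elsewhere = set().union(*(m for s, m in matches_by_site.items() if s != site))
    return (confirmed & elsewhere) - matches_by_site.get(site, set())


if __name__ == "__main__":
    matches = {
        "AncestryDNA": {"cousin B"},
        "FTDNA": {"cousin A", "cousin B"},
        "23andMe": {"cousin A"},
        "GEDmatch": {"cousin A", "cousin C"},
    }
    confirmed = {"cousin A", "cousin B", "cousin C"}
    print(sorted(missing_at("AncestryDNA", matches, confirmed)))
```

Re-running this after each match-list update would show cousins who drop off at one site while still matching at the others.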

Someone asked what difference it makes if the results differ between sites. If each company has slightly different ranges but comes up with the same matches, there isn't any problem. If confirmed cousin matches are being lost, then I believe the companies should rethink their testing and matching procedures. Third cousins and more distant cousins are the ones affected by unreliable matching techniques. A match who shares only one segment is more likely to disappear as a match with additional processing.


Putting this together, I found more difficulties working with AncestryDNA than with the other sites:
  1. Ancestry doesn't allow you to download matches or their cM numbers (I used the Chrome extension, which doesn't include cMs). 23andMe and Family Tree DNA let you download spreadsheets.
  2. Ancestry should encourage testers to provide full names if they want to participate in sharing with other testers. I understand why some may not want to use their real names, but those testers should use one consistent pseudonym everywhere if they want to collaborate.
  3. It would be nice if we could filter matches by total cMs.  
  4. It would be nice if we could search by username.
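With a downloadable spreadsheet, wishes 3 and 4 are easy to meet on your own computer. Here is a minimal Python sketch that filters matches by total cM and searches by username. It assumes a CSV export with columns named `username` and `total_cm`. Those column names and the sample rows are hypothetical placeholders, and real exports from any site or third-party tool will use their own headers.

```python
"""Hypothetical sketch: filter a downloaded match spreadsheet by total cM and search by username.

Assumes a CSV with "username" and "total_cm" columns; rows below are placeholders.
"""
import csv
import io


def load_matches(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per match row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_by_cm(rows: list[dict[str, str]], min_cm: float, max_cm: float = float("inf")) -> list[dict[str, str]]:
    """Rows whose total_cm lies in [min_cm, max_cm], largest first."""
    hits = [r for r in rows if min_cm <= float(r["total_cm"]) <= max_cm]
    return sorted(hits, key=lambda r: float(r["total_cm"]), reverse=True)


def search_username(rows: list[dict[str, str]], text: str) -> list[dict[str, str]]:
    """Case-insensitive substring search on the username column."""
    needle = text.lower()
    return [r for r in rows if needle in r["username"].lower()]


SAMPLE = """username,total_cm
genefan42,48.5
jdoe_tree,22.0
smithfamily,112.3
"""

if __name__ == "__main__":
    rows = load_matches(SAMPLE)
    print([r["username"] for r in filter_by_cm(rows, 20, 60)])
    print([r["username"] for r in search_username(rows, "DOE")])
```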

4 comments:

Jason Lee said...

What we need most at Ancestry is start and stop points for matching segments.

Without specific information, I wonder how we will pass down our genetic genealogy research to future generations..."AncestryDNA said so"?

bptakoma said...

What are your thoughts on including a GEDmatch code within name fields at Ancestry? Is that too great a privacy risk? What we need is a universal identifier to use across tools, for those who are interested in sharing information.

Sue Griffith said...

Definitely agree that not having segment details is huge. Regarding your list of 4 difficulties:
1. and 3. If you use DNAGedcom.com's Client tool ($10 per month, and I think you can sign up for this intermittently), it also downloads the cMs into a spreadsheet, so you can sort on them.
2. Unfortunately you don't have the option to use full names unless you are the test administrator – only initials will show up. I think this is the main reason why SO many in the match list only have initials, with the kit administrator's name/username in parentheses.
4. The Chrome AncestryDNA Helper extension will sort on username, but you may have to have run the scan for this to catch everyone. It also finds the usernames of individuals shown only by initials. However, if the administrator has opted to show their full name for their kit, I'm not sure whether their username will also show up.
It's very disappointing that we are forced to use so many workarounds with AncestryDNA.

Sue Griffith said...

Sorry, a typo:
4. The Chrome AncestryDNA Helper extension will SEARCH FOR username ....