Tuesday, April 13, 2021

How Accurate is DNA Phasing?

I've been going over the ethnicity result phasing at 23andMe again. My ethnicity results were phased with my mother's because we both tested at 23andMe. Results are also phased even if a parent hasn't tested. Phasing without a parent uses haplotypes to attempt to separate each chromosome into two parts, one representing each parent. Ancestry also phases results, but for the matching process rather than for ethnicity. They use the same haplotype approach but don't use parents at all.

Haplotypes are picked out using computer programs that look for strings of matching alleles in your raw data results. These strings are learned through looking at genomes of those who have tested previously. This is 23andMe's explanation, "The technical term for determining which alleles reside on the same chromosome together is phasing. DNA data like our raw data is called unphased."

Looking at our family's 23andMe phased ethnicity results, it becomes apparent right away that results phased with at least one parent are much more accurate. Half of the genome can be correctly matched with the tested parent, and anything left over can be attributed to the other parent.
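The "match half to one parent, attribute the rest to the other" idea can be sketched in a few lines of Python. This is only a toy illustration, not 23andMe's actual pipeline: the function name `phase_with_parent` and the sample genotypes are made up for the example, and genotypes are assumed to be simple two-letter strings like those in a raw data file. At each position, if only one of the child's two alleles could have come from the tested parent, that allele goes on the parent's side and the other goes on the untested parent's side.

```python
def phase_with_parent(child, parent):
    """Toy parent-based phasing (hypothetical helper, not a vendor API).

    child, parent: lists of unphased two-letter genotypes, e.g. "AG".
    Returns one entry per position: a tuple
    (allele_from_tested_parent, allele_from_other_parent),
    or None where the position cannot be resolved from the parent alone.
    """
    phased = []
    for c, p in zip(child, parent):
        a, b = c[0], c[1]
        if a == b:
            # Both alleles are the same, so the order does not matter.
            phased.append((a, b))
        elif a in p and b not in p:
            # Only allele a could have come from the tested parent.
            phased.append((a, b))
        elif b in p and a not in p:
            # Only allele b could have come from the tested parent.
            phased.append((b, a))
        else:
            # Parent carries both alleles (or neither): ambiguous here.
            phased.append(None)
    return phased


# Made-up genotypes at four positions, for illustration only.
child = ["AG", "CC", "AG", "TC"]
mother = ["AA", "CT", "AG", "CC"]
for site, result in enumerate(phase_with_parent(child, mother), start=1):
    print(site, result)
```

Positions that come back as `None` are the ones where the parent alone cannot settle the question, so some statistical (haplotype-based) step is still needed to fill them in.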

The breakdown of my ethnicity estimates by parent, below, is correct: it reflects my documentation and family knowledge. My father's ancestry is about 100% European, whereas my mother had a European father and a mixed-heritage Nicaraguan mother. If a parent tests with 23andMe and you connect with them, you can get a chart like the one below.


My ethnicity results vastly improved after they were phased with my mother. I think all of the companies should use phasing. 

A 23andMe explanation of how testing at least one parent can change your results: "Connecting with a parent may also increase the resolution of your assignments. That translates into better Ancestry Composition results, in the sense that you might see more assignment to the fine-resolution ancestries: for example, more Scandinavian and less Broadly Northern European."

Here is an example of how my own results improved. The Chromosome Painting map at 23andMe showed some Native American on the X chromosome I received from my father, which is orange in the illustration below. This was before my mother tested and connected with me. 


Below is the 23andMe X chromosome after my mother tested. The color scheme for Native American has changed and is now yellow. The X chromosome I received from my father is now completely British and Irish, which would be correct based on what I know from his X maternal tree. The Native American disappeared. The X chromosome I received from my mother now appears to be nearly completely Native American, which would make sense because more Spanish men than women originally settled Nicaragua, my maternal grandmother's place of origin. The chromosomes I received from my mother all moved to the top line of each chromosome, which always happens if you phase with a parent.



Correct Phasing When Phased with Parent

Looking at the ethnicity results in the Chromosome Painting maps, it becomes very apparent that they are phased correctly 99.9% of the time when a parent tests.

Here we see my Eastern European results, which are correctly attributed solely to the chromosome phased as my father's (the bottom line on each chromosome).



Below you can see my shared segments, which mostly match a 1st cousin once removed (some of the segments are also from 2nd and 3rd cousins, merged together). We share Austro-Hungarian ancestry on our Kappel/Kapple line. My 1st cousin once removed's maternal grandparents and my great-grandparents were both from the same village on the Austro-Hungarian border. They were mainly ethnically Germanic and Eastern European. If you compare the segments with the ethnicity chart from 23andMe, above, you can see that some of the segments match up very well (the chart below is from Genome Mate Pro, which assigned my father's segments to the top chromosome instead of the bottom as 23andMe does).


Here is another view of the 23andMe ethnicity chart showing my French and German segments. They are actually all German segments. Again they match up well between the Kappel matches and the phased ethnicity results. If you look at chromosome 13 below there is an excellent match with the chart above.


When I first DNA tested about 10 years ago, my results showed no German or Eastern European admixture. None of my results represented my grandfather Rudolph Kapple's Austro-Hungarian ancestry; it's very nice to see him represented now.

Phasing Errors When a Parent Hasn't Tested

We begin to see problems with results that are only statistically phased, with no parent tested. Looking at the results of my mother and my cousins, none of whom has had a parent test, you can see phasing errors.

Below is the 23andMe chart showing my mother's phased ethnicity Chromosome Painting results. You can see where phasing errors occurred. My mother's father was of British Isles and German ancestry with no Native American or Sub Saharan African ancestry. This has been confirmed with both documentation and DNA testing. A paternal first cousin of my mother has DNA tested and has zero Native American or Sub Saharan African. Several of my mother's paternal 1st cousins once removed have also DNA tested and also show none of those admixtures. There should be no Native American or Sub Saharan segments on my mother's father's chromosomes. Mixing of Native American and British Isles on the same chromosome, as on chromosome 4 in the chart below, is definitely wrong. The Native American and Sub Saharan are all from my mother's Nicaraguan mother. Nicaraguans are descendants of the Spanish, Native Americans, and African slaves who settled the country.

Below I circled the chromosomes where phasing errors apparently resulted in Native American and African segments being placed on my mother's father's chromosomes. My mother's father's chromosomes can be identified by long stretches of Northern European or British Isles segments. Often my mother's mother's chromosomes are the ones on top, but a few times they have flipped to the bottom (this happens because, without a parent testing, there is no way to tell for certain which side a chromosome represents). Only 5 chromosomes appear to have phasing errors.


It appears that some Native American and Sub Saharan African on these chromosomes should either move up to the top or down to the bottom of the chromosome. 
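The difference between a harmless flip and a real phasing error can be shown with a small sketch. It assumes we somehow knew, for each stretch of the top line, whether it truly came from the mother ("M") or the father ("P"); the strings and chromosome labels below are invented for illustration and are not taken from the charts. A top line that is all "M" or all "P" is fine, since without a parent testing the top and bottom labels are arbitrary. A top line that changes partway along is the kind of mix-up circled in the chart.

```python
def count_switches(top_origin):
    """Count the places where the top line changes from one parent to the other.

    top_origin: string of 'M'/'P' labels along one chromosome (hypothetical
    true origins, which are not normally known without a parent tested).
    """
    return sum(1 for prev, cur in zip(top_origin, top_origin[1:]) if prev != cur)


# Invented examples: three chromosomes with the true origin of the top line.
chromosomes = {
    "3": "MMMMMMMM",  # top line is entirely maternal: no error
    "4": "MMMPPMMM",  # a paternal stretch landed on the maternal line
    "7": "PPPPPPPP",  # whole chromosome flipped: not a phasing error
}

flagged = [name for name, origin in chromosomes.items()
           if count_switches(origin) > 0]
print(flagged)
```

By the count given in this post, 5 of 23 chromosomes would be flagged this way, which is about 22% of the chromosomes, even though the misplaced stretches themselves cover much less of the genome.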

Below is chromosome 4. The top chromosome definitely looks like it would have been from my Nicaraguan grandmother with a couple of stretches of Spanish DNA. So the yellowish Native American should be moved up to the top chromosome, and a chunk of blue broadly European and Spanish should move down. 

Even at the 90% confidence level, the bottom half of chromosome 4 has a large British Isles segment which definitely represents my mother's father, as you see represented in the blue segment below. That chromosome should be all blue with no Native American.


I have inherited a segment of Native American DNA in the same region of chromosome 4. This confirms that place on the chromosome is Native American and isn't a false positive result. 


I don't know what is going on with chromosome 15. A Native American segment overlaps an African segment. The top segment has a long stretch of blue, which is said to be Spanish. The bottom segment has a long stretch of a slightly different shade of blue, said to be British Isles. I would guess the tip of 15 is mixed African and Native American, and the African segment should fit somewhere on the top and is definitely related to my mother's mother. I'm not sure what should be on the bottom chromosome once the other segment moves up.

My own Native American Chromosome Painting chart has the chromosomes accurately phased because my results are parentally phased rather than statistically phased. Only two tiny segments end up on my paternal side chromosomes. This could represent actual, very distant Native American ancestry on my father's side. One of his ancestors was an Indian trader in Pennsylvania who did have a Native wife, although I'm not sure if we are descended from her. It's also possible that these tiny Native American segments are false positives. In any case the phasing has been at least 98% accurate. My Sub Saharan segments are 100% correct and are all on my mother's chromosomes.



Analyzing my results along with my mother's and cousins', I can see that phasing with a parent improves DNA results substantially. As more people test, new haplotypes will be found, which will improve phasing without a parent testing. Until then, raw data phased for the purpose of matching cousins or ethnicity can sometimes be in error and throw off our results. I am surprised, however, at how often my mother's chromosomes were correctly phased. Only 5 out of the 23 seem to have phasing issues.

Both my mother and I have DNA that has been placed in the broad categories. It will be interesting to see how much better the phasing becomes and how much of the DNA in the broad categories is eventually correctly identified as more people test. All of the DNA currently assigned to French and German is actually German according to our DNA matches' origins. It looks like most of my French DNA is somewhere in the broad European categories. It will be interesting to see if some of the DNA now categorized as "European" is ever named as French.

AncestryDNA claims their phasing process for matching has only a 1% error rate. I'm guessing on an individual level with a mixed genome like mine the error rate would be higher than 1%. Parental and statistical phasing for ethnicity really does improve the estimates. I wish all of the DNA companies would do that. The Chromosome Painting interactive map is an outstanding feature at 23andMe. If you are ethnically mixed, like I am, naming the segments according to ethnicity can help when you compare segments of cousins using chromosome browsers. Accurate ethnicity estimates can help identify the places of origin you share with your DNA matches. GEDmatch has a feature that will allow you to search for matches on each chromosome so you can compare matches with your Chromosome Paint map at 23andMe.
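Comparing a match's segment against the painted ethnicity segments is, at heart, an overlap check, and a rough sketch of it looks like this. The coordinates and the function names (`overlap`, `label_match`) are assumptions made up for the example; they are not exported by 23andMe, GEDmatch, or Genome Mate Pro, and real positions would come from each site's own segment data.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two ranges, or 0 if they don't overlap."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def label_match(match, painted):
    """Total up which painted ancestries a shared segment falls on.

    match: (chromosome, start, end) for a segment shared with a cousin.
    painted: list of (chromosome, start, end, ancestry) from one phased line.
    Positions here are arbitrary units chosen for illustration.
    """
    chrom, start, end = match
    totals = {}
    for p_chrom, p_start, p_end, ancestry in painted:
        if p_chrom != chrom:
            continue
        shared = overlap(start, end, p_start, p_end)
        if shared > 0:
            totals[ancestry] = totals.get(ancestry, 0) + shared
    return totals


# Invented segments, for illustration only.
painted = [
    (13, 0, 40, "French & German"),
    (13, 40, 100, "British & Irish"),
    (4, 10, 50, "Native American"),
]
print(label_match((13, 20, 60), painted))
```

A match that falls almost entirely on one painted ancestry is a hint about which family line the cousin belongs to; a match split across several is less informative.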

I believe phasing for ethnicity is what makes 23andMe's estimates more accurate than the other companies, especially if a parent also tests. 

2 comments:

Janice M. Sellers said...

So is phasing used only for refining ethnicity results? Does it have other uses?

Annette Kapple said...

Yes, Janice, it has other uses: you can compare the results to your matches' segments in a chromosome browser. Anyone who matches in the same place would be from the same ethnic group and the same family line. If your parents are from the same ethnic group, this tool would be less useful, however.