Sunday, September 28, 2014

I Think I Understand the AncestryDNA Methodology Now?? i4GG

Some of my own SNPs
One letter comes from Mom and one from Dad in a random order

I watched the i4GG video "AncestryDNA matching: large-scale findings and technology breakthroughs". I've been curious and confused about the methodology used by AncestryDNA. From the start their autosomal testing has been mysterious and secretive, which has given rise to suspicions. They wouldn't release raw data to customers in the beginning, and many people felt they were hiding the fact that most matches were spurious. The fact that they still don't have anything like a chromosome browser leaves us wondering about the validity of the results. On the other hand, the fact that they phase their results should lead to better, more confident matches than the other companies. The phasing process hadn't been completely clear to me until I listened to Dr. Julie Granka's presentation, where she explained it in greater detail. I believe I understand it now.

This is my understanding of the phasing process (I never excelled in science or math in school). If anyone has a better understanding please let me know:

Dr. Julie Granka emphasized the large size of the AncestryDNA data collection, generated from over 500,000 customers, which is leading to more accurate results. The phasing process attempts to separate your results into two groups, one for each parent. At each SNP position you get one marker (A, C, G, or T) from your mother and one from your father. If, for instance, you are AG at a position, your mother is AG at the same position, and your father is AT, we can infer that the A is from your father (he has no G to give) and the G is from your mother. So your genotype, the marker combinations, comes from both parents. The phasing process is designed to separate your single genotype into the haplotypes you got from your mother and father.

The phasing process relies on comparing your genotype with those of people with known haplotypes (haplotypes are just strings of markers (SNPs), the A, C, G, and Ts that are the building blocks of DNA, shared by groups of people). Your haplotypes are then inferred from the results of these comparisons. This process is complicated by the fact that there are positions where they don't know which of the two markers came from which parent, so the markers cannot simply be read in a continuous line. There is some sort of formula for reading these scrambled marker pairs and separating them into haplotypes for Mom and Dad. The process can misinterpret a block of DNA as a haplotype when it is actually a mix of markers inherited from both parents that happens to look like a known haplotype. It's also possible that one of your haplotypes has not been seen before. When a mismatch occurs it throws the rest of the phasing off, so it's important to limit mismatching.

Their old phasing process took 7 to 10 hours for 1000 tests and resulted in 3 errors per 100 heterozygous sites; the new process takes 5 minutes and results in only 1 error. So the process continues to be refined. Still, around half of our thousands of matches are IBS (identical by state), so it's not perfect.
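
To check my own reasoning on the single-position example, here is a minimal sketch in Python. It only covers the easy case where both parents have been tested; it is not AncestryDNA's method, which infers haplotypes by comparison with known haplotypes. The function name phase_site is made up for illustration. In the example, the father (AT) has no G to give, so the G has to come from the mother and the A from the father.

```python
def phase_site(child, mother, father):
    """Work out which of the child's two markers came from which parent.

    Each argument is a two-letter genotype such as "AG".
    Returns (from_mother, from_father), or None when the position
    cannot be resolved from the parents alone.
    """
    first, second = child
    options = set()
    # Option 1: first marker from the mother, second from the father.
    if first in mother and second in father:
        options.add((first, second))
    # Option 2: second marker from the mother, first from the father.
    if second in mother and first in father:
        options.add((second, first))
    # Exactly one consistent assignment means the position is phased.
    if len(options) == 1:
        return options.pop()
    return None  # ambiguous (or inconsistent with the parents)


# The example from the text: child AG, mother AG, father AT.
print(phase_site("AG", "AG", "AT"))
# When everyone is AG, either marker could have come from either parent.
print(phase_site("AG", "AG", "AG"))
```

The second call shows the complication mentioned above: at positions where child and both parents carry the same two markers, the parents alone can't tell you which marker came from whom.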

The haplotypes are very important in the AncestryDNA matching process. For a match to be rated high confidence, the two of you have to share a certain amount of DNA and also have the same haplotype on that particular segment.

Sometimes these haplotypes proliferated because they were advantageous. Dr. Granka used the example of lactose intolerance. Ancient populations were all lactose intolerant. When animals were domesticated and people began using their milk, the genetic mutation that allowed people to drink milk was an advantage. It let that person and their descendants get more nourishment and reproduce at a higher rate. So we all share some of these blocks because they provided a genetic advantage.

The fact that many people share the same DNA blocks presented AncestryDNA with a problem. Do all of these people share a common ancestor in the genealogical time frame? They determined that blocks shared by huge numbers of people were IBS and should not be used for matching. This led to a smaller number of matches, although I still have 11,000.

Some other very interesting points:
  1. In a group of 200 people there is a 97% chance of finding a pair of 4th cousins
  2. If you can't find evidence of an ancestor in your DNA (and they are several generations removed from you) it could be you just didn't inherit any perceptible DNA from them.
  3. "Absence of evidence isn't evidence of absence."
  4. We have 120,000 7th cousins, which increases your odds of finding a match at that distance
  5. There are 30 million 4th cousin matches at AncestryDNA out of around 500,000 in the database
  6. The average person has 5 3rd cousin matches at AncestryDNA (I don't have any. My Mom has 7)
  7. The average person has 147 4th cousin matches at AncestryDNA
  8. At 20 generations we share DNA with around 1200 of our 1 million ancestors
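
A quick bit of arithmetic behind a couple of the points above, as a rough Python sketch. The function name ancestors_at is my own. Note that 2 to the power of the generation counts places in the pedigree chart, not necessarily different people, since the same ancestor can fill more than one place.

```python
from math import comb


def ancestors_at(generation):
    """Number of ancestor places in the pedigree at a given generation."""
    return 2 ** generation


places = ancestors_at(20)   # point 8: the "1 million ancestors"
share = 1200 / places       # portion of them we share DNA with
pairs = comb(200, 2)        # point 1: possible pairs in a group of 200

print(f"Ancestor places at 20 generations: {places:,}")
print(f"Portion we share DNA with: {share:.2%}")
print(f"Pairs of people in a group of 200: {pairs:,}")
```

The number of pairs helps explain point 1: a group of 200 people contains far more than 200 possible pairings, so there are many chances for a cousin pair to turn up.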

Thursday, September 25, 2014

"Understanding Autosomal Biogeographical Ancestry Results" I4GG

I could be doing a number of things as I wait for a nurse visit (for Mom). I could have cleaned the house or microwaved my iPhone. Instead I decided to listen to Doug McDonald's I4GG conference presentation titled "Understanding Autosomal Biogeographical Ancestry Results".

This was an excellent presentation. I followed his suggestion and compared the chromosome painting charts from GEDmatch for my Mom's kit, my Aunt's kit, and my own. My Aunt represents my deceased father's line. Comparing the three of us side by side gave me a better understanding of our results. Family Tree DNA showed substantial Eastern European roots for my Aunt. Looking at one of the charts I can clearly see she has more Eastern European than I do, which is probably why I didn't have any Eastern European at Family Tree DNA. What the tests confirmed is that we are mainly European. We can infer just a little more beyond that point. The companies still have a ways to go in order to provide us with more than vague predictions.
Notes from Presentation:
  1. 50,000 to 300,000 markers are tested (should be to 700,000?)
  2. They're all right (tests) in the big picture
  3. Use 3rd party tools such as GEDmatch for analysis
  4. You may not inherit an exact 1/4 DNA from your Grandparents due to recombination
  5. At 6 to 10 generations back most of our ancestors lived in areas where their ancestors and relatives lived
  6. There is a 20% chance that you would, for instance, inherit DNA from a Native American ancestor who lived 12 generations ago.
  7. At 6 generations the probability is 100% that you have inherited some DNA from that Native American ancestor, but it can be hard to identify
  8. The tests go back at least 2000 years in time
  9. Too much overlapping of populations in Europe makes identification difficult
  10. Populations less than 500 years old are too mixed to provide useful data
  11. We should consider known probabilities; i.e., we should pick and choose which test results we accept based on what we already know about our ancestors
  12. Certain groups are easier to differentiate like Ashkenazi
  13. All companies use Monte Carlo method for best population fit
  14. Important to use demixed populations
  15. Use chromosome painting at GEDmatch to better understand and analyze your results
  16. 23andme and AncestryDNA are the best when it comes to ethnic breakdown
  17. Should be 50/50 cut, in the painting, for most accurate results
  18. Only trust conservative at 23andme
  19. Look at where your strong matches come from
  20. Not enough data for Native Americans
  21. Don't trust the 3 big companies for African data; they use the wrong chip
  22. Don't trust African Ancestors unless you are around 90% African
  23. The Affymetrix chip is the only reliable chip for Africa
  24. AncestryDNA provided him with the best ethnic fit. He feels they have the best methodology
  25. His results skew French but his ancestors were from Scotland, which points to ancient continental ancestry
  26. Chromosome painting at 23andme very good
  27. "Fully sequence lots more people in lots more groups." "400 people each in 250 groups." Look for rare mutations shared by less than 2% overall, but common to a group
  28. Results should be analyzed by humans
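
To put notes 4, 6, and 7 in perspective, here is a small Python sketch of the average share of DNA you would expect from one ancestor a given number of generations back, simply halving at each generation. This is my own back-of-the-envelope model, not something from the presentation. It gives only the average; as note 4 says, recombination means the real amount varies, and the 20% and 100% figures in the notes are chances of inheriting any DNA at all, which this sketch does not calculate.

```python
def expected_share(generations_back):
    """Average fraction of autosomal DNA from one ancestor, halving each generation."""
    return 0.5 ** generations_back


for generations in (2, 6, 12):
    print(f"{generations} generations back: {expected_share(generations):.4%}")
```

Two generations back is a grandparent, which gives the 1/4 in note 4. By 12 generations the average share is so small that it is easy to see why such an ancestor may not show up in a test at all.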

Wednesday, September 24, 2014

DNA News: "Finding Your Roots" Premiere And X Chromosome Match Browning

I enjoyed the first episode of the new season of "Finding Your Roots" on PBS. Stephen King's experience was very much like my own. Some of his ancestors migrated to Indiana from Tennessee, just like my ancestors. Like him, I had no idea I had southern roots until some cousins shared their research with me. That information was never passed down in our family. I also found Gloria Reuben's story very interesting. Her father's ancestry was Jewish. Apparently many Jews fled Spain during the Inquisition and migrated to Jamaica, where Gloria's ancestors also settled. I was very interested in her admixture results, curious to see how much Ashkenazi would show up. I replayed her segment and could see what I believe is 12% Ashkenazi. I suppose European Jews have substantial European admixture, and not as much Middle Eastern as I expected. I also expected to see a higher percentage of Ashkenazi in her results, just as I was surprised that more Ashkenazi didn't show up in my own results.
Browning family Tree mostly circumstantial evidence

Edna Kapple 2nd to 4th cousin
Browning match
I was so surprised to find a new Browning match at Family Tree DNA. This match shared a large segment on the X chromosome with Mom, but nothing on the X with me. My Mom shared an 18.9 cM segment with this Browning match, and a 5 cM segment. The 18.9 cM segment is the largest X chromosome share I've found in our matches. My Mom shared substantially more DNA with this match than I did. My relationship prediction with this match was 5th cousin to remote. The prediction for my Mom was 2nd to 4th cousin. This match is a 3rd cousin 1x removed to my Mom. This same person also matches us at AncestryDNA, and is predicted to be a 4th cousin there. We also have another Browning match at Family Tree DNA. This person is a more distant cousin, and didn't match our closer cousin.

I noticed a glitch in the surname search at Family Tree DNA. Our new match has Browning in their tree, but this match doesn't show up when you search on that name.

Annette Kapple Distant Remote
 5th Cousin Match Browning
My goal on our Browning line is to confirm a circumstantial lineage. It's believed that Roger Browning of Greene County, Tennessee is the same Roger Browning mentioned in Benjamin Browning's estate records. Benjamin died in Maryland. I haven't found any source material verifying that Roger of Tennessee migrated from Maryland. Unless a document surfaces we will need to support this inference with DNA testing.

My Mom and I have a remote cousin match with a descendant of Francis Browning and Rachel Marriott, who are supposed to be the grandparents of Benjamin Browning. My Mom shares a 16 cM segment with this match. This would be a 7th cousin to my Mom. At AncestryDNA we have a moderate match with a descendant of Benjamin's parents, Edward Browning and Elizabeth.

List of Browning matches
  1. Match through Roger Browning and his Daughter Malinda. AncestryDNA very low confidence for me and moderate for my Mom.
  2. The match I talked about above is through Richard W. Browning and Obedience McPike, and son William Jennings Browning. AncestryDNA. Predicted 4th to 6th cousin to my Mom, and 5th cousin remote to me. Actually 3rd cousin 1x removed to my Mom. Family Tree DNA.
  3. Match through Roger Browning and Mary, and their son  Amzi. Down through Amzi's daughter Emma. At AncestryDNA.
  4. We have a match through Richard Washington Browning, my great-great grandfather and his son William Jennings Browning at AncestryDNA. Predicted 4th cousin, but actually 2nd cousin twice removed to my Mom.
  5. A possible match down the Francis Browning and Rachel Marriott line through son John. This tree is quite mixed up. At AncestryDNA this is a predicted 4th-6th cousin match to my Mom, but it would actually be a 7th cousin.
  6. We have another match through Richard Washington Browning, my great-great grandfather and his son William Jennings Browning at AncestryDNA. Predicted 4th-6th cousin, but actually 3rd cousin twice removed to my Mom.
  7. Match through Edward Browning and Elizabeth, moderate match. AncestryDNA.
  8. Match through Francis Browning and son John. AncestryDNA. Low confidence to Mom.
  9. Match through Francis Browning and daughter Catherine. AncestryDNA. Low confidence to Mom.
  10. Another match through Nathan Browning and Obedience McPike and son William. Ancestry DNA. Very low confidence to Mom. Actually 3rd cousin 2x removed.
  11. Another match through Francis Browning and Rachel Marriott through Catherine. Ancestry match. Very low confidence to Mom.
  12. Yet another match through Francis Browning and Rachel Marriott through Catherine. Ancestry match. Very low confidence to Mom.
  13. A match through Francis and Rachel and son John. AncestryDNA. Very low confidence to Mom.
  14. Another match through Edward Browning and Elizabeth through son Nathan. AncestryDNA. Very low confidence to Mom.
  15. Match through Nathan Browning and Obedience McPike and their daughter Mary (Polly). AncestryDNA match very low confidence. Actual relationship 3rd cousin 2x removed to Mom.
  16. Another match through Nathan Browning and Obedience McPike through daughter Elizabeth. AncestryDNA very low confidence match. Actual relationship 3rd cousin 2x removed to Mom.
  17. Another match through Roger Browning and his daughter Melinda's line. AncestryDNA. Very low confidence to Mom.
  18. A match through Francis Browning and Rachel Marriott through their daughter Catherine. At Family Tree DNA 4th cousin remote to Mom. Would actually be 7th cousin.
If anyone reading this matches us on the Browning line at AncestryDNA, please let me know if you'd like to compare at GEDmatch. If I could get our Browning matches to upload to GEDmatch, and we could compare notes, maybe we could confirm our circumstantial lineage. I'm finding that some of the Brownings among my matches settled in Culpeper County, Virginia. I'll have to see what I can find in their records.

Something else that struck me is how cousin removals affect the amount of shared DNA. My Mom has many cousins in the 3rd and 4th cousin range who are removed from her by 1 to 3 generations. Most of these cousins are very low confidence matches, although some are more confident. I'll have to examine a chart showing the amount of DNA generally shared by these cousins. I wish I could see how much DNA we share, and where it is on the chromosomes at AncestryDNA.
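
Until I find a proper chart, here is a rough Python sketch of the average amount of DNA full cousins would be expected to share, and how each removal halves it. The calculation counts the births separating the two cousins through their shared ancestral couple. The 6800 cM total is an assumption on my part for converting percentages to cM (totals differ between companies), and the function name is made up. These are averages only; real matches vary a lot, and distant cousins may share nothing at all.

```python
TOTAL_CM = 6800  # assumed genome total for converting a fraction to cM


def expected_shared_fraction(degree, removed=0):
    """Average shared fraction for full cousins.

    degree:  1 for 1st cousins, 3 for 3rd cousins, and so on
    removed: number of generations removed (0, 1, 2, ...)
    """
    # Births on the path from one cousin up to the shared couple and back down.
    steps = 2 * (degree + 1) + removed
    # Two shared ancestors (a couple), each contributing half per step.
    return 2 * 0.5 ** steps


for degree, removed in [(3, 0), (3, 1), (3, 2), (4, 0)]:
    fraction = expected_shared_fraction(degree, removed)
    label = f"{degree}rd" if degree == 3 else f"{degree}th"
    print(f"{label} cousin {removed}x removed: "
          f"{fraction:.3%} (about {fraction * TOTAL_CM:.1f} cM)")
```

Each removal cuts the average in half, which fits with so many of Mom's 3rd cousins 2x removed showing up as very low confidence matches.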

Wednesday, September 10, 2014

DNA News: The Awkward New Tree At Family Tree DNA

The new Family Tree DNA pedigree chart was unveiled yesterday. I attended the introductory webinar. Looking at the tree via the webinar I couldn't tell how difficult the tree was to navigate. After the webinar I tried it out immediately and had difficulty navigating my large tree. I had to do a great deal of screen dragging to see everyone. I tried making the tree smaller, which helped, but by the time I got to the best view the names were too small. Also, when I resized the tree I would sometimes lose my place completely. I didn't like the old tree much better. In the past I used the Gedcom DNA site to download gedcoms and I would view them in my family tree software, which provided me with the best pedigree chart for review. Apparently this feature has been disabled at the Gedcom website.

I hate the bottom up layout of the Family Tree pedigree charts. The top down old layout was a little better, but I prefer the left to right layout.

There are some positive features. I was able to search a match's tree for a surname, which helped me find it without having to drag the screen. I knew a match had the name Browning in their family tree, but the name was farther back in the tree than could be viewed with the old setup. The new tree displays more generations. A definite positive.

You can drag and drop matches onto your tree from a list on the left of the screen. I've attached my Mom and Aunt as matches on my tree. I was going to build out the tree and attach more matches, but I could not, because all of my dozen or so positive matches are out past the generations that the Family View can display.

What I do miss about the old tree is the more compact screen view, and the ability to hover the mouse over a name to see more information without clicking on the name.

It would be great if Family Tree DNA could partner with one of the genealogy software companies to create an outstanding, user friendly tree. The best DNA related tree layout is at AncestryDNA. I would, however, like to see details when I hover my mouse over a name. The layout at 23andme is OK, but doesn't display enough generations on one screen. So my ideal tree would be the left to right layout, with the maximum number of generations on one screen, and the ability to see more detail when you hover your mouse over a name. I also prefer scrolling to dragging the screen.
AncestryDNA Tree

23andme Tree