Scientist and product developer. I design algorithms and consumer experiences to help people understand their personal genetic data. I have a PhD in Biosciences from Stanford University and a master's degree in applied mathematics from the University of Michigan.
Last updated: May 17, 2026
Want to learn more about one of your ancestors by sequencing their DNA (e.g., from a hair sample)? It turns out you don't have to. We can statistically reconstruct their genome using DNA from you and your relatives. My work on ancestor reconstruction has become the Reconstructed Ancestors product feature at 23andMe, powered by my GRAMPA algorithm (Genotype Reconstruction and Ancestral Mixture Proportions in Ancestors).
Personal genomics companies like 23andMe and Ancestry.com can detect your relatives in their databases. These relatives are usually presented in a list of relationships (e.g., "Bob Smith: 2nd Cousin"). My Bonsai algorithm allows you to see how all these people fit together in a family tree. This became the 23andMe Family Tree feature.
For over a decade, genomics companies and researchers have estimated relationships incorrectly. It was believed that the most distant relationship one could detect from genetic data was around a 5th cousin. I noticed this mistake and created a correct estimator. The new estimator shows that the majority of relationships are much more distant than previously thought. For instance, many previously predicted 6th cousins might be more like 100th cousins, sharing common ancestors with one another in Roman times, rather than in the 18th Century. These relationships now power our DNA Relatives product feature. Read my paper on the new estimator.
At 23andMe I was tasked with creating a set of public-facing pages that allow customers to learn about genetic patterns linked to their surnames. Of course, a surname doesn't have much to do with genetics because surnames are usually passed down patrilineally or matrilineally and only a tiny fraction of your genome - the Y-chromosome and mitochondrial DNA - is passed down in this way. Moreover, the ties between genetic lineages and surnames are tenuous at best. So associations between surnames and genetic traits typically arise only because both are loosely associated with ancestry. However, we can visualize things like the ancestries or haplotypes commonly associated with a given surname. Or dumber things like ice cream preference.
I worked on a recent study of burials from Maryland's first colony. By updating my pedigree inference method, Bonsai, to work on historical data, I incorporated historical burials into modern pedigrees of 23andMe customers. Through a combination of genetic inference and genealogy, I was able to propose an identity for one of the burials. Through further investigations and comparisons with isotopic data, our team was able to propose identities for three burials including the first governor of Maryland, Thomas Greene, his son Leonard, and his wife Anne Cox. To our knowledge, this is the first time that genetics has been used to re-identify the remains of historical figures without any prior hypothesis of their identities. Read the summary in Discover Magazine or Popular Science.
Slavery in America left millions of people disconnected from their ancestral roots. By finding genetic connections with historical burials of enslaved workers at Catoctin Furnace in Maryland, we were able to connect them to their modern descendants, restoring connections that had been lost. Check out news coverage by the New York Times and NPR.
Academics and genetic testing companies have been incorrectly applying relationship estimators for the better part of two decades. This has led everyone to believe erroneously that the most distant relationship that can be detected from genetic sharing is around a 5th cousin (the chances of sharing DNA with a more distant relative become vanishingly small under the current models). But these models ignore the fact that each person is related to every other person through millions of different ancestors, making the chances of sharing DNA with a distant cousin very high. I noticed this mistake and derived new relationship estimators. These new estimators push the most distant detectable relationship out to cousins separated by hundreds of generations, not 5.
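To give a feel for why many ancestral paths matter, here is a toy calculation. It is not the estimator from my paper. It assumes, purely for illustration, that each shared ancestor independently contributes the same small chance of passing down a shared DNA segment. The function name and the numbers are hypothetical.

```python
def prob_share_via_any_ancestor(p_single: float, n_ancestors: int) -> float:
    """Chance of sharing DNA through at least one of n_ancestors shared
    ancestors, assuming (as a simplification) that each ancestor
    independently contributes a sharing probability of p_single."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if n_ancestors < 0:
        raise ValueError("n_ancestors must be non-negative")
    # Complement rule: 1 - P(no sharing through any ancestor).
    return 1.0 - (1.0 - p_single) ** n_ancestors


if __name__ == "__main__":
    # Hypothetical values: a one-in-a-million chance per ancestor.
    for k in (1, 1_000, 1_000_000, 3_000_000):
        print(k, prob_share_via_any_ancestor(1e-6, k))
```

A per-ancestor chance that looks negligible on its own adds up quickly once the number of shared ancestors is in the millions.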
The Coalescent Model is a fundamental framework in genetics for understanding the evolutionary history of genetic lineages. Many formulas can be derived under this model, such as the expected time when two pieces of DNA share a common ancestor or the distribution of allele frequencies in a population. However, quantities derived under the model can have complicated closed forms, and algorithms that implement these formulas can be slow. Deterministic approximations to the model have been known for decades, but each author largely rediscovered them anew and there was no formalism to understand their accuracy or broad principles of usage. I put these approximations on a firm theoretical footing by showing when they are accurate and I compiled the first systematic treatment of how to apply them.
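As a concrete example of the simplest formula mentioned above, the sketch below simulates how long two pieces of DNA take to find a common ancestor and compares the average to the textbook expectation of 2N generations. It assumes a constant-size diploid Wright-Fisher population of N individuals, where two lineages coalesce in any given generation with probability 1/(2N). It does not implement the deterministic approximations from my work, and the function names are illustrative.

```python
import random


def simulate_pairwise_coalescence_time(n_diploid: int, rng: random.Random) -> int:
    """Generations until two lineages share a common ancestor in a
    constant-size diploid Wright-Fisher population (assumed model)."""
    p_coalesce = 1.0 / (2 * n_diploid)  # per-generation coalescence chance
    generations = 1
    while rng.random() >= p_coalesce:
        generations += 1
    return generations


def mean_coalescence_time(n_diploid: int, replicates: int, seed: int = 0) -> float:
    """Average simulated coalescence time over many replicates."""
    rng = random.Random(seed)
    total = sum(
        simulate_pairwise_coalescence_time(n_diploid, rng)
        for _ in range(replicates)
    )
    return total / replicates


def expected_pairwise_coalescence_time(n_diploid: int) -> int:
    """Expected waiting time: the mean of a geometric distribution
    with success probability 1 / (2N), which is 2N generations."""
    return 2 * n_diploid


if __name__ == "__main__":
    n = 50
    print("simulated:", mean_coalescence_time(n, replicates=20_000))
    print("expected: ", expected_pairwise_coalescence_time(n))
```

Even this two-lineage case needs many replicates to pin down by simulation, which hints at why fast closed forms and approximations are valuable for larger samples.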
You inherit one copy of your genome from your mother and another copy from your father. However, genome sequencing or genotyping loses the information about which allele came from which parent. "Phasing" is a technique for recovering the parental side information about alleles. Most phasing methods compare a focal genome to the genomes of thousands of very distant relatives. However, by leveraging close relatives of the person we want to phase (so-called Family Phasing), we can do a better job. A patent application for my method for family phasing is currently pending.
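To show why a close relative helps, here is a minimal sketch of phasing a child against a single genotyped parent using only Mendelian rules. This is not my patented family phasing method. It is a simple illustration, and the function name and genotype encoding (unordered pairs of allele letters per site) are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]  # unordered pair of alleles at one site


def phase_with_parent(
    child: List[Genotype], parent: List[Genotype]
) -> List[Optional[Tuple[str, str]]]:
    """Return, per site, (allele from this parent, allele from the other
    parent), or None when the site is ambiguous or Mendelian-inconsistent."""
    phased: List[Optional[Tuple[str, str]]] = []
    for (c1, c2), parent_gt in zip(child, parent):
        if c1 == c2:
            # Homozygous child: phase is trivial if the parent carries the allele.
            phased.append((c1, c2) if c1 in parent_gt else None)
        elif c1 in parent_gt and c2 not in parent_gt:
            phased.append((c1, c2))  # only c1 can have come from this parent
        elif c2 in parent_gt and c1 not in parent_gt:
            phased.append((c2, c1))  # only c2 can have come from this parent
        else:
            # Parent carries both alleles (ambiguous) or neither (inconsistent).
            phased.append(None)
    return phased


if __name__ == "__main__":
    child = [("A", "G"), ("C", "C"), ("T", "C"), ("A", "G")]
    parent = [("A", "A"), ("C", "T"), ("T", "C"), ("G", "G")]
    print(phase_with_parent(child, parent))
```

Sites where the child and parent are both heterozygous for the same alleles stay unresolved here. Resolving them is where information from additional relatives and nearby sites becomes useful.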
Family trees are typically built by hand using laborious trial-and-error methods. In the last decade, several computational methods have been developed to infer pedigrees. My Bonsai method is the current state-of-the-art method for inferring large, complicated pedigrees. See the Bonsai paper here.
This patent covers relationship inference methods that correctly infer distant relationships, fixing a mistake that has existed in the scientific literature for over a decade and greatly extending the most distant relationship that can be inferred. This patent also introduces methods for sampling DNA sharing within pedigrees, conditional on the event that individuals in the pedigree share DNA.
23andMe, Sunnyvale, CA | 2017 - Present
Research and development of population-genetics methods powering consumer features at 23andMe, including pedigree inference (Bonsai), identity-by-descent inference, relationship estimation, family phasing, and ancestry composition.
UC Berkeley, Departments of Statistics and EECS | 2014 - 2017
Postdoctoral research in statistical population genetics, including approximations to the joint site frequency spectrum of multiple populations and methods for inferring selection coefficients from time-series genetic data.
Stanford University, Department of Biology | 2011 - 2014
PhD research on coalescent theory and population-genetic inference. Recipient of the Samuel Karlin Prize in Mathematical Biology and the CEHG Predoctoral Fellowship in Genomics.