These data are made available in the public domain (Creative Commons CC0).
Parts a–c summarize the identification of clusters: a, constructing the network from IBD; b, detecting network clusters; and c, identifying subsets of clusters that separate in the spectral embedding.
We took a simple approach to inferring population structure from the spectral dimensionality reduction: we projected all samples labelled by the hierarchical clustering onto this low-dimensional embedding, then used this data visualization to extract further clusters.
We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records.
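As an illustration of what "densely connected" can mean for a network cluster, the sketch below scores a candidate partition of a small weighted network by its modularity. The text here does not state which criterion is used to detect clusters, so modularity is an assumption made for illustration only; the toy edge list, the node names and the function name `modularity` are likewise hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Modularity Q of a partition of a weighted, undirected network.

    edges: dict mapping (u, v) to a non-negative weight; each undirected
           edge is listed once and self-loops are not expected.
           (Hypothetical stand-in for IBD-derived edge weights.)
    membership: dict mapping each node to a cluster label.
    """
    total = sum(edges.values())        # m: total edge weight
    strength = defaultdict(float)      # weighted degree of each node
    within = defaultdict(float)        # edge weight inside each cluster
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if membership[u] == membership[v]:
            within[membership[u]] += w

    cluster_strength = defaultdict(float)
    for node, s in strength.items():
        cluster_strength[membership[node]] += s

    # Q = sum over clusters of (observed within-cluster weight fraction
    #     minus the fraction expected if edges were placed at random).
    return sum(
        within[c] / total - (cluster_strength[c] / (2 * total)) ** 2
        for c in cluster_strength
    )


# Toy network: two triangles joined by a single edge (all weights 1).
edges = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
    ("c", "d"): 1.0,
}
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
```

In this toy case the partition into the two triangles scores higher than placing all nodes in a single cluster, which is the sense in which each triangle is densely connected relative to the rest of the network.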
Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow.
Part d summarizes the interpretation of clusters by annotating clusters with admixture and genealogical data.
Part e summarizes the genealogical data—birth location annotations in pedigrees (shaded symbols in d)—for the ‘African American’ cluster.
Principal components (PCs) are computed using kernel PCA, in which the kernel matrix is defined by total IBD between pairs of states, normalized to remove the effect of variation in within-state IBD.
US states that share high levels of IBD on average are placed closer to each other in the projection onto the first two principal components.
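A minimal sketch of such a projection is given below, assuming NumPy. The exact normalization is not specified in this caption, so dividing each entry by the geometric mean of the two within-state totals is an assumption, as are the function name `kernel_pca` and the toy 4 × 4 matrix.

```python
import numpy as np


def kernel_pca(total_ibd, n_components=2):
    """Project items onto the leading PCs of a precomputed kernel matrix.

    total_ibd: symmetric matrix whose (i, j) entry is the total IBD shared
               between states i and j; the diagonal holds within-state IBD.
    """
    K = np.asarray(total_ibd, dtype=float)

    # Assumed normalization: K_ij / sqrt(K_ii * K_jj), so that every state
    # has unit self-similarity regardless of its within-state IBD.
    d = np.sqrt(np.diag(K))
    K = K / np.outer(d, d)

    # Double-centre the kernel (centring in the implicit feature space).
    n = K.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = J @ K @ J

    # Eigendecomposition of the symmetric centred kernel.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives

    # Coordinates of each state on the leading principal components.
    return eigvecs[:, order] * np.sqrt(vals)


# Toy matrix: states 0 and 1 share heavily, as do states 2 and 3,
# while within-state IBD (the diagonal) varies widely.
total_ibd = np.array([
    [4.0, 4.8, 0.8, 1.0],
    [4.8, 9.0, 1.2, 1.5],
    [0.8, 1.2, 16.0, 16.0],
    [1.0, 1.5, 16.0, 25.0],
])
pcs = kernel_pca(total_ibd)
```

With this toy input, states 0 and 1 land close together on the first two PCs, as do states 2 and 3, even though the raw totals differ by an order of magnitude across the diagonal.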
Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited.
Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin.
Recent genetic studies of the United States and North America have drawn insights into ancient human migrations.
These insights have been primarily drawn from modelling variation in allele frequencies (for example, refs 11–15), which typically diverge slowly.