Reconstruction of private genomes through reference-based genotype imputation

Background: Genotype imputation is an essential step in genetic studies to improve data quality and statistical power. Public imputation servers are widely used by researchers to impute their data using otherwise access-controlled reference panels of high-fidelity genomes held by these servers.

Results: We report evidence against the prevailing assumption that providing access to panels only indirectly via imputation servers poses a negligible privacy risk to individuals in the panels. To this end, we present algorithmic strategies for adaptively constructing artificial input samples and interpreting their imputation results that lead to the accurate reconstruction of reference panel haplotypes. We illustrate this possibility on three reference panels of real genomes for a range of imputation tools and output settings. Moreover, we demonstrate that reconstructed haplotypes from the same individual could be linked via their genetic relatives using our Bayesian linking algorithm, which allows a substantial portion of the individual's diploid genome to be reassembled. We also provide population genetic estimates of the proportion of a panel that could be linked when an adversary holds a varying number of genomes from the same population.

Conclusions: Our results show that genomes in imputation server reference panels can be vulnerable to reconstruction, implying that additional safeguards may need to be considered. We suggest possible mitigation measures based on our findings. Our work illustrates the value of adversarial algorithms in uncovering new privacy risks to help inform the genomics community towards secure data sharing practices.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-023-03105-6.


Figure S4. Distribution of number of mismatches in reconstructed haplotypes. We show a histogram of error counts for all haplotypes obtained from the 1KG reference panel in one million queries in the haplotype reconstruction experiment represented in Fig. 2A. This includes redundantly constructed haplotypes. For each haplotype obtained in the experiment, its error count is the number of variant allele differences (mismatches) compared to its closest haplotype match in the reference panel. The vast majority of haplotypes reconstructed by our program either exactly matched a reference panel haplotype (0 mistakes) or had more than 1,000 mistakes. In Fig. 2, we chose to define a "correct" reconstruction as having 100 mistakes or fewer, but any reasonably small number in the range of 1 to 500 would have produced similar plots.
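The error-counting rule used in these experiments can be sketched as follows. This is an illustrative reimplementation, not the study's actual code; the names `panel` and `reconstructed` and the toy data are placeholders.

```python
# Sketch of the caption's error-counting rule: a reconstructed haplotype's
# error count is its Hamming distance to the closest reference panel (RP)
# haplotype, and it counts as "correct" when that distance is at most 100.
# All names and data here are illustrative, not the study's actual code.

def hamming(a, b):
    """Number of variant allele differences between two haplotypes."""
    return sum(x != y for x, y in zip(a, b))

def closest_match(hap, panel):
    """Return (index, mismatch count) of the closest RP haplotype."""
    dists = [hamming(hap, ref) for ref in panel]
    best = min(range(len(panel)), key=dists.__getitem__)
    return best, dists[best]

def is_correct(hap, panel, threshold=100):
    _, errors = closest_match(hap, panel)
    return errors <= threshold

# Toy example with 10-variant haplotypes (0/1 alleles)
panel = [[0] * 10, [1] * 10]
reconstructed = [0] * 9 + [1]  # one mismatch vs. the first RP haplotype
```

In the actual experiments the threshold of 100 is applied per chromosome (or chunk), and a second check against previously reconstructed haplotypes avoids double-counting redundant reconstructions.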

Figure S7. Haplotype reconstruction against imputation algorithms other than minimac. The "discrete-genotype" version of haplotype reconstruction, using only 0/1 genotype predictions from imputation and not continuous dosage data, was run with 1000 Genomes Phase 3 (1KG) chromosome 20 as the reference panel and (A) Beagle5.4, (B) IMPUTE5, and (C) PBWT as the imputation algorithm. The results indicate different rates of reconstruction for the different algorithms, but in all cases, reconstruction is demonstrated to be feasible with sufficient accuracy. A reconstructed haplotype was "correct" if it had no more than 100 genotype differences from a RP haplotype and that closest haplotype had not been reconstructed correctly previously. The count of "incorrect" haplotypes was incremented if a reconstructed haplotype had more than 100 genotype differences from the closest RP match and was sufficiently different (>100 mismatches) from the previous incorrect haplotypes. Horizontal dotted lines represent percentages of the total number of haplotypes in a RP. Note that the y-axes differ in scale.
Figure S10. Preliminary chunk linking using imputation. As a preliminary step used to generate an initial linking prediction (to be input into the linking algorithm), minimac4 was used to impute all chunks against the relative set, and the output was leveraged to form links between adjacent chunks where possible (as described in Methods, in Step 1 of "Construction of an initial solution"). The results of this process are chains of consecutive linked chunks predicted to correspond to the same individual. Note that each chain can only be composed of chunks within the same chromosome. (A) The distribution of chain length, stratified by the degree of the relative (in the relative set) of the individual to which the chain's chunks correspond. Unrelated (UR) individuals, without a relative in the relative set, are also represented. Note that longer chains tend to be formed for individuals with closer relatives in the relative set. (B) The average number of chains formed for an individual, by degree. Error bars indicate standard deviation. We observe that the chains formed for individuals with closer relatives tend to be greater in both number and size. (C) The average fraction of chunks among the chains formed for an individual which were incorrectly assigned to that individual, by degree. Error bars indicate standard deviation. We observe relatively low error rates across all groups for this strategy.
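The chain-forming step above can be sketched as follows. The predicate `link_adjacent` is a hypothetical stand-in for the imputation-based test described in Methods; only the grouping logic is shown.

```python
# Sketch of forming chains of consecutive chunks from pairwise link decisions
# between adjacent chunks on the same chromosome. `link_adjacent` is a
# hypothetical stand-in for the imputation-based test described in Methods.

def build_chains(chunks, link_adjacent):
    """Group an ordered list of chunks (one chromosome) into chains.

    `link_adjacent(a, b)` returns True when imputation output supports
    chunks a and b belonging to the same individual.
    """
    chains = []
    current = [chunks[0]]
    for prev, nxt in zip(chunks, chunks[1:]):
        if link_adjacent(prev, nxt):
            current.append(nxt)
        else:
            chains.append(current)
            current = [nxt]
    chains.append(current)
    return chains

# Toy example: chunks 0-4 on one chromosome, with a link break between 1 and 2
chains = build_chains([0, 1, 2, 3, 4], lambda a, b: (a, b) != (1, 2))
```

Because chains are built per chromosome, chunks from different chromosomes are never linked at this stage, matching the note in the caption.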

Figure S11. Preliminary chunk linking using semi-kinship (SK) information. As a preliminary step used to generate an initial linking prediction (to be input into the linking algorithm), chunk haplotypes having sufficiently high SK values with the same relative set sample(s) were grouped (as described in Methods, in Step 2 of "Construction of an initial solution"). The results are shown in (A). These results were then combined with the result of using imputation output to link chunks, shown in Figure S10 (as described in Methods, in Step 3). The product of this combination is shown in (B). Displayed in each plot is the average number of chunks linked, by degree of available relative. Error bars indicate standard deviation. "Incorrect" haplotypes are haplotypes that were assigned to the wrong individual. The rightmost bar represents unassigned (UA) sets, not assigned because the majority of haplotypes did not come from the same individual, with the number of such sets indicated in parentheses. Note that, although in (A) we had successfully formed large groups of haplotypes, this strategy also produced substantial error and a large number of UA groups. In (B), we observe that by combining information from both imputation and SK-based preliminary linking, we are able to obtain high-quality initial groups to provide as input to the linking algorithm.

Figure S14. GEDmatch users can upload their genome as a "kit" and discover relatives among the site's 1.4 million other users. An adversary attempting to link reconstructed haplotypes could leverage the site as follows: a batch of chunk haplotypes could be grouped into a full haploid artificial genome, each then duplicated to simulate diploidy, and uploaded. The one-to-many comparison tool, shown in (A), could then be used to see the closest matches to this kit, as in (B). Each closest match could be explored using the one-to-one comparison tool to see the identified IBD segments, as in (C), to determine which haplotype(s) the relative matched IBD. This process of uploading reconstructed haplotypes in batches and identifying their relatives could be repeated for all haplotypes, and then haplotypes related to the same GEDmatch kits could be linked, e.g., using our probabilistic linking algorithm. Users with free accounts can upload only up to 5 kits, but paying "Tier 1" users appear not to have any limit. Note that we did not upload any data to explore this workflow: anyone can make a free account and explore this given a kit ID, and it is easy to find kit IDs publicly (in a GEDmatch tutorial video, for example).

Figure S17. Haplotype reconstruction on MAF-filtered reference panels. In our primary haplotype reconstruction experiments for this study, low-minor allele frequency (MAF) variants around which queries were built were required to have a MAF of 0.005 or lower. The results of an experiment with this default setting, using 1000 Genomes Phase 3 (1KG) chromosome 20 as the reference panel, are shown in (A). We conducted additional experiments on the same 1KG panel in which low-MAF query variants were required to have a MAF at or below the higher threshold of 0.025 and in which variants with reference panel (RP) MAFs lower than (B) 0.005 or (C) 0.01 were ignored and not used, to simulate reconstruction on RPs filtered variant-wise by MAF. We observe from these results that filtering a RP to remove the lowest-MAF variants does slow reconstruction, but only to a limited extent. Doubling the filtering threshold from 0.005 to 0.01 has negligible additional effect, indicating that most of this protection comes from removing the very lowest-MAF variants. It is also worth noting that such a RP filtering measure would come at the cost of utility, since it may be desirable in practice to be able to impute relatively rare variants. Reconstructed haplotypes are grouped into "Correct" and "Incorrect" as defined in Fig. 2. Horizontal dotted lines represent percentages of the total number of haplotypes in 1KG.
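The variant-wise MAF filtering simulated here can be sketched as follows; the data and names are illustrative placeholders.

```python
# Sketch of the variant-wise MAF filtering simulated in (B) and (C): variants
# whose reference panel minor allele frequency falls below a threshold are
# dropped before imputation. Data and names here are illustrative.

def minor_allele_freq(alleles):
    """MAF of one variant from its 0/1 allele column across all haplotypes."""
    f = sum(alleles) / len(alleles)
    return min(f, 1.0 - f)

def filter_by_maf(variant_columns, threshold):
    """Keep indices of variants with MAF at or above `threshold`."""
    return [i for i, col in enumerate(variant_columns)
            if minor_allele_freq(col) >= threshold]

# Toy panel with 8 haplotypes: variant 0 is rare (MAF 0.125), variant 1 common
panel = [[0, 0, 0, 0, 0, 0, 0, 1],   # MAF 0.125
         [0, 1, 0, 1, 0, 1, 0, 1]]   # MAF 0.5
kept = filter_by_maf(panel, 0.25)
```

Since the reconstruction strategy relies on low-MAF variants to isolate small numbers of haplotypes, such a filter removes exactly the variants most useful to the attack, which is why raising the threshold slightly slows reconstruction.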

Figure S19. Haplotype linking with independently estimated SK coefficient prior distributions. The 1,000 pairs of AoU samples used in the main linking experiment shown in Fig. 3 were divided into two groups of 500 pairs, each containing 50 pairs of related individuals (10 each of 1st- through 5th-degree relatives) and 450 randomly sampled pairs of unrelated individuals. The genetic data from one group was used to construct empirical semi-kinship (SK) distributions for the different degrees of relatedness, to be leveraged by the linking algorithm as prior distributions. Haplotype linking was tested on the other group, following the same process as the original experiment: all 44 haploid autosomal chromosomes from one individual in each pair were split into 20-Mbp chunks (310 in total) and included in the target set, for a total of 155K haplotype chunks, shuffled to obfuscate the individual of origin. The other individual in each pair was assigned to the relative set. Shown is the average number of haplotypes linked by our algorithm (out of 310 chunks in total), by degree of available relative. Error bars indicate standard deviation. "Incorrect" haplotypes refer to haplotypes assigned to the wrong individual. The rightmost bar represents unassigned (UA) sets, not assigned because the majority of haplotypes did not come from the same individual, with the number of such sets indicated in parentheses. Whereas in the original chunk linking experiment we used empirical SK distributions from the same dataset given the lack of another suitable dataset, here the distributions come from a separate held-out set of samples. We observe that this does not undermine the performance of our haplotype linking algorithm, as the results here are comparable to those in Fig. 3.
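Deriving per-degree SK prior distributions from a held-out set can be sketched as below. The normal approximation is an assumption made here for illustration (the study uses empirical distributions), and all observed SK values are toy placeholders.

```python
# Sketch of deriving per-degree semi-kinship (SK) prior distributions from a
# held-out set of pairs. A normal approximation is an assumption made for
# illustration; the study itself uses empirical distributions.
from statistics import NormalDist, mean, stdev

def fit_sk_priors(sk_by_degree):
    """sk_by_degree: dict mapping relative degree -> list of observed SK values."""
    return {deg: NormalDist(mean(v), stdev(v)) for deg, v in sk_by_degree.items()}

def degree_likelihoods(sk_value, priors):
    """Relative likelihood of each degree for an observed SK value."""
    return {deg: d.pdf(sk_value) for deg, d in priors.items()}

# Toy held-out observations (SK roughly halves with each additional degree)
held_out = {1: [0.25, 0.24, 0.26], 2: [0.125, 0.12, 0.13], 3: [0.06, 0.065, 0.062]}
priors = fit_sk_priors(held_out)
likes = degree_likelihoods(0.24, priors)
```

Holding out the estimation set mirrors the experiment's point: the priors need not come from the same samples being linked for the algorithm to perform well.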
Observe the extreme dosage values to the left of the query's location, covering most of the window, which is similar to the pattern typical for a query with a single match and fools the classifier. This error can be prevented by restricting query construction to be closer to the middle of the chromosome. The query shown in (B) matches exactly one reference panel haplotype, but is assigned a prediction of "Other" by the classifier, meaning zero or greater than 5 matches. Notice that it does not display the pattern we would expect from a query with one match, which explains why it is misclassified. See Supplementary Note 1 for a more detailed discussion of why these unexpected patterns occur and how the frequency with which they arise may vary depending on the panel used for imputation.
The results of these experiments are shown in Figure S6. We first observed that the overall reconstruction performance is largely similar across the three panels. However, key differences could be seen when we examined the number of queries of different match cardinality types (both real and predicted) encountered during the reconstruction process.
The two All of Us ancestry-specific panels displayed higher rates of queries that match a small number of reference panel haplotypes (e.g., more than 11,000 queries out of 200K matched exactly one haplotype for each All of Us panel, versus only 2,436 for the 1KG panel). This may be because our gnomAD MAF estimates more accurately reflect the observed allele frequencies for these ancestry-specific panels than for 1KG, which is a heterogeneous mixture of different ancestry groups. More accurate MAF estimates can boost the effectiveness of the constructed queries by encouraging the inclusion of low-MAF variants that better isolate a small number of individuals in the panel.

We also observed for the All of Us panels a much higher rate of misclassification of queries which actually match a small number of haplotypes. For example, for the Asian American panel only 2,805 out of 11,356 queries matching exactly one haplotype were correctly classified as such, and 2,010 out of 11,082 for the Black or African American panel, versus 2,095 out of 2,436 for the 1KG panel. We hypothesize that this may occur because of a greater number of nearly matching haplotypes in the All of Us panels for a typical query, since we expect the genotype patterns to be more homogeneous in ancestry-specific panels compared to 1KG. These nearly matching haplotypes can result in "fuzziness" of the imputed dosage pattern even when there is only a single exact match (see Figure S20B, for example). It is worth noting that, while such a misclassification is undesirable from the point of view of classification, separating it out from other unique-matching queries with the extreme-dosage patterns is desirable for reconstruction purposes, since only the latter queries can lead to accurate reconstruction regardless of the classification result.
Overall, we find that the reasons for differences in reconstruction performance across panels are complex, likely involving a range of factors such as panel size, population heterogeneity, and accuracy of MAF estimates, which together influence query uniqueness and the frequency of different error modes. Although we recognize the need to contextualize the implications of our work for different reference panels in light of these observations, we note that successful reconstruction of panel haplotypes does still occur consistently across a variety of conditions. Furthermore, the query selection strategy and classifier could likely be improved or customized (e.g., to specific populations) to further increase the correct reconstruction rate and decrease error, but such optimization was not our main goal in this study; our work focuses on demonstrating the feasibility of reference panel reconstruction in a diverse range of settings.

autosomal chromosomes, the probability of detecting a random individual as a relative match, given that the individual is the target's sibling, is: Bin(k; 2L + 22, e^−2m). Then the probability of a random individual being detected as a sibling of the target is: P(detected as sibling) = P(MRCA is 1-gen ago) · P(detected | MRCA is 1-gen ago).
And the probability of at least one sibling being detected in the database is: P(≥ 1 sibling in DB) = 1 − Bin(0; R · (r/2)/(1 + r/2), P(detected as sibling)). (9)

Parent-child. Recall that we are assuming the target is in the current (most recent) generation, which implies that the only time the target has a parent-child relationship is when his or her parent is in the database. Below, we derive the probability that a target individual has at least one detectable parent in the database. This relationship is again a special case to which our generalized formulas will not directly apply, since now the individuals do not share an ancestral couple; rather, one individual is the ancestor of the other. Still, we can take similar intermediate steps along the way to obtain the probability we desire.
The probability that a random individual in the database is a parent of the target is: P(parent) = 1/N^(1).
This is because the probability that a random individual is a parent of the target is equal to the probability that the individual is in one particular ancestral couple out of the N^(1) couples in that generation. The probability of detecting the random individual as a relative match, given that the individual is the target's parent, is: Bin(k; L + 22, e^−m), since there is one meiosis between parent and child, and parent and child are necessarily IBD at the blocks inherited from that parent.
Then the probability of a random individual being detected as a parent of the target is: P(detected as parent) = P(parent) · P(detected | parent).
And the probability of at least one parent being detected in the database is: P(≥ 1 parent in DB) = 1 − Bin(0; R · 1/(1 + r/2), P(detected as parent)). (10)

Considering the 1st-degree case as a whole, we approximate the probability that a target individual has at least one detectable 1st-degree relative in a database as the sum of the probabilities for the sibling-sibling and parent-offspring relationships in Eqs. (9) and (10), given that these probabilities are small in realistic settings and the intersection of the two events has an even smaller probability (e.g., the product of the two probabilities under an independence assumption).
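The structure of Eqs. (9) and (10) can be checked numerically: since Bin(0; n, p) = (1 − p)^n, the probability of at least one detectable relative among n candidate database members is 1 − (1 − p)^n. All parameter values below are illustrative placeholders, not values from the paper.

```python
# Numeric sketch of Eqs. (9)-(10): with Bin(0; n, p) = (1 - p)**n, the
# probability of at least one detectable relative among n candidate database
# members, each detected independently with probability p, is 1 - (1 - p)**n.
# All parameter values here are illustrative placeholders.

def p_at_least_one(n, p):
    """P(>= 1 success) for n independent trials with success probability p."""
    return 1.0 - (1.0 - p) ** n

R = 100_000          # database size (placeholder)
r = 2.5              # population growth parameter (placeholder)
p_detect_sib = 1e-4  # P(detected as sibling) for a random individual (placeholder)

# Candidate siblings are the database members in the target's generation,
# a fraction (r/2)/(1 + r/2) of the database, as in Eq. (9).
n_same_generation = R * (r / 2) / (1 + r / 2)
p_sib = p_at_least_one(n_same_generation, p_detect_sib)
```

Even with a small per-individual detection probability, a large database makes the overall probability of at least one detectable relative substantial, which is the qualitative behavior plotted in Figure S13.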
Since 1C1R is the only 4th-degree relationship we consider, the probability of at least one 4th-degree relative being detected in the database is the same as in Eq. (13).

5th-degree:
The only 5th-degree relationship we need to consider is that of second cousins.
Second cousins (2C). The formulas for 2C follow from the generalized formulas for same-generation relatives, as for siblings and 1C: Bin(k; 6L + 22, e^−6m/16). Since 2C is the only 5th-degree relationship we consider, the probability of at least one 5th-degree relative being detected in the database is the same as in Eq. (14).

Figure S2: Empirical dosage distribution of imputation output by query match cardinality
Figure S3: Perfect reconstruction when imputing queries with unique matches
Figure S4: Distribution of number of mismatches in reconstructed haplotypes
Figure S5: Reconstruction performance vs. reference panel size
Figure S6: Reconstruction performance vs. reference panel ancestry composition
Figure S7: Haplotype reconstruction against imputation algorithms other than minimac
Figure S8: Haplotype reconstruction performance on a sub-chromosome chunk
Figure S9: Graphical illustration of our haplotype linking algorithm
Figure S10: Preliminary chunk linking using imputation
Figure S11: Preliminary chunk linking using semi-kinship (SK) information
Figure S12: Haplotype linking performance on chromosome-length haplotypes
Figure S13: Probability of a relative match in a database based on a population genetic model
Figure S16: Haplotype linking running time and computational requirements
Figure S17: Haplotype reconstruction on MAF-filtered reference panels
Figure S18: Query match cardinality classifier accuracy
Figure S19: Haplotype linking with independently estimated SK coefficient prior distributions

Figure S5. Reconstruction performance vs. reference panel size. Haplotype reconstruction was run using as reference panel: (A) the full 1KG panel (2504 samples; 5008 haplotypes), (B) a random subsample of 1KG of 1250 samples (2500 haplotypes), and (C) a random subsample of 1KG of 625 samples (1250 haplotypes). For each experiment, we plotted the trajectory of reconstruction progress (left) and recorded the number of queries associated with each pair of ground truth match cardinality type and the predicted type from the classifier (right). Note that the "Other" type includes both 0 matches and 6 or more matches, and the classifier does not distinguish between the two cases. A reconstructed haplotype was "correct" if it had no more than 100 genotype differences from a RP haplotype and that closest haplotype had not been reconstructed correctly previously. The count of "incorrect" haplotypes was incremented if a reconstructed haplotype had more than 100 genotype differences from the closest RP match and was sufficiently different (>100 mismatches) from the previous incorrect haplotypes. Holding constant the population from which the samples are drawn allows us to better examine the effect of panel size. As panel size increases, the correct reconstruction rate increases and the query misclassification rate decreases. We further discuss these effects in Supplementary Note 1.

Figure S6. Reconstruction performance vs. reference panel ancestry composition. Haplotype reconstruction was run using as reference panel: (A) a random subsample of 1000 Genomes Phase 3 (1KG) of 1250 samples (2500 haplotypes), (B) the full All of Us Asian American (AS) panel used in our Fig. 2 experiments (1250 samples), and (C) a random 1250-sample subset of the All of Us Black or African American (AFR) panel used in our Fig. 2 experiments. For each experiment, we plotted the trajectory of reconstruction progress (left) and recorded the number of queries associated with each pair of ground truth match cardinality type and the predicted type from the classifier (right). Note that the "Other" type includes both 0 matches and 6 or more matches, and the classifier does not distinguish between the two cases. A reconstructed haplotype was "correct" if it had no more than 100 genotype differences from a RP haplotype and that closest haplotype had not been reconstructed correctly previously. The count of "incorrect" haplotypes was incremented if a reconstructed haplotype had more than 100 genotype differences from the closest RP match and was sufficiently different (>100 mismatches) from the previous incorrect haplotypes. Holding panel size constant allows us to better examine the effect of panel ancestry composition. Although the reconstruction curves look quite similar, the underlying query matching and classification patterns are different, indicating an interplay of different factors. We further discuss these effects in Supplementary Note 1.

Figure S8. Haplotype reconstruction performance on a sub-chromosome chunk. Haplotype reconstruction based on (A) fractional (expected) imputed dosages and (B) discrete genotype predictions was run with the first 20-Mbp chunk of chromosome 20 of 1KG as the reference panel, using minimac4 for imputation. The results are similar to those for reconstructing full chromosomes in Fig. 2, but reconstruction occurred here at a slightly lower rate. A reconstructed haplotype was "correct" if it had no more than 100 genotype differences from a RP haplotype and that closest haplotype had not been reconstructed correctly previously. The count of "incorrect" haplotypes was incremented if a reconstructed haplotype had more than 100 genotype differences from the closest RP match and was sufficiently different (>100 mismatches) from the previous incorrect haplotypes. Horizontal dotted lines represent percentages of the total number of haplotypes in a RP.

Figure S9. Graphical illustration of our haplotype linking algorithm. We depict the key components of our probabilistic model of haplotype linking (A) and the flow network used for the minimum-cost flow component of our algorithm (B). As shown in (A), the main variables of the problem include: Θ, representing the assignment of input haplotypes to target individuals; R, representing the degree of relatedness between each target individual and relative set individual; and S, the observed semi-kinship coefficients between the input haplotypes and relative set individuals. Note that the linking problem is to infer Θ based on S, where R is also unknown. The network in (B) is constructed separately for each chromosome or chunk being considered for linking. Each input haplotype is a source node with an input flow of 1, which flows through the network to reach the sink node. The cost (a) and capacity (b) of each edge are indicated as a/b. In the intermediate layer of the network, each target set individual is represented by a node, whose incoming flow determines which haplotypes are assigned to that individual. The cost of assignment is set to −π_i,j, defined in Methods, which creates the equivalence between the min-cost flow problem on this network and the optimization of Θ in the M-step of our linking algorithm. Note the outgoing capacity of 2 for each intermediate node, which ensures no more than two haplotypes are linked to the same individual for each (diploid) chromosome or chunk. An extra intermediate node (Unassigned) with infinite capacity serves as a sink for input haplotypes that could not be assigned to an existing group. See Methods for additional algorithm details.
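The flow network in (B) can be sketched with networkx as below. This is a minimal illustration, not the study's implementation: the node names, the affinity values π, and the integer cost scaling are all assumptions made for the example.

```python
# Minimal sketch of the min-cost flow network in (B), using networkx.
# Haplotypes are sources with unit flow; each individual node has outgoing
# capacity 2 (at most two haplotypes per diploid chromosome/chunk); an
# Unassigned node absorbs leftover flow. Affinities (pi) and the integer
# cost scaling are illustrative placeholders, not the study's actual values.
import networkx as nx

def assign_haplotypes(pi, individuals):
    """pi: dict (haplotype, individual) -> affinity in [0, 1]."""
    haps = sorted({h for h, _ in pi})
    G = nx.DiGraph()
    G.add_node("s", demand=-len(haps))   # super source
    G.add_node("t", demand=len(haps))    # sink
    for h in haps:
        G.add_edge("s", h, capacity=1, weight=0)
        G.add_edge(h, "UA", capacity=1, weight=0)  # unassigned fallback
        for i in individuals:
            # cost -pi creates the equivalence with maximizing affinity;
            # network_simplex prefers integer weights, so scale and round
            G.add_edge(h, i, capacity=1, weight=-int(round(100 * pi[(h, i)])))
    for i in individuals:
        G.add_edge(i, "t", capacity=2, weight=0)   # at most 2 haplotypes each
    G.add_edge("UA", "t", capacity=len(haps), weight=0)
    flow = nx.min_cost_flow(G)
    return {h: next(v for v, f in flow[h].items() if f == 1) for h in haps}

pi = {("h0", "A"): 0.9, ("h1", "A"): 0.8, ("h2", "A"): 0.1,
      ("h0", "B"): 0.2, ("h1", "B"): 0.3, ("h2", "B"): 0.95}
assignment = assign_haplotypes(pi, ["A", "B"])
```

The capacity of 2 on each individual-to-sink edge is what enforces the diploidy constraint described in the caption, and the zero-cost path through "UA" plays the role of the Unassigned node.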

Figure S12 .
Figure S12. Haplotype linking performance on chromosome-length haplotypes. We first visualize the distribution of per-chromosome semi-kinship (SK) coefficients across different degrees of relatedness (1st, 2nd, and 3rd), compared to unrelated individuals (A). Markers indicate the mean, and error bars indicate standard deviation. The distributions of the larger (max) and the smaller (min) SK values between the two target haplotypes, compared against the same relative set, are plotted separately. Elevated SK for related pairs distinguishes reconstructed haplotypes from the same individual, enabling them to be linked. (B) Left, the average number of haplotypes linked, by degree of available relative. Error bars indicate standard deviation. "Incorrect" haplotypes refer to haplotypes assigned to the wrong individual. The rightmost bar represents unassigned (UA) sets, not assigned because the majority of haplotypes did not come from the same individual, with the number of such sets indicated in parentheses. Right, estimated proportion of individuals and their genomes (in base pair length) which an adversary could expect to successfully link, given access to an nth-degree relative for those individuals. Each point (p, g) on the curve indicates that at least proportion g of the genome could be linked for proportion p of the samples. These curves show smoothed cumulative distributions summarized in the bar chart (left). (C) Estimated proportion of RP individuals and the proportion of their genomes (in bp) we could expect an adversary to be able to link, given access to a relative set containing a particular fraction of the population to which the RP individuals belong. Each point (p, g) on the curve indicates that at least proportion g of the genome could be linked for proportion p of the samples. Our estimation leverages a population genetic model to calculate the probability of an adversary having access to relatives of different degrees, on which basis the degree-specific distributions in (B) are combined with weights. We observe, comparing to Fig. 3, that an adversary could reassemble a greater portion of reference panel genomes using chromosome-length haplotypes, compared with chunk haplotypes, given the same relative set.
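To make the idea behind panel (A) concrete, the following is a minimal sketch of a semi-kinship-style allele-sharing score between a single haplotype and a relative's diploid genotypes. This is an illustrative statistic only, not the paper's exact SK estimator (`sk_like_score` and its 0/1/2 genotype coding are assumptions for this example): at each site it takes the probability that an allele drawn at random from the relative equals the haplotype's allele, then averages over sites.

```python
def sk_like_score(hap, rel_gt):
    """Illustrative allele-sharing score between one haplotype
    (alleles coded 0/1 per site) and a relative's diploid genotypes
    (0/1/2 alternate-allele counts per site).  Per site, this is the
    probability that an allele drawn uniformly from the relative
    matches the haplotype allele; the score is the mean over sites.
    A sketch of the semi-kinship idea, not the paper's estimator."""
    total = 0.0
    for h, g in zip(hap, rel_gt):
        total += g / 2.0 if h == 1 else 1.0 - g / 2.0
    return total / len(hap)
```

Because a related individual carries alleles identical by descent with the target haplotype, this score is systematically elevated for true relatives versus unrelated individuals, which is the separation panel (A) visualizes and the linking algorithm exploits.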

Figure S13. Probability of a relative match in a database based on a population genetic model. The probability, according to our model, of a target reference panel individual having at least one nth-degree relative in an external database, for a given size of that database. 1st-degree (e.g., sibling) through 5th-degree (e.g., second cousin) relatives are shown. Database sizes are displayed as a proportion of the underlying population. The 1st- and 2nd-degree curves partially overlap and are difficult to distinguish. See Methods for a more detailed explanation of the model and its assumptions.
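A simplified version of this kind of calculation can be sketched as follows. Assuming an individual has a known number of nth-degree relatives and the database samples a fraction of the population uniformly at random, the match probability is one minus the probability that every relative is missed. The function name and these independence assumptions are ours for illustration; the paper's model (see Methods) may differ in its details.

```python
def p_relative_in_db(num_relatives, db_fraction):
    """Probability that at least one of an individual's nth-degree
    relatives appears in a database covering `db_fraction` of the
    population, assuming the individual has `num_relatives` relatives
    of that degree and inclusion is independent and uniform.
    Illustrative sketch of the model class described in Methods."""
    return 1.0 - (1.0 - db_fraction) ** num_relatives
```

Since more distant degrees of relatedness come with more relatives on average (e.g., far more second cousins than siblings), the curves for distant degrees rise faster with database size under a model of this form, consistent with the figure.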

Figure S15. Haplotype reconstruction running time. (A) Haplotype reconstruction was run using as a reference panel the first 20-Mbp chunk of 1000 Genomes Phase 3 (1KG) chromosome 20 (2504 samples) and two random subsamples of this panel of 1250 and 625 samples, with running time recorded at intervals. Imputation was performed over a chunk in order to draw a clearer analogy to running on the Michigan Imputation Server (MIS), which imputes on chunks separately. A single thread was used for both imputation and the reconstruction steps around it to simplify analysis. The imputation portion of the total runtime is plotted separately, which shows that a majority of the runtime is spent on imputation. (B) We estimate running time for a reconstruction attack given different possible amounts of imputation server latency. Here, we consider latency to be the time elapsed between submission of a query file to the server and the return of results, excluding the time spent running the imputation software. As a solid line representing running time with zero latency, we display again the 2504-sample total runtime from (A). We include additional runtime estimates given hypothetical amounts of imputation server latency per 128-query batch submitted to the server (60 s and 120 s), and another estimate extrapolating from the actual measured total time for processing a 128-query batch on MIS against 1KG: 119 seconds per batch (note that this is total time, not latency, since we could not observe directly how long the imputation software was running). We use a 128-query batch because this is the number of queries constructed by our strategy for a single set of 8 variants. We see from these results that, while the resources and latency of the server modify the time required for a hypothetical reconstruction attack, the attack remains feasible in realistic settings. We note that per-query latency might be reduced by grouping batches of queries so that more than 128 are submitted to the server together. Using multiple server accounts for the imputation portion would also reduce the total running time.
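The extrapolation in panel (B) amounts to simple arithmetic: the measured zero-latency runtime plus a per-batch latency term scaled by the number of batches. The helper below is our own sketch of that calculation (the function name and parameters are illustrative, not from the paper's code).

```python
import math

def estimated_attack_time_s(zero_latency_runtime_s, n_queries,
                            latency_per_batch_s, batch_size=128):
    """Estimated total wall-clock time for a reconstruction run:
    measured runtime with zero server latency, plus per-batch latency
    for each batch of queries submitted.  A batch of 128 queries
    corresponds to one set of 8 variants in the query strategy."""
    n_batches = math.ceil(n_queries / batch_size)
    return zero_latency_runtime_s + n_batches * latency_per_batch_s
```

For example, a run requiring 256 queries with 60 s of latency per 128-query batch adds two batches' worth of latency (120 s) on top of the base runtime, which is why grouping more queries per submission, or spreading batches across multiple server accounts, shortens the total.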

Figure S16. Haplotype linking running time and computational requirements. Shown are running time and peak memory usage for a single iteration of our probabilistic inference haplotype linking algorithm, run on subsamples of the All of Us (AoU) panels used for the primary chunk linking experiment shown in Fig. 3. (A) Both running time and peak memory usage grow roughly linearly with the number of samples included in the target set (with the relative set held fixed at 1000 samples). Note that there are 310 20-Mbp chunk haplotypes per diploid target sample in the set of haplotypes input to the linking algorithm, as in the primary experiment. (B) Both running time and peak memory usage grow roughly linearly with the number of samples in the relative set (with the target set held fixed at 1000 samples). Memory usage could be further reduced by processing the samples in a streaming fashion.

Figure S18. Query match cardinality classifier accuracy. Our random forest classifier predicts the number of reference panel haplotype matches to a query, given the dosages output when the query is imputed. The classifier was trained on 100 dosage outputs (from imputation using 1KG chromosome 20) from each of the six match-number classes (i.e., produced by a query known to have n matches): 1 through 5 and Other, where Other includes both no matches and greater than 5 matches. Here, the classifier was tested on another 100 sets of dosage data from each of the six classes. Each cell in the confusion matrix at column x and row y indicates the number of times a query which actually has y haplotype matches in the reference panel was predicted to have x haplotype matches. Additional evaluation in the setting of a full reconstruction pipeline is provided in Figures S5 and S6.
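For readers unfamiliar with the matrix layout described above, the tallying can be sketched as follows. This does not re-create the random forest itself; it only shows how true and predicted match-number classes are counted into a six-class confusion matrix (the function and class labels here are illustrative).

```python
def confusion_matrix(true_labels, pred_labels,
                     classes=("1", "2", "3", "4", "5", "Other")):
    """Tally a confusion matrix in the layout described in the
    caption: the entry at row y, column x counts queries whose true
    match class is y and whose predicted class is x."""
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        m[idx[t]][idx[p]] += 1
    return m
```

A perfect classifier concentrates all counts on the diagonal; off-diagonal mass such as the misclassifications discussed in Figure S20 appears as nonzero entries elsewhere in the row for the true class.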

Figure S20. Examples of misclassified queries and their dosage output. Each of (A) and (B) shows a query sequence of 8 variants on chromosome 20 and its dosage output when imputed against 1000 Genomes Phase 3 (1KG). Each query is represented in terms of variant base positions (BPs), alternative alleles (ALT), and corresponding genotype assignments (GTs), with 0 representing the REF allele and 1 representing the ALT allele. Dosages over the full length of chromosome 20 are shown. The query shown in (A) does not match any reference panel haplotypes, but was misclassified as having a single match. Observe the extreme dosage values to the left of the query's location, covering most of the window, which resemble the pattern typical for a query with a single match and fool the classifier. This error can be prevented by restricting query construction to positions closer to the middle of the chromosome. The query shown in (B) matches exactly one reference panel haplotype, but is assigned a prediction of "Other" by the classifier, meaning zero or more than 5 matches. Notice that it does not display the pattern we would expect from a query with one match, which explains why it is misclassified. See Supplementary Note 1 for a more detailed discussion of why these unexpected patterns occur and how their frequency may vary depending on the panel used for imputation.