Discovery and functional prioritization of Parkinson’s disease candidate genes from large-scale whole exome sequencing

Background Whole-exome sequencing (WES) has been successful in identifying genes that cause familial Parkinson’s disease (PD). However, until now this approach has not been deployed to study large cohorts of unrelated participants. To discover rare PD susceptibility variants, we performed WES in 1148 unrelated cases and 503 control participants. Candidate genes were subsequently validated for functions relevant to PD based on parallel RNA-interference (RNAi) screens in human cell culture and Drosophila and C. elegans models. Results Assuming autosomal recessive inheritance, we identify 27 genes that have homozygous or compound heterozygous loss-of-function variants in PD cases. Definitive replication and confirmation of these findings were hindered by potential heterogeneity and by the rarity of the implicated alleles. We therefore looked for potential genetic interactions with established PD mechanisms. Following RNAi-mediated knockdown, 15 of the genes modulated mitochondrial dynamics in human neuronal cultures and four candidates enhanced α-synuclein-induced neurodegeneration in Drosophila. Based on complementary analyses in independent human datasets, five functionally validated genes—GPATCH2L, UHRF1BP1L, PTPRH, ARSB, and VPS13C—also showed evidence consistent with genetic replication. Conclusions By integrating human genetic and functional evidence, we identify several PD susceptibility gene candidates for further investigation. Our approach highlights a powerful experimental strategy with broad applicability for future studies of disorders with complex genetic etiologies. Electronic supplementary material The online version of this article (doi:10.1186/s13059-017-1147-9) contains supplementary material, which is available to authorized users.


Background
Next-generation sequencing (NGS) approaches have recently accelerated the identification of variants responsible for familial Parkinson's disease (PD) [1][2][3][4]. While a positive family history is common in PD, large, multigenerational pedigrees, especially with available DNA and clinical evaluations, remain exceptional, hindering progress in unraveling the genetic underpinnings. Importantly, several genes initially discovered to cause PD in families, such as LRRK2, GBA, and PARK2/parkin, were subsequently discovered with surprisingly high frequency in "sporadic" PD cohorts [5,6]. To date, large population samples of individuals with PD have primarily contributed to the discovery of common variant susceptibility loci, based on genome-wide association studies (GWAS) of case/control cohorts [7]. The variants identified by GWAS have modest effect sizes and collectively fail to account for current estimates of PD heritability [8,9]. Considering the above, it seems likely that additional less common alleles, with larger effect sizes, contribute to PD risk in the population and NGS is one promising approach to identify such alleles. Despite recent successes in other neurodegenerative disease with complex genetic etiologies, including Alzheimer's disease [10][11][12] and amyotrophic lateral sclerosis [13,14], sequencing has yet to be deployed in large, unrelated PD case/control samples for rare variant discovery.
The successful discovery of rare variant risk alleles in population-based PD samples faces a number of potential challenges. Perhaps most importantly, analyses of rare variants in large family pedigrees is greatly facilitated by segregation analysis which is not possible in cohorts of unrelated individuals, leading to an increased number of candidate variants to consider. Assumptions of a recessive inheritance model and the application of stringent filters, such as consideration of only strongly damaging, loss-of-function (LoF) variants, is one potential solution, but this is likely to miss many important variants, including dominantly acting alleles. Further, PD is characterized by extensive genic and allelic heterogeneity and extremely large cohorts may be required to document sufficient numbers of cases to facilitate meaningful statistical comparisons [15]. Lastly, as PD is: (1) common (~1-3% prevalence); (2) strongly agedependent; and (3) often preceded by a prolonged presymptomatic or minimally symptomatic phase, we may expect to find truly pathogenic rare variants, including those with large effect sizes, in "control" cohorts of adults (due to unrecognized or early disease stages with minimal symptoms). Therefore, given the occurrence of rare variants, including potentially damaging variants, in most genomes of presumably healthy individuals [16], it may be difficult to identify genes/variants that truly cause disease. Importantly, recent advances in cellular and animal models, along with improved understanding of PD pathogenesis, enable an integrated approach, in which variant discovery is coupled with a functional screening pipeline for prioritization of those genes worthy of more intensive study.
In this collaborative study of the International Parkinson's Disease Genomics Consortium (IPDGC), we report the results of whole-exome sequencing (WES) in 1148 PD cases, the largest such cohort examined to date. Consistent with the younger age of PD onset in this cohort, which is often associated with a recessive inheritance [17][18][19], and to prioritize candidate genes/variants for initial investigation, our analysis focuses on genes with homozygous or compound heterozygous LoF variants. We further couple the human genetic studies with functional screening in mammalian cell culture and invertebrate animal models, successfully identifying those candidate genes showing interactions with established PD mechanisms, including mitochondrial dynamics and α-synuclein-mediated neurodegeneration. Although no sufficiently powered exome dataset was available for definitive replication, human genetic validation was undertaken in several independent datasets. Our integrated approach identifies five strong candidate PD susceptibility genes worthy of further investigation, and exemplifies a powerful strategy with potential broad applicability to the follow-up of future rare variant studies in PD and other neurologic disorders with complex genetic etiologies.

Discovery of recessive LoF variants from PD exomes
A total of 920,896 variants (93.2% single nucleotide variants and 6.8% insertions and deletions) were called in a WES dataset of 1651 participants, including 1148 young-onset PD cases (average age of onset, 40.6 years; range, 5-56 years) and 503 control participants with European ancestry. As our cohort has an average age at onset of less than 45 years, we focused our search on homozygous and putative compound heterozygous variants, consistent with a recessive inheritance model. Although most PD cases were prescreened for mutations in established PD genes, we identified two participants with homozygous exonic variants in parkin and PINK1 (Additional file 1: Table S1). In order to identify novel PD gene candidates, we focused on variants that are rare in control populations. Considering the worldwide prevalence for PD (0.041% in individuals aged 40-49 years) [20], we used a minor allele frequency (MAF) threshold of 1% and only considered LoF variants causing a premature stop codon or splicing site mutations (see "Methods"). When co-occurring with a heterozygous LoF variant, we also considered rare, heterozygous aminoacid changing missense alleles that were predicted to be deleterious (CADD > 20), consistent with a compound heterozygous recessive genotype. Figure 1 displays each variant filtering step along with the corresponding numbers of implicated variants. Following Sanger sequencing confirmation, we identified a total of 27 candidate genes-18 genes encompassing homozygous variants and nine genes harboring putative compound heterozygous variants-all predicted to cause a loss of gene function (Table 1). Approximately 17% of the variants are absent in public allele frequency databases (1000 Genomes Project (1000G), Exome Sequencing Project v. 6500 (ESP6500), or Exome Aggregation Consortium (ExAC)) and therefore implicated to be novel. Except in the case of ARSB, the other 26 genes harbor LoF variants in only a single case, consistent with the hypothesis that novel recessive PD alleles may consist of many rare, "private" mutations. Four PD cases in our cohort were identified with a LoF variant in the ARSB gene, in which mutations have previously been linked with the recessive lysosomal storage disorder, MPS VI (also called Maroteaux-Lamy syndrome). All four individual cases, along with one control participant, were homozygous for a variant (rs138279020) predicted to disrupt splicing. Although this variant is neither reported in ExAC nor was frequency information available from dbSNP, the MAF was 0.065 in our cohort (MAF CASES = 0.073, MAF CONTROLS = 0.052, p = 0.054). Although relatively frequent in our control dataset (MAF > 1%), we have retained it among our results, based on three considerations. First, information was not present in dbSNP, ExAC, or ESP6500, which was the basis for applying this frequency filter in all other cases. Second, at least one of the homozygous individuals had clinical manifestations consistent with MPS VI, supporting potential pathogenicity of this allele (see "Discussion"). Lastly, as detailed below, our functional Fig. 1 Flowchart explaining multiple filtering steps to select LoF variants with assumed recessive inheritance pattern. Functional annotation was performed with transcripts of RefSeq and UCSC databases. MAF annotations were based on 1000 Genomes project, Exome variant Server, and the ExAC database. Seventeen genes harbored homozygous variants causing stopgain or loss and one gene contained a homozygous splicing variant. For the putative compound heterozygous genes, six genes were selected based on the presence of two LoF variants, and three genes were based on the presence of one LoF variant and one missense variant (predicted to belong to the 1% most harmful variants of the genome) studies identify links between manipulation of ARSB and cellular/organismal phenotypes consistent with a potential role in PD. Of note, while the analyses of the IPDGC WES dataset and subsequent work described here were in progress, an independent family-based sequencing study identified VPS13C as a cause of autosomal recessive parkinsonism [21]. Although the single IPDGC subject with compound heterozygous VPS13C LoF alleles was published as a replicate case in that work, we retained it among the 27 candidates described here, since it was independently carried forward for all analyses detailed below.

Tolerability of gene LoF in humans and animal models
The "tolerability" of recessive LoF genotypes has important implications for understanding the genetic basis of adult-onset, age-influenced disorders such as PD. As most of the identified homozygous and putative compound heterozygous LoF genotypes are based on a single individual, we also examined for their occurrence in a large, recently published study [16] of predicted complete gene knock-outs in the Icelandic population, including 104,220 participants with imputed genotypes, based on whole genome sequencing from a subset of 2363 individuals. The Icelandic population is enriched for rare disease-causing mutations with a recessive inheritance pattern, given a strong founder effect and non-random mating patterns. Twelve of the variants that we identified are also present in the Icelandic study (Additional file 1: Table S2); however, the observed homozygote frequencies are not sufficiently high to confidently exclude them as possible PD genes and importantly, detailed phenotypic data are not publicly available for these participants. For example, 29 Icelandic participants are reported homozygous for the identical PTCHD3 stopgain variant (c.C1426T, p.R476X) as the single PD case in our WES study. However, this is only 0.028% of the total sample set and below the reported prevalence of young-onset PD (0.041%).
We additionally examined for the presence of other LoF variants with a recessive inheritance pattern in our implicated candidate genes (Additional file 1: Table S2). For a subset of genes, we indeed identified several variants with particularly high homozygote frequencies including OR7G3 (9.16%), SSPO (9.38%), and PTCHD3 (16.55%). This is consistent with prior reports describing a homozygous deletion covering PTCHD3 in apparently healthy individuals, consistent with a non-essential role [22]. Assuming that the variants in OR7G3, SSPO, and PTCHD3 confer similar LoF to the alleles identified in our PD WES data, their high variant frequency makes these genes unlikely to be highly penetrant PD-risk loci.
Human genes harboring homozygous LoF variants-especially those observed recurrently in large population-based datasets-potentially identify genes that are dispensable for fetal and subsequent child development. Given the limited human phenotypic information available, we further investigated the potential tolerability for the implicated genes using a cross-species approach, performing systematic LoF analysis in the nematode, C. elegans. Out of the 27 candidate genes identified in our WES analysis, ten were well conserved in the C. elegans genome and nine had readily available RNA-interference (RNAi) reagents for LoF screening (see "Methods"). Each gene was targeted for knockdown using RNAi and we assessed for developmental lethality and survival. The results of these studies, along with other LoF data from public databases, are available in Additional file 1: Table S3. Knockdown of homologs of DIS3 (dis-3), KALRN (unc-73), and PTCHD3 (ptr-10) resulted in developmental arrest and/or reduced survival in C. elegans. Notably, homologs of KALRN and DIS3 are also associated with reduced viability following genetic disruption in both Drosophila [23,24] and mice [25,26]. Thus, these results are potentially consistent with conserved, early, and/or essential developmental roles for these genes and the absence of individuals harboring homozygous LoF variants in the Icelandic cohort [16].
Since the human genome contains multiple gene paralogs for KALRN and PTCHD3, genetic redundancy might account for how LoF might be tolerated in humans but not in simple animal models. Alternatively, it is possible that the allelic variants implicated in our PD WES cohort and Icelandic study might not cause a complete LoF (i.e. genetic null) despite the algorithmic predictions, instead causing only a partial LoF. Nevertheless, these cross-species comparisons suggest essential and early developmental roles for homologs of PTCHD3, DIS3, and KALRN, and informing our consideration of potential contribution to adult-onset disorders, such as PD.

Variant aggregation analyses
For the 27 genes implicated based on our primary analyses of homozygous or compound heterozygous LoF variants, we additionally considered evidence for the presence of other allelic variants conferring risk for PD in our cohort. We therefore performed burden analyses leveraging our IPDGC WES data, testing two nested classes of variants: (1) a subset predicted to be deleterious (CADD > 20); and (2) all amino-acid changing missense alleles. Rare variants (MAF < 0.018) were considered either selectively or in joint models with common variants (MAF > 0.018). As detailed in Additional file 1: Table S4, the rare variant aggregation association analyses provided further evidence in support of four candidate genes: GH2, PTPRH, UHRF1BP1L, and ZNF453. Interestingly, the burden association at the PTPRH gene is further enhanced when common and rare variants are simultaneously modeled.
Our analyses of LoF variants in PD exomes identify a number of promising candidate genes. However, even though a positive family history was observed for almost 40% of the cases, segregation analysis of the variants in families is not feasible, as DNA samples are not available from additional family members. Further, since most of the genes implicated contribute to single or few cases, we are unable to perform meaningful statistical comparisons, based on the limited numbers of LoF variants identified by WES in cases versus controls. As an alternative strategy, we therefore deployed a combination of cell-based and model organism functional screens to define potential links between the 27 candidate genes (Table 1) and well-established mechanisms of PD susceptibility and pathogenesis, including (1) mitochondrial health and (2) α-synuclein-mediated toxicity.

Functional prioritization: mitochondrial health
Although the mechanism of neurodegeneration in PD remains incompletely defined and may be heterogeneous, mitochondrial dysfunction has been proposed to play an important role, particularly in young onset PD [27][28][29]. Notably, parkin (PARK2), DJ-1, and PINK1, associated with autosomal recessive, juvenile-onset Parkinsonism, have roles in mitochondrial dynamics and quality control [30]. Specifically, Parkin is an E3 ubiquitin ligase and recruited selectively to dysfunctional mitochondria with a low membrane potential [31]. Further, the neurotoxicity of α-synuclein, the primary constituent of Lewy body inclusions in PD, has also been linked to mitochondrial injury [32]. We therefore hypothesized that LoF in candidate genes identified from our analyses of WES, might similarly impact mitochondria, consistent with roles in PD susceptibility.
Therefore we quantified mitochondrial morphology after gene knockdown in BE(2)-M17 neuroblastoma cells by examining three parameters commonly used for quantification of mitochondrial morphology: mitochondrial number, axial length ratio, and roundness [33]. Cells transduced with the short hairpin RNA (shRNA) encoding a scrambled sequence were used for normalization and positive controls for mitochondrial morphology were included in each experiment. For example, knockdown of the mitochondrial fission gene dynamin 1-like (DNM1L), a positive control, results in elongated mitochondria and therefore decreases mitochondrial axial length ratio and roundness (Fig. 2a, b) [34]. Knockdown of 13 genes show a significant effect on at least one of the three parameters (Additional file 1: Table S5 and Table S6 and Additional file 2: Figure S1). GPATCH2L shows the largest increase in mitochondrial roundness, while UHRF1BP1L displays the largest decrease (Fig. 2c, d).
We also took advantage of a well-established Parkin translocation assay [31,[35][36][37][38] based on BE(2)-M17 human neuroblastoma cells stably expressing Parkin-GFP. As expected, upon exposure to the mitochondrial toxin and electron transport chain uncoupling reagent, CCCP, we observed robust translocation of Parkin-GFP from the cytoplasm (Fig. 3a, untreated) to the mitochondria (Fig. 3a, CCCP-SCR transduced) and this was PINK1dependent ( Fig. 3a, CCCP-PINK1 shRNA), which provides an internal, positive control in our assay. CCCP-induced Parkin accumulation was assessed by high-content microscopy and automated image analysis following systematic shRNA-knockdown of our 27 candidate genes (Fig. 3b). Based on stringent criteria (see "Methods"), six genes significantly modified Parkin translocation ( Fig. 3c and d; Additional file 2: Figure S2; Additional file 1: Table S5 and Table S6), including four genes (GPATCH2L, PTCHD3, SVOPL, and ZNF543) with consistent activities in both the mitochondrial morphology and Parkin translocation assays.
Functional prioritization: α-synuclein-mediated toxicity A wealth of evidence also supports a central role for αsynuclein-mediated toxicity in PD pathogenesis. αsynuclein aggregates, termed Lewy bodies, are the defining disease pathology and α-synuclein gene (SNCA) mutations, locus multiplication, and promoter polymorphisms are associated with PD susceptibility [5]. Further, expression of α-synuclein in numerous animal models including in the fruit fly [39][40][41], Drosophila melanogaster, recapitulates features of PD-related neurodegenerative pathology. Transgenic expression of α-synuclein in the fly retina leads to neurotoxic changes [39] and is amenable for detection of genetic modifiers [42,43]. Genetic manipulation of established PD susceptibility genes, including PARK2 [44,45] and VPS35 [46], modulate α-synuclein toxicity in transgenic flies, similar to findings in mammalian models [44,47]. We therefore hypothesized that LoF in homologs of novel PD genes may similarly enhance α-synucleininduced retinal degeneration.
Out of the 27 candidate genes implicated by our WES analyses, 13 were well-conserved in Drosophila (Additional file 1: Table S7). Available RNAi stocks targeting each of the 18 fly homologs (some genes had multiple conserved paralogs) were crossed to flies in which the human α-synuclein transgene was directed to adult photoreceptors using the Rhodopsin1-GAL4 (Rh1) driver (Rh1 > α-synuclein) [48]. For rapid screening, retinal neurodegeneration was monitored using the optical neutralization technique which allows assessment of retinal tissue integrity in intact, unfixed heads. In Rh1 > α-synuclein animals, the retina appears morphologically normal at 1 day ( Fig. 4), but demonstrates age-dependent degeneration leading to progressive vacuolar changes, rhabdomere loss, and culminating with extensive tissue destruction by 30 days. At the 15-day time point selected for screening, only mild, if any, retinal pathology is detectable on most histologic sections, consistent with a weakly penetrant degenerative phenotype following optical neutralization (mean penetrance~25%) (Fig. 4). However, co-expression of RNAi targeting fly homologs of four candidate genes (ARSB, TMEM134, A C B D Fig. 2 High-content assay for mitochondrial morphology. Effect of DNM1L shRNA (a, b) and UHRF1BP1L shRNA (c, d). BE(2)M17 cells stained with Hoechst (blue; nuclei), MitoTracker CMXros, and MitoTracker Deepred (yellow; mitochondria). a Cells infected with shRNA encoding a scrambled sequence (SCR, left panel) and decrease in mitochondrial axial length ratio and roundness for DNM1L (positive control, right panel). b The graph displays normalized mitochondrial roundness. c Cells infected with shRNA encoding a SCR sequence (left panel) and decrease in number of mitochondria per cell, mitochondrial axial length ratio, and roundness for UHRF1BP1L (right panel). d The graph displays normalized mitochondrial roundness. Data are median values ± median absolute deviation (MAD) of N = 6 measurements. *p < 0.05 and **p < 0.01, Mann-Whitney U test (see "Methods"). All values were normalized to the negative control (infected with SCR shRNA) and all shRNA clones that meet the cutoff criteria are shown (b, d) The graph displays the normalized ratio of cells positive for translocation and cells negative for parkin translocation. All values were normalized to the negative control (CCCP treated infected with shRNA encoding a scrambled sequence). Data are median values ± median absolute deviation (MAD) of N = 6 measurements. *p < 0.05, **p < 0.01, and ***p < 0.001, Mann-Whitney U test (see "Methods"). All shRNA clones that meet the cutoff criteria (see "Methods") are shown PTPRH, and VPS13C) was observed to robustly enhance α-synuclein-mediated neurodegeneration in the retina (mean penetrance~75%; Additional file 1: Table S8).
All candidate enhancers of α-synuclein identified using the screening assay were further confirmed based on retinal histology, demonstrating accelerated pathologic changes with a significantly increased overall extent and severity of degeneration compared to Rh1 > α-synuclein controls without RNAi transgenes present (Fig. 5). Importantly, when each of these genes were targeted under similar experimental conditions (Rh1 > RNAi), but independent of α-synuclein expression, we did not observe any significant retinal pathology in 15-day-old animals ( Fig. 5). Therefore, within the Drosophila α-synuclein transgenic model system, the implicated LoF enhancers appear consistent with synergistic (non-additive) effects on α-synuclein-mediated retinal degeneration. Since increased α-synuclein expression levels are one important mechanism of PD susceptibility [5], western blot analyses were performed to determine whether any of the identified genetic enhancers alter α-synuclein protein levels. However, following RNAi-mediated knockdown, none led to significant changes (Additional file 2: Figure S3). Thus, we hypothesize potential interactions with more downstream mechanisms of α-synuclein neurotoxicity. For 3 out of 4 candidate enhancers (ARSB, VPS13C, PTPRH), available siRNAs permitted additional testing of gene homologs as candidate modifiers in an established C. elegans model of α-synuclein toxicity [49]. However, no significant differences were detected in the α-synuclein-induced locomotor phenotype observed in one-week-old worms following knockdown of these genes (Additional file 2: Figure S4). We speculate that these contradictory results might stem from differences in assay sensitivity and/or tissue-specific toxic mechanisms as the fly and worm models are based on α-synuclein expression in the retina versus muscle, respectively.

Genetic replication of candidate PD genes from WES
We next evaluated our 27 gene candidates in additional available genetic datasets including: (1) an independent exome sequencing dataset from the Parkinson Progression Markers Initiative (PPMI) project [51]; (2) a wholegenome sequencing dataset including PD index cases of a Dutch genetic isolate belonging to the Genetic Research in Isolated Population (GRIP) program [52]; (3) an independent NeuroX exome array dataset [7,53]; and (4) a large PD GWAS dataset [53]. Within the PPMI exome dataset, including 462 PD cases and 183 controls, evidence supporting replication was discovered for two genes, in which we identified the identical variants from the IPDGC discovery exome dataset (Additional file 1: Table S9). A PD case from PPMI carries the same homozygous stopgain variant (p.R362X) in GPATCH2L as observed for an IPDGC case. Although the age of onset differs 20 years between these two PD cases (47 and 68 years for the IPDGC and PPMI patients, respectively), they share similar asymmetric clinical symptoms at onset, which are characterized by resting tremor, bradykinesia, and rigidity. Furthermore, both PD cases have a father diagnosed with PD, implying the variant to be highly penetrant. We excluded the possibility that these two PD cases might be related by computing pairwise genetic relationships [54] from common SNPs (MAF ≥ 0.01). No evidence of relatedness was observed (A jk = −0.0018). Based on ExAC, only one (0.003%) out of 32,647 European individuals has this same homozygous variant. The observation of two PD cases (0.12%) of our 1610 studied PD patients (1148 IPDGC WES plus 462 PPMI WES) with this GPATCH2L mutation is consistent with a 40-fold enrichment in our PD cohort. The second gene harboring an identical LoF variant is FAM83A. The p.G86X variant in FAM83A, detected within an IPDGC participant with sporadic PD diagnosed at the age of 28 years, was also observed in a single sporadic PD case from PPMI with an age of onset of 62 years. These FAM83A carriers presented with similar symptoms, including bradykinesa, rigidity, and resting tremor. In both datasets, the p.G86X allele is predicted to be in trans with another variant: p.R347X or p.V137G in PPMI and IPDGC, respectively.
The second genetic independent dataset that was investigated included a whole-genome sequencing study (39 PD index cases and 19 controls) of a genetic GRIP isolate from the Netherlands, focusing on variants within our candidate genes that were present in at least two PD index cases and absent in controls. We identified a heterozygous missense variant (NM_001127444:c.1176G > T:p.L392F) in CD36 for three PD index cases. Although not consistent with a recessive inheritance model, this variant has not been observed in the 60,706 unrelated individuals of the ExAC database, suggesting potential enrichment in PD cases. These heterozygote variant carriers have a substantial higher age of onset (range, 61-79 years) in comparison to the PD patient (age of onset, 38 years) with the putative compound heterozygous variant within the discovery WES dataset. This observation supports an additive model of pathogenicity, implying more severe disease onset when two alleles are affected. Further, CD36 (p.L392F) is predicted to represent the top 1% most harmful variants within the genome (CADD score = 23.3). In the IPDGC discovery dataset, the discovered compound heterozygous variants, p.Q74X and p.P412S (Table 1), are also predicted to be strongly deleterious (CADD scores of 26.5 and 25.9, respectively).
We next interrogated the independent IPDGC NeuroX dataset, including genotypes from 6801 individuals with PD and 5970 neurologically healthy controls. NeuroX is a genotyping array that includes pre-selected exonic variants and is therefore not suitable to search for the identical recessive LoF variants implicated by our WES analyses. Instead, we examined the burden of multiple variant classes within the 27 candidate genes, following the same variant categories as for the original IPDGC WES dataset (Additional file 1: Table S10). When only considering variants predicted to be deleterious (CADD > 20), an association is detected for UHRF1BP1L with PD risk (p = 0.005). This gene also shows an association with PD in the IPDGC WES dataset when performing a similar burden analysis considering missense variants (see above, p = 0.016). Using the NeuroX dataset, we additionally confirmed the enrichment of rare PTPRH variants in participants with PD (WES: p = 0.034, NeuroX: p = 0.045). Furthermore, VPS13C and ARSB show significant associations to PD when considering the joint effect of all variants, both common and rare (Additional file 1: Table S10).
Leveraging available IPDGC GWAS data (13,708 cases/ 95,282 controls), we next assessed for potential common variant association signals (p < 1 × 10 −4 ) using a 1-Mb genomic window centered on each of the 27 candidate genes. Three loci (VPS13C, PCDHA9, and TCHHL1) showed evidence consistent with an association peak (Additional file 2: Figure S6). A genome-wide significant association at the VPS13C locus, was in fact recently reported [7]; the best SNP (rs2414739, p = 3.59 × 10 −12 ) maps~150 kb distal to VPS13C. Based on local patterns of linkage disequilibrium defined by Hapmap (Additional file 2: Figure S6), it is unlikely that rs2414739 is a proxy for p.E3147X or similar LoF variants in VPS13C; however, it might be possible that the SNP influences VPS13C expression by affecting the long non-coding RNA lnc-VPS13C-1 [55] in which the SNP is located. The other two candidate association peaks, adjacent to PCDHA9 and TCHHL1, are considerably weaker signals (rs349129 = 1.40 × 10 −5 and rs7529535 = 7.66 × 10 −5 , respectively) and given the distances (~500 kb) many other candidate genes are potentially implicated.
In sum, we identify additional genetic evidence consistent with replication for seven genes (GPATCH2L, FAM83A, CD36, UHRF1BP1L, PTPRH, ARSB, and VPS13C) that were implicated by our WES analysis, of which five (GPATCH2L, UHRF1BP1L, PTPRH, ARSB, and VPS13C) are further validated based on functional evidence from PD-relevant experimental models.

Transcriptomics-based functional exploration
Lastly, we examined each candidate gene from our WES analysis for co-expression with established PD susceptibility gene in expression networks derived from human substantia nigra, leveraging available data from the United Kingdom Brain Expression Consortium (UKBEC) and the Genotype-Tissue Expression project [56]. Of the 27 candidate genes, seven were not sufficiently expressed in substantia nigra on the basis of UKBEC. Except for DIS3, these genes were also expressed poorly in publicly available data of the Genotype-Tissue Expression (GTEx) project [56]. Consequently, expression values for these genes were not used for construction of the UKBEC gene coexpression network (GCN). The remaining 20 genes were assessed for co-expression with known Mendelian PD genes (ATP13A2, FBXO7, LRRK2, PARK2, PARK7, PINK1, RAB39B, SNCA, and VPS35) using the UKBEC GCN (Additional file 1: Table S11 and Additional file 2: Figure S7). This approach highlighted three genes (UHRF1BP1L, GPATCH2L, and PTPRH) and the implicated networks were further interrogated based on gene set enrichment analysis using gene ontology (GO) terms to denote potential functions. UHRF1BP1L was coexpressed with SNCA, PINK1, GBA, and ATP13A2 in a network significantly enriched for genes with roles in synaptic transmission (p = 2.27 × 10 −11 ) as well as astrocytic (p = 8.18 × 10 −8 ) and dopaminergic neuronal markers (p = 3.98 × 10 −46 ). GPATCH2L was co-expressed with PARK7 in a network enriched for other neuronal genes (p = 3.41 × 10 −12 ) with cellular roles in metabolism of macromolecules (p = 3.82 × 10 −15 ). Lastly, PTPRH was assigned to a co-expression module including FBX07 and enriched for oligodendrocyte markers (p = 8.69 × 10 −22 ). Importantly, the implicated modules were preserved (Z.summary > =10) in the independent GTEx dataset.

Discussion
We report the results from WES analysis in the largest PD cohort studied to date. Assuming a recessive inheritance model, we identified 27 candidate genes harboring rare homozygous or compound heterozygous LoF variants. With the exception of ARSB, we did not identify recurrent recessive alleles in more than a single PD case. This result-potentially consistent with a highly heterogeneous genetic etiology for PD-creates significant barriers for statistical confirmation and genetic replication of novel PD susceptibility loci. Additional genetic samples were not available for segregation analysis and given the rarity and heterogeneity of the implicated alleles, definitive human genetic replication would likely require very large sample sizes, including many thousands of PD cases with either WES or gene resequencing. We therefore coupled our WES analyses with functional studies in both mammalian cells and experimental animal models, including Drosophila and C. elegans, in order to prioritize genes for future study. Our results highlight 15 out of the 27 gene candidates that interact with mitochondrial dynamics and five loci that enhance αsynuclein-mediated neurodegeneration. As discussed below, while these results highlight a promising subset of genes with potential links to PD-relevant mechanisms, we cannot exclude contributions from other implicated genes/variants. All of these data, including promising variants from the human genetic analyses and results of functional studies, will be a valuable resource for future investigations of PD genomics. Analyses of several other WES and complementary large-scale, genetic datasets provide additional evidence supporting replication for 7 out of 27 genes. Evidence from human genetics and functional studies converge to most strongly implicate five gene candidates discussed below; however, further investigation will be required to definitively link each of these loci to PD susceptibility and elucidate the relevant mechanisms. Nearly all of these genes are robustly expressed in brain [56], including the substantia nigra, thereby consistent with their implication in PD. A subset (GPATCH2L, UHRF1BP1L, and PTPRH) are co-expressed with established Mendelian PD genes in the substantia nigra based on analyses of UKBEC and GTEx expression data. In sum, our results define several promising new susceptibility loci candidates for further investigation and illustrate a powerful, integrative discovery strategy for future, large-scale PD genomic studies.
Mitochondrial mechanisms have been strongly implicated in PD risk and pathogenesis [28,30]. Following shRNA-mediated knockdown, 15 candidate recessive loci identified in our WES dataset showed effects on mitochondrial morphology and Parkin translocation to mitochondria in cell culture. We focus our initial discussion on three genes, GPATCH2L, UHRF1BP1L, and VPS13C, for which we discovered additional genetic evidence consistent with replication in independent cohorts. In the IPDGC cohort, a single PD case was identified with a homozygous stopgain variant (p.R362X) in GPATCH2L and a second individual with the identical, rare genotype was discovered in PPMI. This variant is reported with a low frequency of 0.003% in ExAC.
Although minimal clinical or demographic information is available within ExAC, this finding is compatible with population prevalence estimates for PD [20]. Nevertheless, genotyping of p.R362X in additional large PD case and control cohorts will be required to definitively establish an association with PD susceptibility. GPATCH2L knockdown both increased mitochondrial roundness and impaired Parkin translocation. The encoded protein, GPATCH2L, which has not previously been studied, contains a glycine-rich RNA-binding motif, the "Gpatch" domain [57]. GPATCH2, a paralog of GPATCH2L, is upregulated in cancer cells, localizes to the nucleus where it interacts with RNA-processing machinery, and manipulation in culture alters cell proliferation [58,59]. Notably, GPATCH2L is non-conserved in either the C. elegans or Drosophila genomes, precluding study of this candidate in these models. While our results using cellular assays implicate GPATCH2L in mitochondrial quality control mechanisms, further follow-up studies in mammalian model systems will be needed to confirm a role in PD pathogenesis.
Another promising gene, UHRF1BP1L, harbored a homozygous stopgain variant (p.K1376X) in a single IPDGC case. This is a novel variant, based on its absence from the ExAC cohort. Additional support for UHRF1BP1L as a bona fide PD locus comes from complementary analyses in both the IPDGC WES and Neu-roX datasets, documenting a burden of rare missense and LoF variants in association with disease risk. In the UKBEC, UHRF1BP1L was associated with a substantia nigra co-expression module including both SNCA and PINK1, reinforcing potential links with established PD genetic mechanisms. Indeed, UHRF1BP1L knockdown cause sharply reduced mitochondrial numbers and altered morphology. Interestingly, UHRF1BP1L encodes a protein bearing an amino terminal homologous to yeast VPS13 and studies in cell culture provide support for a role in retrograde transport from the endosome to the trans-Golgi network [60].
Notably, LoF in human VPS13C was also implicated by our analyses of IPDGC WES data and knockdown disrupted mitochondrial morphology. Besides the single IPDGC case, several families with autosomal recessive early onset Parkinsonism and dementia due to VPS13C were recently reported [21] and this locus also harbors common PD susceptibility variants based on GWAS [7].
Our findings of a potential mitochondrial role for VPS13C agree with those of Lesage et al. who additionally reported that VPS13C localizes to the outer membrane of mitochondria and LoF was associated with reduced mitochondrial membrane potential, fragmentation, and increased Parkin-dependent mitophagy. Importantly, VPS35, which causes autosomal dominant, late-onset PD, is similarly involved in endosomal trafficking [61] and has also recently been implicated in mitochondrial dynamics [62], including interactions with Parkin [63]. Like UHRF1BP1L, VPS13C and GPATCH2L are expressed in the brain, including within the substantia nigra; however, additional work will be needed to define their functions, including potential interactions with other established disease genes (e.g. VPS35, parkin) and requirements for mitochondrial maintenance.
Based on functional screening in Drosophila, four candidate genes from our WES analyses were implicated as LoF enhancers of α-synuclein neurotoxicity, which also has a central role in PD pathogenesis. We discuss the three genes (VPS13C, PTPRH, and ARSB) where additional human genetic evidence supports replication. Interestingly, besides its requirement for mitochondrial maintenance, RNAi-mediated knockdown of Drosophila Vps13 enhanced α-synuclein toxicity. In the single reported VPS13C PD case with a completed autopsy, neuropathological findings included abundant αsynuclein aggregates in both the brainstem and cortex [21]. Thus, VPS13C and associated endosomal sorting pathways (including VPS35) may represent a point of convergence for mitochondrial and α-synuclein-mediated PD mechanisms. Consistent with this, evidence for the impact of α-synuclein toxicity on mitochondria has recently emerged [28], including from studies in mammals [64].
In the IPDGC WES cohort, a single PD case was discovered with compound heterozygous LoF variants in PTPRH (p.Q887X and p.E200X). Both variants were also observed at low frequencies in the ExAC database (0.039% and 0.003%, respectively); however, they each met our prespecified threshold of < 1% based on the population prevalence of PD. Encoding a receptor protein tyrosine phosphatase, PTPRH (also called SAP-1) was first discovered for its potential association with gastrointestinal cancers [65,66] and remains poorly studied in the nervous system context. In studies of both vertebrates and invertebrates, receptor protein tyrosine phosphatases have been strongly implicated as key neural cell adhesion receptors, with roles in neurodevelopment and synaptic function, and other members of this family have been implicated in numerous neuropsychiatric disorders [67]. In Drosophila, RNAi-mediated knockdown of the conserved PTPRH ortholog, Ptp10D, enhanced αsynuclein-triggered retinal degeneration, but was not associated with substantial neurotoxicity independent of α-synuclein expression. Ptp10D mutant flies are also viable and fertile but demonstrate long-term memory deficits in behavioral assays [68]. More recent studies further implicated Ptp10D in neural-glial interactions during development of the central nervous system [69], potentially consistent with our findings that human PTPRH participates in a substantia nigra gene coexpression network strongly enriched for oligodendrocyte markers. Besides our discovery of homozygous LoF in PTPRH, further analyses of the IPDGC WES dataset, and the substantially larger, independent NeuroX cohort, implicate a burden of rare variants at this locus in association with PD susceptibility.
α-synuclein-induced neurodegeneration was also enhanced by knockdown of CG32191, a Drosophila homolog of ARSB. RNAi transgenic lines targeting three other conserved fly ARSB homologs showed consistent interactions with α-synuclein (Additional file 1: Table S7 and  Table S8). In the IPDGC cohort, we discovered four PD cases homozygous for a variant predicted to disrupt splicing of exons 1 and 2 in ARSB. Although the identified variant has not previously been documented in ExAC, we identified a single IPDGC control homozygote. Additional evidence supporting association of the ARSB gene with PD susceptibility comes from burden analysis in the independent NeuroX cohort. The surprisingly common ARSB splicing variant (rs138279020, MAF = 0.065 in IPDGC) is a single nucleotide insertion allele within a poly-A repeat, which we speculate might lead to inefficient capture in prior WES and possibly explain the absence of this variant from ExAC and the 1000 Genomes project reference. All four PD cases in our data with the homozygous ARSB splicing variant were confirmed by Sanger sequencing. Intriguingly, mutations in ARSB, encoding the lysosomal enzyme Arylsulfatase B, are associated with the recessive lysosome disorder, Mucopolysaccharidosis type VI (MPS VI, also called Maroteaux-Lamy syndrome), in which the glycosaminoglycan, dermatan sulfate, accumulates causing skeletal dysplasia and other heterogeneous manifestations [70]. Substrate accumulation and associated cellular stress has been reported to induce markers of impaired autophagy and mitochondrial dysfunction in ARSB deficient fibroblasts from MPSVI patients, as in other lysosomal disorders [71,72]. Importantly, Maroteaux-Lamy can be characterized by minimal or even absent clinical signs, leading to incidental discovery or diagnosis in adulthood, and such mild phenotypes have been suggested to accompany partial LoF with preserved low-level ARSB enzymatic activity [70,73,74]. Similar genotype-phenotype relationships have been documented for other lysosomal-storage disorders, including Gaucher's disease, which has established links with PD risk [75,76].
While a full accounting is outside the scope of this study, at least one of the three IPDGC cases for which records were available revealed clinical features potentially overlapping with MPS VI.
The strengths of our study include the largest PD WES discovery dataset assembled to date, complementary analyses in independent available cohorts to establish replication, and integration of promising human genetic findings with multiple functional assays relevant to PD mechanisms. Nevertheless, we also make note of several inherent limitations. In order to prioritize candidate genes for initial investigation, assumptions were made concerning the specific inheritance model (recessive) and stringent criteria were employed for variant filtering. In the future, it will be important to also consider the possibility of dominantly acting alleles; however, this substantially increases the number of variants to consider and also potentially complicates functional studies (i.e. compared with LoF screening using RNAi). Our study design excluded consideration of many nonsynonymous variants that could potentially cause loss (or gain) of gene function, along with certain nontruncating, frameshifting alleles (see "Methods"). Even with fairly stringent criteria for variant filtering and the assumption of recessive inheritance, we found evidence for substantial etiologic heterogeneity. Improved confidence for the discovery of PD causal variants will likely come from PD WES cohorts with significantly enhanced sample sizes, as well as increased numbers of adult controls, including those with careful neurological assessments to exclude mild PD symptoms. Indeed, most of the variants implicated by the IPDGC WES cohort were represented at low frequencies within the largest available public database, ExAC [77,78]; however, we have no information about potential PD manifestations in such individuals or even participant age.
Since no single cellular or animal experimental model is expected to universally recapitulate all potential facets of disease biology, we note that the employed functional screening assays are potentially liable to false-negative or false-positive findings. Importantly, experimental evidence of a genetic interaction with either mitochondrial dynamics or α-synuclein-mediated neuronal injury in our screening assays cannot in isolation confirm a role in disease causation, but rather serves to prioritize genes for future investigation. Out of the 27 candidate genes implicated in the IPDGC WES discovery analysis, 14 were insufficiently conserved for follow-up in αsynuclein transgenic flies. While simple animal models, including Drosophila or C. elegans, have made important contributions to our understanding of PD pathogenesis, selected mechanisms, such as the potential role of adaptive immunity or basal ganglia circuit dysfunction, cannot be addressed in invertebrates [79,80]. We were unable to confirm our findings from Drosophila in a published C. elegans model of α-synuclein toxicity. In the future, it will also be important to examine potential genetic interactions in other PD models, including LRRK2 transgenic flies or those containing mutations in other PD loci, such as VPS35 or parkin. While neuroblastoma cells offer the convenience of robust mitochondrial readouts, they are limited by their undifferentiated, transformed state distinct from that of postmitotic neurons. In the future, human-induced pluripotent stem cells, including those derived from individuals with PD, can be differentiated into dopaminergic or other neuronal types and potentially deployed for functional screening strategies. Additionally, genome-editing technologies may facilitate systematic functional evaluation of candidate disease-associated variants of unknown significance.

Conclusions
We have identified five excellent PD gene candidates (GPATCH2L, UHRF1BP1L, PTPRH, ARSB, and VPS13C), harboring homozygous or compound heterozygous LoF variants in PD exomes, demonstrating functional interactions with mitochondrial and/or α-synuclein-mediated mechanisms, and supported by evidence of replication in independent human datasets. The recent report [21] of additional PD families segregating LoF mutations in VPS13C along with other experiments supporting a role in mitochondrial mechanisms significantly strengthens the evidence in support of this gene in PD and validates our overall approach. These loci are well-suited for future efforts directed at human genetic replication and in-depth functional dissection. We also make available results, including findings from human genetic analyses and functional studies in most cases, on 22 other promising loci. These data will serve as a valuable reference for ongoing and future PD genetic studies. More broadly, our approach of integrating high-throughput sequencing in PD case/control cohorts with parallel systematic screening in cells and model organisms for functional prioritization exemplifies a powerful experimental strategy with great promise for future genomic studies of PD and other human disorders.

Genetic analyses
Whole-exome sequencing WES was performed on 1148 PD cases and 503 neurologically healthy controls of European descent. All participants provided written informed consent. Relevant local ethical committees for medical research approved participation in genetic studies. If PD patients were prescreened for known pathogenic mutations, they were excluded for exome sequencing when having such a variant. The cases were diagnosed with PD at a relatively young average age of 40.6 years (range, 6-56 years), of which approximately 37% reported a positive family history. The neurologically healthy controls are on average 48.2 years of age (range, 10-97 years). A more extensive overview of demographic information is reported in Additional file 2: Figure S8.
Due to improvements of the exome sequencing protocol over time, the exome sample libraries were prepared with different capture kits. For this study, three different capture kits were used: Illumina TruSeq (San Diego, CA, USA) (62 Mb target); Roche (Basel, Switzerland) Nimblegen SeqCap (44.1 Mb target); and Agilent (Santa Clara, CA, USA) SureSelect (37.6 Mb target), which captured 96%, 81%, and 71% of the targeted exome at least ten times, respectively (Additional file 1: Table S12). Exome libraries were sequenced on a HiSeq 2000 (Illumina, San Diego, CA, USA). The Burrows Wheeler Aligner MEM v0.7.9.a [81] was used to align the 100-bp paired-end reads to the human reference genome build hg19. We called the single nucleotide variants (SNVs) and insertions/deletions (indels) for all samples simultaneously using Genome Analysis Toolkit (GATK) 3.x [82], followed by the exclusion of low-quality variant calls not passing the default GATK filters. Individual genotypes were removed with genotype quality Phredscores below 40. ANNOVAR [83] was applied to annotate the variants with information concerning variant type (valid annotations when Refseq in concordance with UCSC), MAF in the general population, and predictions of the variant's effect on gene function, implementing CADD [84].

Variant identification in IPDGC WES dataset
Considering the worldwide prevalence of 0.041% for PD in the age range of 40-49 years [20], we selected rare variants with a MAF < 1% (corresponding to a homozygous frequency of 0.01%) in the European population. Because the specified 0.041% of the population with young-onset Parkinson's disease (YOPD) is not caused by one shared genetic factor, we expect a homozygous frequency of 0.01% to be an adequate cutoff, which would be able to determine variants present in approximately 25% of the YOPD population. As a comparison to the most common genetic cause of YOPD, parkin [85], the most frequent mutation is an exon 3 deletion, which has been identified in 16.4% of YOPD patients [86]. Using ANNOVAR [83], all variants were annotated with MAF information of ESP6500si (European American population) [87], 1000 Genomes Project (European population of April 2012 version) [88], and the ExAC browser (non-Finish European population) [77,78]. When no public allele frequency was available for homozygous variants, the in-house control dataset of 503 individuals was used as a reference for the general population. Homozygous variants were excluded when being common (>1%) in controls or having a relative higher frequency in controls than in cases. KGGseq [89] was used to count the number of homozygous variants for the cases versus controls.
In addition to the population allele frequency filters, we only selected SNVs and indels affecting the position of the stop codon or located at a splice site (within 2 bp of splicing junction), which are variants expected to result in a loss of gene function. As the aim of this study was to validate our approach to identify high promising PD candidate genes, rather than discovering all putative PD genes present within our WES dataset, we set a conservative selection criteria by only including frameshifts that caused an immediate stopcodon at the position of the indel. Splice-site variants were only considered when being adjacently located to an exon that is coding for amino acids. As a final filter for the homozygous variants, we manually excluded variants that failed GATKVQSR and hard filtering. Quality predictions based on the ExAC database are more adequate, as it includes~37× more samples than our dataset.
For the putative compound heterozygous mutations, both variants should be located within the same transcript and at least one allele should contain a LoF variant. The second variant could be: (1) a LoF variant; or (2) a missense variant that is absent in dbSNP137 [90] database and with a CADD score > 20 (predicted to belong to the 1% most deleterious variants of the total genome), indicating a pathogenic effect. The latter two filter criteria should decrease the chance of including benign missense variants. The putative compound heterozygous variants were identified by scoring the number of variants per sample per gene with PSEQ (https://atgu.mgh.harvard.edu/plinkseq/pseq.shtml). The reads of variants located within approximately 200 base pairs were visualized in IGV [91] to judge the authenticity of the compound heterozygous variant. When the different variants are located on distinct alleles, the combination of variants was considered a true compound heterozygous mutation.
All recessive variants that remained after the filtering procedures were Sanger sequenced to confirm the variant calls generated by the exome pipeline.

Variant aggregation analyses in the IPDGC WES dataset
SKAT-c [92] was used to analyze the burden of coding variants for each identified gene. Both rare variants only and the joint effect of common and rare variants were tested. Because variant aggregation tests are prone to coverage differences, capture usage and population stratification, we performed a more stringent individual and variant QC, resulting in a reduced dataset of 1540 samples (1062 cases and 478 controls) covering 268,038 variants. Individuals were excluded when failing gender test, showing evidence of relatedness, having dubious heterozygosity/genotype calls, or being a population outlier. Variants were removed when having a genotype missingness > 5%, a Hardy-Weinberg equilibrium p value < 1e −6 or a p value for non-random missingness by phenotype < 1e −5 . Variants were only considered for association analyses if located in a region targeted by all different capture kits.
Benign variants have the potential to dilute a true association signal of the combined effect of functional variants in a gene. We therefore annotated variants with ANNOVAR [83] to group variants according to their type or predicted pathogenicity. Two subsets of variants were examined: (1) predicted pathogenic variants, including LoF variants and missense mutations that are predicted to be pathogenic by the CADD framework; and (2) missense variants, including amino-acid changing and LoF variants.
As suggested by SKAT, we selected a MAF cutoff of 0.018, which is based on the total sample size and separates rare and common variants. Common variants (MAF > 0.018) were pruned using PLINK [93] (indep settings 50 5 1.5). Due to confounding factors (usage different capture kits and multiple CEU populations), 20 principle components, 10× coverage, and gender were taken into account as covariates. Both a traditional one-sided burden (assuming all variants to have a harmful effect) and a two-sided SKAT test (allowing variants to be either damaging or protective) were performed. Empirical p values were calculated by comparison of the nominal p value to 10,000 permutations of affection status. Genes with an empirical p value < 0.05 were considered to be significantly associated to PD.

Genetic replication 1: variant identification in PPMI WES dataset
We obtained permission to access WES data generated by the PPMI [51]. After standard variant and individual QC, the dataset includes 477,512 variants for 462 PD cases and 183 neurologically healthy controls. A similar search for homozygous and putative compound heterozygous LoF variants, as described for the original IPDGC WES dataset, was applied for this second independent PPMI WES dataset by using ANNOVAR [83] and KGGSeq [89].

Genetic replication 2: GRIP genetic isolate
The southwest of the Netherlands contains a recently isolated population which is part of the GRIP program [52]. A total of 39 PD index cases and 19 controls of this isolate were subjected to whole-genome sequencing to explore the genetic factors underlying PD within this geographic region. Missense and LoF variants which were present in at least two index cases and a MAF < 0.1% in public databases (ExAC, 1000G dbSNP138, and ESP6500) were considered as potential PD variants. Genes harboring such variants were surveyed for overlap with our list of candidate genes.

Genetic replication 3: variant aggregation analyses in NeuroX
We investigated the genetic burden of common and rare variants in these genes by using the independent Neu-roX dataset, which is generated by a custom-made genotype array [53] using a backbone of~240,000 standard Illumina Exome content as a basis with an additional 24,000 variants that are suggested to be involved neurological diseases. The same procedures as described for the burden test in the IPDGC WES dataset were applied. After QC, a total of 6801 PD cases and 5970 neurologically healthy controls remained with highquality genotype data for 178,779 variants. Based on the sample size, the MAF cutoff was 0.0063.

Genetic replication 4: overlap PD risk loci
Approximately 70% of the participants included in this study have also been included in previous published GWAS [7,94,95]. To explore the possibility that our candidate genes might also contain common risk variants increasing the risk to develop PD, next to the identified LoF variants with assumed high penetrance, we searched for GWAS loci within 1 Mb upstream and downstream of the gene of interest using the recent PD meta-analysis through pdgene.org [7]. Significant associations and suggestive p values < 1e-4 were considered. To understand the underlying linkage disequilibrium structure, LocusZoom [96] was applied to visualize the European 1000G recombination events for the candidate genes that were closely located to a GWAS locus.

Gene co-expression analyses
We constructed gene co-expression networks (GCN) from two different substantia nigra datasets using the R software package, WGCNA (weighted gene coexpression network analysis) [97]. This was followed by the same post-processing of WGCNA gene modules based on k-means: a heuristic to rearrange misplaced genes between modules using the number of modules detected by the standard WGCNA as k and the eigengenes as centroids. The first GCN is based on 19,152 genes from 65 substantia nigra control brains from the UKBEC consortium. The gene expression profiles are based on Affymetrix Exon 1.0 ST Arrays [98]. The second GCN is based on 63 samples from the same tissue, GTEx [56] V6 gene RPKM values. Genes were filtered with a RPKM based cutoff of 0.2 and missingness < 30% resulting in the analysis of 18,363 Ensembl genes. We corrected this gene expression dataset for the principal components significantly correlated with GTEx samples covariates using the Swamp R package. WGCNA gene modules were functionally annotated with gProfileR [99] R software package using GO database, accounting for multiple testing with gSCS's gProfiler test. Background genes used were all genes in the substantia nigra GCN. Cell type enrichment analysis was performed with the userListEnrichment function with brain specific enrichment, implemented in the WGCNA R package. Preservation analysis of UKBEC GCN in GTEx's substantia nigra profiles was performed with WGCNA's preservation analysis. Results are reported with the Z.summary statistic [100]. Graphical representation of the GCN subnetworks were constructed by using the 27 candidate genes and known PD genes (ATP13A2, FBXO7, LRRK2, PARK2, PARK7, PINK1, RAB39B, SNCA, and VPS35) as seed genes. For each of these genes sequentially, in a round robin fashion, we added the gene with highest adjacency, based on TOM values, and the links this gene has with all the seed genes. We used Cytoscape 3.3 for display with a Kamada-kawai layout algorithm [101].

Human cellular screen shRNA virus production
Bacterial glycerol stocks containing the shRNA vectors (Sigma, St. Louis, MO, USA; TRC1 and 1.5) were grown overnight in Luria-Bertani media containing 100 μg/mL of ampicillin (Sigma-Aldrich, St. Louis, MO, USA). We selected at least five shRNA clones per gene. Endotoxinfree shRNA plasmids were extracted according to the manufacturer's protocol (Zymo, Irvine, CA, USA; ZR Plasmid Miniprep Classic kit). Lentivirus was produced as follows: HEK293T packaging cells were seeded at a density of 4 × 10 Δ5 /mL (100 μL per well) in cell culture media, Optimem (Invitrogen, Carlsbad, CA, USA) containing 10% fetal bovine serum (FBS) in 96-well tissue culture plates. Cells were incubated for 24 h (37°C, 5% CO 2 ). Each well was subsequently transfected with 100 ng of shRNA plasmid, 90 ng of packaging plasmid (pCMV-dr8.74psPAX2), and 10 ng of envelope plasmid (VSV-G/pMD2.G) combined with 0.6 μL of FugeneHD (Promega, Madison, WI, USA) in a total volume of 10 μL. Transfection efficiency was monitored using the pKLO.1 GFP plasmid (Sigma, St. Louis, MO, USA) and had to be greater than 90%. Sixteen hours after transfection, media was refreshed and supernatant harvested after a further 24 h. Virus was stored at −80°C.
To ensure successful lentivirus production, HEK293T cells were plated out at a density of 2 × 10 Δ5 /mL (100 μL per well) in Optimem containing 10% FBS and 15 μg/ mL of protamine sulfate (Sigma, St. Louis, MO, USA). Cells were infected with 10 μL, 25 μL, and 50 μL of lentivirus. The following day, media was refreshed with media containing 2.5 μg/mL of puromycin. After a further three days, plates were manually inspected to determine cell viability of each well. If more than 10% of the wells contained dead cells, lentiviral production for that plate was repeated.

Cell-based screening assays
Four phenotypes were studied in two different assays: Mitochondrial morphology [33] was examined in a single assay with BE(2)-M17 cells, which were expanded and plated at a density of 5 × 10 Δ4 /mL (100 μL per well) in 96-well black CellCarrier plates (PerkinElmer, Waltham, MA, USA) pre-pipetted with 25 μL of the lentivirus. On day 2, media was refreshed with DMEM/ F12 (with 10% FBS) supplemented with 2 μg/mL puromycin. On day 4, the cells were incubated with 100 nM MitoTracker Red CMXros, 100 nM MitoTracker DeepRed (Molecular Probes), and 1 μg/mL Hoechst for 20 min at room temperature. Media was refreshed and the cells were incubated for a further 2 h before fixation with 4% paraformaldehyde (pH 7.3).We examined three parameters commonly used for quantification of mitochondrial morphology: mitochondrial number, axial length ratio, and roundness.

Image acquisition and analysis
Image acquisition was carried out using the automated confocal imaging system, Cell Voyager CV7000 (Yokogawa, Tokyo, Japan). The mitochondrial morphology assay involved a total of 60 fields per well using a 60× water immersion objective lens for improved resolution. Nuclei were imaged utilizing the 405 nm laser, Mitotracker CMXros utilizing the 561 nm laser, and mitotracker DeepRed utilizing the 640nM laser. For the translocation assay, a total of 60 fields per well were taken using a 20× objective lens. Nuclei were imaged utilizing the 405 nm laser, Parkin-GFP utilizing the 488 nm laser, and mitotracker DeepRed utilizing the 640 nm laser.
Images were stored and analyzed by the Columbus Image Data storage (PerkinElmer, Waltham, MA, USA). Image quality control: only well-segmented interphase cells were included. Mitotic, apoptotic badly segmented, and out-of-focus cells were excluded. Cells touching the border of the image were removed to avoid analysis of artificially cropped cells. All wells where the perturbation strongly decreased cell number were disregarded. Morphological characteristics and signal intensities were quantified and results exported to R package CellHTS2. To quantify mitochondrial morphology, the median mitochondrial number per object, roundness, axial length ratio, and intensity of mitorackerCMXros (mitochondrial potential) were calculated.
To differentiate between CCCP-treated Parkin stable cell lines and untreated cells, the number of spots formed on mitochondria was calculated. Cells containing more than two spots were considered positive for Parkin translocation. The ratio of cells positive for translocation versus the number of cells negative for translocation was calculated per well to give a cell number independent measure of Parkin translocation. CCCP-treated cells transduced with a scrambled shRNA and CCCP-treated cells transduced with shRNA targeting PINK1 were included on each plate. An average Z' of 0.61 was calculated for the entire screen, with a minimum Spearman's Rank correlation between replicates of 0.8.
Data from high content imaging assays were analyzed using the BioConductor CellHTS2 package for the R software environment (R version 2.11.1, BioConductor version 2.6). Data were normalized to negative controls on a per-plate basis to minimize plate-to-plate variation. For the Parkin-translocation screen, negative controls were considered as wells which had been transduced with lentivirus encoding a scrambled sequence and had been treated with CCCP. For the remaining screens, negative controls were considered as wells that had been transduced with lentivirus encoding a scrambled sequence.

Statistical analysis
For each of the shRNA screens, each assay plate was completed with six replicates to enable the detection of subtle effects and minimize false negatives. For each shRNA, Mann-Whitney U tests with false discovery rate (FDR) correction were performed and the robust strictly standardized median difference (SSMD*) was calculated [102]. Effects were considered significant when the SSMD* normalized effect of shRNA treatment was greater than or less than 4 or −4 and at least two independent clones per gene showed a significant effect. Seed sequences were manually inspected to ensure no common sequence.
For each assay, a positive control plate containing known modifiers of the phenotype in question was run in parallel to ensure the assay worked optimally. The robust Z-factor was calculated as previously described [103], using the normalized values for the controls from all plates. For the mitochondrial assay, known regulators of mitochondrial fission or fusion were included. For the Parkin translocation assay, TOMM7 and PINK1 were used as positive controls.

shRNA knockdown validation
Cell culture and shRNA mediated knockdown were performed as described above. Cells were harvested for RNA isolation using the SV 96 Total RNA Isolation System (Promega, Madison, WI, USA) according to the manufacturer's protocol. Total RNA primed with oligo dT (Qiagen, Hilden, Germany) was used for cDNA synthesis with Superscript III RT (Life Technologies, Carlsbad, CA, USA) according to the manufacturer's specifications. Quantitative polymerase chain reaction (PCR) was carried out in triplicates on a ViiA7 realtime PCR system using SYBR Green PCR master mix (Life Technologies, Carlsbad, CA, USA) and 0.04 μM specific primer pairs for all targets. For multiple exons, gene primers were designed to span exon-exon junctions or to be separated by one intron on the corresponding genomic DNA. Normalized relative quantities were calculated with HMBS as housekeeping gene by using the qbasePLUS software (Biogazelle, Gent, Belgium) and knockdown efficiencies per clone were calculated using scrambled control wells (n = 3) as a reference.

Animal models Orthologue selection
The function of the candidate genes and their involvement in neurodegeneration was tested in two animal models; C. elegans and Drosophila. The DRSC Integrated Ortholog Prediction Tool (DIOPT) [104] was used to identify the conserved homologs of human genes in the nematode or fly genomes. Orthologues were defined based on a minimum unweighted DIOPT score of 2, such that two independent bioinformatics algorithms were in agreement concerning the orthologue pairing. In cases where multiple genes were identified as potential orthologues for a given human gene, we carried forward all candidates with DIOPT scores greater than 3.

Fly stocks and husbandry
The human α-synuclein transgenic flies with codonoptimization for Drosophila (UAS-α-synuclein line #7), were recently described [48] and are available from the Bloomington Stock Center (Bloomington, IN, USA). RNAi transgenic lines were obtained from the Vienna Drosophila RNAi Centre (Vienna, Austria) or from Bloomington for the Harvard Transgenic RNAi Project. All RNAi lines used for this study are detailed in Additional file 1: Table S8. The GAL4-UAS system [105] was used for ectopic co-expression of both the α-synuclein and RNAi transgene. The Rh1-Gal4 driver line (secondchromosome insertion) has been previously described [48,106]. For screening, individual RNAi (IR) lines or Canton S (as a control) were crossed to animals of the genotype: Rh1-Gal4/CyO; UAS-Syn/TM6B. All crosses were established at 18°C and F1 experimental animals (Rh1-Gal4 / UAS-IR; UAS-Syn / + or Rh1-Gal4 / +; UAS-Syn / UAS-IR) were shifted to 25°C within 24 h of eclosion and aged 15 days. To examine for potential αsynuclein independent retinal degeneration, each UAS-IR transgenic line was separately crossed to Rh1-Gal4, using identical conditions. Based on the results of the primary RNAi screen, we also obtained from Bloomington available mutant alleles for the fly orthologues of PTPRH: Ptp10D and Ptp4E. The following additional stocks were used: (1) w, Ptp4E 1 ; (2) w, Ptp10D 1 ; (3) yw, Ptp4E 1 , Ptp10D 1 / FM7C. All experimental results were quantified and photographed in female animals.

Characterization of retinal degeneration in Drosophila
For optical neutralization (also known as the pseudopupil preparation), fly heads of 15-day-old animals were immersed in mineral oil and transilluminated using a 40× objective on a Leica (Wetzlar, Germany) DM6000B light microscope. Eyes from at least four animals were examined per genotype (at least eight retinae). All candidate modifier lines and controls were scored blinded by three independent examiners. The penetrance of degeneration caused by each RNAi line was calculated by dividing the number of abnormal retinae, showing evidence of either reduced rhabodomere numbers or altered refraction of light indicative of vacuolar changes, by the total number of retinae examined. For identification of genetic enhancers, we required two independent RNAi lines targeting non-overlapping sequences with 50% or greater degenerate retinaes observed using the pseudopupil assay. Following our initial screen of two RNAi lines targeting each of 18 fly gene homologs, additional RNAi lines and mutant strains were evaluated, where possible, for the most promising candidates. For each enhancer gene, the strongest RNAi line was independently re-tested for consistency using the pseudopupil assay and retinal histologic sections were also performed for further confirmation. To examine for potential α-synucleinindependent retinal degeneration, the strongest RNAi modifier for each gene was separately crossed to Rh1-Gal4 and histologic sections were examined for 15-day-old animals. For histology, fly heads from 15-day-old animals were fixed in 8% glutaraldehyde and embedded in paraffin. Tangential (3 μm) retinal sections were cut using a Leica Microtome (RM2245) and stained with hematoxylin and eosin. Retinae from at least three animals were examined and quantified per genotype. Enhancement of α-synucleininduced retinal degeneration was quantified based on the severity of retinal vacuolar changes seen in stained histologic sections. We examined representative photographs taken with a 40× objective from well-oriented, intact tangential sections at a depth in which the retina achieves maximal diameter. Using ImageJ software [107], we recorded the area occupied by all vacuoles with a diameter greater than 4 μm and divided by the total retinal area to compute a percentage. Statistical comparisons were implemented using a two-tailed student's t-test. α-synuclein expression levels were determined by immunoblot (clone 42, 1:1000, BD Transduction Laboratories, San Diego, CA, USA).

Phenotype assays for basal phenotypes in C. elegans
The systematic RNAi screen was carried out as described [109]. RNAi clones targeting the genes of interest (9/27; Additional file 1: Table S3) were obtained from the Vidal cDNA RNAi library or the Ahringer RNAi library. Bacteria expressing the empty vector L4440 were used as negative control. For the survival assay, we employed a sterile strain, CF512 (fer-15(b26); fem-1(hc17)) [110]. To induce sterility, eggs were collected and kept in M9 medium at 25°C overnight until they reached L1 arrest. Approximately 25 L1 worms were added to plates seeded with RNAi clones of interest and empty vector control and allowed to develop to adults at 25°C. At day 9 of adulthood at 25°C, when approximately half of the worms grown on control plates were dead, the survival of worms on RNAi plates was determined.
The offspring and developmental phenotypes were tested in a single assay. N2 worms were grown at 20°C until L4 stage on OP50 bacteria and then transferred to plates seeded with RNAi clones of interest and empty vector control. At day 2 of adulthood, ten worms were put onto a new plate seeded with the same RNAi clone for 1 h to produce progeny. The plates containing the progeny were kept at 20°C until the F1 generation of the control worms reached L4 stage. The number and developmental phenotypes of the offspring were scored at the last time point using a dissecting microscope. A one-sided student's t-test was used to determine the significant changes compared to controls. All counting was done in a blind fashion in which the identity of the samples was concealed and each experiment was performed in three biological replicates.
Motility assay for α-synuclein toxicity model in C. elegans Animals were age-synchronized by hypochlorite treatment, hatched overnight in M9 buffer, and subsequently cultured on NGM containing isopropylthio-β-D-galactoside (IPTG, 15 mg/L) and 50 μg/mL ampicillin (plates for RNAi treatment). Plates were seeded with RNAi bacteria. Prior to the experiment, the plates were kept at room temperature for two days to allow the production of dsRNA by the bacteria. On day 1 of adulthood (one day after larval stage L4), animals were transferred to RNAi plates containing 5-fluoro-2'deoxy-uridine (FUDR) to prevent the offspring from growing. RNAi clones targeting C54D2.4 (ARSB), T08G11.1 (VPS13C), and F44G4.8 (PTPRH) were used from the Ahringer C. elegans RNAi library. All clones were verified by sequencing. RNAi clones for the C. elegans orthologue F21F3.7 (TMEM134) was not available.
Animals were scored at day 4 and day 8 of adulthood. Animals were placed in a drop of M9 and allowed to adjust for 30 s, after which the number of body bends was counted for another 30 s. Fifteen animals were scored per condition. Relative body bends were calculated by normalizing to control values. Error bars are showing the standard error of mean. Assays were repeated in three independent experiments and the relative body bends of one representative experiment is shown. The relevant funding bodies (above) did not participate in the design of the study; collection, analysis, or interpretation of data; or in drafting the manuscript.
Authors' contributions IEJ: Conception/design, data analysis/acquisition, data interpretation, and writing manuscript. HY: Conception/design, data analysis/acquisition, data interpretation, and writing manuscript. SH: Data analysis/acquisition, data interpretation, and writing manuscript. ML: Data analysis/acquisition, data interpretation, and writing manuscript.