Abnormal molecular signatures of inflammation, energy metabolism, and vesicle biology in human Huntington disease peripheral tissues

Background A major challenge in neurodegenerative diseases concerns identifying biological disease signatures that track with disease progression or respond to an intervention. Several clinical trials in Huntington disease (HD), an inherited, progressive neurodegenerative disease, are currently ongoing. Therefore, we examine whether peripheral tissues can serve as a source of readily accessible biological signatures at the RNA and protein level in HD patients. Results We generate large, high-quality human datasets from skeletal muscle, skin and adipose tissue to probe molecular changes in human premanifest and early manifest HD patients—those most likely involved in clinical trials. The analysis of the transcriptomics and proteomics data shows robust, stage-dependent dysregulation. Gene ontology analysis confirms the involvement of inflammation and energy metabolism in peripheral HD pathogenesis. Furthermore, we observe changes in the homeostasis of extracellular vesicles, where we find consistent changes of genes and proteins involved in this process. In-depth single nucleotide polymorphism data across the HTT gene are derived from the generated primary cell lines. Conclusions Our ‘omics data document the involvement of inflammation, energy metabolism, and extracellular vesicle homeostasis. This demonstrates the potential to identify biological signatures from peripheral tissues in HD suitable as biomarkers in clinical trials. The generated data, complemented by the primary cell lines established from peripheral tissues, and a large panel of iPSC lines that can serve as human models of HD are a valuable and unique resource to advance the current understanding of molecular mechanisms driving HD pathogenesis. Supplementary Information The online version contains supplementary material available at 10.1186/s13059-022-02752-5.

Background The CAG repeat expansion mutation in exon 1 of the huntingtin gene (HTT) causes Huntington disease (HD), a progressive movement disorder with dementia and behavioral abnormalities [1]. Huntingtin is expressed ubiquitously throughout the body, and HD does not exclusively affect the brain. In addition to a particular vulnerability of striatal and cortical neurons to the mutation, several lines of evidence suggest that mutant HTT influences molecular function in peripheral tissues including skeletal muscle, peripheral blood monocytes, liver, and others [2]. Hence, the investigation of peripheral tissues and cells holds promise to reveal important insight into the physiological function of HTT that may include transcriptional processes, protein trafficking, and vesicle transport [3,4]. It can also reveal peripheral signatures of HD that track with the evolution of the expression of HD in the central nervous system. This could have potential as an easily accessible and low invasive biomarker in clinical trials in HD. Finally, primary cell cultures established from peripheral tissues such as skin can be used to generate pluripotent stem cells which, when differentiated into neuronal and non-neuronal cells, can serve as human models of HD.
The objective of the Multiple Tissue Monitoring in Huntington disease (MTM-HD) study was to comprehensively examine the molecular biology at the RNA and protein level of peripheral tissues from HTT CAG repeat expansion mutation carriers and sex and age matched healthy volunteers. All participants were also participants in Enroll-HD (www. enroll-hd. org), and extensive demographic and phenotypic data were collected according to the Enroll-HD protocol. The analyses of proteomics and transcriptomics datasets in conjunction with the available deep clinical phenotype information can give valuable insights into more complex signatures in human carriers of the HD gene mutation and provide a resource of peripheral tissues and primary cells with accompanying multi-layer 'omics datasets.
The main potential of these types of analyses is in relating biological signature(s)either in individual modalities and/or tissues or across modalities and tissues-with CAG repeat length as the severity of the expression of the HD gene mutation. Lastly, the insight gained through these analyses can inform future research with the aim to better understand the nature of any peripheral biological signature and the impact an intervention, e.g., in a clinical trial, can have on them.

Results
Skeletal muscle, adipose tissue and skin from an open biopsy of the quadriceps femoris muscle, as well as blood, was collected from 20 healthy controls, 21 pre-symptomatic, and 20 early motor manifest HD (UHDRS total functional capacity stages 1 and 2) patients. Additionally, primary fibroblast and myoblast cell lines have been established. In the following we present the demographic and clinical data for the participants, the initial findings of transcriptional and proteomic dysregulation in tissues, as well as the analysis of the generated fibroblasts by RNAseq and HTT-centered single nucleotide polymorphism (SNP) sequencing. The analysis of the epigenetic clock in the fibroblast lines has already been published [5]. As an additional resource, pluripotent stem cell lines (iPSCs) have been generated from the primary fibroblast lines and are available for further studies.

Participant's demographic and clinical data
At both sites, a total of 24 unaffected individuals (control), 23 pre-manifest (known mutation carriers, before disease onset; pre-HD), and 21 early HD (early-HD) patients were recruited into the study that was approved by the Ethics Review Boards of Ulm University and University College London. Following a detailed description of the study participants gave informed consent according to the Declaration of Helsinki. Participants attended two visits: the screening and sampling visits, which were no more than 1 month apart. At the screening visit, participants were assessed for inclusion and exclusion criteria (Table 1). Ultimately, 20 control, 21 pre-HD, and 20 early-HD participants were included in the sampling visit of the study. Summary data for these 61 participants are shown in Table 2. For the full dataset including statistics, see Additional file 1 (MTM_all_metadata.xlsx). Participants from all three groups were evenly recruited across the two sites and were matched for gender, as well as age (Table 2). Neither age, weight, height, nor BMI were significantly different between the groups (one-way  ANOVA). CAG repeat sizes of the expanded allele were not significantly different in the pre-HD and early-HD groups while DBS was higher in the early-HD group than in the pre-HD group, as expected ( Table 2). All clinical measures were only significantly different in the early-HD group compared to the control or pre-HD group (Table 2 TFC, TMS, FA, IS; one-way ANOVA with Bonferroni post hoc test, p < 0.001).

Transcriptomic analysis of adipose and muscle tissue
We next analyzed the transcriptome of adipose and muscle tissue from the biopsies. We generated 50 bp long, paired-end reads with a depth of approximately 140 to 200 million reads per sample of rRNA depleted total RNA (see also the "Method" section for more detail and supplementary RNAseq count files: Additional file 2 for adipose counts; Additional file 3 for muscle counts). In total, we found 78 genes in adipose (Fig. 1A) and 53 genes (Fig. 1B) in muscle to be significantly dysregulated (Benjamini-Hochberg adjusted p-value < 0.05). We only considered genes with counts in at least 9 samples in at least one group (50%), in at least one comparison (pre-HD vs. controls; early-HD vs. controls; early-HD vs. pre-HD). The full analysis files can be found in Additional file 4 (adipose) and Additional file 5 (muscle). Notably, only very few genes were regulated in the same way in the pre-HD and early-HD groups when compared to controls (Fig. 1A, B, similar regulation). Several genes were differentially regulated in both groups (Fig. 1A, B, different regulation). One gene in this group, TBC1D3D (ENSG00000274419), was found to be differentially regulated in Only genes with counts in 9 or more samples in at least one group for at least one comparison (pre-HD vs. controls; early-HD vs. controls; early HD vs. pre-HD) are shown. Co-variates (RNAseq batch, site of sampling (site), gender, BMI and the CAG allele sizes) are shown above the expression matrix. A 78 genes in adipose were significantly dysregulated. B 53 genes in muscle were significantly dysregulated. C 21 genes in the fibroblast lines were significantly dysregulated. D-F Gene ontology enrichment analysis with Enrichr (see the "Methods" section). Non-redundant biological process enrichments (GO Biological Process 2018) and potential regulators (ChEA 2016, ENCODE 2015, TTRUST 2019) are shown. Only terms with p < 0.05 (Fisher exact test) and at least 2 genes for the enrichment were considered. The combined score of enrichment is shown in brackets. GO enrichment for adipose (D), muscle (E), and fibroblast (F) RNAseq data. See also Additional files 2, 3, 4, 5, 6 and 7 for full data both tissues. In both tissues, the gene was upregulated in the early-HD group compared with the pre-HD group (adipose: log2-fold change 4.47, p adj = 0.003; muscle: log2-fold change 1.51, p adj = 0.041). In both tissues, sets of genes were exclusively dysregulated in either the pre-HD or early-HD groups (Fig. 1A, B, pre-HD only and early-HD only).
Gene ontology enrichment in adipose tissue for the different comparisons showed that REST (RE1-silencing transcription factor) target genes are dysregulated only in the pre-HD group (Fig. 1A), while protein sumoylation was only dysregulated in the early-HD group (Fig. 1A). Alteration in protein sumoylation is a well described mechanism in HD, both for HTT itself, as well as in general [6][7][8][9][10][11]. Also, REST has been implicated in HD pathogenesis [12][13][14].
Gene ontology enrichment analysis in muscle points towards disrupted homeostatic pathways, especially in the pre-HD group (Fig. 1B). Interestingly, PAX6, an important regulator of diverse peripheral and central nervous system processes, was highly and progressively upregulated in both HD groups (Fig. 1B, similar regulation), suggesting a compensatory mechanism of muscle regeneration in response to mutant HTT expression. Similar to our finding, in the muscle of R6/2 mice [15], a higher level of satellite cells (skeletal muscle progenitor cells crucial for repair and regeneration) has been found, concomitant with an increase in PAX7, another member of the PAX transcription factor family that is key for normal satellite cell function in skeletal muscle regeneration [16]. The enrichment analysis of the dysregulated genes in the muscle pre-HD group suggested PPARA (peroxisome proliferator activated receptor alpha) as a regulatory protein with very high confidence (Fig. 1B).

Transcriptomic analysis of fibroblast lines
We generated 100 bp long, paired-end reads with a depth of approximately 40 to 60 million reads per sample of polyA-enriched total RNA (RNAseq counts: Additional file 6; DESeq2 analysis Additional file 7). Evaluation was performed in the same way as described for tissues above. Only few genes were significantly dysregulated between pre-HD and early-HD groups as compared to controls (Fig. 1C). Consequently, gene ontology analysis resulted in few enriched terms (Fig. 1C). Notably, TBC1D3D (ENSG00000274419), which was found to be dysregulated in both the adipose and muscle datasets (see above and Fig. 1A, B), was also dysregulated in the fibroblast dataset. Its expression pattern was consistent between all three datasets with a downregulation in the pre-HD group and a restoration towards control levels in the early-HD group.

Proteomic analysis of skin, adipose, and muscle tissue
Next, we generated proteomics data from the skin, adipose tissue and muscle biopsies. In adipose tissue, we identified 1347 proteins (Additional files 8 and 9), in muscle 2671 proteins (Additional files 10 and 11), and in skin 4640 proteins (Additional files 12 and 13). After correction for confounding variables (TMT-plex, site of sampling, gender, age and BMI; see the "Methods" section for details), we performed dysregulation analysis using ROTS (reproducibility-optimized test statistic) [17]. ROTS uses a non-predetermined optimized test statistic for each dataset by utilizing bootstrapped datasets that preserve the individual top-ranked features and maximizing the overlap between those. The data files for peptide and protein analysis can be found in the Additional files 14 and 15 (adipose), Additional files 16 and 17 (muscle), and Additional files 18 and 19 (skin). The analyses are shown as volcano plots for each of the pairwise comparisons for each tissue in Fig. 2 with significantly dysregulated proteins (p-value < 0.001) highlighted. Proteins that were also determined to be dysregulated based on analysis of permutated sample lists are highlighted in bold (FDR < 0.05).
In summary, for all three tissues we observed a progressive increase of the number of dysregulated proteins from pre-HD to early-HD stage. In adipose tissue, there was no overlap of commonly dysregulated proteins in the three comparisons ( Fig. 2A). In muscle, syntaxin-binding protein 3 (STXBP3) was upregulated in both pre-and early-HD samples (Fig. 2B). We observed the largest changes in skin (Fig. 2C). Here, MOB family member 4, phocein (MOB4) was upregulated in both, pre-and early-HD samples. One hundred sixty-two proteins were significantly (FDR < 0.05) dysregulated in the early-HD group. Twenty-three proteins significantly changed their expression levels from pre-HD to early-HD stage (Fig. 2C).
Due to the low number of changed proteins, we only performed gene ontology enrichment in the adipose tissue dataset (early-HD compared to controls) and the skin dataset (early-HD compared to controls and the comparison of pre-HD to early-HD) ( Table 3). Pathway and ontology analysis in the adipose tissue dataset hinted at dysregulated lipid metabolism and proteasome function ( Table 3, adipose: early-HD vs. controls). Both have been extensively studied in HD. Peroxisome proliferator activated receptor alpha (PPARA) was strongly predicted as potential upstream regulator of these dysregulated proteins (Table 3). Worth noting, PPARA was also predicted as the most likely upstream regulator for the dysregulated gene analysis of the muscle dataset (Fig. 1B).
Strikingly, 21 out of the 28 dysregulated proteins in the adipose tissue dataset (early-HD vs. controls) are associated with the term "extracellular vesicle. " Extracellular vesicles (EVs) are composed of lipids, proteins, and RNAs, are membrane coated, and are secreted by most cells. Moreover, they are implicated in neurodegeneration, although the availability of data for HD is poor [18].
Pathway and ontology enrichment in the skin proteomics dataset showed that proteins related to gene expression in general are affected in the early-HD samples compared to controls (Table 3). RNA processing and translation as well as amino acid metabolism seems to be dysregulated. The predicted regulators are all factors involved in gene expression. Of note, androgen receptor (AR) and lysine acetyltransferase 2A (KAT2A), the latter in a complex with ataxin 7 (ATXN7), are both linked to polyglutamine diseases. A CAG expansion in AR causes spinal and bulbar muscular atrophy [19]; an expansion in ATXN7 causes spinocerebellar ataxia type 7 [20]. This could potentially indicate that Table 3 Ontology enrichment of proteomics data Ontology enrichment using the Enrichr website. See the "Methods" section for details. Only non-redundant, aggregated terms are shown. Only entries of human samples in the databases were considered. Only terms that applied to at least two dysregulated proteins were considered. The combined score is shown in brackets. Only entries with a combined score of 5 or greater were considered. Top three terms for pathways and regulators and top 5 terms for ontology are shown there are common phenomena caused by a CAG repeat expansion, independent of the mutated gene. The enriched pathways in the comparison of the early-HD and pre-HD samples in the skin dataset (Table 3) are intertwined and regulate cell survival and proliferation. Consequently, TP53 (tumor protein P53), a key regulator of cellular survival, was predicted as an upstream regulatory protein. In addition, extracellular vesicle mediated signaling was again predicted with a huge combined score (Table 3).

HTT gene SNP sequencing
An important consideration in therapeutic HTT lowering trials is preservation of a certain level of HTT protein. While inactivation of HTT in the brain of adult mice [21] and primates [22] seems to be tolerated for some time, inactivation of HTT beyond a certain level, likely 50% of the normal level, in the developing, as well as the adult brain potentially has severe consequences (reviewed in [23,24]). To avoid indiscriminately reducing both wild-type and mutant HTT, an attractive approach in human HTT lowering trials is to target the alleles of single nucleotide polymorphisms (SNPs) that are phased on the mutated HTT allele. To facilitate such approaches, we genome-sequenced the primary fibroblast lines generated in this study by the 10X Genomics Chromium technology. We only sequenced the lines that were generated from HD patients (n = 39). For patient privacy and legal reasons, here we only show the HTT gene (± 100 kbp) associated SNPs (Additional file 20). Overall, 98.3% of all identified SNPs could be assigned to a specific haplotype. The largest block of assembled HTT sequencing data was around 14.5 million base pairs. We previously defined 16 haplotypes that are common in HD subjects with European ancestry [25] and demonstrated allele-specific target sites for each haplotype [26,27]. To maximize the discrimination power of the haplotypes and to cover non-Europeans, we extended the HTT haplotype definitions by using 1000 Genomes Project (KGP) data (phase 3, all populations). Based on the same 21 genetic variations that were used to define the original 16 haplotypes [25], 253 additional haplotypes were identified and subsequently assigned based on the allele frequency in all KGP populations. For example, excluding the original 16 haplotypes, hap.17 represents the most frequent HTT haplotype in KGP data, accounting for approximately 3% of haplotypes in KGP chromosomes. The list of all newly defined HD haplotypes can be found in the Additional file 21.
Out of the total of 39 sequenced samples, the HTT haplotypes of 37 samples could be conclusively defined (Additional Fig. S1A and B). A phylogenetic tree analysis on the HTT haplotypes to study evolutionary relationships between the different HTT alleles revealed three major groups of HTT haplotypes (represented by colors in Additional Fig.  S1C and D).

Genome-wide methylation data
DNA methylation states, which are part of the epigenetic modification landscape, allow the assessment of the "molecular age" of a cell through the analysis of epigenetic clocks [28]. The fibroblast lines described in this manuscript were part of a multi-species study to analyze changes in the methylome in the background of HD [5]. Associated data has been published by the authors [5].

Generation of iPSC lines
In addition to the 'omics datasets as mentioned above, we also generated, as a flexible model for HD research, a total of 45 iPSC lines. These lines were derived from the primary fibroblast cell lines in this study (controls and HD patients). All fibroblast lines were used to generate iPSC lines. However, for technical reasons, only 45 lines resulted in viable iPSC lines that also passed quality control. The iPSC lines are available from the European Bank for Induced pluripotent Stem Cells (EBiSC; search for "CHDI" at https:// cells. ebisc. org). Re-programming of fibroblasts was conducted using the non-integrative Sendai virus strategy expressing KLF4, MYC, POU5F1 (OCT4), and SOX2 [29]. All lines were characterized regarding expression of pluripotency markers (OCT-4 (POU5F1), TRA-1-60, SSEA-1, and SSEA-4), retention of re-programming vector sequences, morphology, and karyotype. Furthermore, iPSC lines were differentiated into endo-, meso-, and ectodermal cells and marker expression was analyzed (endoderm: CXCR4, GATA6, SOX17; mesoderm: MIXL1, NCAM1, VIMENTIN; ectoderm: HES5, NEUROD1, PAX6). All data are freely available from the EBiSC website (at the time of May 2022). The HTT haplotype information of the iPSC lines is linked to the EBiSC identifier in the Additional file 20.

Discussion
Many valuable insights into the pathogenesis of HD have been gained from disease models, whether in mice, flies, or non-human primates [30,31] to suggest that the expression of the HD mutation affects the innate immune system and energy metabolism [32]. The in-depth molecular tissue signatures, which we derived from human HD patients' peripheral tissues specifically indicate PPARA (peroxisome proliferator activated receptor alpha) as a regulatory protein with very high confidence for dysregulated genes in muscle in the pre-HD group (Fig. 1B) and for dysregulated proteins in adipose tissue in the early-HD group (Table 3). In other, independent, external datasets PPARA was found to be dysregulated itself: It was significantly upregulated in post mortem caudate and cerebellum [33] and also significantly upregulated in post mortem pre-frontal cortex [34]. While none of our datasets indicate that PPARA itself is statistically significantly dysregulated in muscle, adipose, or skin, very small changes in regulatory proteins can have a much larger knock-on effect on target genes, as suggested by our analysis. The family of PPA receptors are important regulators of various processes including inflammation, lipid and energy metabolism. The PPA receptors interact with, e.g., PGC-1α (peroxisome proliferator-activated receptor gamma coactivator) to enable the correct function of both partners including in the central nervous system [35]. Paralogs of PPARA, namely PPARD [36,37] and PPARG [38,39], have been implicated in HD pathogenesis. Moreover, PPARs can be therapeutically targeted to improve phenotypes such as mitochondrial dysfunction in HD model systems [37][38][39][40].
The most consistently changed gene in the RNAseq data from muscle, adipose, and fibroblasts was TBC1D3D (ENSG00000274419). TBC1D3D is not included in published 'omics datasets (see above); however, in all tissues, TBC1D3D expression was significantly downregulated in the pre-HD group and restored towards control levels in the early-HD group in our datasets (Fig. 1). TBC1D3D protein is expressed in all three tissue types (www. unipr ot. org, [41]) and most probably acts as a GTPase activating protein to stimulate RAB5 function [42]. RAB5 is an early endosome marker and functions in endocytosis and membrane transport where it regulates receptor sorting and vesicle fusion [43]. Intriguingly, the HTT-HAP40 complex also acts as a regulatory protein complex for RAB5 [44]. Consequently, expression of mutant HTT leads to aberrant endosomal/lysosomal pathway function [45] and its secretion via a non-canonical pathway [46]. Endosomal pathway integrity is a prerequisite for functioning secretion of exosomes through multi-vesicular bodies (part of the endosomal system) and vice versa [47]. Exosomes are one of the three main species of extracellular vesicles (EVs), the others being microvesicles, also called ectosomes, and apoptotic bodies [48]. Apoptotic bodies form during cell death; microvesicles directly bud from the cell plasma membrane. All EVs are composed of lipids, proteins, RNAs, and other small molecules. Their circulating nature makes them attractive messengers of information from the secreting cell towards the target cell. They are also able to cross the blood brain barrier and are found in virtually every bodily fluid [48]. When we analyzed the dysregulated proteins in the adipose dataset (early-HD vs. controls), we found an exceptionally high association of the gene ontology term "extracellular vesicle" (for 21 out of the 28 dysregulated proteins) (Table 3). Additionally, gene ontology enrichment of dysregulated proteins in the skin dataset also pointed towards "Extracellular vesicle-mediated signaling in recipient cells" (Table 3). Extracellular vesicles could play an important role in neurodegeneration as they have been implicated as one route for the spread of misfolded proteins, for instance in Parkinson's disease, and they also contain other cargo such as miRNAs [49]. In HD, our human tissue data indicate that the expression of the HD mutation affects the homeostasis of extracellular vesicles similar to what has been observed before [50]. These findings certainly merit further investigation, in particular because EVs derived from peripheral tissues and biofluids are an attractive source of (peripheral) biomarkers for clinical trials.

Conclusions
In summary, the biological signatures of the changes at RNA and protein levels point towards the involvement of inflammation, energy metabolism, and vesicle biology in peripheral tissues in HD. It shows the potential to identify biological signatures from peripheral tissues in HD that could be suitable as biomarkers in clinical trials. In addition, we have generated a high-quality human multi-omics reference data set and established valuable primary cell lines with associated HTT haplotyping data and generated iPSC cell lines. All participants took part in a large longitudinal observational study, Enroll-HD, with deep phenotyping [51]. These data provide an important resource for further research into, for instance, somatic repeat instability, cell type-specific effects of mutant HTT, or the HD epigenome [5]. SNP-based HTT lowering approaches are underway to selectively target only the expanded allele harboring the SNP [23,24,52]. Our fully SNP phased fibroblasts and the iPSC lines derived from them can be used to analyze the feasibility of these approaches and to develop novel molecules for therapy. iPSC lines are also valuable tools in many neurodegenerative disorders [53], including HD [54], and our set of well-characterized lines can be used to study pathomechanisms across diseases.

Study participants
Participants of the MTM-HD study were recruited at the departments of neurology of Ulm University (Germany) and University College London (United Kingdom). At both institutions the study followed a standardized protocol. Participants were only included if there were no contra-indications to muscle biopsy, e.g., a clotting disorder or evidence to suggest cardio-respiratory abnormalities. The only relatives in the cohort were a pair of sisters in the Ulm samples. None of the other individuals were related. We therefore did not consider the degree of relationship in our analyses. For HD patients, the CAGtract length was determined and participants were clinically assessed as described for the TrackOn and TRACK-HD studies [55,56]. Clinical assessment included the United Huntington Disease Rating Scale (UHDRS) motor part to derive the total motor score and the UHDRS total functional capacity scale (TFC) [57]. The disease burden score (DBS) was calculated from each HD participant's CAG repeat length and age according to the following formula: (CAG-35.5) x age [58]. A DBS of 250 or greater was used to screen for potential HD participants. Furthermore, HD participants were categorized as pre-HD if they had a diagnostic confidence level score of 2 or less on the UHDRS motor scale, or as early-HD if they were in TFC stages of 1 or 2, indicative of early motor manifest HD. Only participants with no signs of neuromuscular disease on clinical examination were enrolled. The local ethics committees at Ulm University and University College London approved the study (Ulm: 265-12; London: 12/LO/1565), and written informed consent was obtained from each participant. Following informed consent, participants were given a 9-digit unique identifier as described [59].

Tissue biopsy and sample collection
In order to ensure samples are collected and processed in exactly the same manner in Ulm and in London, personnel at either site received on-site training on all aspects of the protocol. All samples were collected following an overnight fast (water was permitted) between 7.30 and 9 am local time (CET or GMT) to account for any influence of circadian rhythm. Open biopsies of the skin, subcutaneous adipose tissue and M. vastus lateralis were obtained following local anesthesia. Participants were contacted on the day after and 7 days following the procedures. After 1 week, the stitches were removed. Tissues were dissected (skeletal muscle, adipose tissue and skin), immediately snap-frozen in liquid nitrogen and stored at − 80°C. Blood samples were taken following informed consent on the screening day. This included routine blood tests (full blood count, renal, liver, bone and clotting profiles, creatinine kinase and group and save). In addition, plasma and buffy coat were separated. PBMCs were extracted from CPT/ heparin tubes (BD Vacutainer CPT) as previously described [60]. Cell pellets were then snap frozen in liquid nitrogen and stored at − 80°C. For histochemistry and immunohistochemistry, the muscle sample was mounted on a piece of cork in TissueTek with fibers oriented perpendicular to the cork and then snap-frozen in liquid nitrogen-cooled 2-methylbutane and stored at − 80°C. For electron microscopic analysis, a muscle fiber was pinned onto a cork plate with two needles in the operating theatre and immediately fixed in 2.5% glutaraldehyde.

Generation of primary cell lines
Primary human fibroblast and myoblast cell lines from the MTM participants were established and maintained as previously described [61].

RNA and DNA extraction
RNA extraction from tissues was performed by Labcorp (NC, US) using the Qiagen miRNeasy kit. RNA had to pass the following specifications from an Agilent Bioanalyzer run to be used for subsequent sequencing: 28S/18S ratio: 0.8-3.0 and RIN score: ≥ 6.0. For fibroblast lines, RNA and DNA were extracted simultaneously by Q Squared Solutions Expression Analysis LLC (NC, US). The Qiagen AllPrep DNA/ RNA kit was used to simultaneously purify genomic DNA and total RNA from the same sample. Briefly, samples were lysed in a guanidine-isothiocyanate containing buffer and the lysate was passed through an AllPrep DNA spin column which selectively binds genomic DNA. The column was washed and the genomic DNA was eluted in EB buffer stored at − 20°C. Ethanol was added to the flow-through of the AllPrep DNA spin column which allows for efficient binding of RNA. The mixture was then applied to a RNeasy spin column (Qiagen), where total RNA binds to the membrane. RNA was then eluted in RNase-free water, quantified and integrity assessed using the RNA 6000 Nano Assay on a Bioanalyzer 2100. RNA passed QC with RIN values ≥ 7.0. The purified RNA was stored at − 80°C.

RNA sequencing of adipose and muscle tissue RNA
RNA sequencing was performed by Q Squared Solutions Expression Analysis LLC (NC, US). RNA samples were converted into cDNA libraries using the Illumina TruSeq Stranded Total RNA sample preparation kit (Illumina # RS-122-2303). Briefly, total RNA samples were concentration normalized, and ribosomal RNA (rRNA) was removed using biotinylated probes that selectively bind rRNA species. The resulting rRNA-depleted RNA was fragmented using heat in the presence of divalent cations. Fragmented RNA was converted into double-stranded cDNA, with dUTP utilized in place of dTTP in the second strand master mix. A single "A" base was added to the cDNA and forked adaptors that include index, or barcode sequences were attached via ligation. The resulting molecules were amplified via polymerase chain reaction (PCR). During PCR, the polymerase stalls when a dUTP base is encountered in the template. Since only the second strand includes the dUTP base, this renders the first strand the only viable template, thereby preserving the strand information. Final libraries were quantified, normalized, and pooled. Pooled libraries were bound to the surface of a flow cell and each bound template molecule was clonally amplified up to 1000-fold to create individual clusters. Final libraries were sequenced on the HiSeq 2500 Illumina sequencing platform with paired-end, 50 bp long reads with a total read depth of approximately 150M reads per sample. Sequencing raw files were demultiplexed, adapter trimmed, and clipped for low quality base calls. Final fastQ files were produced with reads of a minimum length after clipping/trimming of 25 nucleotides.

RNA sequencing of fibroblast RNA
RNA sequencing was performed by Q Squared Solutions Expression Analysis LLC (NC, US). Sequencing libraries were created using the Illumina TruSeq Stranded mRNA method (Illumina, RS-122-2103), which preferentially selects for messenger RNA by taking advantage of the polyadenylated tail. In summary, approximately 100 ng of total RNA per sample was used to purify poly-adenylated RNAs using oligo-dT attached to magnetic beads. Purified mRNAs were fragmented using heat in the presence of divalent cations. The fragmented RNAs were converted into double-stranded cDNA, with dUTP utilized in place of dTTP in the second strand master mix. This facilitates the preservation of strand information, as amplification in the final PCR step will stall when it encounters uracil in the nucleotide strand, rendering the first strand as the only viable amplification template. The double-stranded cDNA underwent end-repair, A-tailing, and ligation of adapters that include index sequences. The final libraries were amplified via polymerase chain reaction (PCR), after which they were quantified, normalized, and pooled in preparation for sequencing. Normalized libraries were multiplexed for efficient sequencing to the required number of reads per sample. Pooled libraries were bound to the surface of a flow cell, and each bound template molecule was clonally amplified up to 1000-fold to create individual clusters. Libraries were sequenced using the Illumina sequencing-by-synthesis platform, with a sequencing protocol of 100 bp paired-end sequencing and total read depth of 40M reads per sample on a NovaSeq. Sequencing raw files were demultiplexed, adapter trimmed, and clipped for low quality base calls. Final fastQ files were produced with reads of a minimum length after clipping/trimming of 25 nucleotides.

Bioinformatics analysis of RNAseq datasets
FastQ files were quality checked with FastQC v0.11.5 [62]. All files passed QC. The reads were aligned against Ensembl homo sapiens GRCh38 release 90 using STAR aligner v2.5.3a [63]. Reads were quantified using salmon v0.8.2 [64]. Samples were screened for outliers using a combined PCA and clustering analysis. A sample was defined as an outlier if it was outside a 68% probability ellipse in PCA analysis and was outside a − 2.5 standardized connectivity cutoff of a Euclidean distance matrix of all samples per each tissue [65]. This procedure identified 6 samples in the adipose (n = 2 for each control, pre-HD and early-HD), 3 samples in the muscle (n = 1 control, n = 2 pre-HD), and 10 samples in the fibroblast (n = 4 control, n = 4 pre-HD, n = 2 early-HD) datasets. For all subsequent analyses, both RNAseq batches for muscle and adipose tissues were combined. Transcriptional dysregulation was computed using tximport v1.10.0 [66] and DESeq2 v1.22.1 [67]. The DESeq2 intercept design normalized counts are available as supplementary information files. For the dysregulation analysis, HD clinical stage (control, pre-HD, early-HD) was used as the variable of interest. RNAseq batch (for tissues), gender, site of sampling, age, and BMI were used as covariates in the modeling. We factorized the two continuous variables (age and BMI) into 5 intervals before evaluation with the cut function in R. Ashr was used as the fold change shrinkage estimator [68]. DESeq2 analysis files are available as supplementary information files. Ontology analysis was carried out using the Enrichr website [69,70].

Proteomic analysis and bioinformatics
Proteomics analysis was performed by proteome sciences (www. prote omics. com). For TMT labeling (Thermo Fisher), 2 mg of muscle tissue were reduced (dithiothreitol), alkylated (iodoacetamide), digested (trypsin) to generate peptides, desalted (SepPak tC18 cartridges (Waters, Milford, MA, USA)), and lyophilized. Peptides were mixed with their respective TMT 10plex tag and incubated for 1 h at room temperature. Individual TMT reactions were terminated with hydroxylamine and the labeled digests of each of the 10 samples were pooled into the respective TMT 10plex and incubated for another hour. Finally, the TMT 10plex analytical sample was acidified, diluted to an acetonitrile concentration less than 5%, divided into two aliquots, desalted, and lyophilized to completion. Each TMT 10plex analytical sample aliquot was separated into 12 x 4-min fractions by strong cation exchange chromatography (polySULFOETHYL-A column (PolyLC) and HPLC system (Waters Alliance 2695)) and the 12 fractions combined by smart pooling into six fractions with roughly equal peptide amounts for subsequent processing. The muscle samples were subjected to two SCX-runs as the total load of the muscle TMT 10plexes was 20 mg each and makes two runs necessary. Re-suspended peptides were loaded onto a nanoViper C18 Acclaim PepMap 100 pre-column (Thermo Scientific). Each fraction was analyzed in duplicate by LC-MS/MS using the EASY-nLC 1000 system coupled to an Orbitrap FusionTM TribridTM Mass Spectrometer (both Thermo Scientific) for each of the TMT 10plex sets. Peptide mass spectra were acquired throughout the entire chromatographic run (180 min). MS-runs across the TMT 10plexes were submitted to Proteome Discoverer (PD) v1.4 (Thermo Scientific) using the Spectrum Files node. Spectrum selector was set to its default values, while the SEQUEST HT node was suitably set up to search data against the human FASTA UniProtKB/Swiss-Prot database (version 9606_Human_01082016 (created August 2016)). The reporter ions quantifier node was set up to measure the raw intensity values for TMT 10plex mono-isotopic ions (126, 127N, 127C, 128N, 128C, 129N, 129C, 130N, 130C, 131). The SEQUEST HT search engine was programmed to search for tryptic peptides (with up to two missed cleavages allowed) and with static modifications of carbamidomethyl (C), TMT6plex (K), and TMT6plex (N-Term). Dynamic modifications were set to deamidation (N/Q), oxidation (M), and phosphorylation (STY). Precursor mass tolerance was set to 20 ppm and fragment (b and y ions) mass tolerance to 0.02Da. The first stage of data processing was performed using the SQuaT Bioinformatics Module 3.1.1. Each TMT 10plex set was filtered to include PSMs (peptide spectral matches) that had a signal in at least one out of 10 TMT reporter ion channels. A correction procedure was then implemented to address isotopic impurities. Isotope correction factors used in this procedure were specific to the TMT batch used for labeling. Next, for every mass spectrometry run, the TMT reporter ion intensities were normalized using the sum-scaling technique at the PSM level. Briefly, for each TMT channel all reporter ion intensities were summed and the median value across all summed channels was calculated. A correction factor was obtained by dividing the individual summed reporter ion intensity for a channel by the median of all channels. Finally, each individual PSM ion intensity was multiplied by the correction factor for the relevant TMT reporter channel. Post sum-scaling, the ratios of each sample PSM was related to the reference sample channel. The PSM level ratios to the reference sample were calculated as a log2 transformation of sample PSM intensity relative to the reference sample PSM intensity. The data quality of the samples was controlled using quality control charts. Two quality control metrics per sample were calculated: the median (measure of central tendency) and the inter-quartile range (IQR) (measure of scale) using peptide and protein distributions. Both metrices are robust and stable to outliers and are therefore are good for detecting extreme outliers. The median log expression and inter-quartile range (IQR), were traced using control charts for all datasets on peptide and protein level. A sample was considered as a strong outlier if either QC metric value was more than three standard deviations from the overall mean. A second stage of data processing was performed using the FeaST Bioinformatics Module 1.3.1. Imputation was performed to obtain quantitative values for the remaining missing data points using an iterative PCA method [71]. Following imputation, the data were normalized across the sets using quantile normalization method to center the sample values across the different TMT 10plex sets. Multifactorial limma [72] modeling was used to remove TMT 10plex set and site of sampling as batch effects. The final datafiles for adipose, muscle, and skin proteomics analysis are available as supplementary information files.

Bioinformatics analysis of proteomics datasets
For further analysis of the peptide datasets, all non-unique entries for peptides (shared_ status_Gene) were removed, followed by removal of rows with unassigned protein IDs. We then used "Peptide" as the identifier to collapse rows (WGCNA function [73]) onto unique peptides entries. We next employed the same strategy as described above to identify outliers in the peptide and protein datasets but could not detect any outlying samples. Following this, we consecutively removed batch effects with limma [72] for both peptide and aggregated protein datasets. Adipose: gender, BMI, age; muscle: BMI, age; skin: age, gender, BMI. To compute peptide and protein dysregulation, we used ROTS [17]. For each pairwise comparison, we used the respective full dataset for modeling and ran 1000 bootstraps. Reproducibility Z-score value was greater than 2 for all comparisons. The ROTS analysis files are available as supplementary information files.

SNP sequencing
Genomic DNA extraction, sample QC, library preparation, sequencing reactions, and bioinformatics analyses were conducted at GENEWIZ, Inc. (South Plainfield, NJ, USA). High molecular weight genomic DNA was extracted from 2-4 million cells per sample, using Qiagen Genomic-tip 100/G HMW Kit (Qiagen, Hilden, Germany), according to manufacturer's protocol. Sample amount was quantified using a Qubit 2.0 Fluorometer (Invitrogen, Carlsbad, CA, USA), and sample purity and integrity was checked using a Nanodrop spectrophotometer and a pulsed field gel analysis (PFGE) or an Agilent