Ancestry-related assortative mating in Latino populations

Risch, Neil; Choudhry, Shweta; Via, Marc; Basu, Analabha; Sebro, Ronnie; Eng, Celeste; Beckman, Kenneth; Thyne, Shannon; Chapela, Rocio; Rodriguez-Santana, Jose R; Rodriguez-Cintron, William; Avila, Pedro C; Ziv, Elad; Gonzalez Burchard, Esteban

doi:10.1186/gb-2009-10-11-r132

Research
Open access
Published: 20 November 2009

Genome Biology volume 10, Article number: R132 (2009)

Abstract

Background

While spouse correlations have been documented for numerous traits, no prior studies have assessed assortative mating for genetic ancestry in admixed populations.

Results

Using 104 ancestry informative markers, we examined spouse correlations in genetic ancestry for Mexican spouse pairs recruited from Mexico City and the San Francisco Bay Area, and Puerto Rican spouse pairs recruited from Puerto Rico and New York City. In the Mexican pairs, we found strong spouse correlations for European and Native American ancestry, but no correlation in African ancestry. In the Puerto Rican pairs, we found significant spouse correlations for African and European ancestry but not for Native American ancestry. Correlations were not attributable to variation in socioeconomic status or geographic heterogeneity. Evidence of spouse correlation in past generations was also seen in strong linkage disequilibrium between unlinked markers, which was accounted for in regression analysis by the ancestral allele frequency differences at the pair of markers (European versus Native American for Mexicans, European versus African for Puerto Ricans). We also observed an excess of homozygosity at individual markers within the spouses, but, as expected, this provided weaker evidence of spouse correlation. Ancestry variance is predicted to decline in each generation, but less so under assortative mating. We used the currently observed variances of ancestry to infer even stronger patterns of spouse ancestry correlation in previous generations.

Conclusions

Assortative mating related to genetic ancestry persists in Latino populations to the current day, and has affected the genomic structure of these populations.

Background

Mating patterns and preferences have been an active area of research for population geneticists, sociologists, and anthropologists for more than a century. On both a global and a local scale, mating does not occur at random. On the larger scale, geographic constraints, such as great distances, high mountains and bodies of water, create local isolation, differentiation and endogamy. The influence of local geography has also been extensively studied [1, 2]. However, on a local level, non-geographic factors have greater importance in mate selection. In racially/ethnically heterogeneous societies, such as those that characterize the Western hemisphere, race and ethnicity have played a major role in mate selection [3], although inter-racial mating is increasing. Within racial/ethnic groups and within racially/ethnically homogeneous societies, factors such as age, education, occupation, socioeconomic status (SES), height, weight and religious background influence the choice of a mating partner [4–9]. Specific behavioral characteristics are also known to correlate between spouses [10].

Population structure and assortative mating have implications in a wide variety of fields, ranging from genetics to sociology and anthropology. From the perspective of population genetics, the impact depends on the source of the non-random mating. Generally, assortative mating does not affect the frequency of alleles involved in the choice process unless assortment is linked with natural selection or differential reproduction. These are referred to as first moment effects [11]. By contrast, genotype frequencies may be altered by assortative mating, specifically leading to a positive allelic correlation or homozygote excess for loci that are correlated with the mate selection process [4]. These have been referred to as second moment effects [11]. Second moment effects, or correlations, also occur between alleles at different loci, a phenomenon characterized as linkage disequilibrium (LD). Such LD will occur for all pairs of loci that correlate with the source of non-random mating. In the case of multifactorial traits, Crow and Felsenstein [12] have shown that the increase in locus homozygosity is relatively small while the increase in trait variance can be large. The trait variance increase is due primarily to the many LD effects among loci.

Assortative mating can also create correlations between previously unrelated traits when these traits are involved in the mating partner selection [4]. These correlations between previously unrelated traits can also have an impact on case-control association studies, significantly increasing type I error rates for loci involved in the assortative mating process [13].

Populations of the Western hemisphere, and particularly Latin America, provide unique opportunities to study population structure and non-random mating, due to the historical confluence of three major racial groups over the past five centuries. Mating among the various migrant and local populations has given rise to new population groups characterized by genetic admixture. During the Spanish colonial period, it was common practice, as early as the first decades of the 16th century, for Spanish colonists to take Native American or African-descent women as sexual partners, although social pressure prevented inter-ethnic marriages from becoming widespread [14]. In 1776, the Royal Pragmatic on Marriage was enacted in response to 'unequal marriages on account of their size and the diversity of classes and castes of their inhabitants' [15]. The primary purpose of this law was to avoid 'inequality' in marriage, based on an overall assessment not only of skin color but also of wealth and social status. This 'pigmentocracy' is still observed in some Latin American countries, where resistance to inter-ethnic marriage is greater among individuals of higher socioeconomic status [3, 16].

Within the populations of Latin America, assortative mating has been described to occur based on a variety of factors, including education level, religion, age, family values, anthropometric measurements, and skin pigmentation [16–21]. There has also been debate regarding the degree to which spouse correlations for physical traits such as skin color and anthropometric traits reflect partner selection based on perceived 'race' or selection based on socioeconomic position [16, 22], although the two may be confounded in certain settings.

The most significant studies of mating patterns in Latin America have been conducted by Newton Morton and his colleagues in northeastern Brazil [23–25]. These authors studied 1,068 spouse pairs and their offspring of rural origin identified from government records. Subjects were evaluated on an eight-point scale of ancestry based on physical characteristics such as skin pigment, hair color and type, and facial features. The scale reflects the degree of African versus European ancestry. At the same time, the investigators tested 17 blood group and protein markers to genetically estimate African, European and Native American ancestry, within each of the scale categories described above. They found evidence of ancestry correlation between spouses, although they concluded that it was modest [24].

The advent of DNA-based markers now allows us to address the question of non-random mating in Latino populations in a comprehensive way. We use ancestry informative genetic markers (AIMs) to study spouse correlations in two Latino populations, Mexicans and Puerto Ricans. To contrast indigenous versus migrant patterns, we study spouse pairs recruited both from the country/territory of origin (Mexico, Puerto Rico) and from the US. We show directly through ancestry estimation that significant spouse correlations in ancestry persist at a high level in all populations, leading to significant LD between unlinked markers, the strength of which is directly related to ancestral allele frequency differences. While both populations show strong assortative mating, the patterns differ: Mexicans show spouse correlations in European and Native American ancestry, whereas Puerto Ricans show spouse correlations in European and African ancestry.

Results

Table 1 provides the average and standard deviation of African, European and Native American ancestry for the wives and husbands, stratified by ethnicity and recruitment site. While both Mexicans and Puerto Ricans have ancestry from all three populations, it is apparent that the Mexicans have predominant European and Native American ancestry but modest African ancestry, while the Puerto Ricans, who also have substantial European ancestry, have greater African ancestry and far less Native American ancestry. Indeed, these data (and prior studies) indicate that there is only modest overlap in the ancestry distributions for Mexicans and Puerto Ricans (Figure 1). The overlap exists where Native American ancestry ranges from 0.1 to 0.3 and African ancestry from 0 to 0.2. This area of overlap is of particular interest, because it describes individuals who are matched in terms of ancestry but discordant in terms of nationality/ethnicity and culture.

Table 1 Mean (standard deviation) ancestries for Latino spouses by recruitment site

In Mexicans, the predominance of Native American and European ancestry is also reflected in the variances of the three ancestries: the standard deviation for Native American and European ancestry is large, at approximately 0.16, while that for African ancestry is much smaller, at approximately 0.04. By contrast, in Puerto Ricans, where European and African ancestry are dominant, the variances of African and European ancestry are large (standard deviations approximately 0.14) and the variance of Native American ancestry is smaller (standard deviation 0.065). These variances also have implications for correlations in ancestry within individuals. As expected (Table S1 in Additional data file 1), the correlation between Native American and European ancestry in Mexicans is extremely strong (-0.97). There is also a moderate negative correlation between African and Native American ancestry (-0.28). In Puerto Ricans, the correlation between African and European ancestry is strong (-0.89). Because European is the predominant ancestry in the Puerto Ricans, there is also a moderate negative correlation between European and Native American ancestry (-0.35).

Results of t-tests comparing average ancestries between spouses, and between recruitment sites within each ethnic group, are given in Table S2 in Additional data file 1. As is apparent in Table 1, there are no significant differences in ancestry between the wives and husbands within any category. There are also no significant differences between the Puerto Ricans recruited from Puerto Rico and those recruited from New York. However, there are substantial ancestry differences between the Mexicans from Mexico City and those from the Bay Area, reflecting a migrant effect. The Bay Area Mexicans have significantly more European and African ancestry and less Native American ancestry than the Mexicans from Mexico City (Table S2 in Additional data file 1). This difference may reflect specific geographical or socioeconomic origins of the Mexican migrants to the Bay Area.

To examine a possible role of SES in further analyses of these subjects, we examined average ancestries within SES categories for the subset of subjects for whom we had such information (Table S3 in Additional data file 1). Linear regression analysis of ancestry on SES (coded as 1 for low, 2 for moderate, 3 for middle and 4 for upper) was also performed separately for each sex and ethnicity. There was a non-significant trend towards increased European and decreased Native American ancestry with SES among the Mexican wives but not the husbands. However, there was a significant positive relationship of African ancestry with SES and a negative relationship of European ancestry with SES among the Puerto Rican wives. SES trends were less clear among the Puerto Rican husbands. We note that because SES was measured from census-based location information rather than personal information, these results may have reduced sensitivity.

We next examined the between-spouse correlations in ancestry (Table 2). Among the Mexicans, the spouse correlation in European ancestry is extremely high and statistically significant; Native American ancestry shows a similar pattern. By contrast, there is no significant spouse correlation for the African component of ancestry. The correlations for the Mexicans combining the two recruitment sites are confounded by the difference in average ancestries we noted above. However, within site, the spouse correlations for European and Native American ancestry are still high (0.56 to 0.57 for European or Native American ancestry in Mexicans from Mexico City and 0.39 to 0.42 in Mexicans from the Bay Area). Figure 2 depicts the spouse similarity for the three different ancestry components for the two Mexican recruitment sites. Of note, the higher spouse correlation among pairs from Mexico City is due entirely to four couples with particularly high European and low Native American ancestry. Nonetheless, the data show that the spouse ancestry correlation is robust and replicated across the two recruitment sites.
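Table 2 reports each spouse correlation with a 95% confidence interval, but this section does not state how the intervals were obtained. The Python sketch below assumes the common Fisher z-transformation; the function names and the five example couples are hypothetical, not study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between paired husband and wife ancestry values."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need at least 4 paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation via the Fisher z-transform (assumed method)."""
    z = math.atanh(r)              # variance-stabilizing transform
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical European ancestry for five couples.
husband = [0.42, 0.55, 0.61, 0.38, 0.70]
wife = [0.45, 0.50, 0.66, 0.35, 0.64]
r = pearson_r(husband, wife)
print(r, fisher_ci(r, len(husband)))
```

The interval is asymmetric around r on the original scale, which matters for correlations well away from zero.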

Table 2 Between spouse correlations (95% confidence interval) in ancestry, by ethnicity, recruitment site and socioeconomic status

Within the Puerto Rican spouse pairs, the correlations are high and significant for both European and African ancestry, but not for Native American ancestry. In this case, there are no significant differences in ancestry correlations between the couples from Puerto Rico versus those from New York City. We also note that the spouse correlation in African ancestry (0.33) is somewhat higher than the correlation in European ancestry (0.24), although the difference is not statistically significant. Figure 3 depicts the spouse similarity for Puerto Ricans; the ancestry correlations for Puerto Rican pairs from the two recruitment sites appear quite similar.

An important question is the source of the ancestry correlation between spouses. One possible factor is SES. Therefore, for the Mexicans from the Bay Area and the Puerto Ricans from Puerto Rico, for whom we had such information, we also examined spouse correlations within SES categories (Table 2). The spouse correlations in ancestry persisted within SES categories both in Mexicans and Puerto Ricans, and there was no apparent pattern of increase or decline with SES. As an additional evaluation of the impact of SES, we performed a linear regression analysis, with wife's individual ancestry (IA) as the dependent variable and husband's IA and SES as the independent variables. These analyses were performed separately for each of the three ancestry components (Table S4 in Additional data file 1). Here again, we find no attenuation of the significant spouse relationship in European or Native American ancestry in the Mexicans when allowing for SES in the regression model. Similarly, we find no attenuation of the African or European ancestry spouse correlation in the Puerto Ricans when including SES in the regression model. SES was not a significant predictor of wife's ancestry in any of the analyses of Mexicans; however, as noted previously, there was a significant positive relationship between SES and African ancestry and a negative relationship between SES and European ancestry among the Puerto Rican wives.
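The SES adjustment described above is an ordinary least-squares model with wife's IA as the outcome and husband's IA and SES as predictors. A minimal numpy sketch under that reading follows. The function name and toy data are hypothetical, and the sketch returns point estimates only; standard errors and P values would need additional code or a package such as statsmodels.

```python
import numpy as np

def spouse_regression(wife_ia, husband_ia, ses=None):
    """OLS of wife's individual ancestry (IA) on husband's IA, optionally adjusted for SES.

    Returns [intercept, b_husband] or [intercept, b_husband, b_ses].
    """
    columns = [np.ones(len(wife_ia)), np.asarray(husband_ia, dtype=float)]
    if ses is not None:
        columns.append(np.asarray(ses, dtype=float))  # SES coded 1-4 as in the text
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, np.asarray(wife_ia, dtype=float), rcond=None)
    return coef

# Hypothetical toy data: six couples, SES coded 1 (low) to 4 (upper).
husband = [0.20, 0.40, 0.50, 0.70, 0.30, 0.60]
ses = [1, 2, 3, 4, 2, 3]
wife = [0.1 + 0.5 * h + 0.02 * s for h, s in zip(husband, ses)]

print(spouse_regression(wife, husband))       # unadjusted
print(spouse_regression(wife, husband, ses))  # SES-adjusted
```

Comparing the husband coefficient with and without the SES column is one way to check for the attenuation discussed above.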

We next evaluated the impact of assortative mating on genotype distributions at individual loci. First, we noted no significant differences in allele frequencies between spouses within recruitment sites, either for the Mexicans or Puerto Ricans (Table S5 in Additional data file 1). However, we did find a large excess of significant allele frequency differences between the Mexico City and Bay Area recruitment sites for the Mexicans (69% of loci significant at P < 0.05). This pattern is consistent with the site-specific ancestry differences in Mexicans described above. To determine whether the Mexico City versus Bay Area allele frequency differences were entirely attributable to the ancestry difference between the two sites, we performed a regression analysis of the allele frequency difference chi-square on δ_ij²/p*q*, where δ_ij represents the allele frequency difference between ancestral populations i and j, p* is the allele frequency in the admixed population, and q* = 1 - p* (see Materials and methods). The results are given in Table S6 in Additional data file 1. We observed a highly significant regression coefficient for the European-Native American δ (0.0339 ± 0.0037), while neither of the other coefficients was statistically significant, nor was the intercept significantly different from 1. Similarly, in an analysis where the intercept term was fixed at 1, the regression coefficients were very close to those from the unconstrained analysis. Thus, the entire excess of significant allele frequency differences between Mexico City and the Bay Area can be attributed to the European-Native American δ values at the markers, consistent with the European/Native American ancestry difference between the two sites being the source of the site allele frequency differences. As described in Materials and methods, the pairwise sums of regression coefficients provide estimates of the squared difference in ancestry between the two sites. From the regression coefficients in Table S6 in Additional data file 1, we estimate the following ancestry differences between Mexico City and the Bay Area (magnitudes from the square roots, signs from the direction of the difference): Native American, √(0.0315 + 0.0025) = 0.184; European, -√(0.0315 - 0.0018) = -0.172; African, -√(0.0025 - 0.0018) = -0.026. From Table 1, the corresponding numbers are 0.184, -0.160 and -0.024, respectively. Thus, the regression results agree remarkably well with the observed site ancestry differences.
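The pairwise-sum step can be made explicit. Because the site differences in ancestry sum to zero (Δα_A + Δα_E + Δα_N = 0), the squared allele frequency difference at a marker satisfies (Δp)² = -Σ_{i<j} Δα_i Δα_j δ_ij². So if each regression coefficient b_ij estimates -Δα_i Δα_j (an assumption about how the chi-square is scaled), then b_ij + b_ik = Δα_i². The Python sketch below applies that identity to the three coefficients implied by the sums quoted above. The population labels are illustrative, and the signs are supplied by hand from Table 1, since a square root gives only the magnitude.

```python
import math

def squared_differences(coefs):
    """Return Δα_k² for each population k as the sum of the two pairwise
    coefficients that involve k (three populations, b_ij ≈ -Δα_i·Δα_j)."""
    pops = sorted({pop for pair in coefs for pop in pair})
    return {k: sum(b for pair, b in coefs.items() if k in pair) for k in pops}

# Coefficients implied by the sums quoted in the text (Table S6).
coefs = {("EUR", "NAM"): 0.0315, ("AFR", "NAM"): 0.0025, ("AFR", "EUR"): -0.0018}
signs = {"NAM": +1, "EUR": -1, "AFR": -1}  # Mexico City minus Bay Area, from Table 1

sq = squared_differences(coefs)
diffs = {k: signs[k] * math.sqrt(max(sq[k], 0.0)) for k in sq}
for k in ("NAM", "EUR", "AFR"):
    print(k, round(diffs[k], 3))
```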

To explore the effect of assortative mating on individual loci, we calculated F values, both for the spouses themselves (within individual correlation) and between spouses (between spouse correlation), as described in Materials and methods. The value F₁ represents the within spouse allelic correlation, which is derived from the excess of homozygosity among the spouses. The value F₂ represents the between spouse allelic correlation obtained by sampling one allele from each parent at random, which is also an estimate of the expected value of F₁ for the children of these spouse pairs (see Materials and methods). Thus, the two values of F allow us to compare the effect of assortative mating across two generations.
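The Materials and methods are not part of this section, so the estimators below are an assumption: a common moment-based form, with genotypes coded as allele counts (0, 1, 2) and the allele frequency p estimated from the same sample. F₁ is taken as 1 - H_obs/H_exp. F₂ is taken as the correlation between one allele drawn at random from each spouse, which equals (E[g_h g_w]/4 - p²)/(p(1 - p)) because each spouse transmits the counted allele with probability g/2. The function names are hypothetical.

```python
import numpy as np

def f_within(genotypes):
    """F1-style within-individual allelic correlation: 1 - H_obs / H_exp.

    genotypes: allele counts (0, 1, 2) at one marker. p is estimated from the
    same sample, so monomorphic markers (p = 0 or 1) must be excluded.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean() / 2.0
    h_exp = 2.0 * p * (1.0 - p)
    h_obs = np.mean(g == 1)
    return 1.0 - h_obs / h_exp

def f_between(husband, wife):
    """F2-style correlation between one allele drawn at random from each spouse."""
    gh = np.asarray(husband, dtype=float)
    gw = np.asarray(wife, dtype=float)
    p = (gh.mean() + gw.mean()) / 4.0  # allele frequency pooled over spouses
    # Each spouse transmits the counted allele with probability g / 2,
    # independently, so P(both sampled alleles are counted) = E[g_h * g_w] / 4.
    both = np.mean(gh * gw) / 4.0
    return (both - p * p) / (p * (1.0 - p))
```

For example, spouses with identical homozygous genotypes give f_between = 1, while genotype counts in Hardy-Weinberg proportions such as [0, 1, 1, 2] give f_within = 0.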

The mean values of F₁ and F₂ are given in Table 3, stratified by ethnicity and recruitment site. All mean F values are significantly greater than 0, although the largest values are observed for F₂ in Mexicans and F₁ in Puerto Ricans. For Mexicans, the overall F₁ and F₂ values appear reasonably consistent between generations (0.0161 for F₁ and 0.0172 for F₂). However, for Puerto Ricans, the overall F value is higher within spouses (F₁ of 0.0256) than between spouses (F₂ of 0.0085). This may indicate a decrease in spouse correlation between the generations, but this requires further investigation.

Table 3 Mean (standard error) values of allelic correlation within spouses (F₁) and between spouses (F₂)

We next undertook an analysis to determine the degree to which the significant F values could be attributed to ancestry assortative mating. We did so by linear regression, with the F value as the dependent variable and three independent variables of the form δ_ij²/p*q*, where the subscripts i and j index the three possible pairs of the ancestral African, European and Native American populations, and p* is the allele frequency in the admixed population (see Materials and methods).
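A minimal numpy sketch of this regression follows, assuming per-marker arrays of ancestral allele frequencies and an observed admixed-sample frequency p*. The population labels and function names are hypothetical. The same design matrix would serve for the chi-square regression used earlier in the site comparison.

```python
import numpy as np

PAIRS = [("AFR", "EUR"), ("AFR", "NAM"), ("EUR", "NAM")]  # hypothetical labels

def standardized_deltas(freqs, p_star):
    """Per-marker covariates δ_ij² / (p* q*) for the three ancestral pairs.

    freqs: dict mapping population label -> array of ancestral allele frequencies
    p_star: array of allele frequencies in the admixed sample (q* = 1 - p*)
    """
    p_star = np.asarray(p_star, dtype=float)
    pq = p_star * (1.0 - p_star)
    cols = [(np.asarray(freqs[i], dtype=float) - np.asarray(freqs[j], dtype=float)) ** 2 / pq
            for i, j in PAIRS]
    return np.column_stack(cols)

def regress_on_deltas(y, covariates):
    """OLS of a per-marker statistic (F1, F2 or a chi-square) on the covariates
    plus an intercept; returns (intercept, {pair: coefficient})."""
    design = np.column_stack([np.ones(len(y)), covariates])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef[0], dict(zip(PAIRS, coef[1:]))
```

An intercept near 0 (for F) or near 1 (for a 1-df chi-square) after fitting is what the text reads as "fully explained by the covariates".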

Results are provided in Table 4 (for F₁) and Table 5 (for F₂). Among the Mexicans, it appears that the F₁ values are fully explained by the standardized Native American-European squared delta values of the markers; this covariate was significant for the Bay Area Mexicans and for both groups combined. In these analyses, the intercept term was not significantly different from 0, indicating that the F₁ distribution was fully explained by the covariate. In the analysis of F₂, the results were not as clear cut, although again it appears that the Native American-European delta values explain much of the excess. In the analysis including all three delta terms, none were significant in any of the analyses, although the coefficients for the Native American-European delta tended to be largest. However, in analyses including only the Native American-European delta term, this covariate was significant in the analysis of the Bay Area Mexicans and both sites combined. In the final analysis of both groups combined, the intercept term is greatly diminished, although still marginally significantly greater than 0.

Table 4 Regressions of F₁ on δ²/p*q*

Table 5 Regressions of F₂ on δ²/p*q*

Regression analyses on Puerto Rican F₁ values yielded less clear-cut results. As expected, the largest regression coefficients were for the African-European delta terms, although none were formally significant, either in the single-site analyses or with the two sites combined. Also, the ancestral deltas do not appear to fully explain the excess of homozygosity at these markers. As seen in Tables 4 and 5, the F₂ values were not as extreme as the F₁ values, and none of the regression coefficients were significant, although again the largest regression coefficient tended to be for the African-European delta terms. After regression, no significant intercept term remained.

As described in Materials and methods, the pairwise sums of regression coefficients provide estimates of the three spouse covariances in ancestry. For the Mexicans we analyzed the two recruitment sites separately, to avoid inflation of spouse covariance due to average ancestry differences between sites. From Table 4, for the regression analysis on F₁ we estimate the following ancestry covariances for Mexico City: Native American, 0.0125 + 0.0054 = 0.0179; European, 0.0125 - 0.0047 = 0.0078; African, 0.0054 - 0.0047 = 0.0007. For the regression analysis on F₂, the corresponding covariance estimates are: Native American, 0.0141 + 0.0034 = 0.0175; European, 0.0141 - 0.0028 = 0.0113; African, 0.0034 - 0.0028 = 0.0006. The corresponding observed spouse covariances in ancestry derived from Tables 1 and 2 for Mexico City are: Native American, 0.0190; European, 0.0168; African, -0.0001. Thus, the regression-based estimates for Native American ancestry spouse covariance are quite close to the observed, but the regression-based estimate for European ancestry covariance is somewhat below the observed. For the Bay Area Mexicans, the regression-based covariance estimates for F₁ are: Native American, 0.0168 + 0.0033 = 0.0201; European, 0.0168 - 0.0038 = 0.0130; African, 0.0033 - 0.0038 = -0.0005. For the corresponding regression analysis on F₂, we estimate: Native American, 0.0135 - 0.0011 = 0.0124; European, 0.0135 + 0.0004 = 0.0139; African, 0.0004 - 0.0011 = -0.0007. The corresponding observed spouse covariances for Bay Area Mexicans are: Native American, 0.0083; European, 0.0093; African, 0. Here the regression-based estimates appear to somewhat overestimate the actual covariances for Native American and European ancestry. All analyses regarding covariances for African ancestry are consistent in showing no evidence of correlation.
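The covariance estimates above are pairwise sums of the three regression coefficients, and the observed covariances are, presumably, each spouse correlation (Table 2) multiplied by the husbands' and wives' standard deviations (Table 1). The Python sketch below shows both steps. The coefficient dictionary is reconstructed from the sums quoted for Mexico City (F₁), and the labels and function names are hypothetical.

```python
def covariances_from_pairwise(coefs):
    """Spouse ancestry covariance for each population k: the sum of the two
    pairwise coefficients b_ij that involve k (three-population case)."""
    pops = sorted({pop for pair in coefs for pop in pair})
    return {k: sum(b for pair, b in coefs.items() if k in pair) for k in pops}

def observed_covariance(r, sd_husband, sd_wife):
    """Covariance implied by a correlation and the two standard deviations."""
    return r * sd_husband * sd_wife

# Mexico City, regression on F1 (coefficients implied by the sums quoted above).
mexico_city_f1 = {("EUR", "NAM"): 0.0125, ("AFR", "NAM"): 0.0054, ("AFR", "EUR"): -0.0047}
for pop, cov in covariances_from_pairwise(mexico_city_f1).items():
    print(pop, round(cov, 4))
```

Substituting the Bay Area or Puerto Rican coefficients gives the other sums listed in this paragraph and the next.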

We repeated the same analysis in the Puerto Ricans, but for the two recruitment sites combined. From Table 4, for the regression analysis on F₁ we estimated the following ancestry covariances: African, 0.0131 - 0.0006 = 0.0125; European, 0.0131 + 0.0064 = 0.0195; Native American, 0.0064 - 0.0006 = 0.0058. For the regression analysis on F₂, the corresponding covariance estimates are: African, 0.0028 + 0.0024 = 0.0052; European, 0.0028 - 0.0002 = 0.0026; Native American, 0.0024 - 0.0002 = 0.0022. The corresponding observed spouse covariances in ancestry from Tables 1 and 2 for Puerto Ricans are: African, 0.0059; European, 0.0048; Native American, 0. The F₂ regression-based estimates of spouse covariance for African and European ancestry are comparable to the observed (with a somewhat underestimated European ancestry correlation), while the F₁ regression-based estimates are higher. This suggests (as does the overall higher mean value for F₁ than F₂) that the assortative mating in Puerto Ricans was stronger in the prior generation than in the current one.

To determine whether the excess average F₁ and F₂ values might be attributable to specific genomic locations, we created a Q-Q (quantile-quantile) plot of regression residuals against a normal distribution (Figure S1a for Mexicans and S1b for Puerto Ricans in Additional data file 2). In both figures the observed distributions closely match the expected. Hence, the homozygote excess appears to be a genome-wide phenomenon rather than a localized one.

Results of the inter-locus (LD) analysis were strikingly different from the single locus analyses. A clear excess of significant chi-square tests was observed in each ethnic group and recruitment site (Table 6). Approximately 15% of tests were significant at the 5% level. Regression analyses of the standardized squared-delta products (for each of the two marker loci involved) were quite revealing (Table S7 in Additional data file 1). For the Mexicans, the European-Native American standardized delta products were extremely predictive of the chi-square, in contrast to the two other delta product covariates. After regression, the intercept terms were greatly attenuated from the corresponding mean chi-squares in Table 6, although still significantly greater than 1. The Puerto Ricans showed a similar pattern, except that the highly significant covariate in this case was the African-European squared delta product term (Table S7 in Additional data file 1). As for the Mexicans, the intercept terms were greatly diminished from the corresponding mean values in Table 6, although still somewhat greater than 1. These results show that the primary driver of LD between unlinked loci in these populations is the ancestral delta values: between Europeans and Native Americans for the Mexicans, and between Africans and Europeans for the Puerto Ricans.
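Why LD between unlinked loci should track the ancestral deltas can be seen in a simplified two-population model. Suppose each allele of a person with ancestry α comes from population 1 with probability α. The two loci are then independent given α, so Cov(g₁, g₂) = 4δ₁δ₂Var(α): any variation in individual ancestry, which assortative mating helps maintain, creates covariance proportional to δ₁δ₂. The Python simulation below is a sketch of that model, not the paper's analysis. The Beta(2, 2) ancestry distribution, sample sizes and uniform ancestral frequencies are arbitrary choices, and the per-allele dosage covariance Cov(g₁, g₂)/4 stands in for D. The slope of a regression through the origin on δ₁δ₂ should approximate Var(α).

```python
import numpy as np

def simulate_admixed_genotypes(n_people, n_loci, seed=0):
    """Two-population sketch: person i has ancestry alpha_i ~ Beta(2, 2); each allele
    at each locus is drawn independently with frequency p2 + alpha_i * delta."""
    rng = np.random.default_rng(seed)
    alpha = rng.beta(2.0, 2.0, size=n_people)
    p1 = rng.uniform(0.0, 1.0, size=n_loci)
    p2 = rng.uniform(0.0, 1.0, size=n_loci)
    delta = p1 - p2
    freq = p2[None, :] + alpha[:, None] * delta[None, :]  # convex mix, stays in [0, 1]
    genotypes = rng.binomial(2, freq)  # allele counts, shape (n_people, n_loci)
    return alpha, delta, genotypes

def ld_vs_delta_products(genotypes, delta):
    """For every pair of loci, return (delta_1 * delta_2, Cov(g1, g2) / 4)."""
    cov = np.cov(genotypes, rowvar=False) / 4.0
    i, j = np.triu_indices(len(delta), k=1)
    return delta[i] * delta[j], cov[i, j]

alpha, delta, genotypes = simulate_admixed_genotypes(3000, 40)
x, d = ld_vs_delta_products(genotypes, delta)
slope = np.sum(x * d) / np.sum(x * x)  # least squares through the origin
print(f"slope {slope:.4f} vs Var(alpha) {alpha.var(ddof=1):.4f}")
```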

Table 6 Chi-square tests of linkage disequilibrium between pairs of markers for spouses combined

To search for possible regions with excess LD, we performed another regression analysis, this time on the LD parameter D as a function of the unstandardized delta products (Table 7). As seen previously for the regression analysis of chi-square, the European-Native American deltas were highly significant for the Mexicans, while the African-European deltas were highly predictive for the Puerto Ricans. We then examined the distribution of residuals from the regression by creating a Q-Q plot against a normal distribution (Figure S2 in Additional data file 2). While the overall fit to a normal distribution appears good for both the Mexicans and Puerto Ricans, there do appear to be a few possible outlier points on both ends. The marker pairs involved in the most extreme points (with Z scores greater than +4 or less than -4) are given in Table S8 in Additional data file 1. The most extreme point occurred in Mexicans (Z = +5.09) for markers on chromosomes 2p and 3p. We note that the same pair of markers gave a Z score of +1.10 in the Puerto Ricans. The marker pair on chromosomes 1p and 2q, which gave a Z score of -4.08 in Mexicans, also had a nominally significant Z score in Puerto Ricans (-2.40), while the pair on chromosomes 1p and 17p (Z score of -4.09 in Mexicans) also had a nominally significant Z score in Puerto Ricans, but in the opposite direction (Z = +2.42).
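The outlier screen above (regress D on δ₁δ₂, standardize the residuals, inspect a normal Q-Q plot and flag |Z| > 4) can be sketched in Python as follows. The plotting positions (k + 0.5)/n and the standardization by the residual standard deviation are assumptions, since the section does not specify them; the function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def standardized_residuals(x, y):
    """Residuals from a straight-line fit of y (e.g. D) on x (e.g. δ1δ2), scaled to mean 0, SD 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return (resid - resid.mean()) / resid.std(ddof=1)

def qq_points(z):
    """(theoretical normal quantiles, sorted observed values) for a Q-Q plot."""
    n = len(z)
    theoretical = np.array([NormalDist().inv_cdf((k + 0.5) / n) for k in range(n)])
    return theoretical, np.sort(np.asarray(z, dtype=float))

def flag_outliers(z, threshold=4.0):
    """Indices of marker pairs whose |Z| exceeds the threshold."""
    return np.flatnonzero(np.abs(np.asarray(z)) > threshold)
```

Plotting the two arrays from qq_points against each other gives the Q-Q plot; points far from the identity line at either end correspond to the indices returned by flag_outliers.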

Table 7 Regression of linkage disequilibrium parameter D on δ₁δ₂

We next projected the reduction in ancestry variance over time (see Materials and methods). The results are shown in Figure 4, where we have plotted the proportion of original variance, V_t/V₀ against generation. For a constant spouse correlation over time, the variance decreases most rapidly, and is around 10% of its original value after just five generations (for c = 0.3, corresponding to Puerto Ricans) or seven generations (for c = 0.4, corresponding to Mexicans). By contrast, for the linear model (c = 1-at), and the exponential model (c = e^-bt), the rate of decline of V is slower; a reduction to 10% of the original value occurs between 10 and 13 generations, depending on the model parameters.
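The Materials and methods that define this projection are not included here. One recursion consistent with the constant-correlation figures quoted above is V_{t+1} = V_t(1 + c_t)/2, which follows if a child's ancestry is the average of its parents' ancestries: Var((α_h + α_w)/2) = V_t(2 + 2c_t)/4. The Python sketch below assumes that recursion and ignores segregation variance and estimation error. The linear and exponential parameters a = 0.1 and b = 0.15 are hypothetical, and negative correlations from the linear model are clamped to 0.

```python
import math

def variance_ratio_path(spouse_corr, generations):
    """V_t / V_0 for t = 0..generations under V_{t+1} = V_t * (1 + c_t) / 2.

    spouse_corr: function mapping generation t to the spouse correlation c_t.
    """
    ratios = [1.0]
    for t in range(generations):
        c = max(0.0, spouse_corr(t))  # clamp: c = 1 - a*t eventually turns negative
        ratios.append(ratios[-1] * (1.0 + c) / 2.0)
    return ratios

constant = variance_ratio_path(lambda t: 0.3, 15)
linear = variance_ratio_path(lambda t: 1.0 - 0.1 * t, 15)            # hypothetical a = 0.1
exponential = variance_ratio_path(lambda t: math.exp(-0.15 * t), 15)  # hypothetical b = 0.15
print(round(constant[5], 3))  # 0.116
```

With c = 0.3 the ratio after five generations is 0.65⁵ ≈ 0.116, in line with the roughly 10% quoted above. With these hypothetical parameters, the linear and exponential curves both first fall below 10% at generation 10.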

To determine the compatibility of the curves in Figure 4 with our own data, we calculated V_t/V₀ and r_t for the current generation of spouses. From the means (α) and standard deviations (√V) in Table 1, we derived values of V_t/V₀ of approximately 0.11 for European and Native American ancestry in Mexicans and 0.08 for African and European ancestry in Puerto Ricans. By contrast, the proportion of original variance for African ancestry in Mexicans is only 0.02, and for Native American ancestry in Puerto Ricans the value is 0.03. These lower values are consistent with the more modest spouse correlations observed for these ancestry components. All these variance ratios may be slightly inflated due to statistical noise in ancestry estimation. Because there was no correlation of African ancestry in the Mexican spouses, we assumed that the variance observed for African ancestry (0.0016) was primarily due to estimation error, since the actual variance would have decreased rapidly by this point in time. Adjusting the values of V_t/V₀ given above for this amount of error variance (an upper bound) reduced the ratios to 0.10 for European and Native American ancestry in Mexicans, and 0.07 for African and European ancestry in Puerto Ricans.
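The variance ratio divides the observed ancestry variance by V₀ = α(1 - α), the variance that a population with mean ancestry α would have if every individual were of a single ancestry. This reading of V₀ is an assumption, consistent with the division by α(1 - α) in the next paragraph. The Python sketch also subtracts an error variance, as in the adjustment using 0.0016. The inputs are hypothetical, not the Table 1 values.

```python
def variance_ratio(mean_ancestry, sd_ancestry, error_var=0.0):
    """V_t / V_0 with V_0 = alpha * (1 - alpha), after removing estimation-error variance."""
    if not 0.0 < mean_ancestry < 1.0:
        raise ValueError("mean ancestry must lie strictly between 0 and 1")
    return (sd_ancestry ** 2 - error_var) / (mean_ancestry * (1.0 - mean_ancestry))

# Hypothetical inputs: mean 0.5, SD 0.16.
print(round(variance_ratio(0.5, 0.16), 4))                    # 0.1024
print(round(variance_ratio(0.5, 0.16, error_var=0.0016), 4))  # 0.096
```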

To estimate r_t, we need to project the value of the LD parameter D to marker loci that are completely informative for ancestry (that is, allele frequency of 1 in one ancestral population and 0 in the other), which corresponds to δ values of 1 for both markers. From the regression results presented in Table 7, we can estimate D for δ = 1 by simply using the regression coefficient of δ₁δ₂. For Mexicans combined, D = 0.0402. To obtain the value of r_t, we then need to divide D by α(1 - α), because α and 1 - α correspond to the allele frequencies for a marker that is completely informative for ancestry (δ = 1). Using the mean ancestry values of Table 1 as α, we derive an approximate r_t value of 0.16. For Puerto Ricans, the value of D is 0.0283; dividing by α(1 - α), we obtain a value of 0.12. We can rearrange the formula for V_t given in Materials and methods to obtain V_t/V₀ = r_t/(2 - c_t), and hence c_t = 2 - r_t/(V_t/V₀). Using the values above for V_t/V₀ and r_t, for Mexicans we obtain c_t = 2 - 0.16/0.10 = 0.40; for Puerto Ricans we obtain c_t = 2 - 0.12/0.07 = 0.29. These values are close to the observed spouse correlations in ancestry in Table 2. Referring back to Figure 4, we see that our results are consistent with a model of decreasing spouse ancestry correlation over a period of about 9 to 13 generations for Mexicans and 10 to 14 generations for Puerto Ricans. The same formulas given above can also be adapted for linked markers [26]. The assortative mating we observed is expected to enhance the LD between linked markers to an even greater extent than for unlinked markers.
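The last two steps are simple arithmetic, and the Python sketch below reproduces them. The mean ancestry passed to r_from_d is a placeholder value of 0.5, since the Table 1 means are not reproduced in this section; the function names are hypothetical.

```python
def r_from_d(d_at_delta_one, mean_ancestry):
    """r_t = D / (alpha * (1 - alpha)) for a fully informative marker pair (δ = 1)."""
    return d_at_delta_one / (mean_ancestry * (1.0 - mean_ancestry))

def spouse_corr_from_ld(r_t, v_ratio):
    """c_t = 2 - r_t / (V_t / V_0), the rearrangement given in the text."""
    return 2.0 - r_t / v_ratio

print(round(r_from_d(0.0402, 0.5), 4))         # placeholder alpha = 0.5
print(round(spouse_corr_from_ld(0.16, 0.10), 2))  # Mexicans: 0.4
print(round(spouse_corr_from_ld(0.12, 0.07), 2))  # Puerto Ricans: 0.29
```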

Discussion

It is of interest to compare our results with those of prior authors who studied tri-racial populations of northeastern Brazil. Although Krieger et al. [24] studied 17 genetic polymorphisms, they did not estimate ancestry at the individual level, but rather within 7 'racial classes' defined on a graded scale of physical characteristics from 0 to 8. However, from their tabulation of spouse pairs across the 7 categories [24] and their estimates of genetic ancestry within each category, we obtained a spouse correlation of 0.46 for African ancestry and 0.45 for European ancestry. These results are comparable to what we observed among the Puerto Ricans, although the Brazilian correlations are somewhat higher. These spouse correlations are also similar to the spouse correlation in scale scores based on physical characteristics (0.46). This is not surprising, given the very strong correlation between genetically estimated African (European) ancestry and their eight-point scale (correlation = 0.98).

A more recent study by Azevêdo et al. [20] examined subjects from the same region of northeastern Brazil, but used only a five-point observational scale of ancestry, without genetic markers. Nevertheless, the spouse correlation on the five-point scale in their data (correlation = 0.47) is quite comparable to that observed in the earlier study from the same region [24].

An important question concerns the actual trait or traits underlying mate selection that lead to the spouse correlation in ancestry in these populations. Ancestry is not directly observed, but is estimated from genetic markers. One possibility is social: ancestry may be associated with social position, and marriages may occur within social strata. However, we found at most a modest relationship between SES and ancestry in our study, and the regression of wife's ancestry on husband's ancestry was undiminished when SES was included in the model. Another possibility is geographic origin. If mates are preferentially chosen locally, an ancestry correlation would be induced if ancestry varies geographically. However, among the Puerto Ricans in our study, we found no significant difference in ancestry between those from New York City and those from Puerto Rico, and we previously found only modest ancestral variation across recruitment sites in Puerto Rico [27]. Re-examining the geographic variation in ancestry in our Puerto Rican subjects [27], we estimate that such variation could induce a spouse correlation of 6 to 8% in African or European ancestry. This is far short of what we observed, although geographic variation in ancestry could be one modest contributor to the observed spouse correlation, assuming that mating preferentially occurs locally.
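One simple way to see how geographic structure alone could induce a spouse correlation is the law of total covariance: if spouses always come from the same site and pair at random with respect to ancestry within a site, the spouse covariance equals the variance of the site means, so the induced correlation is the between-site share of the total ancestry variance. The sketch below assumes this idealized model and a single common within-site variance; the function name and the site weights, means and variance are hypothetical, and this is not necessarily how the estimate above was obtained.

```python
def induced_spouse_corr(site_weights, site_means, within_var):
    """Spouse ancestry correlation induced purely by site differences.

    Assumes (hypothetically) that both spouses come from the same site,
    that mating within a site is random with respect to ancestry, and that
    the within-site ancestry variance is the same at every site.
    """
    total = sum(site_weights)
    w = [x / total for x in site_weights]
    grand_mean = sum(wi * m for wi, m in zip(w, site_means))
    # Between-site variance of the site means.
    between = sum(wi * (m - grand_mean) ** 2 for wi, m in zip(w, site_means))
    # Cov(wife, husband) = between; Var(individual) = within + between.
    return between / (within_var + between)


# Two equally sized hypothetical sites whose mean ancestry differs by 0.1.
print(round(induced_spouse_corr([1, 1], [0.2, 0.3], 0.01), 3))
```

Even a 0.1 difference in mean ancestry between equally sized sites yields a sizeable induced correlation only when the within-site variance is small, which is why modest site differences cannot by themselves account for strong spouse correlations.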

Among the Mexicans in our study, we noted greater European and lower Native American ancestry among those recruited in the Bay Area than among those recruited in Mexico City. Because of this, combining all Mexicans together somewhat increased the spouse correlations in ancestry; however, the spouse correlations within recruitment sites were nearly as strong. Thus, geographic heterogeneity in ancestry alone does not appear to explain the spouse correlations. Another possibility involves physical characteristics, such as skin pigmentation, hair texture, eye color and other physical features. These traits are certainly correlated with ancestry and are likely to be factors in mate selection. However, to explain the observed ancestry correlations, both the spouse correlation for these traits and the correlation of these traits with ancestry must be high. For example, denote the spouse correlation in ancestry by c, the spouse trait correlation by u, and the ancestry-trait correlation by w. If mate choice operates only through the trait, then c = w²u, so that w = √(c/u). If the spouse trait correlation is 0.6 (a reasonably high value), then for a spouse ancestry correlation of 0.3 (Puerto Ricans), the trait-ancestry correlation is 0.7; for a spouse ancestry correlation of 0.4 (Mexicans), it is 0.8. Previous studies of assortative mating in Latin American groups have reported spouse correlation coefficients of 0.29 to 0.46 for education level, 0.48 for skin reflectance, 0.07 to 0.18 for eye and hair color, and 0.16 to 0.24 for various anthropometric measurements [17, 18, 21].
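The trait-ancestry correlations quoted above follow from w = √(c/u). The sketch below assumes, as in the example, that spouses assort only on the trait, so that c = w·u·w; the function name is hypothetical.

```python
import math


def trait_ancestry_corr(c, u):
    """Ancestry-trait correlation w needed to produce a spouse ancestry
    correlation c when spouses assort only on a trait whose spouse
    correlation is u (path relation c = w * u * w)."""
    if not 0.0 < c <= u:
        raise ValueError("need 0 < c <= u so that w does not exceed 1")
    return math.sqrt(c / u)


for c in (0.3, 0.4):
    print(f"c = {c}: w = {trait_ancestry_corr(c, 0.6):.2f}")
# c = 0.3: w = 0.71
# c = 0.4: w = 0.82
```

The text rounds these values to 0.7 and 0.8. The guard reflects the constraint that a trait-mediated ancestry correlation can never exceed the trait correlation itself.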

We also note that the spouses in our study were parents of children with asthma. However, this selection is unlikely to have contributed to the spouse correlation, because the correlation of genetic ancestry with asthma is modest at best [28]. A definitive assessment of the degree to which these and/or other physical traits underlie the spouse ancestry correlations observed here will require measuring these traits in spouse pairs together with ancestry informative markers.

The numbers of generations since admixture derived from models that allow the spouse ancestry correlation to decrease over time are clearly more consistent with the known demographic history of Mexicans and Puerto Ricans [29], and suggest that ancestry-related assortative mating was even stronger historically than in the most recent generations. Although admixture between the indigenous American, European and African populations began in the centuries after the arrival of Columbus and the subsequent importation of slaves from Africa, continuous, large-scale migration to the Americas from Europe continued through the 17th, 18th and 19th centuries. Similarly, the slave trade from Africa continued through the 18th and 19th centuries. Thus, 9 to 14 generations, which corresponds to approximately 225 to 350 years (at about 25 years per generation), appears consistent with the general time frame over which admixture began to occur in substantial numbers, giving rise to the admixed Mestizo populations of Mexico and Puerto Rico [14, 30, 31].

Conclusions

We have shown that mating within contemporary Latino populations does not occur at random with regard to ancestry. While both Mexicans and Puerto Ricans show positive assortative mating for ancestry, the patterns in the two populations are quite different. Among Mexicans, the strongest spouse correlations relate to the proportions of Native American and European ancestry, while the amount of African ancestry appears to have little impact on mate choice. This is not surprising, given the modest overall level of African ancestry in this population. By contrast, among Puerto Ricans, the strong assortative mating relates to African and European ancestry, while Native American ancestry appears not to contribute to the correlation. Although Native American ancestry is the smallest ancestral component in this population on average (14%), it is not dramatically less than the average African ancestry (23%), yet the spouse correlations for these two ancestries are dramatically different. Moreover, we did not find any evidence of asymmetry by ancestry in the mating patterns. Some authors have described assortative mating by skin color in Latin American populations, together with a male preference for lighter-skinned women [16–20]. In our results, there is no evidence of any directionality in partner choice. The ancestry correlation was observed genome-wide and was not restricted to a few loci.

Our results also reinforce that ancestry variation in Latino populations can be a strong confounder in genetic association studies [32]. As shown above, the amount of LD between unlinked markers is directly related to both the ancestral allele frequency differences (δ values) and the variance in ancestry. Assortative mating in these Latino populations will continue to maintain both the ancestry variance and the LD over time. However, the patterns observed in the two Latino populations are quite distinct: strong LD between markers that differentiate Europeans and Native Americans among the Mexicans, versus strong LD between markers that differentiate Europeans and Africans among the Puerto Ricans. It will be of considerable interest to investigate other Latino populations with varying degrees of African, European and Native American ancestry.

Materials and methods

Subjects

The subjects included in this study are part of the Genetics of Asthma in Latino Americans (GALA) study and have been described previously [33]. Subjects are of Mexican or Puerto Rican ethnicity and are parents of childhood asthma patients. Mexican spouse pairs were recruited from both Mexico City and the San Francisco Bay Area. Puerto Rican spouse pairs were recruited from both New York City and Puerto Rico. At the Mexico City and Bay Area recruitment sites, both spouses self-identified as Mexican and all four of their parents were identified as Mexican. At the New York City and Puerto Rico sites, both spouses self-identified as Puerto Rican and all four of their parents were identified as Puerto Rican. The present analysis included 91 Mexican spouse pairs from Mexico City and 194 spouse pairs from the Bay Area, for a total of 285 Mexican spouse pairs; there were 154 Puerto Rican spouse pairs from New York and 223 pairs from Puerto Rico, for a total of 377 Puerto Rican spouse pairs.

All subjects provided written informed consent for blood donation and genotyping. The study protocol was approved by the UCSF Committee on Human Research.

Assessment of socioeconomic status

We used census tract geocoding of income as the basis for SES characterization of subjects, as previously described [27]. The Federal Financial Institutions Examination Council provides a geocoding/mapping system for this purpose [34]. Census tracts are characterized as low, moderate, middle or upper income based on the median family income for the census tract compared with the median income of the entire metropolitan area. For Puerto Rican subjects from Puerto Rico, SES was defined by the location of the recruitment center; for Mexican subjects from the Bay Area, SES was defined by the location of the home residence.
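The tract classification can be sketched as a ratio rule. The text does not state the cutoffs; those used below (under 50% low, 50% to under 80% moderate, 80% to under 120% middle, 120% or more upper) are the income-level definitions the FFIEC applies for Community Reinvestment Act purposes and are an assumption here, as is the function name.

```python
def tract_income_level(tract_median_income, area_median_income):
    """Classify a census tract by its median family income relative to the
    metropolitan-area median. Cutoffs are assumed CRA-style thresholds."""
    if area_median_income <= 0:
        raise ValueError("area median income must be positive")
    ratio = tract_median_income / area_median_income
    if ratio < 0.5:
        return "low"
    if ratio < 0.8:
        return "moderate"
    if ratio < 1.2:
        return "middle"
    return "upper"


print(tract_income_level(45_000, 100_000))
```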

Selection of ancestry informative markers

AIMs were selected as described [35]. In brief, biallelic single nucleotide polymorphisms (SNPs) showing large allele frequency differences (δ of at least 0.5) between pairs of African, European or Native American populations were chosen from an Affymetrix 100K SNP chip panel. For the present analysis, we selected 107 markers that were widely spaced across all chromosomes, to avoid LD in the ancestral populations. A full list of these markers and their chromosomal locations has been published [35].
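A minimal sketch of the two selection criteria described above, assuming per-population allele frequencies and chromosome positions are already available. The 0.5 threshold comes from the text; the minimum spacing in the example is a hypothetical placeholder, since the exact spacing rule is not given here, and the greedy pass is only one simple way to enforce wide spacing.

```python
from itertools import combinations


def is_candidate_aim(freqs, min_delta=0.5):
    """freqs maps population name -> allele frequency at one SNP.
    True if any pair of populations differs by at least min_delta."""
    return any(abs(freqs[a] - freqs[b]) >= min_delta
               for a, b in combinations(sorted(freqs), 2))


def select_spaced(markers, min_gap_bp):
    """Greedy pass keeping markers at least min_gap_bp apart on each
    chromosome. markers: iterable of (snp_id, chromosome, position)."""
    kept, last_pos = [], {}
    for snp, chrom, pos in sorted(markers, key=lambda m: (m[1], m[2])):
        if chrom not in last_pos or pos - last_pos[chrom] >= min_gap_bp:
            kept.append(snp)
            last_pos[chrom] = pos
    return kept


# Hypothetical frequencies and positions, for illustration only.
print(is_candidate_aim({"AFR": 0.9, "EUR": 0.3, "NAM": 0.5}))
print(select_spaced([("a", 1, 100), ("b", 1, 150), ("c", 1, 10_000), ("d", 2, 5)], 1_000))
```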

Genotyping

Marker genotyping was performed at the Functional Genomics Core, Children's Hospital Oakland Research Institute, as described previously [35]. Briefly, four multiplex PCR assays containing 28, 27, 26 and 26 SNPs, respectively, were performed, followed by single-base primer extension using iPLEX enzyme and buffers (Sequenom, San Diego, CA, USA). Primer extension products were measured with the MassARRAY Compact System (Sequenom), and mass spectra were analyzed using TYPER software (Sequenom) to generate genotype calls.

Quality control was performed on the genotype calls for all Mexican and Puerto Rican subjects. Genotype call rates were generally high and reproducible: the average call rate was 97.6%, and all included markers had a call rate of at least 92%. Three markers with call rates below 90% (rs10498919, rs2569029, rs798887) were excluded, leaving 104 AIMs for subsequent analyses. The final list of markers and their chromosomal locations is given in Table S9 in Additional data file 1.

Analytic methods

Surrogate ancestral populations were used in this analysis to characterize ancestral allele frequencies for IA estimation. These samples included 37 West Africans, 42 European Americans and 30 Native Americans [35]. For all markers, we calculated δ values (allele frequency differences) for each pair of ancestral populations. For African versus European, the median δ was 0.56, and 65% of values were greater than 0.30; for African versus Native American, the median δ was 0.71, and 83% were greater than 0.30; for European versus Native American, the median δ was 0.47, and 59% were greater than 0.30. With this number of markers and this distribution of δ values, genome-wide IA estimates are predicted to be at least 90% correlated with the actual values [36].
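The per-pair summaries above (median δ and the share of markers with δ above 0.30) can be computed as follows. This sketch treats δ as the absolute allele frequency difference; the function name and the toy frequencies in the example are hypothetical.

```python
from itertools import combinations
from statistics import median


def delta_summary(freqs, threshold=0.30):
    """freqs: dict population -> list of allele frequencies, one per marker,
    in the same marker order for every population.
    Returns {(pop_a, pop_b): (median |delta|, fraction of |delta| > threshold)}."""
    out = {}
    for a, b in combinations(sorted(freqs), 2):
        deltas = [abs(x - y) for x, y in zip(freqs[a], freqs[b])]
        frac = sum(d > threshold for d in deltas) / len(deltas)
        out[(a, b)] = (median(deltas), frac)
    return out


# Hypothetical three-marker example.
print(delta_summary({"AFR": [0.9, 0.1, 0.5], "EUR": [0.2, 0.2, 0.5]}))
```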

Estimation of ancestry

To estimate individual ancestries, we used the program Structure 2.1 [37, 38] with the 104 AIMs described above. Structure was run using the admixture model with unlinked markers, with 50,000 burn-in iterations followed by 50,000 further iterations. We assumed three ancestral populations (African, European and Native American) and included the genotype data on the ancestral populations described above. The program was run four times, once each for Mexican women, Mexican men, Puerto Rican women and Puerto Rican men. We analyzed men and women separately because of possible correlations between spouses. The implementation was similar to that used previously [27]. To confirm that the use of three ancestral populations was appropriate, we examined the distribution of LnP(D) for K = 2, 3, 4 and 5. There was a large difference in LnP(D) between K = 2 and K = 3, but not between K = 3 and K = 4 or K = 5. Thus, the optimal value of K for these data was determined to be K = 3. This is not surprising, however, because the markers were AIMs specifically selected for large allele frequency differences among the three ancestral populations.

t-tests

Mean ancestries were compared across groups defined by site, gender and SES using t-tests.

Interclass correlations

Pearson interclass correlations were calculated between ancestry components within individuals. Similarly, interclass correlations in ancestry between spouses were calculated. Because the means and variances of ancestry were similar in men and women, we also calculated intraclass correlations between spouses; these results were virtually identical to the interclass correlations.
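The difference between the two spouse correlations can be illustrated with a short sketch. The interclass value is an ordinary Pearson correlation of wives' with husbands' ancestry; for the intraclass value we use the double-entry method (each pair entered twice, in both orders), a standard estimator that we assume here for illustration. The function names and example values are hypothetical.

```python
from statistics import fmean


def pearson(x, y):
    """Ordinary (interclass) Pearson correlation."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def intraclass_double_entry(x, y):
    """Intraclass correlation for pairs: enter each pair as (x, y) and (y, x),
    so both members share one pooled mean and variance."""
    return pearson(list(x) + list(y), list(y) + list(x))


wives = [0.2, 0.4, 0.6]        # hypothetical ancestry proportions
husbands = [0.4, 0.2, 0.6]
print(pearson(wives, husbands), intraclass_double_entry(wives, husbands))
```

When wives and husbands have equal means and variances, as here, the two estimates coincide; a systematic shift between spouses leaves the interclass correlation unchanged but lowers the intraclass correlation.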

Single locus analyses

Allele frequency differences between groups were calculated using standard chi-square tests. We tested for Hardy-Weinberg equilibrium at marker loci by using the Z-statistic

$$Z = \frac{\sqrt{N}\,(4n_2n_0 - n_1^2)}{(2n_2 + n_1)(n_1 + 2n_0)}$$

where n₂ and n₀ are the numbers of the two homozygotes and n₁ the number of heterozygotes observed, and N = n₂ + n₁ + n₀. Under the null hypothesis of no within-locus allelic correlation, Z is approximately normally distributed with mean 0 and variance 1. We chose a one-sided test rather than a two-sided chi-square test because we were specifically testing for an excess of homozygotes, as predicted by assortative mating.

Related to Z is the within-locus intraclass allelic correlation F, given by:

$$F = \frac{4n_2n_0 - n_1^2}{(2n_2 + n_1)(n_1 + 2n_0)}$$

Note that Z = F√N. Also, heterozygosity is reduced by the factor 1 - F relative to its expectation under random mating; that is, F is the proportionate decrease in heterozygosity. In what follows, we refer to this value of F as F₁, to denote the correlation within the first generation (that is, within spouses).
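The Z and F statistics can be computed directly from genotype counts. This sketch uses F = 1 - H_obs/H_exp with the pooled allele frequency, which reduces to the count formula above and satisfies Z = F√N; the one-sided p-value is the standard normal upper tail from Python's statistics.NormalDist. The function name and example counts are hypothetical.

```python
import math
from statistics import NormalDist


def hwe_f_and_z(n2, n1, n0):
    """Return (F, Z, one-sided p) for genotype counts n2 (BB), n1 (Bb), n0 (bb).

    F = (4*n2*n0 - n1**2) / ((2*n2 + n1) * (n1 + 2*n0)); Z = F * sqrt(N).
    A positive Z indicates an excess of homozygotes.
    """
    n = n2 + n1 + n0
    denom = (2 * n2 + n1) * (n1 + 2 * n0)
    if denom == 0:
        raise ValueError("monomorphic locus: F is undefined")
    f = (4 * n2 * n0 - n1 ** 2) / denom
    z = f * math.sqrt(n)
    p_one_sided = 1.0 - NormalDist().cdf(z)
    return f, z, p_one_sided


print(hwe_f_and_z(30, 40, 30))  # hypothetical counts with a homozygote excess
```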

To examine allelic correlations between spouses, we calculated a statistic similar to F. First, we calculated the intraclass correlation ρ for the number of 'B' alleles (0, 1 or 2) in the spouse pairs (assuming a biallelic locus with alleles B and b). However, because each spouse contributes two alleles, this correlation is not directly comparable to the within-individual F defined above. Hence, to derive a comparable statistic, we defined F₂ as the expected intraclass correlation for single alleles selected at random from the two spouses. It can be shown that F₂ = ρ(1 + F₁)/2. Because F₁ values are generally modest, F₂ will often be approximately half the intraclass correlation ρ.
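The identity F₂ = ρ(1 + F₁)/2 follows from a short variance calculation. The derivation below assumes the same allele frequency p (q = 1 - p) and the same within-individual correlation F₁ in wives and husbands, consistent with using an intraclass correlation; G_w and G_h, the numbers of B alleles carried by the wife and the husband, are symbols introduced here for illustration.

```latex
% G_w, G_h: number of B alleles (0, 1 or 2) in wife and husband.
% Each allele has variance pq; the two alleles within a person have correlation F_1;
% any allele of the wife and any allele of the husband have correlation F_2.
\begin{aligned}
\operatorname{Var}(G_w) = \operatorname{Var}(G_h) &= 2pq + 2F_1\,pq = 2pq\,(1 + F_1),\\
\operatorname{Cov}(G_w, G_h) &= 4F_2\,pq
  \quad\text{(four cross-spouse allele pairs, each with covariance } F_2\,pq\text{)},\\
\rho = \frac{\operatorname{Cov}(G_w, G_h)}{\operatorname{Var}(G_w)}
  &= \frac{4F_2\,pq}{2pq\,(1 + F_1)} = \frac{2F_2}{1 + F_1}
  \quad\Longrightarrow\quad F_2 = \frac{\rho\,(1 + F_1)}{2}.
\end{aligned}
```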

For comparison, we also calculated interclass correlations for the spouse pairs, which allows for unequal allele frequencies between the two spouses. Because the genotype distributions in wives and husbands were generally extremely similar, the interclass correlations were nearly identical to the intraclass correlations (correlation between correlations ranging from 0.997 to 0.999).

Pairwise locus analyses

For pairs of markers, we tested for non-independence of genotypes using a likelihood ratio chi-square test, with the haplotype phase of double heterozygotes estimated by maximum likelihood. We also calculated the LD parameter D. Both calculations were performed with the computer package PLINK [39].
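PLINK performs the phase estimation internally. As a rough, self-contained illustration of the same idea (a generic expectation-maximization sketch under Hardy-Weinberg assumptions, not PLINK's code), D can be estimated from unphased genotypes by resolving only the double heterozygotes, whose phase is ambiguous. The function name and example genotypes are hypothetical.

```python
from collections import Counter

# Haplotypes as (allele at locus 1, allele at locus 2); 1 = B allele, 0 = b allele.
HAPS = ((1, 1), (1, 0), (0, 1), (0, 0))


def em_ld_d(geno1, geno2, max_iter=200, tol=1e-12):
    """Estimate D = h(B1B2) - p(B1) * p(B2) from genotype counts (0, 1, 2)."""
    known = Counter()  # haplotype counts from phase-unambiguous individuals
    n_dh = 0           # double heterozygotes (cis or trans phase unknown)
    for g1, g2 in zip(geno1, geno2):
        if g1 == 1 and g2 == 1:
            n_dh += 1
            continue
        # With at most one heterozygous locus, pairing alleles in this order
        # yields the individual's two haplotypes unambiguously.
        alleles1 = [1] * g1 + [0] * (2 - g1)
        alleles2 = [1] * g2 + [0] * (2 - g2)
        for a, b in zip(alleles1, alleles2):
            known[(a, b)] += 1

    n_hap = 2 * len(geno1)
    h = {k: 0.25 for k in HAPS}  # starting haplotype frequencies
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is B1B2/b1b2 (cis).
        cis = h[(1, 1)] * h[(0, 0)]
        trans = h[(1, 0)] * h[(0, 1)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: expected haplotype counts converted to frequencies.
        counts = {k: float(known[k]) for k in HAPS}
        counts[(1, 1)] += n_dh * w
        counts[(0, 0)] += n_dh * w
        counts[(1, 0)] += n_dh * (1 - w)
        counts[(0, 1)] += n_dh * (1 - w)
        new_h = {k: counts[k] / n_hap for k in HAPS}
        done = max(abs(new_h[k] - h[k]) for k in HAPS) < tol
        h = new_h
        if done:
            break

    p_b1 = h[(1, 1)] + h[(1, 0)]
    p_b2 = h[(1, 1)] + h[(0, 1)]
    return h[(1, 1)] - p_b1 * p_b2


print(em_ld_d([2, 0, 1], [2, 0, 1]))  # hypothetical genotypes in complete LD
```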

Linear regressions to estimate effects of ancestry assortative mating

A major goal of this analysis was to examine how genetic structure in Latino populations is influenced by ancestry-related assortative mating. One way to characterize the structure is by examining intra-locus correlations (F statistics) and inter-locus correlations, or correlations between markers (LD parameters r and D). We therefore derived formulas relating the spouse ancestry correlations to expected patterns of allele frequency difference between recruitment sites, F statistics, and D statistics.

First we consider chi-square statistics for allele frequency differences between sites. Let π_k represent the frequency of a marker allele in ancestral population k, where k ranges from 1 to 3, the total number of ancestral populations. Define δ₁ = π₁ - π₂, δ₂ = π₁ - π₃ and δ₃ = π₂ - π₃. Note that δ₂ = δ₁ + δ₃, so that 2δ₁δ₃ = δ₂² - δ₁² - δ₃², a formula we will use later. Further, let α_k represent the proportionate ancestry from population k to the admixed population for the first recruitment site, and β_k represent the proportionate ancestry from population k for the second recruitment site, and let ε_k = α_k - β_k. Note that ε₁ + ε₂ + ε₃ = 0. The chi-square statistic for allele frequency difference between site 1 and site 2 is given by:

$$\chi^2 = \frac{(p_1' - p_2')^2}{\operatorname{Var}(p_1' - p_2')} \qquad (1)$$

where:

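$$\operatorname{Var}(p_1' - p_2') = p^*(1 - p^*)\left(\frac{1}{2N_1} + \frac{1}{2N_2}\right) \qquad (2)$$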

p₁' and p₂' are the allele frequencies in groups 1 and 2, N₁ and N₂ are the number of individuals in groups 1 and 2, p* = (N₁p₁' + N₂p₂')/(N₁ + N₂) and Var represents variance.

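Assuming a fixed value for the denominator, we can calculate the expectation (Exp) of the numerator of χ² in Equation 1 above as:

$$\operatorname{Exp}\left[(p_1' - p_2')^2\right] = \left(\sum_{k=1}^{3} \varepsilon_k \pi_k\right)^2 + \operatorname{Var}(p_1' - p_2')$$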

Dividing this equation by Var(p₁' - p₂') gives the approximation:

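$$\operatorname{Exp}(\chi^2) \approx 1 + \frac{\left(\sum_{k=1}^{3} \varepsilon_k \pi_k\right)^2}{\operatorname{Var}(p_1' - p_2')} \qquad (3)$$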

The numerator in Equation 3 is given by:

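$$\left(\sum_{k=1}^{3} \varepsilon_k \pi_k\right)^2 = (\varepsilon_1\delta_1 - \varepsilon_3\delta_3)^2 = (\varepsilon_1^2 + \varepsilon_1\varepsilon_3)\,\delta_1^2 - \varepsilon_1\varepsilon_3\,\delta_2^2 + (\varepsilon_3^2 + \varepsilon_1\varepsilon_3)\,\delta_3^2 \qquad (4)$$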

Equation 4 shows that Equation 3 for the expectation of χ² can be fit by linear regression on the three covariates δ_i²/Var(p₁' - p₂'), for i = 1 to 3. If we denote the estimated regression coefficient of δ_i²/Var(p₁' - p₂') by a_i, then, because ε₂ = -(ε₁ + ε₃), a₁ = ε₁² + ε₁ε₃, a₂ = -ε₁ε₃ and a₃ = ε₃² + ε₁ε₃. From the estimated regression coefficients we can therefore estimate the magnitudes of the ancestry differences between sites as |ε₁| = √(a₁ + a₂), |ε₂| = √(a₁ + a₃) and |ε₃| = √(a₂ + a₃).
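A quick numerical check of this mapping: for hypothetical site differences ε (constructed to sum to zero), the squared expected allele-frequency difference equals the three-term linear form in δ_i² for arbitrary ancestral frequencies, and the stated sums of coefficients recover |ε_k|. The function names and ε values are illustrative assumptions.

```python
import math
import random


def make_eps(e1, e3):
    """Site-1 minus site-2 ancestry differences; eps2 makes them sum to zero."""
    return (e1, -(e1 + e3), e3)


def chi2_coefficients(eps):
    """Coefficients (a1, a2, a3) multiplying delta_1^2, delta_2^2, delta_3^2."""
    e1, _, e3 = eps
    return (e1 * e1 + e1 * e3, -e1 * e3, e3 * e3 + e1 * e3)


def recover_abs_eps(a1, a2, a3):
    """|eps1|, |eps2|, |eps3| from the coefficients (clipped at zero for noisy fits)."""
    root = lambda v: math.sqrt(max(v, 0.0))
    return root(a1 + a2), root(a1 + a3), root(a2 + a3)


eps = make_eps(0.10, 0.05)  # hypothetical
a = chi2_coefficients(eps)
rng = random.Random(1)
for _ in range(1000):
    pi = [rng.random() for _ in range(3)]  # random ancestral allele frequencies
    d1, d2, d3 = pi[0] - pi[1], pi[0] - pi[2], pi[1] - pi[2]
    lhs = sum(e * p for e, p in zip(eps, pi)) ** 2
    rhs = a[0] * d1 ** 2 + a[1] * d2 ** 2 + a[2] * d3 ** 2
    assert math.isclose(lhs, rhs, abs_tol=1e-12)
print(recover_abs_eps(*a))
```

In a real fit the a_i are noisy regression estimates, so a sum can come out slightly negative; the sketch clips such values at zero rather than failing.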

We next consider regression analyses on the statistic F. Recall that F represents the correlation between alleles at a given locus. Consider again a locus with two alleles B and b. Define the binomial random variable S to be 1 if the maternally transmitted allele is B and 0 if b; similarly, define T to be 1 if the paternally transmitted allele is B and 0 if b. Then F can be defined as Cov(S, T)/p*q* where p* is the frequency of B in the combined set of parents and q* = 1 - p* and Cov is covariance. In the analysis of F₁, p* simply represents the frequency of allele B in the pool of individuals; in the analysis of F₂, p* represents the frequency of allele B in the pool of spouses combined. Next define the random variable X_i as the proportionate ancestry from population i in the wife and Y_i as the proportionate ancestry from population i in the husband, where i ranges from 1 to 3. Note that X₁ + X₂ + X₃ = Y₁ + Y₂ + Y₃ = 1. Then the random variables S and T can be defined as S = π₁X₁ + π₂X₂ + π₃X₃ and T = π₁Y₁ + π₂Y₂ + π₃Y₃, respectively. Then, because π₂ is constant, Cov(S, T) = Cov(π₁X₁ + π₂X₂ + π₃X₃, π₁Y₁ + π₂Y₂ + π₃Y₃) = Cov(π₁X₁ + π₂X₂ + π₃X₃ - π₂, π₁Y₁ + π₂Y₂ + π₃Y₃ - π₂) = Cov((π₁ - π₂)X₁ + (π₃ - π₂)X₃, (π₁ - π₂)Y₁ + (π₃ - π₂)Y₃) = Cov(δ₁X₁ - δ₃X₃, δ₁Y₁ - δ₃Y₃) = δ₁²Cov(X₁, Y₁) + δ₃²Cov(X₃, Y₃) - 2δ₁δ₃Cov(X₁, Y₃), assuming Cov(X₁, Y₃) = Cov(X₃, Y₁). Now define κ_ii = Cov(X_i, Y_i) and κ_ij = Cov(X_i, Y_j) for i, j = 1 to 3. Then again noting that δ₂ = δ₁ + δ₃, we have Cov(S, T) = δ₁²κ₁₁ + δ₃²κ₃₃ + (δ₁² + δ₃² - δ₂²)κ₁₃ = (κ₁₁ + κ₁₃)δ₁² + (κ₃₃ + κ₁₃)δ₃² - κ₁₃δ₂². Therefore, assuming the denominator p*q* is fixed, F is a linear function of the δ_i²/p*q*, whose coefficients can be estimated by linear regression. In this case, the coefficients a_i of δ_i²/p*q* are given by a₁ = κ₁₁ + κ₁₃, a₃ = κ₃₃ + κ₁₃ and a₂ = -κ₁₃. Then note that a₁ + a₂ = κ₁₁, a₂ + a₃ = κ₃₃, and a₁ + a₃ = κ₁₁ + κ₃₃ + 2κ₁₃ = Cov (X₁ + X₃, Y₁ + Y₃) = Cov(1 - X₂,1 - Y₂) = Cov(X₂, Y₂) = κ₂₂. 
The same linear model and regression coefficients apply to both F₁ and F₂, as defined above.

Finally, we consider regression analysis on the LD statistic D. In this case, we examine the co-occurrence of alleles at two loci. Thus, consider two loci, with alleles B₁ and b₁ at the first locus and B₂ and b₂ at the second. Define the random variable S for the first locus so that S = 1 if allele B₁ occurs and 0 if allele b₁ occurs. Define the random variable U similarly for the second locus, so that U = 1 if allele B₂ occurs and 0 if b₂ occurs. The LD parameter D is defined as Cov(S, U), and χ² = N[Corr(S, U)]², where N is the number of individuals and Corr is correlation. Also, Corr(S, U) = Cov(S, U)/[Var(S)Var(U)]^1/2, with Var(S) = p*q* and Var(U) = r*s*, where p* is the frequency of B₁, q* = 1 - p*, r* is the frequency of B₂ and s* = 1 - r*. Therefore, χ² = ND²/(p*q*r*s*). For a given individual, assume her or his three ancestry proportions are represented by the random variables X_i, where i ranges from 1 to 3. Assume the allele frequency of B₁ in the three ancestral populations is represented by π_i, for i = 1 to 3; similarly, the allele frequency of B₂ in the three ancestral populations is represented by τ_i, for i = 1 to 3. As before, let δ₁ = π₁ - π₂, δ₂ = π₁ - π₃, and δ₃ = π₂ - π₃. By analogy, we define the ancestral allele frequency differences for the second locus by φ₁ = τ₁ - τ₂, φ₂ = τ₁ - τ₃, and φ₃ = τ₂ - τ₃. Given the proportions X_i, D = Cov(S, U) = Cov(π₁X₁ + π₂X₂ + π₃X₃, τ₁X₁ + τ₂X₂ + τ₃X₃). As before, subtracting the constant π₂ from the first term and τ₂ from the second term gives D = Cov((π₁ - π₂)X₁ + (π₃ - π₂)X₃, (τ₁ - τ₂)X₁ + (τ₃ - τ₂)X₃) = Cov(δ₁X₁ - δ₃X₃, φ₁X₁ - φ₃X₃) = δ₁φ₁Var(X₁) + δ₃φ₃Var(X₃) - (δ₁φ₃ + δ₃φ₁)Cov(X₁, X₃). Because Var(X₂) = Var(1 - X₂) = Var(X₁ + X₃) = Var(X₁) + Var(X₃) + 2Cov(X₁, X₃), and δ₁φ₃ + δ₃φ₁ = δ₂φ₂ - δ₁φ₁ - δ₃φ₃, D = δ₁φ₁Var(X₁) + δ₃φ₃Var(X₃) - (δ₂φ₂ - δ₁φ₁ - δ₃φ₃)(Var(X₂) - Var(X₁) - Var(X₃))/2 = δ₁φ₁(Var(X₁) + Var(X₂) - Var(X₃))/2 + δ₃φ₃(Var(X₃) + Var(X₂) - Var(X₁))/2 + δ₂φ₂(Var(X₁) + Var(X₃) - Var(X₂))/2.
In this case, D is a linear function of the δ_iφ_i for i = 1 to 3; the coefficients of these terms, denoted a_i for i = 1 to 3, can be estimated by linear regression. As previously, the regression coefficients can be related to the variances in ancestry by the equations a₁ + a₂ = Var(X₁), a₁ + a₃ = Var(X₂) and a₂ + a₃ = Var(X₃).
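Because the derivation above is an exact algebraic identity once X₁ + X₂ + X₃ = 1, it can be checked numerically. In this sketch, as in the derivation, S and U are replaced by their expectations given ancestry (Σπ_kX_k and Στ_kX_k); the ancestry proportions and ancestral frequencies are randomly generated or hypothetical, and the helper names are ours.

```python
import math
import random


def sample_cov(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def d_coefficients(X):
    """(a1, a2, a3) multiplying delta_i * phi_i, from the ancestry variances."""
    v1, v2, v3 = (sample_cov(col, col) for col in zip(*X))
    return (v1 + v2 - v3) / 2, (v1 + v3 - v2) / 2, (v2 + v3 - v1) / 2


rng = random.Random(7)
X = []
for _ in range(500):  # hypothetical three-way ancestry proportions
    w = [rng.random() for _ in range(3)]
    s = sum(w)
    X.append([x / s for x in w])

pi = [0.9, 0.2, 0.5]   # hypothetical ancestral frequencies, first locus
tau = [0.1, 0.8, 0.3]  # hypothetical ancestral frequencies, second locus
S = [sum(p * x for p, x in zip(pi, row)) for row in X]
U = [sum(t * x for t, x in zip(tau, row)) for row in X]
d_direct = sample_cov(S, U)

delta = (pi[0] - pi[1], pi[0] - pi[2], pi[1] - pi[2])
phi = (tau[0] - tau[1], tau[0] - tau[2], tau[1] - tau[2])
a = d_coefficients(X)
d_formula = sum(ai * di * fi for ai, di, fi in zip(a, delta, phi))
print(d_direct, d_formula)
```

In the two-population special case (X₃ = 0), the expression collapses to D = δ₁φ₁Var(X₁), the familiar admixture-LD result.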

Decrease of ancestry variance over time

In theory, the variation in ancestry should decrease from one generation to the next due to recombination between loci. However, the rate of decline will be diminished when there is assortative mating in ancestry. In fact, there is a direct quantitative relationship between the strength of LD between loci, the ancestry variance, and the degree of assortative mating for ancestry over time [26]. Specifically, let c_t denote the spouse ancestry correlation in generation t, V_t denote the variance in ancestry at generation t, and r_t denote the correlation of alleles selected at random at two unlinked loci at generation t (equivalent to the LD parameter r). Let the average ancestry in the population be represented by α, which we assume to be unchanged over time. Note that α(1 - α) represents the variance of ancestry in the generation before admixing first occurred. Then, as shown by Crow and Kimura [26], V_t = α(1 - α)r_t/(2 - c_t) and r_t+1 = [r_t - 1/2_t-1(r_t - r_t-1)]/(2 - c_t-1). Notice from this formula that when the spouse correlation c is 0, the variance declines by a factor of 1/2 per generation, whereas when c is 1, there is no decline in variance. We iterated the formulas above over 15 generations using 3 different models for the ancestry correlation c: a model where c is constant, a model where c declines linearly over time, and a model where c decreases exponentially over time.

Additional data files

The following additional data files are available with the online version of this paper: supplementary Tables S1 to S9 (Additional data file 1); supplementary Figures S1 and S2 (Additional data file 2).

Abbreviations

AIM:: ancestry informative marker
Corr:: correlation
Cov:: covariance
Exp:: expectation
GALA:: Genetics of Asthma in Latino Americans
IA:: individual ancestry
LD:: linkage disequilibrium
Q-Q:: quantile-quantile
SES:: socioeconomic status
SNP:: single nucleotide polymorphism
Var:: variance.

References

Azevêdo ES, Morton NE, Miki C, Yee S: Distance and kinship in northeastern Brazil. Am J Hum Genet. 1969, 21: 1-22.
PubMed PubMed Central Google Scholar
Cavalli-Sforza LL: Genetic drift in an Italian population. Sci Am. 1969, 221: 30-37.
Article PubMed CAS Google Scholar
Salzano FM, Bortolini MC: The Evolution and Genetics of Latin American Populations. 2002, Cambridge, New York: Cambridge University Press
Google Scholar
Buss DM: Human mate selection. Am Sci. 1985, 73: 47-51.
Google Scholar
Nagoshi CT, Johnson RC, Danko GP: Assortative mating for cultural identification as indicated by language use. Behav Genet. 1990, 20: 23-31. 10.1007/BF01070737.
Article PubMed CAS Google Scholar
Sanchez-Andres A, Mesa MS: Assortative mating in a Spanish population: effects of social factors and cohabitation time. J Biosoc Sci. 1994, 26: 441-450. 10.1017/S0021932000021581.
Article PubMed CAS Google Scholar
Hur YM: Assortative mating for personality traits, educational level, religious affiliation, height, weight, and body mass index in parents of Korean twin sample. Twin Res. 2003, 6: 467-470. 10.1375/136905203322686446.
Article PubMed Google Scholar
Salces I, Rebato E, Susanne C: Evidence of phenotypic and social assortative maring for anthropometric and physiological traits in couples from the Basque country (Spain). J Biosoc Sci. 2004, 36: 235-250. 10.1017/S0021932003006187.
Article PubMed CAS Google Scholar
Esteve A, Cortina C: Changes in educational assortative mating in contemporary Spain. Demogr Res. 2006, 14: 405-428. 10.4054/DemRes.2006.14.17.
Article Google Scholar
Merikangas KR: Assortative mating for psychiatric disorders and psychological traits. Arch Gen Psychiatry. 1982, 39: 1173-1180.
Article PubMed CAS Google Scholar
Yasuda N: An extension of Wahlund's principle to evaluate mating type frequency. Am J Hum Genet. 1968, 20: 1-23.
PubMed CAS PubMed Central Google Scholar
Crow JF, Felsenstein J: The effect of assortative mating on the genetic composition of a population. Soc Biol. 1982, 29: 22-35.
PubMed CAS Google Scholar
Redden DT, Allison DB: The effect of assortative mating upon genetic association studies: spurious associations and population substructure in the absence of admixture. Behav Genet. 2006, 36: 678-686. 10.1007/s10519-006-9060-0.
Article PubMed Google Scholar
Sánchez-Albornoz N: The Population of Latin America; a History. 1974, Berkeley, CA: University of California Press
Google Scholar
Stolcke V: Marriage, Class and Colour in Nineteenth-Century Cuba; a Study of Racial Attitudes and Sexual Values in a Slave Society. 1974, London, New York: Cambridge University Press
Google Scholar
Silva NV: Distância social e casamento inter-racial no Brasil. Est Afro-Asiat. 1987, 14: 54-83.
Google Scholar
Frisancho AR, Wainwright R, Way A: Heritability and components of phenotypic expression in skin reflectance of Mestizos from the Peruvian lowlands. Am J Phys Anthropol. 1981, 55: 203-208. 10.1002/ajpa.1330550207.
Article PubMed CAS Google Scholar
Malina RM, Selby HA, Buschang PH, Aronson WL, Little BB: Assortative mating for phenotypic characteristics in a Zapotec community in Oaxaca, Mexico. J Biosoc Sci. 1983, 15: 273-280. 10.1017/S0021932000014619.
Article PubMed CAS Google Scholar
Trachtenberg A, Stark AE, Salzano FM, DaRocha FJ: Canonical correlation analysis of assortative mating in two groups of Brazilians. J Biosoc Sci. 1985, 17: 389-403. 10.1017/S0021932000015911.
Article PubMed CAS Google Scholar
Azevêdo ES, Chautard-Freire-Maia EA, Freire-Maia N, Mascarenhas Fortuna CM, Abe K, das Gracas Santos M, Leal Barbosa AA, Torres Silva ME, Faraildes Costa A: Mating types in a mixed and multicultural population of Salvador, Brazil. Rev Brasil Genet. 1986, IX: 487-496.
Google Scholar
Procidano ME, Rogler LH: Homogamous assortative mating among Puerto Rican families: intergenerational processes and the migration experience. Behav Genet. 1989, 19: 343-354. 10.1007/BF01066163.
Article PubMed CAS Google Scholar
Silva NV: Uma nota sobre "raça social" no Brasil. Est Afro-Asiat. 1994, 26: 67-80.
Google Scholar
Morton NE: Genetic studies of northeastern Brazil. Cold Spring Harbor Symp Quant Biol. 1964, 29: 69-79.
Article PubMed CAS Google Scholar
Krieger H, Morton NE, Mi MP, Azevedo E, Freiere-Maia A, Yasuda N: Racial admixture in northeastern Brazil. Ann Hum Genet. 1965, 29: 113-125. 10.1111/j.1469-1809.1965.tb00507.x.
Article PubMed CAS Google Scholar
Yasuda N: The inbreeding coefficient of northeastern Brazil. Hum Hered. 1969, 19: 444-456. 10.1159/000152253.
Article PubMed CAS Google Scholar
Crow JF, Kimura M: An Introduction to Population Genetics Theory. 1970, New York: Harper & Row
Google Scholar
Choudhry S, Burchard EG, Borrell LN, Tang H, Gomez I, Naqvi M, Nazario S, Torres A, Casal J, Martinez-Cruzado JC, Ziv E, Avila PC, Rodriguez-Cintron W, Risch NJ: Ancestry-environment interactions and asthma risk among Puerto Ricans. Am J Respir Crit Care Med. 2006, 174: 1088-1093. 10.1164/rccm.200605-596OC.
Article PubMed PubMed Central Google Scholar
Salari K, Choudhry S, Tang H, Naqvi M, Lind D, Avila PC, Coyle NE, Ung N, Nazario S, Casal J, Torres-Palacios A, Clark S, Phong A, Gomez I, Matallana H, Perez-Stable EJ, Shriver MD, Kwok PY, Sheppard D, Rodriguez-Cintron W, Risch NJ, Burchard EG, Ziv E: Genetic admixture and asthma-related phenotypes in Mexican American and Puerto Rican asthmatics. Genet Epidemiol. 29: 76-86. 10.1002/gepi.20079.
Gonzalez Burchard E, Borrell LN, Choudhry S, Naqvi M, Tsai HJ, Rodriguez-Santana JR, Chapela R, Rogers SD, Mei R, Rodriguez-Cintron W, Arena JF, Kittles R, Perez-Stable EJ, Ziv E, Risch N: Latino populations: a unique opportunity for the study of race, genetics, and social environment in epidemiological research. Am J Public Health. 2005, 95: 2161-2168. 10.2105/AJPH.2005.068668.
Article PubMed PubMed Central Google Scholar
Alvarez Nazario M: El Elemento Afronegroide en el Español de Puerto Rico: Contribución al Estudio del Negro en América. 1974, San Juan de Puerto Rico: Instituto de Cultura Puertorriqueña
Google Scholar
Díaz Soler LM: Historia de la Esclavitud Negra en Puerto Rico. 2005, San Juan: Editorial de la Universidad de Puerto Rico
Google Scholar
Choudhry S, Coyle NE, Tang H, Salari K, Lind D, Clark SL, Tsai HJ, Naqvi M, Phong A, Ung N, Matallana H, Avila PC, Casal J, Torres A, Nazario S, Castro R, Battle NC, Perez-Stable EJ, Kwok PY, Sheppard D, Shriver MD, Rodriguez-Cintron W, Risch N, Ziv E, Burchard EG: Population stratification confounds genetic association studies among Latinos. Hum Genet. 2006, 118: 652-664. 10.1007/s00439-005-0071-3.
Article PubMed Google Scholar
Burchard EG, Avila PC, Nazario S, Casal J, Torres A, Rodriguez-Santana JR, Toscano M, Sylvia JS, Alioto M, Salazar M, Gomez I, Fagan JK, Salas J, Lilly C, Matallana H, Ziv E, Castro R, Selman M, Chapela R, Sheppard D, Weiss ST, Ford JG, Boushey HA, Rodriguez-Cintron W, Drazen JM, Silverman EK, Genetics of Asthma in Latino Populations (GALA) Study: Lower bronchodilator responsiveness in Puerto Rican than in Mexican subjects with asthma. Am J Respir Crit Care Med. 2004, 169: 386-392. 10.1164/rccm.200309-1293OC.
Federal Financial Institutions Examination Council Geocoding System. [http://www.ffiec.gov/Geocode/default.aspx]
Yaeger R, Avila-Bront A, Abdul K, Nolan PC, Grann VR, Birchette MG, Choudhry S, Burchard EG, Beckman KB, Gorroochurn P, Ziv E, Consedine NS, Joe AK: Comparing genetic ancestry and self-described race in African Americans born in the United States and in Africa. Cancer Epidemiol Biomarkers Prev. 2008, 17: 1329-1338. 10.1158/1055-9965.EPI-07-2505.
Tsai HJ, Choudhry S, Naqvi M, Rodriguez-Cintron W, Burchard EG, Ziv E: Comparison of three methods to estimate genetic ancestry and control for stratification in genetic association studies among admixed populations. Hum Genet. 2005, 118: 424-433. 10.1007/s00439-005-0067-z.
Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000, 155: 945-959.
Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003, 164: 1567-1587.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender K, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.

Acknowledgements

The authors would like to acknowledge the families and the patients for their participation and the numerous health care providers and community clinics for their support and participation in the GALA Study. We would like to especially thank Jeffrey M Drazen, MD, Scott Weiss, MD, Ed Silverman, MD, PhD, Homer A Boushey, MD, Jean G Ford, MD and Dean Sheppard, MD for all of their effort towards the creation of the GALA Study. We are also indebted to Dr Mark Shriver for providing ancestral allele frequency data. This work was supported by National Institutes of Health (HL078885, HL088133, U19 AI077439, ES015794), Flight Attendant Medical Research Institute (FAMRI), and the RWJ Amos Medical Faculty Development Award to EGB, American Thoracic Society 'Breakthrough Opportunities in Lung Disease' (BOLD) Award and Tobacco-Related Disease Research Program New Investigator Award (15KT-0008) to SC, Beatriu de Pinos Postdoctoral Grant (2006 BP-A 10144) to MV, the Ernest S Bazley Grant to PCA, and the Sandler Center for Basic Research in Asthma and the Sandler Family Supporting Foundation.

Author information

Authors and Affiliations

Institute for Human Genetics, University of California, San Francisco, 513 Parnassus Ave, San Francisco, CA, 94143, USA
Neil Risch, Marc Via, Analabha Basu, Ronnie Sebro, Shannon Thyne, Elad Ziv & Esteban Gonzalez Burchard
Division of Research, Kaiser Permanente, 2000 Broadway, Oakland, CA, 94612, USA
Neil Risch
Department of Epidemiology and Biostatistics, University of California, San Francisco, 185 Berry Street, San Francisco, CA, 94107, USA
Neil Risch
Department of Medicine, University of California, San Francisco, 1550 4th St, San Francisco, CA, 94143, USA
Shweta Choudhry, Marc Via, Celeste Eng, Elad Ziv & Esteban Gonzalez Burchard
Biomedical Genomics Center, University of Minnesota, 426 Church St, Minneapolis, MN, 55455, USA
Kenneth Beckman
Instituto Nacional de Enfermedades Respiratorias, Calzada de Tlalpan 4502, Col. Seccion XVI, CP 14080, Tlalpan, Distrito Federal, Mexico
Rocio Chapela
Centro de Neumologia Pediatrica, CSP, 735 Ave Ponce de Leon, San Juan, 00917, Puerto Rico
Jose R Rodriguez-Santana
Veterans Caribbean Health Care System, 10 Casia St, San Juan, 00921, Puerto Rico
William Rodriguez-Cintron
Division of Allergy-Immunology, Feinberg School of Medicine, Northwestern University, 676 N St Clair St, Chicago, IL, 60611, USA
Pedro C Avila

Corresponding author

Correspondence to Neil Risch.

Additional information

Authors' contributions

NR conceived of the assortative mating study, performed the statistical analyses and drafted the manuscript. SC contributed to the statistical analyses and manuscript writing. MV contributed to the drafting of the manuscript. AB contributed to the data analysis. RS contributed to the analytical theory behind the analyses. CE participated in the genotyping of study subjects. KB oversaw the genotyping of study subjects. ST participated in study subject recruitment. RC participated in subject recruitment and assessments. JRR-S participated in subject recruitment and assessments. WR-C participated in subject recruitment and assessments. PCA participated in subject recruitment and assessments. EZ contributed to the development and analysis of the ancestry informative markers. EGB is the creator of GALA and had overall responsibility for study design and implementation, including subject recruitment and assessment and genotyping, and also contributed to drafting of the manuscript.

Electronic supplementary material

13059_2009_2282_MOESM1_ESM.DOC

Additional data file 1: Table S1: within-spouse correlations in ancestry. Table S2: t-tests of ancestry differences between spouses and between recruitment sites. Table S3: mean (standard deviation) ancestry by socioeconomic status. Table S4: regression of wife's IA on husband's IA and socioeconomic status. Table S5: allele frequency difference chi-square tests between sites and spouses. Table S6: regression of chi-square for Mexico versus US allele frequency difference on δ²N*/p*q*. Table S7: regression of LD chi-square tests on (δ₁δ₂)²/pqrs. Table S8: outlier marker pairs from regressions on D. Table S9: list of ancestry informative markers used in the current study. (DOC 262 KB)

13059_2009_2282_MOESM2_ESM.DOC

Additional data file 2: Figure S1: Q-Q plot of residuals from regressions of allelic correlations F₁ and F₂ for (a) Mexicans and (b) Puerto Ricans. Figure S2: Q-Q plot of residuals from regression analysis of the linkage disequilibrium parameter D. (DOC 184 KB)

About this article

Cite this article

Risch, N., Choudhry, S., Via, M. et al. Ancestry-related assortative mating in Latino populations. Genome Biol 10, R132 (2009). https://doi.org/10.1186/gb-2009-10-11-r132

Received: 02 September 2009
Revised: 17 October 2009
Accepted: 20 November 2009
Published: 20 November 2009
DOI: https://doi.org/10.1186/gb-2009-10-11-r132
