Sample
The Norwegian Mother, Father and Child Cohort Study (MoBa [28]) is a prospective population-based pregnancy cohort study conducted by the Norwegian Institute of Public Health. Pregnant women were recruited from across Norway from 1999 to 2008. The women consented to initial participation in 41% of the pregnancies. The total cohort includes 114,500 children, 95,200 mothers and 75,200 fathers. To date, 98,110 individuals who are part of a trio (both parents and a child) from MoBa have been genotyped (Additional file 1: Table S1). As part of the Intergenerational Transmission of Risk (ITOR) subproject, MoBa has also been linked to Norwegian registry data containing pedigree and twin zygosity information. Version 11 of the quality-assured MoBa data files were used, released in 2018.
RDR and pedigree analyses required slightly different subsamples of MoBa families. For RDR, we used genotyped trios with available child anxiety and depression data. Siblings in the child generation were removed so that trios were independent families. For pedigree analyses, we retained all available cousins, siblings, half-siblings and twins in the child generation with anxiety and depression data, without requiring the presence of genotype data. Since only a subset of the families in MoBa was genotyped, all individuals included in genomic analyses were also within the sample used for pedigree analyses.
Measures
Our two outcome variables are well-validated and reliable quantitative measures of childhood anxiety and depression symptoms at age 8. Anxiety was measured using the 5-item version of the Screen for Child Anxiety Related Disorders (SCARED) [29]. Depression was measured using the 13-item Short Moods and Feelings Questionnaire (SMFQ) [30]. Both questionnaires were rated by mothers using three-point Likert response scales. Phenotypes were regressed on child sex prior to analyses.
To test whether measured parent anxiety and depression symptoms explained any genetic nurture effects, we used mother-rated Hopkins Symptoms Checklist (SCL-8) [31]. The SCL-8 assesses self-report anxiety and depression symptoms experienced during the last 2 weeks. To obtain a reliable measure of maternal symptoms that children are consistently exposed to, we constructed a common factor of SCL-8 scores at five time-points: 15 weeks of pregnancy, 30 weeks of pregnancy, child age 6 months, 18 months, and 8 years. As has been demonstrated with childhood data [32], a stable factor composed of measurements at multiple time points better captures a heritable core trait than scores at a single time point. Applying this approach to assess maternal symptoms leads to a measure that captures stable exposure to maternal depression symptoms, rather than exposure associated with temporary fluctuations. Mediation of any genetic nurture effect by stable maternal anxiety and depression symptoms would be indicated by an observed change in the genetic nurture point estimate when using the factor score as a covariate.
Genotype quality control
The current MoBa genomic dataset comprises imputed genetic data for 98,110 individuals (~ 32,000 parent-offspring trios), derived from nine batches of participants, who make up four study cohorts. Within each batch, parent and offspring genetic data were quality controlled separately. Quality control exclusion criteria for individuals were genotyping call rate < 95% or autosomal heterozygosity > 4 standard deviations from the sample mean. Quality control exclusion criteria for SNPs (single nucleotide polymorphisms) were ambiguous (A/T and C/G), genotyping call rate < 98%, minor allele frequency < 1%, or Hardy-Weinberg equilibrium P value < 1 × 10–6. Population stratification was assessed, using the HapMap phase 3 release 3 as a reference, by principal component analysis using EIGENSTRAT version 6.1.4. Visual inspection identified a homogenous population and individuals of non-European ancestries were removed based on principal component analysis of markers overlapping with available HapMap markers. The parent and offspring datasets were then merged into one dataset per genotyping batch, keeping only the SNPs that passed quality control in both datasets. Phasing was conducted using Shapeit 2 release 837 and the duoHMM approach was used to account for the pedigree structure. Imputation was conducted using the Haplotype reference consortium (HRC) release 1–1 as the genetic reference panel. The Sanger Imputation Server was used to perform the imputation with the Positional Burrows-Wheeler Transform (PBWT). The phasing and imputation were conducted separately for each genotyping batch. Additional file 1 Table S1 contains details of the numbers of SNPs and individuals in each batch. More detailed information about the cohorts, quality control and imputation can be found at https://github.com/folkehelseinstituttet/mobagen.
We conducted post-imputation quality control, selecting SNPs meeting the following criteria: high imputation confidence scores (INFO > 0.8 on average across batches), minor allele frequency > 0.05, Hardy-Weinberg equilibrium p > 1 × 10–6, non-multiallelic, and non-duplicated.
Before calculating relatedness matrices for RDR, we removed individuals who were not part of a complete genotyped trio, restricted to one child per family so that pairs of focal individuals did not share a home environment and pruned down to 451,442 variants in approximate linkage equilibrium using an r2 threshold of 0.5 (to reduce the size of the matrices and therefore the computational burden of the analyses).
After data management, a final sample of 25,828 genotyped parent-child trios was identified. Almost 11,600 of these trios also had child anxiety and depression symptom data.
Statistical analyses
Genomic analysis of trios: relatedness disequilibrium regression (RDR)
The RDR method [25] allows the estimation of parent and child genetic effects on traits. This is achieved by extending a standard genomic method for estimating heritability—single-component GREML (Genomic-Relatedness based restricted Maximum-Likelihood) [33]—to include individuals’ parents. Standard GREML estimates the variance explained by common SNPs by comparing a matrix of pairwise genomic similarity for unrelated individuals across genotyped SNPs to a matrix of their pairwise phenotypic similarity, using a random-effects mixed linear model. Instead of using the random variation in genetic similarity among unrelated individuals, RDR estimates heritability by capitalising on the random variation in genetic similarity between pairs of individuals conditional on their parents’ genetic similarity, which arises through random segregation of alleles when gametes are formed, and is independent of environmental factors.
There are two versions of RDR: one uses identity-by-descent (IBD) relatedness, which distinguishes parts of the genome that are inherited from common ancestors, and the other uses common SNP-based relatedness. We used the SNP version since it has similar properties to the IBD version but has greater statistical power [25]. Rather than estimating a single genetic variance component, RDR estimates three. The first estimates the direct effect of children’s own genetic variation on their trait. This is independent of the effect of being reared by biological parents. Importantly, a direct genetic effect is only direct in the sense that it does not stem from another individual’s genotype. Notably, mechanisms by which individuals evoke and select environments based on their genotype are essential in how genes lead to phenotypes [34], and these are included in estimates of direct genetic influence.
The second variance component estimates the effect of parent genetics on the child trait, controlling for child genetic effects: ‘genetic nurture’. Any parent genetic effect over and above child-driven direct effects must be an indirect genetic effect, where parents’ genetics affect child traits by influencing parent behaviours and the rearing environment they provide. Notably, it is assumed that genetic nurture effects are from parents (not siblings) and that mating in the population is random. To the extent that these assumptions do not hold, the genetic nurture variance will be biased. Non-random mating would magnify the genetic nurture variance because it induces correlations between causal alleles across the genome, most importantly between transmitted and non-transmitted parts of the parental genomes [26].
The third component captures variance in the offspring phenotype attributable to covariance between the direct and nurturing genetic effects. This somewhat abstract variance component is easier to understand when considering the conditions for the estimate to be zero. Specifically, the direct-nurturing genetic covariance would not explain any phenotypic variation if only one generation contributes genetic effects to the child trait, or if different SNPs contribute to child and parent genetic influences, such that loci have only either direct or indirect effects. Covariance between direct and nurturing genetic effects can be thought of as a ‘passive gene-environment correlation’. This refers to a magnification of the environmental effect of genetically influenced parent behaviour, which happens because children passively inherit and are directly influenced by that same genetic material (see Additional file 1 Figure S1 for detail on this concept).
Finally, the residual component captures environmental effects on the trait of interest that are not correlated with measured parent genetic variation, the effects of variants not tagged by genotyped SNPs (e.g. rarer), and measurement error.
In practice, the variance components are estimated by regressing phenotypic resemblance on three genomic relatedness matrices simultaneously. The first is similar to the matrix used in standard GREML: the genome-wide genetic relatedness of the children in the sample. The second and third represent the genetic relatedness of the parents and the genetic covariance between children and parents.
Notably, the genotypes of mothers and fathers are combined to allow estimation of the effect of both parents. We calculated parental genotypes by summing the unnormalised maternal and paternal genotype matrices. We then standardised parental genotypes to have a mean zero and variance two. In an outbred random-mating population, the variance for the parental genotypes is twice that of the offspring genotype as it is the sum of maternal and paternal genotypes [25]. Notably, the summing of maternal and paternal genotypes contrasts to a similar model, M-GCTA [35], which does not involve paternal data but estimates the effects of the mother and child genomes and of their covariance.
All RDR analyses included 10 ancestry principal components and genotyping batch, both derived from the child generation, as covariates. Analyses were performed in the GCTA software. We used the --reml-no-constrain flag to allow components to take negative as well as positive values, given the theoretical and empirical evidence for negative covariance between direct and indirect genetic effects on complex traits. This can happen if a proportion of parental genetic variants associated with lower child trait scores are associated with higher child trait scores when present in the child genome. For example, studies have identified loci exert opposing maternal and fetal effects on human birth weight [36, 37].
To test whether any genetic nurture effect was partially explained by parent anxiety and depression symptoms, we re-ran the RDR models adding a measure of stable maternal anxiety and depression symptoms as a covariate. As mentioned above, this longitudinally-derived measure is preferable to time-specific measures as it captures a more reliable core trait that children are consistently exposed to. In addition, the measure maximises sample size and minimises bias in maternal reporting due to contemporaneous collection of maternal self- and child-report. We also tested the individual time-specific maternal measures as covariates.
To test whether any of the variance components were biased because some children were genotyped in a different batch to their parents, we re-ran the RDR models using only individuals who were genotyped as a complete trio (90% of the analysis sample).
To test the sensitivity of the genetic nurture estimate to a more stringent exclusion of relatives, we restricted all GRMs to relatedness < 0.1 and re-ran the RDR models.
Finally, to estimate standard SNP heritability, we ran single component GREML models using unrelated child genotypes. This is equivalent to running RDR with the genetic nurture and direct-nurturing genetic covariance components set to zero.
Classical pedigree modelling
To compare RDR results to a traditional quantitative genetic (non-genomic) design, we implemented a univariate pedigree model [38]. As in the classic twin design, this model allows estimation of genetic, shared environmental, and non-shared environmental (residual) influences on anxiety and depression symptoms at age 8. The model used phenotypic correlations among twins, siblings, half-siblings and cousins in the child generation, to derive estimates based on the following specifications: genetic correlations (assumed, not directly measured) are 1.00, 0.50, 0.25 and 0.125 for identical twins, non-identical twins/siblings, maternal half-siblings and cousins, respectively, and shared environmental correlations are 1.00 for all siblings and 0.00 for cousins. Sample sizes for pairs of relatives with available 8-year anxiety and depression data were 233 identical twins, 11,375 non-identical twins and siblings, 175 maternal half-siblings and 15,227 cousins, giving an overall sample of 27,010 pairs.
Comparison of variance components from RDR and pedigree models
Before comparing RDR and pedigree results, it is essential to note the differences between variance components derived from the models. First, with respect to child genetic effects, RDR only captures additive effects tagged by measured common variants, but the pedigree genetic component also includes non-additive genetic effects, and the effects of rare variants not tagged by the common variants included on the SNP array. Second, regarding family environmental effects, the shared environment component in the pedigree model is broader than the genetic nurture estimate using RDR. The shared environment captures at least three sources of variance that genetic nurture does not parent behaviours not tagged by common SNPs, shared environments beyond parent behaviours, and effects of any correlation between the shared environment and genetic components (passive gene-environment correlation). Third, whereas pedigree and twin models assume passive gene-environment correlation is absent (i.e. that genetic and environmental effects are uncorrelated), RDR can directly estimate it as the covariance between genetic and environmental (genetic nurturing) effects.
Additional file 1 Figure S1 and its accompanying text explore overlaps and distinctions between the concepts of genetic nurture, passive gene-environment correlation and shared environment.
Software
Genome-wide relatedness matrices were constructed using python, and RDR models were run using GCTA [33]. Pedigree analyses were conducted in R using the structural equation modelling package OpenMx [39].