Study population
All data included in this analysis were obtained from the UK Biobank, a large-scale cohort of 502,507 participants recruited from 22 assessment centers throughout the UK during 2006–2010 [19]. All participants provided written informed consent and completed a self-administered touch-screen questionnaire, a brief computer-assisted interview, and physical measurements. Biological samples, including blood, were collected at baseline across the assessment centers under strict quality control [19].
In this study, we excluded 46,533 participants with cancer at baseline, 30,035 participants without CRP information, and 4,975 individuals without genetic data, leaving a total of 420,964 participants for analysis.
Assessment of exposure, outcome, and covariates
The CRP concentration was measured at baseline by a high-sensitivity immunoturbidimetric assay on a Beckman Coulter AU5800 analyzer, with values ranging from 0.08 to 79.96 mg/L. Outliers were capped at the 1st percentile (Q1) or 99th percentile (Q99) of the CRP distribution. Details of the collection and processing of blood samples have been described elsewhere [20].
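The capping step can be illustrated with a minimal Python sketch. It assumes a linear-interpolation percentile rule (the study does not state which rule was used), and the CRP values below are made up for illustration.

```python
def percentile(sorted_vals, q):
    """Percentile (q in [0, 100]) of a pre-sorted list, by linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def cap_outliers(values, lower_q=1.0, upper_q=99.0):
    """Replace values below/above the chosen percentiles with those percentiles."""
    ordered = sorted(values)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


# Hypothetical CRP values in mg/L (not study data).
crp = [0.08, 0.5, 1.2, 2.0, 3.1, 4.4, 9.8, 79.96]
capped = cap_outliers(crp)
```

Values inside the percentile range are left unchanged; only the extremes are pulled in.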
Cancer outcomes were defined according to ICD-10 codes and obtained from the national cancer registries. Follow-up time was defined as the period from baseline enrollment to the first diagnosis or registration of cancer, loss to follow-up, or the end of follow-up (31 October 2015 for Scotland and 31 March 2016 for England and Wales). After excluding site-specific cancers with fewer than 100 incident cases, we included overall cancer and 21 site-specific cancers in this study (Additional file 2: Table S1).
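As an illustration of how follow-up time and the event indicator might be derived from these rules, the sketch below uses the region-specific end dates given above. The function name, the argument names, and the conversion of days to years by dividing by 365.25 are assumptions made for this example.

```python
from datetime import date

# End of follow-up by region, as stated in the text.
CENSOR_DATES = {
    "Scotland": date(2015, 10, 31),
    "England": date(2016, 3, 31),
    "Wales": date(2016, 3, 31),
}


def follow_up(baseline, region, cancer_date=None, exit_date=None):
    """Return (years of follow-up, event flag) for one participant.

    exit_date is a hypothetical field for loss to follow-up, death, or withdrawal.
    """
    candidates = [CENSOR_DATES[region]]
    if cancer_date is not None:
        candidates.append(cancer_date)
    if exit_date is not None:
        candidates.append(exit_date)
    stop = min(candidates)
    # The event counts only if the cancer date is the earliest stopping point.
    event = cancer_date is not None and cancer_date == stop
    years = (stop - baseline).days / 365.25
    return years, event


years_a, event_a = follow_up(date(2008, 1, 1), "England", cancer_date=date(2012, 1, 1))
years_b, event_b = follow_up(date(2008, 1, 1), "Scotland", cancer_date=date(2017, 1, 1))
```

In the second call the diagnosis falls after the Scottish end date, so the participant is censored rather than counted as a case.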
Variables that might affect the association between CRP and cancer risk according to previous studies were treated as covariates, including age, family history of cancer, body mass index (BMI), height, smoking status, alcohol use, and physical activity for both men and women, as well as menopausal status, oral contraceptive use, and hormone replacement therapy for women [21]. We also included sex, ethnicity, education, Townsend deprivation index, and assessment center as covariates. These covariates were collected using a touchscreen questionnaire or measured by trained staff at baseline, and no covariate had more than 2.0% missing values (Additional file 2: Table S2). Missing values of continuous covariates were replaced with the sex-specific mean of each variable, and missing values of categorical covariates were assigned to an “unknown” category.
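The missing-data rules can be sketched as follows. The record layout (a list of dictionaries with `None` marking a missing value) and the variable names are hypothetical and chosen only for illustration.

```python
def impute(records, continuous, categorical):
    """Fill missing values: sex-specific mean for continuous variables,
    an "unknown" category for categorical variables. Returns new records."""
    out = [dict(r) for r in records]  # leave the input untouched
    for var in continuous:
        means = {}
        for sex in {r["sex"] for r in records}:
            observed = [r[var] for r in records
                        if r["sex"] == sex and r[var] is not None]
            if observed:
                means[sex] = sum(observed) / len(observed)
        for r in out:
            if r[var] is None:
                r[var] = means.get(r["sex"])
    for var in categorical:
        for r in out:
            if r[var] is None:
                r[var] = "unknown"
    return out


# Made-up participants (not study data).
people = [
    {"sex": "F", "bmi": 22.0, "smoking": "never"},
    {"sex": "F", "bmi": 26.0, "smoking": None},
    {"sex": "F", "bmi": None, "smoking": "current"},
    {"sex": "M", "bmi": 28.0, "smoking": "former"},
    {"sex": "M", "bmi": None, "smoking": None},
]
filled = impute(people, continuous=["bmi"], categorical=["smoking"])
```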
Genotyping
Genome-wide genotyping was performed using the Affymetrix UK BiLEVE Axiom array or the Affymetrix UK Biobank Axiom array, which share 95% of their markers. Imputation was performed with SHAPEIT3 and IMPUTE3 based on the merged UK10K and 1000 Genomes phase 3 reference panels [22]. Markers with minor allele frequency > 0.001 and Info score > 0.3 were retained in UK Biobank. Detailed information on genotype quality, quality control, and genotype imputation has been described in a previous study [22].
Genetic instrument for serum CRP level
A previous GWAS identified 52 susceptibility loci associated with serum CRP concentration [16]; these loci were used to construct the genetic instrument for CRP by calculating a weighted genetic risk score (wGRS). The genetic instrument was strongly associated with serum CRP concentration, with an F statistic of 216, and explained 2.6% of the variance of CRP in this study (Additional file 1). In addition, five SNPs associated with both colorectal cancer and serum CRP concentration were excluded in a sensitivity analysis to evaluate the validity of the instrument (Additional file 2: Table S3).
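A weighted genetic risk score is commonly computed as the sum of effect-allele dosages, each weighted by its per-allele effect on the exposure from the GWAS. The sketch below assumes that form; the SNP identifiers and weights are hypothetical, and the exact construction used in the study is described in Additional file 1.

```python
def weighted_grs(dosages, weights):
    """Sum of (GWAS effect size x effect-allele dosage) over the instrument SNPs.

    dosages: {snp_id: dosage between 0 and 2} for one participant
    weights: {snp_id: per-allele effect on CRP from the GWAS}
    """
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical SNPs and weights (not the 52 study loci).
weights = {"snp_a": 0.10, "snp_b": 0.05, "snp_c": 0.02}
dosages = {"snp_a": 2.0, "snp_b": 1.0, "snp_c": 0.0}
score = weighted_grs(dosages, weights)
```

A sensitivity analysis that drops SNPs, such as the one described above, amounts to removing those keys from `weights` before scoring.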
Statistical analysis
Cox proportional hazards (PH) regression was used to assess the association between CRP and cancer risk. Schoenfeld residuals and log-log plots were used to check the proportional hazards assumption. The time scale in the Cox PH regression ran from enrollment to the time of cancer diagnosis, death, withdrawal from the study, or the end of follow-up, whichever came first. We estimated the hazard ratio (HR) associated with CRP (per 1 mg/L increase) for each site-specific cancer in all eligible participants and re-evaluated the HRs after dividing participants into low (≤ 3 mg/L) and high (> 3 mg/L) CRP groups [23]. We further applied restricted cubic spline analysis to explore possible non-linear associations between serum CRP concentration and cancer risk. To balance goodness of fit against overfitting, we tested three to five knots for each cancer and chose the number of knots with the lowest Akaike information criterion (AIC); if different numbers of knots yielded the same AIC, the lowest number was chosen [24]. Except for lung cancer (4 knots at the 5th, 35th, 65th, and 95th percentiles of CRP), we fitted the models for overall cancer and the other site-specific cancers with 3 knots at the 10th, 50th, and 90th percentiles of CRP. We used a likelihood ratio test to calculate the P value for non-linearity by comparing the model with only a linear term against the model with linear and cubic spline terms [25]. We further performed subgroup analyses to assess potential effect modification by age, sex, and smoking status using likelihood ratio tests.
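The knot-selection rule and the likelihood ratio test for non-linearity can be sketched as follows. The sketch assumes the AIC and log-likelihood values come from Cox models fitted elsewhere, and that a restricted cubic spline with k knots adds k − 2 non-linear terms to the linear term, which gives the degrees of freedom of the test. The AIC values below are invented for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed forms, df 1 to 3)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("only df = 1, 2, 3 are supported in this sketch")


def choose_knots(aic_by_knots):
    """Lowest AIC wins; ties go to the smaller number of knots."""
    return min(aic_by_knots, key=lambda k: (aic_by_knots[k], k))


def lrt_nonlinearity(loglik_linear, loglik_spline, n_knots):
    """Compare the linear-only model with the spline model."""
    stat = 2.0 * (loglik_spline - loglik_linear)
    df = n_knots - 2  # number of non-linear spline terms
    return stat, df, chi2_sf(stat, df)


# Made-up AIC values for models with 3, 4, and 5 knots.
best = choose_knots({3: 1000.0, 4: 1000.0, 5: 1001.2})
stat, df, p_nonlinear = lrt_nonlinearity(-500.0, -498.0, n_knots=best)
```

Ties are detected here by exact equality; in practice AIC values would likely be compared after rounding.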
To examine the robustness of our results, we performed several sensitivity analyses: (1) re-analysis of the association between log-transformed CRP level and cancer risk, (2) exclusion or sole inclusion of participants diagnosed with cancer within the first two years of follow-up to address potential reverse causality, (3) exclusion of participants with a CRP level > 10 mg/L to avoid the effect of acute serious infection, (4) additional adjustment for cardiovascular disease and diabetes, and (5) additional adjustment for regular use of aspirin and ibuprofen.
Both the potential linear and non-linear causal associations between CRP concentration and cancer risk were evaluated in this study. To evaluate the potential linear associations, we performed a two-stage MR analysis. In the first stage, we estimated the fitted values from a regression of CRP on the wGRS; in the second stage, the predicted values were entered into a Cox regression model of cancer risk. Covariates, including age at baseline, sex, and the top 10 genetic principal components, were adjusted for in both stages. In addition, several sensitivity analyses were performed: (1) we re-estimated the causal associations between log-transformed CRP level and cancer risk, (2) two-stage MR was conducted only in participants of British ancestry, and (3) rs2794520, the strongest SNP in the previous GWAS, was used as a single instrumental variable to minimize the possibility of introducing horizontal pleiotropy [16].
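The first stage can be illustrated with a simple least-squares fit. This sketch deliberately omits the covariates (age, sex, and principal components) that the study adjusted for, and it does not implement the second-stage Cox model, which would take the fitted values as its exposure. The input numbers are made up.

```python
def first_stage(grs, crp):
    """Ordinary least squares of CRP on the wGRS (no covariates).

    Returns (intercept, slope, fitted), where fitted holds the genetically
    predicted CRP for each participant.
    """
    n = len(grs)
    mean_x = sum(grs) / n
    mean_y = sum(crp) / n
    sxx = sum((x - mean_x) ** 2 for x in grs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(grs, crp))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in grs]
    return intercept, slope, fitted


# Hypothetical scores and CRP values (not study data).
grs = [0.0, 1.0, 2.0, 3.0]
crp = [1.0, 1.5, 2.5, 3.0]
intercept, slope, predicted_crp = first_stage(grs, crp)
residual_crp = [y - f for y, f in zip(crp, predicted_crp)]
```

The residuals (observed CRP minus genetically predicted CRP) are computed here as well, since this is the quantity that stratified non-linear MR analyses are based on.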
For the non-linear MR analysis, the sample was stratified into three strata according to residual CRP (the observed CRP minus the genetically predicted CRP). Next, we assessed the exposure-outcome association within each stratum using the piecewise linear method, in which each stratum contributes a line segment whose gradient is the localized average causal effect (LACE) [17]. Two tests for non-linearity were then applied: (1) a heterogeneity test using Cochran’s Q statistic to assess differences between the LACE estimates and (2) a trend test based on a meta-regression of the LACE estimates against the mean CRP value in each stratum.
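The heterogeneity test can be sketched with the standard inverse-variance form of Cochran's Q. The stratum-specific estimates (LACE in the text) and their standard errors below are invented, and the P value is computed only for the three-stratum case (2 degrees of freedom), where the chi-square tail has a simple closed form.

```python
import math


def cochran_q(estimates, std_errors):
    """Cochran's Q for a set of stratum-specific estimates.

    Returns (Q, degrees of freedom, P value); the P value is given only
    when df == 2, i.e. for three strata as in the text, otherwise None.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    p_value = math.exp(-q / 2.0) if df == 2 else None
    return q, df, p_value


# Hypothetical estimates for three residual-CRP strata (not study results).
q, df, p_het = cochran_q([0.0, 0.1, 0.2], [0.1, 0.1, 0.1])
```

A large Q relative to its degrees of freedom suggests that the estimates differ across strata, which is the pattern a non-linear effect would produce.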
All analyses were performed with R (version 3.6.0), and a two-sided P value < 0.05 was considered statistically significant. To limit the inflation of false-positive findings, we calculated false-discovery rate (FDR) adjusted P values across the main analyses. Linear and non-linear MR analyses were conducted using the “MendelianRandomization” and “nlmr” packages, respectively.
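The text does not name the FDR procedure; the sketch below assumes the Benjamini-Hochberg method, which is the most common choice. The P values are made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest to the smallest P value, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top  # rank 1 is the smallest P value
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values from four analyses (not study results).
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr_p = bh_adjust(raw_p)
```

An analysis would then be declared significant when its adjusted value falls below the chosen threshold, for example 0.05.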