Proteomic insights into the associations between obesity, lifestyle factors, and coronary artery disease

Background: We aimed to investigate the protein pathways linking obesity and lifestyle factors to coronary artery disease (CAD).

Methods: Summary-level genome-wide association statistics of CAD were obtained from the CARDIoGRAMplusC4D consortium (60,801 cases and 123,504 controls) and the FinnGen study (R8, 39,036 cases and 303,463 controls). Proteome-wide Mendelian randomization (MR) analysis was conducted to identify CAD-associated blood proteins, supplemented by colocalization analysis to minimize potential bias caused by linkage disequilibrium. Two-sample MR analyses were performed to assess the associations of four genetically predicted obesity measures and 13 lifestyle factors with CAD risk and with the levels of CAD-associated proteins. A two-step network MR analysis was conducted to explore the mediating effects of proteins in the associations between these modifiable factors and CAD.

Results: Genetically predicted levels of 41 circulating proteins were associated with CAD, and 17 of them were supported by medium to high colocalization evidence. PTK7 (protein tyrosine kinase-7), RGMB (repulsive guidance molecule BMP co-receptor B), TAGLN2 (transgelin-2), TIMP3 (tissue inhibitor of metalloproteinases 3), and VIM (vimentin) were identified as promising therapeutic targets. Several proteins were found to mediate the associations between some modifiable factors and CAD, with PCSK9, C1S, AGER (advanced glycosylation end product-specific receptor), and MST1 (mammalian Ste20-like kinase 1) appearing most frequently among the mediating networks.

Conclusions: This study suggests pathways that explain the associations of obesity and lifestyle factors with CAD through alterations in blood protein levels. These insights may be used to prioritize therapeutic interventions for further study.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12916-023-03197-8.


INTRODUCTION 2
Background Explain the scientific background and rationale for the reported study. What is the exposure? Is a potential causal relationship between exposure and outcome plausible? Justify why MR is a helpful method to address the study question.

5-6
The application of Mendelian randomization (MR), specifically leveraging genetic variants that predict the concentration of circulating proteins, could strengthen the potential for causal inference. Therefore, we conducted the current study to explore the causal effects of circulating proteins on CAD, and thus to investigate potential therapeutic targets, by employing a proteome-wide MR analysis.
3 Objectives State specific objectives clearly, including pre-specified causal hypotheses (if any). State that MR is a method that, under specific assumptions, intends to estimate causal effects.

6
Here, we conducted a study to identify blood proteins associated with CAD by employing a proteome-wide MR approach and further explored the mediating network involving modifiable factors, proteins, and CAD, thereby contributing to a deeper understanding of the pathogenesis.

Study design and data sources
Present key elements of the study design early in the article. Consider including a table listing sources of data for all phases of the study. For each data source contributing to the analysis, describe the following:
a) Setting: Describe the study design and the underlying population, if possible. Describe the setting, locations, and relevant dates, including periods of recruitment, exposure, follow-up, and data collection, when available.

6-7
A comprehensive investigation employing an integrated genetic approach was designed (Figure 1).

b) Participants: Give the eligibility criteria, and the sources and methods of selection of participants. Report the sample size, and whether any power or sample size calculations were carried out prior to the main analysis.

7-9
We obtained summary-level statistics from a comprehensive protein quantitative trait loci (pQTL) study conducted in a population of 35,559 individuals of Icelandic descent. Additionally, the online tool mRnd (https://shiny.cnsgenomics.com/mRnd/) was utilized for the power calculation, and statistical power exceeding 80% was deemed satisfactory. The included genome-wide association studies (GWASs) had obtained the necessary ethical approvals from the relevant committees, and written informed consent was obtained from all individuals involved in these studies.

5 Assumptions Explicitly state the three core IV assumptions for the main analysis (relevance, independence and exclusion restriction) as well as assumptions for any additional or sensitivity analysis.

6-7
In detail, the validity of two-step MR results also relies on three fundamental assumptions: (i) the relevance assumption, i.e., the genetic variants should exhibit a strong association with the exposure; (ii) the independence assumption, i.e., the genetic variants should be independent of potential confounding variables; and (iii) the exclusion restriction, i.e., the genetic variants should impact the outcome solely through the exposure.
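As a rough illustration of how a two-step MR analysis can quantify mediation, the sketch below uses the product-of-coefficients approach: the effect of a modifiable factor on a protein is multiplied by the effect of that protein on CAD, and the standard error is approximated with the delta method. The text quoted here does not state which mediation estimator was used, so this choice, the function names, and all numbers are assumptions made for illustration only.

```python
import math


def indirect_effect(beta_xm, se_xm, beta_my, se_my):
    """Product-of-coefficients estimate of a mediated (indirect) effect.

    beta_xm, se_xm: MR estimate (and SE) of exposure -> mediator (step 1).
    beta_my, se_my: MR estimate (and SE) of mediator -> outcome (step 2).
    The SE uses a first-order delta-method approximation.
    """
    estimate = beta_xm * beta_my
    se = math.sqrt(beta_xm ** 2 * se_my ** 2 + beta_my ** 2 * se_xm ** 2)
    return estimate, se


def proportion_mediated(indirect, total):
    """Share of the total exposure -> outcome effect carried by the mediator."""
    return indirect / total


# Hypothetical inputs, not taken from the study.
est, se = indirect_effect(beta_xm=0.5, se_xm=0.1, beta_my=0.2, se_my=0.05)
share = proportion_mediated(est, total=0.4)
print(f"indirect effect = {est:.3f} (SE {se:.3f}), proportion mediated = {share:.2f}")
```

The proportion mediated is only interpretable when the indirect and total effects point in the same direction, which is one reason such estimates are usually reported with caution.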

Statistical methods: main analysis
Describe statistical methods and statistics used.
a) Describe how quantitative variables were handled in the analyses (i.e., scale, units, model).

7
The aptamer levels were further adjusted for age and sex by applying an adjusted rank-inverse normal transformation. Subsequently, the residuals were standardized using rank-inverse normal transformation and served as phenotypes in the genome-wide association analyses.
b) Describe how genetic variants were handled in the analyses and, if applicable, how their weights were selected.

8
Genetic IVs for these factors were constructed by selecting single nucleotide polymorphisms (SNPs) identified at the genome-wide significance threshold (p < 5×10⁻⁸) and in low linkage disequilibrium (r² < 0.01). The data sources of the GWAS datasets used and detailed definitions of the modifiable factors were provided in the Supplementary 7
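The instrument selection described above (p < 5×10⁻⁸, r² < 0.01) can be sketched as a greedy clumping step. This is a simplified illustration, not the authors' pipeline: real clumping tools also apply a physical distance window and compute r² from a reference panel, whereas here the pairwise r² values are supplied directly in a hypothetical lookup table and missing pairs are assumed to be unlinked.

```python
def select_instruments(snps, r2, p_threshold=5e-8, r2_threshold=0.01):
    """Greedy LD clumping on summary statistics.

    snps: list of (rsid, p_value) tuples for the exposure GWAS.
    r2:   dict mapping frozenset({rsid_a, rsid_b}) to pairwise r^2
          (hypothetical input; pairs that are absent are treated as r^2 = 0).
    """
    # Keep genome-wide significant SNPs, strongest association first.
    candidates = sorted(
        (s for s in snps if s[1] < p_threshold), key=lambda s: s[1]
    )
    kept = []
    for rsid, _p in candidates:
        # Retain a SNP only if it is in low LD with every SNP already kept.
        if all(r2.get(frozenset((rsid, k)), 0.0) < r2_threshold for k in kept):
            kept.append(rsid)
    return kept


# Hypothetical SNP identifiers and values.
snps = [("rsA", 1e-12), ("rsB", 1e-9), ("rsC", 1e-10), ("rsD", 1e-5)]
r2 = {frozenset(("rsA", "rsB")): 0.8}
instruments = select_instruments(snps, r2)
print(instruments)
```

In this toy input, rsD fails the significance threshold and rsB is dropped because it is in high LD with the stronger signal rsA.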

Assessment of assumptions
Describe any methods or prior knowledge used to assess the assumptions or justify their validity.

7
To satisfy the first assumption, our IV selection was confined to SNPs achieving the genome-wide significance threshold. The second assumption is usually satisfied, which is a merit of the MR approach, since genetic variants are randomly assorted at conception and are therefore unassociated with confounders (e.g., environmental and self-adopted factors). The greatest challenge for MR analysis is the third assumption. For proteome-wide MR analysis, this assumption is likely to be satisfied since we selected cis-SNPs with limited pleiotropic effects as IVs.

Sensitivity analyses and additional analyses
Describe any sensitivity analyses or additional analyses performed (e.g.  2 12

Assessment of assumptions
a) Report the assessment of the validity of the assumptions.

7
For MR analysis on modifiable risk factors, we conducted several supplementary analyses to strengthen the robustness of the primary results as well as to detect and correct for potential horizontal pleiotropy. To examine potential bias due to weak instruments, we estimated the F-statistic to assess the strength of the genetic instrumental variables used. SNPs with an F-statistic < 10 were removed from the analysis. Additionally, the online tool mRnd (https://shiny.cnsgenomics.com/mRnd/) was utilized for the power calculation, and statistical power exceeding 80% was deemed satisfactory.
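The weak-instrument filter described above can be illustrated with a short sketch. It assumes the widely used per-SNP approximation F ≈ (β/SE)², computed from the SNP-exposure association; the text does not say which F-statistic formula was applied, so the formula, the `Snp` container, and the example values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str
    beta_exposure: float  # SNP-exposure effect estimate
    se_exposure: float    # standard error of that estimate


def f_statistic(snp):
    """Approximate per-SNP F-statistic as (beta / SE)^2."""
    return (snp.beta_exposure / snp.se_exposure) ** 2


def drop_weak_instruments(snps, threshold=10.0):
    """Keep only SNPs whose F-statistic is at least the threshold."""
    return [s for s in snps if f_statistic(s) >= threshold]


# Hypothetical SNPs: the first is a strong instrument, the second is weak.
example = [
    Snp("rs_hypothetical_1", beta_exposure=0.10, se_exposure=0.02),
    Snp("rs_hypothetical_2", beta_exposure=0.03, se_exposure=0.02),
]
kept = drop_weak_instruments(example)
print([s.rsid for s in kept])
```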
b) Report any additional statistics (e.g., assessments of heterogeneity across genetic variants, such as I², Q statistic or E-value).

7
For MR analysis on modifiable risk factors, we conducted several supplementary analyses to strengthen the robustness of the primary results as well as to detect and correct for potential horizontal pleiotropy.
13 Sensitivity analyses and additional analyses
a) Report any sensitivity analyses to assess the robustness of the main results to violations of the assumptions.

13
Consistent and stable association patterns were observed in the sensitivity analyses employing various statistical methods (Supplementary 19

Data and data sharing
Provide the data used to perform all analyses or report where and how the data can be accessed, and reference these sources in the article. Provide the statistical code needed to reproduce the results in the article, or report whether the code is publicly accessible and if so, where.

19
All the data used in the present study are publicly available. The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

Table 1
-3. c) Describe the MR estimator (e.g. two-stage least squares, Wald ratio) and related statistics. Detail the included covariates and, in case of two-sample MR, whether the same covariate set was used for adjustment in the two samples.
e) If applicable, indicate how multiple testing was addressed.

11
The statistical significance threshold for the link of lifestyle factors with CAD risk was set at a two-sided p value of <0.003 (=0.05/19 tests).
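To make the estimators named in this checklist item concrete, the following sketch computes a single-SNP Wald ratio, a fixed-effect inverse-variance weighted (IVW) estimate across several SNPs, and the Bonferroni threshold quoted above. The IVW estimator is included as a common multi-SNP choice and is an assumption here, as are the function names and the toy summary statistics; the Wald ratio standard error uses a first-order approximation that ignores uncertainty in the SNP-exposure estimate.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-instrument MR estimate: SNP-outcome over SNP-exposure effect."""
    estimate = beta_out / beta_exp
    se = se_out / abs(beta_exp)  # first-order approximation
    return estimate, se


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted estimate across instruments."""
    num = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_exp, se_out))
    return num / den, math.sqrt(1.0 / den)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical summary statistics, not taken from the study.
single_est, single_se = wald_ratio(beta_exp=0.2, beta_out=0.1, se_out=0.02)
ivw_est, ivw_se = ivw([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
threshold = bonferroni_threshold(0.05, 19)
print(f"Wald ratio = {single_est:.2f}, IVW = {ivw_est:.2f}, threshold = {threshold:.4f}")
```

With 19 tests the corrected threshold is 0.05/19, roughly 0.0026, which the manuscript text reports rounded to 0.003.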
comparison of effect estimates from different approaches, independent replication, bias analytic techniques, validation of instruments, simulations)

d) For two-sample MR:
i. Provide justification of the similarity of the genetic variant-exposure associations between the exposure and outcome samples.
ii. Provide information on the number of individuals who overlap between the exposure and outcome studies.

9
For the analysis in which the exposure and outcome samples partially overlapped, we used an online tool (https://sb452.shinyapps.io/overlap/) to assess the bias from sample overlap and the corresponding type 1 error rate (Supplementary Table 4-5).[24]

c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful time period.

NA

d) Consider plots to visualize results (e.g. forest plot, scatterplot of associations between genetic variants and outcome versus between genetic variants and exposure).

/ Figure

Table 7
Discuss underlying biological mechanisms that could drive a potential causal relationship between the investigated exposure and the outcome, and whether the gene-environment equivalence assumption is reasonable. Use causal language carefully, clarifying that IV estimates may provide causal effects only under certain assumptions.

Describe sources of funding and the role of funders in the present study and, if applicable, sources of funding for the databases and original study or studies on which the present study is based.

20
This work was supported by grants from the Key Laboratory of Precision Medicine for Atherosclerotic Diseases of Zhejiang Province, China (Grant No. 2022E10026), the National Natural Science Foundation of China (82200489), the Major Project of Science and Technology Innovation 2025 in Ningbo, China (Grant No. 2021Z134), and the Key research and development project of Zhejiang Province, China (Grant No. 2021C03096).