Are immune-related adverse events associated with the efficacy of immune checkpoint inhibitors in patients with cancer? A systematic review and meta-analysis

Background: A number of studies have reported an association between the occurrence of immune-related adverse events (irAEs) and clinical efficacy in patients undergoing treatment with immune checkpoint inhibitors (ICIs), but the results remain controversial.
Methods: Under the guidance of a predefined protocol and the Preferred Reporting Items for Systematic Reviews and Meta-analyses statement, this meta-analysis included cohort studies investigating the association of irAEs and efficacy of ICIs in patients with cancer. The primary outcome was overall survival (OS), and the secondary outcome was progression-free survival (PFS). Subgroup analyses involving the cancer type, class of ICIs, combination therapy, sample size, model, landmark analysis, and approach used to extract the data were performed. Specific analyses of the type and grade of irAEs were also performed.
Results: This meta-analysis included 30 studies comprising 4971 individuals. Patients with cancer who developed irAEs experienced both an OS benefit and a PFS benefit from ICI therapy compared to patients who did not develop irAEs (OS: hazard ratio (HR), 0.54, 95% confidence interval (CI), 0.45–0.65; p < 0.001; PFS: HR, 0.52, 95% CI, 0.44–0.61, p < 0.001). Subgroup analyses of the study quality characteristics and cancer types recapitulated these findings. Specific analyses of endocrine irAEs (OS: HR, 0.52, 95% CI, 0.44–0.62, p < 0.001), dermatological irAEs (OS: HR, 0.45, 95% CI, 0.35–0.59, p < 0.001), and low-grade irAEs (OS: HR, 0.57, 95% CI, 0.43–0.75; p < 0.001) yielded similar results. The association between irAE development and a favorable benefit on survival was significant in patients with cancer who were undergoing treatment with programmed cell death-1 inhibitors (OS: HR, 0.51, 95% CI, 0.42–0.62; p < 0.001), but not cytotoxic T-lymphocyte antigen-4 inhibitors (OS: HR, 0.89, 95% CI, 0.49–1.61; p = 0.706). Additionally, the association was significant in patients with cancer who were treated with ICIs as a monotherapy (OS: HR, 0.53, 95% CI, 0.43–0.65; p < 0.001), but not as a combination therapy (OS: HR, 0.62, 95% CI, 0.36–1.05; p = 0.073).
Conclusions: The occurrence of irAEs was significantly associated with a better ICI efficacy in patients with cancer, particularly endocrine, dermatological, and low-grade irAEs. Further large-scale prospective studies are warranted to validate our findings.
Systematic review registration: PROSPERO CRD42019129310.


Background
Immune checkpoint inhibitors (ICIs) targeting cytotoxic T-lymphocyte antigen-4 (CTLA-4) or programmed cell death-1 (PD-1) pathways are reshaping the landscape of cancer therapy, yielding unprecedented clinical success in treating multiple cancer types [1]. By blocking the inhibitory pathway between T lymphocytes and tumor cells or antigen-presenting cells, ICIs aim to release the brake of the anergized T cells and reactivate their antitumor cytolytic function [2]. Monoclonal antibodies targeting CTLA-4 and PD-1/programmed cell death ligand-1 (PD-L1) axes are currently two major categories applied in cancer immunotherapies. Immune-related adverse events (irAEs) are a unique spectrum of side effects of ICIs that resemble autoimmune responses. irAEs affect almost every organ of the body and are most commonly observed in the skin, gastrointestinal tract, lung, and endocrine, musculoskeletal, and other systems [3]. Since irAEs occur via a process of immune activation, suggesting that the exhausted immune cells have been reinvigorated and attack not only tumor cells but also normal tissue, theoretically, the occurrence of irAEs may indicate a better response to ICI therapy. Nevertheless, whether irAE development is predictive of the ICI response remains controversial.
A robust and precise systematic review is required to evaluate the association between irAE occurrence and the efficacy of ICIs.
Herein, we conducted a systematic review of the published studies to investigate the association between irAE occurrence and the efficacy of ICIs. We used a standard meta-analysis approach to obtain a statistical and comprehensive view of the association. Our study framed the question using the PECO framework: (P) patients with cancer receiving ICIs, (E) occurrence of irAEs, (C) nonoccurrence of irAEs, (O) efficacy of ICIs (measured using different outcomes). To the best of our knowledge, the present study is the first meta-analysis to explore the association between the occurrence of irAEs and the efficacy of ICIs by pooling the results of eligible studies collectively. Additionally, we separately pooled the predictive effects of different irAE types and irAE grades to investigate their specific roles in homogeneous settings.

Methods
This systematic review and meta-analysis was conducted under the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement [37]. We prospectively registered the protocol in PROSPERO (ID: CRD42019129310). Additional information about the methods is provided in the appendix (Additional file 1: Supplementary Methods).

Literature search strategy
We searched the PubMed, Embase, and Cochrane databases to identify studies that reported the association between irAE occurrence and the efficacy of ICIs in patients with cancer and that were published from database inception to March 22, 2019. The key search terms included irAEs, PD-1, PD-L1, CTLA-4, efficacy, and cancer. No restriction on publication date was applied, while language was confined to English. We also manually reviewed the citations of relevant reviews, editorials, and commentaries and included relevant studies to avoid omission. We performed an additional search from database inception to June 3, 2019, to identify recently published studies using the same procedure.

Inclusion and exclusion criteria
The following inclusion criteria were adopted: Studies not adhering to the inclusion criteria were excluded. Other exclusion criteria were as follows: (1) studies that reported adverse events that were not related to immune function; and (2) studies that reported only survival curves and p values, but not HRs, for the association between the occurrence of irAEs and the efficacy of ICIs. For duplicate publications or overlapping study populations, we included only the most recent and complete report.

Data collection and quality assessment
Two researchers (X.Z. and Z.Y.) independently extracted data from the included publications in accordance with a predefined procedure. The data extracted included the author, publication year, area in which the population was located, trial design, criteria for grading irAEs, statistical model, variables for adjustment, landmark analysis, cancer type, agent, follow-up time, sample size, irAE type, grade of irAE, median irAE onset time, and HRs and 95% CIs of OS and PFS for global irAEs, organ-specific irAEs, and grade-specific irAEs. If a study reported both multivariate and univariate HRs, the former was extracted to reduce confounding. If a study reported HRs both with and without a landmark analysis, the former was chosen to reduce time-dependent bias.
The two researchers (X.Z. and Z.Y.) also independently reviewed the included publications to evaluate their methodological quality with the Newcastle-Ottawa scale (NOS) criteria [38]. Every included study was awarded a score ranging from 0 (poor methodological quality) to 9 (optimal methodological quality) points regarding the selection, comparability and outcomes of study cohorts. Any discrepancies were resolved by reaching a consensus with a third author (H.Y. or N.L.).

Data analyses
We utilized Stata 12.0 software (Stata Corporation, College Station, Texas, USA) and R GUI software (version 3.4.4) with the forestplot package (version 1.7.2) for statistical analyses and plotting. The log HRs of irAEs versus non-irAEs and 95% CIs were adopted to aggregate the survival results. If a study reported only HRs and p values, but not 95% CIs, the conversion formula proposed by Altman et al. was utilized to calculate the 95% CIs [39]. If an HR of non-irAEs versus irAEs rather than the opposite comparison was reported, then an HR of irAEs versus non-irAEs was calculated by determining the reciprocal of the original HR and the corresponding CIs [40]. The χ² test and I² statistic were applied to estimate the between-study heterogeneity. Significant heterogeneity was indicated if p < 0.10 for the χ² test or I² > 50%, in which case a random effects model was applied to the pooled analysis [41]. Otherwise, we applied a fixed effects model [42]. For the sensitivity analysis, each study was omitted in turn to judge the stability of the pooled results. Begg's test and Egger's test were utilized to identify publication bias [43,44]. For both tests, significant publication bias was considered when p < 0.05. Moreover, the "trim and fill" method was adopted to identify and adjust for potential sources of publication bias [45]. This method estimated possibly missing studies and incorporated these hypothetical studies into the original analysis to calculate an adjusted effect. For all analyses, a two-sided p < 0.05 was considered representative of statistical significance.
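The conversion and pooling steps described above can be sketched in Python as follows. This is an illustrative sketch only, not the Stata or R code used in the study. It assumes that the formula cited as [39] is the Altman and Bland approximation for recovering a CI from a p value and that the random effects model uses the DerSimonian-Laird estimator, neither of which is stated explicitly in the text. The function names (`ci_from_p`, `invert`, `chi2_sf`, `pool`) are hypothetical.

```python
import math

Z = 1.96  # normal quantile for a two-sided 95% interval


def ci_from_p(hr, p):
    """Approximate 95% CI for an HR from its two-sided p value (Altman-Bland, assumed)."""
    z = -0.862 + math.sqrt(0.743 - 2.404 * math.log(p))
    se = abs(math.log(hr)) / z  # standard error of log(HR)
    return math.exp(math.log(hr) - Z * se), math.exp(math.log(hr) + Z * se)


def invert(hr, lower, upper):
    """Re-express non-irAE vs irAE as irAE vs non-irAE (the CI bounds swap places)."""
    return 1 / hr, 1 / upper, 1 / lower


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2
    if df % 2:  # odd df: start from df = 1
        a, sf = 0.5, math.erfc(math.sqrt(half))
    else:       # even df: start from df = 2
        a, sf = 1.0, math.exp(-half)
    while 2 * a < df:  # step up two degrees of freedom at a time
        sf += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return sf


def pool(studies):
    """Pool (hr, lower, upper) tuples; switch to random effects if heterogeneity is flagged."""
    y = [math.log(hr) for hr, _, _ in studies]
    v = [((math.log(hi) - math.log(lo)) / (2 * Z)) ** 2 for _, lo, hi in studies]
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(studies) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    model = "fixed"
    if df > 0 and (chi2_sf(q, df) < 0.10 or i2 > 50):
        # DerSimonian-Laird between-study variance (assumed estimator)
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w = [1 / (vi + tau2) for vi in v]
        model = "random"
    est = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return {"hr": math.exp(est), "ci": (math.exp(est - Z * se), math.exp(est + Z * se)),
            "q": q, "i2": i2, "model": model}
```

For example, `pool([(0.3, 0.2, 0.45), (1.2, 0.9, 1.6)])` uses made-up inputs whose estimates disagree strongly, so the heterogeneity rule selects the random effects branch.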
We pooled the studies for organ-specific irAEs, low-grade irAEs (grades 1-2), and severe-grade irAEs (grade greater than or equal to 3) if at least two studies were identified. We also pooled HRs from all studies to identify the overall effect. For the overall pooled analysis, if a study reported both HRs of all-grade irAEs and grade-specific irAEs, the former was selected; if a study reported both HRs of global irAEs and organ-specific irAEs, the former was selected; and if a study reported HRs of multiple organ-specific irAEs, but not of global irAEs, the analysis with the largest sample size of the irAE cohort was selected.
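The selection rules for the overall pooled analysis can be written as a small decision procedure. The sketch below is illustrative: the `Estimate` record, its field names, and the labels `"global"`, `"organ"`, and `"all"` are hypothetical, and two details are assumptions because the text does not specify them, namely the order in which the all-grade and global rules are applied and the fallback when a study reports only grade-specific estimates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Estimate:
    scope: str    # "global" or "organ" (hypothetical labels)
    grade: str    # "all", "low", or "severe" (hypothetical labels)
    irae_n: int   # number of patients in the irAE cohort
    hr: float     # reported hazard ratio, irAE vs non-irAE


def select_for_overall(estimates: List[Estimate]) -> Optional[Estimate]:
    """Pick the single HR that one study contributes to the overall pooled analysis."""
    if not estimates:
        return None
    # Rule 1: all-grade estimates take precedence over grade-specific ones.
    # Assumption: if none exist, fall back to whatever the study reported.
    candidates = [e for e in estimates if e.grade == "all"] or estimates
    # Rule 2: global irAE estimates take precedence over organ-specific ones.
    global_only = [e for e in candidates if e.scope == "global"]
    if global_only:
        return global_only[0]
    # Rule 3: only organ-specific estimates remain, so take the largest irAE cohort.
    return max(candidates, key=lambda e: e.irae_n)
```

With this rule set, a study reporting only endocrine and dermatological HRs would contribute whichever of the two was estimated in the larger irAE cohort.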
We performed predefined subgroup analyses to investigate the effects of irAEs on the efficacy of ICIs among different settings and to explore potential sources of heterogeneity. In the overall cohort, we considered subgroups defined by cancer type, class of ICIs, combination therapy, sample size, model, landmark analysis, and approach for data extraction. In addition, we performed a subgroup analysis of the class of ICIs in the melanoma cohort. We only analyzed subgroups containing at least two studies.

Literature search results
Our literature search retrieved 2236 studies, from which we collected 78 potentially eligible studies after screening the titles and abstracts. Finally, we selected 30 studies after a review of the full articles. The reasons for exclusion were as follows: 25 studies did not report data on OS or PFS, 9 did not report HRs, 1 was a duplicate publication, 1 used an improper control group, 1 reported non-ICI therapy, 1 was a published conference abstract, and 1 was a case report. The detailed retrieval process is shown in Fig. 1.

Characteristics of the identified studies
We retrieved information from 4971 individuals in this study. The characteristics of the 30 included studies are described in Table 1 and Additional file 1: Tables S1 and S2. These studies were performed in 10 countries. Sixteen studies analyzed patients with non-small cell lung cancer (NSCLC), 7 analyzed patients with melanoma, 2 analyzed patients with renal cell carcinoma (RCC), and 5 analyzed patients with multiple cancer types. Twenty-six studies adopted PD-1 inhibitors, 3 adopted CTLA-4 inhibitors, and 1 adopted PD-1/PD-L1 inhibitors. The number of patients included in the survival analysis ranged from 18 to 613. Twenty studies reported extractable data on global irAEs, and 16 studies reported organ-specific irAEs. Twenty-six studies reported HRs of OS and 22 studies reported HRs of PFS. Six studies utilized a landmark analysis, whereas 24 studies did not. Seventeen studies adopted a multivariate model to control for confounding factors, and 13 studies adopted a univariate model. Two studies had a prospective cohort design, and 28 studies employed a retrospective cohort design. All studies enrolled patients with stage III or higher cancer, except two studies that did not report this information. Four studies included on-trial patients, 22 studies included off-trial patients, and 4 studies included both on-trial and off-trial patients. The median irAE onset time ranged from 4.2 to 20 weeks. Twenty-six studies adopted ICIs as monotherapy, and 4 studies adopted ICIs as combination therapy (2 with a peptide vaccine, 1 with radiotherapy, and 1 with vemurafenib). All studies adopted the Common Terminology Criteria for Adverse Events (CTCAE) to grade irAEs, with the exception of 3 studies that did not report this information. The NOS scores allocated for the included studies ranged from 4 to 8 points.
The HRs of OS for patients presenting with low and severe irAE grades were also analyzed. The occurrence of low-grade irAEs was significantly associated with a favorable OS in patients receiving ICIs (HR, 0.57, 95% CI, 0.43-0.75, p < 0.001), whereas the occurrence of severe-grade irAEs did not display a significant association with OS (HR, 0.99, 95% CI, 0.43-2.25, p = 0.976). Significant heterogeneity was observed in severe-grade irAEs (I² = 83.4%, p < 0.001), but not in low-grade irAEs (Fig. 3).
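As a consistency check on how an HR, its 95% CI, and its p value relate, the following Python sketch recomputes an approximate z statistic and two-sided p value from the pooled estimates quoted above. It assumes a normal approximation on the log HR scale, and `z_and_p` is a hypothetical helper name. Because the published HRs and CIs are rounded, the recomputed p values are expected to differ slightly from the reported ones.

```python
import math


def z_and_p(hr, lower, upper):
    """Approximate z statistic and two-sided p value from an HR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)  # SE of log(HR)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Pooled OS estimates quoted in the text
z_low, p_low = z_and_p(0.57, 0.43, 0.75)   # low-grade irAEs
z_sev, p_sev = z_and_p(0.99, 0.43, 2.25)   # severe-grade irAEs
```

The low-grade interval lies entirely below 1 and yields a small p value, whereas the severe-grade interval is wide and centered near 1, which matches the reported lack of a significant association.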
The results of the subgroup analyses for PFS were similar to those for OS (Additional file 1: Figure S3). The association of irAE occurrence with a reduced risk of progression in patients with cancer receiving ICIs was significant in each subgroup stratified by study quality characteristics. Nonetheless, the predictive effect of irAEs on a favorable PFS was not consistently significant in subgroups stratified by patient characteristics.

Sensitivity analysis and publication bias
In the sensitivity analysis, the pooled results for OS and PFS both remained significant, regardless of which study was deleted, indicating that the significant association between irAE occurrence and ICI efficacy in patients with cancer was robust (Additional file 1: Figure S4). Regarding the overall analysis, the Begg funnel plot for OS displayed evident asymmetry (p = 0.033), indicating that publication bias should be considered, although Egger's test showed no evidence of publication bias (p = 0.122) (Additional file 1: Figure S5A). Next, we used the trim and fill method to assess the effect of publication bias on the pooled results. However, no study was trimmed or filled in the output results, leaving the pooled result of OS unchanged, which supported the stability of results (Additional file 1: Figure S5A and Additional file 2). Because publication bias is generally caused by small-sized studies, restricting the pooled analysis to large-sized studies (≥100) might provide clues for the origin of publication bias. Indeed, the pooled HR of OS in the large-sized studies was comparable to the overall effect (0.58 vs. 0.54) (Fig. 4). Notably, neither the Begg funnel plot (p = 0.721) nor Egger's test (p = 0.872) revealed publication bias for OS in large-sized studies, which further confirmed the stability of the OS results (Additional file 1: Figure S6). Regarding PFS, the Begg funnel plot displayed no obvious asymmetry (p = 0.180), indicating that no evident publication bias was detected, and Egger's test confirmed this finding (p = 0.134) (Additional file 1: Figure S7).
Fig. 2 Forest plot (random effects model) of the association between immune-related adverse event development and overall survival. The sizes of the squares indicate the weight of the study. Abbreviations: HR, hazard ratio; irAEs, immune-related adverse events; non-irAEs, non-immune-related adverse events. a Results for grade 1-2 immune-related adverse events (irAEs). b Results for grade 3-4 irAEs
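The leave-one-out sensitivity analysis reported in this section can be sketched in Python as follows. For brevity the sketch pools with a fixed-effect inverse-variance model, which is a simplification of the random effects model used for OS, and the function names are hypothetical. Each study is dropped in turn, the remaining studies are re-pooled, and the result is flagged as significant when the 95% CI excludes an HR of 1.

```python
import math


def pooled_log_hr(log_hrs, ses):
    """Fixed-effect inverse-variance pooled log HR and its standard error."""
    w = [1 / s ** 2 for s in ses]
    est = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum(w)
    return est, math.sqrt(1 / sum(w))


def leave_one_out(log_hrs, ses):
    """Re-pool after omitting each study; returns (omitted index, HR, lower, upper, significant)."""
    results = []
    for i in range(len(log_hrs)):
        ys = log_hrs[:i] + log_hrs[i + 1:]
        ss = ses[:i] + ses[i + 1:]
        est, se = pooled_log_hr(ys, ss)
        lo, hi = est - 1.96 * se, est + 1.96 * se
        significant = hi < 0 or lo > 0  # CI for log HR excludes 0, i.e. HR excludes 1
        results.append((i, math.exp(est), math.exp(lo), math.exp(hi), significant))
    return results
```

A pooled result is considered stable in this sense when every row returned by `leave_one_out` remains significant and the HRs stay close to the overall estimate.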

Principal findings and implications
Currently, whether the occurrence of irAEs is associated with the efficacy of ICI therapy remains controversial. To our knowledge, our study represents the largest and most comprehensive analysis of the association between irAEs and ICI efficacy performed to date. The conclusions listed below were drawn based on our results.
Generally, patients with cancer who developed irAEs experienced longer OS and PFS than patients who did not develop irAEs. Regarding the irAE types, the survival benefit for patients who developed irAEs was observed in patients presenting endocrine and dermatological abnormalities, but not in patients presenting a gastrointestinal, pulmonary, hepatobiliary, or musculoskeletal abnormality. The occurrence of low-grade irAEs, but not severe-grade irAEs, was associated with better ICI efficacy in patients with cancer. The occurrence of irAEs was significantly associated with a favorable efficacy of PD-1 inhibitors, but not CTLA-4 inhibitors.
The mechanisms underlying the association between irAEs and survival benefits have not been completely elucidated. Antigen mimicry theory has been one of the most promising hypotheses. Preclinical data identified multiple epitopes that were shared in both melanoma and normal melanocytes [46,47]. The release of shared antigens by ICI therapy might result in the priming of a secondary immune response to host antigens, which was supported by the finding that T cell clones infiltrating irAE lesions and tumors were significantly overlapped among ICI-treated patients with melanoma and NSCLC [48]. Hence, the development of irAEs indicates a robust immune reaction towards both the tumor and healthy tissue, thereby predicting better treatment responses. Moreover, in addition to the antigen mimicry theory, the dysregulation of humoral immunity has been proposed as a possible explanation for the association. The PD-1 signaling pathway modulates B cell activation in both a T cell-dependent and T cell-independent manner [49,50]. According to the clinical evidence, thyroid dysfunction during ICI treatment is characterized by the production of anti-thyroid antibodies [12], suggesting that the presence of autoantibodies may account for the irAE-prognosis relationship.
Fig. 3 Meta-analyses of the association between immune-related adverse event development and outcome. Abbreviations: HR, hazard ratio; irAEs, immune-related adverse events; non-irAEs, non-immune-related adverse events. Low grade indicates grades 1-2; severe grade indicates a grade greater than or equal to 3. a The HR was directly presented without pooling because only one study was available
Notably, in addition to overall irAEs, the favorable results remained significant for endocrine irAEs and dermatological irAEs, but not for gastrointestinal irAEs, pulmonary irAEs, hepatobiliary irAEs, and musculoskeletal irAEs. As the incidence of musculoskeletal irAEs is low [51], statistical significance may not have been reached due to the insufficient number of samples. Gastrointestinal irAEs are more frequently observed in response to anti-CTLA-4 treatment [52], and thus, the insignificance of pooled results for PFS might be explained by heterogeneity. However, the variation among other organ-specific irAEs is likely attributable to the clinical importance of different systems. The respiratory and hepatobiliary systems are the most commonly affected organs in patients who experienced fatal irAEs and received anti-PD-1/PD-L1 treatment [53], increasing the risk of mortality for the patient and leading to a poorer outcome due to the side effects of ICIs. Therefore, the proper management of irAEs is very important to maximize the benefits of ICIs.
The prognostic value of irAEs also differed by irAE grade, i.e., the predictive effect of irAEs was significant for low-grade irAEs but not for severe-grade irAEs. Severe irAEs are potentially life-threatening and require systemic immunosuppressive treatment, which may counteract the effect of ICIs. Glucocorticoids extensively modify cytokine signaling and inhibit the IL-2 and IFN-γ pathways [54–56], which are reactivated to create the inflammatory tumor microenvironment during ICI therapy. Therefore, exposure to large amounts of immunosuppressive reagents during high-grade irAEs would be expected to alter the antitumor effect.
Regarding the subgroup analyses, the distinct outcomes observed for the anti-PD-1 and anti-CTLA-4 subgroups indicated a possibly different mechanism of irAEs, as CTLA-4 blockade activates T cells at an earlier stage of their development and might thus directly disrupt central tolerance without affecting the tumor immune response. Meanwhile, irAEs induced by PD-1 inhibitors predict a better clinical response by patients with cancer, but the association between irAEs and survival in patients undergoing anti-CTLA-4 therapy remains controversial [8,29,57–60], which calls for larger studies in the future. The ability of irAEs to predict a favorable OS and PFS was consistently significant in patients with NSCLC and other cancers (including RCC and multiple cancer types), but not in patients with melanoma. However, we prefer to attribute this inconsistency in patients with melanoma to the heterogeneity caused by the large proportion of studies investigating CTLA-4 inhibitors (OS: 3 in 7; PFS: 1 in 3), as the additional subgroup analysis revealed that restricting the melanoma cohort to treatment with PD-1 inhibitors yielded a significant association between irAE development and a favorable OS.

Strengths and comparison with other studies
A systematic review performed by Ouwerkerk et al. summarized studies investigating the association between irAEs and ICI efficacy in patients with melanoma [36]. However, the report by Ouwerkerk et al. did not pool the results through a meta-analysis and thus did not provide exact statistical information about the critical question of whether the occurrence of irAEs was associated with ICI efficacy. Additionally, the research scope of their study was restricted to patients with melanoma. In the present systematic review and meta-analysis, we performed a comprehensive pan-cancer meta-analysis. Therefore, an accurate magnitude of the predictive effect of irAEs on ICI efficacy was obtained. Additionally, we separately pooled the predictive effects of different irAE types and irAE grades to investigate their specific roles in homogeneous settings.
The relevance of our findings is strengthened by their consistency across all analyzed subsets stratified by the study quality characteristics, except for data extraction. Nevertheless, some of the included studies utilized a univariate model or were small in size, and a landmark analysis was not performed in every study we selected. However, as described above, the subgroup HR value remained significant, regardless of the adjustment for the model, sample size, or landmark analysis, indicating that the biases derived from these low-quality studies are unlikely to change our results.

Limitations
Several limitations of our study should be noted. First, the Begg funnel plot and Begg's test identified evident publication bias in the pooled results for OS, indicating that the results of the pooled analysis of OS might be exaggerated. According to the exclusion criteria, we excluded several major studies from the current meta-analysis because they reported only survival curves but not HR values. Two studies presented no significant difference in survival outcome based on irAEs [34,35], but one study reported a significant difference in survival according to irAEs with landmark analysis [61]. However, Egger's test and the trim and fill method detected no evidence of publication bias. Compared with the overall analysis of OS, the pooled analysis of OS in large-sized studies was comparable and showed no publication bias, indicating that the bias attributed to small-sized studies was moderate. In addition, the PFS analysis showed a similar significant overall effect to the OS analysis and exhibited no publication bias. Taken together, we believe that our study provides meaningful evidence specifying the association of irAE development with survival benefits in patients with cancer who were treated with ICIs. Second, significant heterogeneity was observed in the OS analysis, which might be attributed to variations in irAE types, irAE grade, class of ICIs, etc. In an attempt to reduce the impact of heterogeneity, we performed specific analyses of each type and grade of irAEs. We also conducted subgroup analyses of patient characteristics and study quality characteristics. Third, our study included limited types of malignancy that were mostly weighted towards NSCLC, melanoma, and RCC, restricting the broad application of our findings. Additional analyses of broader types of cancer are necessary to confirm our conclusions. 
Fourth, only two publications included in our study employed a prospective design, raising concerns regarding the quality of the evidence analyzed in our study. Hence, further large-scale prospective cohort studies are warranted.

Conclusions
In this systematic review and meta-analysis, the occurrence of irAEs was associated with better ICI efficacy in patients with cancer, particularly endocrine, dermatological, and low-grade irAEs. Further large-scale prospective studies are warranted to confirm our findings.
Additional file 1. Contains additional information about the methods, literature search and data analyses. Table S1. Additional characteristics of the eligible studies. Table S2. The Newcastle-Ottawa Scale (NOS) quality assessment of the enrolled studies. Figure S1. Subgroup analysis stratified by class of immune checkpoint inhibitors in the melanoma cohort. Figure S2. Forest plot (fixed effects model) of the association between immune-related adverse event development and progression-free survival. Figure S3. Subgroup analyses of the association between immune-related adverse event development and progression-free survival. Figure S4. Sensitivity analysis of the impact of each individual study on the pooled effect. (A) Overall survival; (B) Progression-free survival. Figure S5. Funnel plots of the overall survival results. (A) Without trim and fill; (B) With trim and fill. Figure S6. Funnel plots of the overall survival results in large sample size studies. Figure S7. Funnel plots of the progression-free survival results.
Additional file 2. Log file of trim and fill method in Figure S5.