Support of personalized medicine through risk-stratified treatment recommendations - an environmental scan of clinical practice guidelines

Background Risk-stratified treatment recommendations facilitate treatment decision-making that balances patient-specific risks and preferences. It is unclear if and how such recommendations are developed in clinical practice guidelines (CPGs). Our aim was to assess if and how CPGs develop risk-stratified treatment recommendations for the prevention or treatment of common chronic diseases.
Methods We searched the United States National Guideline Clearinghouse for US, Canadian and National Institute for Health and Clinical Excellence (United Kingdom) CPGs for heart disease, stroke, cancer, chronic obstructive pulmonary disease and diabetes that make risk-stratified treatment recommendations. We included only those CPGs that made risk-stratified treatment recommendations based on risk assessment tools. Two reviewers independently identified CPGs and extracted information on recommended risk assessment tools; type of evidence about treatment benefits and harms; methods for linking risk estimates to treatment evidence and for developing treatment thresholds; and consideration of patient preferences.
Results We identified 20 CPGs that made risk-stratified treatment recommendations out of 133 CPGs that made any type of treatment recommendations for the chronic diseases considered in this study. Of the included 20 CPGs, 16 (80%) used evidence about treatment benefits from randomized controlled trials, meta-analyses or other guidelines, and the source of evidence was unclear in the remaining four (20%) CPGs. Nine CPGs (45%) used evidence on harms from randomized controlled trials or observational studies, while 11 CPGs (55%) did not clearly refer to harms. Nine CPGs (45%) explained how risk prediction and evidence about treatment effects were linked (for example, applying estimates of relative risk reductions to absolute risks), but only one CPG (5%) assessed benefit and harm quantitatively and three CPGs (15%) explicitly reported consideration of patient preferences.
Conclusions Only a small proportion of CPGs for chronic diseases make risk-stratified treatment recommendations, and those that do focus on heart disease and stroke prevention, diabetes and breast cancer. For most CPGs it is unclear how risk-stratified treatment recommendations were developed. As a consequence, it is uncertain if CPGs support patients and physicians in finding an acceptable benefit-harm balance that reflects both profile-specific outcome risks and preferences.


Background
An important goal of evidence-based health care is to maximize benefits and minimize harms from medical treatments. To achieve an optimal balance, patients' individual profiles and preferences need to be considered [1]. For example, inhaled corticosteroids are used to prevent exacerbations in patients with chronic obstructive pulmonary disease (COPD) [2][3][4], but these drugs are associated with an increased risk for pneumonia and fractures [5,6]. In patients at high risk for exacerbations, the potential benefits (preventing exacerbations) are likely to be larger than harms, while patients at low risk for exacerbations may experience more harms from inhaled corticosteroids than benefits.
Risk-stratified treatment recommendations are potentially useful to support personalized medicine. Personalized medicine aims at optimizing the benefit-harm balance by considering patient profiles (combination of characteristics) and preferences [7]. For the prevention and treatment of chronic disease, most health care decisions are sensitive to patient profiles and preferences [8]. Risk-stratified treatment recommendations suggest different treatment regimens for patients who are at different risks for outcomes [9]. For example, in the Third Report of the National Cholesterol Education Program's Adult Treatment Panel treatment algorithm [10], the recommendation for primary prevention of coronary heart disease is based on the Framingham Risk Score.
According to different risk categories predicted by the Framingham Risk Score, individuals with higher predicted absolute risk (10-year risk > 20%) are recommended for more intensive treatments (such as combined pharmacological and non-pharmacological treatments) than those with lower predicted risk (10-year risk < 10%). There is evidence that using risk-stratified treatments is superior to treatments that are not informed by a risk assessment tool [11][12][13].
Risk-stratified treatment recommendations only serve their purpose of supporting personalized medicine if valid methods were used to develop them. Because it is not known what proportion of clinical practice guidelines (CPGs) make risk-stratified treatment recommendations and what methods were used to develop them, our aim was to assess the methods CPGs applied in developing risk-stratified treatment recommendations for the prevention or treatment of selected common chronic diseases.

Framework for developing risk-stratified treatment recommendations
We started by forming a framework for developing risk-stratified treatment recommendations. Figure 1 outlines the major steps for developing risk-stratified treatment recommendations, each of which requires high quality evidence from observational studies (development and validation of risk assessment tools), randomized trials (evidence about treatment effects) and studies to elicit patient preferences (using various study designs, for example, discrete choice experiments). As for all guidelines, evidence about treatment effects on benefit and harm outcomes must be available. In addition, a risk assessment tool should be available that allows patients to be assigned to different risk categories. A method is required to estimate how treatment evidence applies to patients at different risks and how the benefits compare to the harms in patients at different risks. As a result of such a benefit-harm assessment, treatment thresholds can be defined for patients with different risk profiles that maximize the chance for benefits while minimizing harms. In addition, patient preferences for outcomes would ideally be explicitly considered for the development of risk-stratified treatment recommendations or their application in practice.

Figure 1 Framework components: risk assessment tool to estimate outcome risk; evidence about treatment effects on benefit and harm outcomes; application of treatment evidence to outcome risks; benefit-harm assessment to define treatment thresholds according to outcome risks; patient preferences.

Environmental scan of clinical practice guidelines
We performed an environmental scan of CPGs, which included a limited literature search (described below) but not a comprehensive, systematic review of all CPGs. We focused on CPGs for major chronic diseases and from the United States (US), Canada, or the United Kingdom (UK) National Institute for Health and Clinical Excellence (NICE). The completed PRISMA checklist is available as Additional file 1.

Data sources and searches
We searched the US National Guideline Clearinghouse (NGC) database on February 5, 2011 for CPGs with treatment recommendations for five major chronic diseases. The top five chronic diseases in the US are heart disease, cancer, stroke, COPD and diabetes, accounting for more than two-thirds of all deaths [14]. In the NGC database, guidelines were categorized by disease topics that were linked to a specific term derived from the US National Library of Medicine's Medical Subject Headings classification. For heart disease and stroke, we performed our search within the Cardiovascular Diseases section of the database (n = 442) and considered CPGs specific for primary prevention of heart disease and stroke, that is, the prevention of an event in persons free of established cardiovascular diseases. For cancer, we chose to examine the three types of cancer with the highest mortality rates in the US (lung cancer, prostate cancer and breast cancer) [15]. We searched for CPGs within the Lung Neoplasms (n = 53), Prostatic Neoplasms (n = 26) and Breast Neoplasms (n = 52) sections, respectively. For COPD, we considered CPGs specific for COPD within the Respiratory Tract Diseases section (n = 102). For diabetes mellitus, we considered CPGs for type II diabetes within the Diabetes Mellitus, Type 2 section (n = 44).

Eligibility criteria for guidelines
We included CPGs that recommended using risk assessment tools to inform treatment decisions. Risk assessment tools are tools to calculate the probability of developing an event or a disease based on a prediction model (binary outcome), or tools that make projections about the course of disease measured by patient-reported or other continuous outcomes (for example, decline of functional status over time). We excluded CPGs if they were not from the US, Canada or NICE (UK); focused on childhood diseases; made recommendations on screening, genetic counseling or diagnostic work-up alone; or did not use any risk assessment tools to inform risk-stratified treatment decisions. This last excluded category involved guidelines that recommended treatments according to diagnostic criteria, for example, based on pathological staging, rather than according to prognostic information (for example, the risk stratification scheme proposed by D'Amico et al. in prostate cancer guidelines [16]).

Guideline selection
Two reviewers (TY and DV) independently reviewed the Guideline Summary section of each CPG on the NGC website to assess its potential eligibility. We excluded the CPGs labeled ineligible by both reviewers. For the other CPGs, we retrieved and examined the full text and resolved any discrepancies in eligibility through discussion or arbitration by a third reviewer (MP).

Data extraction and synthesis
We developed a standardized form to extract data from the included CPGs and the background documents detailing the methods used in developing CPGs when available. We extracted general items such as guideline title, bibliographic source, date released and guideline developer. We then extracted information related to five key components for developing risk-stratified treatment recommendations (Figure 1). We extracted the following information on risk assessment tools: the name of the prediction model, the outcome and the timeframe (for example, 10 years) used in the model, and whether validation of the model (for example, assessment of discrimination and/or calibration) was discussed in the CPGs. We extracted information on the type of evidence used to determine the effects of treatments on benefit and harm outcomes (observational studies, single or several randomized controlled trials (RCTs), or meta-analyses). We recorded the methods to link risk prediction and evidence on treatment effects (for example, applying relative risk reductions to different absolute risks calculated from the risk assessment tool). We recorded the way the treatment benefits and harms were assessed and how treatment thresholds (based on risk assessment tools) were determined. We also extracted information on assumptions made for linking risk prediction and treatment evidence (for example, assumption of constant relative risk reductions across the risk spectrum) and on assumptions made for the assessment of benefits and harms (for example, assumption that benefit and harm outcomes can be put on a single scale and the overall net benefit expressed as a single number indicating benefit or harm). Finally, we noted whether patient preferences (for example, relative importance of different benefit and harm outcomes) were considered for developing risk-stratified treatment recommendations.
Because some CPGs were very brief, without detailing the development process but referring to other documents, we considered those documents for data extraction to avoid underestimating the rigor of the development process of a CPG. Two reviewers (TY and DV) independently extracted all relevant information from each CPG and the discrepancies were resolved by discussion or third-party (MP) arbitration. We constructed a table for comparison of recommendations from each of the included CPGs.

Treatments recommended and evidence of treatment benefits and harms
Of the 16 CPGs for type II diabetes and primary prevention of heart disease and stroke, nine (56%) suggested specific target lipid levels for each risk category when making recommendations about lifestyle management or pharmacotherapy (for example, aspirin, statins and antihypertensive drugs) [10,[19][20][21][26][27][28][29]31]. The four CPGs on breast cancer [33-36] provided recommendations on surgery or pharmacotherapy (for example, tamoxifen, raloxifene and aromatase inhibitors) according to risk levels.

Linking treatment effects to baseline risks
In reviewing how CPGs made the link between risk prediction and treatment effects, we found fewer than half of the CPGs (eight out of 20, 40%) explicitly or implicitly stated that they applied evidence of relative risk reductions from RCTs and/or meta-analyses to different absolute risks [10,18,[23][24][25][26][27]32]. For instance, the U.S. Preventive Services Task Force (USPSTF) guideline [25] applied a 32% risk reduction of myocardial infarction (in men) and a 17% risk reduction of strokes (in women) with regular aspirin use to absolute outcome risks and assumed that the effects were constant across risk levels and age categories. One (5%) of the 20 included CPGs [33], instead of applying treatment evidence to all risk levels, used the evidence from RCTs with the same risk profile (high breast cancer risk) population for which the recommendation was made. Eleven (55%) of the included CPGs did not report the way in which they linked risk prediction to treatment effects (Table 2) [19][20][21][22][28][29][30][31][34][35][36].
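The linking step described above can be illustrated with a minimal sketch. It assumes a constant relative risk reduction across baseline risks, which is the assumption the USPSTF guideline made. The function name and the baseline risks in the loop are hypothetical and chosen only for illustration; the 32% relative risk reduction is the figure quoted above for myocardial infarction in men.

```python
def absolute_risk_reduction(baseline_risk: float, relative_risk_reduction: float) -> float:
    """Absolute risk reduction when a relative risk reduction is applied to a baseline risk.

    Assumes the relative effect is the same at every baseline risk.
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a probability between 0 and 1")
    if not 0.0 <= relative_risk_reduction <= 1.0:
        raise ValueError("relative_risk_reduction must be between 0 and 1")
    return baseline_risk * relative_risk_reduction


RRR_MI_MEN = 0.32  # relative risk reduction quoted in the text

# Illustrative 10-year baseline risks (hypothetical values, not taken from any guideline table)
for baseline in (0.05, 0.10, 0.20):
    arr = absolute_risk_reduction(baseline, RRR_MI_MEN)
    print(f"baseline {baseline:.0%}: absolute reduction {arr:.1%}, risk on treatment {baseline - arr:.1%}")
```

Under this assumption the absolute benefit grows in proportion to the baseline risk, which is why the same relative effect can justify treatment at high risk but not at low risk.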

Benefit-harm assessment to define treatment thresholds and consideration of patient preferences
Only a small proportion (two out of 20, 10% [25,35]) of CPGs explicitly stated that they planned to perform benefit and harm assessment as the basis for making risk-stratified treatment recommendations. To define treatment thresholds, only the USPSTF guideline quantitatively weighed treatment benefits and harms by putting the expected benefit and harm outcomes on the same scale (events per 1,000 persons treated over 10 years). The USPSTF guideline recommended using aspirin when the treatment benefits (number of myocardial infarctions or strokes prevented per 1,000 persons treated over 10 years) outweigh the treatment harms (number of gastrointestinal bleedings or hemorrhagic strokes per 1,000 persons treated over 10 years). For example, the expected number of myocardial infarctions prevented by aspirin was estimated to be 16 per 1,000 men of age 60 to 69 years if men had a 10-year risk for myocardial infarction of 5%, while the expected number of excess gastrointestinal bleedings was 24 and hemorrhagic strokes was one. Because the number of excess events exceeded the number of prevented myocardial infarctions, the USPSTF recommended against the use of aspirin in men at 5% risk for myocardial infarction and an age of 60 to 69 years. Based on observational studies, the USPSTF assumed different risks for gastrointestinal bleeding with aspirin according to age. Finally, the USPSTF presented their benefit-harm assessment and the resulting treatment thresholds as a matrix table with categories for age and risk for myocardial infarction defining each cell. Three (15%) of the 20 CPGs qualitatively weighed the treatment benefits and harms [23,29,32]. Nine (45%) of the 20 CPGs made the recommendation on thresholds based on expert consensus or referred to other guidelines [18,19,21,22,[26][27][28]33,33]. Seven (35%) of the 20 CPGs did not report how they determined the treatment thresholds when making recommendations [10,20,24,30,31,34,36]. With regard to involving patient preferences when developing treatment recommendations, only three (15%) of the 20 CPGs explicitly reported that they considered patient preferences in the process (Table 2) [25,30,36].
For example, the USPSTF focused on major benefit (myocardial infarction) and harm events (gastrointestinal bleeding and hemorrhagic stroke) and assumed equal preferences (that is, importance) for those outcomes.

Table 2 Risk-stratified treatment recommendations of the included guidelines (continued; columns include risk categories and methods to develop treatment thresholds).
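The arithmetic behind the USPSTF example above can be sketched as follows. This is an illustration of putting benefits and harms on the same scale (events per 1,000 persons treated over 10 years), not the USPSTF's actual computation. The function names are hypothetical. The 5% baseline risk, the 24 excess gastrointestinal bleedings and the one excess hemorrhagic stroke come from the text, and the 32% relative risk reduction is the figure the guideline reported for myocardial infarction in men; their product happens to reproduce the 16 prevented events reported above.

```python
PERSONS_TREATED = 1000  # scale used in the text: events per 1,000 persons over 10 years


def events_prevented(baseline_risk: float, relative_risk_reduction: float) -> float:
    """Expected benefit events prevented per 1,000 persons treated."""
    return PERSONS_TREATED * baseline_risk * relative_risk_reduction


def net_events(prevented: float, excess_harms: dict[str, float]) -> float:
    """Prevented events minus excess harm events, all outcomes weighted equally."""
    return prevented - sum(excess_harms.values())


# Men aged 60 to 69 years with a 5% 10-year risk of myocardial infarction
prevented = events_prevented(baseline_risk=0.05, relative_risk_reduction=0.32)
harms = {"gastrointestinal bleeding": 24, "hemorrhagic stroke": 1}
net = net_events(prevented, harms)

print(f"prevented: {prevented:.0f}, excess harms: {sum(harms.values())}, net: {net:.0f}")
print("recommend aspirin" if net > 0 else "recommend against aspirin")
```

With equal weights, 16 prevented myocardial infarctions are set against 25 excess harm events, so the net balance is negative and the sketch arrives at the same direction of recommendation as described in the text.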

Discussion
We found that only a small proportion of CPGs for heart disease, cancer, stroke, COPD and diabetes made risk-stratified treatment recommendations using risk assessment tools. Most of these CPGs recommended risk assessment tools that had been shown to accurately predict outcome risk in the target population of the CPGs, and most of the treatment evidence was based on RCTs and meta-analyses. For the majority of the CPGs, however, it was not explicitly explained how treatment effects on benefit and harm outcomes were estimated for patients at different risks. Perhaps most importantly, it was unclear for all but one CPG how treatment thresholds were determined to generate risk-stratified treatment recommendations.
We formed a framework for the development of risk-stratified treatment recommendations (Figure 1) to systematically identify the strengths and weaknesses of current CPGs. Our findings suggest that risk assessment tools were carefully appraised and selected during the development of CPGs. For example, some CPG developers critically appraised validation studies of risk tools to judge their calibration (agreement between predicted and observed risk) and discrimination (probability that those with an event receive higher risk predictions than those without an event) [10,30]. Minimizing misclassification of outcome risks is important to avoid over- or under-treatment [37][38][39]. While some CPGs recommended specific risk assessment tools, one CPG suggested using the risk assessment tool that is most likely to be accurate in the specific population of interest [30]. However, the set of CPGs selected in this study may give an overoptimistic picture of risk assessment tools proposed by guidelines. For many diseases and geographical locations other than the US, Canada and the UK, calibrated and discriminative risk assessment tools may not exist. A strength of existing CPGs is that the majority of them relied on RCTs and meta-analyses of RCTs for intervention effectiveness. The CPG developers recognized limitations within this body of evidence, including insufficient evidence on treatment heterogeneity (that is, subgroup effects) and scarcity of data on harm outcomes. We discovered a number of major limitations in how CPGs develop risk-stratified treatment recommendations. It should be noted that some limitations propagated from a single, prominent CPG (for example, National Cholesterol Education Program) to other CPGs that adopted the approach or even the recommendations. For example, it was often unclear how the benefit and harm outcomes were estimated for different risk profiles. Some CPGs applied estimates of relative risk reduction to absolute risks.
This approach relies on the assumption of constant (relative) effects across the risk spectrum. This assumption of constant relative treatment effects may be justifiable in many instances but it is usually difficult to verify. No alternative approaches for linking the absolute risk with treatment evidence were used. Additional sensitivity analyses may sometimes be appropriate to explore the assumption of constant relative treatment effects. For example, one could obtain risk-specific treatment estimates from large trials using individual patient data [12]. Alternatively, one could employ simulation studies to estimate the probability of outcomes in the population of interest by combining observational data and treatment effects from randomized trials. It is currently unclear what the most appropriate approach is to link risk predictions with evidence from randomized trials. Nevertheless, we believe CPGs should be explicit about the method they use and acknowledge the associated advantages and limitations (for example, assumption of constant relative risk reduction).
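A sensitivity analysis of the kind mentioned above could, in its simplest form, compare the expected benefit under a constant relative effect with the benefit under a relative effect that differs by risk level. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the 32% constant relative risk reduction is borrowed from the aspirin example, and the risk-specific relative risk reductions are invented values, not estimates from any trial.

```python
def prevented_per_1000(baseline_risk: float, relative_risk_reduction: float) -> float:
    """Expected events prevented per 1,000 persons treated."""
    return 1000 * baseline_risk * relative_risk_reduction


def compare_scenarios(constant_rrr: float, risk_specific_rrr: dict[float, float]):
    """Return (risk, prevented under constant RRR, prevented under risk-specific RRR, difference)."""
    rows = []
    for risk, rrr in sorted(risk_specific_rrr.items()):
        constant = prevented_per_1000(risk, constant_rrr)
        varying = prevented_per_1000(risk, rrr)
        rows.append((risk, constant, varying, varying - constant))
    return rows


CONSTANT_RRR = 0.32
# Hypothetical scenario: the relative effect is smaller at low risk and larger at high risk
HYPOTHETICAL_RISK_SPECIFIC_RRR = {0.05: 0.20, 0.10: 0.32, 0.20: 0.40}

for risk, constant, varying, diff in compare_scenarios(CONSTANT_RRR, HYPOTHETICAL_RISK_SPECIFIC_RRR):
    print(f"risk {risk:.0%}: constant {constant:.0f}, risk-specific {varying:.0f}, difference {diff:+.0f}")
```

If a treatment threshold changes between the two scenarios, the recommendation is sensitive to the constant-effect assumption and the guideline would need to say so.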
In our view, the greatest limitation of current CPGs is that it is unclear how treatment thresholds were developed for most of them. Some CPGs stated that the thresholds were determined by experts. The USPSTF guideline on aspirin [25] was the only guideline that conducted a formal quantitative assessment by comparing the expected number of benefit and harm events for patients at different risk for myocardial infarction and major gastrointestinal bleeding. We believe that transparency will be enhanced by conducting quantitative benefit-harm assessments alongside more qualitative approaches, such as using expert consensus about treatment thresholds.
Treatment thresholds are important because medical decision-making is discrete (to treat the patient or not). It is challenging to determine thresholds because clear cut-offs on the (commonly) continuous benefit-harm scale may not exist. In addition, there may often be substantial uncertainty about harms and heterogeneity of treatment effects as a consequence of poor reporting or a lack of evidence from primary studies. However, this should, in our view, not prevent CPG developers from making risk-stratified recommendations, because health care providers still need evidence-based guidance and because variability in delivering health care may be unacceptably high in the absence of guidance. Quanstrum and Hayward [40] recently suggested an approach that acknowledges uncertainty about treatment decision thresholds and proposed two thresholds instead of one: one above which physicians should recommend treatments (benefits outweighing harms irrespective of patient preferences and uncertainties about evidence base) and one below which physicians should recommend against treatments (harms outweighing benefits). The interval between the two thresholds represents an area where treatment could provide small benefits or harms depending on patient preferences but also where uncertainty about the evidence precludes CPG developers from making recommendations. Alternatively, CPG developers could frame strong recommendations for or against treatment for patients at outcome risks above or below the two thresholds, respectively, and weak recommendations for patients at outcome risks between the two thresholds [41].
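The two-threshold idea can be expressed as a simple decision rule. The sketch below is a hypothetical illustration, not an implementation from Quanstrum and Hayward: the class name, the threshold values (5% and 15%) and the treatment of risks that fall exactly on a threshold are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoThresholdRule:
    """Decision rule with a lower and an upper outcome-risk threshold."""

    lower: float  # below this risk: recommend against treatment
    upper: float  # above this risk: recommend treatment

    def __post_init__(self) -> None:
        if not 0.0 <= self.lower <= self.upper <= 1.0:
            raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")

    def recommend(self, outcome_risk: float) -> str:
        if outcome_risk > self.upper:
            return "recommend treatment"
        if outcome_risk < self.lower:
            return "recommend against treatment"
        # Assumption: risks exactly on a threshold fall into the uncertain interval
        return "preference-sensitive: decide with the patient"


# Hypothetical thresholds for a 10-year outcome risk
rule = TwoThresholdRule(lower=0.05, upper=0.15)
for risk in (0.03, 0.10, 0.25):
    print(f"risk {risk:.0%}: {rule.recommend(risk)}")
```

The same structure would accommodate the alternative framing, with strong recommendations outside the interval and a weak recommendation inside it.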
One may criticize the approach used by the USPSTF, assigning equal weight to benefit and harm outcomes to calculate events expected per 1,000 people treated over 10 years, because empirical evidence suggests that patients, on average, assign different importance to myocardial infarction, major gastrointestinal bleeding and major stroke, the major drivers of the benefit-harm balance of aspirin [42]. Nevertheless, such transparency about the relative importance of outcomes comes with several important advantages. Users of CPGs can understand and replicate how the treatment thresholds were derived and, if they do not agree with certain assumptions (for example, equal importance of myocardial infarction and major gastrointestinal bleeding), they can adjust the result to derive thresholds that would suit their settings (for example, myocardial infarction considered twice as important as major gastrointestinal bleeding). This would also allow the guideline to be interpreted for an individual patient, who may weigh the various outcomes differently than those preferences assumed in the CPG.
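How a user might adjust the result for different outcome preferences can be sketched with a weighted net benefit. The event counts (16 myocardial infarctions prevented, 24 excess gastrointestinal bleedings and one excess hemorrhagic stroke per 1,000 men treated) are the USPSTF figures reported in the Results for men aged 60 to 69 years at 5% risk. The function name and the weighting scheme are assumptions made for illustration, and the weight of 2 for myocardial infarction mirrors the example in the text rather than any elicited preference.

```python
def weighted_net_benefit(benefits: dict[str, float],
                         harms: dict[str, float],
                         weights: dict[str, float]) -> float:
    """Weighted events prevented minus weighted excess harm events per 1,000 treated."""
    gained = sum(weights[outcome] * count for outcome, count in benefits.items())
    lost = sum(weights[outcome] * count for outcome, count in harms.items())
    return gained - lost


benefits = {"myocardial infarction": 16}
harms = {"gastrointestinal bleeding": 24, "hemorrhagic stroke": 1}

# Equal importance of all outcomes, as assumed by the USPSTF
equal_weights = {"myocardial infarction": 1.0,
                 "gastrointestinal bleeding": 1.0,
                 "hemorrhagic stroke": 1.0}
# Myocardial infarction considered twice as important as the other outcomes
mi_twice_as_important = {**equal_weights, "myocardial infarction": 2.0}

print(weighted_net_benefit(benefits, harms, equal_weights))          # -9.0
print(weighted_net_benefit(benefits, harms, mi_twice_as_important))  # 7.0
```

In this example the sign of the net benefit changes with the weights, which shows why stating the assumed importance of each outcome matters for anyone trying to reproduce or adapt a threshold.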
The framework for developing risk-stratified treatment recommendations that we proposed may be useful for those developing CPGs and may stimulate further research. While much research has been done on how to select and appraise evidence on treatment benefits and harms [43,44] and how to judge the validity of prediction models [37][38][39], it is less clear how to link risk prediction and treatment evidence, how to select a method for benefit-harm assessment to develop treatment thresholds, and how to include patient preferences. It would be useful to have empirical evidence on how the results of different approaches for linking risk prediction and treatment evidence and for defining treatment thresholds differ and how sensitive they are to assumptions [45]. As for patient preferences, little research has been done to find ways to include stakeholders in the process of selecting important outcomes, or a benefit-harm assessment method that provides the information patients need in order to make decisions [46][47][48]. The newly founded Patient-Centered Outcomes Research Institute is likely to contribute substantially to the questions raised.
Our study has some weaknesses. We selected guidelines from five major disease categories and from one database and focused on CPGs from the US, Canada and NICE (UK). Thus, our results may not be generalizable and are likely to provide an optimistic assessment of CPGs, because we included some of the most prominent guidelines in medicine. For the fields of cardiovascular medicine and diabetes, guideline developers have a long tradition of making risk-stratified treatment recommendations. We relied on published reports, which may not reflect the true underlying development process for CPGs. We considered all background documents that were openly accessible but we may have missed some information on the development of risk-stratified treatment recommendations.

Conclusions
We found that the methods for linking risk prediction with treatment evidence are often not reported and that it was unclear for all but one CPG how treatment thresholds were developed. Therefore, current CPGs for major chronic diseases may not support patients and physicians in finding an acceptable benefit-harm balance that reflects profile-specific outcome risks and preferences.