
Defining responders to therapies by a statistical modeling approach applied to randomized clinical trial data



Personalized medicine is the tailoring of treatment to the individual characteristics of patients. Once a treatment has been tested in a clinical trial and its overall effect quantified, it would be of great value to be able to use patients’ baseline characteristics to identify patients with larger or smaller benefits from treatment, for a more personalized approach to therapy.


We apply here a previously published statistical method, aimed at identifying patient profiles associated with larger treatment benefits, to three identical randomized clinical trials in multiple sclerosis testing laquinimod vs placebo (ALLEGRO, BRAVO, and CONCERTO). On the ALLEGRO dataset, we identified patient-specific linear combinations of baseline variables predicting heterogeneous response to treatment on disability progression. We then chose the best score on the BRAVO dataset, based on its ability to identify responders to treatment in that dataset. Finally, we obtained an external validation on the CONCERTO dataset, testing on this new dataset the performance of the score in defining responders and non-responders.


The best response score defined on the ALLEGRO and BRAVO datasets was a linear combination of age, sex, previous relapses, brain volume, and MRI lesion activity. Splitting patients into responders and non-responders according to the score distribution, in the ALLEGRO the hazard ratio (HR) for disability progression of laquinimod vs placebo was 0.38 for responders and 1.31 for non-responders (interaction p = 0.0007). Results were similar in the BRAVO: HR = 0.40 for responders and HR = 1.24 for non-responders (interaction p = 0.006). These findings were successfully replicated in the CONCERTO study, with HR = 0.44 for responders and HR = 1.08 for non-responders (interaction p = 0.033).


This study demonstrates the possibility of refining and personalizing the treatment effect estimated in randomized studies by using the baseline demographic and clinical characteristics of the included patients. The method can be applied to any randomized trial in any medical condition to create a treatment-specific score associated with different levels of response to the treatment tested in the trial. This is an easy and affordable route toward therapy personalization, indicating patient profiles related to a larger benefit from a specific drug, which may have implications for making clinical decisions in everyday clinical practice.



Background

The results of clinical trials comparing a new treatment with a control via a randomized procedure are based on an overall summary measure over the whole population enrolled. Assessing the individual patient response to a new treatment is not possible, since each study subject is assigned to receive either the new treatment or the control, but not both; that is, the treatment effect is not observable at the individual level. The treatment efficacy is expressed by a single number (the treatment effect size) that is assumed to apply to each treated patient, but this treatment effect can vary according to specific characteristics of the patients enrolled. The aim of personalized medicine is the tailoring of medical treatment to the individual characteristics of each patient in order to optimize individual outcomes. The key issue for personalized medicine is finding criteria for an early identification of patients who will and will not respond to each therapy. In this study, we present a powerful statistical modeling approach previously developed by Zhao et al. [1], aimed at identifying subgroups of patients with the largest benefit from a therapy tested in a randomized clinical trial, by a multivariate post hoc analysis of baseline patients’ characteristics.

The standard analytical approach to characterizing patients with the largest benefits from a treatment is based on post hoc subgroup analyses, which test the treatment effect across one or more dichotomized baseline variables. This procedure, however, may not be efficient, especially when the number of baseline variables is large. Further, findings from post hoc subgroup analyses need to be validated in independent studies.

Various novel quantitative methods have been proposed to deal with the identification of factors associated with heterogeneous treatment effects, both in cases of single covariates [2,3,4,5] and in those of multiple covariates [6]. In 2013, Zhao et al. [1] proposed a simple modeling approach to build a continuous score made up of multiple baseline covariates from a randomized clinical trial to efficiently identify patients with different levels of treatment benefit. This approach has some advantages: it is very simple from the computational point of view, since it is based on the difference between two standard multivariate prognostic models (one fitted on the control arm and one on the experimental arm); it generates a linear score made of baseline variables, which can be calculated in a post hoc analysis at no additional cost; and when large clinical trials, or, even better, multiple trials testing the same drug, are available, it is possible to replicate the results in order to validate them. The method’s original application was to improve enrichment strategies for future randomized controlled trials. The aim of this paper is to present the application of such a methodology to build a treatment-specific algorithm for potential use in making treatment decisions in clinical practice.

We used here, as a working example, three large clinical trials in multiple sclerosis (MS) that failed to bring the tested drug to approval. This is an ideal scenario, since having three identical trials allows for a rigorous training–validation–external validation procedure. The approach proposed here can be easily extended to the post hoc analysis of any trial in any medical field. This methodology, without additional cost beyond that of the trials themselves, indicates the characteristics of individual subjects with higher or lower benefits from any specific treatment, providing patients and clinicians with insights into personalized treatment options.



Methods

We used for the present analysis three large randomized clinical trials in relapsing–remitting (RR) MS that tested the efficacy of laquinimod vs placebo. Laquinimod is an orally available carboxamide derivative developed for RRMS. Its mechanism of action may comprise immunomodulatory effects on T cells, monocytes, and dendritic cells, as well as neuroprotective effects with prominent actions on astrocytes. Laquinimod was tested in phase II and III clinical trials in RRMS at dosages initially ranging from 0.1 to 0.6 mg/day. The compound was well tolerated, yet the dosages tested led only to moderate reductions of relapse rates, the primary study endpoint, in two phase III trials (ALLEGRO [7] and BRAVO [8]). In contrast, significant effects on brain volume and disease progression were observed. The Committee for Medicinal Products for Human Use (CHMP) refused marketing authorization for RRMS based on the assessment of the risk–benefit ratio with regard to data from mechanistic and animal studies. A third clinical trial (CONCERTO [9]) was run with disability progression as the primary endpoint. The trial was negative on this endpoint, and the laquinimod program for RRMS was not continued. The analysis presented here is therefore focused on showing the power of the methodology to identify responders to a therapy, taking advantage of the ideal setting of three large clinical trials, and showing that even a drug whose efficacy was not confirmed in the overall population can give large benefits to a specific subgroup of patients.

Patients and study design

This is a post hoc analysis of three randomized clinical trials, the ALLEGRO, BRAVO, and CONCERTO studies (ClinicalTrials.gov identifiers: NCT00509145, NCT00605215, and NCT01707992, respectively). The study design and inclusion/exclusion criteria were the same for the three studies and have been described elsewhere [7,8,9]. Briefly, eligibility criteria included age 18–55 years, diagnosis of RRMS (revised McDonald criteria [10]), Expanded Disability Status Scale (EDSS) scores of 0–5.5, and presence of relapse activity in the previous 12–24 months.

Primary endpoint

The primary endpoint of this analysis is disability progression as defined in the three trials. A progression event was defined as a 1.0-point increase in the EDSS score if the baseline score was between 0 and 5.0, or a 0.5-point increase if the baseline score was 5.5, sustained for 3 months.

Statistical analysis

According to the accepted methodology of building and validating a prediction rule, a training set (model creation), a validation set (model performance evaluation and refinement toward finalization), and an external validation set not utilized in the model creation (final model validation) are needed [11].

Therefore, in this study, the model predicting response to therapy was developed in three steps:

  1. A set of scores was created on the training set (trial 1 = ALLEGRO).

  2. The best score was chosen according to its performance on the validation set (trial 2 = BRAVO).

  3. The score was externally validated on the independent external validation set (trial 3 = CONCERTO).

Building the score on the training set

The score building followed the modeling approach described by Zhao et al. [1].

For each subject i, we observe the four values (Ti, Yi, Ai, Zi), where Ti is the time to disability progression, Yi is the progression event (yes/no), Ai is the treatment arm (here a binary variable: PL = placebo and AT = active treatment), and Zi is the covariate vector (Z1i, Z2i, …, Zpi) made of the values of all the baseline characteristics. By fitting two separate Cox models, one for each arm, we obtain the following models:

$$ \log \left({h}_{\mathrm{PL}}(t)\right)=\log \left({h}_{0\mathrm{PL}}(t)\right)+\left(\ {\beta}_{1\mathrm{PL}}\bullet {Z}_1+{\beta}_{2\mathrm{PL}}\bullet {Z}_2+\cdots +{\beta}_{p\mathrm{PL}}\bullet {Z}_p\right) $$
$$ \log \left({h}_{\mathrm{AT}}(t)\right)=\log \left({h}_{0\mathrm{AT}}(t)\right)+\left({\beta}_{1\mathrm{AT}}\bullet {Z}_1+{\beta}_{2\mathrm{AT}}\bullet {Z}_2+\cdots +{\beta}_{p\mathrm{AT}}\bullet {Z}_p\right) $$

where h(t) is the hazard function.

Assuming a common baseline hazard function h0(t) = h0AT(t) = h0PL(t), due to the randomized nature of the two groups, it is possible to calculate a difference score:

$$ D(Z)=\log (HR)=\left({\beta}_{1\mathrm{AT}}-{\beta}_{1\mathrm{PL}}\right)\bullet {Z}_1+\left({\beta}_{2\mathrm{AT}}-{\beta}_{2\mathrm{PL}}\right)\bullet {Z}_2+\cdots +\left({\beta}_{p\mathrm{AT}}-{\beta}_{p\mathrm{PL}}\right)\bullet {Z}_p $$

The difference score D(Z) is the patient-specific score (Z representing the set of patient-specific covariates), predicting the size of the treatment effect for a patient with that baseline profile.
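Under the common baseline hazard assumption, D(Z) is simply the dot product of the coefficient differences with a patient's covariate vector. A minimal sketch in Python (the coefficient and covariate values are illustrative, not the trial estimates):

```python
import numpy as np

def difference_score(beta_at, beta_pl, z):
    """Patient-specific log hazard ratio D(Z) = (beta_AT - beta_PL) . Z,
    where beta_AT and beta_PL are Cox coefficients fitted separately on
    the active and placebo arms, and z is one patient's covariate vector."""
    return float(np.dot(np.asarray(beta_at) - np.asarray(beta_pl), z))

# Illustrative coefficients for three covariates (not the trial estimates)
beta_at = [0.10, -0.50, 0.02]   # fitted on the active arm
beta_pl = [0.30, 0.15, 0.01]    # fitted on the placebo arm
z_patient = [1.0, 2.0, 140.0]   # e.g. sex, relapses, a volume measure

d = difference_score(beta_at, beta_pl, z_patient)
# d < 0 predicts a lower hazard on treatment for this baseline profile
```

In a real analysis the two coefficient vectors would come from the arm-specific Cox fits described above; the sketch only shows the algebra of the score.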

To build the response score, we selected the baseline variables collected in all three trials:

  • Age (continuous OR in 3 groups with cut-offs: < 20, 20–50, 50+)

  • Disease duration (continuous, log transformed)

  • Sex (binary)

  • EDSS (continuous OR binary with cut-off 4)

  • Relapses in the previous year (continuous)

  • T2 lesion volume (continuous, log transformed)

  • T1 lesion volume (continuous, log transformed)

  • Normalized brain volume (NBV) (continuous)

  • Presence of gadolinium-enhancing (Gd+) lesions (binary)

The model building described so far was detailed in the Zhao et al. [1] paper. We expanded the proposed methodology by adding a strategy for best model selection, based on an unsupervised approach. We generated all the possible combinations of the 9 baseline variables, building 511 models (2^9 − 1) on the placebo and on the treated group, according to the procedure described above. This number was multiplied by 4 to account for the alternative codings of age and EDSS, for a total of 2044 response scores.

Then, we ranked these scores (score_j, j = 1, …, 2044) for their ability to identify patients with heterogeneous response to treatment, by ranking the p values of the treatment by score_j interaction in Cox models including treatment, score_j, and their interaction. We selected all the scores giving a p value for interaction with treatment < 0.05; this cut-off was arbitrarily defined to screen the scores. This procedure indicated a first set of candidate response models on the training set.
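The combinatorics of this screening step can be sketched as follows; the covariate names and coding labels mirror the list above, while the interaction tests themselves (Cox fits) are omitted:

```python
from itertools import combinations

covariates = ["age", "disease_duration", "sex", "edss", "relapses",
              "t2_volume", "t1_volume", "nbv", "gd_plus"]

# All non-empty subsets of the 9 baseline variables: 2^9 - 1 = 511
subsets = [s for r in range(1, len(covariates) + 1)
           for s in combinations(covariates, r)]

# Two codings for age times two codings for EDSS give 4 variants per
# subset, matching the paper's count of 511 * 4 = 2044 candidate scores
codings = [(age, edss) for age in ("continuous", "3_groups")
           for edss in ("continuous", "binary_cutoff_4")]
candidate_models = [(s, c) for s in subsets for c in codings]

n_candidates = len(candidate_models)   # 2044
```

Each candidate would then be fitted on both arms and screened by its treatment-by-score interaction p value.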

Analysis on the validation set

The n models with a p value for interaction < 0.05 on the training set were selected and tested on the validation set. The models were ranked in ascending order according to the p values for treatment by score interaction on the validation set. The first five models (that is, the models with rank 1 to 5) on the validation set were selected as the candidate models to predict response to therapy. To choose one model among these five scores, we merged the training and the validation sets, re-estimating the five models’ coefficients. The score with the lowest p value for treatment by score interaction on the merged dataset was chosen as the final model. This procedure is based on the performance on the validation set of the models defined on the training set; the final model is the one, among the best 5 on the validation set, with the best performance on the integrated training–validation set. To evaluate the consistency of the model choice, the area under the AD(q) curve [1] was calculated.
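The two-stage selection logic can be sketched with hypothetical model identifiers and p values (all values below are placeholders; a real implementation would obtain each p value from a Cox interaction test):

```python
# Hypothetical interaction p values for the screened models on the
# validation set (in practice, one per candidate score from a Cox fit)
validation_p = {"m1": 0.004, "m2": 0.012, "m3": 0.020, "m4": 0.031,
                "m5": 0.044, "m6": 0.048, "m7": 0.250}

# Step 1: keep the five best-ranked models on the validation set
top5 = sorted(validation_p, key=validation_p.get)[:5]

# Step 2: re-estimate the five models on the merged training+validation
# data and pick the lowest interaction p there (values again hypothetical)
merged_p = {"m1": 0.0009, "m2": 0.00027, "m3": 0.0031,
            "m4": 0.0102, "m5": 0.0084}
final_model = min(top5, key=merged_p.get)
```

The point of the second stage is that the final coefficients are estimated on the larger merged dataset, while the candidate shortlist is fixed by the validation-set ranking.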

Zhao et al. [1] defined the AD(c) curve as the curve reporting the average treatment effect (that is, the laquinimod vs placebo hazard ratio (HR)) in the subgroup of patients with difference score D(Z) ≤ c, where c ranges over the response score values in ascending order [1]. In other words, the AD(c) curve reports the treatment effect in the subgroup of patients with score ≤ c, for each c. To compare two different scoring systems D1 and D2, the AD(c) curve must be transformed into the AD(q) curve, where q represents the proportion of patients with score ≤ c [1]. The area under the AD(q) curve would be equal to the overall HR in case of a perfectly uniform treatment effect in every subgroup of patients (defined as the q% of the overall population with the lowest scores). The lower the area under the AD(q) curve, the higher the heterogeneity of the treatment effect across different levels of the score. Technical details about the definition and the properties of the AD(q) curve are reported in the cited paper [1].
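A self-contained sketch of the AD(q) construction follows; since fitting a Cox model is out of scope here, the hypothetical `crude_hr` uses a simple ratio of event rates per person-time as a stand-in for the subgroup HR:

```python
import numpy as np

def crude_hr(time, event, arm):
    """Stand-in for a Cox-model HR: ratio of event rates per person-time
    in the active (arm == 1) vs placebo (arm == 0) group."""
    rate_at = event[arm == 1].sum() / time[arm == 1].sum()
    rate_pl = event[arm == 0].sum() / time[arm == 0].sum()
    return rate_at / rate_pl

def ad_q_curve(score, time, event, arm, n_points=20):
    """AD(q): treatment effect among the fraction q of patients with the
    lowest difference score, for a grid of q values."""
    order = np.argsort(score)
    n = len(score)
    qs = np.linspace(0.2, 1.0, n_points)   # start at 20% for stability
    hrs = []
    for q in qs:
        idx = order[: int(round(q * n))]
        hrs.append(crude_hr(time[idx], event[idx], arm[idx]))
    return qs, np.array(hrs)

# Synthetic example: under a homogeneous effect, the area under AD(q)
# should sit near the overall HR (area via the trapezoidal rule)
rng = np.random.default_rng(0)
n = 400
arm = rng.integers(0, 2, n)
score = rng.normal(size=n)
time = rng.exponential(1.0, n)
event = (rng.random(n) < 0.6).astype(int)
qs, hrs = ad_q_curve(score, time, event, arm)
area = float(((hrs[1:] + hrs[:-1]) / 2 * np.diff(qs)).sum() / (qs[-1] - qs[0]))
```

In the paper's setting, each point of the curve comes from a Cox fit on the nested subgroup; the crude rate ratio above only keeps the sketch dependency-free.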

Cut-off choice

Once the best model was selected, we chose a cut-off to visualize the treatment effect in patients defined as responders and non-responders. The cut-off was chosen by visual inspection of the AD(q) curve of the selected model, as the point of change of slope. The shape of the distribution of the score was also evaluated: this distribution represents the predicted distribution of responses to the drug according to the baseline profiles of all the patients. The shape and the width of the distribution can give an indication of the heterogeneity of response to the assessed drug and can help identify a possible cut-off point separating responders from non-responders. Once a range of candidate cut-offs was identified by visual inspection of the AD(q) curve and of the score distribution, a systematic procedure testing all the possible cut-off values in the identified range was run, choosing the one dividing the cohort into the two subgroups with the lowest p value for treatment by subgroup interaction on the training set.
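The systematic cut-off search can be sketched as below. As a self-contained stand-in for the interaction p value, the hypothetical criterion maximizes the separation between the two subgroups' log hazard ratios, again using a crude person-time event-rate ratio in place of a Cox fit:

```python
import numpy as np

def crude_hr(time, event, arm):
    # Stand-in for a Cox HR: event rate per person-time, active vs placebo
    rate_at = event[arm == 1].sum() / time[arm == 1].sum()
    rate_pl = event[arm == 0].sum() / time[arm == 0].sum()
    return rate_at / rate_pl

def best_cutoff(score, time, event, arm, lo, hi, step=0.05):
    """Scan candidate cut-offs in [lo, hi]; keep the one giving the
    largest separation of treatment effect between the two subgroups
    (a proxy for the lowest treatment-by-subgroup interaction p value)."""
    best_c, best_sep = None, -np.inf
    for c in np.arange(lo, hi, step):
        below, above = score <= c, score > c
        if below.sum() < 30 or above.sum() < 30:   # keep subgroups usable
            continue
        sep = abs(np.log(crude_hr(time[below], event[below], arm[below]))
                  - np.log(crude_hr(time[above], event[above], arm[above])))
        if sep > best_sep:
            best_c, best_sep = float(c), sep
    return best_c

# Synthetic data in which only patients with score < 0 benefit from
# treatment, so the search is expected to land somewhere near zero
rng = np.random.default_rng(1)
n = 600
arm = rng.integers(0, 2, n)
score = rng.normal(size=n)
p_event = np.where((score < 0) & (arm == 1), 0.2, 0.5)
event = (rng.random(n) < p_event).astype(int)
time = rng.exponential(1.0, n)
cut = best_cutoff(score, time, event, arm, -1.0, 1.0)
```

The minimum-subgroup-size guard mirrors the practical need to keep both subgroups large enough for the interaction test to be meaningful.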

Final external validation

The score calculated on the training/validation set was then tested on the external dataset. The p value for the score by treatment interaction (indicating whether the treatment has a different efficacy according to different values of the score) was calculated. The dichotomized score (according to the previously defined cut-off) was tested, and HR and 95% confidence intervals (95% CI) in the non-responders and in the responders group were evaluated and compared. A final evaluation of the differential treatment effect in responders vs non-responders on all the three datasets was carried out by a test for homogeneity of treatment effects in each subgroup across the three studies.


Results

Summary of overall clinical trial results

Two arms were included from all studies: placebo and laquinimod 0.6 mg daily. Complete clinical and demographic baseline data were available for 1101/1106 (99.6%) patients from the ALLEGRO study, 881/884 (99.7%) from the BRAVO study, and 1456/1467 (99.3%) from the CONCERTO study, for a total sample of 3438 patients. Table 1 reports the baseline characteristics of the patients included in this analysis.

Table 1 Baseline demographic and clinical characteristics of the population enrolled in each of the three clinical trials included in the analysis

Overall, laquinimod reduced the disability progression as compared to placebo by 36% (HR = 0.64, 95% CI 0.45–0.91, p = 0.01) in the ALLEGRO study, by 31% (HR = 0.69, 95% CI 0.46–1.02, p = 0.06) in the BRAVO study, and by 6% in the CONCERTO study (HR = 0.94, 95% CI 0.65–1.35, p = 0.72).

Score building on the training set and score selection on the validation set

The full procedure for building and validating the response score is described in Fig. 1.

Fig. 1

Flow chart of the process to define the response score using the training (ALLEGRO trial), the validation (BRAVO trial), and the external validation (CONCERTO trial) approach

All 2044 response scores from all possible combinations of the selected baseline variables were generated and tested on the training set (ALLEGRO trial). The scores were assessed according to the methodology described in the “Methods” section with respect to their ability to discriminate different levels of response to treatment. Of the 2044 models generated, 906 had a p value for score by treatment interaction < 0.05 on the training set. Of these, 302 had a p value for interaction < 0.05 on the validation set (BRAVO). The first five models, ranked according to the p value for interaction on the validation set, are reported in Table 2 (the list of all 2044 response scores is available upon request).

Table 2 List of the five best models obtained defining the models on the training set and testing them on the validation set, ordered according to the p value for treatment by score interaction on the merged training and validation datasets

As the five models showed equivalent discriminant ability on the validation set, they were re-fitted on the merged training–validation set. The five resulting scores were then ranked, and the one with the lowest p value for interaction with treatment (p = 0.00027) was chosen.

The final selected response score was the following:

$$ \mathrm{Response}\ \mathrm{score}=-0.38\times \mathrm{age}\ \left(\mathrm{age}<20\ \mathrm{years}=1,\mathrm{age}\ \mathrm{between}\ 20\ \mathrm{and}\ 50=2,\mathrm{age}>50\ \mathrm{years}=3\right)+0.65\times \mathrm{sex}\ \left(\mathrm{male}=1,\mathrm{female}=0\right)+0.39\times \mathrm{relapses}\ \mathrm{in}\ \mathrm{the}\ \mathrm{previous}\ \mathrm{year}\ \left(\mathrm{number}\right)-0.002\times \mathrm{normalized}\ \mathrm{brain}\ \mathrm{volume}\ \left({\mathrm{cm}}^3\right)-0.20\times \mathrm{presence}\ \mathrm{of}\ \mathrm{Gd}+\mathrm{activity}\ \left(\mathrm{yes}=1,\mathrm{no}=0\right) $$

A constant value (k = 3.44) was added to the score in order to obtain a mean response score equal to the overall log HR. The score can then be interpreted as the response to the drug predicted by the model for each patient. The distribution of the response score is reported in Fig. 2.

Fig. 2

Histogram of the response score distribution, representing the distribution of the predicted response to the drug, with the two normal curves giving the best fit to the score distribution

The distribution is not normal but skewed to the right, and is better fitted by a mixture of two normal distributions. The shape of the response score distribution suggests the presence of responders and non-responders to the drug. This was confirmed by the AD(q) curve (Fig. 3), showing a change in the slope of the curve.

Fig. 3

Treatment effect (hazard ratio (HR), on the y axis) in the cumulative percentage of patients ranked by increasing values of the response score (x axis). The HR is estimated on the increasing q% of patients ranked by increasing values of the response score. Therefore, the HR for a q = 0.5 represents the HR in the 50% of subjects with the lower score, while the HR for a q = 1 represents the HR of the whole cohort (100% of the subject enrolled)

The best cut-off was:

$$ \mathrm{cut}-\mathrm{off}=-0.31 $$

We then tested the score with this definition of responders (patients with a response score ≤ −0.31) and non-responders (patients with a response score > −0.31) on the trials’ datasets. The proportion of responders was 61.3% (N = 675) in the ALLEGRO and 58.3% (N = 514) in the BRAVO dataset. The treatment effect in the subgroups defined as responders and non-responders is reported in Table 3.
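Putting together the published coefficients (with the constant k = 3.44) and the −0.31 cut-off, the score for an individual patient can be computed directly; the example patient profile below is hypothetical:

```python
def response_score(age_years, male, relapses_prev_year, nbv_cm3, gd_positive):
    """Final response score using the published coefficients; the
    constant k = 3.44 centers the score on the overall log HR."""
    if age_years < 20:
        age_group = 1
    elif age_years <= 50:
        age_group = 2
    else:
        age_group = 3
    return (-0.38 * age_group
            + 0.65 * (1 if male else 0)
            + 0.39 * relapses_prev_year
            - 0.002 * nbv_cm3
            - 0.20 * (1 if gd_positive else 0)
            + 3.44)

# Hypothetical patient: 55-year-old woman, 1 relapse in the previous
# year, normalized brain volume 1600 cm^3, no Gd+ lesions
s = response_score(55, male=False, relapses_prev_year=1,
                   nbv_cm3=1600, gd_positive=False)
responder = s <= -0.31   # cut-off chosen on the training/validation data
```

This kind of bedside calculation is what makes a linear score attractive for clinical use: only five routinely collected baseline measures are needed.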

Table 3 Response to treatment by treatment response score subgroups in the three clinical trials

In the non-responders group, the observed HR for disability progression of laquinimod vs placebo was 1.31 (95% CI 0.77–2.25) in the ALLEGRO trial and 1.24 (95% CI 0.70–2.19) in the BRAVO trial (Fig. 4a, b). In the responders group, it was 0.38 (95% CI 0.23–0.62) in the ALLEGRO trial and 0.40 (95% CI 0.22–0.73) in the BRAVO trial (Fig. 4d, e).

Fig. 4

Kaplan–Meier survival curves for the cumulative probability to be free from progression on the Expanded Disability Status Scale (EDSS) confirmed at month 3, in responders (panels d-e-f) and non-responders (panels a-b-c), as defined by the response score, in ALLEGRO, BRAVO, and CONCERTO trials

External validation of the response score

These findings were finally checked against the external validation dataset (the CONCERTO study), which was not used in the previous analyses. The response score by treatment interaction was statistically significant (p = 0.033), indicating that also in the CONCERTO the response score was able to identify a heterogeneous response to laquinimod. In the non-responders group (69.7% of the patients), the HR for disability progression of laquinimod vs placebo was 1.08 (95% CI 0.74–1.58), and in the responders group (30.3% of the patients) it was 0.44 (95% CI 0.21–0.94) (Fig. 4c, f).

The baseline characteristics of responders and non-responders in the three trials are reported in Table 4.

Table 4 Baseline demographic and clinical characteristics of responders and non-responders in the three trials


Discussion

When many drugs become available for a disease, new challenges arise for clinicians, who must choose among them. On the other hand, scientific breakthroughs in our understanding of how a person’s unique molecular and genetic profile makes them likely to respond to specific treatments have, for some medical conditions, made it possible to identify patient subgroups who would benefit the most from specific therapies. This is not the case for MS: since the first disease-modifying drug was approved in 1993, the approach to therapy has evolved in the direction of initiating treatment early in the disease course and switching to a different drug in case of breakthrough disease activity or poor tolerability. Few specific biomarkers (related more to safety than to efficacy) are available to guide the treatment choice. It is also well recognized that the MS disease course, as well as the drug response of most patients, is highly heterogeneous. Thus, it is not possible to identify drug responders or non-responders in advance, or even retrospectively.

Post hoc subgroup analyses of clinical trials in MS did not find specific markers of response to the approved drugs: the results of a recent meta-analysis [12] including all the post hoc subgroup analyses of clinical trials in RRMS indicated generic predictors of higher response to immunomodulatory treatments, such as a younger age, a lower EDSS, and higher MRI activity measured by the presence of Gd+ lesions on the baseline scan. In this paper, we report a method to combine baseline variables into a continuous score representing the predicted response to a specific treatment. It is possible, in fact, that the full patient profile, made up of the combination of multiple baseline variables, is more powerful and more informative than any single characteristic for predicting the response to treatments. Interestingly, even though all three factors mentioned above are included in the response score for this drug in MS, age enters in the opposite direction (larger effect for older patients) when combined with the other selected factors.

This analysis of the distribution of response to treatment is able to explain the contrasting results of the three trials. The third clinical trial of laquinimod (the CONCERTO trial) was launched to test the disproportionate effect on disability progression detected both in the ALLEGRO (HR = 0.64, p = 0.01) and in the BRAVO (HR = 0.69, p = 0.06) studies, despite the low activity of the drug on inflammatory markers such as relapses and MRI endpoints. The present analysis highlighted that there is a subgroup of patients (identifiable by the set of baseline variables combined in the response score) with a larger response to laquinimod (about 60% of patients both in ALLEGRO and in BRAVO), with an EDSS progression risk reduction greater than 60%, while there is a subgroup with no effect (about 40% of patients). The proportion of non-responding patients is higher in the CONCERTO study (around 70%), while those with a favorable response score are just 30%, due to differences in baseline characteristics compared to the cohorts enrolled in the previous studies. In the subgroup of responding patients, the EDSS progression risk reduction is of the same magnitude as in the two previous trials. Therefore, this analysis suggests that the CONCERTO trial failed to demonstrate a treatment benefit mainly because the proportion of non-responding patients enrolled in this study was higher.

It is important to note that the model selection was based here just on objective model performance criteria. Other criteria, more focused on biological/medical knowledge or driven by parsimony and simplicity of implementation in clinical practice, could be considered in the final step for model selection.


Conclusions

This study demonstrates the possibility of refining and personalizing the treatment effect estimated in randomized clinical trials by using the baseline demographic and clinical characteristics of the included patients. This is an easy and affordable route toward therapy personalization, suggesting patient profiles related to a larger benefit from a specific drug.

This analysis was conducted on trials of a drug that did not reach the market; therefore, the implications of this study do not concern the specific treatment: the main aim of this analysis is to show the potential of the method and to stimulate similar exercises (with practical implications) on approved therapies. This analysis could in fact be run on clinical trials testing any drug approved for a specific condition, to create a set of drug-specific scores. This effort calls for pharmaceutical companies to allow the re-analysis of clinical trial data and the publication of results that might indicate the patient characteristics of drug responders. Ultimately, the value of personalized treatment approaches should be confirmed by evidence generated by prospective clinical trials incorporating the response scores.

Availability of data and materials

The data that support the findings of this study are available from Teva Pharmaceutical Industries, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Teva Pharmaceutical Industries.


Abbreviations

95% CI: 95% confidence interval

CHMP: Committee for Medicinal Products for Human Use

EDSS: Expanded Disability Status Scale

HR: Hazard ratio

MS: Multiple sclerosis

NBV: Normalized brain volume

RCT: Randomized clinical trial

SD: Standard deviation


References

1. Zhao L, Tian L, Cai T, Claggett B, Wei LJ. Effectively selecting a target population for a future comparative study. J Am Stat Assoc. 2013;108:527–39.

2. Song X, Pepe MS. Evaluating markers for selecting a patient’s treatment. Biometrics. 2004;60(4):874–83.

3. Song X, Zhou XH. Evaluating markers for treatment selection based on survival time. Stat Med. 2011;30(18):2251–64.

4. Janes H, Pepe MS, Bossuyt PM, Barlow WE. Measuring the performance of markers for guiding treatment decisions. Ann Intern Med. 2011;154:253–9.

5. Bonetti M, Gelber RD. A graphical method to assess treatment-covariate interactions using the Cox model on subsets of the data. Stat Med. 2000;19(19):2595–609.

6. Cai T, Tian L, Wong PH, Wei LJ. Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics. 2011;12:270–82.

7. Comi G, Jeffery D, Kappos L, Montalban X, Boyko A, Rocca MA, et al. Placebo-controlled trial of oral laquinimod for multiple sclerosis. N Engl J Med. 2012;366:1000–9.

8. Vollmer TL, Sorensen PS, Selmaj K, Zipp F, Havrdova E, Cohen JA, et al. A randomized placebo-controlled phase III trial of oral laquinimod for multiple sclerosis. J Neurol. 2014;261:773–83.

9. Comi G. CONCERTO: a placebo-controlled trial of oral laquinimod in patients with relapsing-remitting multiple sclerosis. ECTRIMS Online Library; 2017. p. 202596.

10. Polman CH, Reingold SC, Edan G, Filippi M, Hartung HP, Kappos L, et al. Diagnostic criteria for multiple sclerosis: 2005 revisions to the “McDonald Criteria”. Ann Neurol. 2005;58:840–6.

11. Steyerberg EW. Clinical prediction models. A practical approach to development, validation and updating. New York: Springer-Verlag; 2009.

12. Signori A, Schiavetti I, Gallo F, Sormani MP. Subgroups of multiple sclerosis patients with larger treatment benefits: a meta-analysis of randomized trials. Eur J Neurol. 2015;22:960–6.



Acknowledgements

The authors thank Teva Pharmaceutical Industries for sharing data and collaborating on the post hoc analysis of the three clinical trial datasets. The authors thank the reviewers for their valuable comments that helped to improve the manuscript.


Funding

This study received no funding. Teva allowed the re-analysis of the three clinical trial datasets.

Author information




FB managed the database merging, ran the statistical analysis, and drafted the manuscript. LC developed an R script to calculate the response scores and contributed to the manuscript draft. AS and MP interpreted the data and contributed to the manuscript draft. JRS, TL, and APT contributed to the manuscript draft. MPS designed the study, analyzed and interpreted the data, contributed to the manuscript draft, and supervised the study. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Maria Pia Sormani.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

FB and AS received teaching honoraria from Novartis. LC has nothing to disclose. MP receives research support from Novartis and received honoraria from Merck and Novartis. TL and APT are Teva employees. JRS was a Teva employee at the time of data analysis. MPS has received personal compensation for consulting services and for speaking activities from Merck, Teva, Novartis, Roche, Sanofi Genzyme, Medday, GeNeuro, and Biogen.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Bovis, F., Carmisciano, L., Signori, A. et al. Defining responders to therapies by a statistical modeling approach applied to randomized clinical trial data. BMC Med 17, 113 (2019).



  • Multiple sclerosis
  • Personalized treatment effects
  • Personalized medicine
  • Clinical trials