Which placebo to cure depression? A thought-provoking network meta-analysis
© Naudet et al.; licensee BioMed Central Ltd. 2013
Received: 7 February 2013
Accepted: 4 October 2013
Published: 25 October 2013
Antidepressants are often considered to be mere placebos despite the fact that meta-analyses are able to rank them. It follows that it should also be possible to rank different placebos, which are all made of sucrose. To explore this issue, which is rather more epistemological than clinical, we designed an unusual meta-analysis to investigate whether the effects of placebo in one situation are different from the effects of placebo in another situation.
Published and unpublished studies were searched for by three reviewers in Medline, the Cochrane Library, Embase, clinicaltrials.gov and Current Controlled Trials, in bibliographies, and by contacting key organizations. The following studies of first-line treatment for major depressive disorder were used to construct an “evidence network”: 1) placebo-controlled randomized controlled trials (RCTs) of fluoxetine or venlafaxine and 2) head-to-head RCTs of fluoxetine versus venlafaxine.
Two network meta-analyses were run to compare indirectly the response and remission rates of three different placebos: 1) fluoxetine placebo, 2) venlafaxine placebo, and 3) venlafaxine/fluoxetine placebo (that is, placebo compared with both venlafaxine and fluoxetine). Publication bias was assessed using funnel plots and tested statistically.
The three placebos were not significantly different in terms of response or remission. The antidepressant agents were significantly more efficacious than the placebos, and venlafaxine was more efficacious than fluoxetine. The funnel plots, however, showed a major publication bias.
The presence of significant levels of publication bias indicates that we cannot even be certain of the conclusion that sucrose equals sucrose in trials of major depressive disorder. This result should remind clinicians to step back to take a more objective view when interpreting a scientific result. It is of crucial importance for their practice, far more so than ranking antidepressant efficacy.
The history of medicine is closely linked to the history of placebos. Pre-scientific medicine relied on many bizarre and ineffective interventions and on the belief that such treatments were effective. Placebo was first used as a control in 1784 to debunk the healing claims of mesmerism, and it became a standard control in experimental procedures in the second half of the 20th century. Randomized controlled trials (RCTs) against placebo have enabled major progress in modern medicine. Nevertheless, these studies have limitations in terms of external and even internal validity, and the antidepressant literature on major depressive disorder is a striking example: some practitioners and researchers consider that most of the efficacy of antidepressants simply reflects the placebo effect, especially in patients with mild or moderate depressive symptoms [4, 5]. However, many patients are satisfied with these treatments, many clinicians trust and use them, and much of the discussion during medical staff meetings is devoted to choosing the right antidepressant. Recently, Cipriani et al., in a multiple-treatment meta-analysis, ranked 12 new-generation antidepressants to address this question.
This state of affairs raises a fundamental question: if much of the effect of antidepressants is attributable to the placebo effect, and if it is possible to rank antidepressants, then it should also be possible to rank different placebos, which are all made of sucrose. More broadly, it questions whether we can be certain about anything in psychiatry (or, indeed, in medicine) and, in particular, whether the evidence we usually rely on gives us a reasonable degree of certainty about the nature and effectiveness of our practices. We addressed this question, which is more epistemological than clinical, by investigating whether the effects of placebo in one situation differ from the effects of placebo in another. We therefore designed an unusual meta-analysis of aggregated data to examine the apparently incontrovertible fact that sucrose equals sucrose, by comparing the placebos of two well-known antidepressant blockbusters: 1) fluoxetine, one of the first selective serotonin reuptake inhibitors on the market, which has become a reference drug, and 2) venlafaxine, a serotonin-norepinephrine reuptake inhibitor.
We reviewed studies involving adults (age 18 and over) with a diagnosis of major depressive episode (DSM-IV, DSM-IV-TR, DSM-III, DSM-III-R, ICD-10, Feighner criteria, Research Diagnostic Criteria). Studies involving patients with other psychiatric or medical comorbidities were considered only when comorbidity was not an explicit inclusion criterion. Studies involving patients with a diagnosis of anxious depression were also considered.
Studies in which more than 20% of subjects had bipolar disorder were excluded, as were studies involving elderly patients and studies exclusively involving patients with seasonal affective disorder, post-partum depression, postmenopausal depression, atypical depression or dysthymia.
Our primary aim was to compare placebo arms. We focused on three different placebos: 1) fluoxetine placebo (FLUp, placebo compared with fluoxetine), 2) venlafaxine placebo (VENLAFp, placebo compared with venlafaxine), and 3) venlafaxine/fluoxetine placebo (FLU/VENLAFp, placebo compared with both venlafaxine and fluoxetine). In each case, placebo was compared with the corresponding antidepressant(s) given as oral monotherapy in the first-line acute treatment of major depressive disorder.
Response was chosen as the primary outcome and remission as a secondary outcome. These outcomes are the most consistently reported estimates of acute-treatment efficacy. Response was defined as the proportion of patients with a reduction of at least 50% from the baseline score on the Hamilton Depression Rating Scale (HDRS) or the Montgomery-Åsberg Depression Rating Scale (MADRS). Remission was defined as the proportion of patients with an HDRS score ≤7 or a MADRS score ≤12.
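As a minimal sketch of how these two outcome definitions translate into per-patient classifications, the Python functions below apply the 50% reduction rule and the HDRS ≤7 / MADRS ≤12 cut-offs. The function names and example scores are hypothetical, and the sketch ignores which scale a trial designated as primary when it reported both.

```python
# Hypothetical helpers: the 50% reduction rule and the HDRS <= 7 / MADRS <= 12
# remission cut-offs are the definitions given in the text.
REMISSION_CUTOFFS = {"HDRS": 7, "MADRS": 12}


def is_responder(baseline: float, endpoint: float) -> bool:
    """Response: a reduction of at least 50% from the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - endpoint) / baseline >= 0.5


def is_remitter(endpoint: float, scale: str) -> bool:
    """Remission: endpoint score at or below the scale-specific cut-off."""
    return endpoint <= REMISSION_CUTOFFS[scale]


def rate(flags) -> float:
    """Proportion of patients meeting a dichotomous criterion."""
    flags = list(flags)
    return sum(flags) / len(flags)


# Hypothetical patient: HDRS 24 at baseline, 10 at endpoint.
print(is_responder(24, 10))     # True: (24 - 10) / 24 is about 58%
print(is_remitter(10, "HDRS"))  # False: 10 is above the cut-off of 7

# Hypothetical arm of three patients given as (baseline, endpoint) HDRS pairs.
arm = [(24, 10), (26, 20), (30, 14)]
print(rate(is_responder(b, e) for b, e in arm))
```

Note that the two definitions are independent: a patient can respond without remitting, which is why the two rates are analyzed separately.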
When trials reported results from both rating scales, we extracted data from the scale considered in the study as the primary outcome.
We included 1) randomized controlled trials of fluoxetine or venlafaxine versus placebo and 2) head-to-head trials of fluoxetine versus venlafaxine, with or without a placebo arm. All studies were conducted between January 1989 and July 2009. Only study reports in English, French or Spanish were considered.
We used the search strategy of an earlier paper on venlafaxine and fluoxetine to conduct this meta-analysis of aggregated data.
Eligible studies were identified from PubMed/Medline, the Cochrane Library and Embase, including congress abstracts. A three-step search was used for each component of this review. First, an initial search on Medline was carried out to refine the search terms and take account of possible changes in the databases; the search terms were double-checked before the main search started. Second, all identified keywords were used to search all the above-mentioned databases. Third, the reference lists of the articles identified were searched. The initial keywords were: “Depressive Disorder NOT Depression, Postpartum NOT Seasonal Affective Disorder”; “Antidepressive Agents”; “Fluoxetine”; “Venlafaxine”. In addition, previous meta-analyses were searched manually for relevant articles.
Unpublished studies were sought by contacting key organizations, such as the Food and Drug Administration (FDA) and the European Medicines Agency (EMEA), and key researchers in the area. The clinicaltrials.gov and Current Controlled Trials registers were also searched.
Authors of abstracts or meta-analyses were contacted for further information and, when needed, for the references of the studies. If they did not respond to a first request, they were contacted a second time.
Eligibility assessment was performed independently in a blinded standardized manner by two reviewers (NF and MAS). Disagreements were resolved by consensus or in consultation with a third reviewer (FB).
A comparison across studies, checking author names, treatment comparisons, sample sizes and outcomes, was performed to avoid duplicates and the compilation of data from several reports of the same study.
Before inclusion, each paper was assessed for methodological quality using a standardized critical appraisal instrument from the Joanna Briggs Institute.
A data extraction sheet based on the Cochrane Handbook for Systematic Reviews of Interventions (Version 5.0.2, updated September 2009) was used. One review author (NF) extracted the data from the included studies.
The dichotomous outcomes used here are considered robust measures of treatment efficacy. When these outcomes were not reported, the study was excluded; no imputation method was used. Responder and remitter data were extracted as analyzed by the original study investigators, mainly using the LOCF (Last Observation Carried Forward) method.
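The LOCF approach mentioned above can be sketched as follows, assuming visit-level scores with missing visits coded as None (a hypothetical data layout). In this review LOCF was applied by the original investigators, not re-run by the authors; the sketch only shows what the method does and how an early dropout is carried into the endpoint analysis.

```python
from typing import List, Optional, Sequence


def locf(scores: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: replace each missing visit (None)
    with the most recent observed score. Visits before the first observation
    stay missing."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled


# Hypothetical HDRS scores for a patient who drops out after the third visit.
visits = [26, 22, 15, None, None]
completed = locf(visits)
print(completed)  # [26, 22, 15, 15, 15]
endpoint = completed[-1]
print((visits[0] - endpoint) / visits[0] >= 0.5)  # False: 11/26 is about 42%
```

Because the last observed score is frozen, a patient who leaves while still improving is counted as a non-responder, one of the reasons LOCF is regarded as a simple but potentially biased single imputation.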
We used visual inspection of the forest plots and the Q statistic to assess statistical heterogeneity. In the absence of heterogeneity, we performed pairwise meta-analyses of studies comparing the same interventions using a fixed-effect model (Mantel-Haenszel); where heterogeneity was possible, we used a random-effects model (DerSimonian and Laird).
In a second step, to compare the different placebos indirectly, and following the recommendations of Glenny et al., we ran two network meta-analyses [15, 16]. The dependent variables were 1) response and 2) remission; treatment (fluoxetine, venlafaxine, FLUp, VENLAFp or FLU/VENLAFp) was the explanatory variable. Both fixed-effect and random-effects versions of each network meta-analysis were fitted, and the final model was selected using a model fit criterion, Akaïke’s Information Criterion (AIC). Results are reported as odds ratios (ORs) between treatments with their 95% confidence intervals and the statistical significance level of each comparison.
Publication bias was investigated graphically using funnel plots for each fixed-effect meta-analysis. Funnel plot asymmetry was tested using the rank correlation test when at least 10 studies were available.
Analyses were performed in R using the packages meta, rmeta and lme4 (lmer function with family = binomial, that is, a logistic mixed-effects model). Results are reported according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement.
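The pairwise pooling described above can be illustrated with a short, self-contained Python sketch: a Mantel-Haenszel fixed-effect odds ratio, Cochran's Q with inverse-variance weights, and the DerSimonian-Laird random-effects estimate. This is not the R meta or rmeta code used in the study; the 2×2 counts are hypothetical, zero cells are not handled (no continuity correction), and Q would still have to be compared with a chi-square distribution on k − 1 degrees of freedom to obtain a p-value.

```python
import math
from typing import List, Tuple

# One study = 2x2 table (a, b, c, d):
#   a = responders on drug,    b = non-responders on drug,
#   c = responders on placebo, d = non-responders on placebo.
Table = Tuple[int, int, int, int]


def mantel_haenszel_or(studies: List[Table]) -> float:
    """Fixed-effect Mantel-Haenszel pooled odds ratio."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in studies)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in studies)
    return num / den


def log_or_and_var(study: Table) -> Tuple[float, float]:
    """Log odds ratio and its Woolf variance (assumes no zero cells)."""
    a, b, c, d = study
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def cochran_q(studies: List[Table]) -> Tuple[float, int]:
    """Cochran's Q (inverse-variance weights) and its degrees of freedom."""
    ys, vs = zip(*(log_or_and_var(s) for s in studies))
    ws = [1 / v for v in vs]
    pooled = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(ws, ys))
    return q, len(studies) - 1


def dersimonian_laird(studies: List[Table]) -> Tuple[float, float, float, float]:
    """Random-effects pooled OR, its 95% CI, and the DerSimonian-Laird tau^2."""
    ys, vs = zip(*(log_or_and_var(s) for s in studies))
    ws = [1 / v for v in vs]
    q, df = cochran_q(studies)
    c = sum(ws) - sum(w * w for w in ws) / sum(ws)
    # Between-study variance, truncated at zero; none with a single study.
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0
    ws_re = [1 / (v + tau2) for v in vs]
    pooled = sum(w * y for w, y in zip(ws_re, ys)) / sum(ws_re)
    se = math.sqrt(1 / sum(ws_re))
    return (math.exp(pooled), math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se), tau2)


# Hypothetical trials of a drug versus placebo (not data from this review).
studies = [(30, 20, 20, 30), (25, 25, 18, 32), (40, 30, 30, 40)]
print("MH OR:", round(mantel_haenszel_or(studies), 3))
print("Q, df:", cochran_q(studies))
print("DL OR, CI, tau2:", dersimonian_laird(studies))
```

When tau² is truncated at zero, the random-effects weights equal the inverse-variance fixed-effect weights, which is consistent with fixed and random models giving the same results in the absence of detectable heterogeneity.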
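The study fitted its network meta-analyses as logistic mixed models in R (lme4). A simpler technique, the Bucher adjusted indirect comparison, conveys the same core idea: placebos that were never compared head to head can still be linked through the drugs they were compared with. The sketch below uses that swapped-in technique, not the authors' model; the odds ratios and standard errors are hypothetical, and it assumes each link comes from an independent set of trials.

```python
import math
from typing import List, Tuple


def chained_indirect(links: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Combine independent log odds ratios along a path of comparisons.

    Each link is (log_or, se) for 'X vs Y', where the Y of one link is the X
    of the next. Log ORs add along the path (the intermediate treatments
    cancel out) and, because the links come from different sets of trials,
    their variances add too.
    Returns (OR, lower 95% limit, upper 95% limit, two-sided p-value).
    """
    log_or = sum(lo for lo, _ in links)
    se = math.sqrt(sum(s ** 2 for _, s in links))
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return (math.exp(log_or), math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se), p)


# Hypothetical inputs (not results from this study), as (log OR, SE):
#   FLUp vs fluoxetine, fluoxetine vs venlafaxine, venlafaxine vs VENLAFp
links = [(math.log(0.55), 0.12), (math.log(0.85), 0.08), (math.log(1.90), 0.11)]
print(chained_indirect(links))  # OR for FLUp vs VENLAFp, its 95% CI and p-value
```

Each extra link widens the interval, which is one reason indirect comparisons tend to have low power.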
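A minimal sketch of a rank correlation test for funnel plot asymmetry follows, assuming the test meant here is Begg and Mazumdar's and that per-study log odds ratios and their variances are available. It omits the continuity correction some implementations apply, enforces the 10-study minimum stated above, and uses hypothetical data constructed so that smaller studies (larger variances) show larger effects.

```python
import math
from typing import List, Tuple


def begg_test(effects: List[float], variances: List[float]) -> Tuple[float, float, float]:
    """Begg and Mazumdar rank correlation test for funnel plot asymmetry.

    Standardizes each effect against the fixed-effect pooled estimate, then
    computes Kendall's tau between the standardized effects and their
    variances. Returns (tau, z, two-sided p-value); no continuity correction.
    """
    k = len(effects)
    if k < 10:
        raise ValueError("the text applies the test only with at least 10 studies")
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_var = 1 / sum(weights)
    # Variance of (y_i - pooled) is v_i - pooled_var, which is always positive.
    t_star = [(y - pooled) / math.sqrt(v - pooled_var)
              for y, v in zip(effects, variances)]
    concordant = discordant = 0
    for i in range(k):
        for j in range(i + 1, k):
            s = (t_star[i] - t_star[j]) * (variances[i] - variances[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    tau = (concordant - discordant) / (k * (k - 1) / 2)
    z = (concordant - discordant) / math.sqrt(k * (k - 1) * (2 * k + 5) / 18)
    p = math.erfc(abs(z) / math.sqrt(2))
    return tau, z, p


# Hypothetical log ORs that grow with study variance (a "small-study effect").
effects = [0.1 * i for i in range(10)]
variances = [0.01 * (i + 1) for i in range(10)]
print(begg_test(effects, variances))
```

With only about ten studies the test has little power, so a non-significant result does not rule out publication bias, as the discussion notes.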
Summary of study methodology

Characteristic | V vs F vs P (n = 4) | V vs F (n = 8) | V vs P (n = 10) | F vs P (n = 9) | Total (n = 31)
Year of publication | (1999, 2003, 2004, 2009) | (1994, 1997, 1998, 2000, 2007) | (1993, 1996, 1997, 1998, 2004) | (1991, 1998, 2002, 2004, 2005) | (1991, 1997, 1999, 2003, 2009)
Anxious depression (No) | | | | |
Study duration (weeks) | (6, 6, 8, 12) | (6, 8, 9, 12, 12) | (4, 6, 8, 11, 13) | (4, 6, 8, 12, 13) | (4, 6, 8, 12, 13)
Number of follow-up visits | (6, 7, 8, 8) | (6, 6, 7, 8, 9) (NA = 1) | (5, 6, 7, 8, 8) | (3, 8, 9, 9, 9) (NA = 1) | (3, 6, 7, 8, 9) (NA = 2)
Industry sponsorship (yes) | | | | 8 (100%) (NA = 1) | 30 (100%) (NA = 1)
Exclusion of placebo responders (yes) | | | | 6 (75%) (NA = 1) | 25 (83%) (NA = 1)
Outpatients in primary care | | | | |
Type of analysis: ITT with LOCF | | | | |
Initial severity (HDRS score ≥25) | | | | 3 (37.5%) (NA = 1) | 17 (57%) (NA = 1)

V, venlafaxine; F, fluoxetine; P, placebo; ITT, intention to treat.
Using the Q statistic, no significant heterogeneity was detected for fluoxetine versus placebo, venlafaxine versus placebo or fluoxetine versus venlafaxine in the response and remission meta-analyses. Nevertheless, because visual inspection of the forest plots suggested that heterogeneity could not be totally excluded, and because heterogeneity tests are often underpowered, we fitted both fixed-effect and random-effects pairwise models, which gave the same results.
Results of the network meta-analyses (odds ratios between treatments with 95% confidence intervals)
The three placebos considered did not differ significantly in terms of response or remission. Antidepressants were significantly more efficacious than placebos, and venlafaxine was more efficacious than fluoxetine, consistent with previous meta-analyses [7, 24]. Venlafaxine placebo thus appears to be a “me-too” placebo with no greater benefit than fluoxetine placebo or venlafaxine/fluoxetine placebo.
Since no well-powered randomized trial directly comparing these placebos exists (unsurprisingly), indirect comparisons were the only way to compare the three placebos. Indirect evidence is not equivalent to direct evidence, and indirect comparisons have sometimes produced results that conflict with direct evidence. A recent paper on antidepressants illustrated this paradox for citalopram and its “me-too” escitalopram: direct evidence showed that escitalopram was superior, whereas indirect evidence found no significant difference. On the other hand, Song et al. have suggested that in some cases indirect evidence is less biased than direct evidence. Moreover, the validity of an indirect comparison depends on how relative treatment effect modifiers are distributed across the different comparisons. In our study, initial severity (HDRS score ≥25) could be an important effect modifier [4, 5], but despite slight variations its distribution seemed well balanced across the direct comparisons (that is, there was no systematic difference in its distribution between them).
Nevertheless, the results of our indirect comparisons were consistent with the implicit expectation that if the three placebos (all made of the same sucrose) were compared in a double-blind randomized trial, no difference would be observed. These results are good news indeed: 1) for supporters of placebo, because they suggest that the choice of placebo is not really important; 2) for supporters of antidepressants, because antidepressants proved superior to placebos; and 3) for supporters of rationality, because a validated method has not led to invalid conclusions, that is, we have managed to conclude that “sucrose equals sucrose” in the treatment of major depressive disorder, which is wholly reassuring. Yet significant limitations call these findings into question.
The quality of a meta-analysis depends on the quality of the individual studies included. In this respect, the National Institute for Health and Clinical Excellence (NICE) guideline on the treatment and management of depression in adults advises caution when applying results from RCTs and meta-analyses in routine practice. In particular, our model rests on the postulate that all placebo responders would also be antidepressant responders (additive model). This key assumption has never been proved. Indeed, antidepressant response and placebo response could be independent, or at least only partially overlapping, phenomena (non-additive model), with four different types of patients: 1) placebo-only responders, 2) treatment-only responders, 3) placebo and treatment responders, and 4) never responders. Moreover, the classic logic of the randomized controlled trial casts the placebo effect as a negative foil for measuring therapeutic efficacy, and much important information about placebo is not reported in these studies, such as the appearance of the medication (size, shape and color of the pills).
In addition, the indirect comparisons may have had low statistical power. Even though no trend toward statistical significance was observed in the indirect comparisons of placebos, non-significant P-values say little about equivalence.
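One simple way to quantify disagreement between direct and indirect evidence for the same comparison is a z-test on the difference between the two log odds ratios, in the spirit of Bucher et al., as sketched below. It assumes the two estimates come from independent sets of trials; the numbers are hypothetical, and the function is illustrative rather than the method used in the citalopram/escitalopram paper.

```python
import math
from typing import Tuple


def inconsistency_test(direct: Tuple[float, float],
                       indirect: Tuple[float, float]) -> Tuple[float, float, float]:
    """Compare a direct and an indirect estimate of the same log odds ratio.

    Each argument is (log_or, se). Returns (difference in log OR, z,
    two-sided p-value), assuming the two estimates are independent.
    """
    diff = direct[0] - indirect[0]
    se = math.sqrt(direct[1] ** 2 + indirect[1] ** 2)
    z = diff / se
    return diff, z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical numbers: a direct OR of 1.30 and an indirect OR of 1.05.
print(inconsistency_test((math.log(1.30), 0.10), (math.log(1.05), 0.15)))
```

Because the standard error of the difference combines both sources of uncertainty, this test is itself often underpowered: a non-significant result does not establish that direct and indirect evidence agree.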
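The point about additivity can be made concrete with a small arithmetic sketch. Using hypothetical shares for the four patient types listed above, the placebo arm responds at the rate of placebo-only plus "both" responders, and the drug arm at the rate of treatment-only plus "both" responders. An additive population (no placebo-only responders) and a non-additive one can then produce identical arm-level response rates, so a parallel-group trial cannot tell them apart. The class and the shares are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Population:
    """Hypothetical shares of the four patient types described in the text."""
    placebo_only: float
    treatment_only: float
    both: float
    never: float

    def __post_init__(self) -> None:
        total = self.placebo_only + self.treatment_only + self.both + self.never
        if abs(total - 1.0) > 1e-9:
            raise ValueError("shares must sum to 1")

    def placebo_arm_rate(self) -> float:
        # Placebo-only responders and "both" responders respond on placebo.
        return self.placebo_only + self.both

    def drug_arm_rate(self) -> float:
        # Treatment-only and "both" responders respond on the antidepressant.
        return self.treatment_only + self.both


additive = Population(placebo_only=0.0, treatment_only=0.15, both=0.35, never=0.50)
non_additive = Population(placebo_only=0.10, treatment_only=0.25, both=0.25, never=0.40)
for pop in (additive, non_additive):
    diff = pop.drug_arm_rate() - pop.placebo_arm_rate()
    print(round(pop.placebo_arm_rate(), 2), round(pop.drug_arm_rate(), 2), round(diff, 2))
```

Both populations print the same placebo rate (0.35), drug rate (0.5) and difference (0.15), even though the second contains placebo-only responders whom the drug would not help.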
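To illustrate why a non-significant result is not evidence of equivalence, the sketch below classifies an odds-ratio confidence interval against equivalence margins of 0.80 to 1.25. The margins and intervals are assumptions for illustration only; the text does not specify any margin, and a formal two one-sided tests (TOST) procedure would typically use a 90% interval.

```python
def interpret_ci(lower: float, upper: float,
                 margin_low: float = 0.8, margin_high: float = 1.25) -> str:
    """Classify an odds-ratio confidence interval.

    The default 0.80-1.25 equivalence margins are a hypothetical choice for
    illustration, not margins used in this study.
    """
    if upper < 1 or lower > 1:
        return "significant difference"
    if margin_low <= lower and upper <= margin_high:
        return "equivalent within margins"
    return "inconclusive"


# A wide, non-significant interval is not evidence of equivalence.
print(interpret_ci(0.70, 1.40))  # inconclusive
print(interpret_ci(0.90, 1.10))  # equivalent within margins
print(interpret_ci(0.45, 0.75))  # significant difference
```

A wide interval that crosses 1 is compatible both with no difference and with clinically important differences in either direction; only a narrow interval lying entirely within the margins supports equivalence.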
Above all, publication bias and selective outcome reporting bias might account for some of the effects we observed. The funnel plot for the comparison between venlafaxine and fluoxetine shows some asymmetry in favor of venlafaxine. As “true heterogeneity” could not explain this result, and since studies on antidepressants involve substantial conflicts of interest (these drugs generate vast sales revenues), the result is highly open to suspicion of reporting bias. For the comparisons between active antidepressants and placebo, no evidence of publication bias was found, but as statistical tests for asymmetry typically have low power, this bias cannot be excluded. The bias is well known and, for example, led to reboxetine being considered a serious antidepressant, whereas it is probably ineffective and potentially harmful. It has recently been demonstrated that selective reporting of studies in network meta-analyses of antidepressants biases estimates of relative treatment efficacy.
Various barriers were encountered in our quest for exhaustiveness: many trials were carried out in China and published in Chinese journals, and these were therefore not included. However, the quality of many of these studies could be expected to be poor, so excluding them, although it means a loss of randomized evidence, also avoids other major biases.
Moreover, and this is probably the main problem, antidepressant research is completely controlled by the pharmaceutical industry: 1) the companies that sponsored some of the trials we identified refused to communicate the results of these studies; 2) in this meta-analysis, all the studies were sponsored by the pharmaceutical industry. Such studies have been shown to be more likely than independent studies to report positive effects for the sponsor’s drug.
In view of these limitations, a reasonable measure of skepticism should discourage hasty conclusions. These limitations illustrate that every scientific result is uncertain and that it is difficult to be sure of the conclusion of an individual study, however rigorous the method, even when it explores something as patently obvious as “sucrose = sucrose”. Yet although published research findings can be erroneous, Science often generates representations that leave no room for skepticism. This is probably the most insidious pitfall in Evidence-Based Medicine: it does not concern the findings of Science but academics’ understanding of Science (the knowledge-producing activity). The present-day medicalization of modern society implicitly dictates that scientific results should have the status of Truth. Concerning major depressive disorder, although it is likely to be untrue, clinicians and a great number of patients strongly believe that antidepressant drugs target a specific biological state that produces depression, and the pleasing serotonin hypothesis is often taken as gospel.
As for the study by Cipriani et al., it was disputed with arguments similar to those set out in the present study [37, 44–47], and its results were not replicated by Gartlehner et al. Moreover, whereas in our study non-significant P-values say little about equivalence, in overpowered studies such as Cipriani’s, statistically significant differences say little about clinically significant differences. The study is nevertheless mentioned in the NICE guideline, with a kind of double bind: qualitatively, no recommendations for ranking antidepressants are made, but quantitatively, special emphasis is placed on the study (Cipriani’s name is cited 23 times versus 3 times for Gartlehner, tables present its results, and so on). What is more, in day-to-day practice, clinicians generally regard Cipriani’s study as solid evidence for choosing an antidepressant when treating a patient with newly diagnosed depression. Uncertainty is sometimes acknowledged in theory, but not in clinical practice.
This epistemological position echoes the well-known anthropological observation that the hopes and expectations of the physician are just as crucial as those of the patient in the healing process. This suggests that, while the converse may not be true, the best placebos for treating major depressive disorder could be antidepressants, because they are believed to be effective, which is probably an important determinant of the placebo effect.
Nonetheless, we cannot simply assume that, because patients appear to improve on placebo in the short term, as observed in randomized controlled trials, placebos have demonstrated the required cost-benefit balance. When using a treatment that relies on expectation, one must also be careful about its possible harmful consequences, which could be linked to its corollary, disappointment. Here again, this is an unsolved central question in Evidence-Based Medicine.
As in pre-scientific medicine, modern physicians need to believe in the effectiveness of their techniques, and the current medical literature, with its strengths and limitations, appears to be a sophisticated way of generating such beliefs. This also raises the ethical issue of how scientific evidence is disseminated, for example when editors permit reprints for the pharmaceutical industry intended to present results to doctors in a more commercial than epistemological way, as was the case for the Cipriani study.
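The converse problem, statistical significance without clinical relevance, can be shown with a pooled two-proportion z-test on a fixed, hypothetical difference in response rates (50% versus 52%) at two sample sizes. The numbers are illustrative only and do not reproduce Cipriani et al.'s data or methods.

```python
import math


def two_proportion_p(p1: float, p2: float, n_per_arm: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (equal arm sizes)."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = (p2 - p1) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical response rates of 50% vs 52%: the same 2-point difference
# (number needed to treat of about 50) at two sample sizes.
for n in (200, 10_000):
    print(n, round(two_proportion_p(0.50, 0.52, n), 4))
```

The effect is identical in both runs; only the sample size changes, turning a clearly non-significant result into a highly significant one without making the 2-point difference any more clinically meaningful.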
We did not find any superiority of one placebo over the other. However, a critical approach to our results prevents any firm conclusion on this apparently obvious result. This result should remind clinicians to step back to take a more objective view when interpreting a scientific result, keeping in mind that Science can never be actually sure that “sucrose = sucrose” in the treatment of major depressive disorder. It is of crucial importance for their practice, far more so than ranking antidepressant efficacy.
AIC: Akaïke’s information criterion
DSM-III: Diagnostic and statistical manual of mental disorders, 3rd edition
DSM-III-R: Diagnostic and statistical manual of mental disorders, 3rd edition, revised
DSM-IV: Diagnostic and statistical manual of mental disorders, 4th edition
DSM-IV-TR: Diagnostic and statistical manual of mental disorders, 4th edition, text revision
EMEA: European medicines agency
FDA: Food and drug administration
HDRS: Hamilton depression rating scale
ICD-10: International classification of diseases, 10th revision
LOCF: Last observation carried forward
MADRS: Montgomery-Åsberg depression rating scale
NICE: National institute for health and clinical excellence
PRISMA: Preferred reporting items for systematic reviews and meta-analyses
RCTs: Randomized controlled trials
We thank all the authors of the different studies who cooperated, Guillemette Utard for her advice on bibliographic research, Lionel Riou França, Hervé Maisonneuve and Nicolas Naudet for their very interesting comments, Claudine Naudet, and Angela Swaine Verdier for revising the English.
ASM is funded by the Conseil General d'Ile de France (PICRI). This paper was supported by the Institut National de la Santé et de la Recherche Médicale (INSERM). The sponsor had no role concerning the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.