Commentary
Triple P-Positive Parenting programs: the folly of basing social policy on underpowered flawed studies
BMC Medicine, volume 11, Article number: 11 (2013)
Wilson et al. provided a valuable systematic and meta-analytic review of the Triple P-Positive Parenting program in which they identified substantial problems in the quality of the available evidence. Their review largely escaped unscathed after Sanders et al.'s critical commentary. However, both of these sources overlook the most serious problem with the Triple P literature, namely, the over-reliance on positive but substantially underpowered trials. Such trials are particularly susceptible to risk of bias and investigator manipulation of apparent results. We offer a justification for the criterion of no fewer than 35 participants in either the intervention or control group. Applying this criterion, 19 of the 23 trials identified by Wilson et al. were eliminated. A number of these trials were so small that it would be statistically improbable for them to detect an effect even if one were present. We urge that clinicians and policymakers implementing Triple P programs incorporate evaluations to ensure that goals are being met and resources are not being squandered.
Wilson and colleagues provided a great service to behavioral science and public policy by identifying serious limitations in the quality of evidence usually cited uncritically in support of the effectiveness of Triple P-Parenting programs. Using tools that can be readily applied by others, they showed the heavy reliance on self-referred, media-recruited two-parent families; a lack of comparisons between Triple P and alternative active treatments; assessment of outcomes with a battery of measures with little or no evidence of a priori designation of a primary outcome; biased reporting of findings in abstracts; and pervasive conflicts of interest, with the authors of the bulk of the articles receiving royalties and other professional and financial rewards from the promotion of Triple P.
A defensive rejoinder from the promoters of Triple P, Sanders and colleagues, was disappointingly unresponsive, particularly in light of the extravagant claims currently being made for empirical support on Triple P websites [3, 4] and in promotional material distributed around the world. Some of the risks of bias in the Triple P literature identified by Wilson and colleagues are indeed endemic to the literature evaluating psychosocial interventions, but that does not excuse continuing promotion of Triple P without explicit acknowledgment of the limitations of the quantity and quality of the available evidence. Sanders and colleagues' rejoinder underscores the problems of relying on developers and promoters of interventions to remain objective in evaluating their programs and in receiving criticism.
Yet both the review by Wilson and colleagues and the response from Sanders and colleagues overlook a fundamental weakness in the Triple P literature, one that amplifies the other sources of bias: evidence already limited by studies at high risk of bias is further compromised by a preponderance of underpowered studies yielding positive results at a statistically improbable rate.
Wilson and colleagues noted several times in their review that many of the trials are small, but they did not dwell on how many, how small, or with what implications. We have adopted a lower limit of 35 participants in the smallest group for inclusion of trials in meta-analyses. The rationale is that any trial smaller than this does not have a 50% probability of detecting a moderate-sized effect, even if one is present. Moreover, small trials are subject to publication bias: if results are not claimed to be statistically significant, they will not get published, with the justification that the trial was insufficiently powered to obtain a significant effect. On the other hand, when significant results are obtained, they are greeted with great enthusiasm precisely because the trials are so small. Small trials, when combined with flexible rules for deciding when to stop a trial (often based on a peek at the data), failure to specify primary outcomes ahead of time, and flexible rules for analyses, can usually be made to yield positive findings that will not replicate. Small studies are vulnerable to outliers and sampling error, and randomization does not necessarily equalize group differences that can prove crucial in determining results. Combining published small trials in a meta-analysis does not overcome these problems, because of publication bias and because all or many of the trials share methodological and reporting problems.
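The 35-per-group threshold can be checked directly. The sketch below (Python with SciPy; the function name and the illustrative group sizes are ours, not drawn from any cited source) computes the power of a two-sided, two-sample t-test for a moderate effect size (Cohen's d = 0.5) via the noncentral t distribution:

```python
import math
from scipy import stats

def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test with equal group sizes,
    computed from the noncentral t distribution."""
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)      # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # probability that |t| exceeds the critical value under the alternative
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

# For a moderate effect (d = 0.5), 35 per group just clears 50% power,
# while a group of 15 -- typical of the small trials -- falls far short.
print(power_two_sample_t(0.5, 35))   # a little over 0.5
print(power_two_sample_t(0.5, 15))   # roughly a quarter
```

Below about 35 per group, a real moderate effect is more likely to be missed than detected, which is the sense in which a positive result from such a trial is statistically improbable.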
What happens when we apply the exclusion criterion of fewer than 35 participants in the smallest group to the Triple P trials? Looking at Table 2 in Wilson and colleagues' review, we see that 19 of the 23 individual papers included in the meta-analyses are excluded. Figure 2 in the Wilson et al. review provides the forest plot of effect sizes for two of the key outcome measures reported in Triple P trials. Small trials account for the outlying strongest finding, but also for the second-weakest finding, a likely sampling error from the inclusion of small trials. Meta-analyses often attempt to control for the influence of small trials by introducing weights, but this strategy is inadequate when the bulk of the trials are small. Again examining Figure 2, we see that even with the weights, such small trials still contribute over 76% of the overall effect size. Of the four trials that are not underpowered [10–13], one has a non-significant effect entered into the meta-analysis. In addition, the confidence interval for one of the positive, moderate-sized trials barely excludes zero (0.06).
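Why weighting cannot rescue a literature dominated by small trials can be seen in a minimal fixed-effect sketch. The trial sizes below are illustrative assumptions echoing the pattern in the review (19 small trials, 4 adequately powered ones), not the actual Triple P data:

```python
# Inverse-variance weight for a standardized mean difference: with equal
# groups of size n and d near 0, var(d) ~ 2/n, so the weight is ~ n/2.
small_trials = [15] * 19   # hypothetical per-group sizes of small trials
large_trials = [50] * 4    # hypothetical per-group sizes of adequate trials

w_small = sum(n / 2 for n in small_trials)
w_large = sum(n / 2 for n in large_trials)

share_small = w_small / (w_small + w_large)
print(round(share_small, 2))  # → 0.59
```

Even though each small trial is down-weighted individually, collectively they still carry most of the pooled weight, so any biases they share propagate directly into the pooled estimate.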
Many of the trials evaluating Triple P were quite small, with eight trials having fewer than 20 participants (9 to 18) in the smallest group. This is grossly inadequate to achieve the benefits of randomization, and such trials are extremely vulnerable to reclassification, loss to follow-up, or missing data from one or two participants. Moreover, we are given no indication of how the investigators settled on an intervention or control group this small. Certainly it could not have been decided on the basis of an a priori power analysis, raising concerns that data snooping occurred. The consistently positive findings reported in the abstracts of such small studies raise further suspicions that investigators have manipulated results by hypothesizing after the results are known (HARKing), cherry-picking, and other inappropriate strategies for handling and reporting data. Such small trials are statistically quite unlikely to detect even a moderate-sized effect, and that so many nonetheless report significant findings attests to publication bias or obligatory replication being enforced at some point in the publication process.
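How flexible outcome reporting lets tiny null trials produce "significant" findings can be illustrated with a small simulation (entirely hypothetical, with no connection to any particular trial): two groups of 15, five independent outcome measures, no true effect anywhere, and an analyst who reports whichever measure "works":

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_group = 15   # typical of the smallest Triple P trial arms
n_outcomes = 5     # the analyst reports whichever measure "works"
n_sims = 2000

false_positives = 0
for _ in range(n_sims):
    # five independent outcome measures, no true effect on any of them
    pvals = [
        stats.ttest_ind(rng.normal(size=n_per_group),
                        rng.normal(size=n_per_group)).pvalue
        for _ in range(n_outcomes)
    ]
    false_positives += min(pvals) < 0.05

rate = false_positives / n_sims
print(rate)   # close to 1 - 0.95**5, about 0.23, not the nominal 0.05
```

With five shots at significance, the chance of at least one nominally significant outcome roughly quadruples, without any true effect and without any power.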
Many communities and charities are proceeding with ambitious and costly implementations of Triple P-Parenting programs in the expectation that this will alleviate social and public health problems associated with poor parenting. Wilson and colleagues highlighted the inadequacy of the existing clinical trial data. Adding to that the dominance of biased positive reporting of underpowered trials, it becomes incumbent upon clinicians and policymakers to adequately monitor the implementation of Triple P and evaluate clinical outcomes, to ensure that scarce resources are not squandered.
Wilson P, Rush R, Hussey S, Puckering C, Sim F, Allely CS, Doku P, McConnachie A, Gillberg C: How evidence-based is an 'evidence-based parenting program'? A PRISMA systematic review and meta-analysis of Triple P. BMC Med. 2012, 10: 130-10.1186/1741-7015-10-130.
Sanders MR, Pickering JA, Kirby JN, Turner KMT, Morawska A, Mazzucchelli T, Ralph A, Sofronoff K: A commentary on evidence-based parenting programs: redressing misconceptions of the empirical support for Triple P. BMC Med. 2012, 10: 145-10.1186/1741-7015-10-145.
Triple P: Positive Parenting Program. [http://www.triplep.net/]
Triple P for Practitioners. [http://www19.triplep.net/?pid=59]
Coyne JC, Thombs BD, Hagedoorn M: Ain't necessarily so: review and critique of recent meta-analyses of behavioral medicine interventions in health psychology. Health Psychol. 2010, 29: 107-116.
Kraemer HC, Gardner C, Brooks JO, Yesavage JA: Advantages of excluding underpowered studies in meta-analysis: inclusionist versus exclusionist viewpoints. Psychol Methods. 1998, 3: 23-31.
Connell S, Sanders MR, Markie-Dadds C: Self-directed behavioral family intervention for parents of oppositional children in rural and remote areas. Behav Modif. 1997, 21: 379-408. 10.1177/01454455970214001.
Turner KM, Sanders MR: Help when it's needed first: a controlled evaluation of brief, preventive behavioral family intervention in a primary care setting. Behav Ther. 2006, 37: 131-142. 10.1016/j.beth.2005.05.004.
Sterne JA, Gavaghan D, Egger M: Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol. 2000, 53: 1119-1129. 10.1016/S0895-4356(00)00242-0.
Hahlweg K, Heinrichs N, Kuschel A, Bertram H, Naumann S: Long-term outcome of a randomized controlled universal prevention trial through a positive parenting program: is it worth the effort?. Child Adolesc Psychiatry Ment Health. 2010, 4: 14-10.1186/1753-2000-4-14.
Bodenmann G, Cina A, Ledermann T, Sanders MR: The efficacy of the Triple P-Positive Parenting Program in improving parenting and child behavior: a comparison with two other treatment conditions. Behav Res Ther. 2008, 46: 411-427. 10.1016/j.brat.2008.01.001.
Morawska A, Sanders MR: Self-administered behavioral family intervention for parents of toddlers: part I. Efficacy. J Consult Clin Psychol. 2006, 74: 10-19.
Sanders MR, Markie-Dadds C, Tully LA, Bor W: The Triple P-Positive Parenting Program: a comparison of enhanced, standard, and self-directed behavioral family intervention for parents of children with early onset conduct problems. J Consult Clin Psychol. 2000, 68: 624-640.
Bettis RA: The search for asterisks: compromised statistical tests and flawed theories. Strat Mgmt J. 2012, 33: 108-113. 10.1002/smj.975.
Giner-Sorolla R: Science or art? How aesthetic standards grease the way through the publication bottleneck but undermine science. Perspect Psychol Sci. 2012, 7: 562-571. 10.1177/1745691612457576.
Simmons JP, Nelson LD, Simonsohn U: False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011, 22: 1359-1366. 10.1177/0956797611417632.
Ioannidis J: Scientific inbreeding and same-team replication: Type D personality as an example. J Psychosom Res. 2012, 73: 408-410. 10.1016/j.jpsychores.2012.09.014.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1741-7015/11/11/prepub
The authors declare that they have no competing interests.
Both authors contributed to the conceptualization, drafting and editing of the manuscript. Both authors read and approved the final manuscript.