In primary care patients well known to their GPs, the two-question screening method for major depression showed high sensitivity (91%) and low specificity (65%). As expected, adding the 'help' question decreased sensitivity (59%) but increased specificity (88%). We also observed a lower specificity for the two-question and three-question methods in subpopulations with other psychiatric conditions (such as generalised anxiety) and in patients who had exhibited major depression 1 year previously.
The strengths of our study are its large sample size, the number and diversity of the participating GPs, and the use of standardised, validated measures for mental disorders. Furthermore, the random selection of patients and their recruitment from a large number of GPs in various settings decreased the risk of selection bias. We therefore believe that our observations are relevant for most patients with physical complaints in primary care in developed countries. However, our study is limited because the two screening questions for major depression were similar to those of the PHQ-9, our reference standard. Because of this overlap, the sensitivity of the screening method is expected to be very high. Finally, the PHQ-9 may not be the best reference standard for major depression for three reasons: (1) it is a self-report instrument, (2) it does not apply exclusion criteria, and (3) it does not apply clinical significance criteria. Thus, the PHQ-9 can only be interpreted as a proxy for a DSM-IV diagnosis [21, 22], and a standardised visit to a psychiatrist would have been a preferable reference standard.
Whooley et al. and Arroll et al. first introduced the two-question screening method and reported high sensitivities (96% and 97%, respectively) and low specificities (57% and 67%, respectively). Löwe et al. evaluated the two screening questions in outpatients and obtained similar results with a dichotomous answer (yes/no). Furthermore, the two-question method was able to detect changes in a patient's state of depression. Here we report observations similar to those of Arroll et al. regarding screening for major depression with two questions. The high sensitivity of these questions allows GPs to confidently rule out major depression in patients who screen negative, but the relatively low specificity means that further investigation is required before major depression can be diagnosed in patients who screen positive.
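The link between high sensitivity and confident rule-out can be illustrated with Bayes' rule. The following is a minimal sketch, not part of the study's analysis: the function name `predictive_values` is ours, and the 10% prevalence is a purely hypothetical value chosen for illustration. Only the sensitivity (91%) and specificity (65%) are taken from our two-question results.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a screening test, using Bayes' rule.

    All three arguments are proportions strictly between 0 and 1.
    """
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 < value < 1.0:
            raise ValueError("arguments must be proportions strictly between 0 and 1")

    true_pos = sensitivity * prevalence                # depressed, screen positive
    false_neg = (1 - sensitivity) * prevalence         # depressed, screen negative
    true_neg = specificity * (1 - prevalence)          # not depressed, screen negative
    false_pos = (1 - specificity) * (1 - prevalence)   # not depressed, screen positive

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Two-question method (sensitivity and specificity from this study);
# the prevalence of 0.10 is a HYPOTHETICAL value, not a study result.
ppv, npv = predictive_values(sensitivity=0.91, specificity=0.65, prevalence=0.10)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Under this assumed prevalence, a negative screen would leave major depression unlikely (negative predictive value of about 98%), whereas fewer than one in four positive screens would correspond to major depression, which is why a positive result calls for further assessment. The exact figures change with the prevalence assumed.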
The introduction of the third 'help' question was an interesting and logical proposition that should have facilitated the diagnosis of major depression. When we added the 'help' question to the screening method, however, our observations differed substantially from those of Arroll et al., who reported increased specificity (89%) but identical sensitivity (96%). As a substantial number of their patients with major depression responded 'no' to the 'help' question, it is not clear why the sensitivity remained identical. In a second study, Goodyear et al. validated the two-question and three-question methods using the PHQ-9 as a reference standard for major depression. That study reported a sensitivity of 98% and a specificity of 73% for the two-question method and a specificity of 99% for the three-question method, but did not provide the sensitivity of the three-question method. A recent publication by the same authors reported a sensitivity of 99.2% and a specificity of 70.4% for the two-question method, whereas for the three-question method the sensitivity decreased to 87.1% and the specificity increased to 94.8%.
An independent study by Baker-Glenn et al. observed a sensitivity of 23.7% and a specificity of 97.8% with the three-question method in patients attending chemotherapy. We therefore believe the results of Arroll et al. to be misleading. These findings support the latest NICE guidelines, which recommend the use of the two screening questions only.
Our analysis indicates that although the three-question method has a high negative predictive value, its high false negative rate implies that as many as four out of ten patients with major depression (28/69) would not be correctly identified with this method. In comparison, fewer than one out of ten patients with major depression (6/69) would be missed by the two-question method. It is therefore not helpful to include the third 'help' question to rule out major depression in patients well known to their GPs. However, as Kroenke suggests, 'screening for depression is not enough': patients identified with depression have to be treated. The 'help' question therefore remains clinically relevant, even if more than half of patients with major depression did not ask for help. Within the context of the consultation, the 'help' question opens a continuing discussion about mood disorders and allows the GP to evaluate the appropriateness of psychiatric treatment and referral. Baker-Glenn et al. conclude, as we do, that the 'help' question may highlight patients willing to accept support. This also underlines the GP's role in exploring and responding to patients' expectations regarding their psychological distress, as described by Walters, who showed that patients with milder symptoms usually prefer simple human contact and informal resources to formal interventions or medication. While all these questions may help GPs screen for major depression in their patients, this tool should not replace clinical judgment; indeed, GPs seldom rely on questionnaires alone [31, 32].
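The arithmetic behind these proportions can be checked directly from the counts given above (28 and 6 missed cases among 69 patients with major depression). The short sketch below is for illustration only; the function name `sensitivity_from_misses` is ours.

```python
def sensitivity_from_misses(missed: int, total_cases: int) -> float:
    """Sensitivity = 1 - false negative rate, from counts of missed cases."""
    if total_cases <= 0 or not 0 <= missed <= total_cases:
        raise ValueError("need 0 <= missed <= total_cases and total_cases > 0")
    return 1 - missed / total_cases


TOTAL_CASES = 69  # patients with major depression according to the reference standard

# Number of those patients missed by each screening method (counts from the text)
missed_by_method = {"two-question": 6, "three-question": 28}

for method, missed in missed_by_method.items():
    sens = sensitivity_from_misses(missed, TOTAL_CASES)
    print(f"{method}: false negative rate {1 - sens:.1%}, sensitivity {sens:.1%}")
```

The resulting sensitivities round to 91% for the two-question method and 59% for the three-question method, consistent with the values reported for our sample.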
Our observations suggest that the sensitivity of the two screening questions is consistent across various patient subpopulations, which should keep the number of false negatives low regardless of patient characteristics. However, as the specificity differs across patient groups, GPs relying on the screening questions alone may frequently and falsely diagnose major depression in patients who present with other mental disorders. Additional studies are necessary to quantify the actual benefits of screening for mental disorders in primary care with the two-question and three-question screening methods.