We present a ranking of 44 primary care tests based on the between-practice variation in their use. We analysed over 16 million tests from 444 general practices and ranked tests by their adjusted coefficient of variation. The tests subject to the greatest variation were non-illicit drug monitoring tests (urine, blood or serum), urine microalbumin, pelvic CT, and Pap smear. We also identified seven tests with both a rate of ordering and a coefficient of variation above average: clotting, vitamin D, urine albumin, PSA, bone profile, urine MCS and CRP.
Strengths and limitations in relation to previous research
Our analysis adjusted for demographic differences between practices; however, there may be valid reasons to explain the residual variation we present. Previous work has suggested that differences in disease prevalence, patient choice, data artefact (differences in data quality), resource availability, local policy and guidelines, and service configurations may also contribute to variation in healthcare resource use [10]. Other research suggests further reasons, not all of them justifiable. The influence of local key-opinion leaders [18] (such as a hospital consultant preferring one test over another) and the variation in how general practitioners manage uncertainty [19] have both been suggested as contributors.
We used the conventional statistical analysis for count data, a Poisson regression model. However, we used the outputs of this model in a less conventional manner. The aim of this paper was to determine which tests were subject to the most between-practice variation in their use once patient demographic differences between practices had been accounted for. We did not use the Poisson models to determine and compare the predictive ability of our covariates (patient demographics). As is expected when analysing healthcare data [20], the model accounted for some, but not all, of the variation in test use. This residual variation ("overdispersion") represents the variation in test use that remains once patient demographic differences between practices have been accounted for. We ranked tests by this residual variation: variation in use that persisted despite adjustment for patient demographic differences between practices.
A strength of our study is our examination of all types of tests (imaging, laboratory and miscellaneous), our inclusion of many tests and our use of appropriate statistical methods to quantify variation. One previous study has presented a ranking of primary care tests by their between-practice variation [13]; it examined only laboratory tests and included a smaller number of tests [21] in a smaller sample of patients. It also ranked tests by their standard deviation (SD). We preferred the CoV to the SD because the SD is affected by the rate of testing (sample size): we found that the most commonly ordered tests were more likely to have higher standard deviations, so ranking by SD may make commonly ordered tests appear to have higher between-practice variation. A limitation of the CoV is that it may overestimate variation in rarely ordered tests; to mitigate this, we presented both the tests with the greatest between-practice variation and the tests with both a rate of ordering and a CoV above average. We believe tests with both a high CoV and a high rate of ordering should be the focus of future academic and policy work.
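The rate-dependence of the SD can be demonstrated with a small simulation (illustrative numbers only; the rates and spreads are assumptions, not observed data). Two hypothetical tests share the same relative between-practice spread, yet the commonly ordered test has a far larger SD while the CoV is comparable for both:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical tests with the same relative between-practice spread (20%)
# but very different ordering rates (tests per 1000 patients per year)
common_test = rng.normal(loc=300.0, scale=60.0, size=500)  # e.g. a common panel
rare_test = rng.normal(loc=3.0, scale=0.6, size=500)       # e.g. a drug level

for name, rates in [("common", common_test), ("rare", rare_test)]:
    sd = rates.std(ddof=1)
    cov = sd / rates.mean()
    print(f"{name} test: SD = {sd:.2f}, CoV = {cov:.2f}")
```

Ranking these two tests by SD would place the common test far above the rare one despite identical relative variation; the CoV treats them equivalently.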
A further strength of our study is the use of high-quality, validated electronic health record data and the identification of tests that are subject to the greatest between-practice variation. Most previous research exploring geographical variation in healthcare resource use has focused on identifying regions that order a greater or lesser number of tests or treatments compared to the national average [11, 22,23,24,25].
Implications for practice and policy
The wide between-practice variation in test use we present is unlikely to be explained entirely by clinical indication. We present a list of common and important primary care tests ranked by their between-practice variation. Policy makers must decide if the residual variation we present is warranted, and if it is not, understand why this variation exists and what can be done to mitigate it. Between-practice variation and, more broadly, geographical variation have long been used to highlight potential over or underuse of healthcare resources [6, 24, 26, 27]. Our ranking of tests can direct policy makers to the primary care tests most likely subject to overuse—the use of a test when it will not result in patient benefit—or underuse—the failure to use a test when it would result in patient benefit. However, it should be noted the variation we present does not directly consider individual patient data nor the clinical indications for test use. As such, our results can be considered a potential, not definitive, indicator of over and underuse.
In some cases, there are context-specific reasons to explain the between-practice variation in test use. For instance, the notable between-practice variation in the use of clotting and drug monitoring tests may reflect regional differences in drug use. In UK primary care, there has been an increase in the use of novel oral anticoagulants (NOACs), also known as direct-acting oral anticoagulants (DOACs) [28]; from 2009 to 2015, there was a 17-fold increase in NOAC use [28]. However, there is marked geographical variation in their use [21]. This variation may reflect the non-specific NICE guidance, which states that patients with atrial fibrillation can be anticoagulated with “apixaban, dabigatran, rivaroxaban or a vitamin K antagonist” [29]. However, this guidance is now out of date compared with more recent evidence. A 2017 systematic review and network meta-analysis concluded that “the risk of all-cause mortality was lower with all DOACs” and “several DOACs are of net benefit compared with warfarin” [30]. With clear guidance reflecting the underlying evidence, it is plausible that geographical variation in clotting tests would diminish.
Similarly, the variability of drug monitoring tests is likely to be related to regional differences in disease prevalence. Drug monitoring tests include tests for tacrolimus, cyclosporin, salicylate, lamotrigine, lithium and gentamicin (among others). All of these tests, individually, had low rates of use. Lastly, it should be noted that tests can be directly wasteful, but can also contribute to healthcare costs indirectly, for instance via incidental imaging findings [31].
Future research
It would be advantageous for future studies to investigate variation using a different unit of analysis. We chose to investigate variation at the practice level; however, future studies could investigate variation at the patient level or at a regional level. We chose the practice level because previous literature suggests practice-level factors contribute substantially to healthcare variation [10, 18]. Differences in disease prevalence, cultural attitudes to tests and their risks, local key-opinion leaders, resource availability, local policy and guidelines, and service configurations have all been suggested as practice-level contributors to variation.
A similar analysis aggregated at a regional, rather than practice, level may provide further insight into unwarranted variation. It is plausible that our analysis at the practice level may be too sensitive to variation in disease prevalence; this may in part explain non-illicit drug testing as an outlier. However, the aggregation of data at a regional level may obfuscate true, unwarranted variation. Furthermore, the CPRD only allows practices to be identified at a broad regional level (e.g. within Wales). Conversely, future research that analyses data at an individual patient level may provide more nuanced insight into variation, but risks being overly sensitive, making the distinction between warranted and unwarranted variation more difficult. Nevertheless, we would welcome further studies adopting any of the aforementioned units of analysis.
Furthermore, beyond adjustment for demographic differences, we could not directly determine the appropriateness of the between-practice variation we noted. Future research should aim to determine whether the tests with the greatest between-practice variation are also subject to the greatest underuse and overuse. This research should ideally involve individual patient data (IPD), either in the form of a notes review or an IPD audit [32], commonly against guidelines [33]. Finally, some of our team are involved in delivering OpenPathology.net [24], an open data tool (like OpenPrescribing.net) that provides easy access to various analytic approaches identifying test-ordering behaviour in primary care. This tool will continue our work exploring temporal trends on a live interface.