The accuracy of pulse oximetry in measuring oxygen saturation by levels of skin pigmentation: a systematic review and meta-analysis

Abstract

Background

During the COVID-19 pandemic, there have been concerns regarding potential bias in pulse oximetry measurements for people with high levels of skin pigmentation. We systematically reviewed the effects of skin pigmentation on the accuracy of oxygen saturation measurement by pulse oximetry (SpO2) compared with the gold standard SaO2 measured by CO-oximetry.

Methods

We searched Ovid MEDLINE, Ovid Embase, EBSCO CINAHL, ClinicalTrials.gov, and WHO International Clinical Trials Registry Platform (up to December 2021) for studies with SpO2–SaO2 comparisons and measuring the impact of skin pigmentation or ethnicity on pulse oximetry accuracy. We performed meta-analyses for mean bias (the primary outcome in this review) and its standard deviations (SDs) across studies included for each subgroup of skin pigmentation and ethnicity and used these pooled mean biases and SDs to calculate accuracy root-mean-square (Arms) and 95% limits of agreement. The review was registered with the Open Science Framework (https://osf.io/gm7ty).

Results

We included 32 studies (6505 participants): 15 measured skin pigmentation and 22 referred to ethnicity. Compared with standard SaO2 measurement, pulse oximetry probably overestimates oxygen saturation in people with high levels of skin pigmentation (pooled mean bias 1.11%; 95% confidence interval 0.29 to 1.93%) and in people described as Black/African American (1.52%; 0.95 to 2.09%) (moderate- and low-certainty evidence, respectively). The bias of pulse oximetry measurements for people with other levels of skin pigmentation or those from other ethnic groups is either more uncertain or suggests no overestimation. Whilst the extent of mean bias is small or negligible for all subgroups evaluated, the associated imprecision is unacceptably large (pooled SDs > 1%). When the extent of measurement bias and precision is considered jointly, pulse oximetry measurements for all the subgroups appear acceptably accurate (with Arms < 4%).

Conclusions

Pulse oximetry may overestimate oxygen saturation in people with high levels of skin pigmentation and people whose ethnicity is reported as Black/African American, compared with SaO2. The extent of overestimation may be small in hospital settings but unknown in community settings.

Review protocol registration

https://osf.io/gm7ty

Background

Blood oxygen saturation levels require monitoring for health reasons in a wide range of circumstances. Low blood oxygen saturation, if identified to be hypoxemia, requires medical intervention and has been linked to an increased risk of death [1]. The gold standard measure of blood oxygen saturation levels (SaO2) requires a sample of arterial blood and measurement using CO-oximetry. Pulse oximetry, measuring SpO2 as a proxy for SaO2 using a non-invasive and simple device, is frequently used to detect low blood oxygen levels. Pulse oximetry has been widely used during the COVID-19 pandemic, including in non-clinical settings, to detect hypoxemia and inform decisions to escalate care [2].

The current WHO COVID-19 management guideline recommends the ‘use of pulse oximetry monitoring at home as part of a package of care’ for symptomatic people with COVID-19 [3]. Many countries have specific guidance or services for home pulse oximetry in line with this recommendation [2, 4], such as the NHS England COVID Oximetry@home service [2]. The reporting of possible bias in pulse oximetry measurement, including due to skin pigmentation, raised a growing concern about the accuracy of oxygen self-monitoring [5]. Pulse oximetry works by beaming light through skin into the blood and inferring an SpO2 reading from the amount of light absorbed. Higher levels of skin pigmentation could, in theory, affect how light is absorbed, thus possibly affecting the accuracy of pulse oximetry readings. Measurement inaccuracy could have serious clinical implications including the delay of urgent medical care [6]. A recent US study analysed retrospective cohort data from more than 10,000 people, comparing where a diagnosis of occult hypoxemia (an SaO2 of less than 88%) was missed by pulse oximetry [7]. Results showed people described as Black had ‘nearly three times the frequency of occult hypoxemia that was not detected by pulse oximetry’ as those described as White [7]. In November 2021, the UK Health Secretary ordered a review into racial bias in medical equipment, including pulse oximeters.

It is an important time to consider the current evidence base for the impact of skin pigmentation on the accuracy of pulse oximetry compared with the gold standard measure of SaO2. The only current relevant systematic review, published in 1998, included three studies that explicitly considered the impact of skin pigmentation on pulse oximetry accuracy [8]. The review suggested that pulse oximeters may overestimate blood oxygen saturation in people with dark skin [8]. The recent rapid review by the NHS Race and Health Observatory came to similar conclusions but used a non-systematic review process, i.e., no comprehensive search, risk of bias assessment or meta-analysis [6]. Our objective was to conduct a rigorous systematic review of research on the influence of skin pigmentation on the accuracy of oxygen saturation measurement by pulse oximetry (SpO2) compared with SaO2 measured by standard CO-oximetry.

Methods

Search strategy and selection criteria

We report this review in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [9]. The methods used were described in the registered protocol (https://osf.io/gm7ty).

We included any methods-comparison study that compared SpO2 values in any population, in any care setting, measured using any type of commercially available pulse oximeter, with SaO2 measured by standard CO-oximetry [10]; and investigated the accuracy of pulse oximetry by level of skin pigmentation or by ethnic group (Additional file 1: Table S1).

We excluded studies that used (1) prototype pulse oximetry devices, (2) pulse oximeters that require high-skilled specialists to operate (such as intra-partum pulse oximetry devices), and (3) pulse oximeters used for measuring venous blood oxygen saturation. We also excluded studies that reported diagnostic test accuracy measures and those with ineligible comparators, including reference pulse oximetry, use of ineligible reference values of oxygen saturation, e.g. arterial oxygen pressure (PaO2), calculated SaO2, fractional saturation (%O2Hb or FO2Hb) [10, 11].

Following the British Standards Institution 2019 standards for pulse oximetry [10], we included data on the overall accuracy (accuracy root-mean-square, Arms), mean bias, precision (standard deviation of mean bias, SD) and/or the limits of agreement for the SpO2 and SaO2 comparison, with mean bias as the review’s primary outcome (Additional file 1: Table S1). The Arms combines mean bias and precision in a single measure [10]. Although the British Standards Institution standards give Arms primacy over the other outcomes, it has no intuitive relevance to clinical decision-making. For example, an Arms value of 4% means only that about 68% of pulse oximetry readings would be within ± 4% of the gold standard CO-oximetry reading. We therefore chose mean bias as the primary outcome to aid clinical relevance and interpretation: the mean difference between pulse oximetry readings and ‘true’ blood oxygen saturation levels indicates more clearly how bias could affect clinical decisions that refer to threshold values (e.g. admission to hospital with a pulse oximetry reading of 92% or lower).
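How these outcomes relate to one another can be illustrated with a short sketch. The paired readings below are hypothetical and the function name `accuracy_metrics` is ours, not taken from the standard. The sketch assumes the usual definition of Arms as the root mean square of the SpO2 minus SaO2 differences, and it uses the population SD (dividing by n) so that Arms squared equals bias squared plus SD squared exactly.

```python
import math


def accuracy_metrics(spo2, sao2):
    """Return (mean_bias, sd, arms) in percentage points for paired readings.

    Illustrative only: sd is the population SD (divide by n), chosen so that
    arms**2 == mean_bias**2 + sd**2 holds exactly.
    """
    if len(spo2) != len(sao2) or not spo2:
        raise ValueError("spo2 and sao2 must be non-empty and of equal length")
    diffs = [p - a for p, a in zip(spo2, sao2)]  # SpO2 - SaO2 for each pair
    n = len(diffs)
    mean_bias = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_bias) ** 2 for d in diffs) / n)
    arms = math.sqrt(sum(d ** 2 for d in diffs) / n)
    return mean_bias, sd, arms


# Hypothetical paired readings (%), not data from any included study
spo2 = [97, 94, 91, 88, 85]
sao2 = [96, 92, 91, 86, 85]
bias, sd, arms = accuracy_metrics(spo2, sao2)
print(f"mean bias={bias:.2f}%  SD={sd:.2f}%  Arms={arms:.2f}%")
```

A positive mean bias here means the pulse oximeter reads higher than CO-oximetry on average; Arms grows with either a larger bias or a larger SD, which is why it cannot distinguish between the two.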

We identified English language reports of relevant studies through searching (1) Ovid MEDLINE, Ovid Embase and EBSCO CINAHL Plus between the inception of databases and 5 August 2021, updated to 14 December 2021, using the same search strategies (Additional file 2: Box S1); (2) the ClinicalTrials.gov and World Health Organization International Clinical Trials Registry Platform for ongoing studies in August 2021; and (3) the reference lists of retrieved included studies, relevant systematic reviews, and guideline reports. We also contacted authors of key abstracts to request further information about their studies.

Two reviewers (CS and MG, or JH, OH) independently assessed titles and abstracts of the search results for relevance and the full texts of all potentially eligible studies for inclusion, with disagreements resolved through discussion or involving a third reviewer (GN) where necessary.

Data analysis

One reviewer (CS, OH or JH) extracted data from included studies for the items in Additional file 3: Box S2 and assessed the risk of bias of the included studies using an adapted QUADAS-2 tool (Additional file 4: Box S3) [12]; all extractions and assessments were checked by another reviewer (JH, MG, OH or GN). We resolved any disagreements through discussion. Where necessary, we contacted study authors to clarify methods and data, and transformed data into the format needed for analyses, e.g. from reported 95% limits of agreement to standard deviation (SD) [13].
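As an illustration of the kind of transformation mentioned above, the sketch below recovers a mean bias and SD from reported 95% limits of agreement. It assumes the limits were computed as bias ± 1.96 × SD; whether this matches the exact procedure in [13] is an assumption, and the function names and input values are hypothetical.

```python
Z_95 = 1.96  # normal quantile conventionally used for 95% limits of agreement


def sd_from_limits_of_agreement(lower, upper, z=Z_95):
    """SD of the differences, assuming limits = bias +/- z * SD."""
    if upper < lower:
        raise ValueError("upper limit must not be below lower limit")
    return (upper - lower) / (2 * z)


def bias_from_limits_of_agreement(lower, upper):
    """Mean bias is the midpoint of symmetric limits of agreement."""
    return (lower + upper) / 2


# Hypothetical reported limits of agreement (%), for illustration only
lower, upper = -1.83, 4.05
print(f"bias={bias_from_limits_of_agreement(lower, upper):.2f}%  "
      f"SD={sd_from_limits_of_agreement(lower, upper):.2f}%")
```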

We pre-specified separate analyses of studies reporting level of skin pigmentation and studies reporting ethnicity. When pooling data for mean bias and its SD across studies, we used the correlated hierarchical effects model with small-sample corrections under the robust variance estimation (RVE) framework. This approach enabled us to include data from single-measure design studies together with multiple dependent effect size estimates from repeated-measures design studies in one meta-analysis, even when the dependence structure is unknown [14, 15]. We used Tau2, I2, the Q statistic and the related χ2 test to assess heterogeneity in meta-analysis. There is no established approach to pooling data for Arms and 95% limits of agreement across studies directly. We therefore used the pooled mean bias and the pooled SDs produced by the related meta-analyses, following the British Standards Institution methods to calculate the Arms [10] and Bland and Altman’s methods to calculate the population 95% limits of agreement [16]. Using R (version 4.1.2), we performed RVE meta-analyses and produced forest plots as described in Additional file 5: Box S4. When meta-analysis was not appropriate, we synthesised relevant evidence following the Synthesis Without Meta-analysis in systematic reviews (SWiM) guidance [17].
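To show what the heterogeneity statistics named above measure, the following sketch computes Cochran's Q, I2 and a DerSimonian-Laird estimate of Tau2. This is a deliberately simplified stand-in: it treats every effect size as independent and is not the correlated hierarchical effects RVE model fitted in R for this review. The function name and input values are hypothetical.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, I^2 (%) and DerSimonian-Laird tau^2 for independent effects."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study differences
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of the between-study variance
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    return {"pooled": pooled, "Q": q, "df": df, "I2": i2, "tau2": tau2}


# Hypothetical study-level mean biases (%) and their sampling variances
result = heterogeneity([0.4, 1.2, 2.0], [0.10, 0.20, 0.15])
print({k: round(v, 3) for k, v in result.items()})
```

Q is referred to a χ2 distribution with `df` degrees of freedom to obtain the related test; that step is omitted here to keep the sketch free of third-party dependencies.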

One reviewer (CS) assessed the certainty of evidence on mean bias using the GRADE approach developed for the test accuracy topic, checked by another reviewer (GN) [18, 19]. Using this approach, the certainty of mean bias findings could be rated as high, moderate, low or very low. In interpreting review findings, we used the British Standards Institution-recommended thresholds described in Additional file 1: Table S1 to judge the accuracy of pulse oximetry [10]. With mean bias as the primary outcome, any pooled mean bias of > 0% would indicate overestimation with pulse oximetry and a risk of missing hypoxemia, whilst a mean bias of < 0% (indicating underestimation) risks over-treatment. Given that pulse oximeter devices commonly display integer percentages, we rounded pooled estimates to integers when interpreting the related findings, e.g. rounding mean bias values within ± 0.50 to 0%.
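The interpretation rule above can be sketched as a small function. The function name and labels are hypothetical, and the handling of exact ties (±0.50 rounding towards zero) is our assumption based on the phrase 'within ± 0.50 to 0%'; the review does not spell out its tie rule.

```python
from decimal import Decimal, ROUND_HALF_DOWN


def interpret_mean_bias(mean_bias_pct):
    """Round a pooled mean bias (%) to an integer and label its direction.

    Assumption: ties such as 0.50 round towards zero (ROUND_HALF_DOWN).
    """
    rounded = int(
        Decimal(str(mean_bias_pct)).quantize(Decimal("1"), rounding=ROUND_HALF_DOWN)
    )
    if rounded > 0:
        label = "overestimation"   # risk of missing hypoxemia
    elif rounded < 0:
        label = "underestimation"  # risk of over-treatment
    else:
        label = "negligible"       # displays as 0% on an integer readout
    return rounded, label


# Pooled mean biases quoted in the Results of this review
for value in (1.11, 1.52, -0.35, 0.31):
    print(value, interpret_mean_bias(value))
```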

We analysed data on pulse oximeters of different brands/manufacturers separately where possible. We undertook pre-planned sensitivity analyses through (1) excluding studies where all participants had similar skin pigmentation or the same ethnicity, (2) excluding studies with no data available for meta-analysis without transformation, and (3) excluding studies at high overall risk of bias. We undertook post hoc sensitivity analysis by excluding studies that used descriptors of ethnicity to indicate levels of skin pigmentation. We assessed publication bias following a qualitative approach given funnel plots or Egger’s tests were not considered appropriate for this review [20].

Results

Study selection and characteristics

We assessed titles and abstracts of 9920 records identified from electronic databases, 152 from trial registries, and 14 records identified by screening the reference lists of relevant publications. Of these records, we identified 33 publications of 32 studies—published between 1985 and 2021—as eligible for inclusion (Fig. 1) [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53]. We identified one ongoing study from electronic searches [54]. We received raw or study-level summary data for two studies directly from study authors [29, 41].

Fig. 1

The study selection flowchart. This flowchart shows the number of records and studies at each stage of the study selection process

Table 1 summarises the included studies, with more details in Additional file 6: Table S2. The 32 studies (6505 participants) reported SpO2-SaO2 comparison evaluations of 54 different pulse oximeters (26 manufacturers) against standard SaO2 (Additional file 7: Table S3). Of the 32 studies, 16 (50%) reported the ranges of SaO2 over which the accuracy of pulse oximeters was evaluated: the minimum values of these ranges had a median of 76% whilst the maximum values had a median of 100%. Of the 16 studies, four had SaO2 ranges that were in line with the recommended range of 70 to 100%; eight had narrower ranges such as 80 or 90 to 100%; and four had wider ranges such as 50 or 60 to 100%.

Table 1 Summary characteristics of the included studies

Assessment results of risk of bias and applicability

Using QUADAS-2, we considered 14/32 studies (43.75%) to be at unclear risk of bias for all four domains or high risk of bias for at least one domain, and the remaining 18 (56.25%) to be at low risk of bias for at least one of the four domains (Fig. 2).

Fig. 2

Risk of bias assessment results. The left section of this figure shows risk of bias judgements for each domain of the QUADAS-2 tool for each study and the right section shows applicability judgements for each concern domain of the QUADAS-2 tool for each study. Please see Additional file 4: Box S3 for all signalling questions used in the QUADAS-2 assessment and further considerations

Key issues that led to downgrading for risk of bias were as follows: (1) for the patient selection domain, where specific sub-populations were inappropriately excluded from a study, or where selection criteria were unclear or not stated (19 studies); (2) for index test and reference standard domains, where there was no blinding information for either pulse oximetry SpO2 measurements (20 studies) or CO-oximeter SaO2 readings (30 studies); and (3) for the flow and timing domain, where the time intervals between SpO2 readings and the arterial blood sampling for SaO2 measurement were too long or participants were excluded from the analysis without rationale (2 studies).

We judged the applicability concern as high for one study, moderate for 13 studies, and low in terms of all three applicability considerations for the remaining 18 studies. Applicability concerns largely resulted from the lack of detail about the pulse oximeters being evaluated, CO-oximeter devices used, and/or arterial blood sampling procedures, meaning the study would be hard to reproduce.

Pulse oximetry accuracy by levels of skin pigmentation

Fifteen of the 32 studies (1800 participants) reported by level of skin pigmentation [22, 24, 25, 27,28,29,30,31,32,33,34, 41,42,43,44, 53]. Eight of these studies (1297 participants) had available data and were included in the meta-analyses [22, 24, 25, 27, 29, 30, 41, 42, 53]. Additional file 8: Table S4 presents the mapping of originally reported terms of skin pigmentation into ‘low’, ‘medium’ or ‘high’ pigmentation categories. The remaining seven studies (503 participants) were excluded from meta-analysis due to lack of mean bias data by levels of skin pigmentation (Additional file 9: Table S5). Table 2 presents pooled accuracy data. Further details and GRADE assessment results are in Additional files 10, 11, 12 and 13: Figures S1-S3 and Table S6.

Table 2 Result summaries of meta-analysis for levels of skin pigmentation and ethnic groups

Hospital-based pulse oximetry probably overestimates oxygen saturation for people with high levels of skin pigmentation compared with standard SaO2 (8 studies, 24 comparisons, 3270 SpO2-SaO2 pairs from 221 participants): pooled mean bias 1.11% (95% CI 0.29 to 1.93%), moderate-certainty evidence. This means that, on average, pulse oximetry probably overestimates blood oxygen saturation by approximately 1%, but overestimation may be as low as 0.29% or as high as 2%. The evidence for people with medium skin pigmentation is uncertain (very low certainty evidence). The evidence for people with low levels of skin pigmentation does not suggest clinically important systematic bias (pooled mean bias −0.35%, 95% CI −1.36 to 0.67%), but the finding is of low certainty. For all the levels of skin pigmentation, the Arms values are around 2% or lower (95% CI non-estimable), and the pooled SD values are around 1.50% on average (Table 2). This means that, for people with any level of skin pigmentation, about 68% of their pulse oximetry readings would be within ± 2% of the CO-oximetry readings, with one SD indicating a variation around the mean bias of minus 1.50 to plus 1.50%. We tested the sensitivity of the findings: Arms and SD values were generally consistent but there was increased uncertainty for mean bias findings. Additional file 14: Figure S4 presents evidence for different types of pulse oximeter: overall, most devices slightly overestimated oxygen saturation in people with high levels of skin pigmentation, with imprecision around estimates.
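As a rough check on how these figures fit together, the worked calculation below combines the pooled mean bias for high skin pigmentation (1.11%) with an SD of 1.50%. The SD is the approximate average quoted in the text, not the exact subgroup value from Table 2, and the limits of agreement use the simple bias ± 1.96 × SD form, which may differ from the population method in [16]; the results are therefore illustrative only.

```latex
\begin{align*}
A_{\mathrm{rms}} &= \sqrt{\text{bias}^2 + \mathrm{SD}^2}
                  = \sqrt{1.11^2 + 1.50^2}
                  = \sqrt{3.4821} \approx 1.87\% \\
\text{95\% limits of agreement} &= \text{bias} \pm 1.96 \times \mathrm{SD}
                  = 1.11 \pm 2.94
                  = (-1.83\%,\ 4.05\%)
\end{align*}
```

Under these assumptions the Arms is consistent with the reported 'around 2% or lower', while the limits of agreement show that an individual reading could plausibly overestimate SaO2 by about 4 percentage points.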

Pulse oximetry accuracy by ethnicity

Twenty-two of the 32 studies (4910 participants) described participants by ethnicity rather than level of skin pigmentation [21, 23, 24, 26, 28, 29, 31, 35,36,37,38,39,40, 45,46,47,48,49,50,51,52,53]. We included 14 studies (3510 participants) in meta-analyses [21, 23, 24, 29, 35,36,37, 39, 40, 49,50,51,52,53]; the remaining eight (1400 participants) did not contribute to meta-analysis (Additional file 15: Table S7). Pooled data are shown in Table 2 (further data are reported in Additional files 16, 17 and 18: Figures S5-S7, and Additional file 13: Table S6). Oxygen saturation measured for people described in study reports as Black or African American may be overestimated using hospital pulse oximetry compared with standard SaO2 readings: mean bias 1.52% (95% CI 0.95 to 2.09%), low-certainty evidence. The 95% confidence interval of this estimate ranges between an overestimation of 1 and 2%. The evidence for people described in studies as Asian, Hispanic or of mixed ethnicity does not indicate a clinically important systematic bias (mean bias 0.31%, 0.09 to 0.54%), but it is of low certainty. The evidence is uncertain for groups described in papers as White/Caucasian, meaning further research is likely to alter findings (very low certainty evidence). The Arms values are around 2% or lower (95% CI non-estimable) for all these subgroups, and the pooled SD values are around 1.50% on average (Table 2). We tested the sensitivity of the findings: Arms and SD values were generally consistent but there was increased uncertainty for mean bias findings. Additional file 19: Figure S8 presents evidence for each type of pulse oximeter evaluated: overall, most devices overestimated oxygen saturation in people described as Black or African American.

Discussion

Summary of findings

This review suggests that for people with high levels of skin pigmentation and people described in studies as Black or African American, oxygen saturation may be overestimated by pulse oximetry in hospital compared with gold standard SaO2. Pulse oximetry for people with other levels of skin pigmentation is less likely to be overestimated but the evidence is uncertain. These results are for clinician-measured oximetry in controlled clinical environments and do not necessarily reflect the measurement bias of home pulse oximetry by patients or carers. The low certainty for much of the data presented means that further research could overturn these conclusions. For all the subgroups of populations evaluated, whilst the degree of mean bias is small or negligible over the ranges of SaO2 reported (median minimum value of 76% and maximum value of 100%), pulse oximetry readings appear unacceptably imprecise (pooled SDs > the recommended criterion of 1%) [10, 55]. Nevertheless, when the extents of measurement bias and precision are considered jointly in Arms, pulse oximetry measurements for all the subgroups appear acceptably accurate (with Arms < the internationally recommended threshold of 4% [10, 55], or even the more conservative threshold of 3% in the US FDA guidance) [56].

Evidence in context

Our findings have several implications. Even though our estimates suggest that the internationally recommended thresholds were met in terms of measurement bias [10, 55], the relatively small amount of mean bias identified could affect clinical decision-making at threshold values for diagnosis of hypoxaemia. Overestimation could lead to clinically important hypoxaemia remaining undetected and untreated. Underestimated SpO2 readings could also be harmful, resulting in unnecessary treatment with oxygen (and the risk of hyperoxaemia) and wider impacts such as delayed hospital discharge. Two recent diagnostic studies provide evidence on clinical implications resulting from the bias in pulse oximetry for blood oxygen saturation levels [7, 57]. In these studies, people described as Black had a higher risk of ‘occult hypoxemia that was not detected by pulse oximetry’ compared with those described as White [7]. This may suggest that even small amounts of mean bias, when at the margins of diagnostic thresholds, could have an impact on diagnostic accuracy. These impacts could be explored further via evidence synthesis of diagnostic accuracy (classification) studies to assess the clinical implications of measurement bias in relation to clinical decision-making thresholds. The amount of bias identified for people from ethnic groups such as Asian, Hispanic or mixed ethnicity appears negligible, although the certainty of the evidence is low. In terms of COVID-19 management, the 2021 WHO living guidance recommends using pulse oximetry monitoring at home as part of a care package for symptomatic people in community settings but does not note the potential impact of level of skin pigmentation [3]. Our findings indicate that sub-population specific recommendations would be needed for future updates.

It is interesting to note that, despite the clinically important mean bias and unacceptably large imprecision identified, the calculated Arms values are generally around 2% or less over the ranges of SaO2 reported, that is, the Arms values are far below the Arms threshold of 4% required by the current international and UK standards [10, 55]. The current standards do not cite the evidence sources used to underpin such requirements, but the specified values of mean bias (SD for precision) (2% (± 1%)) are consistent with the outdated 1998 Jensen review results [8]. These suggested values are even larger than the average values of our estimates (1% (± 1.5%)) in people with darker skin. Given these, currently recommended thresholds may need re-evaluation, and use of the more conservative criterion of 3% applied by the US FDA guidance may have merit [56].

Findings also support calls for better calibration of the algorithms used in oximeter device software to inherently address possible measurement bias. Manufacturers should ensure, and demonstrate, that their pulse oximeters are accurate for all levels of skin pigmentation. The results of this review offer some insights into the possible amount of bias to consider. Recalibration, however, may be complex, and future work could consider a more immediate approach to clinical pathways that recognises the potential impact of small overestimations in people with darker skin.

The evidence identified has limitations in its completeness and applicability. Firstly, pulse oximetry is widely used in clinical practice and was promoted for home use during the COVID-19 pandemic [2]. Many factors could theoretically affect pulse oximetry accuracy in the real world, such as the type of pulse oximeter probe, comorbidities, movement, age of the patient and the range of SaO2 levels [8]. However, most included studies in this review were based in hospital settings and provided limited information on whether the pulse oximeters evaluated were appropriate for home self-monitoring. This review only addresses skin pigmentation and ethnicity. Therefore, little is known about pulse oximetry undertaken by untrained people at home, where other factors such as movement need to be considered. Secondly, pulse oximeters have been developed and upgraded since the 1970s. The included studies were published between 1985 and 2021 and some of the older studies may have used discontinued devices. Nevertheless, the overestimation of oxygen saturation for darker skin appears consistent in general across most devices evaluated. To preserve the completeness of evidence in this review, we included study data for all pulse oximeter devices evaluated.

Strengths and limitations of this review

Before this review, our scoping exercise using a simple search of Ovid Medline with ‘pulse oximetry’ terms identified one systematic review in this area published by Jensen and colleagues in 1998 [8]. It evaluated the overall accuracy of pulse oximetry and explored possible factors that affected the accuracy. It included only one study with data on the impact of skin pigmentation, and findings were inconclusive. The comparators used for pulse oximetry measures in the Jensen review are reference measures of SaO2 such as PaO2, calculated SaO2 and %O2Hb that are now considered incorrect or outdated. We also identified a recent rapid review by the NHS Race and Health Observatory that had an unclear methodology [6]. In this rapid review, a summary of narrative findings suggested the overestimation of blood oxygen saturation levels in people with darker skin. Of the nine studies identified in this rapid review, seven had appropriate SpO2–SaO2 comparison data but the other two used inappropriate designs for the question being addressed.

Following prespecified methods to minimise the risk of bias in the review process, this review has important strengths. Our search for research is comprehensive and identified more studies. We used the gold standard CO-oximetry as the comparator for pulse oximetry, and accuracy outcomes as recommended in the British Standards Institution standards for pulse oximetry. We developed a correlated hierarchical effects model and used the novel RVE approaches to meta-analyse not only independent data (of 11 studies) but also data from studies (n = 21) with repeated-measures design [15]. This approach deals with correlations of multiple effect size estimates within a repeated-measures design study [14, 15].

This review has some limitations. Firstly, some included studies compared SpO2-SaO2 bias data between different subgroups of skin pigmentation or ethnicity and presented only tests of significance results, rather than SpO2 and SaO2 data per se at each subgroup level. At least two studies used diagnostic accuracy design and only presented proportions of participants with specific ranges of SpO2 in relation to specific SaO2 values, again rather than SpO2 and SaO2 data [7, 57]. We contacted authors of these studies to request relevant data and received data for two studies [29, 41]. If more data were received, then the review results could change.

Secondly, we are aware of the difference between the concepts of race and ethnicity. For simplicity, we chose to use the term of ‘ethnicity’ throughout this review given race and ethnicity are context/country-specific concepts and there is no globally accepted classification approach to distinguishing them [58]. If we had treated race and ethnicity data separately, the evidence base would change; however, we would not expect the overall conclusion to change. We also acknowledge the limitation of using scales like the Fitzpatrick scale to measure levels of skin pigmentation [59]. Such scales are criticised as being too blunt—an issue that impacts on the findings of this review and should be considered in future research.

Thirdly, we did not consider the differences between specific pulse oximeter devices, the differences between children and adults and their health conditions, or the differences between skin pigmentation measurement methods. Regarding the pulse oximeters evaluated, there may be differences between devices intended for use by health professionals in hospitals and those intended for home self-monitoring. These differences may explain the between-studies heterogeneity observed in the meta-analyses in this review (Table 2). However, we found that, across the devices evaluated and types of participants, included studies were largely consistent in suggesting overestimation of oxygen saturation by pulse oximetry. We therefore chose to pool study data without undertaking further subgroup analyses for these differences.

Fourthly, we only searched for English language peer-reviewed publications, without considering preprints. However, there are probably no major differences between summary treatment effects in English-language restricted meta-analyses and other language-inclusive meta-analyses [60], and the exclusion of non-English language publications from systematic reviews had no impact on overall findings [61]. We considered possible publication bias when assessing the certainty of evidence using the GRADE approach.

Finally, no available approach to risk of bias and GRADE assessment is specific to the topic of this review. We were only able to use the relevant approaches developed for the test accuracy topic, and the GRADE approach used was only applicable to assess the certainty of evidence for mean bias, rather than precision, Arms and limits of agreement.

Conclusions

Pulse oximetry may overestimate blood oxygen saturation levels for people with dark skin in hospital settings compared with gold standard SaO2 measures. The evidence for the measurement bias identified for other levels of skin pigmentation or ethnicities is more uncertain. Whilst the extent of measurement bias and overall accuracy meet current international thresholds, the variation of pulse oximetry measurements appears unacceptably wide. Such a small overestimation may be crucial for some patients: particularly at the threshold that informs clinical decision-making.

Availability of data and materials

All relevant data are within the manuscript and its Additional files. No additional data available.

Abbreviations

Arms:

Accuracy root-mean-square

FDA:

Food and Drug Administration

%O2Hb or FO2Hb:

Fractional saturation

PaO2:

Arterial oxygen pressure

PRISMA:

Preferred Reporting Items for Systematic Reviews and Meta-Analyses

RVE:

Robust variance estimation

SaO2:

Gold standard measure of blood oxygen saturation levels

SD:

Standard deviation

SpO2:

Blood oxygen saturation levels measured using pulse oximetry

SWiM:

Synthesis Without Meta-analysis in systematic reviews

References

  1. Swigris JJ, Zhou X, Wamboldt FS, Du Bois R, Keith R, Fischer A, et al. Exercise peripheral oxygen saturation (SpO2) accurately reflects arterial oxygen saturation (SaO2) and predicts mortality in systemic sclerosis. Thorax. 2009;64(7):626–30.

  2. NHS England. COVID Oximetry@home. 2022. https://www.england.nhs.uk/nhs-at-home/covid-oximetry-at-home/. Accessed 21 Apr 2022.

  3. World Health Organization (WHO). COVID-19 clinical management: living guideline (updated 25.1.21). 2021. https://apps.who.int/iris/handle/10665/338882. Accessed 21 Apr 2022.

  4. The Royal Australian College of General Practitioners. Managing COVID-19 at home with assistance from your general practice: a guide, action plan and symptom diary for patients. East Melbourne: RACGP; 2021.

  5. Luks AM, Swenson ER. Pulse oximetry for monitoring patients with COVID-19 at home: potential pitfalls and practical guidance. Ann Am Thorac Soc. 2020;17:1040–6.

  6. NHS Race and Health Observatory. Pulse oximetry and racial bias: recommendations for national healthcare, regulatory and research bodies. 2021. https://www.nhsrho.org/wp-content/uploads/2021/03/Pulse-oximetry-racial-bias-report.pdf. Accessed 21 Apr 2022.

  7. Sjoding MW, Dickson RP, Iwashyna TJ, Gay SE, Valley TS. Racial bias in pulse oximetry measurement. N Engl J Med. 2020;383(25):2477–8.

  8. Jensen LA, Onyskiw JE, Prasad NG. Meta-analysis of arterial oxygen saturation monitoring by pulse oximetry in adults. Heart Lung. 1998;27(6):387–408.

  9. Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372:n160.

  10. British Standards Institution (BSI). BS EN ISO 80601–2–61:2019: Medical electrical equipment – Part 2–61: Particular requirements for basic safety and essential performance of pulse oximeter equipment (ISO 80601–2–61:2017, Corrected version 2018–02). London: British Standards Institution; 2019.

  11. Toffaletti J, Zijlstra WG. Misconceptions in reporting oxygen saturation. Anesth Analg. 2007;105(6):S5-9.

  12. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.

  13. Wan X, Wang W, Liu J, Tong T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med Res Methodol. 2014;14(1):1–3.

  14. Hedges LV, Tipton E, Johnson MC. Robust variance estimation in meta-regression with dependent effect size estimates. Res Synth Methods. 2010;1(1):39–65.

  15. Pustejovsky JE, Tipton E. Meta-analysis with robust variance estimation: Expanding the range of working models. Prev Sci. 2021;7:1–4.

  16. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–10.

  17. Campbell M, McKenzie JE, Sowden A, Katikireddi SV, Brennan SE, Ellis S, et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ. 2020;368:l6890.

  18. Schünemann HJ, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, et al. GRADE guidelines: 21 part 1. Study design, risk of bias, and indirectness in rating the certainty across a body of evidence for test accuracy. J Clin Epidemiol. 2020;122:129–41.

  19. Schünemann HJ, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, et al. GRADE guidelines: 21 part 2. Test accuracy: inconsistency, imprecision, publication bias, and other domains for rating the certainty of evidence and presenting it in evidence profiles and summary of findings tables. J Clin Epidemiol. 2020;122:142–52.

  20. Wolfgang (https://stats.stackexchange.com/users/1934/wolfgang), Metafor package: bias and sensitivity diagnostics. 7 June 2015. https://stats.stackexchange.com/q/155875. Accessed 5 Jan 2022.

  21. Abrams GA, Sanders MK, Fallon MB. Utility of pulse oximetry in the detection of arterial hypoxemia in liver transplant candidates. Liver Transpl. 2002;8(4):391–6.

  22. Adler JN, Hughes LA, Vivilecchia R, Camargo CA Jr. Effect of skin pigmentation on pulse oximetry accuracy in the emergency department. Acad Emerg Med. 1998;5(10):965–70.

  23. Avant MG, Lowe N, Torres A Jr. Comparison of accuracy and signal consistency of two reusable pulse oximeter probes in critically ill children. Respir Care. 1997;42(7):698–704.

  24. Bickler PE, Feiner JR, Severinghaus JW. Effects of skin pigmentation on pulse oximeter accuracy at low saturation. Anesthesiology. 2005;102(4):715–9.

  25. Bothma PA, Joynt GM, Upman J, Hon H, Mathala B, Scribante J, et al. Accuracy of pulse oximetry in pigmented patients. S Afr Med J. 1996;86(5):8914569.

  26. [preprint] Brooks JC, Raman S, Gibbons K, et al. Transcutaneous oxygen saturation accuracy in critically ill children. Research Square 2020. DOI: https://doi.org/10.21203/rs.2.21938/v1.

  27. Ebmeier SJ, Barker M, Bacon M, Beasley RC, Bellomo R, Chong CK, et al. A two centre observational study of simultaneous pulse oximetry and arterial oxygen saturation recordings in intensive care unit patients. Anaesth Intensive Care. 2018;46(3):297–303.

  28. Escourrou PJ, Delaperche MF, Visseaux A. Reliability of pulse oximetry during exercise in pulmonary patients. Chest. 1990;97(3):635–8.

  29. Feiner JR, Severinghaus JW, Bickler PE. Dark skin decreases the accuracy of pulse oximeters at low oxygen saturation: the effects of oximeter probe type and gender. Anesth Analg. 2007;105(6):S18-23.

  30. Foglia E, Whyte R, Chaudhary A, Mott A, Chen J, Propert K, et al. Accuracy and precision of pulse oximetry in hypoxemic infants. J Pediatr. 2017;182:375–7.

  31. Gabrielczyk MR, Buist RJ. Pulse oximetry and postoperative hypothermia: an evaluation of the Nellcor N-100 in a cardiac surgical intensive care unit. Anaesthesia. 1988;43(5):402–4.

  32. Harris BU, Char DS, Feinstein JA, Verma A, Shiboski SC, Ramamoorthy C. Accuracy of pulse oximeters intended for hypoxemic pediatric patients. Pediatr Crit Care Med. 2016;17:315–20.

  33. Harris BU, Stewart S, Verma A, Hoen H, Stein ML, Wright G, et al. Accuracy of a portable pulse oximeter in monitoring hypoxemic infants with cyanotic heart disease. Cardiol Young. 2019;29(8):1025–9.

  34. Harskamp R, Bekker L, Himmelreich J, De Clercq L, Karregat EP, Sleeswijk ME, et al. Performance of popular pulse oximeters compared with simultaneous arterial oxygen saturation or clinical-grade pulse oximetry: a cross-sectional validation study in intensive care patients. BMJ Open Respir Res. 2021;8(1):e000939.

  35. Hinkelbein J, Genzwuerker HV, Sogl R, Fiedler F. Effect of nail polish on oxygen saturation determined by pulse oximetry in critically ill patients. Resuscitation. 2007;72(1):82–91.

  36. Hinkelbein J, Koehler H, Genzwuerker HV, Fiedler F. Artificial acrylic finger nails may alter pulse oximetry measurement. Resuscitation. 2007;74(1):75–82.

  37. Jubran A, Tobin MJ. Reliability of pulse oximetry in titrating supplemental oxygen therapy in ventilator-dependent patients. Chest. 1990;97:1420–5.

  38. Lee KH, Hui KP, Tan WC, Lim TK. Factors influencing pulse oximetry as compared to functional arterial saturation in multi-ethnic Singapore. Singapore Med J. 1993;34:385–7.

  39. McGovern JP, Sasse SA, Stansbury DW, Causing LA, Light RW. Comparison of oxygen saturation by pulse oximetry and co-oximetry during exercise testing in patients with COPD. Chest. 1996;109(5):1151–5.

  40. Muñoz X, Torres F, Sampol G, Rios J, Martí S, Escrich E. Accuracy and reliability of pulse oximetry at different arterial carbon dioxide pressure levels. Eur Respir J. 2008;32(4):1053–9.

  41. Pilcher J, Ploen L, McKinstry S, Bardsley G, Chien J, Howard L, et al. A multicentre prospective observational study comparing arterial blood gas values to those obtained by pulse oximeters used in adult patients attending Australian and New Zealand hospitals. BMC Pulm Med. 2020;20(1):1–9.

  42. Ploen L, Pilcher J, Beckert L, Swanney M, Beasley R. An investigation into the bias of pulse oximeters. Respirology. 2016;21(Suppl 2):6.

  43. Ries AL, Farrow JT, Clausen JL. Accuracy of two ear oximeters at rest and during exercise in pulmonary patients. Am Rev Respir Dis. 1985;132(3):685–9.

  44. Ries AL, Prewitt LM, Johnson JJ. Skin color and ear oximetry. Chest. 1989;96(2):287–90.

  45. Ross PA, Newth CJ, Khemani RG. Accuracy of pulse oximetry in children. Pediatrics. 2014;133(1):22–9.

  46. Schallom M, Prentice D, Sona C, Arroyo C, Mazuski J. Comparison of nasal and forehead oximetry accuracy and pressure injury in critically ill patients. Heart Lung. 2018;47(2):93–9.

  47. Smyth RJ, D’Urzo AD, Slutsky AS, Galko BM, Rebuck AS. Ear oximetry during combined hypoxia and exercise. J Appl Physiol. 1986;60(2):716–9.

  48. Stewart KG, Rowbottom SJ. Inaccuracy of pulse oximetry in patients with severe tricuspid regurgitation. Anaesthesia. 1991;46(8):668–70.

  49. Thrush D, Hodges MR. Accuracy of pulse oximetry during hypoxemia. South Med J. 1994;87(4):518–21.

  50. Valbuena VS, Barbaro RP, Claar D, Valley TS, Dickson RP, Gay SE, et al. Racial bias in pulse oximetry measurement among patients about to undergo extracorporeal membrane oxygenation in 2019–2020: a retrospective cohort study. Chest. 2022;161(4):971–8.

  51. Vesoulis Z, Tims A, Lodhi H, Lalos N, Whitehead H. Racial discrepancy in pulse oximeter accuracy in preterm infants. J Perinatol. 2021;42(1):79–85.

  52. Wiles MD, El-Nayal A, Elton G, Malaj M, Winterbottom J, Gillies C, et al. The effect of patient ethnicity on the accuracy of peripheral pulse oximetry in patients with COVID-19 pneumonitis: a single-centre, retrospective analysis. Anaesthesia. 2022;77(4):489–91.

  53. Zeballos RJ, Weisman IM. Reliability of noninvasive oximetry in black subjects during exercise and hypoxia. Am Rev Respir Dis. 1991;144:1240–4.

  54. Blanchet MA, Mercier G, Bouchard PA, Rousseau E, Lellouche F. Accuracy of pulse oximetry (SpO2) with different oximeters. Oxygap study. Intensive Care Med Exp. 2021;9(Suppl 1):51.

  55. International Organization for Standardization. Medical electrical equipment — Part 2–61: Particular requirements for basic safety and essential performance of pulse oximeter equipment (ISO 80601–2–61:2017). https://committee.iso.org/standard/67963.html. Accessed 21 Apr 2022.

  56. Food and Drug Administration. Pulse oximeters – premarket notification submissions: guidance for industry and food and drug administration staff. 2013. (https://www.fda.gov/downloads/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/UCM081352pdf. Accessed 10 Feb 2022).

  57. Wong AK, Charpignon M, Kim H, Josef C, de Hond AA, Fojas JJ, et al. Analysis of discrepancies between pulse oximetry and arterial oxygen saturation measurements by race and ethnicity and association with organ dysfunction and mortality. JAMA Netw Open. 2021;4(11):e2131674.

  58. Silver SE. Skin color is not the same thing as race. Arch Dermatol. 2004;140(3):361.

  59. Ware OR, Dawson JE, Shinohara MM, Taylor SC. Racial limitations of Fitzpatrick skin type. Cutis. 2020;105(2):77–80.

  60. Morrison A, Polisena J, Husereau D, Moulton K, Clark M, Fiander M, et al. The effect of English-language restriction on systematic review-based meta-analyses: a systematic review of empirical studies. Int J Technol Assess Health Care. 2012;28(2):138–44.

  61. Nussbaumer-Streit B, Klerings I, Dobrescu AI, Persad E, Stevens A, Garritty C, et al. Excluding non-English publications from evidence-syntheses did not change conclusions: a meta-epidemiological study. J Clin Epidemiol. 2020;118:42–54.

Acknowledgements

We thank the information specialist Catherine Harris for helping design search strategies and run electronic database searches in August 2021. We also thank Dr John Feiner and Dr Paul Young for sharing their study data and others who responded to our queries for clarifying study methods in deciding the eligibility of their studies.

Funding

The review is funded by the National Institute for Health Research Applied Research Collaboration (NIHR ARC) Greater Manchester and NIHR ARC North West Coast (ARC NWC). It is also supported by the National Institute for Health Research and the Accelerated Access Collaborative at NHS England and NHS Improvement. The views expressed in this article are those of the authors and not necessarily those of the National Institute for Health Research or the Department of Health and Social Care or NHS England and NHS Improvement.

Author information

Contributions

Conceptualisation: CS, MG, JD, AC, CLW, GG, CEL, NC. Data curation: CS. Investigation: CS, MG, JH, GN, OH. Formal analysis: CS, JD, AH. Funding acquisition: JD, CLW, NC. Methodology: CS, JD, JH, GN, AC, AH. Project administration: CS, JD, JH. Resources: JD, AC, GG. Software: CS. Supervision: JD, AC, CLW, NC. Validation: AH, JD, GN. Visualisation: CS. Writing—original draft: CS, JD, NC. Writing—review and editing: CS, MG, JD, JH, GN, OH, AC, AH, CLW, GG, CEL, PD, NC. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Chunhu Shi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

NC reports research grants from the National Institute for Health Research and the Accelerated Access Collaborative at NHS England and NHS Improvement and payments were made to the University of Manchester. PD is the National Deputy Medical Director, NIHR Clinical Research Network Coordinating Centre, UK, developing and delivering clinical research in the subject area and contributing to developing and implementing NIHR’s Equality, Diversity and Inclusion strategy. All other authors have declared that no competing interests exist.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1.

Measure, definition, and data formats to assess the accuracy of pulse oximetry compared with reference measures.

Additional file 2: Box S1.

The Ovid MEDLINE search strategy.

Additional file 3: Box S2.

Data items in the data extraction form.

Additional file 4: Box S3.

The QUADAS-2 tool used for assessing risk of bias and applicability with further explanations.

Additional file 5: Box S4.

Data synthesis methods and generic R codes used.

Additional file 6: Table S2.

Characteristics of the included studies.

Additional file 7: Table S3.

Types of pulse oximeters and CO-oximetry evaluated in the included studies.

Additional file 8: Table S4.

Mapping terms originally used for indicating skin pigmentation into low, medium or high level of skin pigmentation defined in the review for meta-analysis.

Additional file 9: Table S5.

Evidence from studies where skin pigmentation measures cannot be specified or grouped into low, medium, and/or high pigmentation.

Additional file 10: Figure S1.

Summary presentations of study sample sizes (n) and numbers of data pairs compared (N), accuracy root mean square (Arms), mean bias (SD) and limits of agreement (LoA) of pulse oximeters for the subgroup of high (dark) skin pigmentation.

Additional file 11: Figure S2.

Summary presentations of study sample sizes (n) and numbers of data pairs compared (N), accuracy root mean square (Arms), mean bias (SD) and limits of agreement (LoA) of pulse oximeters for the subgroup of medium skin pigmentation.

Additional file 12: Figure S3.

Summary presentations of study sample sizes (n) and numbers of data pairs compared (N), accuracy root mean square (Arms), mean bias (SD) and limits of agreement (LoA) of pulse oximeters for the subgroup of low (light) skin pigmentation.

Additional file 13: Table S6.

Summary of findings table for the impact of skin pigmentation and ethnicity on the accuracy of pulse oximetry compared with CO-oximetry.

Additional file 14: Figure S4.

Summary presentations of study sample sizes (n) and numbers of data pairs compared (N), accuracy root mean square (Arms), mean bias (SD) and limits of agreement (LoA) of pulse oximeters for levels of skin pigmentation by the different types of pulse oximeters.

Additional file 15: Table S7.

Evidence from studies that could not be included in quantitative data pooling for the ethnicity factor.

Additional file 16: Figure S5.

Summary presentations of study sample sizes (n) and numbers of data pairs compared (N), accuracy root mean square (Arms), mean bias (SD) and limits of agreement (LoA) of pulse oximeters for the subgroup of Black/African American ethnic groups.

Additional file 17: Figure S6.

Summary presentations of study sample sizes (n) and numbers of data pairs compared (N), accuracy root mean square (Arms), mean bias (SD) and limits of agreement (LoA) of pulse oximeters for the subgroup of non-Black, non-White ethnic groups.

Additional file 18: Figure S7.

Summary presentations of study sample sizes (n) and numbers of data pairs compared (N), accuracy root mean square (Arms), mean bias (SD) and limits of agreement (LoA) of pulse oximeters for the subgroup of White/Caucasian ethnic groups.

Additional file 19: Figure S8.

Summary presentations of study sample sizes (n) and numbers of data pairs compared (N), accuracy root mean square (Arms), mean bias (SD) and limits of agreement (LoA) of pulse oximeters for ethnic groups by the different types of pulse oximeters.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Shi, C., Goodall, M., Dumville, J. et al. The accuracy of pulse oximetry in measuring oxygen saturation by levels of skin pigmentation: a systematic review and meta-analysis. BMC Med 20, 267 (2022). https://doi.org/10.1186/s12916-022-02452-8

  • DOI: https://doi.org/10.1186/s12916-022-02452-8

Keywords

  • Pulse oximetry
  • Arterial blood oxygen saturation
  • Measurement bias
  • Skin pigmentation
  • Ethnicity
  • Systematic review