Performance in the MRCP(UK) Examination 2003–4: analysis of pass rates of UK graduates in relation to self-declared ethnicity and gender
© Dewhurst et al; licensee BioMed Central Ltd. 2007
Received: 01 November 2006
Accepted: 03 May 2007
Published: 03 May 2007
Male students and students from ethnic minorities have been reported to underperform in undergraduate medical examinations. We examined the effects of ethnicity and gender on pass rates in UK medical graduates sitting the Membership of the Royal Colleges of Physicians in the United Kingdom [MRCP(UK)] Examination in 2003–4.
Pass rates for each part of the examination were analysed for differences between graduate groupings based on self-declared ethnicity and gender.
All candidates declared their gender, and 84–90% declared their ethnicity. In all three parts of the examination, white candidates performed better than other ethnic groups (P < 0.001). In the MRCP(UK) Part 1 and Part 2 Written Examinations, there was no significant difference in pass rate between male and female graduates, nor was there any interaction between gender and ethnicity. In the Part 2 Clinical Examination (Practical Assessment of Clinical Examination Skills, PACES), women performed better than did men (P < 0.001). Non-white men performed more poorly than expected, relative to white men or non-white women. Analysis of individual station marks showed significant interaction between candidate and examiner ethnicity for performance on communication skills (P = 0.011), but not on clinical skills (P = 0.176). Analysis of overall average marks showed no interaction between candidate gender and the number of assessments made by female examiners (P = 0.151).
The cause of these differences is most likely to be multifactorial, but cannot be readily explained in terms of previous educational experience or differential performance on particular parts of the examination. Potential examiner prejudice, significant only in the cases where there were two non-white examiners and the candidate was non-white, might indicate different cultural interpretations of the judgements being made.
The Membership of the Royal Colleges of Physicians in the United Kingdom [MRCP(UK)] Examination is a three-part examination providing summative assessment of the knowledge and clinical skills required of trainee physicians before they undertake higher training in internal medicine and/or a medical specialty. The Part 1 and Part 2 Written Examinations are criterion-referenced, single-version, computer-marked papers. The Part 2 Clinical Examination (Practical Assessment of Clinical Examination Skills; PACES) assesses trainees against an agreed standard of competence in all aspects of clinical consultation. It consists of 14 assessments by 10 examiners at five stations: two communication stations [stations 2 (history-taking) and 4 (communication skills and ethics)] and three clinical skills stations [stations 1 (respiratory and abdominal systems), 3 (cardiovascular and central nervous systems) and 5 (skin, locomotor, endocrine system and eye)].
Ethnic minority and male students may underperform in undergraduate [1–4] and postgraduate medical examinations, particularly if they have graduated from non-UK medical schools [5, 6]. The aim of this study was to assess the effects of ethnicity and gender on pass rates for UK medical graduates in the MRCP(UK) Examination sat in the UK in 2003–4. In the Part 2 Clinical Examination (PACES), we also examined the potential for interaction between the ethnicity and gender of examiners and candidates.
Candidates volunteered gender and ethnicity using 14 ethnic categories approved by the UK Commission for Racial Equality. Candidates who did not self-declare were subsequently invited to do so by letter. Ethnicity was grouped into eight categories: Afro-Caribbean (Black-African, Black-Caribbean, and Black-Other), Asian sub-continent (Indian, Pakistani, Bangladeshi and Asian-Other), Far East (Chinese/Chinese British and Malay), Middle Eastern (Arabic and Other Middle Eastern), Mixed, White, Other and Unknown (consisting of candidates who did not declare). Examiners declared gender and ethnicity using the same categories.
The results were analysed using SPSS software (version 13.0; SPSS Inc., Chicago, IL, USA). A chi-squared test was initially employed to determine any overall differences between ethnic group categories. Logistic regression was used to test differences in pass rates by ethnic group and gender, and is reported for each part of the examination. Repeated measures analysis of variance (ANOVA) was used to investigate differential performance across stations. The data were then analysed to identify any interaction between candidate and examiner ethnicity and between candidate and examiner gender.
MRCP(UK) Part 1 Examination
Table 1: Pass rates in the MRCP(UK) Examination by ethnic group. Columns: MRCP(UK) Part 1 Examination; MRCP(UK) Part 2 Written Examination; MRCP(UK) Part 2 Clinical Examination*; each reporting the overall pass rate.
A similar analysis restricted to 3100 first-attempt candidates (Table 1) showed a difference in pass rate between the eight groups (χ2 = 57.39, df = 7, P < 0.001), a difference between the seven groups with known ethnicity (χ2 = 56.95, df = 6, P < 0.001), and a highly significant difference between white and combined other groups (χ2 = 53.15, P < 0.001), but no significant difference between the six non-white groups (χ2 = 3.91, df = 5, P = 0.55).
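The white versus combined non-white comparison above is a chi-squared test on a 2 × 2 table of passes and fails. The following is a minimal sketch of that calculation, not the SPSS procedure the authors ran: the function name `chi2_2x2` and the counts are hypothetical (the per-group counts from Table 1 are not reproduced here), and it assumes no continuity correction.

```python
import math


def chi2_2x2(pass_a, fail_a, pass_b, fail_b):
    """Pearson chi-squared statistic and p-value for a 2 x 2 table (df = 1).

    Assumption: no continuity correction is applied.
    """
    observed = [[pass_a, fail_a], [pass_b, fail_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [pass_a + pass_b, fail_a + fail_b]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected

    # With df = 1, the upper tail of the chi-squared distribution
    # equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for illustration only; NOT the study data.
stat, p = chi2_2x2(pass_a=500, fail_a=500, pass_b=380, fail_b=620)
print(f"chi2 = {stat:.2f}, df = 1, P = {p:.2g}")
```

Comparisons across more than two groups (df = 5, 6 or 7 above) use the same observed-versus-expected sum over a larger table, but the p-value then needs the general chi-squared distribution rather than the df = 1 shortcut used here.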
White candidates with an overall pass rate of 50.3% [95% confidence interval (CI) 48.6–52.0%] performed significantly better than did candidates from other groups (pass rate 37.9%; 95% CI 35.7–40.1%). There were no significant differences between other groups.
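The confidence intervals quoted for pass rates are consistent with a normal-approximation (Wald) interval for a proportion. The sketch below assumes that method, which the text does not state, and uses a hypothetical number of attempts, because the denominator for the white group is not given in this section.

```python
import math


def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical denominator chosen for illustration; not taken from the paper.
n_attempts = 3300
low, high = wald_ci(0.503, n_attempts)
print(f"pass rate 50.3%, 95% CI {100 * low:.1f} to {100 * high:.1f}%")
```

With this assumed denominator the interval comes out close to the reported 48.6–52.0%; a different group size would widen or narrow it accordingly.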
Data were then analysed by logistic regression, with passing or failing as the dependent variable. Predictor variables were gender (male versus female), attempt number (linear and quadratic effects), and ethnicity (white versus non-white). Preliminary analysis of all candidates who had declared their ethnicity showed that the quadratic effect of attempt was not significant (P = 0.18), and it was excluded from the model. There was a highly significant linear effect of attempt (b = -0.19, Wald χ2 = 82.39, P < 0.001), with an odds ratio of 0.8 (95% CI 0.79–0.86) for each additional attempt. There was no effect of gender (b = -0.052, Wald χ2 = 0.83, df = 1, P = 0.36). Ethnicity was highly significant (Wald χ2 = 58.70, df = 1, P < 0.001), with white candidates being 1.58 times (95% CI 1.41–1.78) more likely to pass.
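The odds ratios and confidence intervals reported for the logistic regression can be recovered from the coefficient b and its Wald statistic. This is a sketch under the assumption that the Wald χ2 on 1 df equals (b / SE)², which is the usual definition; the function name `odds_ratio_with_ci` is illustrative and not part of any statistics package.

```python
import math


def odds_ratio_with_ci(b, wald_chi2, z=1.96):
    """Odds ratio and 95% CI from a logistic coefficient and its Wald chi-squared.

    Assumes wald_chi2 = (b / SE) ** 2 with df = 1, so SE = |b| / sqrt(wald_chi2).
    """
    se = abs(b) / math.sqrt(wald_chi2)
    odds_ratio = math.exp(b)
    return odds_ratio, math.exp(b - z * se), math.exp(b + z * se)


# Values reported in the text for the linear effect of attempt number (Part 1).
or_attempt, low, high = odds_ratio_with_ci(b=-0.19, wald_chi2=82.39)
print(f"OR = {or_attempt:.2f} (95% CI {low:.2f} to {high:.2f})")
```

For the attempt effect this gives an odds ratio of about 0.83 with limits of about 0.79 and 0.86, in line with the interval quoted above.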
MRCP(UK) Part 2 Written Examination
In total, 2718 graduates made 3238 attempts, 1548 (47.8%) by men. Of 2718 candidates, 2389 (87.9%) declared ethnic origin, i.e., the ethnicity of candidates at 2811 of 3238 (86.8%) attempts was known. Table 1 shows the pass rates in the eight groups. Differences between groups were highly significant (χ2 = 79.02, df = 7, P < 0.001). Excluding the group that had not declared ethnicity, differences were still significant (χ2 = 45.23, df = 6, P < 0.001), with the white group having the highest pass rate. Comparison of the white group with all other groups combined showed a highly significant difference (χ2 = 39.81, df = 1, P < 0.001). Comparison of the six non-white groups showed no significant differences between groups (χ2 = 4.43, df = 5, P = 0.49).
A similar set of analyses restricted to 2494 first-attempt candidates (Table 1) showed a difference in pass rate between the eight groups (χ2 = 45.91, df = 7, P < 0.001), a difference between the seven groups for whom ethnicity was known (χ2 = 43.47, df = 6, P < 0.001), and a highly significant difference between white and combined other groups (χ2 = 33.78, P < 0.001), but no significant difference between the six non-white groups (χ2 = 7.53, df = 5, P = 0.18).
White candidates performed significantly better (pass rate 83.1%, 95% CI 81.4–84.8%) than candidates from other groups (pass rate 72.8%, 95% CI 69.9–75.7%). There were no significant differences between other ethnic groups.
A preliminary logistic regression of candidates who had declared their ethnicity showed that the quadratic effect of attempt was not significant (P = 0.680), and it was excluded from the model. There was a highly significant linear effect of attempt (b = -0.456, Wald χ2 = 51.12, P < 0.001), with an odds ratio of 0.634 (95% CI 0.56–0.72) for each additional attempt. There was no effect of gender (b = -0.104, Wald χ2 = 1.160, df = 1, P = 0.28). The ethnicity effect was highly significant (Wald χ2 = 30.98, df = 1, P < 0.001), white candidates being 1.73 times (95% CI 1.43–2.1) more likely to pass after taking into account gender and attempt number.
MRCP(UK) Part 2 Clinical Examination (PACES)
In total, 2353 graduates made 3008 attempts, with 1541 (51.2%) made by men. Of 2353 candidates, 1988 (84.5%) declared their ethnic origin, i.e., the ethnicity of candidates at 2528 of 3008 (84.0%) attempts was known. Table 1 shows the pass rates in the eight groups. Differences between the groups were highly significant (χ2 = 82.32, df = 7, P < 0.001). Excluding the group that had not declared ethnicity, differences were still significant (χ2 = 69.16, df = 6, P < 0.001), with the white group having the highest pass rate. Comparison of the white group with all other groups combined showed a highly significant difference (χ2 = 61.89, df = 1, P < 0.001). Comparison of the six non-white groups showed no significant differences between the groups (χ2 = 6.31, df = 5, P = 0.28).
A similar analysis restricted to the 2140 first-attempt candidates (Table 1) showed a difference in pass rate between the eight groups (χ2 = 52.39, df = 7, P < 0.001), a difference between the seven groups for whom ethnicity was known (χ2 = 51.95, df = 6, P < 0.001), and a highly significant difference between white and combined other groups (χ2 = 45.40, P < 0.001), but no difference between the six non-white groups (χ2 = 5.61, df = 5, P = 0.35).
Overall, whites (pass rate of 75.5%; 95% CI 73.5–77.5%) performed significantly better than candidates from other groups (pass rate 60.3%; 95% CI 57.0–63.6%), and there were no significant differences between other ethnic groups.
Table 2: Interaction of gender and ethnicity in the pass rates of candidates taking the MRCP(UK) Part 2 Clinical Examination (Practical Assessment of Clinical Examination Skills; PACES)
The ethnicity effect was also highly significant (b = 0.679, Wald χ2 = 53.97, df = 1, P < 0.001), with white candidates being 1.973 times (95% CI 1.65–2.37) more likely to pass after taking into account gender and attempt number. A separate analysis assessed the possibility of gender × ethnicity interaction, which was found to be significant (Wald χ2 = 5.51, P = 0.019). Non-white male trainees performed more poorly than expected, relative to white male trainees or non-white female trainees (Table 2).
Overall, the station × ethnicity interaction was almost significant (F(6,11190) = 1.94, P = 0.071), but there was no suggestion of station × gender or station × ethnicity × gender interaction (P = 0.908 and P = 0.540 respectively). Station × ethnicity interaction was explored in a series of subanalyses. Comparison of performance on clinical skills stations with communication stations showed significant station type × ethnicity interaction (F(1,1865) = 4.60, P = 0.032). Analysis of clinical skills assessments alone showed no evidence of any interaction of clinical skills with ethnicity (P = 0.442) or gender (P = 0.772). However, analysis of the two communication stations showed significant station × ethnicity interaction (F(1,1865) = 3.96, P = 0.047), with no evidence of gender × station or gender × ethnicity × station interactions (P = 0.812 and P = 0.403 respectively). Inspection of Figure 1 shows that non-whites underperformed on history-taking to a similar extent to their underperformance on clinical skills, but that they also performed disproportionately poorly at the communication skills and ethics station.
As performance in PACES could depend not only on the gender and ethnicity of candidates but also on the gender and ethnicity of examiners, this aspect was analysed. Ethnicity and gender of examiners were known in 97.7% and 100% of cases respectively. Candidates are allocated at random to examiners, and analysis confirmed no statistical association between the gender or ethnicity of candidates and that of examiners.
In total, 1869 first-attempt candidates received a total of 26166 assessments (14 each). There were 2289 (8.8%) assessments by female examiners, with candidates having a mean of 1.23 such assessments. There were 3761 assessments (14.4%) by non-white examiners, with candidates having a mean of 2.01 such assessments.
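The assessment totals can be checked against the PACES format described earlier. This short calculation assumes every first-attempt candidate received all 14 assessments; the variable names are illustrative.

```python
# Counts taken from the text; 14 assessments per candidate is the PACES format
# described earlier (assumed here to apply to every candidate).
candidates = 1869
assessments_per_candidate = 14
total_assessments = candidates * assessments_per_candidate

by_female_examiners = 2289
by_nonwhite_examiners = 3761

for label, count in [("female", by_female_examiners),
                     ("non-white", by_nonwhite_examiners)]:
    share = 100 * count / total_assessments   # percentage of all assessments
    per_candidate = count / candidates        # mean assessments per candidate
    print(f"{label}: {share:.1f}% of {total_assessments}, "
          f"mean {per_candidate:.2f} per candidate")
```

The total comes to 1869 × 14 = 26166, and the non-white examiner figures reproduce the reported 14.4% and 2.01. The female examiner count gives about 8.7% and 1.22, marginally below the reported 8.8% and 1.23, which may reflect rounding or a small difference in the underlying count.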
Statistical analysis is complicated as the 14 assessments for each candidate are not independent. The primary analysis therefore used multiple regression to assess whether there was interaction between candidate's ethnicity (or gender) and the linear trend of the number of assessments made by non-white (or female) examiners, after taking candidate ethnicity, candidate gender, and their interaction into account. The procedure is seen most readily in Figure 1, which analyses the overall average mark (1 = clear fail; 2 = fail; 3 = pass; 4 = clear pass) of candidates according to ethnicity and number of assessments by non-white examiners. The interaction between candidate ethnicity and examiner ethnicity was almost significant (F(1,1861) = 3.474, P = 0.063), suggesting that the fitted lines in Figure 1 are probably not parallel, and that the relative difference between white and non-white candidates diminishes as the number of assessments by non-white examiners increases.
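The idea of testing whether the fitted lines are parallel can be illustrated with a simplified sketch. When a regression contains a group term, a trend term and their product, the interaction coefficient equals the difference between the two groups' separately fitted slopes. The code below computes only that difference; it omits candidate gender, the gender × ethnicity term and the F test used in the study, and the data and function names are hypothetical.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def interaction_coefficient(records):
    """Difference in slopes (non-white minus white candidates).

    records: tuples of (is_nonwhite_candidate, n_assessments_by_nonwhite_examiners,
    overall_average_mark). Zero means the two fitted lines are parallel.
    """
    slopes = {}
    for group in (0, 1):
        xs = [x for g, x, _ in records if g == group]
        ys = [y for g, _, y in records if g == group]
        slopes[group] = slope(xs, ys)
    return slopes[1] - slopes[0]


# Hypothetical marks on the 1 (clear fail) to 4 (clear pass) scale; NOT study data.
records = [
    (0, 0, 3.00), (0, 2, 2.98), (0, 4, 2.96),
    (1, 0, 2.80), (1, 2, 2.86), (1, 4, 2.92),
]
print(f"interaction (slope difference) = {interaction_coefficient(records):.3f}")
```

A positive value in this toy data means the gap between the two groups narrows as the number of assessments by non-white examiners increases, which is the pattern the paragraph above describes.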
Interaction of examiner and candidate gender was assessed by the statistical approach used for ethnicity. Analysis of overall average mark showed no interaction of candidate gender and the number of assessments made by female examiners (F(1,1861) = 2.068, P = 0.151). Analysis of the average mark on clinical skills stations showed no interaction between candidate gender and number of clinical skills assessments made by female examiners (F(1,1861) = 2.471, P = 0.116). Neither did average mark on communication stations show an interaction between candidate gender and the number of assessments made by female examiners (F(1,1861) = 0.183, P = 0.669).
Applications from non-white ethnic groups to UK medical schools are increasing. Relatively poor performance by ethnic minority students has been reported in the year 3 objective structured clinical examinations (OSCE) [2, 4] and in OSCE stations assessing communication skills in final examinations. McManus et al. identified poorer performance by ethnic minorities across multiple assessment modalities in final examinations, concluding that differences identified could not be explained by previous examination achievement, study habits, examination style or clinical experience. Male and female UK-educated Asian students, using English as their first language, performed less well than their white European peers in OSCE and written assessments. An Australian study also identified poorer outcomes in finals for Indian, Asian and Middle Eastern students compared with those from Australia, New Zealand, North America and Western Europe. Place of birth, schooling and preclinical undergraduate medical education could influence outcomes. At the time of data collection, we did not routinely collect data on place of birth or first language. However, as a result of updating the Colleges' policy on equality and diversity, we have recently expanded our database to include this.
Our study reveals that white candidates achieved the highest pass rates in all three parts of the MRCP(UK) Examination and it seems likely that trends already observed by others in undergraduate examinations continue through into the "high-stakes" postgraduate arena. The hypothesis that poorer achievement results from either overt or covert discrimination by examiners cannot be sustained for the MRCP(UK) Written Examinations, which are computer-marked multiple-choice papers.
One possible explanation may be that cultural differences in the perceived status of a medical career have resulted in non-white candidates making exceptional efforts to gain entrance into medical school – efforts that were unsustainable in the long term, resulting in regression to the mean. Another possibility is that for cultural reasons the best of the non-white graduates were attracted to specialties not requiring MRCP(UK), such as surgery or psychiatry, while medicine attracted the best of the white candidates. Further research looking at other postgraduate examinations would be needed to substantiate this.
Undergraduate examination success is more likely for female students, and although there were no overall gender differences in pass rates in the written examinations, women performed significantly better in PACES. In North American Clinical Skills Assessments, Rothman et al found significant gender differences in 9 of 23 clinical skills stations; in 8, these differences favoured women, and similar differences have also been identified in the Educational Commission for Foreign Medical Graduates' Clinical Skills Assessments. In a communication skills OSCE-style assessment in general practice, women performed better, which could be related to specific traits including "the ability to listen" and a greater sense of "patient care values". In addition, female practitioners may find it easier to develop co-operative approaches to doctor-patient interactions. Thus, it seems probable that in any postgraduate medical examination, female candidates will perform better at assessments involving consultation and communication.
Analysis of overall average marks showed no interaction between candidate gender and the number of assessments made by female examiners, in keeping with the analysis by Ringdahl et al, which failed to demonstrate gender bias from senior residents and faculty members in rating family-practice interns.
Although female candidates performed better on PACES as a whole, there was no evidence that they performed particularly well on communication; rather they performed better to an equal extent on all stations. Likewise, non-white candidates performed relatively poorly on both examination skills and communication, with the sole exception that they performed particularly poorly on the communication skills and ethics station. This differential performance between ethnic minority UK graduates and white UK graduates has also been identified in PACES revision courses.
Performance of non-white male trainees was particularly poor across all sections of the examination. This cannot be explained readily in terms of generally poorer communicative ability, as their relative performance on the history-taking station was equivalent to that in clinical skills stations. As all candidates in this study graduated in the UK, the command and comprehension of English should not be a factor. The relative underperformance on the communication skills and ethics station may represent, however, a specific problem of cross-cultural interpretation or understanding.
Clinical examinations generate much interest in examiner fairness. In PACES, individual examiner bias is minimised by using objective rather than subjective criteria ("anchor statements"), offering candidates of both sexes equal opportunity to demonstrate competence. Examiners are advised to follow the same line of questioning for each candidate-surrogate interaction, minimising any potential for bias in individual encounters.
A review of MRCP(UK) examiner performance has shown non-white examiners to have a higher stringency score, but analysis of the joint effect of examiner ethnicity and candidate ethnicity shows a significant interaction. More detailed analysis shows that the effect is primarily occurring in the "talking stations", and there is no evidence of interaction on clinical skills stations. Any simplistic explanation in terms of examiner prejudice can be excluded, as bias would also be expected to be evident in clinical skills stations. The effect is statistically significant in the communication stations, but only, it seems, in cases where two non-white examiners meet a non-white candidate. This might reflect different cultural interpretations of judgements being made, particularly when communication skills and ethics are being assessed.
Roberts et al highlighted the problems for ethnic minority candidates in a conventional oral examination in the MRCGP examination. They postulated that candidates' styles of communication could be at odds with that of white examiners, with examiners switching between styles of discourse, leading to the potential for misunderstandings. Thus, when two non-white examiners encounter a non-white candidate, the style of discourse may be more consistent, resulting in an opportunity for inadvertent positive bias.
Our study has identified significant variations in pass rates for UK graduates based on their self-declared ethnicity and, in the clinical examination, gender. The cause of these differences is most likely to be multifactorial, but cannot be readily explained in terms of previous educational experience or in terms of differential performance on particular parts of the examination.
Taken overall, these detailed analyses suggest that any effects of examiner and candidate concordance or discordance of ethnicity are very small and restricted to a subset of the communication stations, and are absent on clinical skills stations. That the effect of ethnicity is not primarily an effect of bias is supported by the presence of a similar size of effect in the computer-marked Part 1 and Part 2 Written Examinations. The reasons for a significant joint effect of examiner ethnicity and candidate ethnicity are not clear, but are unlikely to include conscious or unconscious bias on the part of examiners. The findings merit a more detailed analysis of station score, candidate and examiner ethnicity and scenario topic and content. When communication skills and ethics are being assessed, different cultural interpretations may be made.
1. McManus IC, Richards P, Winder BC, Sproston KA: Final examination performance of medical students from ethnic minorities. Med Educ. 1996, 30: 195-200.
2. Lumb AB, Vail A: Comparison of academic, application form and social factors in predicting early performance on the medical course. Med Educ. 2004, 38: 1002-5. 10.1111/j.1365-2929.2004.01912.x.
3. Dillner L: Manchester tackles failure rate of Asian students. BMJ. 1995, 310: 209.
4. Haq I, Higham J, Morris R, Dacre J: Effect of ethnicity and gender on performance in undergraduate medical examinations. Med Educ. 2005, 39: 1126-28. 10.1111/j.1365-2929.2005.02319.x.
5. Wakeford R, Farooqi A, Rashid A, Southgate L: Does the MRCGP examination discriminate against Asian doctors?. BMJ. 1992, 305: 92-94.
6. Tyrer SP, Leung W-C, Smalls J, Katona C: The relationship between medical school of training, age, gender and success in the MRCPsych examinations. Psychiatr Bull R Coll Psychiatr. 2002, 26: 257-63. 10.1192/pb.26.7.257.
7. Bedi R, Gilthorpe MS: Ethnic and gender variations in university applicants to United Kingdom medical and dental schools. Br Dent J. 2000, 189: 212-15. 10.1038/sj.bdj.4800725a.
8. Wass V, Roberts C, Hoogenboom R, Jones R, Van der Vleuten C: Effect of ethnicity on performance in a final objective structured clinical examination: qualitative and quantitative study. BMJ. 2003, 326: 800-803. 10.1136/bmj.326.7393.800.
9. Liddell MJ, Koritsas S: Effect of medical students' ethnicity on their attitudes towards consultation skills and final year examination performance. Med Educ. 2004, 38: 187-98. 10.1111/j.1365-2923.2004.01753.x.
10. Acheson AG: Do male medical students face prejudice?. Lancet. 1997, 350: 964. 10.1016/S0140-6736(05)63312-0.
11. Rothman AI, Cohen R, Ross J, Poldre P, Dawson B: Station gender bias in a multiple-station test of clinical skills. Acad Med. 1995, 70: 42-46. 10.1097/00001888-199501000-00012.
12. Van Zanten M, Boulet JR, McKinley DW: Correlates of performance of the ECFMG Clinical Skills Assessment: influences of candidate characteristics on performance. Acad Med. 2003, 78: S72-S74. 10.1097/00001888-200310001-00023.
13. Wiskin CM, Allan TF, Skelton JR: Gender as a variable in the assessment of final year degree-level communication skills. Med Educ. 2004, 38: 129-37. 10.1111/j.1365-2923.2004.01746.x.
14. Clack GB, Head JO: Gender differences in medical graduates' assessment of their personal attributes. Med Educ. 1999, 33: 101-5. 10.1046/j.1365-2923.1999.00268.x.
15. Zaharias G, Piterman L, Liddell M: Doctors and patients: gender interaction in the consultation. Acad Med. 2004, 79: 148-55. 10.1097/00001888-200402000-00011.
16. Skelton JR, Hobbs FD: Descriptive study of cooperative language in primary care consultations by male and female doctors. BMJ. 1999, 318: 576-79.
17. Ringdahl EN, Delzell JE, Kruse RL: Evaluation of interns by senior residents and faculty: is there any difference?. Med Educ. 2004, 38: 646-51. 10.1111/j.1365-2929.2004.01832.x.
18. Bessant R, Bessant D, Chesser A, Coakley G: Analysis of predictors of success in the MRCP(UK) PACES examination in candidates attending a revision course. Postgrad Med J. 2006, 82: 145-9. 10.1136/pmj.2005.035998.
19. McManus IC, Thompson M, Mollon J: Assessment of examiner leniency and stringency ('hawk-dove effect') in the MRCP(UK) clinical examination (PACES): using multi-facet Rasch modelling. BMC Medical Education. 2006, 6: 42. 10.1186/1472-6920-6-42.
20. Roberts C, Sarangi S, Southgate L, Wakeford R, Wass V: Oral examinations-equal opportunities, ethnicity, and fairness in the MRCGP. BMJ. 2000, 320: 370-5. 10.1136/bmj.320.7231.370.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1741-7015/5/8/prepub