ECG interpretation is a core clinical skill that needs to be acquired during undergraduate medical education. This is the first study to compare the relative impact of different levels of teaching and different consequences of examinations on student performance of a clinical skill. Confirming our hypothesis, we found a strong association between summative examinations and better performance in the ECG exit examination, while teaching intensity did not predict student performance.
Comparison with other studies
In 2005, a survey of Clerkship Directors in Internal Medicine in the US revealed that the predominant instructional format for ECG interpretation was large-group teaching, with 75% of medical schools offering lectures to teach ECG reading skills. A number of studies have assessed the effect of different instructional formats on student ECG interpretation skills [22, 23]. Comparability of these studies is limited as different methods were used to measure student performance (for example, multiple choice tests, open questions), and most studies failed to report whether examinations were formative or summative. The available literature suggests that large-group teaching is more effective than no teaching. More recently, Mahler et al. reported that self-directed learning was inferior to lectures and workshops in promoting ECG interpretation skills. This resonates with our current findings, but that study did not allow any conclusions to be drawn regarding the effect of examination consequences on student performance. Moreover, it has not been assessed whether examination consequences have a moderating effect on the effectiveness of different levels of teaching intensity. To that end, we assessed the interaction between examination consequences and teaching intensity with regard to their effects on student performance and learning behavior and found no significant interaction for performance in the ECG exit exam or student learning time. In accordance with the unadjusted data presented in Figure 2, we found a significantly greater effect of examination consequences on the use of additional learning material in the context of peer teaching than in the context of SDL. It might be hypothesized that students in the SDL condition were less motivated to consult additional learning material, even in the face of a summative exam, than students experiencing the benefits of peer teaching. This hypothesis should be tested in future studies. However, the overall effect of examination consequences appeared to be independent of the effect of teaching intensity on student performance.
Our study provides some evidence that teaching format does affect learning behavior. As expected from underlying theory, small-group peer teaching was more effective in stimulating self-directed learning than lectures, and this finding is important with regard to preparing undergraduate medical students for lifelong learning in clinical medicine. In fact, an ANOVA using the percentage score of points achieved in the ECG exit exam (rather than the percentage of students correctly identifying ≥3 out of 5 diagnoses) as the dependent variable showed that higher teaching intensity was significantly associated with better exam performance, but that effect was much smaller than the effect of examination consequences on percentage score.
Taken together, our data suggest that identifying the 'ideal' teaching format might be futile if learning is not adequately incentivized by a summative assessment that is matched to the learning objective.
Owing to the dominance of psychometric theory during the second half of the 20th century, great emphasis was put on the numerical aspects of assessments in medical education. In contrast to this, assessments are now perceived as being at the heart of the educational design. In this regard, the paucity of research into the mechanisms by which assessments guide student learning is surprising, particularly in the light of the repeated calls for such research [9, 26]. The fact that, in our present study, a summative assessment was the only significant predictor of student performance, even after adjusting for motivation, calls into question the general notion that medical students' motivation to learn is mainly driven by the aspiration of becoming a 'good doctor'. It also contradicts the 'andragogy hypothesis', which states that adult learners are intrinsically motivated to learn because they acknowledge the relevance of the content taught to the professional activity for which they are training. While this hypothesis has already been challenged on theoretical grounds, we here provide data suggesting that summative examinations generate a strong extrinsic motivation to learn that may even override intrinsic motivation. Finally, it should be noted that medical students are a diverse population, and the impact of examination consequences and teaching format may vary greatly between individuals. This study was not designed to identify subgroups that benefit most from interactive teaching, but such research is clearly needed to help medical educators design curricula that are tailored to their students' needs. In addition, it would be interesting to assess how student experiences with different teaching formats gained in this study affect subsequent learning behavior (that is, students in the SDL condition who scored highly in the ECG exit exam might feel more confident to engage in SDL activities and become less dependent on didactic teaching).
Strengths and limitations of the study
The design of this study allowed the identification of predictors of student performance in a reliable test of ECG interpretation skills. Since production tests are regarded as superior to recognition tests, we used a written examination format and did not provide predefined answers. We enrolled over 500 undergraduate medical students and obtained complete data for over 94% of eligible participants, thus rendering any selection bias unlikely. All differences in baseline performance levels between the six groups were adjusted for in the multivariate analysis. In order to allow comparisons across groups, identical ECG examinations were used in all groups. We took great care to collect all test materials after each examination, and the marginally weaker performance of the final cohort suggests that these students did not have access to any examination materials, thus rendering contamination bias unlikely.
The trial was only partially randomized as ethical reasons prohibited randomizing students of the same cohort to either summative or formative examinations; this would have disadvantaged students who would not have been able to score additional credit points in the ECG exit examination. As the reference conditions of SDL and a formative assessment were only used in the final cohort, we cannot entirely rule out a potential historical threat to validity, as that cohort might have had different experiences from the other cohorts. However, as far as the baseline variables were concerned, there was no evidence of the final cohort being any different from the others.
Learning and performance in examinations have been shown to be case specific. The sampling used for the primary outcome of this study may have been insufficient; however, including more ECG tracings in the exit examination would have increased the time required to complete the test, thereby increasing the risk of higher dropout rates in study groups with a formative examination. In addition, reanalyzing the data using raw point scores did not change the results, suggesting that the approach used in our analysis was valid. Our study was conducted at one German medical school, and we only assessed one learning objective. Future research needs to determine whether our findings generalize across cognitive, practical and affective learning objectives, medical curricula and countries. Finally, we did not assess long-term retention of ECG interpretation skills. Given that the impact of problem-based learning on retention might only become apparent after longer periods of time, future studies should investigate the effect of examination consequences and teaching format during undergraduate medical education on performance in residency. However, control of confounding is particularly challenging in this type of study.