
Validity, reliability, feasibility and satisfaction of the Mini-Clinical Evaluation Exercise (Mini-CEX) for cardiology residency training.

Alves de Lima, A; Barrero, C; et al.
In: Medical Teacher, Vol. 29 (2007-10-01), Issue 8, pp. 785-90. DOI: 10.1080/01421590701352261

Introduction

Aims: The purpose of the study was to determine the validity, reliability, feasibility and satisfaction of the Mini-CEX. Methods and Results: From May 2003 to December 2004, 108 residents from 17 cardiology residency programs in Buenos Aires were monitored by the educational board of the Argentine Society of Cardiology. Validity was evaluated by the instrument's capability to discriminate between pre-existing levels of clinical seniority. For reliability, generalizability theory was used. Feasibility was defined by a minimum number of completed observations: 50% of the residents obtaining at least four Mini-CEXs. Satisfaction was evaluated on a one-to-nine rating scale from the evaluators' and residents' perspectives. The total number of encounters was 253. Regarding validity, the Mini-CEX was able to discriminate significantly between residents of different seniority. Reliability analysis indicated that a minimum of ten evaluations is necessary to produce a minimally reliable inference, but more are preferable. Feasibility was poor: 15% of the residents were evaluated four or more times during the study period. High satisfaction ratings from evaluators and residents were achieved. Conclusion: The Mini-CEX discriminates between pre-existing levels of seniority, requires considerable sampling to achieve sufficient reliability, and was not feasible within the current circumstances, but it was considered a valuable assessment tool, as indicated by the evaluators' and residents' satisfaction ratings.

The Mini-CEX has been designed to incorporate the skills that residents require in actual patient encounters as well as in the educational interactions they routinely have with attending physicians during teaching rounds (Norcini et al. [18]; Norcini et al. [19]; Holmboe et al. 1998; Norcini et al. [20]). A single faculty member observes and evaluates a resident while that resident conducts a focused history and physical examination in an inpatient, outpatient, or emergency department setting. After asking the resident for a diagnostic and treatment plan, the faculty member completes a short evaluation form and gives the resident feedback (Holmboe [11]). The Mini-CEX is a performance-based method for assessing selected clinical competencies (e.g. history-taking and physical examination, as well as communication and interpersonal skills) in the medical training context; it is intended to evaluate candidates at the 'does' level, that is, in real-life settings rather than in simulated situations (Miller [15]). As the interaction is relatively brief and occurs as a natural part of the training environment, each individual can be evaluated on several occasions and by various faculty members.

Assessment constitutes the most vital factor influencing student learning behavior (Newble et al. 1983; Newble et al. [17]; Van der Vleuten [24]). When students see that recall of factual information is the predominant requirement of the examination system, they tend to adopt a rote-learning or surface approach; if examiners wish to assess students at the 'does' level, however, they must evaluate the students' habitual performance in daily practice (van der Vleuten [25]).

The major purpose of this study is to determine whether the Mini-CEX, applied in a broad range of clinical settings and in a large number of cardiology residency programs, achieves adequate levels of validity, reliability and feasibility, together with satisfactory ratings from residents and teachers.

Practice points

The Mini-CEX has been designed to incorporate the skills that residents require in both actual patient encounters and in the educational interactions that they routinely encounter with attending physicians during teaching rounds.

Regarding validity, the Mini-CEX was clearly able to discriminate between pre-existing levels of global competency among residents.

The Mini-CEX has insufficient reproducibility when only a few evaluations are sampled.

The Mini-CEX achieved high satisfaction ratings from both evaluators and residents.

Regarding feasibility, the required number of encounters proved difficult to achieve.

Methods

For each Mini-CEX, a single faculty member observed and evaluated the resident while the latter conducted a history and physical examination on an inpatient, an outpatient, or a patient in the emergency department. After asking the resident for a diagnosis and treatment plan, the faculty member completed a short evaluation form and gave direct feedback. All formal Mini-CEX evaluation data were collected on a one-page form that was identical across all study sites (Appendix 1). The form had previously been translated and transculturally adapted into Spanish.

Research subjects were cardiology residents from 17 cardiology training programs in Buenos Aires, Argentina. All the programs are affiliated with the University of Buenos Aires and consist of a four-year training period. The total number of residents across the programs was 118. All the program directors were invited to participate; the Mini-CEX was a completely new assessment strategy for all of them. Participation was voluntary and no incentives were provided. Written instructions on applying the format were distributed; they called for at least four encounters per resident during the 19-month study period. There were no sanctions for failing to participate, but all directors agreed to take part. The assessment was used as a maximum-performance but formative evaluation; results were not used in evaluating residents for promotion.

Statistical analysis

Validity was evaluated by the ability of the Mini-CEX to discriminate between pre-existing levels of expertise. It was expected that significant differences in mean resident performance would be found between different years of training. The descriptive data were expressed as means and standard deviations. To test significance across expertise groups, the non-parametric Mann-Whitney test was used. A value of p < 0.05 was considered statistically significant.
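As an illustration only (the paper does not publish its analysis code), a comparison of this kind between two expertise groups can be run with SciPy's Mann-Whitney U test; the rating values below are hypothetical:

```python
# Illustration with made-up data -- not the authors' analysis code.
from scipy.stats import mannwhitneyu

# Hypothetical global-competency ratings for two expertise groups;
# the study compared residents across all four years of training.
first_year = [7.0, 7.2, 6.8, 7.5, 7.1, 7.3]
fourth_year = [8.1, 8.4, 7.9, 8.3, 8.0, 8.2]

u_stat, p_value = mannwhitneyu(first_year, fourth_year, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")  # p < 0.05 counts as significant
```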

To evaluate reliability, generalizability theory was used (Brennan [3]). An ANOVA was carried out with Year-of-training denoted (Y), Residents-within-Year (P:Y) and Evaluations-within-Residents-within-Year (E:P:Y), and variance components were estimated using the URGenova program. Since there might be significant growth in mean ratings throughout the years, the variance associated with year (Y) was estimated separately to arrive at a less biased estimate of the variance between trainees. Separate evaluations of a single trainee could be done either by the same examiner or by a different one; this might have led to an underestimate of the variance across evaluations (intra-rater variability is probably smaller than inter-rater variability). Two indices of reliability were estimated from the variance components: Dependability Coefficients (D) and Standard Errors of Measurement (SEM), both as a function of the number of evaluations. The D-coefficient can be interpreted as a reliability coefficient, i.e. the expected correlation with another set of the indicated number of evaluations, using evaluators and patients drawn at random. The SEM is an estimate of the standard error and can be used to construct confidence intervals around the score of an individual resident on the original scoring scale. For a 95% confidence interval the SEM is multiplied by 1.96 (the z-score bounding the central 95% of the normal distribution). For a reliable inference at a resolution of one scale unit, the 95% confidence interval must span no more than one unit, so the SEM should be below 0.26 (0.5/1.96).
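The formulas behind these two indices are not spelled out in the text. As a sketch, under the standard G-theory treatment of this nested design, with \sigma^2_{P:Y} taken as the universe-score variance and \sigma^2_{E:P:Y} as the sole error component, the indices for n evaluations would be:

D(n) = \frac{\sigma^2_{P:Y}}{\sigma^2_{P:Y} + \sigma^2_{E:P:Y}/n}, \qquad \mathrm{SEM}(n) = \sqrt{\frac{\sigma^2_{E:P:Y}}{n}}, \qquad \mathrm{CI}_{95\%} = \text{score} \pm 1.96 \cdot \mathrm{SEM}(n)

These expressions reproduce the D and SEM values reported in Table 2 in the Results section.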

Feasibility was defined, following the American Board of Internal Medicine's guidelines for Mini-CEX implementation, as a minimum of four Mini-CEXs per resident on average (American Board of Internal Medicine [2]).

Satisfaction with the Mini-CEX format was evaluated from both the evaluators' and the residents' perspectives, with ratings given on a nine-point scale.

Results

From May 2003 to December 2004, 253 Mini-CEX encounters were carried out; these constituted the basis of the analysis. 108 residents and 53 evaluators from 17 cardiology residency programs participated in the study. Each resident underwent between one and seven evaluations (mean 2.34): 13.7% of the residents were in their first year, 34.8% in their second, 41.2% in their third and 10.3% in their fourth year of residency. Each evaluator conducted between 1 and 21 evaluations (mean 4.77). Of the 253 encounters, 52% occurred in the coronary care unit, 30% in the step-down care unit, 6% in the emergency room, 6% in the ambulatory care unit and 6% in the cardiovascular intensive care unit. The overall competence ratings were similar in all settings. Forty-one percent of the encounters represented the first visit of a patient to a particular resident and 59% were return visits. The mean total Mini-CEX time was 42.77 minutes (SD 19.97). Splitting the total time into assessment and feedback periods, the assessment period took a mean of 25.80 minutes (SD 11.95, range 5-65) and the feedback period 17.31 minutes (SD 11.28, range 5-65). The patients' problems or diagnoses were specified by the evaluator and covered a broad range of cardiology problems such as AMI, cardiac failure, unstable angina, atrial fibrillation, valvular disease and post-CABG. The mean ratings given by the 53 evaluators are reported in Table 1.

Table 1.  Mean Ratings given by all evaluators by year of training

Domain | 1st year | 2nd year | 3rd year | 4th year | P
Communication | 7.16 (±0.64) | 7.57 (±0.81) | 7.57 (±0.92) | 8.00 (±0.88) | 0.002
Physical exam | 7.12 (±0.84) | 7.48 (±0.93) | 7.59 (±0.96) | 8.16 (±0.91) | 0.0006
Professionalism | 7.64 (±0.75) | 7.82 (±0.84) | 7.83 (±1.00) | 8.20 (±0.93) | 0.079
Clinical judgement | 7.43 (±0.71) | 7.56 (±0.86) | 7.88 (±0.90) | 8.20 (±0.93) | 0.0004
Counselling | 7.43 (±0.77) | 7.44 (±0.84) | 7.59 (±1.11) | 8.12 (±0.90) | 0.01
Organisation | 7.32 (±0.94) | 7.54 (±0.88) | 7.68 (±1.01) | 8.12 (±0.90) | 0.008
Global competency | 7.19 (±0.74) | 7.51 (±0.82) | 7.76 (±0.86) | 8.16 (±0.91) | 0.0008

Validity analysis

Validity was evaluated by examining whether the instrument was capable of discriminating between pre-existing levels of clinical seniority. The Mini-CEX discriminated between pre-existing levels of global competency among residents: first-year residents 7.19 (SD 0.74), second 7.51 (SD 0.82), third 7.76 (SD 0.86) and fourth-year residents 8.16 (SD 0.91); this difference reached statistical significance (P = 0.0008; Table 1).

Reliability analysis

The generalizability analysis yielded variance components for Y, P:Y and E:P:Y of 0.1643 (19.88% of total variance), 0.0482 (5.83%) and 0.614 (74.29%), respectively. Using the SEM benchmark of 0.26, a minimum of 10 evaluations was necessary to produce a minimally reliable inference (Table 2). This corresponds to a D-coefficient of 0.44.

Table 2.  G-coefficients and SEMs reported as a function of the sample size of evaluations

Number of evaluations | G | SEM
1 | 0.07 | 0.78
2 | 0.14 | 0.55
5 | 0.28 | 0.35
10 | 0.44 | 0.25
15 | 0.54 | 0.20
30 | 0.70 | 0.14
50 | 0.80 | 0.11

G: generalizability coefficient; SEM: standard error of measurement.
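As a check, the following short Python sketch (our reconstruction, not the authors' URGenova run) reproduces Table 2 from the two variance components reported above, assuming D(n) = Vp/(Vp + Ve/n) and SEM(n) = sqrt(Ve/n):

```python
# Reconstruction of Table 2 from the reported variance components
# (an assumption-based sketch, not the authors' URGenova output).
import math

V_P = 0.0482   # Residents-within-Year (P:Y): universe-score variance
V_E = 0.6140   # Evaluations-within-Residents-within-Year (E:P:Y): error variance

for n in (1, 2, 5, 10, 15, 30, 50):
    d = V_P / (V_P + V_E / n)
    sem = math.sqrt(V_E / n)
    print(f"n = {n:2d}: D = {d:.2f}, SEM = {sem:.2f}")

# n = 10 gives D = 0.44 and SEM = 0.25 (just under the 0.26 benchmark);
# D reaches the 0.80 threshold only at n = 50, matching Table 2.
```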

Feasibility analysis

The feasibility analysis showed that only 14.81% of the cohort was evaluated four or more times during the study period.

Satisfaction analysis

The residents (108) were generally satisfied with the Mini-CEX format; their ratings ranged from 5 to 9 (mean 8.08 ± 0.83). The satisfaction rating was 8.1 for first-year residents, 7.8 for second-year, 8.1 for third-year and 8.5 for fourth-year residents. The evaluators were also satisfied with the format; their ratings ranged from 6 to 9 (mean 8.06 ± 0.74).

Discussion

The purpose of the study was to report logistic and psychometric data for the Mini-CEX format. Regarding validity, the Mini-CEX was clearly able to discriminate between pre-existing levels of global competency among residents; it had insufficient reproducibility when only a few evaluations were sampled; and it achieved high satisfaction ratings from both evaluators and residents. Regarding feasibility, it proved impossible to achieve the required number of encounters.

Several issues may explain this difficulty. The Mini-CEX was a new assessment tool, never applied before in this environment, and we developed only written instructions; in vivo faculty training programs for the Mini-CEX might have been preferable. Regarding reliability, the variance components for Y, P:Y and E:P:Y show that one-fifth of the variance can be attributed to growth in competence throughout the years. This is desirable, given that the Mini-CEX should offer information on growth towards a final level of competence and should be able to discriminate across years of training. Only approximately 6% of total variance is related to resident (or person) variance. Since the instrument is designed to discriminate among residents, this constitutes desirable (true-score or universe-score) variance, and the larger it is the better. As in many competence and performance assessments, however, this component is relatively small (van der Vleuten [23]).

About three-quarters of the variance is associated with differences between examiners/evaluation occasions and residual error. Using the SEM benchmark of 0.26, a minimum of ten evaluations is necessary to produce a minimally reliable inference. The D-coefficient for ten observations was, however, rather low (0.44). Reliability coefficients of 0.80 or higher are generally accepted as a threshold for high-stakes judgements, such as the registration of a doctor for licensure (Crossley et al. [5]). In our dataset this was achieved with 50 observations.

The required number of encounters derived from this study is more demanding than that reported in other studies. Norcini et al. ([20]) concluded that ten or more encounters produced relatively tight confidence intervals and that increasing the number of encounters beyond that produced only small gains in consistency. Carline et al. ([4]) concluded that seven independent ratings would be necessary to judge overall clinical performance. Similar results have been reported by Kreiter and colleagues (more than eight ratings needed), Kwolek (7-8 ratings), Ramsey (11 ratings), Violato (10 ratings) and Kroboth (6-10 ratings) (Kroboth et al. [13]; Ramsey et al. [21]; Kwolek et al. [14]; Violato et al. [26]; Kreiter et al. [12]; Durning et al. [6]). All reports agree that somewhere between seven and 11 ratings are necessary to achieve a generalizable global estimate of competence when ratings are based on a non-systematic sample of observations (Williams et al. [27]). Although there has been only limited research on the component skills included under the broad category of clinical competence, it is reasonable to expect that these abilities develop at different rates and may differ in their stability across situations. This work suggests that different numbers of observations will be required to establish stable estimates of competence in the various clinical competence areas (Williams et al. [27]).

We have not analysed inter-rater reliability, since we took a time-sampling perspective in which each observation is but one observation in a longer time framework, and we analysed reliability across these observations. Rater variance as well as performance variability across observations is therefore part of our reliability estimate (although we cannot partition out the two sources of variance). If only one observation were used for a Mini-CEX assessment, it would be important to judge inter-rater reliability; our data, unfortunately, do not provide this information.

Consistent with previous work, examiners were satisfied with the format (Williams et al. [27]).

Limitations of the study

The number of residents participating in our study was relatively small, and this group may not be fully representative of broader populations.

We are aware that improvement of performance across years of training is a weak form of construct validity. However, it is a fundamental one: the absence of performance differences across expertise groups would be detrimental to the construct validity of the instrument. Since this was not the case, we take our findings as a first indication of validity. Further studies of construct validity should be the next step; studies examining incremental validity over existing, more standardized performance instruments would provide compelling construct validity evidence.

The assessment used was a maximum-performance but formative evaluation. Given that it assigned no grades or certifications, this could have seriously affected the fellows' perceptions and influenced their satisfaction ratings.

The reliability analysis carried out here used the usual assumption of local independence between the repeated measurement moments (one measurement is not influenced by another). This assumption is clearly violated in Mini-CEX studies, including ours: every evaluation is explicitly meant to provide feedback and to change the performance of the person being assessed, so the Mini-CEX evaluations are not independent of each other. This is a general problem in the literature on performance measures that are dispersed in time and used for formative purposes (Alves de Lima et al. [1]).

Conclusion

The direct observation of residents' behaviour is essential to assess clinical skills (Holmboe et al. [11]; Holmboe et al. [9]; Schuwirth & Van der Vleuten [22]). For decades, clinical supervisors have taken at face value the veracity of the history and physical examination presented on inpatient and outpatient rounds without ever directly observing how the trainee actually performed them (Holmboe [11]).

The major challenge ahead for medical educators is to ensure that they themselves have not only strong clinical skills, but also the skills needed to effectively observe, evaluate and provide constructive feedback to trainees. To this end, the Mini-CEX ensures direct observation of residents, and feedback from different faculty members, across a broad range of patient problems and settings. Furthermore, as this study demonstrated, feasibility is an issue and application strategies should be reinforced. We do not think that the Mini-CEX requires any modification in itself, but it should never be used as the sole assessment tool. Direct observation of trainees in the clinical setting can be connected to other exercises that trainees perform after their encounters with patients, such as oral case presentations, written exercises that assess clinical reasoning, and literature searches. In addition, reviewing videos of patient encounters offers a powerful means of evaluating and providing feedback on trainees' skills in clinical interaction (Epstein [7]). The method can only succeed if it becomes part of the clinical routine of both clinician and resident.

Appendix 1: The Mini-CEX form

[Figure: the one-page Mini-CEX evaluation form]

References

1. Alves de Lima A, Henquin R, Thierer J, Paulin J, Lamari S, Belcastro F, van der Vleuten CP. A qualitative study of impact on learning of the mini clinical evaluation exercise in postgraduate training. Med Teach 2005; 27: 46-52.
2. American Board of Internal Medicine. The Mini-CEX: A quality tool in evaluation. http://www.abim.org/minicex/default.htm (accessed 24 Oct 2005).
3. Brennan RL. Generalizability Theory. New York: Springer-Verlag; 2001.
4. Carline JD, Paauw DS, Thiede KW, Ramsey PG. Factors affecting the reliability of ratings of students' clinical skills in a medicine clerkship. J Gen Intern Med 1992; 7: 506-510.
5. Crossley J, Davies H, Humphris G, Jolly B. Generalisability: a key to unlock professional assessment. Med Educ 2002; 36: 972-978.
6. Durning SJ, Cation LJ, Markert RJ, Pangaro LN. Assessing the reliability and validity of the mini-clinical evaluation exercise for internal medicine residency training. Acad Med 2002; 77: 900-904.
7. Epstein R. Assessment in medical education. N Engl J Med 2007; 356: 387-396.
8. Holmboe ES, Hawkins RE. Methods for evaluating the clinical competence of residents in internal medicine: a review. Ann Intern Med 1998; 129: 42-48.
9. Holmboe ES, Hawkins RE, Huot SJ. Effects of training in direct observation of medical residents' clinical competence: a randomized trial. Ann Intern Med 2004; 140: 874-881.
10. Holmboe ES. Faculty and the observation of trainees' clinical skills: problems and opportunities. Acad Med 2004; 79: 16-22.
11. Holmboe ES, Yepes M, Williams F, Huot SJ. Feedback and the mini clinical evaluation exercise. J Gen Intern Med 2004; 19: 558-561.
12. Kreiter CD, Ferguson K, Lee WC, Brennan RL, Densen P. A generalizability study of a new standardized rating form used to evaluate students' clinical clerkship performances. Acad Med 1998; 73: 1294-1298.
13. Kroboth FJ, Hanusa BH, Parker S, Coulehan JL, Kapoor WN, Brown FH, Karpf M, Levey GS. The inter-rater reliability and internal consistency of a clinical evaluation exercise. J Gen Intern Med 1992; 7: 174-179.
14. Kwolek CJ, Donnelly MB, Sloan DA, Birrell SN, Strodel WE, Schwartz RW. Ward evaluations: should they be abandoned? J Surg Res 1997; 69: 1-6.
15. Miller GE. The assessment of clinical skills/competence/performance. Acad Med 1990; 65: S63-67.
16. Newble DI, Jaeger K. The effect of assessments and examinations on the learning of medical students. Med Educ 1983; 17: 165-171.
17. Newble DI, Hejka EJ, Whelan G. The approaches to learning of specialist physicians. Med Educ 1990; 24: 101-119.
18. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical evaluation exercise): a preliminary investigation. Ann Intern Med 1995; 123: 795-799.
19. Norcini JJ, Blank LL, Arnold GK, Kimball HR. Examiner differences in the mini-CEX. Adv Health Sci Educ Theory Pract 1997; 2: 27-33.
20. Norcini JJ, Blank LL, Duffy FD, Fortna GS. The mini-CEX: a method for assessing clinical skills. Ann Intern Med 2003; 138: 476-481.
21. Ramsey PG, Wenrich MD, Carline JD, Inui TS, Larson EB, LoGerfo JP. Use of peer ratings to evaluate physician performance. JAMA 1993; 269: 1655-1660.
22. Schuwirth LW, Van der Vleuten CP. A plea for new psychometric models in educational assessment. Med Educ 2006; 40: 296-300.
23. Van der Vleuten C. The assessment of professional competence: developments, research and practical implications. Adv Health Sci Educ Theory Pract 1996; 1: 41-67.
24. Van der Vleuten C. Validity of final examinations in undergraduate medical training. BMJ 2000; 321: 1217-1219.
25. Van der Vleuten C, Schuwirth L. Assessment of professional competence: from methods to programmes. Med Educ 2005; 39: 309-317.
26. Violato C, Marini A, Toews J, Lockyer J, Fidler H. Feasibility and psychometric properties of using peers, consulting physicians, co-workers, and patients to assess physicians. Acad Med 1997; 72: S82-84.
27. Williams RG, Klamen DA, McGaghie WC. Cognitive, social and environmental sources of bias in clinical performance ratings. Teach Learn Med 2003; 15: 270-292.

By Alberto Alves de Lima; Carlos Barrero; Sergio Baratta; Yanina Castillo Costa; Guillermo Bortman; Justo Carabajales; Diego Conde; Amanda Galli; Graciela Degrange and Cees van der Vleuten


ALBERTO ALVES DE LIMA, MD, MHPE is a cardiologist, Head of the Education Department at the Instituto Cardiovascular de Buenos Aires, Buenos Aires, Argentina, and a member of the educational department of the Argentine Society of Cardiology.

CARLOS BARRERO, MD is a cardiologist and a member of the educational department of the Argentine Society of Cardiology.

SERGIO BARATTA, MD is a cardiologist and a member of the educational department of the Argentine Society of Cardiology.

YANINA CASTILLO COSTA, MD is a cardiologist and a member of the educational department of the Argentine Society of Cardiology.

GUILLERMO BORTMAN, MD is a cardiologist and a member of the educational department of the Argentine Society of Cardiology.

JUSTO CARABAJALES, MD is a cardiologist and a member of the educational department of the Argentine Society of Cardiology.

DIEGO CONDE, MD is a cardiologist and a member of the Education Department at the Instituto Cardiovascular de Buenos Aires, Buenos Aires, Argentina.

AMANDA GALLI is an educational psychologist and a member of the Education Department of the Argentine Society of Cardiology.

GRACIELA DEGRANGE, MD is a cardiologist and a member of the educational department of the Argentine Society of Cardiology.

CEES VAN DER VLEUTEN, Professor, is a psychologist and Chair of the Department of Educational Development and Research, Maastricht University, The Netherlands.
