
The pursuit of fairness in assessment: Looking beyond the objective

Nyoli Valentine, Steven J. Durning, Ernst Michael Shanahan, Cees van der Vleuten and Lambert Schuwirth

Medical Teacher, Vol. 44 (2022), pp. 353-359. DOI: 10.1080/0142159x.2022.2031943

Health professions education has undergone significant changes over the last few decades, including the rise of competency-based medical education, a shift to authentic workplace-based assessments, and an increased emphasis on programmes of assessment. Despite these changes, there is still a commonly held assumption that objectivity always leads to, and is the only way to achieve, fairness in assessment. However, there are well-documented limitations to using objectivity as the 'gold standard' against which assessments are judged. Fairness, on the other hand, is a fundamental quality of assessment and a principle that almost no one contests. Taking a step back and changing perspectives to focus on fairness in assessment may help re-set a traditional objective approach and identify an equal role for subjective human judgement in assessment alongside objective methods. This paper explores fairness as a fundamental quality of assessments. This approach legitimises human judgement and shared subjectivity in assessment decisions alongside objective methods. Widening the answer to the question 'What is fair assessment?' to include not only objectivity but also expert human judgement and shared subjectivity can add significant value in ensuring learners are better equipped to be the health professionals required of the 21st century.

Keywords: Clinical; postgraduate; undergraduate; general; assessment

Introduction

Fourteen years ago, Schuwirth and van der Vleuten made a plea for new psychometric models in educational assessment (Schuwirth and Van der Vleuten [57]).

Practice points

  • Objective approaches remain a dominant discourse in assessment.
  • There are limitations to using objectivity as the 'gold standard' against which assessments are judged.
  • Within the literature, there is an increasing push to better utilise human judgement and subjectivity in assessment.
  • Changing perspectives to focus on the fundamental underlying value of fairness in assessment may help re-set a traditional objective approach.

Their paper argued 'Assessment should be fair, honest and defensible...but the strict operationalisation of these values is – in our humble opinion – currently of limited value' (Schuwirth and Van der Vleuten [57]). They made an appeal for a major revision of statistical concepts and approaches to assessment, and for the development of a new model that better fits current assessment developments (Schuwirth and Van der Vleuten [57]). Indeed, around the turn of the century, many changes were made to medical education assessment. Competency-based medical education became the dominant approach to medical education in many countries (Ten Cate [59]). With this, the role of the doctor was redefined to include features that had previously not been emphasised, and learners were certified based on outcomes rather than inputs (Ten Cate and Billett [60]). Assessment of clinical competence moved from written assessments back into the authentic context of the workplace, and individual assessments made way for programmes of assessment (Dauphinee [11]; Van der Vleuten and Schuwirth [67]; Valentine and Schuwirth [65]). More recently, competencies have been translated into professional tasks which a learner is entrusted to complete independently (Ten Cate and Scheele [62]).

Throughout these times of change, objective approaches have remained a dominant discourse in assessment, with many seeing objectivity as the 'gold standard' against which assessments should be judged (Van der Vleuten et al. [66]; Govaerts and Van der Vleuten [30]; Ten Cate and Regehr [61]; Valentine and Schuwirth [65]).

More recently, there has been an increasing push in the literature to better utilise the role of human judgement and subjectivity in assessment (Jones [38]; Schuwirth and Van der Vleuten [57]; Govaerts and Van der Vleuten [30]; Hodges [37]; Gingerich et al. [25]; Bacon et al. [4]; Rotthoff [56]; Ten Cate and Regehr [61]), and in 2020 the Ottawa consensus statement on performance assessment specifically called for assessment programmes to 're-instate expert judgement' (Boursicot [6]).

Perhaps with the benefit of a decade and a half of hindsight, we could say this 2006 paper (Schuwirth and Van der Vleuten [57]) did not go quite far enough, as it was still looking for new psychometric methods with an 'objective' mindset. Objectivity, which De Groot defined as 'judgement without interference or even potential interference of personal opinions, preferences, modes of observation, views, interests or sentiments' (Ten Cate and Regehr [61]), has frequently become synonymous with fairness and is often used to determine the quality of the assessment. Workplace-based assessments, designed to assess authentic performance, are often judged using a quantitative psychometric framework and therefore criticised for not meeting validity and reliability criteria (Govaerts and Van der Vleuten [30]). However, an exclusive focus on traditional psychometric approaches can disregard key issues of competence, performance, and assessment in complex workplace settings (Govaerts and Van der Vleuten [30]). Taking a step back and changing perspectives to focus on the fundamental underlying value of fairness in assessment may help re-set the traditional objective approach.

Fairness is a fundamental quality of assessment and a general principle that no one contests (Green et al. [33]; Tierney [63]; General Medical Council [24]). However, fairness is not a simple construct that is easy to define, nor has it been conclusively described in the literature. There is no simple, all-encompassing consensus definition of fairness. Fairness has been associated with a wide range of assessment-related qualities, such as being equitable, consistent, balanced, useful, and ethically feasible. This breadth demonstrates that fairness in assessment is a comprehensive term and not something that can be reduced to a number, determined dichotomously, or captured in a straightforward process (Tierney [63], [64]). More recently, Shaw and Nisbet considered fair assessment in the light of COVID-19 and similarly identified multiple challenges to consider in defining fairness (Shaw and Nisbet [58]). Simply equating fairness with objectivity can reduce a complex, diverse, multi-dimensional, context-dependent construct to a single linear, likely non-representative rule. If the lens of improvement remains on optimising objectivity, then the focus is on better psychometric techniques; but if we return to the principle of fairness, then the focus becomes much wider.

Objective

In this paper, we seek to focus more broadly on fairness as a fundamental quality of assessment, not through a traditional systematic review but by synthesising and linking literature, identifying established knowledge and perspectives, highlighting gaps in understanding, and providing direction on what remains to be understood (Eva [20]). We look to challenge the often-held assumption that objectivity always leads to, and is the only way to achieve, fairness in assessment, and suggest that subjective human judgement has a legitimate place alongside objectivity in fair assessment. Two arguments will be put forward to challenge this assumption: firstly, that the contention that objectivity can comprehensively assess the complexity of clinical practice is a fallacy, and secondly, that true objectivity in assessment is unobtainable. Finally, from the collation of perspectives in the literature, we will suggest that focusing on fairness rather than objectivity ensures that expert judgement and shared subjectivity can be seen as at least equal to, and used in combination with, objectivity. We will also discuss opportunities for future research through a fairness lens.

Limitations of objectivity

Whilst there are many benefits to objectivity in assessment, it can lead to a naïve trust in linear causality, a reliance on reproducible, quantitative, and psychometric data, and reification. During the Vietnam War, Robert McNamara, the US Secretary of Defense, quantified the war effort into metrics such as body counts and territory gained. As a former president of the Ford Motor Company, McNamara had applied objective, quantified metrics to improve production lines with great success. However, war is a complex, non-linear, and largely unpredictable process, and the approach of reducing complex human processes to body counts and territory gained failed because it ignored the actions of highly motivated people and the chaos and destruction of war (McNamara [44]). This gave rise to the McNamara fallacy, a term coined by Yankelovich (Figure 1) (Yankelovich [72]).

Figure 1. The McNamara fallacy.

Medicine, like war, is also complex, non-linear, and, to a certain extent, unpredictable. Complex systems are characterised as having multiple, dynamic components interacting in non-linear and unpredictable ways, where the whole system is more than the sum of its parts (Reed et al. [55]). Health professionals work with complex problems presented in a variety of ways and across multiple dimensions of human experience (biological, psychological, social, spiritual). Treatment decisions are often made in the face of uncertainty, as every patient responds differently to the array of therapeutic options (Kaldjian [39]). Furthermore, society expects health professionals not only to have knowledge of established and evolving diseases, but also to demonstrate interpersonal and communication skills, apply ethical principles, and so on (Norcini et al. [49]).

McNamara was blindsided by the data, convinced the USA was winning the war despite his commanders telling him the exact opposite (Carmody [7]). Similarly, within clinical practice, equating 'quality' with strict adherence to guidelines or protocols overlooks the evidence on the more sophisticated processes of expertise (Greenhalgh et al. [34]). With regard to assessment, any effort to prioritise and quantify one aspect of trainees' qualities, such as knowledge, will inevitably reduce the emphasis on other aspects which might be deemed important (Eva [21]). It could be argued that objectivity can actually reduce fairness because it measures only what can be captured as a quantitative value. This is unfair to learners with broader skills and unfair to society, which highly values unquantifiable competencies, such as compassion, kindness, and courage, in its health care professionals (O'Mahony [52]; Wayne et al. [70]). These, along with other skills that are not easily quantified, such as communication, collaboration, and professionalism, are often the ones most needed within our health care systems (Frank et al. [23]). Such reductionist approaches may also carry the risk of negatively impacting student learning behaviour. Cilliers et al. demonstrated that the effects of assessment on student learning are complex: overreliance on 'objectivity' and quantitative results was perceived as punitive and unfair, and encouraged students to direct their learning activities towards passing assessments rather than towards becoming good clinicians (Cilliers et al. [8]; Cilliers et al. [9]).

There are also further limitations to the use of objectivity in assessment. Assessment is always an evaluative process and therefore subjective. In the late 20th century, medical education assessments moved back into the authentic context of the workplace to help ease the tension between what is being measured and what should be measured (Rotthoff [56]; Boulet and Durning [5]). However, in an attempt to remain true to the paradigm of objectivity, assessments such as the objective structured clinical examination (OSCE) were designed to minimise human judgement as much as possible, in the belief that this would improve fairness (Norcini et al. [51]; Gingerich et al. [25]; Ten Cate and Regehr [61]; Valentine and Schuwirth [65]). Objectivity, from the positivist perspective that has played a prominent role in health professions education, suggests that for each item being measured a 'true' score exists and that any deviation from this true score is measurement error (Ten Cate and Regehr [61]). However, even an 'objective' multiple-choice examination involves a series of judgements by experts: what topics should be included, the choice of questions, specific wordings, decisions about pass scores, and so on (Norcini and Shea [50]; Ten Cate and Regehr [61]; Valentine and Schuwirth [65]). And, as other authors have noted, these judgements are rarely unanimous, often requiring complex negotiation between experts, which, based on De Groot's definition, is far from objective (Ten Cate and Regehr [61]).

The same can be said of all quantitative measurement scales. Downie and Macnaughton stated that 'if the underlying purpose of questionnaires and measurement scales is to avoid the need for judgement then it does not succeed' (Downie and Macnaughton [15]). Judgement is required in deciding what questions to ask, what numbers to assign, and how to interpret final scores. Judgements can be dangerous when professionals are unaware they are making them and believe themselves to be 'objective' (Downie and Macnaughton [15]). Within the health professions education literature, it has been assumed that expert practitioners have a shared understanding of what competency-based assessment is and of the criteria for a competent performance (Apramian et al. [1]). However, assessors have repeatedly been shown to have different interpretations of individual performances (Apramian et al. [3]; Ten Cate and Regehr [61]), to have different perceptions of whether a performance upholds competence principles (Bacon et al. [4]; Apramian et al. [1]), and to make different inferences about knowledge, skills, and attitudes which cannot be directly observed (Kogan et al. [40]; Gingerich et al. [25]). Furthermore, it has been noted that assessors attach varying importance to different aspects of assessments or clinical procedures (Kogan et al. [40]; Apramian et al. [2], [3]), as well as attaching importance to factors outside of competency frameworks (Oudkerk Pool et al. [53]). Assessors draw from multiple frames of reference (Kogan et al. [41]; Kogan et al. [40]; Yeates et al. [73]; Bacon et al. [4]; Apramian et al. [1]) and use various methods to synthesise judgements into numerical ratings (Kogan et al. [40]).

Moreover, the complexity of the task and the context of the work environment also influence assessment decisions (Gingerich et al. [26]). Returning to McNamara, The Economist observed that 'he was haunted by the thought that amid all the objective-setting and evaluating, the careful counting and the cost-benefit analysis, stood ordinary human beings. They behaved unpredictably' (O'Mahony [52]). Workplace-based assessment occurs in environments where people are free to act in ways that are not predictable, and where people's actions are interconnected, so that one person's actions change the context for other people (Plsek and Greenhalgh [54]; Mennin [45]; Greenhalgh and Papoutsi [35]). For example, no one can predict what a patient will say in three minutes' time, so there can be no protocol for assessing the learner taking the history. In their 2006 paper, Schuwirth and van der Vleuten noted: 'We dismiss variance between observers as an error because we start from the assumption that the universe is homogeneous, where in fact the more logical conclusion would have been that the universe is more variant' (Schuwirth and Van der Vleuten [57]).

This difficulty in obtaining agreement and a 'single truth' is not surprising, because decision-making is idiosyncratic and individual (Durning et al. [18]; Bacon et al. [4]). The psychology and cognitive science literature notes that decision-making processes are highly complex and subject to multiple influences, and that no single theory of learning or performance can fully represent the underlying mechanisms. Furthermore, human working memory is thought to hold only approximately seven information elements at a time and to actively process no more than two to four elements at a time (Young et al. [74]). To overcome this limited working memory, information is rearranged and connected to pre-existing knowledge frameworks (schemas) activated from long-term memory (Moulton et al. [46]; Young et al. [74]). These pre-existing schemas are used to judge the 'new' information being observed and influence both the judgements that are reached and the assessor's recall of what occurred. In assessment, assessors are active information processors who recognise, select, and interpret relevant information, and who integrate this information using their past experiences as well as their understandings of their social, cultural, and contextual surroundings to form impressions and assign ratings (Gingerich et al. [26]; Govaerts et al. [31]; Govaerts and Van der Vleuten [30]). Previously, assessor variability has been framed primarily as a 'training issue', with the belief that the assessor is trainable. However, widespread assessor agreement, improved test psychometrics, and reduced measurement 'error' have remained elusive despite extensive efforts at faculty training (Newble et al. [48]; Downing [16]; Gingerich et al. [26]; Gingerich et al. [25]). Delandshere and Petrosky noted: 'Judges' values, experiences, and interests are what makes them capable of interpreting complex performances, but it will never be possible to eliminate those attributes that make them different, even with extensive training and "calibration"' (Delandshere and Petrosky [12]).

One common attempt to overcome this variability has been simply to exclude outliers to ensure a common perspective and perceived reliability among the remaining raters (Newble et al. [48]). However, as noted by other authors, this approach does not exclude subjectivity; at best it masks subjectivity behind a constructed consensus (Ten Cate and Regehr [61]), and it perhaps eliminates those outliers who are unwilling to modify their assessment despite fear of unpleasant repercussions (Kogan et al. [40]). The development of the OSCE approach (Harden and Gleeson [36]) illustrates how this assumed assessor-related 'error' can be reduced through procedure rather than through 'objectifying' the judgement itself. Although standardisation was initially prescribed through checklists, a sampling framework was also developed (having the candidate rotate from examiner to examiner) which accepted variability between assessors. The narrative at that time was psychometric, but through the current lens this can be seen as a procedural approach to ensuring fairness.

Moving beyond 'bias free objectivity' in the pursuit of fairness

Pursuing fairness in assessment through objectivity has many benefits, but it also has limitations. Firstly, it overlooks the complexity of clinical and educational practice and the wide variety of skills demanded of health professionals. Secondly, despite multiple efforts over several decades at both internal solutions (such as faculty training) and external solutions (such as structured forms), researchers have not satisfactorily managed to achieve 'bias-free objectivity'.

The call to move away from an over-reliance on the objectivity paradigm has echoed throughout the literature for several decades now. Gingerich and colleagues have suggested that perhaps the time has come to acknowledge that a 'single truth' does not exist and to consider an alternative conception of rater 'error' (Gingerich et al. [25]), and in 1999 Jones wrote that it is time 'to acknowledge that the role of a professional vocational educator is to make educational judgements' (Jones [38]). These views have been echoed by other authors as they call for health professions education to have the courage to acknowledge the benefits of subjectivity in assessment (Govaerts and Van der Vleuten [30]; Hodges [37]; Bacon et al. [4]; Rotthoff [56]; Ten Cate and Regehr [61]).

Changing perspectives to look at fairness in assessment, rather than objectivity in assessment, allows for many different legitimate perspectives. Ten Cate and Regehr argue that 'objectivity' might be better understood as negotiating a 'shared subjectivity': a convergence on an agreed-upon and socially constructed perspective (Ten Cate and Regehr [61]). They note that although this convergence might achieve consensus, it is not bias-free objectivity (Ten Cate and Regehr [61]). At the core of subjectivity, however, is human judgement. To embrace subjectivity requires embracing human judgement and accepting that multiple different perspectives are not measurement errors to be corrected psychometrically but rather legitimate, complementary perspectives on performance, much as different viewing angles of an object offer different but still 'correct' views. And if we accept this premise, then we must develop new ways to determine the fairness of the judgements supervisors make.

The idea of human judgement and shared subjectivity is already widely used in health professions education through competence committees and exam boards, especially since the introduction of programmatic longitudinal assessment (Van der Vleuten et al. [68]). Ten Cate and Regehr argue there are many positives in embracing multiple legitimate perspectives on a single performance, including helping the learner be alert to the fact that, in real life, there are multiple ways in which behaviours can be interpreted and responded to (Ten Cate and Regehr [61]). Differences of opinion may be noise from a psychometric perspective, but these differences may be very beneficial for the learning of an individual. Through the use of narrative instead of numbers, human judgement offers meaningful learning affordances via the possibility of richer feedback. Furthermore, the use of descriptive narrative in assessment has been shown in several studies to be a sensitive way of identifying at-risk learners earlier (Cohen et al. [10]; Durning et al. [19]), to be a reliable way to distinguish between learners even after only a few assessment reports (Ginsburg et al. [27]; Ginsburg et al. [28]), and to predict future performance or the need for remediation (Cohen et al. [10]).

Accepting and legitimising human judgement allows context to be considered, and this may be more defensible because assessors are not forced to document a context-free inference about the learner (Ten Cate and Regehr [61]). Furthermore, the pursuit of 'objectivity' in assessment can lead to assessment forms that aim to be context-independent, forcing assessors to make judgements on a wide range of competencies that were not observed, or that are not relevant to the clinical situation, every time they complete a form. Some authors have noted that this can diminish learners' trust in the assessor and the process, and can hide potentially credible decisions in a 'mountain of meaningless platitudes' (McCready [43]; Watling [69]). Subjectivity may help overcome this.

Human judgement also has its limitations. Multiple studies have demonstrated that actuarial methods are superior to clinical prediction in many situations (Marchese [42]). The use of clinical guidelines has improved patient outcomes in many scenarios, such as heart failure, breast cancer, atrial fibrillation, ventilator-associated pneumonia, and so on (Murad [47]). In both medical education and clinical medicine, objectively derived scales, guidelines, and matrices are essential tools. However, these tools need to be used smartly. As Woolf and colleagues note, 'clinical guidelines are useful when practitioners are unclear about appropriate practice and when scientific evidence can provide an answer. They are a poor remedy in other settings' (Woolf et al. [71]). Human judgement is essential in determining when and how to apply these tools. We have tried to emphasise in this paper that a desire to better utilise and legitimise subjective judgements in assessment is not to dismiss the work done on objectivity in assessment over the last century. Nor does acknowledging that quantitative multiple-choice tests are in fact based on a series of subjective decisions mean they no longer have a place in modern assessment. Numerical ratings and standardised assessments are valuable elements of fair assessment. Instead, objectivity, subjectivity, and human judgement are tensions that should be reconciled (Govaerts et al. [32]). Recognising the role of human judgement in assessment acknowledges that alongside knowledge tests there needs to be assessment of professional capabilities, and that alongside debates about psychometrics there need to be debates about the credibility and defensibility of human judgements. It is not an either-or, but rather a careful balancing of approaches in assessment programmes (Govaerts and Van der Vleuten [30]; Govaerts et al. [32]). Changing focus from objectivity to fairness can assist with this. Widening the answer to the question 'What is fair assessment?' to include human judgement can add significant value in ensuring a broader range of capabilities is assessed, so that learners are better equipped to be the health professionals required of the 21st century.

Conclusion

In increasingly complex clinical and educational environments, the challenge is to continue to move beyond the assumption that objectivity always leads to, and is the only way to achieve, fairness. Changing the focus from judging workplace-based assessment in terms of an objective psychometric framework to judging it in terms of fairness will help us avoid falling prey to McNamara's fallacy and ensure we fulfil our social contract to train health professionals who can thrive in this ever-changing environment, whilst remaining fair to the learners themselves. If we can move beyond the objective paradigm in our pursuit of fairness in assessment, we can start to explore shared subjectivity and human judgement in more depth. A different ontological understanding of what makes human judgements and shared subjectivity fair in assessment is crucial.

There is no consensus roadmap conveniently published in the literature for determining what fair assessment is. A deeper understanding is needed of what fair human judgement looks like, how it can be defined, how it can be optimised for learning, and how it can be supported. What are the essential foundations of fairness, and how can these be applied to judgements in the complex environment of workplace-based assessment? It has been suggested that the fairness of human judgement can be enhanced through the use of a palette of assessor perspectives, the combination of multiple assessments, and the use of narrative and paper trails in judgement decisions (Dijkstra et al. [14]; Dijkstra et al. [13]; Van der Vleuten et al. [68]). Some authors have also suggested looking to qualitative research strategies as an alternative way to build rigour in assessment (Driessen et al. [17]; Frambach et al. [22]); however, further research is still needed. Gipps and Stobart noted 'We will never achieve the fair assessment, but we can make it fairer' (Gipps and Stobart [29]). And in 21st century clinical practice, perhaps we can make health professions assessment fairer by looking beyond the objective paradigm.

Disclosure statement

The authors declare that they have no conflict of interest. The views expressed herein are those of the authors and not necessarily those of the Department of Defense, Uniformed Services University of the Health Sciences or other Federal agencies.

References

1. Apramian T, Cristancho S, Sener A, Lingard L. 2018. How do thresholds of principle and preference influence surgeon assessments of learner performance? Ann Surg. 268(2):385-390.
2. Apramian T, Cristancho S, Watling C, Ott M, Lingard L. 2015. Thresholds of principle and preference: exploring procedural variation in postgraduate surgical education. Acad Med. 90(11 Suppl):S70-S76.
3. Apramian T, Cristancho S, Watling C, Ott M, Lingard L. 2016. "Staying in the game": how procedural variation shapes competence judgments in surgical education. Acad Med. 91(11):S37-S43.
4. Bacon R, Williams LT, Grealish L, Jamieson M. 2015. Competency-based assessment for clinical supervisors: design-based research on a web-delivered program. JMIR Res Protoc. 4(1):e26.
5. Boulet JR, Durning SJ. 2019. What we measure ... and what we should measure in medical education. Med Educ. 53(1):86-94.
6. Boursicot K. 2020. Consensus statement reports: performance assessment. Ottawa; Kuala Lumpur.
7. Carmody JB. 2019. On residency selection and the quantitative fallacy. J Grad Med Educ. 11(4):420-421.
8. Cilliers FJ, Schuwirth LW, Adendorff HJ, Herman N, Van der Vleuten CP. 2010. The mechanism of impact of summative assessment on medical students' learning. Adv Health Sci Educ Theory Pract. 15(5):695-715.
9. Cilliers FJ, Schuwirth LW, Herman N, Adendorff HJ, Van der Vleuten CP. 2012. A model of the pre-assessment learning effects of summative assessment in medical education. Adv Health Sci Educ Theory Pract. 17(1):39-53.
10. Cohen GS, Blumberg P, Ryan NC, Sullivan PL. 1993. Do final grades reflect written qualitative evaluations of student performance? Teach Learn Med. 5(1):10-15.
11. Dauphinee WD. 1995. Assessing clinical performance. Where do we stand and what might we expect? JAMA. 274(9):741-743.
12. Delandshere G, Petrosky AR. 1994. Capturing teachers' knowledge: performance assessment: a) and post-structuralist epistemology, b) from a post-structuralist perspective, c) and post-structuralism, d) none of the above. Educ Res. 23(5):11-18.
13. Dijkstra J, Galbraith R, Hodges BD, McAvoy PA, McCrorie P, Southgate LJ, Van der Vleuten CP, Wass V, Schuwirth LW. 2012. Expert validation of fit-for-purpose guidelines for designing programmes of assessment. BMC Med Educ. 12:20.
14. Dijkstra J, Van der Vleuten CP, Schuwirth LW. 2010. A new framework for designing programmes of assessment. Adv Health Sci Educ Theory Pract. 15(3):379-393.
15. Downie R, Macnaughton J. 2013. In defence of professional judgement. Adv Psychiatr Treat. 15(5):322-327.
16. Downing SM. 2004. Reliability: on the reproducibility of assessment data. Med Educ. 38(9):1006-1012.
17. Driessen E, Van der Vleuten C, Schuwirth L, van Tartwijk J, Vermunt J. 2005. The use of qualitative research criteria for portfolio assessment as an alternative to reliability evaluation: a case study. Med Educ. 39(2):214-220.
18. Durning SJ, Artino AR Jr, Schuwirth L, Van der Vleuten C. 2013. Clarifying assumptions to enhance our understanding and assessment of clinical reasoning. Acad Med. 88(4):442-448.
19. Durning SJ, Hanson J, Gilliland W, McManigle JM, Waechter D, Pangaro LN. 2010. Using qualitative data from a program director's evaluation form as an outcome measurement for medical school. Mil Med. 175(6):448-452.
20. Eva KW. 2008. On the limits of systematicity. Med Educ. 42(9):852-853.
21. Eva KW. 2015. Moving beyond childish notions of fair and equitable. Med Educ. 49(1):1-3.
22. Frambach JM, Van der Vleuten CP, Durning SJ. 2013. AM last page. Quality criteria in qualitative and quantitative research. Acad Med. 88(4):552.
23. Frank JR, Snell LS, Cate OT, Holmboe ES, Carraccio C, Swing SR, Harris P, Glasgow NJ, Campbell C, Dath D, et al. 2010. Competency-based medical education: theory to practice. Med Teach. 32(8):638-645.
24. General Medical Council. 2017. Designing and maintaining postgraduate assessment programmes. London: General Medical Council.
25. Gingerich A, Kogan J, Yeates P, Govaerts M, Holmboe E. 2014. Seeing the 'black box' differently: assessor cognition from three research perspectives. Med Educ. 48(11):1055-1068.
26. Gingerich A, Regehr G, Eva KW. 2011. Rater-based assessments as social judgments: rethinking the etiology of rater errors. Acad Med. 86(10 Suppl):S1-S7.
27. Ginsburg S, Eva K, Regehr G. 2013. Do in-training evaluation reports deserve their bad reputations? A study of the reliability and predictive ability of ITER scores and narrative comments. Acad Med. 88(10):1539-1544.
28. Ginsburg S, Van der Vleuten CPM, Eva KW. 2017. The hidden value of narrative comments for assessment: a quantitative reliability analysis of qualitative data. Acad Med. 92(11):1617-1621.
29. Gipps C, Stobart G. 2009. Fairness in assessment. In: Educational assessment in the 21st century. Dordrecht (Netherlands): Springer; p. 105-118.
30. Govaerts M, Van der Vleuten CP. 2013. Validity in work-based assessment: expanding our horizons. Med Educ. 47(12):1164-1174.
31. Govaerts MJ, Schuwirth LW, Van der Vleuten CP, Muijtjens AM. 2011. Workplace-based assessment: effects of rater expertise. Adv Health Sci Educ Theory Pract. 16(2):151-165.
32. Govaerts MJB, Van der Vleuten CPM, Holmboe ES. 2019. Managing tensions in assessment: moving beyond either-or thinking. Med Educ. 53(1):64-75.
33. Green SK, Johnson RL, Kim D-H, Pope NS. 2007. Ethics in classroom assessment practices: issues and attitudes. Teach Teach Educ. 23(7):999-1011.
34. Greenhalgh T, Howick J, Maskrey N, Evidence Based Medicine Renaissance Group. 2014. Evidence based medicine: a movement in crisis? BMJ. 348:g3725.
35. Greenhalgh T, Papoutsi C. 2018. Studying complexity in health services research: desperately seeking an overdue paradigm shift. BMC Med. 16(1):95.
36. Harden RM, Gleeson F. 1979. Assessment of clinical competence using an objective structured clinical examination (OSCE). Med Educ. 13(1):39-54.
37. Hodges B. 2013. Assessment in the post-psychometric era: learning to love the subjective and collective. Med Teach. 35(7):564-568.
38. Jones A. 1999. The place of judgement in competency-based assessment. J Vocat Educ. 51(1):145-160.
39. Kaldjian LC. 2010. Teaching practical wisdom in medicine through clinical judgement, goals of care, and ethical reasoning. J Med Ethics. 36(9):558-562.
40. Kogan JR, Conforti L, Bernabeo E, Iobst W, Holmboe E. 2011. Opening the black box of clinical skills assessment via observation: a conceptual model. Med Educ. 45(10):1048-1060.
41. Kogan JR, Hess BJ, Conforti LN, Holmboe ES. 2010. What drives faculty ratings of residents' clinical skills? The impact of faculty's own clinical skills. Acad Med. 85(10 Suppl):S25-S28.
42. Marchese MC. 1992. Clinical versus actuarial prediction: a review of the literature. Percept Mot Skills. 75(2):583-594.
43. McCready T. 2007. Portfolios and the assessment of competence in nursing: a literature review. Int J Nurs Stud. 44(1):143-151.
44. McNamara R. 1995. In retrospect: the tragedy and lessons of Vietnam. New York (NY): Times Books, Random House.
45. Mennin S. 2010. Self-organisation, integration and curriculum in the complex world of medical education. Med Educ. 44(1):20-30.
46. Moulton CA, Regehr G, Mylopoulos M, MacRae HM. 2007. Slowing down when you should: a new model of expert judgment. Acad Med. 82(10 Suppl):S109-S116.
47. Murad MH. 2017. Clinical practice guidelines: a primer on development and dissemination. Mayo Clin Proc. 92(3):423-433.
48. Newble DI, Hoare J, Sheldrake PF. 1980. The selection and training of examiners for clinical examinations. Med Educ. 14(5):345-349.
49. Norcini J, Anderson B, Bollela V, Burch V, Costa MJ, Duvivier R, Galbraith R, Hays R, Kent A, Perrott V, et al. 2011. Criteria for good assessment: consensus statement and recommendations from the Ottawa 2010 Conference. Med Teach. 33(3):206-214.
50. Norcini J, Shea J. 1997. The credibility and comparability of standards. Appl Meas Educ. 10(1):39-59.
51. Norcini JJ, Blank LL, Arnold GK, Kimball HR. 1995. The mini-CEX (clinical evaluation exercise): a preliminary investigation. Ann Intern Med. 123(10):795-799.
52. O'Mahony S. 2017. Medicine and the McNamara fallacy. J R Coll Physicians Edinb. 47(3):281-287.
53. Oudkerk Pool A, Govaerts MJB, Jaarsma D, Driessen EW. 2018. From aggregation to interpretation: how assessors judge complex data in a competency-based portfolio. Adv Health Sci Educ Theory Pract. 23(2):275-287.
54. Plsek PE, Greenhalgh T. 2001. Complexity science: the challenge of complexity in health care. BMJ. 323(7313):625-628.
55. Reed JE, Howe C, Doyle C, Bell D. 2018. Simple rules for evidence translation in complex systems: a qualitative study. BMC Med. 16(1):92.
56. Rotthoff T. 2018. Standing up for subjectivity in the assessment of competencies. GMS J Med Educ. 35(3):Doc29.
57. Schuwirth LW, Van der Vleuten CP. 2006. A plea for new psychometric models in educational assessment. Med Educ. 40(4):296-300.
58. Shaw S, Nisbet I. 2021. Attitudes to fair assessment in the light of COVID-19. Matters. 31:6-21.
59. Ten Cate O. 2017. Competency-based postgraduate medical education: past, present and future. GMS J Med Educ. 34(5):Doc69.
60. Ten Cate O, Billett S. 2014. Competency-based medical education: origins, perspectives and potentialities. Med Educ. 48(3):325-332.
61. Ten Cate O, Regehr G. 2019. The power of subjectivity in the assessment of medical trainees. Acad Med. 94(3):333-337.
62. Ten Cate O, Scheele F. 2007. Competency-based postgraduate training: can we bridge the gap between theory and clinical practice? Acad Med. 82(6):542-547.
63. Tierney RD. 2013. Fairness in classroom assessment. In: SAGE handbook of research on classroom assessment. CA: SAGE Publications, Inc.; Chapter 8, p. 125.
64. Tierney RD. 2014. Fairness as a multifaceted quality in classroom assessment. Stud Educ Evaluation. 43:55-69.
65. Valentine N, Schuwirth L. 2019. Identifying the narrative used by educators in articulating judgement of performance. Perspect Med Educ. 8(2):83-89.
66. Van der Vleuten CP, Norman GR, De Graaff E. 1991. Pitfalls in the pursuit of objectivity: issues of reliability. Med Educ. 25(2):110-118.
67. Van der Vleuten CP, Schuwirth LW. 2005. Assessing professional competence: from methods to programmes. Med Educ. 39(3):309-317.
68. Van der Vleuten CP, Schuwirth LW, Driessen EW, Govaerts MJ, Heeneman S. 2015. Twelve tips for programmatic assessment. Med Teach. 37(7):641-646.
69. Watling CJ. 2014. Unfulfilled promise, untapped potential: feedback at the crossroads. Med Teach. 36(8):692-697.
70. Wayne DB, Green M, Neilson EG. 2020. Medical education in the time of COVID-19. Sci Adv. 6(31):eabc7110.
71. Woolf SH, Grol R, Hutchinson A, Eccles M, Grimshaw J. 1999. Clinical guidelines: potential benefits, limitations, and harms of clinical guidelines. BMJ. 318(7182):527-530.
72. Yankelovich D. 1972. Corporate priorities: a continuing study of the new demands on business. Stamford (CT): Yankelovich Inc.
73. Yeates P, O'Neill P, Mann K, Eva K. 2013. Seeing the same thing differently: mechanisms that contribute to assessor differences in directly-observed performance assessments. Adv Health Sci Educ Theory Pract. 18(3):325-341.
74. Young JQ, Van Merrienboer J, Durning S, Ten Cate O. 2014. Cognitive load theory: implications for medical education: AMEE Guide No. 86. Med Teach. 36(5):371-384.


Dr. Nyoli Valentine, MBBS, MPH, Prideaux Health Professions Education, Flinders University, Bedford Park, SA, Australia.

Professor Steven J. Durning, MD, PhD, Center for Health Professions Education, Uniformed Services University of the Health Sciences, Bethesda, MD, USA.

Professor Ernst Michael Shanahan, BMBS, MPH, MHPE, PhD, Prideaux Health Professions Education, Flinders University, Bedford Park, SA, Australia.

Professor Cees van der Vleuten, PhD, Department of Educational Development and Research, Maastricht University, Maastricht, the Netherlands.

Professor Lambert Schuwirth, MD, PhD, Prideaux Health Professions Education, Flinders University, Bedford Park, SA, Australia.

