Conventional approaches to assessment involve teachers and examiners judging the quality of learners' work by reference to lists of criteria or other 'outcome' statements. This paper explores a quite different method of assessment using 'Adaptive Comparative Judgement' (ACJ), which was developed within a research project at Goldsmiths, University of London between 2004 and 2010. The method was developed into a tool that enabled judges to distinguish better/worse performances not by allocating numbers through mark schemes, but by direct, holistic judgement. The tool was successfully deployed in a series of national and international research and development exercises. But game-changing innovations are never flawless first time out (Golley, Jet: Frank Whittle and the Invention of the Jet Engine, Datum Publishing, Liphook Hampshire, 2009; Dyson, Against the odds: an autobiography, Texere Publishing, Knutsford Cheshire, 2001) and a series of careful investigations resulted in a problem being identified within the workings of ACJ (Bramley, Investigating the reliability of Adaptive Comparative Judgment, Cambridge Assessment Research Report, UK, Cambridge, 2015). The issue lay with the 'adaptive' component of the algorithm, which, under certain conditions, appeared to exaggerate the reliability statistic. The problem was 'worked' by the software company running ACJ and a solution found. This paper reports the whole sequence of events, from the original innovation, through deployment and the emergent problem, to the resulting solution that was published at an international conference (Rangel-Smith and Lynch in: PATT36 International Conference. Research & Practice in Technology Education: Perspectives on Human Capacity and Development, 2018) and subsequently deployed within a modified ACJ algorithm.
Keywords: Adaptive comparative judgement (ACJ); Assessment; Reliability; Holistic judgement
I am indebted to Tom Bramley of Cambridge Assessment for his helpful conversations and his generosity in providing sources, and to Ruth Wright, formerly Head of Research at the Engineering Council UK, for her valuable review of the manuscript.
This paper concerns a story of innovation in assessment. Historically, learners' performance was assessed by ranking candidates rather than by marking; marking emerged only in the 18th century, as the industrial revolution expanded the number of candidates for examination. For the last 200 years numbers-based marking has been the overwhelmingly dominant methodology, and 'true-score theory' has dominated educational assessment. This paper examines a radical departure from this norm. Specifically, it concerns the creation of Adaptive Comparative Judgement (ACJ), an assessment method that was developed within a research project at Goldsmiths, University of London between 2004 and 2010.
The project in question was funded by the Qualifications and Curriculum Authority (QCA UK), which was interested in developing digital approaches to assessment for General Certificate of Secondary Education (GCSE; 16+) examinations in England & Wales. QCA's interest was two-fold: first, to create digital approaches to learner-portfolio-building in 'performance' subjects like design & technology (d&t), and second, to develop digital approaches to the assessment of those portfolios. The original project was entitled 'e-scape' (e-solutions for creative assessment in performance environments) and work progressed through three phases. Phase 1 was dominated by the concerns of web-portfolio-building. Phase 2 involved a small schools trial of the resulting portfolio approach, and a prototype version of ACJ was developed for web-based assessment of the portfolios. This was then fully explored in phase 3, which involved a trial in 17 schools in four regions of England & Wales and produced 357 d&t portfolios. It was in phase 3 that a fully developed version of ACJ was first employed for the assessments. (Kimbell et al. [
The story of the project was fully articulated in a Special Issue of the International Journal of Technology and Design Education in 2012 (Williams and Kimbell [eds] [
This paper is not intended as a further exploration of ACJ applications within curriculum or assessment. Rather, it arises because of a technical challenge to ACJ that was raised by Tom Bramley of Cambridge Assessment in 2015. The original comparative judgement algorithm of ACJ had been developed by a team of people including Alastair Pollitt and Karim Derrick, who had both contributed papers to the 2012 IJTDE Special Issue. Bramley ran simulations with the method and was convinced that the reliability levels of ACJ assessment sessions reported in the literature were inflated by the adaptivity of the algorithm (Bramley [
The conventional approach to assessment, eg in current GCSE examinations, is to set questions or challenges that require 'answers' or other outcomes that are then measured against what is deemed (by the examining authority) to be an ideal answer or outcome. This ideal answer or outcome is specified through sets of criteria that are used to decide how thoroughly/accurately the learner has responded to the questions or challenges. The criteria are typically associated with numerical scores, and by adding up learners' scores against each of the criteria, a final score is arrived at that reflects the learner's overall level or ability in that examination or assessment (Gipps [
The process of identifying grade-related criteria, or performance-indicators, or Statements-of-Attainment, expanded hugely through the 1980s and 1990s (Kimbell [
There is a long history of challenges to this atomised view of assessment, particularly in the professional milieu of classrooms and the behaviour of learners and teachers, as Schön ([
This notion of 'tacit' knowing was presented many years earlier by Polanyi ([
Wiliam ([
And the same year:
... To the extent that the examiners agree, they agree not because they derive similar meanings from the regulation, (ie they are not criterion-driven) but because they already have in their minds a notion of the required standard. The consistency of such assessments depend on what Polanyi ([
The final word on this should go to Polanyi, who points out that connoisseurship, like skill, can be communicated only by example, not by precept (Polanyi [
Three of the ideas outlined above, namely the uncertainty associated with generic criteria, the connoisseurship of expert judges, and the importance of judging through examples (Polanyi), contributed to the awareness that the Goldsmiths (QCA) 'e-scape' project might be an ideal vehicle within which to develop a quite new approach to assessment. An additional contributing factor was that, since the assessments were to be web-based, a computer-managed approach to them would be essential. The issue (in 2005) thus resolved itself into the question of how teachers' professional judgements of quality could be reconciled with a computer-based assessment approach.
The comparative judgement method was first articulated by Louis Thurstone in a series of articles concerned with the measurement of the psychological perceptions of physical stimuli such as tones and loudness, and of psychological variables such as values and attitudes (Thurstone [
In 2004, Pollitt had explored the difficulty of making judgements by reference to generic criteria or 'grade descriptions' in school-based assessment:
When we try to judge a performance against grade descriptors we are imagining or remembering other performances and comparing the new performance to them. But these imagined performances are unlikely to be truly representative of performances of that standard, and very likely to vary in the minds of different judges. (Pollitt [
Pollitt goes on to recommend an approach that involves the direct comparison of one piece with another, and for a very good statistical reason. Imagine we are comparing two pieces of work: which is more thorough, A or B? As Pollitt explains:
... when a judge compares two performances (using their own personal 'standard' or internalized criteria) the judge's standard cancels out. ...(Pollitt [
An easy/lenient judge might think them both thorough, but one is more so. Or a strict judge might think them both not thorough, but one will still be more so than the other. In either event, and despite their different personal standards, they will both identify the same more-thorough piece. As Pollitt noted, their personal standards cancel out; and as Polanyi noted, direct comparison facilitates judgement far better than abstract 'precepts'.
In 2006, as the Goldsmiths e-scape project team began to address the challenge of judging the emerging learner web-portfolios, they contacted Pollitt to explore whether his ideas of comparative judgement might be useful. A trial was established with twenty paper-based portfolios of known quality (identified in a previous project at Goldsmiths), which could therefore be placed in a rank order. These twenty were judged by a new team of six researchers using a manual (spreadsheet-based) version of Pollitt's comparative judgement approach. The emerging rank correlated well with the original rank (Spearman's rank correlation coefficient = 0.89). (Kimbell et al. [
Pollitt's comparative judgement approach had formerly been constrained by these logistical difficulties and was used purely as a research tool. This trial experience made it obvious that, for the e-scape project to work, two linked web-based systems needed to work together: the web-portfolio part of the system needed to speak directly to the comparative-pairs assessment part of the system (at that time called the 'pairs engine'). The combined e-scape software systems would then make it possible, for the first time, for comparative judgement to become a front-line assessment tool. (Kimbell in Williams and Kimbell [eds] [
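For readers less familiar with the statistic, Spearman's coefficient compares two rank orders of the same items; when there are no tied ranks it is ρ = 1 − 6Σd²/(n(n² − 1)), where d is each item's difference in rank. The short Python sketch below uses hypothetical rankings of five portfolios, not the trial data.

```python
from typing import Sequence


def spearman_rho(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman's rank correlation for two untied rankings of the same items."""
    if len(rank_a) != len(rank_b) or len(rank_a) < 2:
        raise ValueError("need two rankings of the same (n >= 2) items")
    n = len(rank_a)
    # Sum of squared differences in rank, item by item
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks: known order vs the order produced by re-judging
known = [1, 2, 3, 4, 5]
rejudged = [2, 1, 3, 4, 5]
print(spearman_rho(known, rejudged))  # 0.9
```

A single swap between adjacent ranks in only five items already lowers ρ to 0.9, which gives some sense of how close the reported 0.89 across twenty portfolios is to the original order.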
The requirement for a level of adaptivity to enhance the efficiency of the algorithm was later fully articulated by Pollitt in The Method of Adaptive Comparative Judgement (Pollitt [
There were many aspects of the e-scape project that were new, not least the challenge of creating on-line (web-based) digital portfolios of performance (including drawings, sound-files, video, photographs and text). The trick with e-scape was that these portfolios were created live (in real-time) from the studio/workshops where learners were undertaking the 7-hour design task (see Williams and Kimbell [eds] [
The assessment judgements were made by a team of 28 judges who each judged approximately 120 pairs of portfolios. In each case the ACJ software presented the judge with two portfolios that could be studied separately or together on a split screen. The principal role of the judge was to review them both and to decide which of the two portfolios (A or B) was the stronger. The 'pairs engine' software then presented another pair for comparison. The judges were very familiar with the work, as most of them were teachers in the trial schools, and others were from research groups [Australia/Ireland/Israel/USA] that were interested in, and had been closely following, the work of the e-scape project.
If one imagines the 350 portfolios arranged in a matrix, then that matrix is 350 × 350 and has 122,500 cells. If every portfolio were to be compared with every other portfolio, there would be 61,075 potential pair combinations. In fact our 28 judges each made approximately 120 comparisons, 3416 judgements in total (an average of 122 per judge), representing less than 6% of the possible pair combinations. And yet it still generated a reliable outcome. Pollitt explains the rationale underlying this efficiency, and it turns on the adaptive mechanism for selecting the pairs of portfolios for comparison. As a simple illustration, imagine a sequence of paired comparisons: A beats B, B then beats C, and C then beats D. There is an extremely high probability that A would beat D, so an adaptive algorithm uses this information to pick more useful (closer) pairings.
The improvement in efficiency targeting generates is similar to that observed in computer-adaptive testing, where a student's 'ability' is re-estimated after every item, and the next item presented is chosen to match closely the new estimate. In adaptive testing, savings have been made of 50% or more in the number of items needed to reach the same level of accuracy as with conventional tests (Weiss [
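The counts in the paragraph above can be checked in a few lines of Python; the only inputs are the figures reported in the text (350 portfolios, 28 judges, 3416 judgements).

```python
from math import comb

n_portfolios = 350
n_judges = 28
n_judgements = 3416                        # total judgements reported for the session

cells = n_portfolios ** 2                  # cells in a full 350 x 350 matrix
distinct_pairs = comb(n_portfolios, 2)     # unordered pairs: 350 * 349 / 2
coverage = n_judgements / distinct_pairs   # share of possible pairs actually judged
per_judge = n_judgements / n_judges        # average judgements per judge

print(cells, distinct_pairs)               # 122500 61075
print(f"{coverage:.1%}", per_judge)        # 5.6% 122.0
```

The 6% figure is a share of the distinct pairs rather than of the full matrix; against all 122,500 cells the coverage would be under 3%.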
It is important to understand the e-scape 'pairs engine' procedure, which operated through 'rounds' of judging. A round was completed when all the portfolios had been judged at least once against a selected other portfolio. In the first round the selection was random, and the first four rounds of judging operated with a 'Swiss Tournament' procedure:
The Swiss Tournament system comes from the world of tournament chess, where it is the most common way to arrange the pairings so that every player is fairly tested, and a winner is found even though no player is ever fully "knocked-out". Using the Swiss Tournament in this context, the first pairs presented to the markers were chosen at random, and the winner received one point. In following rounds the pairs were chosen from groups of 'players' with the same number of points. For example, at the start of round 3 some players had 2 points (having won 2 matches) others had 1 (won one and lost one) and some had zero (lost both). (Pollitt et al. [
At the end of the four Swiss rounds, with an approximate rank across 5 bands (0–4), the algorithm moved into the full-scale Rasch estimation rounds, and the process for choosing the pairs of portfolios changed:
... the algorithm checks a script against all the other scripts in the system. It looks at how many times they were compared and how many times the current script won and lost. This data is then used to calculate the "ideal" parameter value for this script. A separate calculation is then made involving the number of wins and the current parameter value for each script. The difference between these two values is noted and a third calculation is made to generate an adjustment figure for the current script. (Pollitt et al. [
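The Swiss pairing rule quoted above can be sketched in Python. This is a hypothetical illustration, not the e-scape code: the function names, the random shuffling within each score group, and the handling of a leftover item (it drops into the next-lower score group, or sits out with a 'bye' when the total is odd) are all our assumptions.

```python
from __future__ import annotations

import random
from collections import defaultdict
from typing import Callable


def swiss_pairs(points: dict[str, int], rng: random.Random) -> list[tuple[str, str]]:
    """Pair items that currently have the same number of wins (Swiss-style).

    Items are shuffled within each score group and paired off; a leftover item drops
    into the next-lower score group. With an odd total, one item sits out (a 'bye').
    """
    groups: dict[int, list[str]] = defaultdict(list)
    for item, score in points.items():
        groups[score].append(item)
    pairs: list[tuple[str, str]] = []
    carry: list[str] = []
    for score in sorted(groups, reverse=True):
        group = groups[score][:]
        rng.shuffle(group)
        pool = carry + group                 # a leftover from the higher group is paired first
        carry = [pool.pop()] if len(pool) % 2 else []
        pairs.extend(zip(pool[0::2], pool[1::2]))
    return pairs


def play_round(points: dict[str, int], judge: Callable[[str, str], bool],
               rng: random.Random) -> None:
    """One round: each pair is judged (judge returns True if the first item wins)."""
    for a, b in swiss_pairs(points, rng):
        winner = a if judge(a, b) else b
        points[winner] += 1


if __name__ == "__main__":
    rng = random.Random(42)
    quality = {f"p{i}": i for i in range(8)}        # hypothetical hidden quality
    points = {item: 0 for item in quality}
    for _ in range(4):                              # four Swiss rounds, as in e-scape
        play_round(points, lambda a, b: quality[a] > quality[b], rng)
    print(sorted(points.items(), key=lambda kv: -kv[1]))
```

After four rounds every item holds between 0 and 4 points, which is the five-band approximate rank from which the Rasch estimation rounds start.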
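The quoted procedure, in which a script's observed wins are compared with the wins expected from the current parameter values and the difference is turned into an adjustment, can be illustrated with a generic Bradley-Terry (Rasch pairwise) estimation. This is a hypothetical Python sketch, not the pairs-engine code; the function names, the damping cap on each step, and the re-centring of the scale on zero are our assumptions.

```python
import math


def p_win(va: float, vb: float) -> float:
    """Rasch/Bradley-Terry probability that a script with parameter va beats one with vb."""
    return 1.0 / (1.0 + math.exp(-(va - vb)))


def update_parameters(params, judgements, iterations=100, max_step=1.0):
    """Estimate script parameters (logits) from (winner, loser) judgements.

    Each script is repeatedly adjusted by its observed-minus-expected wins, scaled by
    the information in its comparisons (a damped Newton step), and the scale is
    re-centred on zero after each sweep. Scripts that won (or lost) every comparison
    have no finite estimate and would keep drifting, so they need extra handling.
    """
    params = dict(params)
    for _ in range(iterations):
        for script in params:
            observed = expected = info = 0.0
            for winner, loser in judgements:
                if script not in (winner, loser):
                    continue
                other = loser if script == winner else winner
                p = p_win(params[script], params[other])
                observed += 1.0 if script == winner else 0.0
                expected += p
                info += p * (1.0 - p)
            if info > 0:
                step = (observed - expected) / info
                params[script] += max(-max_step, min(max_step, step))
        mean = sum(params.values()) / len(params)
        params = {s: v - mean for s, v in params.items()}
    return params


# Hypothetical judgements: A usually beats B and C, B usually beats C
judgements = [("A", "B"), ("A", "B"), ("B", "A"),
              ("B", "C"), ("B", "C"), ("C", "B"),
              ("A", "C"), ("A", "C"), ("C", "A")]
estimates = update_parameters({"A": 0.0, "B": 0.0, "C": 0.0}, judgements)
print({s: round(v, 2) for s, v in estimates.items()})
```

At convergence each script's observed wins equal its expected wins, which is the condition the repeated adjustments in the quotation are working towards.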
In choosing to compare two portfolios, it matters how far apart they are on the putative scale of quality (a representation of differences in quality). Comparing pieces that are a long way apart (a very good piece and a very poor piece) makes it easy for the judge to decide which is better, but imparts very little information to the system. On the other hand, if they are close together, the judge may struggle to distinguish the better piece, but the system gains much more information:
In statistics the 'information' contributed by a single judgement is quantified in terms of the modeled probability that it will have one or other output:
$$I = p \times q = p(1 - p)$$
where p is the probability that the first portfolio would be judged to have more quality, q (equal to 1-p) is the probability that the second would be, and I is the amount of information the comparison adds to the analysis. This function is at a maximum when p and q are both equal to 0.5, it declines slowly at first but more rapidly as p rises beyond 0.7 or falls below 0.3. (Pollitt et al. [
Based on earlier phases of the trial, a separation of 0.67 logits (a logit being the natural logarithm of the odds) was used for these Rasch analysis rounds, meaning that the odds of one of the portfolios winning were approximately 2:1. The judges found this acceptable in terms of distinguishing the quality of the two portfolios, and the information gathered was still 90% of that from a statistically 'ideal' pairing. A number of other safeguards were also built into the 'pairs engine' algorithm, eg to balance the overall number of judgements for each portfolio, to balance the number of times a script had been seen by the same judge, and to prevent the same pair being shown to the same judge. The rounds of judging continued in this way, with each new estimation round being triggered, as before, when all the scripts had been involved in one more comparison. As each estimation round was completed, the parameter values for each script were reviewed, as well as the summary data for the whole estimation process.
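The information function can be tabulated against the separation between two portfolios. In this Python sketch (our illustration, assuming the Rasch logistic model for p), a gap of 0.67 logits gives odds of about 1.95:1 and about 90% of the maximum information, matching the figures given for the e-scape Rasch rounds.

```python
import math


def win_probability(gap_logits: float) -> float:
    """Rasch probability p that the higher-placed portfolio wins, given the gap in logits."""
    return 1.0 / (1.0 + math.exp(-gap_logits))


def information(gap_logits: float) -> float:
    """Information I = p * q contributed by one judgement, with q = 1 - p."""
    p = win_probability(gap_logits)
    return p * (1.0 - p)


MAX_INFORMATION = information(0.0)   # 0.25, reached when p = q = 0.5

for gap in (0.0, 0.67, 1.5, 2.5, 4.0):
    p = win_probability(gap)
    share = information(gap) / MAX_INFORMATION
    print(f"gap={gap:.2f}  p={p:.2f}  odds={p / (1 - p):.2f}:1  "
          f"share of max information={share:.0%}")
```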
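The safeguards listed above can be combined with the 0.67-logit target in a simple selection rule. The following is a hypothetical Python sketch, not the pairs-engine code: the function name, the data structures, and the lexicographic ordering (closeness to the target gap first, then overall balance, then per-judge balance) are our assumptions.

```python
from __future__ import annotations

from collections import Counter


def choose_opponent(script: str, params: dict[str, float], judge: str,
                    judged_pairs: set, total_counts: Counter, judge_counts: Counter,
                    target_gap: float = 0.67) -> str | None:
    """Choose an opponent for `script` near a target separation, with simple safeguards.

    Pairs this judge has already compared are excluded. Remaining candidates are ranked
    lexicographically: closeness to target_gap first, then how often the candidate has
    been judged overall, then how often this judge has already seen it.
    """
    ranked = []
    for other, value in params.items():
        if other == script or (judge, frozenset((script, other))) in judged_pairs:
            continue
        gap_error = abs(abs(value - params[script]) - target_gap)
        ranked.append((gap_error, total_counts[other], judge_counts[(judge, other)], other))
    return min(ranked)[3] if ranked else None


# Hypothetical current parameter estimates (logits)
params = {"a": 0.0, "b": 0.7, "c": 2.0, "d": -0.6}
judged: set = set()
first = choose_opponent("a", params, "judge1", judged, Counter(), Counter())
judged.add(("judge1", frozenset(("a", first))))
second = choose_opponent("a", params, "judge1", judged, Counter(), Counter())
print(first, second)  # b d
```

Because the balancing counts only break ties here, a real system would more likely weigh them against the gap; the ordering is kept strict simply to make the rule easy to follow.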
Theoretically the process could go on indefinitely, until the conditions that terminate it are met. In the 2009 trial these conditions were met after 17 rounds, at which point Pollitt reported as follows:
The final scale spread the portfolios out with a standard deviation of almost 3 units. The average measurement uncertainty for a portfolio was about 0.67 units, and the ratio of these two figures was 4.45. This means that the standard unit of the scale was almost 4.5 times as large as the uncertainty of measurement. This means the portfolios were measured with an uncertainty that is very small compared to the scale as a whole; this ratio is then converted into the traditional reliability statistic – a version of Cronbach's alpha or the KR 20 coefficient. The value obtained was 0.95, which is very high in GCSE terms. (Pollitt in Kimbell et al. [
The principal output from the pairs engine was a set of parameter scores on a scale representing the quality of each portfolio. If this were to be a GCSE assessment (which QCA was interested in), grades could in principle be set along that scale, but readers should ignore any grades shown in Fig. 1, as they were imposed through a separate process and were not calculated automatically within the algorithm.
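Pollitt does not spell out the conversion here. One standard Rasch form turns the separation ratio G (scale SD divided by measurement error) into a reliability via G²/(1 + G²); assuming that form, which is our assumption rather than a statement of the pairs-engine formula, the reported ratio of 4.45 reproduces the value of 0.95. The closely related form 1 − 1/G² also rounds to 0.95 for this ratio.

```python
def separation_reliability(scale_sd: float, mean_error: float) -> float:
    """Reliability from the separation ratio G = scale SD / measurement error: G^2 / (1 + G^2)."""
    g = scale_sd / mean_error
    return g * g / (1.0 + g * g)


# The ratio of 4.45 reported above (the SD expressed in units of the error)
print(round(separation_reliability(4.45, 1.0), 2))  # 0.95
```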
Fig. 1 The ranking output from the e-scape 'pairs engine'
The algorithm also made many additional data available. Each portfolio had a 'portfolio-misfit' statistic indicating which portfolios (if any) had caused judges to disagree; each portfolio score (parameter value) had a 'standard error' indicating the degree of confidence the system had in the accuracy of that value (the standard error reduces with more rounds); each judge had a 'misfit' statistic identifying their consensuality (judgement consistency coefficient) with the judge group as a whole; the judging interface allowed judges to enter notes about each portfolio for the administrator to review; and each judgement was timed, so average times were available for each judge. These and many other features made this a very carefully monitored assessment process.
We should note one of the most important, and perhaps the most astonishing, features of the process: the judges were able to make reliable assessments of very complex multi-media portfolios of performance just by comparing pairs of portfolios and using holistic judgement. Moreover, they could do it speedily. Our judges were mainly experienced teachers who were familiar with prevailing school-based assessment methods, and in the post-judgement review they were clear about the contrast. All the judge comments are from Kimbell et al. ([
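The paper does not give the misfit formulas. As one common Rasch-style example (an assumption, not necessarily the pairs engine's statistic), an outfit-style mean-square averages the squared standardized residuals of a set of judgements; applied to one judge's decisions, it flags a judge who often picks the 'unexpected' winner. The parameter values below are hypothetical.

```python
import math


def outfit_mean_square(judgements, params):
    """Mean squared standardized residual over (winner, loser) judgements.

    For each judgement the model gives p, the probability of the outcome that was
    observed; the squared standardized residual (1 - p)^2 / (p * (1 - p)) simplifies
    to (1 - p) / p. Values near 1 are as consistent as the model expects; values well
    above 1 flag surprising judgements (eg a judge out of step with the consensus).
    """
    if not judgements:
        raise ValueError("no judgements to assess")
    total = 0.0
    for winner, loser in judgements:
        p = 1.0 / (1.0 + math.exp(-(params[winner] - params[loser])))
        total += (1.0 - p) / p
    return total / len(judgements)


params = {"A": 2.0, "B": 0.0}                              # hypothetical parameter values
print(outfit_mean_square([("A", "B")], params) < 1.0)     # True: expected outcome
print(outfit_mean_square([("B", "A")], params) > 1.0)     # True: surprising outcome
```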
They also commented on the holistic nature of the assessment:
It gives more appropriate results than atomised approaches which can lead to inaccurate overall assessment especially when the overall attainment is more than the sum of the parts. This often happens when the various elements of a designing process come together in a successful outcome that outstrips the quality of work in any (or all) the parts of the process. (DP).
GCSE marking relies heavily on a tick box assessment of a pupil's work. It can be frustrating when confronted with an excellent piece of designing and making that does not meet the exam board's criteria. Too often the linear pattern of coursework requires the assessor to jump back and forth to find the marks that a student deserves. The e-scape judging is so simple in comparison. (AM).
One of the major strengths of holistic judgements I see is its flexibility... in which you can give credit to students for what they have actually done rather than whether they are able to "tick the boxes" to match a set of assessment criteria. (DW).
But additionally, they had a view about the enhanced fairness of the approach since, with the e-scape 'pairs engine', each portfolio was seen by many judges, not just by the learner's own teacher and (perhaps) a moderator, as would normally be the case with portfolio assessments at GCSE:
The judging system feels to be fair; it doesn't rely on only one person assessing a single piece of work. It removes virtually all risk of bias.... It feels safe knowing that even if you make a mistake in one judgement it won't significantly make a difference to the outcome or grade awarded to the student as other judges will also assess the same project. Also knowing that the system automatically checks the consistency of the assessor's judgements again reinforces the feeling of fairness that this process brings. (DW).
Interestingly, despite its origins within a summative assessment setting for GCSE, the use of ACJ in classrooms has increasingly been focussed on formative assessment for learning. This possibility arose when (in discussion with some of the e-scape trial teachers) we asked the question 'what if we ask the learners themselves to be the judges?' When learners themselves are asked to review two pieces of work and to identify which they think is better, and why, it inevitably leads to discussion about what the learners mean by 'good' and 'better', and the concrete examples of work make it easier for learners to crystallise and articulate their own constructs of quality, as Polanyi argued 60 years ago. McLaren (primary classrooms in Scotland) and Seery & Canty (Higher Education in Ireland) pioneered this strand of pedagogic applications (both in Williams and Kimbell [eds] [
The use of comparative judgement within a schools assessment context gained increasing attention during the years of the e-scape project (2004–10). Shortly after the publication of the e-scape phase 3 (2009) report, Cambridge Assessment released a two-page summary of 'Rank ordering and paired comparisons—the way Cambridge Assessment is using them in operational and experimental work' (Bramley and Oates [
Subsequently, in 2015, Bramley produced a Cambridge Assessment Research Report specifically ... Investigating the reliability of Adaptive Comparative Judgment (Bramley [
The approach was used by Jones and Alcock ([
It was the combination of Tom Bramley's initial challenge and the problem observed by Digital Assess (the developers) that led to the launch of a thorough investigation of the reliability problem. A number of causes were considered for this result, including the algorithm itself, and particularly the part of the algorithm that dealt with the distribution and matching of pairs.
The investigation was presented in a report "Addressing the issue of bias in the measurement of reliability in the method of Adaptive Comparative Judgment" (Rangel-Smith and Lynch [
- The Standard Deviation (SD) of the items (portfolios): This is a measure of discriminability, the range of quality represented in the items. For low SD the quality range is small (say between −2 and +2), so discriminating judgements are difficult to make. But for bigger SD the quality range can be more like −10 to +10 and discrimination is much easier. Discriminability interacts with the expertise of judges, as inexperienced judges will be less able to distinguish the quality of objects, whereas expert judges can achieve higher consensus and discrimination power.
- Level of adaptivity: This concerns making use of the parameter value of a portfolio (estimated after the previous round) to choose the portfolio that it will next be judged against. How far apart should they be on the putative scale of quality? The "gap" between portfolios becomes critical; a big gap (eg comparing a very good piece with a very poor piece) produces easy judgements but less information for the system, whereas a small gap produces difficult judgements but more information for the system.
- Scale Separation Reliability (SSR): The resulting reliability statistic generated within the algorithm. It was this figure that Bramley challenged, claiming a 'bias' (reliability inflation) generated by the adaptivity.
In order to test the effect of adaptivity, Rangel-Smith and Lynch simulated the judging process many times with different starting hypotheses about the SD of the items and the level of adaptivity (the gap). In all the simulations, "true quality" parameters were generated for 100 objects, each object was judged approximately the same number of times, and each session was simulated to last 40 rounds of judgements. Each hypothesis was simulated independently 40 times to reduce the effects of statistical fluctuations on the result. Four central findings emerged.
First, for all hypothesized variables, the system shows bias in the reliability values in early rounds (fewer than 10), where there is not enough data. This arises because there is too much uncertainty for the information function to work effectively. This bias reduces as the data expand through later rounds. Second, an adaptive algorithm maximizes the performance of the system where there is higher discrimination (expert judges). On the other hand, the adaptivity process brings a bias in the measurement of the reliability in cases where the consistency among the judges is poor (inexperienced judges). Third, with non-adaptive (random) allocations, the bias of the "SSR" metric is smaller than in highly adaptive systems. Fourth, however, the reliability performance of non-adaptive (random) selection is significantly poorer than in an adaptive system, where (with high discrimination) the system can reach a value of "True Reliability" 10 rounds earlier than the random allocation system. (See also Crompvoets et al. [
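To make the simulation design concrete, here is a much-reduced Python sketch of the same kind of experiment. It is not the Rangel-Smith/Lynch code: the Bradley-Terry fitting routine, the weak prior, the use of the squared correlation between estimated and generating values as the 'true' reliability, and the 'pair scale neighbours' rule standing in for maximal adaptivity are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit(n, judgements, sweeps=30, prior_var=25.0, max_step=1.0):
    """Bradley-Terry estimates (logits) and standard errors from (winner, loser) pairs.

    A weak N(0, prior_var) prior keeps scripts that won or lost everything finite.
    """
    games = [[] for _ in range(n)]               # per script: (opponent, 1 if won else 0)
    for winner, loser in judgements:
        games[winner].append((loser, 1))
        games[loser].append((winner, 0))
    v = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            grad, info = -v[i] / prior_var, 1.0 / prior_var
            for j, won in games[i]:
                p = sigmoid(v[i] - v[j])
                grad += won - p
                info += p * (1.0 - p)
            v[i] += max(-max_step, min(max_step, grad / info))   # damped Newton step
    se = []
    for i in range(n):
        info = 1.0 / prior_var + sum(sigmoid(v[i] - v[j]) * sigmoid(v[j] - v[i])
                                     for j, _ in games[i])
        se.append(1.0 / math.sqrt(info))
    return v, se


def ssr(v, se):
    """Scale separation reliability: (observed variance - mean error variance) / observed variance."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / (len(v) - 1)
    err = sum(s * s for s in se) / len(se)
    return (var - err) / var


def squared_correlation(a, b):
    """Squared Pearson correlation, used here as the 'true' reliability."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov * cov / (va * vb)


def simulate(n=100, sd=1.9, rounds=20, adaptive=False, seed=0):
    """One simulated session; returns (SSR, 'true' reliability) after the last round."""
    rng = random.Random(seed)
    true_q = [rng.gauss(0.0, sd) for _ in range(n)]
    judgements, v, se = [], [0.0] * n, [0.0] * n
    for r in range(rounds):
        order = list(range(n))
        rng.shuffle(order)                       # random pairing; also breaks ties below
        if adaptive and r > 0:
            order.sort(key=lambda i: v[i])       # maximal adaptivity: pair scale neighbours
        for a, b in zip(order[0::2], order[1::2]):
            a_wins = rng.random() < sigmoid(true_q[a] - true_q[b])
            judgements.append((a, b) if a_wins else (b, a))
        if adaptive or r == rounds - 1:          # re-estimate when the next round needs it
            v, se = fit(n, judgements)
    return ssr(v, se), squared_correlation(v, true_q)


if __name__ == "__main__":
    for adaptive in (False, True):
        s, t = simulate(sd=1.9, rounds=10, adaptive=adaptive, seed=3)
        print(f"adaptive={adaptive}: SSR={s:.2f}  true reliability={t:.2f}")
```

Repeating such runs over many seeds, and tracking SSR against the 'true' value round by round with adaptivity on and off, is the kind of comparison the study reports; no claim is made that this toy version reproduces its numbers.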
So adaptivity brings advantages and disadvantages. It produces a good SSR result quicker (fewer rounds of judging) especially when using expert judges. But in early judging rounds there will be SSR inflation, and when using inexpert judges the system will need significantly more rounds to generate a reliable result.
The Rangel-Smith/Lynch paper therefore recommended a 'controlled' level of adaptivity, with a minimum gap of 1.5–2.5 logits between the selected portfolios, rather than the 0.67 logits of the original e-scape 'pairs engine' algorithm. This increased gap makes it easier for judges to distinguish the winner. The second element of the solution involved the Standard Deviation (SD) of the portfolios. An SD of 0.0 logits means no discrimination (like tossing a coin), whereas 1.9–2.5 logits is medium discrimination (non-experienced judges) and 6.6 logits is high discrimination (expert judges). So the Scale Separation Reliability generated by the algorithm depends on three factors: the 'gap' between selected portfolios; the SD of the items (reflecting the expertise of the judges); and the number of rounds of judging.
The key chart illustrating this in the Rangel-Smith/Lynch paper is reproduced as Fig. 2. It shows the SSR value as a function of the number of rounds in a judging session. Assuming an SD of 1.9 and a gap of 1.5 logits, the red dots show the SSR in relation to the number of rounds: at 12 rounds the value is 0.87, at 15 rounds it is 0.89, and at 20 rounds it is > 0.9.
Fig. 2 Scale Separation Reliability (SSR) improves through 'rounds' of judging
The Rangel-Smith/Lynch study therefore recommends as follows:
This study advises against using the highest level of adaptivity, where the pair of objects allocated are the closest in its parameter values.... It is recommended to run an ACJ session with a "controlled" level of adaptivity, translated by using a minimum "Gap" size to separate the allocated objects (1.5 to 2.5 logits).... Depending on the chosen "Gap" value used in the session, there is a minimum number of rounds that have to occur before trusting the "SSR" metric as a reliability measurement. For a "Gap" value of 1.5 logits there should be a minimum of 15 rounds, while for a separation of 2.5 logits it can be 12 rounds. (Rangel-Smith and Lynch [
In March 2020 and then again in Oct 2020 I had discussions with Tom Bramley in which I asked him whether he thought the solution presented in the 2018 paper would deal with the concerns he had raised in his 2015 paper. In extended and frank discussions he made three points about the proposed solution. First, he thought that the SD value for discriminability should be set at what the paper describes as 'medium' discrimination; he believes a value of 1.9 is realistic. Second, he thinks the 'gap' is a sensible approach since (as the paper argues) it reduces the extent of the adaptivity and of any reliability inflation, and he considered the gap value of 1.5 logits sensible. Third, he agrees with Rangel-Smith/Lynch that this would then require 15 rounds of judging to remove the bias (reliability inflation) and 20 rounds to generate an SSR value > 0.9.
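The recommendation can be read as two simple rules in a session configuration. The Python sketch below is one hypothetical reading of them: the function names and the opponent-selection rule (the closest script at least the minimum gap away, falling back to the farthest) are our assumptions, and the round thresholds cover only the two gap values quoted, with no interpolation between them.

```python
from __future__ import annotations

# Minimum rounds before trusting SSR, for the two gap values quoted in the recommendation.
# No other values are given, so this sketch does not interpolate between them.
MIN_ROUNDS_FOR_GAP = {1.5: 15, 2.5: 12}


def choose_opponent_with_gap(script: str, params: dict[str, float], min_gap: float) -> str:
    """Hypothetical 'controlled adaptivity' rule: among scripts at least min_gap logits
    away from `script`, pick the closest; if none qualifies, fall back to the farthest."""
    others = [(abs(v - params[script]), name) for name, v in params.items() if name != script]
    eligible = [c for c in others if c[0] >= min_gap]
    return min(eligible)[1] if eligible else max(others)[1]


def ssr_can_be_trusted(min_gap: float, rounds_completed: int) -> bool:
    """True once the session has run the minimum number of rounds quoted for this gap."""
    if min_gap not in MIN_ROUNDS_FOR_GAP:
        raise ValueError(f"no minimum-rounds recommendation quoted for gap {min_gap}")
    return rounds_completed >= MIN_ROUNDS_FOR_GAP[min_gap]


params = {"a": 0.0, "b": 1.0, "c": 1.6, "d": -2.0, "e": 3.0}   # hypothetical estimates
print(choose_opponent_with_gap("a", params, 1.5))              # c
print(ssr_can_be_trusted(1.5, 14), ssr_can_be_trusted(2.5, 12))  # False True
```

Seen this way, the gap is a dial rather than a switch: setting it to zero recovers the most adaptive pairing, while raising it trades some information per judgement for less reliability inflation.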
So the two-part question that launched this paper has its answers. Bramley and Rangel-Smith/Lynch agree that (i) the original 'pairs engine' algorithm did cause SSR inflation in particular conditions: with low SD, with inexpert judges, and particularly in early rounds of judging. However, they also agree that (ii) the new algorithm, in the conditions discussed here, will eliminate bias (reliability inflation) and will produce a secure SSR value of > 0.9.
Since the Rangel-Smith/Lynch paper was written, the company RM Education (a leading supplier of learning and assessment resources to the education sector) has acquired the original 'pairs engine' algorithm from Digital Assess and has already implemented the recommendations of the Rangel-Smith/Lynch paper. The RM Education product 'RM Compare' now optimises the algorithm to offer the advantages of an adaptive approach to comparative judgement, whilst minimizing any reliability inflation.
Beyond the three specific points that Bramley made about the Rangel-Smith/Lynch proposal, he added an important, and more generic question. All parties agree that adaptivity works both ways ... it enables a more efficient selection of pairs but it runs the risk of reliability inflation. (Crompvoets et al. [[
One of the features emerging from the research into the method of ACJ is that it illustrates the mistake of thinking in terms of using or not-using adaptivity. Such binary (on/off) thinking is less helpful than thinking about degrees of adaptivity (turning it up/down). All tools have advantages and disadvantages in the jobs that they do—but we do not say that therefore we won't use any tools. Rather we seek to use tools for what they are good at whilst taking steps to avoid their disadvantages. In the case of ACJ, an effective algorithm should seek to take advantage of the efficiency benefits of adaptivity (a proven useful tool in computer-based assessments) whilst controlling the tendency towards bias in early rounds.
In the years since the e-scape project at Goldsmiths first launched the ACJ 'pairs engine' into the arena of educational assessment, many more comparative judgement tools have emerged. 'No-More-Marking' have developed a comparative judgement tool particularly to help teachers and learners with writing tasks; Microsoft Research have developed 'TrueSkill', using ranking in the context of doctors' judgements of videos of patients; in Belgium, the Digital Platform for Assessing Competencies (D-PAC) provides a digital tool to help in the assessment of video and images; and Bramley and his colleagues at Cambridge Assessment have developed 'Cambridge CJ scaling'. Even the UK Government, in the form of Ofqual, is 'running pilot studies involving comparative judgement methods for capturing expert judgement for the purpose of standard maintaining' (Ofqual [
The method of Adaptive Comparative Judgement has existed for only a very short time. It is about 15 years old and has emerged into fields of educational scholarship (assessment and pedagogy) with histories spanning centuries. In those 15 years it has started to open many new doors at the interface of assessment with learning.
By Richard Kimbell