Exam results: Missing the mark and shifting the target

With the exam results in England, Wales, and Northern Ireland due to emerge within the next 24 hours, febrile anticipation is mounting. But confidence in the examination system is now at an all-time low as more about the process is revealed. Two things have emerged in the lead-up to the results that expose serious problems inherent in the process, as explained here.

Post author Mike Larkin

Shifting the target.

Firstly, the government has indicated that the distribution of exam grades in 2022 should move closer to that of 2019.  This tells us up-front that grade boundaries are not fixed. Instead, they are moved up and down to get a desired outcome in terms of grade distribution. In this case, fewer high grades in 2022. The regulator in England, Ofqual, will seek to land somewhere between the distribution of 2019 and 2021.  The higher grade results of 2020, reached using teacher assessed grades with no exams held, are left out as anomalous. This move represents an attempt to return to a pre-pandemic ‘norm’.
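To make the idea of moving boundaries to hit a desired distribution concrete, here is a minimal sketch. It is not Ofqual's procedure. The marks, the grade shares for each year and the function names `boundaries_for_target` and `midpoint` are all hypothetical, chosen only to show how a target distribution fixes the boundaries after marking.

```python
def boundaries_for_target(marks, target_shares):
    """Return the lowest mark needed for each grade so that the share of
    candidates at or above that grade roughly matches the target.

    target_shares maps grade -> cumulative share at or above that grade.
    Tied marks can push the real share slightly above the target.
    """
    ranked = sorted(marks, reverse=True)
    n = len(ranked)
    boundaries = {}
    for grade, share in target_shares.items():
        k = max(1, round(share * n))   # number of candidates to let through
        boundaries[grade] = ranked[k - 1]
    return boundaries


def midpoint(shares_a, shares_b):
    """Target halfway between two years' cumulative grade shares."""
    return {g: (shares_a[g] + shares_b[g]) / 2 for g in shares_a}


# Hypothetical cohort: 100 candidates with marks 1..100.
marks = list(range(1, 101))

# Hypothetical cumulative shares (NOT real national figures).
shares_2019 = {"A": 0.25, "B": 0.50, "C": 0.75}
shares_2021 = {"A": 0.45, "B": 0.70, "C": 0.95}

print(boundaries_for_target(marks, shares_2019))
print(boundaries_for_target(marks, midpoint(shares_2019, shares_2021)))
```

With these made-up numbers the same set of marked scripts needs 76 for an A under the 2019-style target but only 66 under the midpoint target. The marks do not change; only the target does.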

Missing the mark.

Secondly, there is growing disquiet about the grades awarded being unreliable or unfair. This is amply reflected in the title of a new book from Dennis Sherwood, ‘Missing the Mark’ (Canbury Press 2022). It has been published just days before the 2022 results are released. The arguments are well explained, if a little complex for the general reader. However, teachers at all levels and most A-level students will be able to grasp the implications with a little application. They may be dismayed and shocked.

The response to the book has been mostly muted for now. However, in ‘Grade Expectations – Will 1.5 million GCSE and A-Level grades be wrong this summer?’ Rob Cuthbert, Emeritus Professor of Higher Education Management at the University of the West of England, doesn’t hold back. After the suggestion that “If Ofqual remains resistant to change, then universities need to act instead, by changing how they select students” his conclusion was “At present, the school examinations system has a fail grade. As an examiner might say, it ‘must do better’. This book shows the way”.

Put very simply, the essence of the problem is that most examination grades are only reliable to +/- one grade. Thus, a B grade could have been either a C or an A grade and so on. This is described as caused by ‘fuzziness’ around grade boundaries. The ‘fuzziness’ appears to vary depending on the subject being examined. For example, mathematics and sciences fare better than many other subjects in offering a higher probability of a ‘definitive’ grade being awarded, as shown in the figure below.

This figure is from Ofqual’s own data from 2018 and was reproduced by Sherwood to illustrate the problem. The same figure reappeared in August 2020 in the Ofqual report ‘Awarding GCSE, AS, A level, advanced extension awards and extended project qualifications in summer 2020: interim report August 2020’.

Under questioning from the redoubtable Ian Mearns at a Commons Education Committee hearing in September 2020, the then Acting Chief Regulator at Ofqual, Glenys Stacey, admitted that “It is interesting how much faith we put in examination and the grade that comes out of that. We know from research, as I think Michelle mentioned, that we have faith in them, but they are reliable to one grade either way”. Oops!

TEFS concluded that Ofqual merely demonstrates that they “get the grades wrong most of the time, but do this with great precision” (TEFS 18th August 2020 ‘Exams 2020 and the demise of Ofqual, who pays the ferryman?’). 
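The ‘fuzziness’ around grade boundaries described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the grade boundaries, the spread of marks and the idea that marker disagreement behaves like random noise with a fixed standard deviation are all hypothetical, and none of it comes from Ofqual's data.

```python
import random

# Hypothetical grade boundaries (minimum mark for each grade).
BOUNDARIES = [("A", 70), ("B", 60), ("C", 50), ("D", 40)]


def grade(mark):
    """Grade for a mark, using the hypothetical boundaries above."""
    for letter, minimum in BOUNDARIES:
        if mark >= minimum:
            return letter
    return "U"


def agreement_rate(definitive_marks, marking_sd, repeats=500, seed=0):
    """Share of simulated scripts whose awarded grade matches the grade
    implied by the 'definitive' mark, when each awarded mark differs from
    the definitive one by random noise with standard deviation marking_sd."""
    rng = random.Random(seed)
    matches = 0
    total = 0
    for _ in range(repeats):
        for mark in definitive_marks:
            awarded = mark + rng.gauss(0, marking_sd)
            matches += grade(awarded) == grade(mark)
            total += 1
    return matches / total


# Hypothetical cohort with definitive marks spread evenly from 30 to 89.
marks = list(range(30, 90))

for sd in (1.0, 3.0, 5.0):
    print(f"marking noise sd={sd}: agreement {agreement_rate(marks, sd):.0%}")
```

A subject where markers agree closely behaves like the low-noise case, and one where judgement varies more behaves like the high-noise case. The scripts that change grade are those whose marks sit near a boundary.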

Where do the ideas come from?

Sherwood has not plucked the problem from thin air. Originally educated as a scientist, he has extensive experience in both science and management consultancy. At one stage he even acted as a consultant for Ofqual. The result is that ‘Missing the Mark’ has the unmistakable hallmarks of being authored by a scientist who is driven by logic.

This is unsurprising as he has authored books that range from ‘Crystals, X-rays and Proteins’ in 1976 to the most recent ‘How to Be Creative: A Practical Guide for the Mathematical Sciences’ earlier this year.

Sherwood simply sees an obvious problem and seeks to explain its causes.  Also, his observations arise from Ofqual’s own analyses over recent years. This explains why they have failed to challenge Sherwood’s assertions. They simply have no answer.

Sherwood started investigating in 2019 with a series of articles on the Higher Education Policy Institute (HEPI) site. In January 2019 he introduced us to the startling idea that ‘1 school exam grade in 4 is wrong. Does this matter?’. That same summer he offered some solutions in ‘Students will be given more than 1.5 million wrong GCSE, AS and A level grades this summer. Here are some potential solutions. Which do you prefer?’. That this fell on deaf ears meant the more extensive arguments in his 2022 book were bound to emerge.

The ghost in the machine

TEFS sought to look more closely at the overall issues back in late 2020 as the dust settled from the examination fallout earlier that summer. Examinations and ‘the ghost in the machine’ (11th December 2020) summarised a series of five blog articles looking in more depth underneath and around the issues raised by the observations of Sherwood.

‘Ofqual fights back’ reported the defence mounted by Ofqual and the realisation that they could easily make the same mistakes again.

‘What of fairness and equality?’ looked at why Ofqual thought its methods were fair, despite the lack of reliability in the examination marking/grading process itself.

‘Accuracy, reliability and the ‘William Tell’ effect’ considered further the inherent problem of the examination grading processes being unreliable. It appeared that the goal of maintaining standards as the main aim was at the expense of equality and fairness.

‘Are the teachers or the students being assessed?’ was the question then asked. The answer appeared to be that schools or colleges and their teachers are really being assessed in examinations taken by their students. Observations by Ofqual of a ‘sawtooth’ effect in exam performance before and after reforms of examinations illustrated the extent to which the effectiveness of teachers and schools influences results.

Finally, ‘Philosophical musings’ examined the ‘behaviourist’ philosophy that appeared to underpin the blind reliance on single examinations.

This philosophy rests on the accepted presupposition that every individual student has an inherent ability that can be readily measured. This manifests itself in looking narrowly at attainment that is tested in a single ‘one-off’ examination. But the conclusion that marking and grading examinations is unreliable was bound to emerge. The approach also ignores other data available about the student, the context of the learning environment and the many less tangible attributes that make us who we are. It seems the ‘ghost in the machine’ is ignored and left sitting on the bench waiting to be called upon to show its ability.

The infamous ‘algorithm’.

As the pandemic shut down formal examinations, the government and Ofqual had no choice but to revert to teacher assessed grades or TAGs. But with emphasis on final examinations being so great, some schools and teachers lacked enough formal data. The result was an almost incoherent mess. Ofqual were prepared for the likelihood of over-optimistic TAGs and employed an ‘algorithm’ to adjust grades, mostly downwards. This was done by referencing the past grades achieved by students at schools and colleges. This conveniently favoured the most advantaged, especially in independent schools. By failing to consider the ‘individual’ and tarring them with the same brush as past students at their school, Ofqual set off an earthquake of indignation. After Scotland realised it had made a similar mistake, within days of the results, the algorithm was binned and the raw TAGs were used. The rest of the UK followed fast.

By reverting to adjusting grade boundaries in 2022, Ofqual are in danger of more aftershocks.
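The effect of referencing a school's past grades, as described for the 2020 ‘algorithm’, can be sketched in a few lines. This is emphatically not Ofqual's actual model. It assumes, purely for illustration, that students are placed in a rank order and that grades are then handed out to match the school's historical grade shares. The student names, the shares and the function name `grades_from_school_history` are hypothetical.

```python
def grades_from_school_history(ranked_students, historical_shares):
    """Allocate grades using only a school's past grade distribution.

    ranked_students: names ordered from strongest to weakest
                     (the rank order is an assumption of this sketch).
    historical_shares: list of (grade, share) pairs, best grade first,
                       with shares summing to 1.
    """
    n = len(ranked_students)
    result = {}
    cumulative = 0.0
    start = 0
    for grade, share in historical_shares:
        cumulative += share
        end = round(cumulative * n)
        for name in ranked_students[start:end]:
            result[name] = grade
        start = end
    # Anyone left over through rounding receives the lowest listed grade.
    for name in ranked_students[start:]:
        result[name] = historical_shares[-1][0]
    return result


students = ["Asha", "Ben", "Cara", "Dev"]  # hypothetical, strongest first

# A school that has never recorded an A versus one that regularly does.
no_past_as = [("A", 0.0), ("B", 0.5), ("C", 0.5)]
many_past_as = [("A", 0.5), ("B", 0.5), ("C", 0.0)]

print(grades_from_school_history(students, no_past_as))
print(grades_from_school_history(students, many_past_as))
```

In this toy version the strongest student at the first school cannot receive an A whatever their own work deserves, while the same rank at the second school gets one automatically. That is the ‘same brush’ problem in miniature.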

Ofqual tries to explain

The regulator Ofqual will put on a brave face but cannot hide from the inevitable fallout. In the context of the expected fall in the number of students with higher grades, they have issued two guides for students: ‘Student guide to exams and formal assessments in 2021 to 2022’ and ‘Exam results 2022: 10 things to know about GCSE, AS and A level grades’. The notion that the grades might not be reliable is missing. Instead they confirm that “Exam boards will use data from 2019 and 2021 as a starting point for grading”. They then go on to admit that “Grade boundaries will likely be lower than when summer exams were last sat in 2019”.

Ofqual are keen to stress that “GCSEs, AS and A levels are also not norm-referenced. There is no quota for the number of students that can get a particular grade – and there never has been. Grade boundaries are never set until after students have sat the assessments and they have been marked”.

This means there is no quota set in advance. However, this will hardly convince the sceptics who see grade boundaries shifted to match the target after marking. This target lies someplace between 2019 and 2021.

For how long have we missed the mark?

This is not a new thing and the problem of accurate marking and setting reliable grades has been around as long as national examinations have existed.

The obvious conclusion is that Ofqual should concentrate first on getting their marking much more accurate and reliable across all subjects. By comparison, setting standards by a ‘normalisation’ process is a relatively trivial task. However, the danger lies in setting a minimum standard for university access and then using a tightening of grade boundaries to restrict numbers. This would go back to the stealth tactics used in the 1970s to define those who Robbins considered were in the ‘pool of ability’ (TEFS 17th August 2017 ‘A-Level Playing Field or not: Have things changed over time?’).

The ‘Robbins Principle’ stated in 1963 that Higher Education “should be available to all who were qualified for them by ability and attainment”. The question remains today about how we determine the size of the ‘pool’ and who is in the ‘pool of ability’ (Robbins Committee on Higher Education Report, 1963).

Implications for assessments in universities.

The analysis in ‘Missing the Mark’ has much wider implications than just the A-levels or GCSEs. The application of any student assessment process will have its problems. This includes the assessment of students in degree courses. As an aside, I have considerable experience of this across Biosciences in several Russell Group universities. But there are significant differences in universities and some students do not adjust to this regime very quickly. Firstly, the students are taught by many different staff and the exams are set by the staff themselves. The questions and marking are overseen by a series of external examiners. Secondly, the grade boundaries for each degree class are set at the outset and are not moved around to fit any expectations. Thus, if a group of students do particularly well in one year, there will be more firsts awarded. Thirdly, and of crucial importance, a wide range of information on a student’s work is considered, not just a few exams at one time point. Continuous assessments and projects are factored in and results across three or four years of a degree are used. If there are unexpected inconsistencies in the marks achieved, especially when landing near grade boundaries, these are looked at closely and mechanisms exist to compensate for this. It can be time consuming but the future career of each individual student is at stake.

The solutions possible

Sherwood does not shy away from offering solutions to the problem of reliability and ‘fuzziness’ in examination marks. There are a total of fourteen ideas on offer and each is convincing and has merit. The simplest would be for the exam boards to publish the reliability or ‘fuzziness’ of the grade awarded and even consider a percentage mark with confidence limits.
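As a rough illustration of what reporting a mark with confidence limits might look like, the sketch below turns a mark and a margin into the range of grades the candidate could plausibly have received. The boundaries, the size of the margin and the function names are hypothetical. How an exam board would estimate the margin for a given subject is not addressed here.

```python
# Hypothetical grade boundaries, best grade first (minimum mark for each).
GRADE_BOUNDARIES = [("A", 70), ("B", 60), ("C", 50), ("D", 40), ("U", 0)]


def grade_index(mark):
    """Position in GRADE_BOUNDARIES of the grade earned by a mark."""
    for i, (_, minimum) in enumerate(GRADE_BOUNDARIES):
        if mark >= minimum:
            return i
    return len(GRADE_BOUNDARIES) - 1  # below every boundary: lowest grade


def plausible_grades(mark, margin):
    """Grades consistent with a mark reported as mark +/- margin,
    listed from best to worst."""
    best = grade_index(mark + margin)
    worst = grade_index(mark - margin)
    return [g for g, _ in GRADE_BOUNDARIES[best:worst + 1]]


def report(mark, margin):
    """One-line result slip showing the mark, its limits and the grades."""
    grades = " or ".join(plausible_grades(mark, margin))
    return f"{mark} +/- {margin} marks: grade {grades}"


print(report(65, 3))   # comfortably inside one grade band
print(report(62, 3))   # close to a boundary
```

A candidate on 65 with a margin of 3 is a B whichever way the marking falls, whereas a candidate on 62 could equally be a B or a C. Publishing that distinction is the essence of the proposal.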

TEFS would go much further and incorporate a wider range of data on each student to offer a more rounded assessment. Any inconsistency between teacher expectations, course work assessments and final examination results should be investigated. The application of something akin to the ‘Grade Point Average’ used in the USA would be valuable.

Despite this debate, universities and employers are already adjusting to the uncertainties and the apparent grade inflation. Many are simply resorting to their own tests. This is amply illustrated in the announcement this week that ‘PwC removes 2:1 criteria for undergraduate and graduate roles to ensure it doesn’t miss out on talent’. This includes the deployment of in-house tests, with some confidence stated as: “We know that competition for our graduate roles will be as tough as ever but we’re confident that our own aptitude and behavioural testing can assess a candidate’s potential.” They are not alone in this move. There has been an inexorable rise in employers administering their own tests to select the most able candidates. All include assessing elements of verbal and numerical reasoning in the process.

For universities in the UK, UCAS lists a plethora of these on its website under ‘Admissions Tests’. The idea that A-Level exams alone are a good or reliable measure is frankly losing favour with the further expansion of such tests across the sector. Most of the tests include an element of ability assessment in the form of verbal and numerical reasoning questions. A good example is the BioMedical Admissions Test (BMAT) used in many countries including the UK (see its published specification). It is highly likely that such tests will now spread fast across many subjects and universities. No doubt the independent schools will prepare their students for the experience with enthusiasm.

Avoiding the mad dash for places in clearing.

This will be intense this year, starting tomorrow morning. Those with advantages and sharp elbows are more likely to push to the front. Some in government have suggested that university applications should only be made after the exam results are out. But the idea of this happening for the 2022 intake starting tomorrow, or any future intake, is ludicrous.

The only sensible way out is to defer entry to universities for a year after examinations. This is not such a radical idea if the examinations were held a year earlier at age seventeen. This has been the case in Scotland and Ireland for many years and was proposed by TEFS last year in ‘A radical overhaul of examinations is needed as soon as possible’.  This must now be considered as a viable option to avoid more chaos. After all, surely the attainment and ability of a student can be determined by age seventeen and all available information should be used to achieve that.

The author, Mike Larkin, retired from Queen’s University Belfast after 37 years teaching Microbiology, Biochemistry and Genetics.
