Universities Reformed the Classroom. The Exam Stayed the Same.

Walk through almost any university campus today and the evidence of pedagogical ambition is hard to miss. Flexible learning spaces designed around how students actually think. Collaborative project work. Competency-based progressions. Differentiated instruction. The language of modern higher education is the language of adaptability: meet the student where they are, design for diverse minds, move away from the factory model.

Then exam season arrives, and most of that philosophy quietly suspends itself.

The written, time-pressured, invigilated examination (a format largely unchanged since the 19th century) remains the primary instrument by which universities certify whether learning has occurred. The mismatch isn’t subtle. Institutions invest significantly in reshaping how knowledge is built and then measure it through a method that contradicts almost everything the reshaped classroom was designed to do. This isn’t a pedagogical accident. It is an institutional habit that has become a structural liability.

The Pedagogy Moved On. The Assessment Didn’t.

The tension between modern teaching practice and traditional assessment is well documented among researchers, even if it rarely surfaces in administrative strategy discussions. A 2025 critical appraisal published by the Assessment Review at CUNY found that many physics instructors continue relying on conventional exam formats out of convenience and departmental tradition, even when reform-driven evaluation frameworks are formally in place. The study noted that innovative assessment practices persist only where departmental culture actively supports reflective review. Without that culture, even evidence-backed reforms are short-lived.

The implication is uncomfortable: an institution can build an entire curriculum around student-centred learning and still default to assessment practices that were never designed to measure it. The classroom evolves by policy, but the exam endures by inertia.

This is not a fringe critique. A 2024 essay in the Journal of Microbiology and Biology Education argued that high-stakes STEM examinations function as cultural gatekeepers: not primarily as measures of competence, but as instruments that disadvantage students whose abilities don’t map neatly onto closed-book, timed recall. The authors were careful not to argue for abolishing exams. But their core point stands: the exam was not designed around learning outcomes. It was designed around administrative convenience, and it has stayed there.

What Generative AI Just Made Undeniable

For years, the case for rethinking traditional exams could be brushed aside as idealism. Closed-book, invigilated exams were held to be the most reliable guarantor of academic integrity, the format least susceptible to substitution or fraud. That argument has now been empirically dismantled.

Researchers at University College London tested whether closed-book undergraduate mathematics exams still hold their pedagogical value in an age of generative AI. Their findings, published on arXiv in September 2025, were unambiguous: AI tools achieved results consistent with a first-class degree across eight first-year mathematics papers at a Russell Group university. More striking still, the AI’s performance was more consistent across the curriculum than that of the actual students sitting those invigilated exams. The researchers’ conclusion was blunt: when AI can already ace these exams, the exams aren’t really testing much anymore.

This places universities in an awkward position. The exam format that was justified on the grounds of integrity now faces an integrity challenge it was never built to handle. Most institutional responses have been to tighten invigilation, restrict devices, and issue increasingly detailed AI use policies. These are compliance responses. They treat the symptom while avoiding the diagnosis.

The Human Cost Nobody Budgets For

The case against unreformed high-stakes exams isn’t only structural. It also carries a real human cost that institutions rarely stop to measure.

A 2024 study published in Child Development, drawing on registry data covering the entire Norwegian population of 17- to 21-year-olds, found that students who failed a high-stakes exit examination were 21% more likely to receive a psychological diagnosis than matched peers who passed. Five years after the exam, they were 57% less likely to have graduated and 44% less likely to have enrolled in tertiary education. These aren’t marginal effects. They are population-scale consequences attached to a single assessment event, consequences that compound across a student’s entire educational and professional life.

The exam, in other words, does not merely measure readiness. It shapes outcomes in ways that extend far beyond the room it is taken in. When institutions treat exam delivery as an administrative function rather than an educational one, they are implicitly accepting those downstream consequences as a fixed cost of the system.

The Infrastructure Problem Exam Reform Can’t Ignore

Even if a university wanted to modernise its examination practices, the operational reality is more exposed than most administrators appreciate. Digital assessment infrastructure in higher education is increasingly centralised, and centralised systems are, by definition, high-consequence failure points.

That stopped being a theoretical concern in May 2026. A ransomware attack on Canvas, one of the most widely used learning platforms in higher education, locked students out of the system during finals week, forcing universities, including portions of the University of Illinois system, to pause coursework, extend deadlines and improvise alternatives at the moment it mattered most. The breach affected nearly 9,000 schools and educational institutions worldwide, not because each was individually targeted, but because they all shared the same infrastructure. Cybersecurity firm Sophos, whose research has been submitted to US House oversight hearings on critical infrastructure threats, found that educational institutions continue to struggle with operational disruption and recovery costs, with higher education reporting a 25% rise in breaches between 2024 and 2025.

For institutions working out how to manage high-stakes exams online, this is not a hypothetical risk scenario. It is the operating environment. And it means the question of how exams are administered is inseparable from the question of whether the infrastructure carrying them can be trusted to bear the institutional and reputational weight of high-stakes outcomes when something goes wrong.

Gesturing Toward Something Better

There are signs that more thoughtful institutions are beginning to ask different questions. Not “How do we stop students using AI?” but “What does a valid assessment actually look like when AI exists?” Not “How do we defend the current format?” but “Does this format measure what we claim it does?”

Two-stage examinations, portfolio-based assessments, adaptive testing models, and structured oral components are all in active use at institutions that have moved beyond compliance thinking. The 2025 appraisal published by the Assessment Review at CUNY found that these approaches take root where departments treat assessment as a continuing intellectual problem, not a semester-end administrative requirement.

None of this removes the need for rigorous, high-stakes evaluation. Universities still need to certify competence. Professional licensing bodies still require it. Employers still expect it. The question is not whether to assess, but whether the instrument being used was designed for the job it is currently being asked to do.

Most weren’t. Most still aren’t. And the gap between what universities teach and what they test is no longer just a pedagogical inconsistency. It is an institutional exposure that is becoming increasingly difficult to ignore.