
Thursday, 9 October 2025

AI is neither the danger nor the solution many think


The biggest danger of AI is not that it will become Skynet and destroy us all. It is that it will make us too lazy to exercise critical thinking when doing research.





There are huge expectations that AIs of various forms will transform business and drive big improvements in productivity. The stock market is currently rewarding firms like Microsoft and Meta (Facebook’s parent) not for huge success in deploying AI but for huge investments in the tech needed to run it. There is a good reason why Nvidia (the firm that makes the key hardware needed to build AI tools) is the most valuable firm on the planet right now (by market capitalisation). In the early stages of a gold rush, it is the firms selling spades that make the money.


Much of the faith in AI is driven by apparently huge achievements in three areas: beating human players at Chess and Go; solving the protein folding problem; and providing convincing text in response to queries put to chat engines like ChatGPT or DeepSeek. 


But these apparent successes are not as convincing as they appear. 


Take DeepMind’s success at building tools that play games like chess or Go. The extrapolation many want to make from this success is that, once set up, the computers seemed to learn how to play the games at an exponential rate. Therefore, the claim goes, if we set up a suitable AI it will rapidly outpace its creators and learn to solve any problem at a similar rate. But, given that we know how the learning algorithm works, this is a false extrapolation. 


To cut a long story short, DeepMind’s AI is a pattern recognition engine. Given a suitable training dataset it was able to see patterns of play that people found hard to see. It found clever and unintuitive ways to play Chess and Go that people had not learned. It learned rapidly and far faster than any human could. But what made the learning so rapid was not magical: the key is how quickly reliable training data can be generated. In discrete, finite games with simple fixed rules, a computer can generate a huge number of complete games extremely rapidly, a number far exceeding all the games humans have ever played against each other, and, because the rules are completely unambiguous, it can know for certain which patterns led to success or failure. That gave the pattern recognition engine a solid and reliable training set from which to detect the patterns leading to wins or losses. Few real world problems are like that. 
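The point about training data can be illustrated with a toy sketch (tic-tac-toe rather than Go, purely because it fits in a few lines; the game choice and the code are mine, not DeepMind's method): a computer can play complete random games and, because the rules are unambiguous, label every single one with certainty.

```python
import random

# All eight winning lines on a 3x3 board indexed 0-8
WINS = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
        (0, 3, 6), (1, 4, 7), (2, 5, 8),
        (0, 4, 8), (2, 4, 6)]

def winner(board):
    # Return 'X' or 'O' if a winning line is complete, else None
    for a, b, c in WINS:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_game(rng):
    # Play one complete random game; the rules alone decide the label
    board = [None] * 9
    player = 'X'
    for move in rng.sample(range(9), 9):
        board[move] = player
        w = winner(board)
        if w:
            return w
        player = 'O' if player == 'X' else 'X'
    return 'draw'

rng = random.Random(42)
games = [random_game(rng) for _ in range(100_000)]
# Every game carries a perfectly reliable outcome label: X, O or draw
print({k: games.count(k) for k in ('X', 'O', 'draw')})
```

A laptop generates 100,000 reliably labelled games in seconds; no real-world dataset of comparable reliability can be produced at anything like that rate.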


The success of AlphaFold in predicting protein structures from amino-acid sequences looks like a counterargument. AlphaFold has done a better job than several decades of alternative algorithms for predicting protein structures. I don’t question that. AlphaFold’s success is significant enough to deserve a Nobel prize. But it has not, as is often claimed, solved the protein structure problem. It has clearly found common patterns in the training dataset of known protein structures (we have several hundred thousand known structures and sequences from 5 decades of hard chemical effort since the first x-ray structures of proteins were seen) that evaded previous analysis. But that training set relies on the slow and difficult task of isolating proteins with known sequences, crystallising them and determining their structures using x-ray crystallography (with some help from sophisticated forms of NMR). For proteins similar to known structures, AlphaFold does a good job, but it often stumbles badly if the new protein is too different (it sometimes fails to predict the new structure when the protein sequence is slightly mutated and it often gives bad predictions when the new protein is very dissimilar to the known structures in the training dataset). The computer can’t do exponential learning as it can’t expand the training data without the slow hard work of real world biological chemists finding new structures. 


And extrapolating the success to claim that this will revolutionise drug development–as DeepMind founder Demis Hassabis has recently been doing–is jumping the shark. His claim that we might cure all disease or develop new drugs in months–not years–is ludicrous. There is a particularly good takedown of his claim by Science columnist Derek Lowe (who actually works in drug development and rapidly saw through the factual absurdity of the Hassabis claims). The limiting factor for AlphaFold is generating the set of known structures for its training data, and that is slow. The limiting factor for drug development is not knowing the structures of target proteins. Many factors matter, including identifying which targets matter; designing and synthesising actual drugs that affect the target; testing those drugs in real animals to identify efficacy and side effects; and testing their actual efficacy in people. All of that takes time that is unaffected by knowing the correct structure of a known protein target.


But what about the manifest success of AI chat engines at generating computer code or research results far faster than people? ChatGPT is amazing! 


This is where claims that such tools will rapidly transform or replace many jobs are most worrying. I’m sure there are many jobs which could be replaced by AI. Many UK local newspapers are now owned by Reach plc. Their content is dominated by clickbait headlines designed to attract attention to an overwhelming flood of equally clickbait adverts. The entire operation could probably be managed by AI without using journalists at all with no diminution of the already abysmal quality. But only because the news and factual content has largely already hit rock bottom and the only performance metric that matters is how many clicks the headlines generate. That many are wrong, inaccurate, factually misleading, full of exaggeration or simply made up is pretty irrelevant. The journalistic ethos has already abandoned any commitment to truth or moral purpose or public good. So replacing those journalists with AI that cannot have any useful purpose, focus on truth or moral stance would not make things any worse. By all means replace them. 


Sure, many coders now use chat engines to generate code snippets. And this can often generate code much faster than they could write it. This is not unexpected given how AIs work. There is a huge volume of code out there to learn from and AIs can summarise or extract patterns from that huge training set. But is the code always good code? Since one of the major limitations of the design of most AIs is that they are poor at judgement, this is unclear. Some evaluations have actually suggested that, in aggregate, AIs lower the productivity of programmers (speed of writing code is not the primary metric; what matters is the speed of writing code that works for its intended use). 


A great deal of the time taken to develop software well is spent debugging; more is spent redesigning when users point out it doesn’t quite do what they expected; more is spent eliminating evils such as major security leaks. A disturbing amount of AI code replicates major security problems: after all, the training sets it has learned from are full of leaks and bad practice, and no AI has the built-in judgement to evaluate such things. AIs cannot reliably interpret intent; they are not designed to do so. Though, perhaps, this is also a criticism that can be levelled at many programmers who design their products to meet narrow technical descriptions but ignore the real people who need to use their software. For example, hospital EPRs are notorious for being hugely hostile to the doctors and nurses who are their primary users. No AI will fix that.


My own narrow experiments in solving simple problems in code often yielded useful rapid results. But my hit rate of code that worked was only about 50%.


And when it comes to using AI to search for useful results, I have found what typical tools generate useful but also very unreliable. Chat engines like ChatGPT or DeepSeek are in many ways a better search tool than a simple Google search. But the results, in my experience, almost always contain hallucinations. When writing a column recommending some key books for healthcare managers, I asked a question something like “tell me the top 10 books on health economics”. In the list of ten, two were entirely fabricated (with plausible authors, titles and cover art). More recently, when I asked for academic references that had evaluated the lives saved by the London Major Trauma System (which I was involved in developing and had kept an eye on over the years) the top two references (both presented alongside hyperlinks supposedly linking directly to the publications) were entirely fake (the hyperlinks were to real but unrelated papers). Google searches for the dates, authors or journals did not yield relevant papers. In this case using AI cost me more time in checking the results than I would have spent had I not used AI in the first place.


The ability of AIs to generate plausible text looks magical. But that text is untethered to any judgement about the quality or truth of the content. ChatGPT and DeepSeek and others have been trained to be bullshit generators (in the sense used by philosopher Harry Frankfurt: bullshit is content entirely indifferent to the distinction between truth and falsehood). The ability to generate plausible pictures also seems magical. But many of those pictures are now polluting the internet with fake images (some historians are very worried about the proliferation of fake history backed by plausible-looking images that turn out to be AI generated). There is a huge risk that this is a doom loop for reliable facts.


Given the way AI is currently built there is simply no way it can reliably solve difficult real-world problems. It doesn’t have a reliable training dataset to learn from. The upside of this is that there is no possibility of AI turning into Skynet and destroying us all. Creating an apocalypse requires reliable knowledge of how the world works, which AIs are ill equipped to have. 


The real problem is entirely different. And it is a problem shared with many previous complex computer systems: people tend to believe the results the computer generates even when the results are wrong. The UK prosecuted many sub-postmasters who ran local post offices for financial fraud on the basis of a big accounting system that contained many huge flaws. It took 20 years to start to fix this huge problem, described by the PM at the time as one of the biggest miscarriages of justice in the history of the UK. Trusting what the computer said despite evidence it was wrong was a major contributor to this catastrophe. But the system was not so opaque that the flaws could not, eventually, be uncovered. Had the system been an AI this might never have been possible, as one of the characteristics of most AIs is a fundamental lack of transparency about how they derive their specific outputs. And AIs are very good at generating plausible outputs even when they are provably wrong.


In short, it is the plausibility of AI output that is the big danger. When AIs have no ability to test the truth or falsehood of their outputs, plausibility is a huge danger. But that is as much a people problem as an AI problem. If AIs erode our sense of the difference between truth and falsehood or diminish our scepticism then we are in trouble. 



 

Friday, 6 June 2025

The new NHS UEC plan: a huge improvement but still not good enough

A quick, critical look at the key ideas in the new UEC plan 

[NB: updated with extra comments on June 8]

The new NHS plan for improving emergency care is out. 


It is a huge improvement over previous plans. I don’t often praise NHS plans especially on emergency care, but this one deserves at least some praise. But it is far from perfect and still has flaws carried forward from previous thinking.


Let’s look at the details.


What is better

It is far more honest than previous plans about the truly awful performance of the current UEC system. And it finally admits that this performance is one of the critical factors that has cratered public trust in the NHS.


§12 begrudgingly admits that the problem “...has only in part been fuelled by an ageing population and an increase in multiple long-term conditions…”. This is better than previous plans that have often asserted that demand is the problem. But a clearer statement that demand isn’t the cause of the problem would have been welcome (and §11 seems to vastly overstate the demand growth).


§13, §15 and §20 admit that coordination failures and “blame shunting” are part of the problem and should not be tolerated. But they overemphasise the blame shunting and coordination failures across organisations and underemphasise those failures inside providers, which are probably more significant.


§23 stresses the critical importance of leadership. This is useful as getting hospital leaders aligned on the importance of A&E performance was one of the key factors that delivered the original 4hr target in the years to 2005 (when it was first met with a 98% standard). But, on the other hand, a great deal of NHS policy since 2010 has burdened and confused hospital leaders with a mountain of competing and often incompatible targets.


Perhaps the most important promise is in §21, which promises better, more transparent data on performance. The RCEM have long proposed publishing better site-level data and this should vastly improve transparency and reduce the opportunity for gaming by unbundling the major A&E sites from unrelated UTC sites. Publishing transparent 12hr wait performance will highlight the most harmful waits and minimise the incentive to game 4hr performance at the cost of worse 12hr performance. A target for 12hr waits has long been neglected in previous plans and is a vital first step in driving a reduction in the harm and mortality caused by long waits (while waits of 5-11hr also increase mortality, the 1.7m waits over 12hr seen in each of the last three years carry the highest mortality increase, and reducing them first will yield the biggest benefits). 


§74 to §79 promise some better focus on issues and interventions related to better performance management. That’s good. As is the emphasis on the importance of local leaders focusing on performance managing this problem.


The plan has some refreshing acknowledgement of the problem of long waits for patients with mental health problems. This has been a longstanding problem for some hospitals and the specific actions to improve handover to specialist mental health services are welcome.

What is still wrong

While this plan is considerably more focussed than the previous plans, it is still not very focussed. There are too many goals and some of them are not consistent. The dominant goal should be reducing waits in A&E: everything else should be about how to achieve shorter waits. Given the extraordinary mortality caused by long waits (see this blog) any other focus is distracting and harmful. And, since demand is not the primary cause of long waits, the space in the plan given to demand reduction initiatives is futile. Demand is not the problem.


The plan also fails to acknowledge the biggest and most compelling reason why reducing long A&E waits is vital: they kill patients. Mortality rises notably when waits are significantly longer than 4hr and the latest estimates of how many extra patients die from those long waits suggest that perhaps an extra 40k-60k annual deaths were being caused by them in 2022 when there were fewer than 950k 12hr waits. There have been more than 1.7m 12hr waits in each of the three years since then, suggesting excess deaths might exceed the mortality from covid, UK military deaths in WW2 or Russian battlefield deaths in Ukraine.


The plan is also somewhat confused about the difference between goals and metrics. It is also confused about the difference between fixing the root causes of the problem and improving the superficial symptoms of the problem. Long ambulance waits, for example, are caused by long delays inside A&Es and, if A&E is not fixed, there is little point in separately setting targets for ambulance handovers (unless there is strong evidence that specific failures in ambulance processes are adding extra delays which the plan does not provide).


§11 claims “Since 2010/11, the number accessing UEC services has risen by 90%”. I have no idea what they are counting here. Major (type 1) attendance is up less than 20% and even UTC attendance (type 3) is only up 45%. The performance problem is all in major A&Es and demand is not the problem there. Most UTCs do not have a performance problem and still meet the 95% 4hr target nationally.


While the idea of having a target to reduce 12hr waits is good there are two problems. 


The first is that the target is unambitious. Reducing 12hr waits to under 10% of attendance is shockingly unambitious. We have had three years where more than 1.7m people waited over 12hr (probably leading to 10k-12k unnecessary extra deaths). A target of eliminating 12hr waits in 2 years might have been better, if ambitious.


The second is that the 12hr target might have been more effective if it temporarily replaced the 4hr target. It should be a step towards improving A&E performance and eventual recovery of the 4hr standard, not a supplement to it (which leaves incentives to game 4hr performance in place).


The plan chose to retain a 4hr target instead of temporarily replacing it with a 12hr focus. But the 4hr goal it sets is incredibly unambitious and confusing. It is confusing as it is still a "system" target which includes the performance of UTCs (type 3 units) which don't have problems. This encourages gaming of headline performance instead of a focus on the units where the problem is concentrated, the major A&Es. The plan should have changed the metric to apply only to major A&Es (and if site-level data is going to be published, major A&E sites, not multi-site trusts). This change would make the 78% target far more ambitious. Current whole-system performance is in the mid 70-percent range. Current major A&E performance is averaging around 60%. The target should focus on where the improvement is needed.


§22 to §43 consist of two sections focussed on reducing demand. Some of the individual ideas are sensible (more flu vaccination is a generally good idea whether it reduces A&E demand or not). But, since demand is not the major cause of poor A&E performance, the volume of effort to reduce it is mostly irrelevant to the problem at hand. I’m not saying don’t do the good things recommended in these sections. But I am saying that these things are a major distraction from fixing A&E performance, and a focus on the handful of significant actions to do that would deliver more improvement.


The promised improvements in data transparency are very significant. But the section on digital investment (§69 to §73) is full of wishful thinking. I doubt that joined up care records can make much difference to A&E or ambulance performance however many other benefits they have. 


§72 is right to point out that “Rolling out the FDP is one thing: ensuring it is used effectively is another.” The point that implementation matters has rarely, if ever, been admitted in previous NHS strategies. But the belief that the Federated Data Platform is transformative is naive. As is the belief that better forecasting of A&E demand is useful. A&E demand has always been very predictable. The biggest operational problem is, and always has been, matching operational practices to the very-well-known demand patterns. The idea that better forecasting might help is pure snake oil (especially if AI based). 




What is irrelevant and silly

The introduction to the plan claims in §1 that “the 10 Year Health Plan will set the most transformative agenda we have seen in over 2 generations.” §16 places the same mistaken faith in the ten-year plan.


This is silly. The ten-year plan–at least what we have seen about its content–is absolutely not transformative. At best it is irrelevant to the current NHS crises; at worst its development has been a huge distraction from the need to tackle them. The NHS should not be placing any hope in the transformation it promises–it might have noble and useful ambitions–but it completely lacks the practical steps needed to deliver its promises. Moreover the goals of the ten-year plan won’t fix any of the problems causing poor A&E performance anyway.


And, while the promise of improved transparency on critical performance data is good, the idea that the major national digital programmes (like the FDP or the Connected Care Records Programme) will make big contributions is a fantasy. Offering a far bigger programme of training in operational improvement and management with better recruitment and training for analysts (all merely hinted at in §76) would be far more useful.


Conclusion

While the plan has some huge improvements over previous UEC plans, it still has too many distractions from the focus needed to tackle perhaps the biggest performance problem in the NHS. Improvement might happen faster with an even tighter focus on the handful of actions that would tackle the major bottlenecks to better performance and fewer distractions carried over from previous strategies that largely assumed the key problem was demand.


Tuesday, 6 May 2025

A&E waits might be the biggest cause of avoidable death in England

 



New data from the ONS on their analysis of mortality and A&E waits has enabled new, better calculations of the magnitude of excess deaths related to long waits in A&Es in England. The results are unambiguously apocalyptic and suggest that long waits in A&Es now account for close to 1 in 10 of all annual deaths. In a rational world this would make fixing A&E waits the top priority for improvement in the NHS.


In January the ONS published some new analysis of the relationship between long A&E waits and mortality. I explained their analysis here.


But they left out some important parts of the data that would enable more reliable estimates of the total number of excess deaths. Extra data released by the ONS now allows those calculations to be done.


By excess deaths I mean something specific: the number of deaths that would not have happened if A&E waits were mostly below 4hr as they once were. In 2010 the NHS had fewer than 2% of patients waiting more than 4hr, so we should have some confidence this could be achieved again. Now, more than 10% of patients wait more than 12hr and it does not seem to be a big priority for action. 


The new ONS data allows reliable estimates of the human cost of failing to tackle long waits. And we can estimate the total number of deaths, which is far more impactful and resonant than an abstract, hard-to-interpret and potentially misleading mortality rate.


I’m going to walk through some of their analysis to justify the claim that long A&E waits are one of the biggest causes of avoidable deaths in England.


Why mortality rates are not enough

The first release of the ONS data included the total number of patients with different waiting times. It showed analysis of the mortality rates and relative mortality rates for a variety of different subgroups of patients (discharge status, presenting condition, age and more). But, while the rates alone send a clear message that longer waits are bad, they can’t tell us how significant the bad is because they didn’t publish the size of the different subgroups. This prevents a good analysis of the number of patients suffering higher mortality and, therefore, the number of excess deaths. For example, if patients with eye conditions are 100 times more likely to die when waiting 12hr than they are at 4hr that sounds bad. But if mortality at 4hr is very low and only a small number of patients have eye conditions, the number of excess deaths might be trivial. 
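The eye-condition point can be made concrete with toy numbers (entirely made up for illustration, not from the ONS data): a 100-fold relative risk can still produce a trivial number of excess deaths when the base rate and the subgroup are both tiny.

```python
# Toy numbers, purely illustrative: a tiny subgroup with a huge relative risk
patients_eye = 2_000        # hypothetical eye-condition patients waiting 12hr
rate_4hr = 0.00001          # hypothetical mortality at 4hr (0.001%)
rate_12hr = 100 * rate_4hr  # 100 times higher at 12hr

# Excess deaths = patients x (long-wait rate - baseline rate)
excess = patients_eye * (rate_12hr - rate_4hr)
print(f"excess deaths: {excess:.1f}")  # a scary ratio, a tiny absolute number
```

Without the subgroup sizes, the rate ratios alone cannot distinguish this trivial case from one involving tens of thousands of deaths.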


The extra information now published shows the number of patients in each subgroup making further analyses far more useful. A good example of why this matters is given in one of the simpler tables in the released data. This table breaks down the mix of discharged and admitted patients waiting for different times.


A simple version is shown below:


The ONS tables include the total number of admitted and discharged patients for each waiting time (they show time bands from 0hr to over 40hr but I have grouped these for simplicity into all waits less than 4hr and over 12hr).


The important observation is that the mix of patients changes over time. Only 14% of the <4hr group are admitted; but 73% of the >12hr group are admitted. Importantly admitted and discharged patients have hugely different mortalities. In the under 4hr group the mortality for discharged patients is about 0.11% but admitted patients mortality is 1.86%, nearly 20 times higher. If we just averaged the mortality for waits under 4hr and ignored the admitted/discharged mix any estimate of the total mortality would be unreliable as it would ignore the changing mix of admitted and discharged patients over time. 
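A quick sketch using the figures quoted above (14% vs 73% admitted; 0.11% and 1.86% mortality at <4hr) shows how badly a naive average would mislead. The calculation is mine; only the four input numbers come from the ONS table.

```python
# Figures quoted above from the ONS table (year to March 2022)
rate_discharged_lt4 = 0.0011   # 0.11% mortality, discharged, <4hr
rate_admitted_lt4   = 0.0186   # 1.86% mortality, admitted, <4hr
admitted_share_lt4  = 0.14     # admitted share of the <4hr group
admitted_share_gt12 = 0.73     # admitted share of the >12hr group

def blended(admitted_share):
    # Expected <4hr mortality for a cohort with this admitted/discharged mix
    return (admitted_share * rate_admitted_lt4
            + (1 - admitted_share) * rate_discharged_lt4)

naive_baseline = blended(admitted_share_lt4)   # the raw <4hr average
fair_baseline = blended(admitted_share_gt12)   # baseline matched to >12hr mix
print(f"naive baseline: {naive_baseline:.4%}")
print(f"mix-adjusted baseline: {fair_baseline:.4%}")
```

The mix-adjusted baseline is nearly four times the naive one, so comparing the >12hr group against the raw <4hr average would grossly overstate the excess mortality attributable to waiting.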


Now we can translate the published rates into total deaths, a number far more impactful than the mortality rates and a far better way to communicate the magnitude of the problem and compare it to other causes of death.


The new ONS data shows mortality, waiting times and the number of patients for a variety of groupings: discharge status; age bands; acuity on arrival; presenting complaint; plus some demographic variables like patient deprivation and NHS region.


What the data represents

There are many possible criticisms of analyses that can be done with this data. But many fade away when you understand what the data is.


Statistical models, for example, can often contain dodgy assumptions or badly sampled source data so their conclusions can be very controversial. This data is harder to criticise on any of those grounds.


It is not a sample. It is a count of all unique patients attending English A&Es in the year to March 2022 (or, strictly speaking, it is a sample but one that includes about 95% of all patients because of data linking and missing data).


So when it reports that there were 88,657 deaths that year in 6.7m patients we don’t need large error bars on either number. They are reporting actual counts of deaths.


And when I reported above that the mortality for patients waiting below 4hr was 0.11% if they were discharged, that is a reliable fact not speculation. This distinction is worth bearing in mind for most of the results below.


It is also worth providing some context for the numbers based on other mortality data.


There were just over 540k deaths in England in 2022. 


The top 6 assigned causes of death were:

Dementia and Alzheimer’s disease: 65,697
Ischaemic heart disease: 59,356
Chronic lower respiratory diseases: 29,815
Cerebrovascular diseases: 29,274
Malignant neoplasm of trachea, bronchus and lung (basically smoking-related): 28,571
Covid: 22,454


These numbers are worth bearing in mind for the analyses below. Waiting in A&E is not a specific cause itemised separately in ONS death statistics–those deaths will be classified as caused by disease groups as in the table–but the deaths caused by excessive waiting times are largely avoidable, and the numbers above provide some context for judging how significantly A&E performance might influence the total number of deaths.


Estimates of excess deaths caused by long waits based on admission/discharge status

Because of the huge difference in mortality between discharged and admitted patients the first improved estimate of “excess” deaths can be based on that ONS table.


It is worth clarifying what I mean by “excess” deaths. I start by assuming that long waits in A&E are avoidable. In 2010 fewer than 2% of all attendances spent more than 4hr in A&E. In recent years between 40 and 50% wait longer. Worse, in 2010 there were, perhaps, only a few thousand 12hr waits in A&E; each of the last three years has seen more than 1.7m 12hr waits. This is not an inevitable consequence of far higher attendance (which is up only about 20% since 2010). Those of us who worked to achieve the 4hr target in the early 2000s and remember how it was done believe it is still achievable, and that the problem has been misguided policy, not an inevitable consequence of demand.


On that assumption, we can estimate the number of deaths associated with longer waits by comparing the mortality for waits less than 4hr to longer waits. To do that calculation we need to know the mortality for different types of patient at different waiting times and the number of patients suffering those waits.


Here is a table showing the raw data and the simple steps to make that estimate of excess deaths:



The first three columns are facts from the ONS tables separating the numbers for admitted and discharged groups: total patients, total deaths and mortality rates. The other two columns compare the mortality rate in each waiting time group to the rate for those waiting <4hr, and calculate the implied excess deaths by applying that excess mortality rate to the cohort size for the same waiting time and discharge status.


There were a total of 88,657 deaths. 11,626 of those deaths occurred in patients waiting <4hr; 77,031 deaths occurred in patients waiting >4hr. This isn’t an indication of excess deaths as it ignores the base mortality those longer waiting patients would have incurred had they waited <4hr. So the excess mortality rate column makes that adjustment so that the excess deaths can be counted in each group.


On this calculation 51,147 patients died who would not have died had they waited less than 4hr. This adjusts for the changing mix of admitted and discharged patients (which was impossible in the original ONS publication).
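The arithmetic behind the table is simple enough to sketch: for each waiting band and discharge status, excess deaths = patients × (band mortality − the <4hr mortality for the same discharge status). The rows below are illustrative placeholders with roughly plausible magnitudes, not the actual ONS figures.

```python
# Each row: (status, waiting band, patients, deaths). Illustrative
# placeholder values, not the actual ONS table.
rows = [
    ("discharged", "<4hr",  3_000_000,  3_300),
    ("discharged", ">12hr",   300_000,  7_500),
    ("admitted",   "<4hr",    500_000,  9_300),
    ("admitted",   ">12hr",   800_000, 48_000),
]

# Baseline mortality per discharge status comes from the <4hr band
baseline = {status: deaths / patients
            for status, band, patients, deaths in rows if band == "<4hr"}

excess = 0.0
for status, band, patients, deaths in rows:
    if band == "<4hr":
        continue  # the baseline band contributes no excess by definition
    excess_rate = deaths / patients - baseline[status]
    excess += patients * excess_rate

print(f"implied excess deaths: {excess:,.0f}")
```

Running the same loop over the full ONS table, with all its waiting-time bands, is what yields the 51,147 figure; keeping the baselines separate by discharge status is exactly the mix adjustment the original ONS release could not support.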


Here is what that data looks like in a chart:


This also highlights another important feature of the extra data. While mortality for discharged patients is consistently much lower than for admitted patients (as intuition should expect), the mortality for discharged patients rises far more steeply. It is over 20 times higher at 12hr than it is at <4hr. Mortality for admitted patients starts high but is less than 4 times higher at 12hr.


It could still be argued that other factors in the changing mix of patients could account for some proportion of that excess mortality, so the wait times don’t explain it all (eg the age mix or the mix of specific conditions). More on that later. But we don’t have the details of the sophisticated ONS logistic regression model that tries to decompose the component contributions to mortality. We don’t have the cross-tabs as some analysts would say. But the waiting time is a major factor in every analysis and the changing patient mix is often an indirect consequence of the process that causes long waits so it should be reasonable to assign time in A&E as the major contributor.


So the first cut suggests that long A&E waits were the third biggest cause of avoidable deaths in England, higher than smoking and covid in 2022 with more than 50k excess deaths.


Does accounting for acuity change the picture?

Patients arriving at A&E are assessed on a 4-point acuity scale on arrival. The ONS data looks a little like this:


NB the lines are annotated with the mortality rates at <4hr and >12hr to indicate the degree of mortality increase associated with longer waits.


The raw data looks like this:


As intuition would expect, base mortality is higher for higher acuity: the very urgent group starts at just over 2%, while the low and standard groups each start at just over 0.1%. All mortality rates rise rapidly with waiting times. Less intuitively, mortality for the very urgent group at >12hr is about 4 times the base level, but the standard and low groups each see a near 40-fold increase. The relative worsening of mortality with long waits is larger for lower acuity patients than for urgent ones. This emphasises the importance of keeping waits low even for those patients who do not appear so acute on arrival.


Also the urgency with which A&Es handle each group does not match their acuity classification. About 60% of low and standard patients leave in under 4hr (see the “% of total in this group” column) but only 36% of the urgent and 32% of the very urgent group do. Nearly 10% of the very urgent group wait >12hr.


Applying the same simple calculation used in the admitted/discharged table will give an excess deaths estimate of about 58k, with about 18k of those accounted for by waits >12hr. As with most analyses here, the >12hr waits account for a disproportionate number of the extra deaths.
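Repeating that simple calculation per acuity band, with each band using its own <4hr baseline rate, might look like the following sketch. Every rate and count here is a made-up placeholder, not an ONS value; only the shape of the arithmetic is the point.

```python
# Hypothetical per-acuity inputs: each group carries its own <4hr
# baseline mortality rate and (patients, deaths) pairs for the wait
# bands over 4hr. None of these numbers are the real ONS figures.
groups = {
    "very urgent": {"baseline": 0.021,
                    "bands": [(150_000, 5_000), (60_000, 4_000)]},
    "standard":    {"baseline": 0.001,
                    "bands": [(900_000, 4_000), (200_000, 3_500)]},
}

total_excess = 0.0
for g in groups.values():
    for patients, deaths in g["bands"]:
        # excess = observed deaths minus deaths expected at this
        # group's own <4hr baseline rate
        total_excess += deaths - patients * g["baseline"]

print(round(total_excess))  # prints 10990 for these made-up inputs
```

Using group-specific baselines like this is what allows the acuity breakdown to give its own independent estimate, which the text notes lands in the same ballpark (about 58k) as the admitted/discharged split.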


So splitting the analysis by acuity does not fundamentally change the idea that long waits kill as the excess deaths estimate is in the same ballpark as the estimate based on admission status.


What does the analysis look like by age group?

Another way to group patients is by their age. Older people are expected to have higher mortality than younger people. That analysis looks like this:


In table form:


This is a cruder breakdown than some others as the ONS have used broad age groups and broad groupings of waiting times. Mortality, even with long waits, is very low for ages under 40, and those groups tend to depart A&E much faster than the older groups. Even so, a quick excess deaths analysis suggests about 45k deaths caused by waits over 4hr, with about 15k of those (roughly a third) occurring in the group waiting >12hr. This is again in the same ballpark as the results from the other group breakdowns.


What about the breakdown by presenting condition?

Again, intuitively, different underlying conditions are expected to have different underlying mortalities. The overall picture looks like this (I’ve eliminated some presenting conditions to keep the chart clearer):

The full table looks like this:


Note the very large difference in the sizes of the groups. 


There are also big differences in the base mortalities of the different groups. Eye, head and neck, obstetrics and gynaecology, skin, and trauma and orthopaedics are all below 0.1%, while airway/breathing, neuro, genitourinary, and (surprisingly) general/minor/admin have base rates over 0.5%.


All groups show major relative elevation in mortality rates for long waits but some start so low the rates are not significant even at 12hr. 


The total excess deaths implied using this breakdown is about 58k with about 18k of those coming from waits >12hr. This is again in the same ballpark as other estimates and suggests that 12hr waits are disproportionately bad.




Conclusion: excess deaths associated with long waits are the third largest cause of avoidable deaths in England.

The above analysis shows that the majority of the 89k deaths in A&E (technically within 30 days of discharge after an A&E attendance) are associated with waits longer than 4hr in A&E. The excess deaths associated with long waits calculated using different patient subgroups cluster around 50k in 2022, which would make A&E waits the third biggest cause of deaths in the UK that year, exceeding both covid and smoking (obviously those deaths are all included in other ONS categories in the official tables reporting total deaths, but the comparison is helpful for understanding the overall magnitude of the problem).


Worse, the problem may have grown in recent years. In 2022 only a little over 5% of patients waited more than 12hr to leave A&E; waits over 12hr have exceeded 10% since 2023, hinting that excess deaths from extreme waits might now be twice as bad as they were then. Even a crude extrapolation from the numbers provided by the ONS for 2022 might imply an extra 15k-20k deaths, which could push deaths from long A&E waits to the top position on the ONS causes of death table if these deaths were separated from other assigned causes.
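That crude extrapolation is simple scaling. The sketch below assumes, purely for illustration, that excess deaths from >12hr waits grow in proportion to the share of attendances waiting that long; the 18k figure is the approximate 2022 estimate from the breakdowns above and the shares are the approximate percentages quoted in the text.

```python
# Rough linear-scaling sketch: if the share of attendances waiting
# >12hr roughly doubles, the associated excess deaths roughly double
# too. Since mortality rises steeply with time in A&E, a linear
# assumption may if anything understate the effect.
excess_over_12hr_2022 = 18_000  # approx 2022 estimate from the analyses
share_2022 = 0.05               # just over 5% waited >12hr in 2022
share_now = 0.10                # over 10% since 2023

extra_deaths = excess_over_12hr_2022 * (share_now / share_2022 - 1)
print(round(extra_deaths))  # prints 18000: within the 15k-20k range
```

This is back-of-the-envelope arithmetic, not a forecast, but it shows why the 15k-20k figure is the natural order of magnitude.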


If the calculations above are correct waits of over 4hr in A&E might be accounting for more than 10% of all annual deaths in England. This is not a small problem the NHS can ignore. Waiting times in A&E might be the biggest cause of avoidable harm in the NHS.


Another way to put the numbers into perspective is to compare them with other huge scandals. The NHS contaminated blood scandal was estimated to have led to about 3,000 deaths, but over a period of more than 20 years. The novel painkiller Vioxx was estimated to have caused about 50k deaths in the 5 years it was on the market in the USA before its withdrawal. Neither of these comes close to the number of deaths associated with long waits in England’s A&E departments.


There are two possible defences against making this an urgent NHS priority. 


One is that the problem is not caused by the NHS. NHSE spent a lot of time claiming the problem was the result of excess demand, not a dysfunctional NHS. But demand isn’t the problem and never has been.


The other defence is that the NHS has no known strategies that work to reduce long waits. This might actually be true for NHSE’s outgoing leadership, who have persistently pursued bad strategies. But it is a little hard to sustain if your corporate memory extends beyond the last decade. Fewer than 2% of attendances waited over 4hr in 2010. A&E performance saw huge improvements from 2002 to 2005, when the 98% target was first achieved, and good performance was sustained for 9 years after 2005 despite Andrew Lansley’s poor decision in 2010 to relax the standard to 95% (which caused a decline in performance by signalling that A&E speed was no longer a priority). It isn’t that A&E performance can’t be fixed: it has been fixed, but the NHS forgot how it was fixed.


It needs to rediscover how it was once fixed as current performance is causing too many deaths for the problem to be ignored.


Notes and caveats

There are some important points to note when looking at the ONS analysis.


The analysis differs from previous analyses of A&E mortality in looking at mortality by patient, not attendance. The previous EMJ analysis used mortality rates per attendance for admitted patients only and is not directly comparable. The ONS looked only at mortality per unique patient (presumably on their last visit to A&E that year). This also means that the mortality rates cannot be directly applied to current attendances to estimate current mortality levels (unique patient statistics are not normally part of the A&E statistics that get published).


The assertion that longer waits cause extra deaths might upset some purist statisticians who demand randomised controlled trials before confidently asserting causality; such trials would clearly be unethical. But even without a confident claim of causality, the numbers estimated from a range of different analyses of comprehensive historic data are consistent in shining a very bright flashing red light in the direction of the claim that long waits cause a large number of deaths that would not happen with shorter waits.


Another possible objection is that the ONS have not released the cross-tabs (eg the breakdown of deaths by discharge status and age and acuity etc.). They have promised some of this data and analysis in a future publication. This might allow some separation of the interactions between different factors which might modify the calculation of excess deaths by a small amount. Though the results above suggest this would not be by much. Time in A&E seems to independently relate to higher rates of death for every subgroup studied and the excess death calculations are all in the same ballpark. 


And, don’t forget, the factual reporting of deaths in 2022 shows that the majority of all deaths related to A&E attendance were associated with waits longer than 4hr. That’s a basic fact, not speculation. Further analysis might highlight the relative contributions of each link in the chain of cause and effect, but it won’t alter the overall picture. Long waits kill patients.


And mortality rises with time, sometimes steeply, so improvement efforts will have the biggest impact if they tackle the very long waits first. Nobody argues that 12hr waits are acceptable, but recent policy has treated reducing waits >4hr as the major metric of success at the cost of allowing 12hr waits to rise. While having few waits over 4hr is a desirable goal, improvement programmes should target a rapid reduction in the very long waits over 12hr first. This policy focus would lead to a bigger reduction in excess deaths and might trigger the sort of notable improvement that would create the momentum so completely lacking in current plans.