
Tuesday, 22 December 2020

The final NHSE proposals for measuring A&E performance are still a shonky mess

It helps to understand how A&E works as a system before proposing how to change the public measurements used to manage it. There is little evidence that the latest NHSE proposals do. While the goals are good, the details are contradictory and inconsistent, and a great deal of the evidence used to justify the changes is either wrong or laughably weak.


I've said much of this about previous versions of the proposals so, now that I feel compelled to say it again, there might be swearing. And apologies in advance for the length of this post: I thought a thorough review was worth it.


A new consultation has just started on the long-awaited proposals from NHS England on how to change the way we measure A&E performance (it is part of this larger consultation: Transformation of urgent and emergency care: models of care and measurement).


I've had plenty to say about how metrics can be used to improve performance (see https://policyskeptic.blogspot.com/2020/02/nhs-performance-can-be-improved-by.html) and about my specific reaction to the original proposal to replace the 4hr target (see https://policyskeptic.blogspot.com/2019/03/publishing-wider-metrics-about.html).


To be fair, there are some improvements in this version of the proposals but they don't fix the basic problems with the evidence and logic for the changes. Let's look at why that is.


The rationale and content of the new proposals


The rationale given for the new proposals is (my highlighting):

The intention is to enable a new national focus on measuring what is both important to the public, but also clinically meaningful. ... The CRS has concluded that these indicators are critical to understanding, and driving improvements in urgent and emergency care, and proposes a system-wide bundle of new measures that, taken collectively, offer a holistic view of performance through urgent and emergency care patient pathways. This bundle ... will enable both a provider and system-wide lens to assess and understand performance. The review findings show how these metrics will enable systems to focus on addressing what matters to patients and the clinicians delivering their care. 


There appear to be 10 key new metrics and 6 or 7 of those are relevant to A&E:

  • % of ambulance handovers inside 15mins

  • % initial assessments done inside 15mins

  • Mean time in A&E for non-admitted patients

  • Mean time in A&E for admitted patients

  • Clinically ready to proceed [I know what this is meant to measure but have no idea what the specific measurement is despite reading the explanation.]

  • Patients spending more than 12hr in A&E [presumably a count]

  • Critical time standards [again, though, the specifics are poorly explained]


This is a mix of good ideas, bad ideas and muddled definitions driven by good intentions. But perhaps the worst idea is that they propose to completely stop using the 4hr metric.

So, what is so wrong with these proposals? Let's have a look at some of the problems.


They obfuscate performance instead of clarifying it and make improvement harder to deliver.


The big problem here is the use of averages as performance metrics (the average wait time for different types of patient). These are good in just one context: retrospectively monitoring changes in performance over time. They are operationally meaningless for helping staff spot problems in real time, unlike the 4hr metric, which is useful both in real time and in retrospective analysis of changing trends.


And an average time metric obfuscates performance problems by failing to distinguish a situation where excessively long waits are balanced by many short waits from a situation where the typical wait is reasonable. To illustrate, consider the following:

  1. 100 patients get discharged in 1hr but another 35 wait 12hr: mean wait 3.8hr

  2. 135 patients get discharged in 3.8hr: mean wait 3.8hr

Very long waits are very bad. But the mean time metric totally fails to distinguish the two situations (the 4hr target *would* clearly show 1 as worse than 2). 
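The failure is easy to check with a few lines of arithmetic. A minimal sketch (the numbers are the illustrative scenarios above, not real data):

```python
def mean_wait(waits_hr):
    """Mean time in department across all patients, in hours."""
    return sum(waits_hr) / len(waits_hr)

def pct_within_4hr(waits_hr):
    """Share of patients admitted or discharged within 4 hours."""
    return 100 * sum(w <= 4 for w in waits_hr) / len(waits_hr)

scenario_1 = [1] * 100 + [12] * 35   # many fast discharges, 35 very long waits
scenario_2 = [3.8] * 135             # everyone waits about the same

# The means are almost identical...
print(round(mean_wait(scenario_1), 2), round(mean_wait(scenario_2), 2))      # → 3.85 3.8
# ...but the 4hr metric separates the two situations immediately
print(round(pct_within_4hr(scenario_1)), round(pct_within_4hr(scenario_2)))  # → 74 100
```

The 35 twelve-hour waits simply vanish into the average; the 4hr share makes them impossible to hide.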


So those metrics fail to help operationally and fail to clarify performance problems. 


One possible excuse that has been used is that they do distinguish the following scenarios:

  1. 100 patients leave in 3hr but another 35 take 4.1hr: mean wait 3.3hr

  2. 100 patients leave in 3hr but another 35 take 12hr: mean wait 5.3hr

These two scenarios have the same performance on the 4hr metric but 2 is clearly worse on the mean time metric. This is a good point but hardly a compelling reason to use only a mean time target as it clearly has exactly the same sort of flaws as the 4hr target when used alone.
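This mirror-image blind spot is equally easy to verify in a few lines. A minimal sketch (again using the illustrative numbers above):

```python
def mean_wait(waits_hr):
    """Mean time in department across all patients, in hours."""
    return sum(waits_hr) / len(waits_hr)

def pct_within_4hr(waits_hr):
    """Share of patients admitted or discharged within 4 hours."""
    return 100 * sum(w <= 4 for w in waits_hr) / len(waits_hr)

scenario_1 = [3] * 100 + [4.1] * 35   # the breaches are all marginal
scenario_2 = [3] * 100 + [12] * 35    # the breaches are all very long

# Identical on the 4hr metric...
print(round(pct_within_4hr(scenario_1)), round(pct_within_4hr(scenario_2)))  # → 74 74
# ...but the means separate them
print(round(mean_wait(scenario_1), 1), round(mean_wait(scenario_2), 1))      # → 3.3 5.3
```

Each metric alone has a blind spot the other covers, which argues for reporting both, not for swapping one for the other.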


And the proposals for new metrics have always been adamant that the 4hr metric must go: these are not supplements to the existing target, but substitutes for it.


They are much harder to understand than the simple current metric


Does anyone seriously think that the mean time measured separately for different types of patient will be easier for the public to understand? A count of the patients who have waited too long, and a clear time limit that defines that wait, is about the simplest metric that can be communicated to the public. What will they make of the new metrics? I don't have any idea and neither does anyone else, as nobody has asked them. How will a member of the public judge their personal wait time against an average metric? "I'm sorry you waited for 8hr Ms Fisher, but our average wait is well within the target so there can't be a problem" is not a conversation I want staff to be having with patients in the future.


Later the consultation also proposes to use some sort of weighted bundle of several very different metrics as a sort of simplified overall score. Again, this contradicts the goal of something that is easy to understand and act on. So not only are the individual metrics less clear but a new jumbled score will be introduced that will be hopelessly incomprehensible to anyone but its creators.
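The consultation does not spell out the weights, but the obfuscation problem is inherent to any weighted composite: very different underlying performance can collapse to the same headline number. A hypothetical sketch (the component names, scores and weights below are invented purely for illustration):

```python
# Invented component scores on a 0-100 scale (100 = best) and invented weights.
WEIGHTS = {"handover_15min": 0.3, "assessment_15min": 0.3,
           "mean_time": 0.3, "waits_over_12hr": 0.1}

def composite(scores):
    """Weighted bundle score: one number summarising four very different metrics."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

hospital_a = {"handover_15min": 70, "assessment_15min": 70,
              "mean_time": 70, "waits_over_12hr": 70}   # uniformly mediocre
hospital_b = {"handover_15min": 80, "assessment_15min": 80,
              "mean_time": 70, "waits_over_12hr": 10}   # dreadful 12hr waits, hidden

# Same headline score, very different realities for patients
print(round(composite(hospital_a), 1), round(composite(hospital_b), 1))  # → 70.0 70.0
```

Whatever weights are chosen, a bundle score can only be decoded by someone who knows the weights and the component definitions, which is precisely the opposite of a metric the public can understand.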


The report introduces a pathetically weak justification for change based on a Healthwatch survey that claims the public don't understand the current 4hr target. I'm fairly sure the survey didn't ask the public to compare options for alternatives and I'm very sure that any future survey will find a much lower level of comprehension of targets based on mean performance. Any lack of understanding of the 4hr target looks like a problem with NHS communication, not an inherent problem with the specification of the metric, though that is not how the report wants you to interpret the Healthwatch result. I struggle to see how A&E staff will comprehend, communicate or use the new targets, never mind the public.


The argument that the system has changed and therefore we need to abolish the 4hr target is weak


The claim is that the system has changed a lot since the 4hr target was introduced and, therefore "this single standard is no longer on its own driving the right improvement". The specific changes given as examples are:

  1. the introduction of specialist centres for stroke care, for example the reconfiguration of London and Manchester stroke services 

  2. the development of urgent treatment centres 

  3. the introduction of NHS 111 

  4. the creation of trauma centres, heart attack centres and acute stroke units 

  5. increased access and use of tests in emergency departments 

  6. the introduction of new standards for ambulance services 

  7. the increasing use of Same Day Emergency Care (SDEC) to avoid unnecessary overnight admissions to hospital. 


1 & 4 are irrelevant. Volumes in these specialist centres are not large compared to A&E attendance, they use their own clinical standards already (so don't need new ones) and they have little overall impact on major A&E metrics. 


2 could, in principle, be relevant if UTCs diverted patients who would otherwise attend a type 1 A&E. The remaining mix of patients in A&E would then skew towards those with more serious conditions. But there is no evidence that isolated UTCs divert anyone from A&E and those managed alongside A&Es are, basically, a form of patient streaming at the front door and do not imply a different performance metric is needed for a combined unit. 


3 is a WTF? I have no idea why the existence of NHS 111 makes any difference to how we should measure A&E performance unless you accept the insane techno-optimist idea that booked appointments in A&E would make demand "more predictable", which should be a ludicrous idea to anyone who has ever done analysis of A&E data. A&E demand is very, very predictable: for any given department the weekly attendance at a given time of year, the busiest day of the week and the busiest time of the day are all predictable to within a few %. Bookable appointments offer no conceivable gain for performance (though patients might like them for other reasons).


5 offers no good rationale for different targets either (it might justify additional reporting of data already collected, though). The report argues that, because sicker patients need more tests, we need a different waiting time metric. But this is based on the flawed idea that the bottleneck in patient flow is the time taken to do tests or to treat the patient. That has never been the bottleneck in any department where I have seen the detailed analysis: I've done the analysis on multiple steps in the patient journey, which is why I'm fairly certain that waiting for treatment is never the bottleneck in the overall A&E wait. The sicker patients wait longer because the flow out of A&E is blocked (usually because free beds are hard to find). The report does promise to measure the "clinically ready to proceed" time, which might highlight that problem. This is a good idea, but less radical than it sounds: the metric is–at least in principle–already derivable from the patient-level data already collected. So reporting this metric routinely would be good, especially if it disabuses policy makers about the dominant cause of long waits. But, yet again, this does not justify the abolition of the 4hr target in any way.


6 No, No, NO! The only reason why new A&E targets could be impacted by new ambulance targets is if there were gaming of the handover from ambulance to A&E. That was ruled out more than a decade ago by starting the A&E clock 15mins after ambulance arrival whether the patient had been transferred or not. 


7 There is good justification for more SDEC. But it is pure obfuscation to claim this demands new ways to measure A&E performance. Not least because it has been happening for a long time and some hospitals have already dealt with it in ways that are entirely compatible with the 4hr target metric. Sure, we should report something about the use of SDEC and that is a good additional metric for assessing departments. But it doesn't support the abolition of the 4hr target. Some departments have specialist units for SDEC and account for that activity by stopping the A&E clock when patients move into the unit while measuring the % of same-day discharges from it (this being the goal of such units). In this context there is no impact requiring a change to the 4hr metric and the additional metric could be added to public reports without any extra data collection effort.


On the positive side there are some good ideas about additional metrics that might support a holistic view. But none that suggest 4hr should die. My suspicion about the desperate desire to get rid of the 4hr metric is that this was driven by political pressure not from compelling clinical arguments. It is just too embarrassing for the government to be seen to keep failing to meet it.


The idea of adding extra metrics to give a more holistic view of performance is not new. It was first proposed in 2010 but never taken seriously by the incoming government (though many of those metrics can be found in obscure corners of the government web). Most of the possible metrics don't even require a lot of work. The big ones are, in principle, already available to any analyst who can access HES data or local hospital activity data. Indeed any competent A&E analyst should already be looking at them to support local improvement. For reasons known only to the god of bureaucracy that isn't how we collect the public data. Instead hospitals have to report the data via an entirely different route, meaning that, to report the new targets, they have to do extra work (a sane system would make the universally collected activity dataset the basis for both public and private reporting, which would mean new metrics required only a few extra calculations to generate new public reports automatically).
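To see how small the extra work could be, consider what falls out of a single patient-level activity extract. A sketch in plain Python (the record fields below are my assumptions about what a HES-style extract contains, not the real schema):

```python
from dataclasses import dataclass

@dataclass
class Attendance:
    arrival_hr: float     # arrival time, hours since some fixed epoch
    departure_hr: float   # admission or discharge time
    admitted: bool        # True if the patient was admitted

def public_metrics(records):
    """Derive several of the proposed headline metrics from one dataset."""
    times = [r.departure_hr - r.arrival_hr for r in records]
    admitted = [t for r, t in zip(records, times) if r.admitted]
    non_admitted = [t for r, t in zip(records, times) if not r.admitted]
    return {
        "mean_time_admitted_hr": sum(admitted) / len(admitted),
        "mean_time_non_admitted_hr": sum(non_admitted) / len(non_admitted),
        "pct_within_4hr": 100 * sum(t <= 4 for t in times) / len(times),
        "count_over_12hr": sum(t > 12 for t in times),
    }

sample = [Attendance(0.0, 2.5, False),   # minor, out in 2.5hr
          Attendance(0.0, 5.0, True),    # admitted after 5hr
          Attendance(1.0, 14.0, True)]   # admitted after 13hr: a 12hr breach
print(public_metrics(sample))
```

The point is not the specific field names but that the mean-time metrics, the 4hr share and the 12hr count are all simple aggregations of the same records, so publishing any of them should need no new collection.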


I should praise the best idea that the latest report has introduced. That is the measurement of 12hr waits. The report correctly recognises that there is never, ever the slightest justification for patients to wait >12hr in an A&E. The metric is already countable from HES data and should have been a public metric a decade ago. It would be the single most effective metric to highlight and correct the worst behaviours the report claims to want to fix. But the system has strongly resisted publishing the numbers until now. I was once fired from an analyst job for leaking the actual numbers (which I clearly should not have done; key lesson: don't tweet when drunk). But they have been easily available and publication was strongly resisted precisely because they provided an extra insight into how bad A&E performance was. In my cynical moments I suspect that they are only willing to publish 12hr waits now because nobody will notice how bad they are in the blizzard of other new metrics being released.


So much for the key arguments that supposedly compel abolition of the 4hr metric and its replacement by other metrics.


The rationale is often based on a flawed analysis of what is currently happening and how the system works


But there is another problem with the proposals. While claiming the NHS needs metrics that drive improvement across the system of emergency care, the report betrays a lack of understanding of how the parts of the system actually interact. It is also confused about what the current trends are.


Different public targets for different types of patient is a flawed idea

Firstly, the idea of setting separate public targets for different groups of patients is deeply flawed. The justification is that some seriously ill people need very rapid treatment, so specific targets are required for some clinical groups (actually this idea is not a bad one, but every A&E should already be applying such targets internally). The proposal, however, is to have different public waiting targets for patients needing only simple treatments and those needing faster or more complex care.


The public say to Healthwatch that the most sick should be prioritised and the report naively accepts this as a justification for allowing longer waits for "minors". This, and other comments in the report, betray a naive understanding of how A&E departments work and how queues work. The goal–getting fast treatment for some patient groups where urgent action is required–is a good one. The method–achieving this by slowing down treatment for minors–is as dumb as a box of spanners. One common issue in pre-4hr target A&Es was that this was exactly how they handled the queue of patients. Crudely speaking, patients with minor cuts and bruises would be repeatedly bumped to the back of the single queue whenever a heart attack appeared. The consequence was that departments would fill up with minors who waited a long time and whose treatment was frequently interrupted, dramatically lowering the productivity of the staff. And, eventually, the department would be too full to allow fast treatment of either the heart attacks or the minors. This is basic queueing theory. Unfortunately it appears that the experts who wrote the report don't know any queueing theory and didn't consult anyone who does. I can excuse the public for not getting this as it isn't something usually taught in schools or even widely in universities. But when the report uses the public's flawed understanding to justify its conclusions I can only assume either unforgivable ignorance or mendacity.


The problem with some patients needing faster treatment was solved in the early days of the 4hr target by recognising that long single queues were part of the problem and made waits worse for all patients. The solution was to split the queues and reserve enough capacity to guarantee speedy treatment for the majors while achieving efficient treatment for the minors at the same time. An efficient process for the minors keeps the overall queue small and leaves enough space for the department to cope with the lower volume of major injuries and problems that need guaranteed fast treatment. Faster treatment for minors actually enables faster treatment for majors: it is not a trade-off. Rapid assessment at the front door enabled departments to stream patients to the right queue. This was one of the key insights that helped A&Es meet the original 4hr target. 


The proposal to relax the target times for minors is entirely counterproductive and ignores those insights about how the system works. It might provide some justification for additional metrics that measure time for specific clinical groups, but provides no justification at all for abolishing the 4hr metric.


It is worth noting that individual A&E departments should have many more detailed metrics to track their performance internally. This has always been true even when the 4hr target was the dominant public target. But the specific changes to the public targets are likely to entrench mistaken views about how the overall system works and encourage bad trade-offs. 


Some arguments for change are based on a flawed view of system trends

Another naive argument is that attendance in A&E has grown unpredictably quickly and that this is a major cause of crowding and slow flow. But when the document quotes numbers for A&E growth and volume, it quotes the numbers that include UTCs and their ilk. This is either naive or dishonest. UTCs, WICs and other type 3 departments were rare two decades ago but are common now. Most of the fast volume growth exists because many new units of this type were opened over that period. The volume growth in major A&Es (type 1 units) is much lower and has been relatively stable year to year for three decades. And all those new units opening has had no discernible impact on that growth rate (though part of the original rationale was to divert minors away from major A&Es). While UTCs might fulfil a useful and popular need for patients, they have little interaction with major A&Es and should not be grouped together in public reporting (reporting a system performance number including the UTCs has the political advantage of diluting the clear signal of how badly the type 1 A&Es are doing in headlines, but a diluted signal makes identifying the need for improvement harder).


If there is a clear problem with volume, it is caused by a failure to match core hospital and A&E funding to the highly predictable long term rate of growth not because type 1 A&Es are overwhelmed by unpredictable demand.


Unpredictability of A&E demand isn't the problem

This last point, together with the observation that the pattern of demand by hour, day and season is very predictable, also highlights the absurdity of some of the proposed solutions to A&E crowding. One proposal is to have NHS 111 send booked appointments to A&E to make unpredictable demand easier to handle. But this only works if unpredictability is the problem, which it is not. The idea reeks of techno-utopianism where magic new technology somehow solves a big problem.


It is also worth noting that there is another proposed use of direct booking by NHS 111: for GP appointments (this is not part of this consultation). This is also a bad idea, as booked slots are the enemy of GP responsiveness and insisting that GPs offer them reduces the speed of response and the overall flexibility of a good GP service. Both booked A&E appointments and booked GP appointments look like examples of "cool technology will fix all problems" techno-optimistic naivety.


So what?

The report claims its new proposals for targets are needed because we need clearer targets that are easier to understand and more focussed on driving improvement. What it actually proposes are targets that are harder to understand and that muddy the signals required to drive improvement.


The report claims that the 4hr metric must go. But the arguments used to justify this are so weak that cynics will be compelled to conclude that the real motivation is political embarrassment. If we keep publishing performance using the 4hr standard it will be very obvious that the system is failing: a mixed bunch of new obscure targets will obfuscate that failure and reduce demand for change and for investment in the resources that might improve it. And the proposals admit the need for new metrics to provide multiple perspectives on performance, but reject the simpler idea of supplementing the current target with additional metrics (for example, why has England never published a 12hr wait metric before?).


Some who take a top-down perspective on NHS economics might argue that improvement is impossible in the current climate where the NHS is focussed on priorities other than increasing bed numbers. I disagree. Yes, this isn't a big part of the NHS plan. But the effective use of the right data can be a powerful lever for improvement even when resources are tight. But replacing a good target with a multitude of bad ones will not drive improvement. The NHS would do far better to encourage more investment in analysis of the rich data it already has than in inventing new public metrics that won't help.



Tuesday, 25 February 2020

NHS performance can be improved by paying more attention to reliable data and sound analysis

While the NHS and many other public services need more resources this might not be the best path to big improvements in quality and performance. Not least because deploying extra resources in the wrong place might yield only small improvements. Maybe we should start by spending more on getting the data and analysis required to understand where the biggest problems are and what interventions might solve them. And then focus on installing effective performance management throughout the system so improvements happen and stick. This will be expensive, but is demonstrably worth it.


It is almost universally assumed by commentators that the only thing that will improve the performance of the NHS or any other public service is a bigger budget. The bottleneck preventing better performance is austerity. The police need more cops; the NHS needs more doctors and nurses.

But is this true? Is a bigger budget the only way to improve anything? Is there nothing else that we should be doing?

The answer is no. Really large improvements in performance are possible even without really big increases in the budget. And we know how, because it has been done.

In the early 1990s New York had around 2000 murders a year. It now has fewer than 300, the lowest since records started. Other major crimes show similar reductions. What kicked off the improvement and kept that improvement happening for the next 25 years wasn't a vast increase in police numbers but a management process called Compstat.

The story of Compstat is not well known, as lazy journalists have tended to credit the improvement to Broken Windows Theory (see, for example, the Wikipedia page on crime in New York or this longer discussion about what really happened). Few accounts even mention the man who developed the Compstat process, Jack Maple, who wrote a book (The Crime Fighter) describing how he developed it and how it works.

What Maple describes ought to be of great interest for other public services like the NHS.

According to Maple, the essence of Compstat is built on four key principles:

  1. Accurate, timely intelligence
  2. Rapid deployment
  3. Effective tactics
  4. Relentless follow-up and assessment
The goal of the process is to reduce crime rather than to pursue proxy metrics like arrest rates or response times. A key part of the process is the weekly meeting of local police commanders where they are held to account not for missing their targets but for not understanding the patterns of crime and not having effective plans to address the crimes in their area. The focus of the performance meetings is to develop and share that understanding and to make sure it is being acted on. Commanders who fiddle their numbers rather than tackling their problems are ruthlessly exposed.

This contrasts sharply with how the NHS manages performance. It is worth comparing the NHS process with Compstat.

The NHS shares with Compstat the idea of regular meetings where performance is reviewed. But what happens in the meetings is very different. In the NHS the meetings are not based on a wide range of metrics that provide insight into why performance is bad, but on a small number of less reliable headline metrics that provide no insight into the why of performance. And local managers are berated for failing to meet the target and encouraged to promise future performance without being asked to show they understand the key causes of poor performance. Neither the central team nor the local managers have a wide range of shared data that helps anyone understand the causes of problems, nor does either group have much of a clue about what effective actions to improve performance look like. Since there is no collective understanding of problems, there is no sharing of good ways to tackle them. The system reverts to measuring inputs not outputs (more nurses and doctors sounds good but won't help much if we don't know where the problems are).

In short, all the effective habits of Compstat are undermined from the top down in the NHS because there is no focus on understanding why performance is poor.

In a 2015 HSJ article called The way the NHS manages A&E problems is not fit for purpose, Nigel Edwards described this process as "A significant organisational pathology".

The NHS hasn't always been like this. Though it has never been as thorough as Compstat, there was a period in the 2000s where effective actions to improve performance were understood and performance did improve (e.g. starting in 2002, A&E performance went from ~70% of patients admitted or discharged in 4hr to 98% over a three-year period). Most of the knowledge from that time seems to have been lost. The Cabinet Office even used studies on what worked in Compstat to help design the way performance management was done (but the 2004 document describing this–see pages 26-27–is buried deep in the national archives and it is doubtful anyone currently in power has read it).

Part of the reason why the NHS has forgotten what it once knew is that running a management process like Compstat is expensive (Maple estimated it might take 5% of the police budget). That is more than the NHS spends on management. Worse, the most consistent NHS strategy over time–from both Labour and Conservative governments–has been to cut management so it can devote more resources to the front line. But adding more resources does little for performance if those resources are not well coordinated and deployed.

The Compstat process has worked for more than 25 years in the NYPD and improvements are continuing. But it also worked in cities like New Orleans where the police were far less well funded.

Perhaps the NHS has something to learn. Reforming its shonky management processes might be the best way to improve performance.