Friday, 8 April 2016

The debate on BREXIT illustrates how little evidence determines major public policy decisions

People don't decide their position on BREXIT by looking at the numbers: they choose their position and then seek numbers to justify it. Most of the numbers quoted on either side can't be trusted. This is corrupting public discourse by damaging the credibility of numbers that do tell a clear story.


The ferocity of debate in politics is often inversely proportional to the amount of actual hard evidence available on the topic. Whether the UK should stay in the EU is a typical example. Many decide they don't like the loss of freedom required when you are a member of the club and call for exit; others, like me, decide that the compromises required and the bureaucratic overhang of membership are a price worth paying for the gains from collaboration. Some choices are more atavistic: many presume (probably incorrectly) that uncontrolled immigration is caused by EU membership (and they also believe the populist myth that immigrants are the cause of many other problems in society). Neither side reaches its conclusion because it has done some calculations: the conclusions come from deep emotional value choices, not statistics. But the debate pretends otherwise and dredges up volumes of statistics to confirm the emotionally reached decision.


The numbers thrown around in the debate are proxies for that emotional choice. Most people can't admit that they didn't reason their way to their choice, so they seek what look like the rational arguments that got them there. But this process suffers from all of the cognitive biases that beset so much thinking, especially confirmation bias, where people are more likely to believe numbers supporting the position they already hold. People seek the numbers that confirm their side of the debate.

Nothing illustrates this better than the meme of European bureaucracy. Even EU supporters often complain about bureaucracy and many even quoted this tweet as an example:

[Image: EU cabbage bureaucracy tweet]

Luckily for rational thinkers the BBC's More or Less programme (see this BBC article) decided to test the assertion. Their conclusion: there are no EU regulations specifically about cabbage sales and the 26,911 number originated in a complaint in the post-war USA about government regulations (and was probably mythical even then). Amazingly the same number of words has been repeated by many other objectors to bureaucracy without ever being validated in even the most cursory way against real documents.

The meme of EU bureaucracy is so strong that even pro-EU campaigners didn't think to challenge the basic facts in the assertion.

My main point, though, is about the harder economic numbers bandied about by the two sides and the extent to which they corrupt public discourse when we are debating significant issues. Both sides in the debate, for example, agree that leaving the EU would be disruptive. I agree: that is one of the few things that is clear. But there is a lot of disagreement about how disruptive leaving would be and even more on the economic benefits of staying or leaving. The leave campaign tend to assume that new trade treaties would be easy to negotiate quickly (not that there is any evidence to support this) and they assume that the benefits of an independent UK would be large when freed from the dead hand of EU regulation. The stay camp assume that renegotiation is a long and costly process and that many current jobs in the UK that depend on EU trade would be lost.

Both sides quote specific estimates for the economic benefit of their position. This is where the problem comes. Although specific numbers are quoted, the estimates have no credibility. John Kay, the economist and FT columnist, addressed some of the problems with these kinds of estimates in a column in 2011. His concerns, though originally made about the economic rationale for big government projects, apply equally to any analysis of the case for staying or leaving:

...Because so many inputs to the analysis are invented, they can be chosen with a view to the desired result...

...The only information exercises such as these convey is the limits of the imagination of the people who have undertaken them...

...Yet the mistaken belief persists that these procedures provide an objective basis for decision making...

...We do great damage by claiming to know things that are not known, by asserting certainty in the face of uncertainty and ambiguity, and by attaching a veneer of rationality to decisions that have in fact been made on other, rarely articulated, grounds. The paradoxical result is all too obvious. The public sector and large bureaucratic organisations appear as paragons of good decision making process and exemplars of bad decisions.

I would add an extra concern. The spurious precision of the forecasts made by each side damages the credibility of almost any analysis intended to illuminate a major public decision. This is bad because, for example, when we have to decide how much to spend to mitigate global warming, the numbers we are shown will have no credibility and we will, most likely, make poor decisions as a result. Sometimes the numbers do point one way on a major decision. If we pollute public discourse with spurious, over-precise analysis, we will undermine our ability to make good decisions when the evidence is clear.

When deciding whether to stay in the EU or to invest in major infrastructure projects we can never have precise estimates of the costs and benefits. We should not deny the uncertainty by pretending that we do. We can't decide on whether to stay in the EU by purely objective analysis: the uncertainties are too large. We should admit that our choice is based on values not numbers. I believe that international cooperation is a better way to conduct our affairs than standing independent and alone, despite the cost in bureaucracy. I don't pretend I can prove it with economic models.

But, if you care about honesty in public debate, there is something you can do whatever position you hold on the EU. Donate to Full Fact's campaign to fact-check the debate. They don't care how you vote but they do care whether the debate is conducted honestly.

Friday, 18 March 2016

The government approach to cutting costs is the worst way to achieve lower costs


Governments love to distribute the pain of budget cuts evenly by slicing a percentage from every department's budget. This seems fair but is the worst way to achieve sustainably lower spending or to minimise the damage caused by the cuts.


The trouble with government is that it is run by politicians. And politicians usually care more about image than they do about substance when it comes to making decisions. So when public finances are squeezed they tend to focus on the fairness of their budget cuts rather than the effectiveness of their budget cuts and this sometimes leads to really dumb decisions.


To do a good job of cost cutting or improving efficiency you need to know where and how the money is being spent now. For example, if you run a factory and its costs are way higher than the competition, it helps to know in detail why that is the case. It could be you have a serious problem with overmanning; or it might be that the skill mix is wrong and you need higher quality, more productive people; or, it might be that you need more people because the machines they work with are unreliable and old; or perhaps you buy all the raw materials from low quality, expensive suppliers. If you don't know which problem you have, cutting the budget might make things worse not better. If your problem is obsolete machinery, for example, the best fix is to invest in new machinery (which involves spending more); short term cuts to the maintenance budget will, ultimately, lead to less reliability and higher costs.


If the first thing you cut is the accounting and analysis department (they don't produce anything, do they?) then you will not be able to diagnose why your costs are high and therefore tell which action is most important for the long term future of your factory. You will most probably make the wrong decisions.


This is a pretty good analogy for how governments cut public spending.


The ONS, for example, is responsible for gathering and analysing the statistics that tell us what is happening in our economy, but they haven't been doing a good enough job (see BBC story or this story in Public Finance). Yet government has been cutting their budget and doing crazy things like moving their headquarters to Newport from London (which saved costs by ensuring that most of their experienced staff resigned to stay in London, leaving the organisation with a huge experience deficit and a much reduced capacity to do its job).


The NHS as a whole has done relatively well compared to most other government departments with its total budget. But inside the NHS a similar pattern emerges.


NICE is responsible for evaluating the quality and cost effectiveness of drugs and procedures in the NHS. But it is facing significant budget cuts even though spending more might identify more opportunities to save costs and improve quality across the whole NHS. But we can't have more spent on analytics when there are squeezes on providers, can we? That would be unfair.


The new NHS Improvement is supposed to work alongside NHS providers to help them do a better and more efficient job of caring for patients. There is a strong case for doing more to identify good practice and help spread it across the system. But one of the barriers to achieving improvement is a serious lack of reliable data about how the money is spent now and, in many providers, how the staff are deployed. The Carter review of the opportunities for NHS savings was very clear on this (see this analysis). And it isn't as if the current archaic infrastructure of collecting data in the NHS is good enough to support improvement at any level (see this comment). But NHS Improvement is going to have to live with significant headcount cuts when it is finally officially established. Apparently it needs to send a signal to providers about the fairness of the NHS financial squeeze. And that is, apparently, more important than its ability to do a good job of supporting the NHS to improve.


And it isn't as if hospitals will be expanding their information departments in the current climate even if doing so might help them achieve improvement elsewhere. Cutting already inadequate information teams is easy as it doesn't affect patient care tomorrow (and if it is catastrophic for care in a couple of years, who is going to care as the Chief Executive will have moved on by then?)


If you don't understand how things work, arbitrary cost cutting will lead to long term damage


If your approach to cost cutting involves slicing the ends off everything that looks like a salami then you will also cut the ends off a lot of fingers.


In one of the very few good books on business strategy, Richard Rumelt argues that the first step in any effective strategy is a good diagnosis of the problem you are trying to solve (see this interesting analysis of how his thinking applies to the NHS). The current symptom we observe in the NHS is large financial deficits. But this isn't the problem any more than a fever is a problem for a patient with malaria. To understand what the actual problem in the NHS is we need to know where and how the money flows. If we don't understand what is happening to create deficits then we can't develop a coherent plan to create an NHS that can sustainably treat patients without lapsing into periodic financial catastrophe. We can treat a fever with an ice bath, but if we don't understand what caused it (was it malaria or was it viral pneumonia?) the patient's recovery will be brief and the fever will soon return.


I don't want to try and diagnose the underlying problems of the NHS in a short blog article; my point is merely that a salami slicing approach to budget cuts damages the system's ability to do diagnosis. This is especially true when the cuts affect the flow of information or the bodies with the expertise to analyse problems and develop improvements. The NHS has tended to grossly undervalue good, timely operational information and is severely short of the analytical capacity to make sense of that information (for examples see this on A&E data and this on the information infrastructure).


The Carter report on hospital productivity argued (see my analysis) that the biggest barrier to improvement was a lack of good information about operational performance. But the we-must-share-the-pain-equally, salami-slicing model of addressing the current deficits is cutting the capacity to collect and analyse that information even though that capacity was inadequate to start with.


We won't build a sustainable NHS without a good diagnosis of the underlying problems. That requires better information about what is happening inside hospitals. A good strategy for achieving a sustainable NHS would, therefore, focus on getting better information at the start so subsequent decisions could address the most important underlying problems.


What we are likely to get is more salami-slicing which damages our ability to understand the problem and exacerbates the failure to focus on which problems matter most. In fact NHS strategy is still in an era equivalent to medicine when bleeding patients with leeches was still thought to be a good universal cure. As in medicine, repeated application of the leeches will fail to cure the patient. Whatever the apparent short term gain, in the long term the patient will be sicker.


The problem generalises across all of government. Salami-slicing cuts to everyone's budget damage the government's very ability to know what is actually happening and therefore its ability to make good decisions about what really ought to be done.


It looks like the patient with the fever is stuck in the ICU until someone comes up with a better diagnosis.


Tuesday, 1 March 2016

The NHS isn't very good at driving operational improvement: the data it collects could help it get better

The NHS collects a large volume of administrative data. It could use that for driving operational improvement but mostly doesn't.

The central NHS collects patient-level data about what is happening in its hospitals. Since 2007 the major datasets have collected more than a billion records of admissions, outpatient appointments and A&E attendances. These datasets are collectively known as "administrative data" and are used for a variety of internal purposes including paying hospitals for the activity they do.

The primary reason why they are collected isn't operational improvement. Arguably, it should be; though, if it were, we might collect the data differently (we might also disseminate it more speedily and collect additional things).

The controversial care.data programme (which is an attempt to join up data collected by GPs with hospital data) was promoted as a way to enhance economic growth by exploiting the data for medical research even though it is probably far more useful for driving improvement in the existing care offered by the system. But improvement is the neglected orphan child of NHS data collection and is barely mentioned in any of the arguments about care.data or any other NHS data collections. It should be the primary reason why we bother with this data, not least because making NHS care better is easier for patients to understand (and harder to object to) than, for example, helping the pharmaceutical industry make even more otiose margins.

Even though the big data collections are not optimised for supporting improvement, they are still useful. I'm going to illustrate this with a few examples from analysing the HES (Hospital Episode Statistics) A&E dataset. HES is one of the ways the central data is disseminated back to the system.

What we collect in A&E data and why it is relevant

Since 2007 the English NHS has collected a range of useful data about every A&E attendance (which includes attendance at minor injury units as well as attendance at major 24hr, full service A&E departments). It took several years before that collection achieved nearly complete coverage of all departments in England, but it has mostly been complete for the last 5 years.

The data contains basic information about each attendance such as what time the patient arrived and left A&E plus some other timestamps during their stay (eg when first seen by a professional, time treatment finished and time patient departed the A&E). Basic demographics about the patient are recorded. Data about where the patient came from and where they went after the visit are also collected, as well as information about investigations, diagnoses and treatments (though these are often not collected reliably).

This is a rich source of data for identifying why A&E departments struggle to treat patients quickly, which is currently a major concern in many hospitals.

So here are a few examples of how the data can be used.

Local operational insights are available in the data

How well organised you are matters. If you have a grip on the operational detail you will be constantly identifying where things can be improved. One of the key tasks is to identify whereabouts in the process things are broken. We might identify a department that has a problem with one type of patient, or one time of day or one particular step in the process. If we know where the problem is, we can focus improvement effort in one place, which is much more effective than wasting effort on a wide range of interventions most of which will have no effect.

I'm going to show two ways the patient-level dataset can support such focus. I'm only going to show how to isolate performance issues with the type of patient and the time of the week. But I hope this illustrates how effective use of the data can support improvement.

One way to get a quick overview of how the complete process functions is to look at the histogram of patient waiting times (ie toting up how many patients wait different lengths of time before leaving A&E). In this case a useful way to do this is to use counts of waits in 15 minute blocks. A typical chart is shown below:



This plot summarises the experience of every patient (in this case over a whole year, but it works well for smaller numbers and time periods). It is common to see a peak in the waits in the 15 minute interval before the 4hr target time. This is a useful indicator of a last minute rush to meet the target (which is bad). But the other features are also useful indicators. We can see at a glance for example the total waits of >12hr (this is the last peak on the right of the chart). We can tell in this case that a lot of patients leave before they get to even 1.5hr (which is good).
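The counting behind a chart like this is simple. Here is a minimal sketch in Python, not the code used to produce the chart above. It assumes each attendance has already been reduced to a total wait in minutes (arrival to departure); the function names and the example waits are invented for illustration, and waits of 12hr or more are lumped into one final bin to mirror the last peak on the right of the chart.

```python
from collections import Counter

BIN_MINUTES = 15          # width of each histogram block
TARGET_MINUTES = 4 * 60   # the 4hr target
CAP_MINUTES = 12 * 60     # waits of 12hr or more share one final bin (assumption)


def wait_histogram(waits_minutes, bin_minutes=BIN_MINUTES, cap_minutes=CAP_MINUTES):
    """Count attendances per 15 minute block, keyed by the start of the block."""
    counts = Counter()
    for wait in waits_minutes:
        capped = min(wait, cap_minutes)
        bin_start = int(capped // bin_minutes) * bin_minutes
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


def share_in_last_bin_before_target(hist, bin_minutes=BIN_MINUTES, target=TARGET_MINUTES):
    """Fraction of all attendances that left in the block just before the target."""
    total = sum(hist.values())
    return hist.get(target - bin_minutes, 0) / total if total else 0.0


# Invented waits in minutes, purely to exercise the functions.
example_waits = [20, 35, 50, 70, 85, 100, 228, 231, 236, 239, 239, 400, 800]
hist = wait_histogram(example_waits)
rush = share_in_last_bin_before_target(hist)
over_12hr = hist.get(CAP_MINUTES, 0)
```

In this made-up example five of the thirteen attendances fall in the 225 to 240 minute block, which is the kind of spike the chart makes visible at a glance.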

Experience shows that we can diagnose many problems in A&E from the shape of this curve.

Some of those are easier to spot if we look at how different types of patient wait. The next chart shows the histogram broken down by 4 categories of patient: admitted patients, discharged patients, discharged patients with a referral and transferred patients (patients admitted to another hospital usually for specialist treatment).




We can instantly see that the shapes are different for different types of patient. And we can see that nearly half of all patients being admitted get admitted in the 15 minute interval before they have waited for 4hrs. Other patient types show a similar but much less strong peak just before 4hr.
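The size of that last-minute peak can be put into a single number for each patient type. The sketch below is one way to do it, assuming each attendance is a (patient type, wait in minutes) pair; the type labels, the function name and the example records are all invented, and real data would need its disposal codes mapped onto these categories first.

```python
from collections import defaultdict


def rush_share_by_type(attendances, bin_minutes=15, target_minutes=240):
    """For each patient type, the fraction of attendances that left in the
    block just before the target (225 to 239 minutes with the defaults)."""
    totals = defaultdict(int)
    rushed = defaultdict(int)
    for patient_type, wait in attendances:
        totals[patient_type] += 1
        if target_minutes - bin_minutes <= wait < target_minutes:
            rushed[patient_type] += 1
    return {t: rushed[t] / totals[t] for t in totals}


# Invented records: (patient type, total wait in minutes).
example = [
    ("admitted", 120), ("admitted", 230), ("admitted", 235),
    ("admitted", 238), ("admitted", 300),
    ("discharged", 30), ("discharged", 60), ("discharged", 90), ("discharged", 232),
    ("transferred", 100), ("transferred", 150),
]
shares = rush_share_by_type(example)
```

Here the admitted group has three of its five attendances in the final block before 4hr, against one in four for discharged patients and none for transferred patients.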

This 4hr peak is a sign of bad things in the process. Are doctors making rushed last minute decisions to admit patients? Do they know the patient needs to be admitted earlier but can't get access to a bed unless a breach of the target is about to occur? Neither of these is good for the patient. But knowing where the problem is is the first step in fixing it.

To show that not every trust is the same, here is the same analysis for a different (much better) trust. They still have a peak at 4hr for admitted patients. But it is only 15% of all patients not 50%: most admissions are spread over the 3hr period before 4hr not the 15 minute period before 4hr. Other types of patient show only a tiny 4hr rush and the majority are dealt with well before they get close to a 4hr wait.


Analysis of these patterns can tell us a lot about the underlying quality of the process for treating patients. One particular insight found in most trusts is the apparent problem of admitting patients quickly when they need to be admitted. The shape of the admitted patient curve often shows a last minute rush to admit just before 4hr. This isn't usually because sick patients need more care in A&E; it is often obvious from the moment they arrive that they will need a bed but free beds are often hard to find. The contrasting pattern for transferred patients is a strong confirmation of this idea. Transferred patients also need a bed, but are often transferred because they need a specialty unavailable in that hospital. Most hospitals achieve that transfer much more quickly than they achieve admission to their own beds. The clock stops when they leave the A&E and they leave faster than admitted patients and often in much less than 4hr. Finding a bed for them is another hospital's problem.

Admitted patients wait until the priority of not breaching the target triggers some action to free up beds. This is bad for patients, who wait longer, and staff, who could be treating other patients instead of searching for free beds.

The insight that the problem is associated with beds is well-known but often neglected in improvement initiatives (not least because it is not really an A&E problem and it is A&E who get the blame for the delays). But A&E departments don't control the flow through beds. Adding more A&E staff or facilities won't fix waits caused by poor bed flow. Nor will diverting patients to other services (you can only divert the minors who are often treated quickly even in departments with bad problems with their beds.)

These sorts of insights should be a crucial part of deciding what initiatives to focus improvement programmes on. But far too much effort is actually spent on non-problems that will have no impact. Sorting out flow in beds is a hard problem; but much harder if you don't even recognise that it is the most important problem.

We can also do other analyses that localise where in the process the problems occur. For example, some departments have problems at particular times of day or particular days of the week. If you know, for example, that some days are usually good and others are usually bad, you can ask what is different on the good days and, perhaps, find ways to improve the bad ones.

Here are some examples.

This shows the average performance for one trust on different weekdays:


There is no huge insight here except that performance at weekends is better than on weekdays. This might reveal some important issues with matching staffing to the volume of attendance or it could be caused by different admission practices at weekends.

But we can drill further into the data and get more detailed insights. Here is the volume and performance by hour of week for the same trust:


We can tell from this that although volume at the weekends is a little lower, performance is better and more consistent. We can also tell that performance falls off a cliff at 8am every weekday, but just for that hour, just when it starts to get busy. No such effect is seen at weekends.
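An hour-of-week breakdown like this needs only the arrival timestamp and the total wait. The sketch below is a rough illustration under stated assumptions: "performance" is taken to mean the share of attendances leaving in under 240 minutes, attendances are grouped by the hour in which the patient arrived, and hour 0 is midnight on Monday. The function names and example records are invented.

```python
from collections import defaultdict
from datetime import datetime


def hour_of_week(arrival):
    """Monday 00:00-00:59 is hour 0; Sunday 23:00-23:59 is hour 167."""
    return arrival.weekday() * 24 + arrival.hour


def performance_by_hour_of_week(attendances, target_minutes=240):
    """Return {hour_of_week: (volume, share leaving within the target)}."""
    volume = defaultdict(int)
    within = defaultdict(int)
    for arrival, wait in attendances:
        slot = hour_of_week(arrival)
        volume[slot] += 1
        if wait < target_minutes:
            within[slot] += 1
    return {slot: (volume[slot], within[slot] / volume[slot]) for slot in sorted(volume)}


# Invented records: (arrival time, total wait in minutes).
example = [
    (datetime(2016, 2, 29, 8, 5), 250),   # Monday, 8am hour
    (datetime(2016, 2, 29, 8, 40), 245),
    (datetime(2016, 2, 29, 8, 55), 100),
    (datetime(2016, 2, 29, 9, 10), 60),   # Monday, 9am hour
    (datetime(2016, 3, 5, 8, 15), 90),    # Saturday, 8am hour
    (datetime(2016, 3, 5, 8, 45), 120),
]
by_hour = performance_by_hour_of_week(example)
```

With a year of real attendances each of the 168 hourly slots would hold many records, so a dip confined to a single slot on every weekday stands out clearly.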

We can drill deeper into the data and look at performance by different types of patient. The chart below is the same as the one above but we have broken out performance and volume by patient type.


In this chart we can see that the unusual performance collapse at 8am occurs only for the discharged patient group (normally considered to be the easiest to deal with). The most likely explanation for this is some major problem with shift handovers at that time in the morning. We can't prove this from the data but we can certainly trigger some careful local analysis to explore the cause. I'm guessing this has not happened, since the same pattern is seen over several years since the merger that created this trust. We also can't tell whether this problem is localised to one site (this trust runs several major A&E sites) because this trust doesn't report site-specific data nationally (unhelpfully, site-specific reporting is not mandatory). I know they have recently recruited a new data team so I hope they are addressing the problem now.
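Spotting this sort of localised collapse doesn't have to rely on eyeballing a chart. As a hedged sketch, the function below groups attendances by patient type and hour of week and flags the slots where performance falls below a threshold. The threshold, the minimum volume, the type labels and the example records are all invented for illustration, and a flagged slot is only a prompt for local investigation, not a diagnosis.

```python
from collections import defaultdict
from datetime import datetime


def weak_slots_by_type(attendances, target_minutes=240, threshold=0.8, min_volume=2):
    """attendances: iterable of (patient_type, arrival_datetime, wait_minutes).
    Returns {patient_type: [hour_of_week, ...]} listing the slots where the
    share of attendances leaving within the target is below the threshold.
    Slots with fewer than min_volume attendances are ignored as too noisy."""
    volume = defaultdict(int)
    within = defaultdict(int)
    for patient_type, arrival, wait in attendances:
        key = (patient_type, arrival.weekday() * 24 + arrival.hour)
        volume[key] += 1
        if wait < target_minutes:
            within[key] += 1
    flagged = defaultdict(list)
    for patient_type, slot in sorted(volume):
        n = volume[(patient_type, slot)]
        if n >= min_volume and within[(patient_type, slot)] / n < threshold:
            flagged[patient_type].append(slot)
    return dict(flagged)


# Invented records: (patient type, arrival time, total wait in minutes).
example = [
    ("discharged", datetime(2016, 2, 29, 8, 5), 250),
    ("discharged", datetime(2016, 2, 29, 8, 20), 260),
    ("discharged", datetime(2016, 2, 29, 8, 50), 100),
    ("discharged", datetime(2016, 2, 29, 10, 5), 50),
    ("discharged", datetime(2016, 2, 29, 10, 30), 60),
    ("admitted", datetime(2016, 2, 29, 8, 10), 200),
    ("admitted", datetime(2016, 2, 29, 8, 45), 230),
    ("admitted", datetime(2016, 3, 1, 14, 0), 300),  # lone record, below min_volume
]
flagged = weak_slots_by_type(example)
```

In this toy data only the discharged group in the Monday 8am slot is flagged, which is the same shape of finding as the one in the chart.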

Just for reference here is the same plot for one of the top A&E performers.


Note that this trust achieves consistent and very good performance for all patient groups almost all the time.

This sort of analysis should be routine when trying to improve A&E performance

A large part of improving performance is knowing where to focus the improvement effort. I hope that these relatively simple examples show that there are plenty of simple analytical tools that can provide that focus. These tools should be available to any competent analyst. Trusts already have the data that feeds them and the national data is available to qualified analysts who want to benchmark their hospitals with others.

Unfortunately this is far from standard practice. Many trusts, even troubled ones being hounded to improve by their management or by external regulators, produce analysis that never seems to ask the important questions that would create some focus for improvement. No national body routinely produces tools to enable this sort of analysis even though the data has been available for years.

The NHS has a huge challenge ahead in driving up the rate at which it can improve. Many large national datasets exist that contain (like the A&E data here) major insights that can help to focus that improvement effort. It is critical that analytical skills are focussed on identifying where problems occur so we can spend improvement effort in the right place. Sadly too many competent analysts in the NHS spend all their time doing routine reports which contain no useful insights for driving improvement. Many of the bodies who could have access to this sort of data don't exploit it for improvement. And many of the national bodies who do have the data never do this sort of analysis. Most surprisingly, perhaps, even the hospitals who could use this data in real time (national bodies only get their data several months after the activity occurs) mostly don't, even the troubled ones who really need to improve.

This has to change or improvement will remain impossible.


Thursday, 4 February 2016

The NICE guidance on safe staffing in A&E adds nothing to our understanding of how to run a safe A&E department



The controversial NICE guidance on staffing in A&E isn't worth arguing about. The evidence base is almost nonexistent and disagrees with better, older analysis. The NHS should ignore its recommendations and focus on gathering better evidence.


When the HSJ prompted the release of the NICE guidance about safe staffing in A&E I thought we might see something interesting. Then I read it and changed my mind. If anything the analysis sets back our understanding of how to run a safe A&E department. In fact it stands as a case study in how not to do useful analysis of an important operational issue for the NHS. Here is why I reached that conclusion.


What NICE did and didn't do

The NICE guidance is based on three sources of evidence: expert judgement; a literature survey; and an economic modelling study. The documents describing these sources of evidence are now available either from the NICE website (the economic model) or the HSJ.

What NICE didn't do is to gather systematic evidence from actual A&E departments in the UK either about staffing or performance (the model used limited evidence from a handful of departments and supplemented this with some average performance evidence from SITREPS and HES data).


What's wrong with the evidence

The NICE review itself sums up some of what is wrong with the literature evidence. Two problems stand out from their own summary: almost none of the evidence relates to the UK and there is very little high-quality evidence to start with.

In addition to this their evidence specifically excluded evidence relating to certain important practices that are common in English A&E departments. A critical example is the exclusion of evidence relating to Emergency Nurse Practitioners (ENPs) and related specialists. This seems to have been a choice so that the recommendations could be focussed on the general level of nurse staffing.

NICE commissioned a simulation model to help clarify some of the relationships that were simply missing from the actual literature evidence. This simulation forms the only significant basis for the actual recommendations (the literature evidence is simply too flimsy and contradictory to support any solid recommendations).

The trouble is that the simulation model is itself deeply flawed. So flawed that it is hard to take its recommendations seriously. It makes assumptions that were known to be naive a decade ago, some that directly contradict common practices in most actual A&E departments and produces results which disagree with actual observations about both staffing and performance. These flaws deserve a whole section to themselves.

Simulation modelling is just a way to hide the link between bad assumptions and your recommendations

[Actually I don't really mean that. Simulation modelling is an effective tool in the right hands and when the right assumptions are made. When a system is well understood but its performance is not it can provide valuable insight into how to improve. But the NICE model shows a failure to understand how A&E works and, therefore, cannot say anything useful about performance.]

The model used by NICE embeds false assumptions about how A&E operates. This is a critical failure in such an important model but NICE didn't pay much for it so perhaps that's all we can expect. Whatever the reason the assumptions are sufficiently bad that the output of the model can tell us nothing useful about staffing in a real A&E department.

Here are three examples of where the model makes really unrealistic assumptions and fails to represent the reality in A&E.

The model assumes a single process for treating patients.
This means that the model assumes that patients with single minor conditions are treated in the same way and by the same people as patients with complex or multiple problems or injuries. Real A&Es don't do that. One of the major innovations that led to much faster A&E treatment times in the early 2000s was the introduction of streaming for different types of patient. The idea of "see and treat" for minors was a major innovation that recognised that many patients don't need multiple investigations or multi-skilled teams to treat them. So many A&Es designed much simpler processes which cut out multiple stages of assessment, investigation or treatment. The process is staffed by people fully qualified to both assess and treat minor injuries or conditions. Patients get assessed and, if they don't have anything complex wrong with them, they get treated immediately, often by a specially qualified nurse. This is fast and efficient. It reduces the number of staff required (by eliminating unnecessary steps in the process for the majority of patients) and speeds the treatment. It leaves far fewer patients waiting around and clogging up the waiting room, thereby reducing crowding (which is good for staff and other patients).

By ignoring this major innovation, the NICE model becomes a hypothetical model of how an A&E department might operate if nobody ever had any good ideas about how to organise one effectively. By modelling something which mostly doesn't exist the model tells us nothing useful about staffing or performance in real A&E departments in England.

The model assumes the patients with more severe injuries go to the front of the queue
This sounds reasonable. But, coupled with the previous assumption that all patients get treated in the same process, it turns out to be both unrealistic and bad for patients.

It is unrealistic because it generates an output where the majors get treated faster than the minors (that is what it assumes the process should be so, of course, that is the output the model generates). This is the opposite of what the data actually shows. In reality the majors--especially the ones who need to be admitted--have the longest waiting times. In well functioning departments the majority of minors are treated in less than 90 minutes, but it isn't uncommon for patients requiring admission to have an average waiting time of 4 hours or more. Moreover there is good evidence about why they wait: the delay is mostly caused not inside the A&E department at all but by the failure of most hospitals to manage their beds effectively (see this Monitor report on the causes of A&E delays). The model doesn't consider these delays at all.

Streaming of patients into separate processes was developed because a single process is bad for all patients when there is a mix of different patient needs. A single process is wasteful; it creates unnecessary delays for minors; and it uses more staff time for no benefit at all to the majority of patients. Streaming minors into a separate efficient process frees up staff time for the more complex needs of majors and allows the separate process of treating them to operate quickly without interfering with the process for treating minors. Having two processes achieves the result of rapid initial treatment for majors without having to bump the minors to the back of the queue.
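To make the mechanism concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the arrival times, the treatment durations (30 minutes for a minor in the single process, 10 minutes under "see and treat", 60 minutes for a major), the two clinicians and the function name `simulate`. It is not the NICE model and it is not evidence about any real department; it only shows how the two queue designs behave on the same set of arrivals.

```python
import heapq
from statistics import mean


def simulate(patients, n_staff, majors_first):
    """Toy non-preemptive queue (hypothetical helper, not the NICE model).

    patients: list of (arrival_minute, kind, treatment_minutes) tuples.
    majors_first: if True, waiting majors jump ahead of waiting minors;
                  otherwise patients are seen in arrival order.
    Returns the waiting time of each patient, grouped by kind.
    """
    pending = sorted(patients)          # ordered by arrival time
    free_at = [0] * n_staff             # when each clinician is next free
    heapq.heapify(free_at)
    waits = {"major": [], "minor": []}
    while pending:
        t_free = heapq.heappop(free_at)             # earliest free clinician
        start = max(t_free, pending[0][0])          # may have to wait for an arrival
        present = [p for p in pending if p[0] <= start]
        if majors_first:
            chosen = min(present, key=lambda p: (p[1] != "major", p[0]))
        else:
            chosen = present[0]
        pending.remove(chosen)
        waits[chosen[1]].append(start - chosen[0])
        heapq.heappush(free_at, start + chosen[2])
    return waits


# Invented arrivals: (minute, kind). Purely illustrative.
ARRIVALS = [(0, "minor"), (10, "minor"), (20, "major"),
            (30, "minor"), (40, "minor"), (50, "major")]


def with_treatment_times(arrivals, minutes):
    return [(t, kind, minutes[kind]) for t, kind in arrivals]


# Single process: two clinicians share one queue, majors go first,
# and minors go through the full (assumed 30-minute) process.
single = simulate(
    with_treatment_times(ARRIVALS, {"minor": 30, "major": 60}),
    n_staff=2, majors_first=True)

# Streamed: one clinician per stream; minors get an assumed
# 10-minute "see and treat" process.
streamed_patients = with_treatment_times(ARRIVALS, {"minor": 10, "major": 60})
streamed = {
    kind: simulate([p for p in streamed_patients if p[1] == kind],
                   n_staff=1, majors_first=False)[kind]
    for kind in ("minor", "major")
}

for name, waits in (("single process", single), ("streamed", streamed)):
    print(name, {kind: mean(w) for kind, w in waits.items()})
```

With these invented numbers the majors wait 15 minutes on average under either design, while the minors' average wait falls from 15 minutes to zero once they have their own stream. Different numbers would give different results; the point is the mechanism, not the figures.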

By ignoring streaming and modelling a different treatment process that no longer exists, the model fails to address anything useful in real-world A&E staffing or performance.

The model assumes that staff are all much the same
The focus of the model is to understand whether nurse staffing affects performance so it assumes that there are few differences among nursing grade staff and ignores issues with doctor staffing. Again the assumptions made ignore the reality of how A&E departments work.

There are two things that are well known by A&E experts that relate staffing to performance. One is that, when you stream minors to a "see and treat" process, you can use experienced nurses to deliver a lot of the treatment. These specialist staff (called advanced or Emergency Nurse Practitioners--ANPs or ENPs) are dedicated to the stream dealing with minors and allow fast treatment to be delivered efficiently for patients who don't have complex problems. Both the NICE model and the evidence review explicitly exclude anything relating to these specialists. The other staffing issue is that senior medics "on the shop floor" improve performance everywhere, probably because they can make fast confident decisions for edge-case patients where more junior staff would dither or make poor judgement calls. This is also ignored in the NICE evidence and model.

In summary: modelling the wrong thing won't provide any useful insights
I could go on but I won't. The key point here is that if you develop a model that isn't based on the real world you won't get any useful insights about the real world. A Lego model of the Empire State Building won't tell you about the structural integrity of the real Empire State Building. If the engineers used a Lego model for this purpose you would be well advised to stay out of New York.

So NICE have created a model that is uninformed by real world observations about how A&E actually operates; it ignores observations about how real A&E departments are staffed; it doesn't have any inputs about how they actually perform; it ignores observations about where problems exist and models a process that doesn't consider the biggest problem (finding beds). Why does anyone think its conclusions are useful?

What NICE should have done

Given the admitted lack of evidence about real A&E departments in England, what NICE should have done is look for useful evidence rather than waste time summarising poor quality analysis of irrelevant systems in other countries. There are more than 150 major A&Es in England and their performance is measured both in public SITREP data and in less public but more detailed HES data. Most of these departments should have some idea of their staffing profiles and rosters. Putting those two sets of observations together would allow a rich set of "experiments" to be done by comparing the departments to each other. It might take more effort (and actual statistical skill as opposed to modelling or literature review skill). But the results would tell us about the system we actually have.
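A comparison of that kind could start with something as simple as the Python sketch below. It is only an illustration under assumptions: the rows are made-up placeholders rather than SITREP or HES figures, the field choices (nursing whole-time equivalents, annual attendances, percentage of patients seen within four hours) are mine, and normalising staff by attendances follows the approach the Audit Commission describes in the passage quoted further down. A real analysis would need far more than one correlation coefficient.

```python
from math import sqrt


def staff_per_1000_attendances(staff_wte, annual_attendances):
    """Express staffing relative to department size (hypothetical helper)."""
    return 1000 * staff_wte / annual_attendances


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Placeholder rows, NOT real data:
# (department, nursing WTE, annual attendances, % seen within 4 hours)
departments = [
    ("A", 60, 80_000, 93.0),
    ("B", 45, 90_000, 95.5),
    ("C", 80, 100_000, 91.0),
    ("D", 50, 50_000, 94.0),
]

ratios = [staff_per_1000_attendances(wte, att) for _, wte, att, _ in departments]
performance = [pct for *_, pct in departments]

r = pearson(ratios, performance)
print(f"correlation between staffing ratio and performance: {r:.2f}")
```

With real staffing returns joined to real performance data, a coefficient near zero would be consistent with no relationship between staffing levels and performance; a strong one would be a reason to look harder.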

NICE did none of this.

What is worse, the exercise has been done before and nobody at NICE, it seems, noticed.

When the Audit Commission existed and still did some work on hospital performance they had a programme called the Acute Hospital Portfolio Review. When the programme reviewed A&E it looked at staffing and performance on a range of clinical metrics including speed but also including quality of care. In other words, they did exactly what NICE didn't. The last of their reports that I know of is preserved here (pdf download).

The reports reached some startling and unexpected conclusions about A&E staffing which were credible because they were based on extensive real evidence on actual English A&E departments not on models or academic speculation. Here are two conclusions (with my emphasis):
"Common sense would suggest that a large part of the improvements in times spent in A&E departments since 2000 has been due to the increases in staff. However, when comparisons are made at individual department level, there is no association between relative increases in staff and improvement in times spent in A&E."

"for comparability, staffing levels need to be expressed as a ratio between actual staff numbers and the numbers of annual attendances (a reasonable measure of the size of a department). When expressed in this way, there is no relationship between times spent in A&E and staffing levels. Tightly staffed departments perform as well as generously staffed departments. This is consistent with the findings in the 2000 review."

Staffing in A&E has improved significantly since the last of these reports was written.

I suspect that the detailed data behind these conclusions has been lost with the abolition of the Audit Commission and its successor. But I know that the evidence was comprehensive and solid over several successive periods of data collection.

If you want to produce guidance about staffing that disagrees with their surprising conclusions then you need to generate some better evidence. Nothing in the NICE recommendations does that.

We also have other recent analysis that shows the biggest problem in A&E performance is nothing to do with A&E staffing but is about coordinating the A&E demand for beds with the flow through the beds in the rest of the hospital. No amount of extra staffing in A&E will help that. So not only is the evidence behind the NICE staffing recommendations as weak as wet toilet roll, it completely fails to address the biggest actual problem in our A&Es.

The controversy over the non-publication of the work has given it a credibility it doesn't deserve. The right response would have been to publish it and ignore it as it has nothing credible to say.