Thursday, 13 July 2017

Political management of operational performance is a catastrophe for NHS improvement

You can't take politics out of the NHS: politicians have to set the budget and raise the taxes to pay for it. But that doesn't mean they should be allowed to interfere in the details of operational management. They don't understand how the NHS works, so when they do interfere they tend to favour appearance over substance, and this throttles efforts to achieve real improvement. But Daily Mail headlines are a lot less important than the wellbeing and lives of NHS patients. This is a huge problem for both sides of the debate on the NHS.

Current political thinking about NHS problems is a catastrophe

A recent story in the HSJ was a perfect illustration of why political thinking is a curse on the NHS. The story was ostensibly about procurement rules, though this is misleading. The issue was that a hospital wanted to take over a bunch of MIUs and WICs. The reason was that this would enable the hospital to include those units' (good) performance numbers in the organisation's headline performance, diluting the impact of its (awful) A&E performance.

This is roughly the equivalent of Marks and Spencer buying a profitable oil exploration company to disguise the weak performance of its retail business.

Amazingly, the focus of the story was on the implications for procurement rules, which should prohibit this sort of transaction for competition reasons. Let's face it, only a few diehards care much about those rules (the rest of us probably should care more but it just doesn't seem like the biggest issue for the NHS right now).

The real story here is far more significant for the NHS and how it is run. It is about how political thinking has deeply permeated NHS management and corrupted it. I want to explore the implications of that.

Let's be clear about what was proposed by the trust. They wanted to waste a ton of scarce management effort on a plan to adjust the organisation structure in their local area that would result in absolutely no benefits at all for patients. Their motivation was to be able to report better results to the NHS. Note the key fact again: the rules would let them report better performance despite the fact that absolutely nothing would change for any patient anywhere in their area. As far as I can tell nobody even thought it necessary to disguise this goal. It is as if a doctor chose to treat a melanoma by offering the patient strong cosmetics to cover up the skin blemishes. All the management effort was going into gaming the system, not improving it.

Some might argue that management were forced into this position by the way the rules work. After all, better reported performance might result in more money for the hospital and that has to be good, right? Bollocks. If the system rewards gaming rather than real improvement the allocation of money might as well be by lottery. On second thoughts a lottery would at least be random and therefore in some sense "fair"; allocation of money to the best bullshitters is actively harmful and destructive.

It isn't a new issue for A&E performance. It was a problem in the early 2000s when the target was first set. Then, as now, performance was measured across a number of heterogeneous types of unit: major A&Es (type 1 departments) are open 24hrs and handle almost anything; but there are also specialist units and a mix of Walk In Centres (WICs) and Minor Injury Units (MIUs) that only handle minor injuries and don't open 24hrs. The problem is that almost all of the bad performance is in the major A&Es. I can't remember the last time a non-major unit breached the A&E target. So, if you want to know where the problems are, you need to look at the type 1 performance. Diluting this focus by allowing hospitals to quote the overall results by including other units makes it harder to see where the real problems are. The top civil servants in the A&E team originally wanted to report only the combined numbers (unsurprisingly they are always better and enable the minister to claim better performance). I argued that this was a mistake and would result in management effort being misdirected. The improvement team needed to focus as clearly as possible on the places that actually had problems. The compromise was to report both numbers internally (but let the minister claim the higher numbers in public).

The story is the epitome of what goes wrong in NHS management when political thinking pushes out good operational practice.

The root of the problem is the way politicians think. They focus on what looks good in newspaper headlines today rather than what will work tomorrow. And few, if any, take the time to explore the root cause of the visible problems in the NHS.

Perhaps we should not be too hard on them as they have to win arguments. And winning arguments requires persuasive rhetoric. But the facts must come before the rhetoric. Win the right argument. In reality rhetoric is often all there is and the facts are warped to fit. This doesn't work. Reality is not influenced by propaganda. Pretending things are better than they are disables the key flows of information that tell you how to improve.

It is a problem because the political way of thinking becomes deeply embedded into the way management decisions are made inside the NHS. And when you understand the political mindset you can see that this is deeply corrupting. Good management monitors and measures the things that matter for understanding and improving performance. Political management is only interested in things that can make performance look good.

The political mindset doesn't just corrupt what gets measured; it also drives plans and actions that bear little relationship to reality. So the centre demands that trusts produce "improvement trajectories" that bear about as much relationship to reality as the athletic sex in a porn movie bears to real bedroom behaviour. There is little or no incentive for an honest appraisal of the root causes of problems which might create a realistic chance for improvement but might take far longer than the planning and reporting horizon.

The political mindset is reinforced when politicians directly interfere in operational decisions. The interventions usually seem to be as unrelated to operational reality as the reported numbers. Again, the focus seems to be how to get catchy headlines that proclaim "something is being done". Jeremy Hunt's desire to put lots of GPs at the door of A&E departments sounds good if you know nothing about how A&Es work or where their problems are; if you know what A&Es are really like it looks like the policy equivalent of taking an antibiotic yourself to cure a viral infection in your dog.

Running the system doesn't mean interfering in the operational details

It is a cliche that we should take all politics out of NHS management. But we can't. The budget and the taxes to pay for it have to be determined by politicians: that's how UK democracy works. To think otherwise is a technocratic fantasy (often perpetuated by wannabe technocrats whose understanding of the sources of NHS problems is no better developed than that of the politicians).

The problem I'm pointing out isn't that: it is the problem of interference in operational issues. Just because politicians set the overall budget and direction doesn't mean they should meddle with how hip replacements are done, how GPs organise themselves to respond to patient needs or how flow in A&E departments is managed.

The problem of political interference isn't helped by the way opposition politicians treat problems in the NHS. They too lack any useful operational insights into the real causes of problems. But they reinforce the idea that the government should be held responsible for the details of how the NHS is run. This would not work even if the leadership at the top had any useful insights into the operational reality of how the NHS actually worked: the organisation is just too big for any central body to know the details of why Mrs Stevens waited 12hr on a trolley in Cornwall's A&E department.

The people who could know those details are the managers and medics in the local hospital or GP practice. They have the capability to understand what is happening in their local operations and to identify what isn't working properly; they should have the accountability for fixing things that don't work.

But the very people who should be able to understand problems and fix them are undermined by the process of political management. The message from the top is that it is more important to make things look good than to actually fix them; the metrics required to measure whether things are working or not are corrupted by the central need to report good news; long term improvement is useless in meeting the short term need for good headlines. Worse, the constant focus on cliches like "more resources to the front line" demonises managers and leads to policy where we might well get more medics (more doctors looks good in headlines) but those medics can't be productive because of a lack of support staff. The lobbyists for doctors and nurses collude in this by constantly arguing that the problem is a lack of whatever staff group they represent, ignoring the strong evidence that their members can't be productive in a badly designed system where they are poorly coordinated and don't have enough support staff (a recent report by the Royal College of Surgeons is a notable exception).

It is also worth noting that hospitals have very few managers (on sensible definitions only 2-3% of staff are managers). Most hospitals have fewer than they really need. Moreover, if we look at the differences between hospitals, the ones with more managers get better results for patients and have better financial outcomes.

Managers' and clinical leaders' roles in driving local improvement are also strongly undermined by constant burdens from the centre. The NHS isn't content to look at transparent and well designed measures of outcomes and judge local management on whether they achieve them: instead the centre interferes at the most detailed level as though it understands how the local job should be done (it doesn't). This results in a demand for local units to report a mountain of badly designed metrics and to follow detailed central guidance on how they work, when local experimentation and improvement would be far more effective. Nigel Edwards once described this as:

"A significant organisational pathology"

Also observing:

"Time that should be spent dealing with problems is diverted to reporting on the actions being taken and providing reassurance that previous action plans have been executed."

When bad managers are combined with political management we get the worst of both worlds. Instead of focussing on the root causes of performance problems, the managers now focus on how to game the targets. Hence catastrophic waiting list behaviours and a dangerous spike in apparent activity in A&E departments in the minutes before patients have waited 4hr. This sort of behaviour is sometimes blamed on "targets". But not all targets are bad and the real blame lies with bad managers working in a system that rewards gaming rather than real improvement.

How could we do things differently?

Political management results in poor choices of what is measured, poor choices about what is done and a persistent inability to achieve substantive improvement. It isn't an easy problem to fix but there are a lot of things that can be done even if politicians don't change their spots.

Opposition politicians should stop behaving like the government and do some homework before proposing alternatives to government policy. Understanding what is broken in the NHS is a harder job than proposing the opposite of what the government are doing, but the reward might be to improve the quality of debate.

We could do with more transparency in how we measure performance. Both the design of performance metrics and their dissemination need to be more independent of government. Manipulation of the metrics to reduce the number of bad news stories should not be allowed (see this on how reporting of A&E performance was corrupted). Other government statistics are moving this way; so should everything in health.

If we want to generate some real improvement in the NHS the key is for local organisations to do their own thing. The central metrics don't tell you what you need to do to improve. Ignore them and measure and report what helps you gain insight into performance and quality. Focus on real improvement and the government mandated headlines will follow.

One final plea to fellow commentators: do your homework. If you comment on how messed up the NHS is but do so using the same superficial cliches as the government, you are part of the problem not the solution. Current policy is often based on a search for good headlines uninformed by actual analysis of the real problems. Unless those who oppose as well as those who govern learn to break the habit of seeking headlines rather than solutions, policy won't improve.

This is very important: good Daily Mail headlines are no substitute for the wellbeing and lives of NHS patients. Unless everyone involved in the debate (journalists, commentators, lobbyists and politicians on both sides) ups their game, patients will suffer and the NHS won't improve.

Wednesday, 17 May 2017

Equal area HexBins in Tableau without mapping distortion

In a previous post I described a way of creating hexagonal areas on maps automatically, making it easier to show area maps using Tableau's HexBin functions. The original method creates automatic hexagons in Lat-Lon coordinates but they become increasingly elongated when plotted on the standard Mercator projection used in Tableau maps. It turns out there is a way to adjust for this that creates better-looking regular hexagons on maps (and also makes their actual geographic size more consistent). This post describes both ways of creating hexagonal regions from HexBins in Tableau.
Note, I have also posted the essence of this in
The hexbin function in Tableau works on any X-Y data to bin the raw coordinates into hexagonal areas.
On maps, the data usually comes as Lat-Lon coordinates expressed in degrees. Tableau can plot this data directly but usually does this using the somewhat distorting Web Mercator projection (areas far away from the equator are greatly enlarged). This means that hexagonal bins defined on lat-lon will not be equally sized (one degree in an east-west direction represents a much smaller distance on the Earth's surface when the latitude is high and far from the equator but a degree on a north-south line is always about 111km long).
Simple versions ignore this distortion and generate the hexbins from lat-lon coordinates. I'll describe methods based on this simple technique first. I'm basing this on a dataset containing the locations of every postcode in the UK (~2.5m rows with a wide range of density in different geographies).
The simple method (produces distorted hexagons)
The basic method involves several steps:
  • Generate a scaling factor using a parameter that allows adjustment of the size of the hexagons
  • Generate a value to define the extra points for the vertices of the hexagon (I achieve this by adding a value to each row of the dataset consisting of a random number from 1 to 6. This works when you have enough data points to guarantee a complete hexagon can be created for each area you want to plot.)
  • Generate the lat-lon coordinates for the hexbin centres
  • Add the 6 points that define the vertices for the hexagons
  • Plot the hexagons as polygons on a map (they can then be coloured by any aggregate based on the underlying data)
Here is a step-by-step guide.
I define a parameter hexbin size that represents the radius of the hexbin in degrees. Then the hex bins are defined in the following way based on the Latitude and Longitude values in each row of the dataset:
hexbinx formula
The hexbiny formula is similar.
The dataset itself contains a field called PointID containing values from 1 to 6. (There are multiple ways to achieve this but doing it in the dataset is easier than doing it in Tableau as many databases contain hashing functions that guarantee a random distribution of integers which can be turned into numbers from 1 to 6 by a simple mod function.) I define a new field called angle based on the value in PointID (to help define the hexagon's vertices):
angle formula
Now I can add the points that need to be plotted to define the hexagonal polygons:
plot lat formula
The longitude is similar:
plot lon formula
At this point the dataset should contain enough rows within each hexbin to guarantee that each hexbin contains rows with every value from 1-6 so all the points of the hexagon are defined.
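The formula screenshots are missing from this version of the post, so here is a rough Python sketch of the same logic. Tableau's HEXBINX/HEXBINY do the grid snapping for you; the cube-rounding below is my own stand-in for them, and the vertex function mirrors the angle-from-PointID trick (all names here are mine, not Tableau's):

```python
import math

def cube_round(q, r):
    """Round fractional axial hex coordinates to the nearest hex centre."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz

def hexbin_centre(x, y, size):
    """Snap an (x, y) point to the centre of its hexagonal bin
    (pointy-top hexagons of circumradius `size`), like HEXBINX/HEXBINY."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    q, r = cube_round(q, r)
    return (size * math.sqrt(3) * (q + r / 2), size * 1.5 * r)

def hex_vertex(cx, cy, size, point_id):
    """Vertex `point_id` (1-6) of the hexagon centred at (cx, cy),
    equivalent to the angle-from-PointID calculation in the post."""
    angle = math.radians(60 * point_id - 30)  # pointy-top vertex angles
    return (cx + size * math.cos(angle), cy + size * math.sin(angle))
```

Each row keeps its hexbin centre plus one of the six vertices selected by its PointID, which is exactly what the polygon plot consumes.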
To plot the hexagons, make sure the Geographic Role for the plot lon and plot lat fields is set appropriately and double-click each in turn. Then drag hexbinx and hexbiny to Detail and change the chart type from Automatic to Polygon. This will give some horrible mess which is fixed by dragging PointID to Path. This should give the following (I've also added a count of the number of rows to colour the hexagons and adjusted colours and transparency):
simple hexbin map
This shows that the basic technique works. But it also shows a key problem with it: the hexagonal polygons are distorted. If you can live with this (it is less of a problem close to the equator or over small areas) then stick with it.
But if you care about the distortions (the visual as well as the fact that the hexagons don't cover equal areas) then you have to do something more complex. What I describe below is based on work by Sarah Battersby of Tableau.
The complex method for producing hexagons of even appearance
The adjustments below involve some extra steps over and above the simple solution:
  • Generate a new scaling factor based on distance (as opposed to LatLon degrees)
  • Convert the lat-lon values into X-Y coordinates in the Web Mercator system based on distance not lat-lon degrees
  • Generate the hexbins using distance not angle in the new coordinate system
  • Convert the distance-based hexbins back to the lat-lon coordinates to define the hexbin centres
  • Add the hexagon vertices using the distance coordinates and then convert back to lat-lon 
You also need to add a parameter for the radius of the earth which is a conversion factor in the coordinate transformations. 
Here is how that works. First the conversion to X-Y coordinates:
mercator x formula mercator y formula
Now we can use hexbin grouping on the new coordinates:
hexbin mercator formula
The formulae for converting these coordinates back to lat-lon are below (useful if you want to just plot the centres):
adjusted hexbin lat adjusted hexbin lon
Note that it is the latitude formula that is the complex one.
But to make the hexagonal polygons work properly when plotted, you have to add the extra points for the 6 vertices before transforming them back to lat-lon coordinates. The conversion is the same as the formulae above and the final formulae look like this:
plot lat adjusted plot lon adjusted
The extra part just adds a fixed distance to the coordinate for the hexbin centre based on a hexagon sized by the scaling factor (and there are six of these based on the six values of Angle).
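Since the formula images are also missing here, a Python sketch of the underlying maths may help: it is the standard spherical Web Mercator transformation, forward and inverse (the function names and the radius constant are my own; in Tableau these would be calculated fields using the earth-radius parameter):

```python
import math

R = 6378137.0  # Web Mercator sphere radius in metres (the earth-radius parameter)

def to_mercator(lat, lon):
    """Forward Web Mercator: lat/lon in degrees -> x/y in metres."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

def to_latlon(x, y):
    """Inverse Web Mercator: x/y in metres -> lat/lon in degrees.
    The latitude formula is the complicated one, as noted above."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lat, lon
```

The adjusted method hexbins in (x, y) metres, adds the six vertex offsets in metres, then converts everything back to lat/lon for plotting, which is why the hexagons come out visually regular.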
When plotted in the same way as the previous polygon plot it should look like this (when tidied up):
adjusted polygon map
Now both the visual appearance and the actual size of the hexagons are better. 
For a Tableau Public workbook containing the above calculations see here.
PS Tableau could and should build most of this functionality into the product as it would dramatically simplify a useful technique.

PPS Although the visual appearance of the hexagons is now much better, the actual land area still depends on latitude, meaning that the true area of each hexagon varies. According to Sarah Battersby's calculations, areas at the N/S extremes of the UK will be about 10% different from the areas in the centre. If you want the details of that calculation, check her Tableau page referenced below.

Many of the ideas are based on the work of Sarah Battersby of Tableau Research. Some of her relevant talks are also available on the Tableau conference site here.
And the original ideas were inspired by (but use slightly different techniques from) some posts by Alan Eldridge (the Last Data Bender) some of which are described on his blog here.

Tuesday, 25 April 2017

Wrangling English prescribing data

This is a technical post intended to remind me of what has to be done to make sense of English prescribing data, an amazingly useful open dataset that summarises everything dispensed in the community in England (which accounts for about £8bn or about 8% of NHS spending every year). For all its usefulness, it is a pain to do analysis with it for a variety of reasons. I hope that by describing how to make sense of it my notes will also be helpful to others. I may update this as I work further with the data.


The trouble with English prescribing data is that, although it has been openly available since 2010 (every month we can see exactly how many prescriptions of each type are dispensed in the community), it isn't easy to make sense of it.

The raw data released by NHS Digital consists mainly of a file detailing the BNF code (a 15 character code describing the exact item being prescribed, BNF is the British National Formulary), the GP or clinic code for the issuer of the prescription, and the amount of things prescribed (by book price, actual cost after discounts and charges, number of pills/volume/units). What it doesn't have is the sort of metadata that helps you group these things together into useful categories (GPs mapping to CCGs or the grouping of drugs into categories).

The BNF code does contain useful information to solve some of that problem and there are lookups for the mapping of GPs onto bigger units. But these have to be downloaded from other sites and the extra data doesn't always map neatly onto the raw data. This blog describes how I sort, wrangle and combine the sources to make something more usable, partly to help me remember and also so others can see roughly what is required if they want to play with the data.

BNF codes

BNF codes actually contain a hierarchical coding that groups drugs and devices into categories. The codes group drugs into chapters, sections, paragraphs and subparagraphs. Chapters are mostly about what sort of problem the drug deals with. For example, there are chapters for drugs treating the digestive system, drugs treating infections and devices for incontinence. The other levels group these together in a fairly reasonable hierarchical classification. It works like this:
  • Characters 1-2 code the chapter
  • Characters 1-4 code the section
  • Characters 1-6 code the paragraph
  • Characters 1-7 code the subparagraph
  • Characters 1-9 code the chemical or device
  • Characters 1-11 code the product (often essentially the same as the chemical)
  • The remaining characters code the specific item (different versions of the item, different formulations of the same drug, different strengths of pill, etc.)
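The character positions above translate directly into code. A minimal Python sketch of the decomposition (my own helper; the padding mirrors the raw data's right-filled 15-character codes):

```python
def bnf_levels(code):
    """Split a BNF code into its hierarchy levels by character position.
    Codes shorter than 15 characters (dressings/appliances) are
    right-padded with spaces, as in the raw prescribing data."""
    code = code.ljust(15)
    return {
        "chapter": code[:2],
        "section": code[:4],
        "paragraph": code[:6],
        "subparagraph": code[:7],
        "chemical": code[:9],
        "product": code[:11],
        "item": code,
    }
```

This is also what makes it possible to rebuild most of the hierarchy for codes missing from the metadata, as described later.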

There are some complications on top of this, though. Dressings and appliances only have 11 characters in their codes (though the raw data makes all codes 15 characters long by right-filling them with blank spaces; some metadata sources use a mix of 11- and 15-character codes just to be really annoying).

So, while we can derive a hierarchy from the raw data, we need to find a source for the names of chapters, sections, paragraphs and so on (the raw data comes with a file describing the names of each specific item's formulation). We could just buy the latest BNF (available in book form with all sorts of useful advice about drug use, safety and dosage and updated every six months). But the book is over 1,000 pages long and there are nearly 40,000 items to code so it isn't a very practical way to get the metadata unless you are both a masochist and a good typist.

Annoyingly the BNF doesn't, as far as I can tell, release an electronic form of the data (even their iOS app requires an Athens login, presumably in a futile attempt to prevent non-qualified people misusing the data even though those who want to can just buy the physical book). But the BSA (the NHS Business Services Authority, which compiles the data from the raw prescriptions used to pay pharmacists) does. It is accessible as described in this useful opendata.stackexchange post. But the BSA have a deserved reputation for being annoying awkward bastards when it comes to open data. They opposed making the raw data available as opendata when Tim Kelsey originally proposed the idea. Their own tools for accessing the data are clunky, slow and primitive. And they don't fall over themselves to make anyone else's life easier. For example, their metadata is coded using both 15-character and 11-character BNF codes (unlike the raw data, which pads all codes to 15 characters). Worse, the lookups are inconsistent from year to year (the 2016 release has 35,673 BNF codes but the 2017 release has 74,653 even though the entire public dataset doesn't have that many unique codes in it). Worse still, not every code in the public data has a matching code in the metadata file.

It is unclear why there are so many more codes in the latest BSA metadata. Here is how this looks compared to the number of codes actually used in the raw volume data:

And this understates the scale of the problem as some of the codes in use don't match anything in the metadata. This table shows that problem:

This table shows that 1,277 codes in the raw data are not present in the 2017 metadata at all.

This is pretty fucking annoying for those of us who want to do analysis.

All is not lost. The correct position in the hierarchy can be retrieved for the things actually prescribed as their codes contain some of the information about their position in the hierarchy (remember the BNF code contains the chapter, paragraph etc. codes).

So we can highlight the 1,277 missing BNF codes by doing some SQL magic in BigQuery:

BigQuery SQL query screenshot
And then we can reapply the known chapter, section etc. values, where they exist. Unfortunately, they don't always exist because the BNF sometimes revises the hierarchy, adding new ways to classify drugs. The biggest change recently was the introduction of a new paragraph and subparagraph classification for drugs used in substance abuse. This gave the affected drugs new codes but means that the old codes won't fit correctly into the new classification. This can be fixed manually, though it is a pain in the arse to do so.

In addition, the raw data files containing metadata for chemical/product names use a different structure for the chemical codes in the chapters consisting mostly of devices and dressings (all chapters > 19). Instead of a 9-character code, the product/chemical is described by a 4-character code (i.e. the same as the paragraph code).

But, despite these difficulties, I persisted anyway. I extracted the codes actually used in the raw data and filled in (to the best of my ability) the missing hierarchy parts so that I could produce (nearly) complete classifications of everything actually prescribed.

Prescriber Locations

Then we have the problem of knowing who prescribed and where they are geographically. The raw data contains the prescriber code (some clinics but mostly the standard code for GP Practices). It also contains fields described as SHA and PCT.

Unfortunately the NHS has been reorganised since the data was first published: both Strategic Health Authorities (SHAs) and Primary Care Trusts (PCTs) were abolished and replaced by NHSE Area Teams (NHSATs or some acronym like that) and Clinical Commissioning Groups (CCGs). The data in the SHA and PCT columns switches to new codes somewhere in the middle of the timeseries. This is sensible, but has the major disadvantage that the raw codes don't allow a consistent set of regional comparisons over time even though the actual people doing the prescribing are mostly the same (neglecting the small number of closed and new GP Practices). Another problem is that the fields contain codes that are not regional or local bodies. Some look like the codes of hospital trusts; sometimes SHA codes appear in the PCT field and sometimes codes are used that don't appear to match anything I recognise.

There are two ways round this problem. One is to use the geography of postcode locations to map practices onto the area where their postcode is located. The ONS produces lookup tables showing the higher-level geographies each of the ~2.5m postcodes in the UK resides in. This won't always match the actual NHS administrative hierarchy for GPs, but it is close and we can extract the relevant NHS codes from a single lookup table, the ONSPD table, which is large but manageable and downloadable from the ONS Geography portal.
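As a sketch of that single-lookup-table approach, here is a minimal Python loader for the ONSPD file. The column names (`pcds`, `ccg`, `lat`, `long`) follow the ONSPD layout as I understand it, but check them against the version you download:

```python
import csv

def postcode_lookup(onspd_path):
    """Build a postcode -> (CCG code, lat, lon) dictionary from the
    ONSPD CSV. Assumes the standard ONSPD column names; verify these
    against the release you are using."""
    lookup = {}
    with open(onspd_path, newline="") as f:
        for row in csv.DictReader(f):
            lookup[row["pcds"]] = (row["ccg"], row["lat"], row["long"])
    return lookup
```

Joining the prescriber file to this dictionary on postcode then gives both a consistent CCG allocation and a plottable location in one pass.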

The alternative is to match GPs to CCGs using the lookup tables NHSE are supposed to provide. This should be exact, but the tables are not always up to date and some codes were altered after the original release (presumably just to make life harder for analysts like me). And CCG membership and boundaries have changed several times due to GP defections and CCG mergers/splits. Since doing consistent geographic analysis is clearly a waste of time for everyone, why should I expect this to be made easy for me?

Even if we use the NHS GP to CCG lookups, we still need the ONSPD file to identify the precise location of practices. This file, at least in recent versions, gives the lat/lon of every postcode, enabling location maps with little fuss if your choice of analysis software can map using lat/lon (I use Tableau which can draw a map as soon as you give it a lat/lon).

Back to the raw data which has some further traps for the unwary. Alongside each month's prescribing data, NHS Digital release a file containing the lookup from prescriber code to prescriber name and full address. WOOT! you might think, problem solved. Unfortunately not completely, as there are some consistency problems. For each practice code there is a practice name and four address fields in the data. The last two address fields should contain town and area, but often don't. Sometimes the last field is the town; sometimes it is blank; sometimes the town field contains a third address line. Luckily we rarely need those full addresses so we don't need to fix these inconsistencies. But we do need the postcodes and they are, at least, all in the same column of the data.

But not in a consistent format, for fuck's sake. There are several possible standard ways of formatting postcodes (the ONSPD file has three alternatives in case your data uses a different one). This happens because postcodes have different numbers of characters (for example, B1 7PS; BT6 1PA; SG18 2AA; SW1W 9SR). You can force-fit all postcodes to 7 characters (some will now have no spaces), 8 characters (some will have multiple spaces) or a variable length that always has just one space in the middle. The raw data for prescriptions is supposed to be in the variable-length format. And it often is, but not always. In some months some of the postcodes have an extra trailing space, which means that any lookup to location using the ONSPD file fails. It is easy to fix (databases and spreadsheets have TRIM functions to remove redundant spaces) but why didn't NHS Digital apply this before releasing the damn data? It isn't as if they are even consistent every month.
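A sketch of the trim-and-normalise step in Python (my own helper, not anything NHS Digital provides; it assumes a complete postcode whose inward code is always the last three characters):

```python
def clean_postcode(pc):
    """Normalise a UK postcode to the variable-length format:
    outward code, a single space, then the 3-character inward code.
    Strips all leading, trailing and internal whitespace first."""
    pc = "".join(pc.split()).upper()  # remove every space, uppercase
    return pc[:-3] + " " + pc[-3:]
```

Running every postcode column (raw data and ONSPD alike) through one function like this makes the join reliable regardless of which of the three formats each source used.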

This isn't the only problem with the postcodes. Some of them are wrong. It is almost impossible to tell this unless you draw maps and observe locations over time (I do, which is why I spotted the problem). Some practices change postcode over time in ways that don't make sense. Sure, some practices move location to new buildings and get a new postcode. But in the data some make leaps of hundreds of miles and then move back. On investigation, some codes have been transposed from practices near them in the list in some months. This isn't common, but suggests that some analyst has been making transposition errors when putting the data together and hasn't found a reliable way to validate the whole monthly dataset.

Putting it all together

Now we have a (mostly) complete way to classify drugs we can look at the whole hierarchy. A summary is given below:

[Image: hierarchy counts to Dec 2016]

Note that the number of distinct names doesn't entirely correspond to the number of distinct codes. Several factors cause this: sometimes the name of exactly the same product changes over time; sometimes the hierarchy is reorganised, giving existing drugs a new BNF code for the same product; and there are a variety of other reasons.
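A quick way to see the renames, assuming you have accumulated (code, name) pairs from every monthly file (the function name is mine, and the codes and names in the test are illustrative, not real BNF entries):

```python
from collections import defaultdict

def find_renamed_codes(rows):
    """rows: iterable of (bnf_code, bnf_name) pairs drawn from many months.
    Returns the codes that appear under more than one name -- i.e. the same
    product renamed over time."""
    names = defaultdict(set)
    for code, name in rows:
        names[code].add(name)
    # Sort the name lists so the output is deterministic.
    return {c: sorted(n) for c, n in names.items() if len(n) > 1}
```

The mirror-image check (one name under several codes) catches the hierarchy reorganisations the same way.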

Normalising the results

Sometimes we need to know more than just the total amount prescribed. When we want to compare GP practices, for example, we need to account for the fact that GP lists have a wide range of sizes and population demographics. A fair comparison of two practices requires adjustments for both the size and the age mix of their populations: the old pop a lot more pills than the young, so any comparison that ignores the age mix won't be fair.

The simplest normalisation is pills per head, which is easily derived from GP practice list size. A better comparison uses the whole demographic profile of the population on the list. Two common schemes use units known as ASTRO PUs and STAR PUs. ASTRO PUs take the age/sex mix into account by defining a standardised rate of prescribing (by number of items or by spend) for each age/sex band in the population; the rates are based on the English average, though the underlying analysis is only redone every few years. STAR PUs are defined for specific categories of drugs where the national all-drug rate may be inappropriate.

An ASTRO PU (items or cost) is the number of items or the cost that a practice would be expected to incur if it prescribed in exactly the same way as the average practice in England. The units are defined per quarter, so you need to adjust if you are working with other timescales. I create a table of monthly ASTRO PUs from the quarterly population data, dividing the values by 3 to get an expected monthly number of units.
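The arithmetic can be sketched like this (the band labels and per-head weights below are illustrative placeholders, not the published ASTRO PU values, and the function name is mine):

```python
def monthly_astro_pu(band_populations, quarterly_weights):
    """band_populations: {age/sex band: headcount} for one practice.
    quarterly_weights: {age/sex band: expected items per head per quarter}.
    Returns the expected number of items per MONTH, since the published
    weights are defined per quarter."""
    quarterly = sum(band_populations[band] * quarterly_weights.get(band, 0.0)
                    for band in band_populations)
    return quarterly / 3  # quarterly expectation -> monthly expectation
```

Dividing the practice's actual monthly items by this number gives a demographically adjusted prescribing ratio that is comparable across practices.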

But deriving the relevant factors for a practice is not simple, as list sizes change over time. And, as usual, you have to find the data, which isn't trivial. The NHS BSA have some, but it only starts in 2014 (our prescribing data starts in mid 2010). NHS Digital have quarterly data from 2013 and, if you search carefully enough, some data from 2011. But there is no obvious single series going back before 2013.

And that isn't the only problem with the data. Recent quarters add useful extra columns containing the ONS standard codes for NHS geographies as well as the NHSE codes. But this means the files are now in a different format and must be edited before being combined into a single source. Plus the file names are inconsistent and don't (usually) contain date information, so you have to edit that in manually.
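One defensive way to combine the quarterly files despite the shifting formats is to keep only the columns common to every release and stamp the quarter on each row yourself, since the file names won't tell you. A sketch (the column names in KEEP are hypothetical; check them against each actual file):

```python
import csv

# Columns assumed present in every quarterly release (hypothetical names).
KEEP = ["PRACTICE_CODE", "AGE_BAND", "SEX", "COUNT"]

def load_quarter(path, quarter):
    """Read one quarterly list-size file, dropping any extra columns a
    given release adds (e.g. ONS geography codes) and adding the quarter,
    which the file itself usually doesn't record."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {**{k: row[k] for k in KEEP if k in row}, "QUARTER": quarter}
```

Concatenating the output of `load_quarter` across all releases then gives one consistent long-format table regardless of how each quarter was laid out.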

To add some anal-probing to the severe beating the poor bloody analyst has already taken, the age bands in 2014 and later data are extended to include 85-89 and 90-95 (previously the bands ended with 85+).
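Before combining years, the newer bands have to be collapsed back onto the old scheme so the whole series is comparable. A one-liner, assuming the band labels appear exactly as above (the function name is mine):

```python
def harmonise_band(band):
    """Map the post-2014 age bands back onto the pre-2014 scheme:
    85-89 and 90-95 both become 85+, everything else is unchanged."""
    return "85+" if band in ("85-89", "90-95") else band
```

Applying this before aggregating means the pre- and post-2014 quarters can be summed band-for-band without double counting or gaps.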

In fact here is a suggestion for how NHS Digital could make everyone else's life easier: copy the Land Registry. They release monthly files of house sale transactions in a consistent format that is easy to join together if you want to create a bigger dataset covering a long time interval. And they also publish a single consolidated (and very large) file that contains all the data they have ever released so you can get a consistent single file with all the data you want with just one download. Or you can do the big download to get started and then update every month from the monthly release without worrying about consistency or manual edits.


The bottom line is that making sense of England's prescribing data is possible but, at least for parts of the relevant data, unnecessarily hard. Deriving joined-up datasets that can be visualised is far, far harder than it should be. It is as if there were a conspiracy to stop open data actually being useful.

NHS Digital could help. They could release the data and all the associated lookups in a single place where users could use them immediately. They could create repositories that release the data in a single lump rather than in monthly extracts. This wouldn't be easy with CSV files (the result would be far bigger than the Land Registry's, where the accumulated file containing all house prices since the mid 1990s is only ~3GB as a single CSV). But there are people who would probably do it for free if asked nicely: Google already host bigger public datasets on their BigQuery data warehouse, and Exasol have assembled the core parts of the prescribing dataset on a public server for demonstration purposes (you can register for access to this and other interesting datasets and see many examples of data visualisation based on them). This is the sort of thing NHS Digital ought to do to encourage the use of their data. Disseminating it isn't enough.

I should also mention that Ben Goldacre and Anna Powell-Smith have produced an online viewer (OpenPrescribing) for many aspects of the data, but it doesn't (yet) offer true freedom to explore any aspect of the data any way you want.

And, not forgetting, my Tableau Public summary of the national data (which leaves out GP-level data to fit into a public workbook).