
Wednesday, 17 May 2017

Equal area HexBins in Tableau without mapping distortion

In a previous post I described a way of creating hexagonal areas on maps automatically, making it easier to show area maps using Tableau's HexBin functions. The original method creates hexagons in LatLon coordinates, but they become increasingly elongated when plotted on the standard Web Mercator projection used in Tableau maps. It turns out there is a way to adjust for this that creates better-looking regular hexagons on maps (and also makes their actual geographic size more consistent). This post describes both ways of creating hexagonal regions from HexBins in Tableau.
Note: I have also posted the essence of this on stackoverflow.com.
Introduction
Tableau's hexbin functions (HEXBINX and HEXBINY) work on any X-Y data, binning the raw coordinates into hexagonal areas.
On maps, the data usually comes as Lat-Lon coordinates expressed in degrees. Tableau can plot this data directly but usually does this using the somewhat distorting Web Mercator projection (areas far away from the equator are greatly enlarged). This means that hexagonal bins defined on lat-lon will not be equally sized (one degree in an east-west direction represents a much smaller distance on the Earth's surface when the latitude is high and far from the equator but a degree on a north-south line is always about 111km long).
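To put numbers on the distortion, here is a small Python sketch. It is my addition and assumes a spherical Earth with a mean radius of 6,371 km. It computes how long a degree is in each direction and how much the Mercator projection enlarges lengths at a given latitude:

```python
import math

# Assumption: a spherical Earth with a mean radius of 6,371 km.
EARTH_RADIUS_KM = 6371.0


def km_per_degree_lat() -> float:
    """Length of one degree of latitude (north-south) on a sphere."""
    return 2 * math.pi * EARTH_RADIUS_KM / 360


def km_per_degree_lon(lat_deg: float) -> float:
    """Length of one degree of longitude (east-west) at the given latitude."""
    return km_per_degree_lat() * math.cos(math.radians(lat_deg))


def mercator_scale(lat_deg: float) -> float:
    """Linear enlargement of the spherical Mercator projection at a latitude."""
    return 1 / math.cos(math.radians(lat_deg))


for lat in (0, 50, 60):
    print(f"lat {lat:2d}: 1 deg of lon = {km_per_degree_lon(lat):5.1f} km, "
          f"Mercator scale = {mercator_scale(lat):.2f}x "
          f"(areas {mercator_scale(lat) ** 2:.2f}x)")
```

The east-west degree shrinks with cos(latitude), while the Mercator stretch grows with 1/cos(latitude). Areas are enlarged by the square of that factor, which is why hexagons that are regular in lat-lon look tall and thin at UK latitudes.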
Simple versions ignore this distortion and generate the hexbins from lat-lon coordinates. I'll describe methods based on this simple technique first. I'm basing this on a dataset containing the locations of every postcode in the UK (~2.5m rows with a wide range of density in different geographies).
The simple method (produces distorted hexagons)
The basic method involves several steps:
  • Generate a scaling factor using a parameter that allows adjustment of the size of the hexagons
  • Generate a value to define the extra points for the vertices of the hexagon (I achieve this by adding a value to each row in the dataset that consists of a random number from 1 to 6. This works when you have enough data points to guarantee a complete hexagon can be created for each area you want to plot.)
  • Generate the lat-lon coordinates for the hexbin centres
  • Add the 6 points that define the vertices for the hexagons
  • Plot the hexagons as polygons on a map (they can then be coloured by any aggregate based on the underlying data)
Here is a step-by-step guide.
I define a parameter, hexbin size, that represents the radius of the hexbin in degrees. Then the hexbins are defined in the following way, based on the Latitude and Longitude values in each row of the dataset:
hexbinx formula
The hexbiny formula is similar.
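The formulae themselves are only shown as images, so here is a hedged Python sketch of what hexbinx and hexbiny compute. Tableau's HEXBINX and HEXBINY map a point to the centre of the nearest hexagon with side length 1. The sketch therefore divides the coordinates by the hexbin size, bins them, and multiplies back. The function names are hypothetical. The pointy-top grid orientation is an assumption and may not match Tableau's internal grid.

```python
import math

SQRT3 = math.sqrt(3)


def hexbin_unit(x: float, y: float) -> tuple[float, float]:
    """Centre of the nearest hexagon on a pointy-top grid with side length 1.

    The hexagon centres form two rectangular lattices with spacing (sqrt(3), 3):
    one through the origin and one shifted by (sqrt(3)/2, 1.5). Rounding each
    axis finds the nearest point of each lattice; the closer one wins.
    """
    ax, ay = round(x / SQRT3) * SQRT3, round(y / 3) * 3.0
    bx = (round(x / SQRT3 - 0.5) + 0.5) * SQRT3
    by = round((y - 1.5) / 3) * 3.0 + 1.5
    if (x - ax) ** 2 + (y - ay) ** 2 <= (x - bx) ** 2 + (y - by) ** 2:
        return ax, ay
    return bx, by


def hexbin_xy(lon: float, lat: float, hexbin_size: float) -> tuple[float, float]:
    """Hypothetical hexbinx/hexbiny: scale so the hexagon radius is 1, bin, rescale.

    For a regular hexagon the centre-to-vertex radius equals the side length,
    so dividing by hexbin_size (in degrees) matches Tableau's unit hexagons.
    """
    cx, cy = hexbin_unit(lon / hexbin_size, lat / hexbin_size)
    return cx * hexbin_size, cy * hexbin_size


# Example: a point in central England with hexagons of radius 0.1 degrees.
print(hexbin_xy(-1.5, 52.5, 0.1))
```

Every point lands in a bin whose centre is at most one hexbin size away, which is what makes the vertex offsets in the later steps line up with the bin boundaries.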
The dataset itself contains a field called PointID containing values from 1 to 6. (There are multiple ways to achieve this, but doing it in the dataset is easier than doing it in Tableau. Many databases have hashing functions that give an effectively random distribution of integers, which can be turned into numbers from 1 to 6 with a simple mod function.) I define a new field called angle based on the value in PointID (to help define the hexagon's vertices):
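Here is a minimal sketch of the hash-and-mod idea. Python's zlib.crc32 stands in for whatever hash function your database offers. The key format and the point_id name are illustrative only.

```python
import zlib
from collections import Counter


def point_id(row_key: str) -> int:
    """Hypothetical PointID in 1..6 derived from a hash of a row key.

    zlib.crc32 stands in for the database's hash function; the modulo
    maps the hash onto six roughly equally likely values.
    """
    return zlib.crc32(row_key.encode("utf-8")) % 6 + 1


# Illustrative keys only; in practice this would be a unique row identifier.
keys = [f"row-{n}" for n in range(6000)]
counts = Counter(point_id(k) for k in keys)
print(sorted(counts.items()))
```

Because the hash is deterministic, a given row always gets the same PointID, so the hexagons do not change between extract refreshes.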
angle formula
Now I can add the points that need to be plotted to define the hexagonal polygons:
plot lat formula
The longitude is similar:
plot lon formula
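The angle, plot lat and plot lon fields can be sketched as follows. The sketch assumes the vertex angle is 60 degrees times (PointID - 1) plus a 30-degree offset for pointy-top hexagons. It also assumes the hexagon radius equals the hexbin size in degrees. The offset in the original formula may differ, and the function names are hypothetical.

```python
import math


def vertex_angle(point_id: int) -> float:
    """Hypothetical angle field, in radians, for PointID 1..6.

    Assumes pointy-top hexagons, whose vertices sit at 30, 90, 150, ...
    degrees; a flat-top grid would use 0, 60, 120, ... instead.
    """
    return math.radians(60 * (point_id - 1) + 30)


def plot_point(hexbin_x: float, hexbin_y: float, hexbin_size: float,
               point_id: int) -> tuple[float, float]:
    """Hypothetical plot lon / plot lat: offset the centre by one radius."""
    angle = vertex_angle(point_id)
    return (hexbin_x + hexbin_size * math.cos(angle),
            hexbin_y + hexbin_size * math.sin(angle))


# The six vertices of one hexbin in PointID order, the order used for Path.
hexagon = [plot_point(-1.5, 52.5, 0.1, pid) for pid in range(1, 7)]
for lon, lat in hexagon:
    print(f"{lon:.4f}, {lat:.4f}")
```

The offsets are added in degrees, so the hexagon is regular in lat-lon space. It therefore appears stretched north-south once Tableau draws it in Web Mercator.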
At this point the dataset should contain enough rows within each hexbin to guarantee that each hexbin contains rows with every value from 1-6 so all the points of the hexagon are defined.
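How many rows per hexbin is "enough"? Suppose PointIDs are independent and uniformly random, as a good hash gives. Then the chance that a bin with n rows contains all six values follows from inclusion-exclusion. This is a back-of-the-envelope check I've added, not part of the original workbook:

```python
from math import comb


def p_all_six(n_rows: int, k: int = 6) -> float:
    """Probability that n_rows independent, uniform PointIDs in 1..k
    include every value at least once (inclusion-exclusion)."""
    return sum((-1) ** j * comb(k, j) * (1 - j / k) ** n_rows
               for j in range(k + 1))


for n in (6, 12, 30, 60):
    print(f"{n:3d} rows: P(complete hexagon) = {p_all_six(n):.4f}")
```

With six rows the chance is only about 1.5%, but by 30 rows it is above 97%. Sparsely populated bins are therefore the ones that risk producing incomplete hexagons.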
To plot the hexagons, make sure the Geographic Role for the plot lon and plot lat fields is set appropriately (Longitude and Latitude) and double-click each in turn. Then drag hexbinx and hexbiny to Detail and change the mark type from Automatic to Polygon. This will give a horrible mess, which is fixed by dragging PointID to Path. The result should look like this (I've also added a count of the number of rows to colour the hexagons, and adjusted colours and transparency):
simple hexbin map
This shows that the basic technique works. But it also shows a key problem with it: the hexagonal polygons are distorted. If you can live with this (it is less of a problem close to the equator or over small areas), then stick with it.
But if you care about the distortions (the visual as well as the fact that the hexagons don't cover equal areas) then you have to do something more complex. What I describe below is based on work by Sarah Battersby of Tableau.
The complex method for producing hexagons of even appearance
The adjustments below involve some extra steps over and above the simple solution:
  • Generate a new scaling factor based on distance (as opposed to LatLon degrees)
  • Convert the lat-lon values into X-Y coordinates in the Web Mercator system based on distance not lat-lon degrees
  • Generate the hexbins using distance not angle in the new coordinate system
  • Convert the distance-based hexbins back to the lat-lon coordinates to define the hexbin centres
  • Add the hexagon vertices using the distance coordinates and then convert back to lat-lon 
You also need to add a parameter for the radius of the Earth, which is a conversion factor in the coordinate transformations (Web Mercator uses the WGS 84 equatorial radius, 6,378,137 m).
Here is how that works. First the conversion to X-Y coordinates:
mercator x formula
mercator y formula
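The mercator x and mercator y formulae are shown as images. The standard spherical (Web) Mercator forward conversion looks like this in Python. I'm assuming the workbook uses the same textbook formulae. The radius is the one Web Mercator uses, and the function name is illustrative.

```python
import math

# Hypothetical radius parameter: Web Mercator uses the WGS 84 equatorial
# radius (metres) on a sphere.
EARTH_RADIUS_M = 6378137.0


def to_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Spherical (Web) Mercator: longitude/latitude in degrees -> x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


print(to_mercator(-1.5, 52.5))  # a point in central England
```

x is simply proportional to longitude, while y grows faster and faster towards the poles. That growth is the stretching that the hexbins now have to live in.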
Now we can use hexbin grouping on the new coordinates:
hexbin mercator formula
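Hexbinning in Mercator space works exactly like the lat-lon version, except that the hexbin size is now a distance. Here is a self-contained sketch on a hypothetical pointy-top grid. The orientation, the 6,378,137 m radius and all the names are assumptions.

```python
import math

EARTH_RADIUS_M = 6378137.0  # hypothetical radius parameter (Web Mercator value)
SQRT3 = math.sqrt(3)


def to_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Spherical Mercator: degrees -> metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def hexbin_unit(x: float, y: float) -> tuple[float, float]:
    """Nearest centre on a pointy-top hexagon grid with side length 1."""
    ax, ay = round(x / SQRT3) * SQRT3, round(y / 3) * 3.0
    bx = (round(x / SQRT3 - 0.5) + 0.5) * SQRT3
    by = round((y - 1.5) / 3) * 3.0 + 1.5
    if (x - ax) ** 2 + (y - ay) ** 2 <= (x - bx) ** 2 + (y - by) ** 2:
        return ax, ay
    return bx, by


def hexbin_mercator(lon_deg: float, lat_deg: float, size_m: float) -> tuple[float, float]:
    """Hypothetical hexbin centre in Mercator metres, hexagon radius size_m."""
    x, y = to_mercator(lon_deg, lat_deg)
    cx, cy = hexbin_unit(x / size_m, y / size_m)
    return cx * size_m, cy * size_m


print(hexbin_mercator(-1.5, 52.5, 10_000.0))
```

Strictly, size_m is measured in Mercator metres. On the ground, one Mercator metre is only about cos(latitude) metres, which is the residual variation mentioned in the PPS below.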
The formulae for converting these coordinates back to lat-lon are below (useful if you want to just plot the centres):
adjusted hexbin lat
adjusted hexbin lon
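Note that it is the latitude formula that is the complex one: longitude is just a linear rescaling of x, whereas latitude has to undo the logarithmic stretching of the Mercator y axis.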
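For the reverse direction, here is a sketch of the standard inverse spherical Mercator conversion. I'm assuming it matches the adjusted hexbin lat/lon formulae. EARTH_RADIUS_M is a hypothetical name and must hold the same value used in the forward conversion.

```python
import math

# Hypothetical radius parameter; must match the forward conversion.
EARTH_RADIUS_M = 6378137.0


def from_mercator(x: float, y: float) -> tuple[float, float]:
    """Inverse spherical (Web) Mercator: x/y in metres -> lon/lat in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)  # linear: just undo the scaling
    # Latitude needs the inverse Gudermannian to undo the log-tan stretch.
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


print(from_mercator(-166979.0, 6891000.0))  # an arbitrary point over England
```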
But to make the hexagonal polygons work properly when plotted, you have to add the extra points for the 6 vertices before transforming them back to lat-lon coordinates. The conversion is the same as the formulae above and the final formulae look like this:
plot lat adjusted
plot lon adjusted
The extra part just adds a fixed distance (the hexagon radius set by the scaling factor) to the hexbin centre's Mercator coordinates, in the direction given by Angle; there are six of these offsets, one for each value of Angle.
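Putting the pieces together, here is a hedged sketch of plot lat adjusted and plot lon adjusted. The vertex offsets are added in Mercator metres around the hexbin centre, and only then converted back to lat-lon. The 30-degree pointy-top offset, the 10 km size and the function names are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6378137.0  # hypothetical radius parameter (Web Mercator value)


def to_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Spherical Mercator: degrees -> metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def from_mercator(x: float, y: float) -> tuple[float, float]:
    """Inverse spherical Mercator: metres -> degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


def adjusted_plot_point(hex_x: float, hex_y: float, size_m: float,
                        point_id: int) -> tuple[float, float]:
    """Hypothetical plot lon/lat adjusted: offset the Mercator centre, then invert."""
    angle = math.radians(60 * (point_id - 1) + 30)  # assumed pointy-top vertices
    return from_mercator(hex_x + size_m * math.cos(angle),
                         hex_y + size_m * math.sin(angle))


# Stand-in hexbin centre in central England, hexagon radius 10 km (Mercator metres).
cx, cy = to_mercator(-1.5, 52.5)
hexagon = [adjusted_plot_point(cx, cy, 10_000.0, pid) for pid in range(1, 7)]
for lon, lat in hexagon:
    print(f"{lon:.4f}, {lat:.4f}")
```

In lat-lon degrees these vertices come out squashed north-south. That is exactly the shape Tableau's Mercator rendering stretches back into a regular hexagon.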
When plotted in the same way as the previous polygon plot it should look like this (when tidied up):
adjusted polygon map
Now both the visual appearance and the actual size of the hexagons are better. 
For a Tableau Public workbook containing the above calculations see here.
PS Tableau could and should build most of this functionality into the product as it would dramatically simplify a useful technique.

PPS Although the visual appearance of the hexagons is now much better, the actual land area still depends on latitude, so the true area of each hexagon varies. According to Sarah Battersby's calculations, areas at the N/S extremes of the UK will be about 10% different from those in the centre. If you want the details of that calculation, check her Tableau page referenced below.

Acknowledgements
Many of the ideas are based on the work of Sarah Battersby of Tableau Research. Some of her relevant talks are also available on the Tableau conference site here.
And the original ideas were inspired by (but use slightly different techniques from) some posts by Alan Eldridge (the Last Data Bender), some of which are described on his blog here.