More Than A Storm In A Teacup: Making Over A Hurricane Viz
Much as I love Makeover Monday, I’m really getting into the alternative opportunities for critiquing and re-presenting visualisations. Last week, Cole Knaflic posted this challenge on her Twitter feed, which became the trigger for this blog post:
— Cole Knaflic (@storywithdata) September 13, 2017
In a fuller blog post she explains that the chart gave her “pause for thought”, but she didn’t list her concerns. Picking out the things one doesn’t like about a visualisation isn’t something to be done lightly, but some in the community have a habit of doing it excellently (Sarah Bartlett’s blog is a good example, where she identifies what she thinks could be improved).
Since Cole’s challenge is to make this chart over, I felt I should set out my criticisms and then try to work them into my iteration. Since the subject matter addresses hurricanes I’m already out of my depth in one regard, but I can use that to my advantage when challenging the original. For instance, I know that hurricanes are measured against a scale, and the original viz uses that scale to distinguish between hurricane types; if I’m to use it then I should at least make an attempt at understanding it. From my standpoint, making vizzes over isn’t so much about making them look prettier or smarter as about making them sufficiently more comprehensible that a reader has a greater chance of taking something away from them, or will take more away than they otherwise would.
So, what can I learn about the Saffir–Simpson scale (for this is how hurricanes are classified)? The scale is a simple one: it kicks in when tropical storms reach 74 mph and has five categories (1–5). These categories are not equally sized, however. Categories 1–4 span 22, 15, 19 and 27 miles per hour respectively, and category 5 is open-ended, so they shouldn’t be treated as if they were equal. The categories relate more closely to the associated levels of destruction.
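To make those unequal breakpoints concrete, here’s a minimal sketch of the scale as a classifier, using the National Hurricane Center’s published sustained-wind boundaries (in mph); note how the category widths differ:

```python
def saffir_simpson_category(wind_mph: float):
    """Classify a sustained wind speed (mph) on the Saffir-Simpson scale.

    Returns None below hurricane strength (under 74 mph). The boundaries
    are the NHC's published thresholds; the categories are visibly
    unequal in width.
    """
    if wind_mph < 74:
        return None      # tropical storm or weaker
    if wind_mph <= 95:
        return 1         # 74-95 mph: spans 22 values
    if wind_mph <= 110:
        return 2         # 96-110 mph: spans 15 values
    if wind_mph <= 129:
        return 3         # 111-129 mph: spans 19 values; "major" starts here
    if wind_mph <= 156:
        return 4         # 130-156 mph: spans 27 values
    return 5             # 157 mph and up: open-ended


def is_major(wind_mph: float) -> bool:
    """'Major' hurricanes are Category 3 and above, i.e. 111 mph or more."""
    return wind_mph >= 111
```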
The original cites “major hurricanes” being those categorised 3-5, which does indeed appear to be terminology adopted by agencies tracking storms in the Atlantic Ocean, but probably not fully understood by the majority of readers. Distinguishing by a more accessible characteristic at this breakpoint would seem more relevant.
One initial thought I had related to the accuracy of the data that had been collected. It seems likely that there have been technological advances in the 150 years covered by this data that cast some doubt over the accuracy of the initial decades (one way or the other!).
My other prime concern relates to the volume of data – even though the hurricanes are bundled into decade-long periods, the majority of these decades saw fewer than 20 events, which lets a lot of noise into the figure for any one of them. Drawing out trends across such a small sample involves uncertainty, and that uncertainty is not conveyed in the original viz by the Economist. It would be much more reliable to make inferences based upon the fullest extent of the data, wherever possible.
Finding The Right Angle
Combining these last two points, let me demonstrate further. I’ve mocked up a replica of the Economist’s chart, and the trend lines generated do indeed appear to show that the number of major hurricanes has risen whilst the overall number has fallen.
For simplicity I’ve used lines rather than bars, but the trend lines are the same as in the original. However, if we look only at the last 100 years (when we can assume the measurement methods were at least as accurate and consistent, and likely more so), the picture is somewhat different. There’s still a fall in the overall number of hurricanes recorded across the last century, but the number of major hurricanes now appears to be falling rather than rising:
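The mechanics of that comparison are easy to reproduce. The snippet below uses made-up per-decade counts (not the real hurricane data) purely to show how an ordinary least-squares trend fitted over the full range can disagree in sign with one fitted over a recent window:

```python
import numpy as np

# Hypothetical per-decade counts of major hurricanes - illustrative
# numbers only, NOT the Economist's dataset.
decades = np.arange(1850, 2010, 10)   # decade start years, 1850..2000
majors = np.array([2, 3, 1, 2, 4, 5, 6, 7, 8, 6, 7, 5, 6, 5, 7, 6])


def trend_slope(x, y):
    """Slope of an ordinary least-squares straight-line fit."""
    return np.polyfit(x, y, 1)[0]


full_slope = trend_slope(decades, majors)       # fitted over all 160 years
recent = decades >= 1910                        # last 100 years only
recent_slope = trend_slope(decades[recent], majors[recent])
# With these numbers the full-range slope is positive while the
# recent-window slope is negative: the "trend" depends on the window.
```

That two honest fits over the same series can point in opposite directions is exactly why I distrust a single straight trend line here.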
What this says to me is that a linear model doesn’t fit, and we need to find a better way of presenting this data. So, here’s my list of reservations regarding the original…
- a scale composed of unequally sized segments
- small volumes in each decade
- reference to Category 3 without further elaboration
- inclusion of the 2010s without complete data
- a diverging colour scheme implies the extremes of a scale, rather than incremental steps
- wrapped labels, appearing on alternate bars, inhibit comprehension
…and what I’d propose to do to address them:
- do away with the categorisation breakdown and distinguish the major hurricanes by the minimum wind speed for Category 3
- find a way to aggregate the key part of the analysis over the full extent of the data
- use terminology that is more meaningful to the typical (target) reader
- ignore the entirety of the 2010s – this should only be considered once we’re into 2020
- keep to a single colour for the hurricane totals, and if showing categories only use sequential colouring
- either label each decade, or only as frequently as easy comprehension allows
My Makeover Of The Hurricane Chart
Taking the above into consideration, I set about changing the nature of the key metric. I wanted to ask “what proportion of hurricanes are major?”, and in my view that question is best answered with as many data points as possible. So, instead of a trend line I chose to plot a running average, so that each decade takes into account all of the decades before it and we tend towards the closest possible approximation of the underlying proportion.
Taking this view, I created single bars showing the proportion for each decade independently, and then a running average line on top. This shows that in eight of the last ten decades the proportion was between 33% and 40%, and that the average over the full 160-year range has been 35%, a figure that has stabilised in the last fifty years. For my money this analysis says more about the data recorded in the 19th century than it does about any possible long-term trends over time.
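For anyone wanting to replicate the construction, here’s a sketch of one reading of the running average – pooled over all hurricanes recorded up to each decade, rather than a mean of the per-decade ratios. The counts are hypothetical placeholders, not the real dataset:

```python
import numpy as np

# Hypothetical (total hurricanes, major hurricanes) per decade -
# placeholder numbers only, NOT the real data.
totals = np.array([10, 14, 12, 18, 20, 16, 15, 17])
majors = np.array([ 2,  4,  5,  6,  7,  6,  5,  6])

# Per-decade proportion of major hurricanes (the bars)...
per_decade = majors / totals

# ...and the running average (the line): each decade's value is computed
# from ALL events recorded up to and including that decade, so later
# points are far more stable than the early, noisy ones.
running = np.cumsum(majors) / np.cumsum(totals)
```

Pooling the cumulative counts (rather than averaging the ratios) weights each decade by how many hurricanes it actually contained, which is what makes the line settle down as the sample grows.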
If there’s one thing that really eludes me when it comes to creating visualisations for these types of challenge, it’s coming up with appropriate titles that explain what is happening in the chart whilst also giving the reader all the tools they need to understand it. Here I’ve provided an annotation in red to accompany the average line, but I’m not particularly happy with it. Any suggestions as to how this (or any other feature) might be improved would be gratefully received!
As an aside, were I to be taking this more seriously, I’d probably want a more qualified statistician to analyse the data shown to determine whether it fits a distribution pattern that need not have trends picked out over (relatively) short periods, but since that’s not the aim of the exercise I will park my killjoy tendencies right here.
Featured image taken from Wikimedia Commons, originally by NASA