More Than A Storm In A Teacup: Making Over A Hurricane Viz
Much as I love Makeover Monday, I’m really getting into the alternative opportunities for critiquing and re-presenting visualisations. Last week, Cole Knaflic posted this challenge on her Twitter feed, which became the trigger for this blog post:
#DATAVIZ CHALLENGE: how would you show this data? What would your headline be? https://t.co/o1W17CnG4I pic.twitter.com/f4vlNGBrm4
— Cole Knaflic (@storywithdata) September 13, 2017
In a fuller blog post she explains that she saw the chart and was given “pause for thought”, but didn’t list her concerns. Picking out the things one doesn’t like about a visualisation is something that shouldn’t be done lightly, but which some in the community have a habit of doing excellently (Sarah Bartlett’s blog is a good example, where she identifies what she thinks could be improved).
Since Cole’s challenge is to make this chart over, I felt I should set out my criticisms and then try to work these into my iteration. Since the subject matter addresses hurricanes I’m already out of my depth in one regard, but can use that to my advantage when challenging the original. For instance, I know that hurricanes are measured against a scale and the original viz uses that scale to distinguish between hurricane types. But if I’m to use this then I should at least make an attempt at understanding it. From my point of viz, making vizzes over isn’t so much about making them look prettier or smarter, but about making them sufficiently more comprehensible that someone reading them has a greater chance of taking something away, or will take more away than they otherwise would.
So, what can I learn about the Saffir–Simpson scale (for this is how hurricanes are classified)? The scale is a simple one: it kicks in when tropical storms reach 74mph and has five categories (1-5). These categories are not equally sized, however. Categories 1-4 cover ranges of 22, 15, 19 and 27 miles per hour, and Category 5 has no upper limit, so they shouldn’t be treated as if they were equal. The categories relate more closely to the associated levels of destruction.
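To make the unequal category sizes concrete, here is a minimal Python sketch. It assumes the commonly published mph bounds for each category (74-95, 96-110, 111-129, 130-156, and 157 upwards); the function names are my own and purely illustrative.

```python
# Saffir-Simpson category bounds in mph (inclusive).
# Assumption: these are the commonly published bounds; Category 5 has no upper limit.
CATEGORY_BOUNDS_MPH = {
    1: (74, 95),
    2: (96, 110),
    3: (111, 129),
    4: (130, 156),
    5: (157, None),
}


def category_width(category):
    """Number of whole-mph values a category covers, or None if unbounded."""
    low, high = CATEGORY_BOUNDS_MPH[category]
    if high is None:
        return None
    return high - low + 1


def classify(wind_mph):
    """Return the highest category whose lower bound is reached (0 = not a hurricane)."""
    category = 0
    for cat, (low, _high) in CATEGORY_BOUNDS_MPH.items():
        if wind_mph >= low:
            category = cat
    return category


def is_major(wind_mph):
    """'Major' hurricanes are those in Category 3 or above."""
    return classify(wind_mph) >= 3


widths = [category_width(c) for c in range(1, 5)]
print(widths)
```

Classifying by lower bound only means that a fractional wind speed (say 95.5mph) still lands in a category rather than falling between two ranges.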
The original cites “major hurricanes” being those categorised 3-5, which does indeed appear to be terminology adopted by agencies tracking storms in the Atlantic Ocean, but probably not fully understood by the majority of readers. Distinguishing by a more accessible characteristic at this breakpoint would seem more relevant.
One initial thought I had related to the accuracy of the data that had been collected. It seems likely that there have been technological advances in the 150 years covered by this data that cast some doubt over the accuracy of the initial decades (one way or the other!).
My other prime concern relates to the volume of data – even though the hurricanes are bundled into decade-long periods, the majority of these decades saw fewer than 20 events. That increases the amount of noise allowed into the data for any one of them. Drawing out trends across such a small sample involves uncertainty, and that uncertainty is not conveyed in the original viz by the Economist. It would be much more reliable to make inferences based upon the fullest extent of the data, wherever possible.
Finding The Right Angle
Combining these last two points, let me demonstrate further. I’ve mocked up a replica of the Economist’s chart, and the trend lines generated do indeed appear to show that the number of major hurricanes has risen whilst the overall number has fallen.
For simplicity I’ve used lines rather than bars, but the trend lines are the same as the original. However, if we only look at the last 100 years (when we assume the measurement method is at least as accurate and consistent, but likely more so overall), the picture is somewhat different. There’s still a fall in the overall number of hurricanes recorded across the last century, but the number of major hurricanes now appears to be falling rather than rising:
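For anyone wanting to reproduce this kind of comparison, the following is a rough Python sketch of fitting a straight-line trend over two different windows. It uses ordinary least squares, which I am assuming is what a charting tool's default linear trend line does; the decade values and counts are placeholders chosen only to show the effect, not the hurricane figures from the chart.

```python
def linear_trend_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more points, with xs and ys the same length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_since(decades, counts, start_decade):
    """Fit the trend using only decades at or after start_decade."""
    pairs = [(d, c) for d, c in zip(decades, counts) if d >= start_decade]
    xs = [d for d, _ in pairs]
    ys = [c for _, c in pairs]
    return linear_trend_slope(xs, ys)


# Placeholder values only: NOT the real hurricane counts.
decades = [1900, 1910, 1920, 1930, 1940, 1950]
counts = [2, 3, 6, 8, 7, 5]

full_range_slope = slope_since(decades, counts, 1900)
recent_slope = slope_since(decades, counts, 1930)
print(full_range_slope, recent_slope)
```

With these placeholder numbers the full-range slope is positive while the slope over the later window is negative, which is the same kind of sign flip described above: the direction of a linear trend can depend heavily on where the window starts.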
What this says to me is that a linear model doesn’t fit, and we need to find a better way of presenting this data. So, here’s my list of reservations regarding the original…
- a scale composed of unequally sized segments
- small volumes in each decade
- reference to Category 3 without further elaboration
- inclusion of the 2010s without complete data
- diverging colour scheme implies the extremes of a scale, rather than it being incremental
- wrapped labels, appearing on alternate bars, inhibit comprehension
…and what I’d propose to do to address them:
- do away with the categorisation breakdown and distinguish the major hurricanes by the minimum wind speed for Category 3
- find a way to aggregate the key part of the analysis over the full extent of the data
- use terminology that is more meaningful to the typical (target) reader
- ignore the entirety of the 2010s – this should only be considered once we’re into 2020
- keep to a single colour for the hurricane totals, and if showing categories only use sequential colouring
- either label each decade, or only as frequently as easy comprehension allows
My Makeover Of The Hurricane Chart
Taking the above into consideration, I set about changing the nature of the key metric. I wanted to ask “what proportion of hurricanes are major?”, but my view of that is that it is best considered with as many data points as possible. So, instead of a trend line I chose to place a running average in order that each decade takes into account all of the decades before it, and we then tend towards the closest possible approximation of the natural proportion.
Taking this view, I created single bars showing the proportion for each decade independently, and then a running average line on top. This shows that in eight of the last ten decades the proportion was between 33% and 40%, and that the average over the full 160-year range has been 35%, a figure that has stabilised in the last fifty years. For my money this analysis says more about the data recorded in the 19th century than it does about any possible long-term trends over time.
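The two series described here (per-decade proportion and a running average that takes in every decade so far) can be sketched in Python as follows. This assumes the running figure is pooled, i.e. cumulative major hurricanes divided by cumulative hurricanes, rather than a mean of the decade percentages; that is one reasonable reading, not necessarily the exact calculation behind the chart. The counts are placeholders, not the real data.

```python
def decade_proportions(major_counts, total_counts):
    """Share of hurricanes that were major, for each decade on its own."""
    return [m / t if t else None for m, t in zip(major_counts, total_counts)]


def running_proportion(major_counts, total_counts):
    """Share of hurricanes that were major, pooled over all decades so far."""
    result = []
    cumulative_major = 0
    cumulative_total = 0
    for m, t in zip(major_counts, total_counts):
        cumulative_major += m
        cumulative_total += t
        # Guard against a leading decade with no recorded hurricanes.
        result.append(cumulative_major / cumulative_total if cumulative_total else None)
    return result


# Placeholder values only: NOT the real hurricane counts.
major = [1, 3, 2]
total = [4, 6, 10]

per_decade = decade_proportions(major, total)   # bars
running = running_proportion(major, total)      # line on top
print(per_decade, running)
```

Pooling the counts weights each decade by how many hurricanes it recorded, so a quiet decade with an unusual split moves the line less than a busy one would.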
If there’s one thing that really eludes me when it comes to creating visualisations for these types of challenge, it’s coming up with appropriate titles that find a way of explaining what is happening in the chart whilst also giving the reader all the tools they need to understand it. Here I’ve provided an annotation in red to accompany the average line, but I’m not particularly happy with it. Any suggestions as to how this (or any other feature) might be improved would be gratefully received!
As an aside, were I to be taking this more seriously, I’d probably want a more qualified statistician to analyse the data shown to determine whether it fits a distribution pattern that need not have trends picked out over (relatively) short periods, but since that’s not the aim of the exercise I will park my killjoy tendencies right here.
Featured image taken from Wikimedia Commons, originally by NASA
Great work on your analysis, and on the work that you and Pablo have put into this blog. I also took the opportunity to work on Cole’s Hurricane Data Viz challenge and wanted to provide you with some feedback.
Overall I agree with your view of working to make a viz more comprehensible for the viewer. The best vizzes don’t require any external explanation and drive a point (or multiple points) to the audience. You’ve clearly put a lot of thought into how to approach data analysis and do a nice job of walking us through that in your post.
A couple of points on your general analysis:
1) Not all scales are created equal
On Wikipedia it mentions that the Saffir-Simpson scale is logarithmic, which explains why the segments are not equal.
“The scale is roughly logarithmic in wind speed, and the top wind speed for Category “c” (c=1 to 4, as there is no upper limit for category 5) can be expressed as 83×10^(c/15) miles per hour rounded to the nearest multiple of 5”
It’s also meant to be a simpler scale so the general population can understand it. While they may not know the exact wind speeds for each number, they know that 5 is the largest and 1 is the smallest. It’s also easy enough for them to understand that any hurricane at a 3 or larger is considered “Major”.
2) Add a note for why you removed the 2010s
I agree with your point about ignoring the 2010s (I actually kept 2010 and removed 2011 and later since the data starts at 1851) but you should explain why you’re doing this in your blog post. Something along the lines of: “Dates of 2010 and later have been removed from my analysis because it is an incomplete time segment. Since it is only a 6-year period, and not a full 10, it shouldn’t be included in the trend line at this time.”
Now about your viz:
I appreciate that you’re trying to make sure the viewer understands the details but you’re overdoing it a bit.
1) The title needs a little work. Again, while most viewers won’t know the exact wind speeds of these hurricane categories, they will understand that any hurricane with a category of 3 or higher is considered Major. Your title can be something simpler like “Since 1851, one-third of all hurricanes are classified as Major”
2) I like where you went with the moving average to remove the noise due to hurricane fluctuation between years, but your use of percentages may need an explanation for the average viewer. You should put in an axis title for your y-axis to explain that it is showing the percentage of overall storms that are classified as 3 or above (aka Major Hurricanes).
So clearly your moving average trend line is in agreement with the trend line in the original Economist article in that major hurricanes have been on the rise. Do you think the moving average is clearer to the average viewer versus a linear trend line?
In my version of this Data Viz challenge, I cleaned up the original to make it clearer, while also adding a bit on hurricane pressure and wind speed (since that data was also included in the original set). I absolutely welcome any and all feedback: https://public.tableau.com/profile/paul.wachtler#!/vizhome/EyeoftheBeholder/EyeoftheBeholder
Hi Paul, and thanks for the lengthy response!
I acknowledge that I’m not really all that comfortable with putting titles to my work. In my day job this is rarely a requirement, and so it isn’t an art I’ve practised! I have a separate blog post in the works about this.
As for the Categories/Speeds, I deliberately referenced the speeds since I wasn’t comfortable that the typical reader would know quite how fast/devastating each of the categories would be. So instead I opted to reference the wind speeds which are at least on a scale that I figured more people could make more sense of.
I wonder whether emphasising “hurricanes making landfall” in the title might have done enough to clarify what the y-axis is.
I tend to disagree on the trend – the moving average has levelled out, which to me suggests that there is not a clear trend either up or down. I don’t think that a trend is clear and deliberately haven’t suggested one, so whilst the average viewer might not find it to be clear I would still argue that it is more accurate than a trendline which is misleading.