
Visualising Simpson’s Paradox

One of the many great things about the Tableau user base and its community is its breadth, and how it draws in people with perspectives from right across the spectrum. Some love the visuals; others are drawn in by the opportunities it provides for data discovery. I came to Tableau as a statistician at heart – I care that numbers describe what's happening, that it's done accurately, and that the focus is on conveying the truth as effectively as possible.

Any half-baked statistician will be only too happy to chew your ear off about the various pitfalls in handling and presenting data responsibly. Rather than put my own view across here, I'd direct everyone to Tim Harford's brilliant recent column in the FT, in which he condensed his best guidance into a single index card's worth of bullet points.

This post is the first in a series of as yet undefined length, looking at some statistical nuances through the prism of data visualisation, with the Tableau user in mind. It emerged from a real-life situation at work recently – my director was scanning some figures and couldn't make sense of something. To set the scene, we track a number of metrics weekly across an industry, with many individual companies participating. One week, every company's score on one of these metrics dropped week-on-week, yet the data implied that the industry-wide figure had increased. How can that possibly be?

Step forward, Simpson's Paradox, a quirk of statistics that makes this possible. I tried to explain it verbally but didn't really succeed. After all, when A, B and C are each individually dropping, how can the combination of A, B and C increase? What's needed here is to explore the context, and what better way to do that than to take an example and visualise it.
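To see how the combined figure can move against every one of its parts, it helps to write the industry figure out explicitly. This is a sketch under an assumption the post doesn't state outright: that the metric is a rate (a ratio such as a percentage), so the industry figure is each company's rate weighted by its size. If the metric were a simple sum, every company falling would force the total to fall too. The symbols are my own notation: n_{i,t} is company i's size (the denominator of its rate) in week t, r_{i,t} is its rate, and w_{i,t} is its share of the industry.

```latex
\begin{aligned}
R_t &= \frac{\sum_i n_{i,t}\, r_{i,t}}{\sum_j n_{j,t}} = \sum_i w_{i,t}\, r_{i,t},
\qquad w_{i,t} = \frac{n_{i,t}}{\sum_j n_{j,t}} \\[6pt]
R_2 - R_1 &= \underbrace{\sum_i w_{i,2}\,\bigl(r_{i,2} - r_{i,1}\bigr)}_{\text{change within companies}}
 \;+\; \underbrace{\sum_i \bigl(w_{i,2} - w_{i,1}\bigr)\, r_{i,1}}_{\text{change in mix}}
\end{aligned}
```

If every r_{i,2} is below r_{i,1}, the first term is negative. The industry figure can still rise only if the second term is positive and larger, which requires weight to move towards the companies that had the higher rates in week 1. If the weights don't change, the second term is zero and the industry figure must fall.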

The key point to get across is that this is only possible if the relative sizes of A, B and C shift significantly between the two measurement points. To make sense of this, I put together an example in Tableau showing how this looks as a visualisation, supported by a data table, poster-style.
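The same idea with numbers. The figures below are entirely made up for illustration: three hypothetical companies A, B and C, and an assumed "successes out of trials" definition of the score. They are chosen so that every company's score falls week-on-week while the industry score more than doubles, purely because most of the volume moves from low-scoring C to high-scoring A. Python's `fractions.Fraction` keeps the arithmetic exact, so the comparisons aren't muddied by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical weekly figures for three companies: (successes, trials).
# The numbers are invented purely to illustrate the paradox.
week1 = {"A": (180, 200), "B": (60, 100), "C": (70, 700)}
week2 = {"A": (595, 700), "B": (55, 100), "C": (16, 200)}


def rate(successes: int, trials: int) -> Fraction:
    """A single company's score, kept as an exact fraction."""
    return Fraction(successes, trials)


def industry_rate(week: dict) -> Fraction:
    """Pooled industry score: total successes over total trials."""
    total_successes = sum(s for s, _ in week.values())
    total_trials = sum(t for _, t in week.values())
    return Fraction(total_successes, total_trials)


def mix(week: dict) -> dict:
    """Each company's share of the industry's trials (its weight)."""
    total_trials = sum(t for _, t in week.values())
    return {c: Fraction(t, total_trials) for c, (_, t) in week.items()}


if __name__ == "__main__":
    for company in week1:
        before, after = rate(*week1[company]), rate(*week2[company])
        share_before, share_after = mix(week1)[company], mix(week2)[company]
        print(f"{company}: score {float(before):.3f} -> {float(after):.3f}, "
              f"share {float(share_before):.0%} -> {float(share_after):.0%}")
    print(f"Industry: {float(industry_rate(week1)):.3f} -> "
          f"{float(industry_rate(week2)):.3f}")
```

Running it prints:

A: score 0.900 -> 0.850, share 20% -> 70%
B: score 0.600 -> 0.550, share 10% -> 10%
C: score 0.100 -> 0.080, share 70% -> 20%
Industry: 0.310 -> 0.666

Had the mix stayed at week 1's shares, the week 2 scores would give 0.2 × 0.850 + 0.1 × 0.550 + 0.7 × 0.080 = 0.281, a fall from 0.310. The entire rise comes from the shift in relative sizes.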

The Tableau Public version of this viz can be found here.

I was delighted when, shortly after I'd originally posted this to Tableau Public, Ken Flerlage got in touch:

Only days later, Ken was announced as one of the new Zen Masters. And if this is something I can bring to a Zen Master's attention, then why not give it a wider airing? If you found this useful, please let me know, as I will be only too happy to come up with future articles of this nature.

About The Author

Mark Edwards

A statistician at heart, Mark’s approach is always numbers-led. Already visualising data in other side-projects, Mark was introduced to the world of Tableau in 2016, when he and Pablo started working together in financial services. A keen participant in social Tableau challenges, Mark is building his skills and appreciation of powerful visuals, discovering interesting and untapped data sets, a path that has already led to a new career and a range of further opportunities.

1 Comment

  1. Pablo Gomez

    Great post Mark … I do want many more like this one !


Trackbacks/Pingbacks

  1. #74: Joint work between analysts and UXers • Link roundup on Oriol Farré - […] A very good way to visualise Simpson's Paradox: something important to understand, especially if you run a lot of A/B tests.…
