
Plotting movements, connections, routes with weights (interactive)

In the domain of data visualization, few charts are as captivating and informative as Sankey diagrams. These diagrams show the flow of a measurable quantity from one point to another, with the width of each link proportional to the amount flowing along it. Understanding and using Sankey diagrams can elevate our storytelling and provide profound insights into data.

In this blog post, we’ll dive into the fascinating world of Sankey diagrams, exploring their history and significance and showing how to create your very own. You’ll learn how to transform raw data into a visually stunning narrative, making complex relationships easy to understand at a glance. Let’s get started.

I present two scenarios to showcase the effectiveness of Sankey diagrams: NFL Quarterback Analysis and Flight Routes.

Example 1 – NFL Quarterback’s Ball Distribution:

By analyzing the ball distribution of an NFL quarterback and his backup, we can gain insights into their play styles and decision-making. This diagram illustrates how often each player passes or hands off the ball to specific receivers and running backs, with the thickness of the lines representing the frequency of these actions. This visualization provides a clear and detailed view of player dynamics and team strategies. In this example, we’ll use a mix of real and fictitious data to illustrate the diagrams. Let’s say the starting quarterback (QB) is Geno Smith, and we have data on how he distributes the ball: whom he passed or handed off to, how many times, and whether it was a pass or a hand-off (HO). Now consider a backup QB, Sam Howell, for whom we have the same kind of ball-distribution data. The dataset and its arrangement may look like this:


How do we compare the QBs’ patterns and their targets (receivers, running backs, tight ends, etc.) and see all that data in a single chart?

At first it sounds impossible to do this in a single chart without a lot of noise. A single QB’s distribution is a one-to-many relationship, but with two QBs (Smith and Howell), whose targets partly overlap, it becomes a many-to-many relationship! Sankey diagrams to the rescue.
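If your raw data is a play-by-play log rather than ready-made counts, the plays first need to be aggregated into (From, To, Weight) rows, which is the shape the chart code later in this post passes to addRows(). Below is a minimal sketch of that aggregation step; the record fields (qb, target, type) and the function name toSankeyRows are hypothetical, chosen for illustration.

```javascript
// Minimal sketch: aggregate hypothetical play-by-play records into
// [From, To, Weight] rows for a Sankey chart.
// Assumed record shape (hypothetical): { qb, target, type }, where type is "Pass" or "HO".
function toSankeyRows(plays) {
  // Nested map: QB -> (target label -> number of plays)
  const counts = new Map();
  for (const { qb, target, type } of plays) {
    const to = `${target} [${type}]`; // e.g. "DK [Pass]", mirroring the labels used in this post
    if (!counts.has(qb)) counts.set(qb, new Map());
    const perTarget = counts.get(qb);
    perTarget.set(to, (perTarget.get(to) ?? 0) + 1);
  }
  // Flatten into rows; Map iteration follows insertion order.
  const rows = [];
  for (const [from, perTarget] of counts) {
    for (const [to, n] of perTarget) rows.push([from, to, n]);
  }
  return rows;
}

// Usage with a few made-up plays:
const samplePlays = [
  { qb: "Smith", target: "DK", type: "Pass" },
  { qb: "Smith", target: "DK", type: "Pass" },
  { qb: "Smith", target: "Walker III", type: "HO" },
  { qb: "Howell", target: "DK", type: "Pass" },
];
console.log(toSankeyRows(samplePlays));
// → rows: ["Smith", "DK [Pass]", 2], ["Smith", "Walker III [HO]", 1], ["Howell", "DK [Pass]", 1]
```

Because "DK [Pass]" appears under both QBs with the exact same name, both links end at the same target node, which is what makes the chart a many-to-many view.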

The diagram is also interactive. Go ahead and hover over each route (path), and you’ll quickly get the big picture. For example, QB Smith passed to DK Metcalf 56 times, whereas Howell passed to him 8 times. Smith handed off most often to Walker III (127 times), compared with just 20 times to the other running backs, and so on. If you’re viewing it on a smartphone, rotate your display to landscape and tap any connecting line to see the tooltip.
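Because link widths reflect raw counts, the starter, who simply has more plays, dominates the picture. To compare the two QBs’ tendencies rather than their volumes, one option (an extra step beyond what the chart itself shows) is to divide each link by the QB’s total. Here is a minimal sketch using the same counts as the drawChart() example later in this post; the helper names outflowTotals and shareOf are hypothetical.

```javascript
// Same counts as this post's drawChart() example: [From, To, Weight].
const qbRows = [
  ["Smith", "DK [Pass]", 56],
  ["Smith", "Lockett [Pass]", 45],
  ["Smith", "Jaxon [Pass]", 35],
  ["Smith", "Fant [Pass]", 28],
  ["Smith", "Walker III [HO]", 177],
  ["Smith", "Shenault [Pass]", 20],
  ["Smith", "Other Receivers [Pass]", 16],
  ["Smith", "Other RBs [HO]", 20],
  ["Howell", "DK [Pass]", 8],
  ["Howell", "Lockett [Pass]", 6],
  ["Howell", "Fant [Pass]", 3],
  ["Howell", "Walker III [HO]", 18],
  ["Howell", "Jaxon [Pass]", 5],
];

// Sum of outgoing link weights per source node.
function outflowTotals(rows) {
  const totals = new Map();
  for (const [from, , weight] of rows) {
    totals.set(from, (totals.get(from) ?? 0) + weight);
  }
  return totals;
}

// Fraction of a source's total that goes to one target (0 if the link is absent).
function shareOf(rows, from, to) {
  const total = outflowTotals(rows).get(from) ?? 0;
  const link = rows.find(([f, t]) => f === from && t === to);
  return link && total > 0 ? link[2] / total : 0;
}

console.log(outflowTotals(qbRows)); // → Smith 397, Howell 40
console.log((100 * shareOf(qbRows, "Smith", "DK [Pass]")).toFixed(1));  // → 14.1
console.log((100 * shareOf(qbRows, "Howell", "DK [Pass]")).toFixed(1)); // → 20.0
```

In this made-up data, Smith throws to DK far more often in absolute terms, yet DK takes a larger share of Howell’s distribution, a difference the raw widths alone can hide.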

Sankey diagram: QB target distribution example

Next, let’s take it a step further. Think about multiple sources and multiple destinations, like departure and arrival data for flights to and from cities, with the added dimension of how often each connection occurs. An example would be flights from one city to another, then from that city to a third, and so on. Is it even possible to show all of this information in a single chart without cluttering the visual or confusing the viewer? Yes, it is. So the second example uses flight routes (the data are essentially made up to illustrate the diagram; you can use whatever data you please).

Example 2 – Flight Routes:

In this example, we visualize the number of daily flights between major cities, highlighting the flow of air traffic around selected cities of the world. The Sankey diagram shows connections from one city to another, and then to yet another, with the number of flights represented by the thickness of the connecting lines. This approach helps to reveal patterns in global travel and the busiest flight corridors, offering valuable insights for airlines, travelers, and policymakers.
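In a multilevel chart, every hop in a route becomes its own link: a New York → Madrid → London route contributes one New York → Madrid link and one Madrid → London link. If your raw data lists whole multi-leg routes, a minimal sketch of splitting them into weighted links might look like this; the input shape ({ cities, flights }), the function name itinerariesToRows, and the numbers are all hypothetical.

```javascript
// Minimal sketch: split hypothetical multi-leg routes into weighted [From, To, Weight] links.
// Each route is { cities: [...], flights: n }, where n is the number of daily flights following it.
function itinerariesToRows(routes) {
  const weights = new Map(); // key: JSON-encoded [from, to] pair -> summed flights
  for (const { cities, flights } of routes) {
    for (let i = 0; i + 1 < cities.length; i++) {
      const key = JSON.stringify([cities[i], cities[i + 1]]);
      weights.set(key, (weights.get(key) ?? 0) + flights);
    }
  }
  return Array.from(weights, ([key, total]) => [...JSON.parse(key), total]);
}

// Made-up routes for illustration only.
const routes = [
  { cities: ["New York", "Madrid", "London"], flights: 5 },
  { cities: ["Mexico City", "Madrid", "London"], flights: 3 },
  { cities: ["New York", "Madrid", "Paris"], flights: 2 },
];
console.log(itinerariesToRows(routes));
// → ["New York", "Madrid", 7], ["Madrid", "London", 8], ["Mexico City", "Madrid", 3], ["Madrid", "Paris", 2]
```

Summing repeated hops into one row per city pair keeps each connection as a single link whose width is the total traffic on that leg.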

The underlying dataset may be arranged as follows:


In the flights diagram below, you can see the number of flights connecting New York to Madrid to London, Paris, Lisbon, and then Istanbul, Sydney, Dubai; and from there to Beijing, New Delhi, Tokyo etc. just by hovering over the connecting lines. Again, the widths give you an idea of relative frequencies between any two points instantly by simply looking at the image. If you’re viewing it on a smartphone, orient your display to landscape, and tap on any connecting line to see the tooltip.
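One caveat for route data: Google’s Sankey documentation notes that cycles (a node linking to itself, or A → B → C → A) prevent the chart from rendering. Real flight networks loop back easily (say, London back to New York), so it can be worth checking the rows before drawing. A minimal depth-first-search sketch follows; findCycle is a hypothetical helper, not part of Google Charts.

```javascript
// Minimal sketch: find one cycle in [From, To, Weight] rows via depth-first search.
// Returns the cycle as a list of node names (first == last), or null if the rows are acyclic.
function findCycle(rows) {
  const next = new Map(); // adjacency list: node -> outgoing neighbors
  for (const [from, to] of rows) {
    if (!next.has(from)) next.set(from, []);
    next.get(from).push(to);
  }
  const state = new Map(); // missing = unvisited, "active" = on current path, "done" = finished
  const path = [];

  function visit(node) {
    state.set(node, "active");
    path.push(node);
    for (const nb of next.get(node) ?? []) {
      if (state.get(nb) === "active") {
        // Back edge: the cycle is the part of the path from nb onward, closed with nb.
        return [...path.slice(path.indexOf(nb)), nb];
      }
      if (!state.has(nb)) {
        const cycle = visit(nb);
        if (cycle) return cycle;
      }
    }
    path.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of next.keys()) {
    if (!state.has(node)) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}

console.log(findCycle([["New York", "Madrid", 5], ["Madrid", "London", 4]])); // → null
console.log(findCycle([
  ["New York", "Madrid", 5],
  ["Madrid", "London", 4],
  ["London", "New York", 2],
])); // → ["New York", "Madrid", "London", "New York"]
```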

Sankey diagram: Flights example


How do we set up the data?

Since we’re posting the charts on a web page with interactivity, we have to use JavaScript (or some variant of it), and the dataset must be set up in the format that the language and charting library of your choice expect. I’m using JavaScript, and for rendering I’m using the Google Charts engine. I wish there were an easier way to embed Excel charts in a web page without coding, but currently that is not possible without making the page look clunky (e.g., an embedded Excel sheet: Yuck! Are we still in the 90s?). Power BI offers some capabilities but requires a whole different setup and API keys (you’ll need a work or educational account even to create a report, and sharing your creations online is another layer of pain!). In the end, as much as I admire Microsoft, the solution in this case, at this time, had to be the Google Charts API (free) with JavaScript (free).

Here’s the gist: first, we load Google’s chart library via loader.js, whose source URL is https://www.gstatic.com/charts/loader.js (included on the page with a <script> tag).
Then we write some JavaScript on our web page that loads their package called ‘sankey’. The chart itself is created with the google.visualization.Sankey class constructor. Because the library loads asynchronously, we also register a callback function that runs once loading has finished, via:
google.charts.setOnLoadCallback(drawChart), where drawChart is our custom function name [you can call it whatever you like].

google.charts.load('current', {'packages':['sankey']});
google.charts.setOnLoadCallback(drawChart);

Next, we set up the custom function specified for onload; in this example it’s called drawChart(). Inside it, we create a new data table object using Google’s DataTable class, which lives in the google.visualization namespace (so the constructor is google.visualization.DataTable()).

function drawChart() {
var data = new google.visualization.DataTable();
This is where we specify our data: e.g., how many columns, their types and names, and the rows with category names and their values. This is done by calling the DataTable methods addColumn() and addRows().

        data.addColumn('string', 'From');
        data.addColumn('string', 'To');
        data.addColumn('number', 'Distribution');
        data.addRows([
          [ 'Smith', 'DK [Pass]', 56 ],
          [ 'Smith', 'Lockett [Pass]', 45 ],
          [ 'Smith', 'Jaxon [Pass]', 35 ],
          [ 'Smith', 'Fant [Pass]', 28 ],
          [ 'Smith', 'Walker III [HO]', 177 ],
          [ 'Smith', 'Shenault [Pass]', 20 ],
          [ 'Smith', 'Other Receivers [Pass]', 16 ],
          [ 'Smith', 'Other RBs [HO]', 20 ],
          [ 'Howell', 'DK [Pass]', 8 ],
          [ 'Howell', 'Lockett [Pass]', 6 ],
          [ 'Howell', 'Fant [Pass]', 3 ],
          [ 'Howell', 'Walker III [HO]', 18 ],
          [ 'Howell', 'Jaxon [Pass]', 5 ]
        ]);
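Since each row must match the three columns declared above (string, string, number), and since nodes are matched by their exact name, a small check before calling addRows() can catch typos such as two spellings of the same player, which would otherwise show up as two separate nodes. A minimal sketch; validateRows is a hypothetical helper, and treating non-positive weights as errors is our own assumption.

```javascript
// Minimal sketch: sanity-check [From, To, Weight] rows before passing them to addRows().
// Returns a list of human-readable problems (empty if none were found).
function validateRows(rows) {
  const problems = [];
  const spellings = new Map(); // normalized name -> set of exact spellings seen

  rows.forEach((row, i) => {
    if (!Array.isArray(row) || row.length !== 3) {
      problems.push(`row ${i}: expected [From, To, Weight]`);
      return;
    }
    const [from, to, weight] = row;
    if (typeof from !== "string" || typeof to !== "string") {
      problems.push(`row ${i}: From and To must be strings`);
      return;
    }
    // Assumption: weights should be positive, finite numbers.
    if (typeof weight !== "number" || !Number.isFinite(weight) || weight <= 0) {
      problems.push(`row ${i}: Weight must be a positive number`);
    }
    for (const name of [from, to]) {
      const key = name.replace(/\s+/g, "").toLowerCase();
      if (!spellings.has(key)) spellings.set(key, new Set());
      spellings.get(key).add(name);
    }
  });

  // Names that differ only in spacing or case would become separate nodes.
  for (const names of spellings.values()) {
    if (names.size > 1) problems.push(`possible duplicate node: ${[...names].join(" / ")}`);
  }
  return problems;
}

console.log(validateRows([
  ["Smith", "Walker III[HO]", 177],
  ["Howell", "Walker III [HO]", 18],
]));
// → one problem: possible duplicate node: Walker III[HO] / Walker III [HO]
```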

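Next, we set some chart options, such as its dimensions and look (font, colors, etc.). The options refer to a colors array, so that array has to be defined first; the palette below is just an example, so substitute your own.

        // Example palette (hypothetical values); replace with your own colors.
        var colors = ['#a6cee3', '#b2df8a', '#fb9a99', '#fdbf6f',
                      '#cab2d6', '#ffff99', '#1f78b4', '#33a02c'];

        var options = {
          width: 700,
          height: 600,
          sankey: {
            node: {
              colors: colors,
              label: {
                fontName: 'Arial',
                fontSize: 16,
                color: '#871b47',
                bold: false,
                italic: false
              }
            },
            link: {
              colors: colors
            }
          }
        };
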


Then we use the google.visualization.Sankey class to create a chart object. Once we have the object, we call its draw() method, which completes drawChart():
        var chart = new google.visualization.Sankey(document.getElementById('sankey_basic'));
        chart.draw(data, options);
      } // end of drawChart()

All of this goes within <script>…</script> tags, which should be inside the <head>…</head> tags.
Finally, in … of our HTML page, we display the chart within a div tag (where you can also specify its CSS styling) inside the <body>…</body> tags. The div’s id must match the one passed to document.getElementById() in drawChart(), here 'sankey_basic'.

  <body>
    <div id="sankey_basic" style="width: 900px; height: 800px;"></div>
  </body>

Head over to https://developers.google.com/chart/ to learn more about Google Charts including many other types of charts and examples.

Summary:

Both examples leverage the strengths of Sankey diagrams in making complex data visually intuitive and easily digestible. As you’ll quickly realize, this diagram has a myriad of uses, such as mail routes, distribution and processing centers and their capacities, roads, highways, and underground transportation connections, political clout and donations, aid distribution, economic supply chains, and on and on.

The first (QB) example has two source nodes, Smith and Howell, each connecting to multiple target nodes (the receivers and running backs on the right side of the plot). The second (flight) example is a multilevel Sankey with multiple intermediate connecting points. It starts with two nodes at level 1, New York and Mexico City, then Madrid at level 2, then London, Paris, and Lisbon at level 3, and so on until the last level, containing Beijing, New Delhi, and Tokyo.
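The levels described above can be derived from the rows themselves: a city with no incoming links sits at level 1, and every other city sits one level past its deepest predecessor (i.e., the longest chain of links leading into it). This is a sketch of that bookkeeping, not a description of how Google Charts lays out its columns; nodeLevels is a hypothetical helper, the flight counts are made up, and it assumes the rows contain no cycles.

```javascript
// Minimal sketch: assign each node a level = 1 + the longest chain of links leading into it.
// Assumes acyclic [From, To, Weight] rows; throws if a cycle is encountered.
function nodeLevels(rows) {
  const parents = new Map(); // node -> nodes that link into it
  const nodes = new Set();
  for (const [from, to] of rows) {
    nodes.add(from);
    nodes.add(to);
    if (!parents.has(to)) parents.set(to, []);
    parents.get(to).push(from);
  }

  const level = new Map();
  const onPath = new Set(); // nodes currently being resolved (for cycle detection)

  function levelOf(node) {
    if (level.has(node)) return level.get(node);
    if (onPath.has(node)) throw new Error(`cycle involving ${node}`);
    onPath.add(node);
    const ps = parents.get(node) ?? [];
    const value = ps.length === 0 ? 1 : 1 + Math.max(...ps.map(p => levelOf(p)));
    onPath.delete(node);
    level.set(node, value);
    return value;
  }

  for (const node of nodes) levelOf(node);
  return level;
}

// Made-up flight counts for illustration only.
const flightRows = [
  ["New York", "Madrid", 7],
  ["Mexico City", "Madrid", 3],
  ["Madrid", "London", 6],
  ["Madrid", "Paris", 4],
  ["London", "Istanbul", 2],
];
console.log(nodeLevels(flightRows));
// → New York 1, Mexico City 1, Madrid 2, London 3, Paris 3, Istanbul 4
```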

You will also notice that the first example is colored differently from the second. That’s because the first example uses no color gradients, whereas in the second we specify the colors and gradients (as well as node spacing, i.e., the distance between the nodes).
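The exact settings used for the flights chart aren’t listed here, so the following is only a sketch of what gradient links and wider node spacing could look like with Google Charts’ Sankey options, using link.colorMode: 'gradient' and node.nodePadding from Google’s Sankey documentation. The function name flightChartOptions, the palette, and the numbers are hypothetical; the returned object would be passed to chart.draw(data, options) in place of the options shown earlier.

```javascript
// Minimal sketch (hypothetical values): build Sankey options with gradient-colored links
// and extra vertical spacing between nodes.
function flightChartOptions(colors, { width = 900, height = 600, nodePadding = 30 } = {}) {
  return {
    width,
    height,
    sankey: {
      node: {
        colors,       // node colors, applied in order
        nodePadding,  // vertical distance between nodes
        label: { fontName: "Arial", fontSize: 14 },
      },
      link: {
        colorMode: "gradient", // color each link as a blend of its source and target node colors
        colors,
      },
    },
  };
}

const flightOptions = flightChartOptions(["#a6cee3", "#b2df8a", "#fb9a99", "#fdbf6f"]);
console.log(flightOptions.sankey.link.colorMode); // → gradient
```

Wrapping the options in a function keeps the shared palette in one place, so node and link colors stay in sync when you change it.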

Factoid:

The Sankey diagram is named after Captain Matthew Henry Phineas Riall Sankey (1853-1925), an Irish engineer and captain in the Royal Engineers. In 1898, Sankey introduced the first energy flow diagram, which later became known as the Sankey diagram, in an article about the energy efficiency of a steam engine. That diagram has since become a valuable tool in engineering and data visualization.

The original diagram of steam engine efficiency by Captain Sankey:

NOTE: The datasets are fictitious and are merely for exemplification. Please use your own data as appropriate for your purpose.

I hope this was interesting and helpful. Happy charting and analyzing!
