Here’s a scenario: We have data on gas tank sizes (in gallons) for different types of vehicles in Japan and the USA. Our objective is to create an informative plot that compares the gas tank sizes of the two countries by vehicle type.
One great solution is the box-and-whisker plot. This type of visualization effectively compares the distribution of gas tank sizes across different vehicle types between Japan and the USA, allowing us to see differences in central tendencies (median and mean) as well as variability (range and outliers).
Let’s look at a sample dataset and its arrangement below.
Note that we don’t have to sort the vehicle types in any particular way. As long as the rows for each country are consecutive (grouped together), that’s good enough. Because we want to show the tank sizes for each country separately, each country gets its own box-and-whiskers on the same chart, as two independent series. For example, we first select the vehicle types and gas tank size values for Japan and plot them. Then we add a second series, selecting the same columns but only the rows for the USA. Finally, we customize the chart with colors and fills, a legend, and axis titles to make it easy to read. The plot should look like this:
This plot, created in Excel, conveys some important information in a concise way. The horizontal line across each box represents the median gas tank size for that vehicle type and country.
The “X” inside the box represents the Mean gas tank size.
Circles plotted beyond the whiskers represent outliers in the data. These are values that fall outside the typical range of the data distribution and are shown separately to highlight their deviation from the rest of the data.
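To make the median line, the mean “X”, and the outlier circles concrete, here is a minimal Python sketch that computes the same quantities for each country and vehicle type. It is only an illustration under a few assumptions: the rows are made-up values (not the dataset used for the chart), the outlier cutoff is the common 1.5 × IQR rule, and the quartiles come from Python’s `statistics.quantiles`, whose convention may differ slightly from the quartile option chosen in Excel.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

# Hypothetical rows: (country, vehicle type, tank size in gallons).
# These numbers are invented for illustration only.
rows = [
    ("Japan", "Sedan", 12), ("Japan", "Sedan", 13), ("Japan", "Sedan", 13),
    ("Japan", "Sedan", 14), ("Japan", "Sedan", 15), ("Japan", "Sedan", 16),
    ("Japan", "Sedan", 25),
    ("USA", "Sedan", 14), ("USA", "Sedan", 15), ("USA", "Sedan", 16),
    ("USA", "Sedan", 17), ("USA", "Sedan", 18), ("USA", "Sedan", 19),
    ("USA", "Sedan", 20),
]

# Group tank sizes by (country, vehicle type); row order does not matter.
groups = defaultdict(list)
for country, vehicle_type, gallons in rows:
    groups[(country, vehicle_type)].append(gallons)


def box_stats(values):
    """Return the numbers a box-and-whisker plot draws for one group."""
    q1, _, q3 = quantiles(values, n=4)  # first and third quartiles
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # assumed outlier rule: 1.5 x IQR
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median(values),        # horizontal line in the box
        "mean": mean(values),            # the "X" marker
        "q1": q1,                        # bottom edge of the box
        "q3": q3,                        # top edge of the box
        "whisker_low": min(inside),      # end of the lower whisker
        "whisker_high": max(inside),     # end of the upper whisker
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


stats = {key: box_stats(values) for key, values in groups.items()}

for (country, vehicle_type), s in sorted(stats.items()):
    print(country, vehicle_type, s)
```

In this made-up sample, the single 25-gallon Japanese sedan is the only value that falls outside the fences, so it would be drawn as a separate circle while the whisker stops at the largest remaining value.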
Pretty neat. In the future, I’ll present a more extensive use-case for analyzing more complex data. Until then, happy charting. I hope this was informative and helpful.