In this multi-series of blog, I’ll touch on different ways to categorize data in buckets, or bins, and summarize in meaningful ways. Some will use Pivot Tables, some will not. But we’ll cover 3 common scenarios.
Let’s start with the first…
The Scenario
We have a 120K+ rows of data on wine bottle prices and their quality scores (by professional wine tasters). Prices range from $4/bottle to $3300/bottle. (For analytics on this data and visuals from different perspectives, see my earlier blog: search on “Wine”)
Objective: Categorize the data into a range by price buckets of $4 to $1000 bottles in $20 increments across buckets, and find average quality score for each bucket or category. (Using this method, we can categorize in whatever buckets we please, this is just one example that’s extendable to any requirement.)
Here’s how the data set looks:
What we want is something like this:
Where we not only can view the entire data set summary with wine prices in specific buckets of prices and their scores but also interactively filter that further as shown below:
Solution & the Method
Here are the steps in Excel:
1. Select all the columns and rows (Points and Price with headers) and insert a Pivot Table.
2. In Pivot Canvas, we add Price to Row and Points to Values.
3. We don’t care about the sum of the points, so change “Sum of Points” to Average in Values box (using Value Field Settings).
4. Click on any cell in Price column of the Pivot table and from Data menu, choose Group -> Group…
and in the Grouping dialog that opens, set starting, ending, and increment size values as follows for our example:
Finally, rename the Pivot table’s columns as desired and remove the Grand Total (as it’s irrelevant here) from bottom of the table. And adjust the score column’s decimal precision as desired.
Here’s a little clip of it in action:
I will continue this theme to tackle different types of data in the next blog: Understanding buckets, bins, categorization (2/3)