Wednesday, September 11, 2024

# Percentile…what it means, and how to use it

In statistics, the term percentile is a measure used to understand the distribution of data in a dataset. It’s a concept that allows us to rank and compare values, providing a deeper understanding of where a particular data point stands in relation to the rest of the data.

The percentile of a value is the percentage of values in a dataset that are less than or equal to that value. For instance, if you score in the 80th percentile on a test, it means you scored better than 80% of the people who took the test. This concept is used in various fields such as education, finance, health, and more.

In this post, I will demonstrate its calculation, interpretation, and applications in real-world scenarios.

## Real-world examples

Question 1: I have the following distribution:

85 90 78 92 88 76 95 89 84 91 87 93 79 86 94 82 80 96 83 50

What is the at the 90th percentile in this set?

To find the 90th percentile of the data set, we first need to sort the data in ascending order.
Then, we find the rank of the 90th percentile using the formula R = P/100 * (N + 1)
where P is the desired percentile (in this case, 90), and N is the number of data points. R will give us the rank or position of the value at the sought after percentile.

The sorted data in ascending order looks like this:

Sorted in ASCENDING order:
50
76
78
79
80
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96

Since R returns the position or rank in that list, it helps to have a row number next to it. We can do it in Excel quite easily without having to manually count each row which can be laborious and error-prone. The formula would be: =ROW(actual starting row cell)-ROW(\$actual starting row’s column\$actual starting row#)+1

Enter the actual starting row cell number in the formula above, e.g. A2 depending on your sheet, and anchor the starting cell reference with ‘\$’ symbols in the 2nd argument, but do not anchor the first argument’s cell reference because once you enter that formula on first row of data, you want to drag it down to the last row of data to automatically have all row numbers calculated for you in Excel.

The final data range may look like this:

Ignore the yellow highlighted cells for the moment (I’ll explain it soon).

Next, we want to apply the formula R = P/100 * (N + 1) to this. So, P is 90 (90th percentile), N=20. If the dataset is large, use COUNT() function in Excel on the range to get N, otherwise you can just manually count the datapoints. Then we get R = 18.9. This is the rank where the 90th percentile is, not the value but the position of the value. Therefore, it’s the row number. But since R is not a whole number, we’ll interpolate between the values at the 18th and 19th positions in the sorted data set to find the 90th percentile.
The 18th value is 94 and the 19th value is 95 (first data point 50 is R=1, next is R=2, and so on; as shown in ‘Row number’ column). These cells are highlighted in yellow above to point them out.

So, the 90th percentile of our data set is somewhere close to 94 (since the whole part of R value is 18, but definitely less than the value in row 19, which holds 95).

But, we can do better and get the exact value by using PERCENTILE.INC() function in Excel. The syntax is: =PERCENTILE.INC(range, percentile in percentage).

So, the first argument, the range would be the cell references or table range of the values or scores above, and the second argument will be supplied as 0.90 for 90th percentile.

This gives us the 90th percentile of the scores (data): 94.1 — which means that about 90% of the values in our data set are less than or equal to 94.1

Question 2: What percentile does the score/datapoint 92 fall in? Or put another way, if your score is 92 in the above distribution, what percentile are you in? Use the same distribution as above.

Using the PERCENTILE.INC() function, we can get the answer directly. The answer would look like this (your data range cell references will vary):

This means that 78.9% of the scores in the sample dataset are less than or equal to 92. Or, you scored better than 79% of the class.

Whether you’re a student, a data analyst, or just someone interested in understanding data, I hope this post shed some light on the importance and utility of percentiles in data analysis.

``````
▛Interested in creating programmable, cool electronic gadgets? Give my newest book on Arduino a try: Hello Arduino!
▟``````