Sunday, September 24, 2023

# Predicting missing/unknown information

Analysts often need to fill in the blanks in order to make longer-term decisions, or just to model different scenarios by making credulous predictions. In this post, I share just one of such scenarios and demonstrate how to make predictions using both statistical formulas manually and Excel’s feature. Lastly, we’ll see the difference between the two results.

Let’s start with the scenario that we have data on specific price points and how many units are sold at those price points. The data available is below: Obviously, we don’t have data at all contiguous price points. What we want to find out is how many units would be sold of this item when it’s priced at \$140. What about at \$150? \$160? And any other price point based on known information? The missing information is tabulated below with “?” in the cells: We know that we can apply linear forecasting in Excel and get an equation and perhaps we can use that equation to extract the missing values. We can draw a scatter plot, ask Excel to show the formula it used, and also R² as shown below: Ok, we see the formula now: y = -4.2037x + 1390.7

It’s now simple algebra to derive y based on x and vice versa. So, at a given price (x), find its expected sales quantity (y).

Next, we’ll use statistical formula ourselves manually. We’ll use the concept of linear interpolation to fill in the missing sales.

I like to think of the formula syntax as this: prev known qty – (target price – prev known price) * (prev known qty – next known qty) / (next known price – prev known price)

Or: prev known qty – (relative change in price) * (relative change in qty) / (relative change in price)

Apply the logic for every missing Sales (qty) values by changing prev and next values down the Price and Sales cols, and we get the following results (in light blue cells, green text): So, at \$140/item we expect to sell 780 of them. And so on.

Let’s compare the results: Manual Statistical Calculation v. Excel Calculation

The Excel prediction is the first blue column (from left). The light green is the statistical formula results. The difference between them is shown in the last blue column: It looks like Excel is not that far off with its Linear Forecasting although it seems to be slightly more optimistic about quantities across the price points. Overall, I think either method should get close enough for prediction purposes but it’s important to understand both. If we want to make more pessimistic (cautious) predictions, I’d use the statistical method in this case.