Optimization Problems with Diminishing Returns

Optimization problems are a cornerstone of decision-making processes across various fields, from economics to engineering. However, they often encounter the phenomenon of diminishing returns, where the incremental benefit of a decision decreases as the level of investment or input increases. This concept can complicate optimization efforts, as it requires finding the right balance between resource allocation (higher costs) and gains (larger benefits). For instance, investing heavily in a single solution might yield substantial initial benefits, but the effectiveness wanes with further investment. Addressing these challenges requires strategies and mathematical models that account for the non-linear nature of returns.

In this post, I present a few scenarios and work through them without heavyweight mathematical models. You can use Excel or Python to fit a formula to the data, and then find the answer by solving that equation, even by hand. Let’s look at some examples.

Scenario 1: Optimal watering for a plant

Suppose we find that when we give a plant a certain amount of water (x units), its health (y) improves, but only up to a point. Beyond that point, more water causes the health to decline. Conversely, too little water also hurts the plant’s health. We need to express this mathematically and find the amount of water that gives the best plant health.

The data we collected over time is arranged as below. The actual dataset would be much larger for more accurate calculations, of course. We can readily see that the relationship between x and y is not linear.

If we can find the relationship between x and y, we can find the answer we’re looking for: the value of x, the amount of water, that gives optimal health (y). We can do this in Python as follows:

    import numpy as np
    import matplotlib.pyplot as plt
    from numpy.polynomial.polynomial import Polynomial

    # Water amounts (x) and the observed plant health (y)
    x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
    y = np.array([2.3, 4.5, 6.0, 7.5, 7.5, 6.0, 4.0, 1.5])

    # Fit a degree-2 polynomial; convert() expresses it in the original x scale
    p = Polynomial.fit(x, y, 2)
    coefs = p.convert().coef

    # coef is ordered from the constant term upward: c + b*x + a*x^2
    c, b, a = coefs
    print(f"y = {a:.2f}x^2 + {b:.2f}x + {c:.2f}")

    # Plot the data points and the fitted curve
    plt.scatter(x, y, color='red', label='Data points')
    plt.plot(*p.linspace(), color='blue', label='Fitted curve')
    plt.xlabel('Water (x)')
    plt.ylabel('Plant Health (y)')
    plt.legend()
    plt.show()

Not surprisingly, the plot generated looks like a parabola.

We can do the same in Excel using a Scatter plot as shown below. Then add a Polynomial trendline and display its equation on the chart.

This regression equation is quadratic.

The general quadratic equation is y = ax^2 + bx + c. When y increases up to a certain point and then decreases as x keeps increasing, the coefficient a is negative and the parabola opens downward (which is the case here).

From this, we can use the vertex formula to solve for the x at which y peaks: x = -b / (2a)

Here the coefficients are approximately a = -1.8, b = 7.9, and c = -1.4. So, we get x = -7.9 / (2 * -1.8) = 2.19 (approx.).

So, the optimal amount of water for the plant’s best health is about 2.19 units.
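The same vertex calculation can be done in code instead of by hand. Below is a minimal sketch, assuming the eight data points from the Python snippet earlier in this scenario; the names x_opt and y_opt are my own, for illustration. Because it works with the unrounded coefficients, its result can differ from the hand calculation in the second decimal place.

```python
import numpy as np
from numpy.polynomial.polynomial import Polynomial

# Same sample data as in the fitting snippet for this scenario
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([2.3, 4.5, 6.0, 7.5, 7.5, 6.0, 4.0, 1.5])

# convert().coef is ordered from the constant term upward: c, b, a
c, b, a = Polynomial.fit(x, y, 2).convert().coef

# The vertex is a maximum only if the parabola opens downward (a < 0)
if a >= 0:
    raise ValueError("fitted curve has no maximum; a quadratic peak does not apply")

x_opt = -b / (2 * a)                        # vertex formula
y_opt = a * x_opt**2 + b * x_opt + c        # predicted health at that point

print(f"Optimal water: {x_opt:.2f} units (predicted health {y_opt:.2f})")
```

The sign check matters: the vertex formula returns a number either way, but it only marks a peak when a is negative.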

Scenario 2: Study hours and success in exams

Suppose we’re analyzing the relationship between the number of hours a student studies (x) and their exam score (y). Initially, as the student increases their study hours, the exam scores improve. However, after a certain point, additional hours of study may lead to fatigue resulting in a decrease in performance and therefore lower scores. We want to find the optimal number of hours to study to ensure a top score.

We recorded the number of hours studied for a subject and the corresponding score in that subject’s exam. The data is below:

While it’s easy to spot the optimal value in this small dataset, remember that in the real world there may be thousands of samples, if not more. The methods and formulas presented here will work on such large datasets without a problem.

As before, if we plot this data, and create a Polynomial trendline with the equation, we get everything we need.

With the regression equation y = -0.9697x^2 + 12.406x + 47.4 and using the vertex formula again, we find that x = 6.39 (approx.), where the rounded coefficients are a = -0.97, b = 12.4, and c = 47.4.

So, the optimal number of study hours that maximizes the exam score is 6.39 hours.
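When only the trendline equation is available (from Excel, for example), the vertex can be computed straight from its coefficients. The sketch below assumes the coefficients exactly as printed in the equation above; the helper name parabola_vertex is hypothetical. With the unrounded coefficients the result comes out nearer 6.40 than 6.39, a difference that comes purely from rounding.

```python
def parabola_vertex(a, b, c):
    """Return (x, y) at the vertex of y = a*x^2 + b*x + c."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    x = -b / (2 * a)
    y = a * x**2 + b * x + c
    return x, y

# Coefficients copied from the trendline equation for study hours vs. score
a, b, c = -0.9697, 12.406, 47.4

hours, score = parabola_vertex(a, b, c)
print(f"Best study time: {hours:.2f} hours (predicted score {score:.1f})")
```

The predicted score is only what the fitted curve says at that point; it is not a guarantee for any individual student.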

Finally, let’s entertain another scenario with a little more subjectivity and strategy involved.

Scenario 3: Inspection time and defect discovery

Suppose we have data on the hours spent inspecting parts manufactured in a factory, and the number of new defects found in each hour of inspection. We want to find the optimal number of hours to spend on inspections while still finding enough defects to ensure high quality.

A sample of the data collected on inspections and defects is shown below.

In the first hour of inspection, we found 5 defects. After 4 hours of inspection, no new defects were found. But when we continued inspecting, we found one more new defect in the 9th hour. Every hour of inspection costs money, so it is critical to find the optimal number of inspection hours while still catching most defects. To analyze this trade-off in Excel, we can combine data visualization with some simple statistical analysis.

We can add a difference column, Marginal Defects, which shows the increase or decrease in new defects for each inspection hour over the previous one. Its formula is simply: Marginal Defects = current row’s New Defects value - previous row’s New Defects value. This will also help with the visual analysis, which we’ll see soon.
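For readers working in Python rather than Excel, the Marginal Defects column can be sketched as below. Note that the numbers are placeholders and not the actual dataset: only the values stated in the text (5 defects in hour 1, none from hour 4 onward, and 1 in hour 9) are taken from this post, while the counts for hours 2, 3, and 10 are invented purely for illustration.

```python
# New defects found in each inspection hour (hours 1 to 10).
# PLACEHOLDER DATA: hours 2, 3 and 10 are made-up values for illustration;
# only hour 1 (5), hours 4-8 (0) and hour 9 (1) follow the description in the text.
new_defects = [5, 2, 3, 0, 0, 0, 0, 0, 1, 0]

# Marginal defects = current hour's new defects - previous hour's new defects.
# The first hour has no previous row, so it is left as None.
marginal = [None] + [cur - prev for prev, cur in zip(new_defects, new_defects[1:])]

for hour, (found, change) in enumerate(zip(new_defects, marginal), start=1):
    shown = "" if change is None else f"{change:+d}"
    print(f"Hour {hour:2d}: new defects = {found}, marginal = {shown}")
```

A positive value means more defects were found than in the previous hour, a negative value means fewer, and zero means no change.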

We can plot the data in a Scatter plot as previously done and add a trendline with equation. The default Linear trendline may look like this:

But it doesn’t capture the nuance of the data very well: the line slopes evenly downward, whereas New Defects and Marginal Defects clearly spiked around the 9th hour and then dipped at the 10th hour. If we create a Polynomial trendline instead, we get a better fit, as shown below:

Along with that, the relationship equation should also be more accurate. The polynomial equation y = 0.1364x^2 - 1.9182x + 6.4 offers a more flexible model that can better account for the non-linear nature of the data.

To find the answer, we should consider a few things:
Analyze the trendline:
Looking at the trendline (which shows the trend in new defects), we see that it has dropped considerably by the 4th hour and keeps dipping until it curves back up slightly after 8 hours, but the rise in defects is minimal for that many hours spent!
So we can probably inspect for about 4 hours and be confident that most defects were found by then.

Analyze marginal defects:
Look for the inspection hour where the marginal defects drop significantly or become zero.
We see a small spike at the 3rd hour and a drop at the 4th hour, followed by a spike at the 5th hour, and then it flattens out at 0 until the 9th hour.
A spike means more defects were found in that hour than in the previous one, a flat line at 0 means no change, and a drop means fewer defects, or none, were found compared with the previous hour.

Overall, we see that defects drop sharply after the first few hours. We might find that spending more than 4 hours doesn’t yield significantly more defects (or benefits to quality at that cost).

Solving the equation:
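If we solve for x using the vertex formula x = -b / (2a) on the equation y = 0.1364x^2 - 1.9182x + 6.4, where a = 0.1364, b = -1.9182, and c = 6.4:
We get: x = -(-1.9182) / (2 * 0.1364), or x = 7.03 (approx.). Rounding the coefficients to a = 0.13 and b = -1.9 before dividing gives about 7.3, so the result is sensitive to rounding. Note that a is positive here, so the vertex is the lowest point of the trendline: the hour at which the predicted number of new defects bottoms out, rather than a peak as in the first two scenarios.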
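The arithmetic for this step can be checked with a few lines of Python. This is only a sketch: the coefficients are copied from the trendline equation above, and the helper name vertex_x is my own. It also reports whether the vertex is a peak or a trough, which depends on the sign of a, and shows how much the answer moves when the coefficients are rounded first.

```python
def vertex_x(a, b):
    """x-coordinate of the vertex of y = a*x^2 + b*x + c."""
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    return -b / (2 * a)

# Coefficients as printed in the polynomial trendline equation
a, b = 0.1364, -1.9182

x_full = vertex_x(a, b)             # using the coefficients as printed
x_rounded = vertex_x(0.13, -1.9)    # using coefficients rounded to two digits

# a > 0 means the parabola opens upward, so the vertex is a trough
kind = "minimum (trough)" if a > 0 else "maximum (peak)"

print(f"Vertex with printed coefficients: x = {x_full:.2f}")
print(f"Vertex with rounded coefficients: x = {x_rounded:.2f}")
print(f"The vertex is a {kind}")
```

Carrying more decimal places in the coefficients is the safer habit whenever the quadratic coefficient is small, as it is here.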

So, what’s the final answer?
That depends on our tolerance for defects and the cost of inspections, and on finding the balance that makes sense for the organization. It can be anywhere from about 4 hours to a little over 7 hours, depending on that tolerance and the situation.

In summary, by understanding and planning for diminishing returns, we can make more informed decisions that optimize outcomes without overshooting the point of maximum efficiency. I hope this was informative and interesting for you.