In an earlier blog, *How Well Are Your Incentives Working?*, I shared methods for calculating the chances of success from campaign data (e.g., new sign-ups for membership) across various trials using the Binomial distribution.

In this post, I show how to determine whether events occurred by random chance or were affected by another factor. In other words, was a promotion/campaign successful or not?

Let’s suppose we have subscription data for a company’s service over an equal number of periods (assume no seasonal effects, equal variances, and a normal distribution). Magazines, online, memberships, whatever. We have two sets of data: one from before the promotion and another, covering the same length of time, during the promotion. We want to compare them and test the hypothesis that the change in the number of subscriptions occurred by chance (a random effect) rather than because of the campaign. If the change (hopefully an increase) is due to the campaign, we’ll call the campaign a success; otherwise, it was not successful.

How can we tell? Just looking for increased numbers over the previous period may be the easiest way, but it’s not conclusive. We need a more sophisticated way, one that involves statistical methods. Specifically, we’ll need the standard deviations and means of both datasets, with which we’ll run a t-test. This gives us a probability value telling us whether the change was likely due to chance (meaning the promotion was ineffective) or to a real, non-random factor (meaning the promotion was successful).

In order to run the test, we need to understand whether we’re looking at a one-tailed or a two-tailed test (e.g., do we care only about increases, only about decreases, or both?). Imagine a normal distribution curve. We also need to understand the spread of the data: how far or close are the values to the mean in each dataset? Additionally, we need to determine whether the two datasets have equal or unequal variances. To answer that last question, we’ll employ another test: the F-test. Let’s start with the datasets…
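As a quick sketch of the spread-and-center step, here is how the means and sample standard deviations might be computed in Python. The subscription numbers below are made up for illustration; the post’s actual datasets are not reproduced here:

```python
import statistics

# Hypothetical subscription counts per period (not the post's actual data)
before = [120, 135, 128, 140]   # before the promotion
during = [135, 128, 145, 131]   # during the promotion

# Sample mean and sample standard deviation (like Excel's AVERAGE and STDEV.S)
mean_before, sd_before = statistics.mean(before), statistics.stdev(before)
mean_during, sd_during = statistics.mean(during), statistics.stdev(during)

print(mean_before, sd_before)
print(mean_during, sd_during)
```

These are exactly the quantities the t-test consumes: one center and one spread per dataset.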

Based on these datasets, we now have the standard deviations and means for both. An F-test yields a value of 0.85 (`F.TEST()` just takes the arrays of each dataset, and Excel does the heavy lifting for you). Since we set up our hypothesis so that a result greater than 0.05 means we cannot reject equal variances, we treat the datasets as having equal variance.
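For readers outside Excel, the same two-tailed F-test can be sketched in Python with SciPy. SciPy has no single-call equivalent of `F.TEST`, so this sketch computes the variance ratio itself and doubles the smaller tail probability of the F distribution; the numbers are hypothetical stand-ins, not the post’s data:

```python
import numpy as np
from scipy import stats

# Hypothetical subscription counts (not the post's actual data)
before = np.array([120, 135, 128, 140])
during = np.array([135, 128, 145, 131])

# Ratio of the sample variances, and degrees of freedom for each sample
f = np.var(before, ddof=1) / np.var(during, ddof=1)
dfn, dfd = len(before) - 1, len(during) - 1

# Two-tailed p-value, mirroring what Excel's F.TEST reports
p = 2 * min(stats.f.cdf(f, dfn, dfd), stats.f.sf(f, dfn, dfd))

print(p)  # > 0.05 for this data, so we don't reject equal variances
```

If the resulting p-value is above the 0.05 threshold, we proceed with the equal-variance form of the t-test, exactly as the post does.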

To answer the earlier question about tails, we see it’s two-tailed because the values change in both directions in the datasets (i.e., to the left and to the right of the mean, or the left and right sides of the curve).

Next, we’ll use the `T.TEST()` function. It takes the arrays of the two datasets as the 1st and 2nd arguments, plus arguments for tails and type. We determined the test is two-tailed (which means 2 in Excel) and of equal variance (which also means 2 in Excel), so the 3rd and 4th arguments will both be 2. The function then looks like this: `=T.TEST(B3:B6,C3:C6,2,2)`. What this gives us is the probability that a difference this large would occur by chance alone.
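Outside Excel, the equivalent two-sample, equal-variance, two-tailed t-test is available in SciPy as `ttest_ind`. The data below are the same hypothetical numbers as above, so the p-value will differ from the post’s `0.582860716`:

```python
from scipy import stats

# Hypothetical subscription counts (not the post's actual data)
before = [120, 135, 128, 140]
during = [135, 128, 145, 131]

# equal_var=True matches Excel's T.TEST type 2 (two-sample, equal variance);
# ttest_ind is two-tailed by default, matching tails=2
t_stat, p_value = stats.ttest_ind(before, during, equal_var=True)

print(p_value)  # well above 0.05 for this data: no significant difference
```

The interpretation step is the same as in Excel: compare the returned p-value against your chosen threshold.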

In this case, since the p-value of `0.582860716` is far larger than 0.05, we can say the campaign did not make a statistically significant difference. Any increase in subscriptions during the campaign was not necessarily due to the campaign; it may have been random chance or other external factors.

*NOTE: the p-value threshold (0.05 in this case) is not a hard rule; depending on the context and the stakes of the decision, a higher p-value might still suggest a meaningful difference.*

Other related key concepts are normalization/standardization and correlation, which you can read about in a 3-part blog here. I hope this was interesting and/or useful. For those unfamiliar with some of these concepts, take heart: you don’t need to be a statistics major, just research and study key statistical concepts that are of practical use in life and work.

Interested in creating programmable, cool electronic gadgets? Give my newest book on Arduino a try: **Hello Arduino!**