Tuesday, August 13, 2024

Finding a missing number in a sequence

Imagine having a list of thousands of numbers in a sequence that is supposed to be sequential (incrementing by one) and somewhere there’s a number missing. That number could be an ID or some required piece of information. Also, what if the sequence is not incrementing by one but by some other number (e.g. by 3, or by -5.1, etc.)…and there’s some number missing in that sequence of thousands of digits…how do you find it?

Fortunately, both of the above problems can be easily solved, meaning identifying the missing number, by using two formulas: Sum of Natural Numbers or Sum of Arithmetic Sequence.

Here are the formulas:

Sum of Natural Numbers: N*(N+1)/2

Sum of Arithmetic Sequence: (N/2) * (2a + (N-1)d)

where N is the expected number of digits in a full sequence, and a is the first digit in sequence (when sorted ascending), and d is the common increment or difference between consecutive terms.

So when do we use which formula? Use Sum of Natural Numbers when numbers between consecutive numbers increment by 1. Use Sum of Arithmetic Sequence when numbers between consecutive numbers increment by 1 or by another magnitude either in positive or negative direction (decrement).

These formulas are amazingly effective and can be easily entered into an Excel sheet to find missing row for example from hundreds of thousands of rows in a flash! Or, if you prefer you can program it in your favorite language to take sequences and spit out the missing number.

To demonstrate, I’ll use a small set of sequences and will show both formulas at work, implemented in Python below.

Let’s say you have a sequence of these numbers that’s supposed to be incrementing by 1. Number 4 is missing from the 1, 3, 7, 5, 6, 2 sequence and we need to identify it using algorithm (not eye-balling!) and that should scale to a sequence of any huge size.

The following will identify it using the Sum of Natural Numbers formula I presented above.

Now, let’s try another sequence…this time it’s decimal numbers and it increments by some decimal amount as well. The sequence is: 2.25 5.25 6.75 8.25 9.75

For such a small set we immediately see by eyeballing that 3.75 is missing and the sequence progression is 1.5. However, we need an algorithm that can handle ANY sequence of ANY length. My code below does exactly that:

The code takes user-input of an incomplete sequence (i.e. with a missing number), and the code also needs to know the expected increment between consecutive numbers in a complete sequence, and it outputs the missing number!

In this case, we had to do a little bit of data structure manipulation. First we convert a string and convert to a list of strings (like words) and remove the spaces in-between. Then we convert that list of words into numbers, in this case, I’m converting to float so that it’ll work with decimal or integer sequences. We also convert the delta (d) to number, and calculate a (first digit in sequence) by using minimum min() function because the least number in the list is the starting number for instance. Then compute the sum of the full sequence by the formula and then sum of the given sequence. Finally, I deduct the sum of the given sequence from the expected sum of full sequence to find the missing number.

Running the code blocks above, the output is:

`USING SUM OF NATURAL NUMBERS: N*(N+1)/2Missing digit is: 4 in sequence:[1, 3, 7, 5, 6, 2]USING Sum of Arithmetic Sequence: (N/2) * (2a + (N-1)d)Enter a sequence of numbers separated by a space in-between with a number missing: 2.25 5.25 6.75 8.25 9.75What's the expected difference between consecutive numbers in that sequence?: 1.5Missing digit is: 3.75 in sequence:[2.25, 5.25, 6.75, 8.25, 9.75]`

In both cases, the codes identified the exact number missing from the sequences. You can try other sequences and the code will still work perfectly.

Did you notice something in the code examples above? The formulas themselves do NOT output the missing number…rather they output the EXPECTED SUM of a complete sequence, so we take that sum, and deduct the sum of the given sequence and THAT is the missing number.

Let’s wrap this up with an example of solving this in Excel. Below we have a sequence A29:A33. The expected delta is given as 1 in C28. In this short list we can see that -5.5 is missing from the sequence, our job is to make Excel find it by using our learned knowledge above.

The formulas for each cell is shown in blue in the image below so it’s really easy to follow. As you can see the missing number is correctly identified as -5.5

Hope you found it fun and informative.

``````
▛Interested in creating programmable, cool electronic gadgets? Give my newest book on Arduino a try: Hello Arduino!
▟``````