Wednesday, April 17, 2024
STEM

When you’re going “blank”! (Excel, Python)

You have data (of course, you do) and you have data that have something missing (of course, you do). The question is multi-fold around that…do we ignore the data, do we remove them from your analysis, or do we interpolate to fill in the missing data? And if so, what type of interpolation is best? Okay, lots of real-world questions. I have already addressed some of the interpolation and mean() methods in my blog here (you can search for those words from the home page). But in this blog, I will answer specific questions brought to my attention via Quora.

The question originally posted was: “In excel, I use a function (MATCH) to return a specific row. I need to see if a cell in that same row (different column) is empty. How can I indicate that specific cell using the result from my MATCH function?”

It may seem benign enough at first sight but there’s so much more. The solution isn’t about what formula to use at all! It’s really about how do we identify the missing values easily. I won’t go into the details on various methods on how to fill in the missing values in this post (as stated, I’ve covered that in other posts) but I’ll show here different ways to IDENTIFY the missing values. That’s step 1. How and what we do once we found the missing values is another epic. So, let’s get started with a really tiny dataset for brevity. Assume this is either a CSV or XLS/X file.

Visually, in this tiny example, we see some cells are blank. In real-world cases, imagine hundreds or thousands more rows, so you can’t really depend on visual inspection in this fast-paced reality. You need to quickly find out where the cells are blank and highlight or flag them somehow. Here, I’ll present different ways to do just that, so you can take the corrective actions depending on your specific need.

Let’s start with Python. We can quickly find out the shape of the data and call out the missing values even without launching Excel.

Python Method:

Using Pandas library, we can load the file and the specific sheet from the workbook into a dataframe, say, called df. Then df.shape() will give us the number of rows and columns in that dataset. Then if we sum up the total for each column where it’s blank, we can get the name of all columns and their respective numbers where blanks were found. We can do this using isna() method which finds the “blanks” and then use sum() on top of that: df.isna().sum()

So, here’s the output from Python on the above dataset:

We can clearly see that column ContactName and Phone have 1 blank value each, and also that the dataset is 9 rows (without header) and 5 columns! This is really good information! Because now, we can home into the specific columns in Excel and easily call them out.

Okay, now let’s say we don’t know Python! We only want to do this in Excel. Sure, we can achieve this objective in various ways. Remember, our objective is primarily to find the missing column values (i.e. blank cells). Ideally, we also want to search for a company and flag any of its column that’s blank.

Excel Formula Method:

I avoid VBA solutions like a root canal whenever possible (for various reasons that are sprinkled around in my blog), so let’s start with a formula solution. (I’ll delve into even less technical ways below).

The highlighted (yellow) cells are missing. What’s being done to the right of the dataset is using vlookup() to bring in matched rows. For example, if we search for “Mayumi’s” company and need to find if it has any missing information in its record, we can do =VLOOKUP(H24,A24:E33,{1,2,3,4,5},FALSE) where H24 is the lookup value, and A24:E33 is the data range (lookup array).  Notice, I’m not just bringing in one column, but all 5 columns…entire row for it! If there’s some column in that row that’s blank, I get 0 (zero) as highlighted in red above.

If that’s too subtle, we can wrap this into a IF() statement and do whatever custom callout we want…shown in Example 2 in the above image. It clearly shows “WARNING!” or “OK” when there’s no missing value. In this case, the formula for my sheet is: =IF(VLOOKUP(H24,A24:E33,{1,2,3,4,5},FALSE)=0,”WARNING!”, “OK”)

Excel UI/Go To Method:

In this solution, we use F5 (Go To) method to highlight only blank cells: Select the entire data range. Press F5 and click on Special…Choose blanks. Then choose a Fill Color (e.g. Yellow/Red) to highlight those selected. Now they easily stand out.

and the result would be something like this (blank cells highlighted):

But there’s more! We can apply Conditional Formatting to identify them as well…

Excel Conditional Formatting Method:

In this solution, we use Conditional Formatting method to highlight only blank cells. From Home->Conditional Formatting -> New Rule…

Select “Format only cells that contain” then set Format only cells with: = Blanks. Select Format… to choose how to highlight (e.g. yellow).

Now they easily stand out…

If that wasn’t enough, how about adding one more method? We can use Filter feature in Excel to also find those naughty blanks.

Filter Method:

In this solution, we use Filters to find only blank cells. Select the entire data range. From Data->click Filter icon— this will now allow drop down to select for each column what to show or filter out easily.

GOTCHA: Selecting a blank item in a column filter can very easily obscure possible blanks in other columns, so you’ll have to check each column’s filter separately to be sure you didn’t miss any!

Another way to easily find the blank is to use the Filter drop-downs and Sort them in DESC and the blanks will appear at the bottom for the column (still will need to go by column by column).

 

Want more details? Please see below.


▛ This post is not meant to be a formal tutorial, instead it is  to offer key concepts, and approaches to problem-solving. Basic->Intermediate technical/coding knowledge is assumed. If you like more info or source file to learn/use, or need more basic assistance, you may contact me at tanman1129 at hotmail dot com. To support this voluntary effort, you can donate via Paypal from the button in my Home page, or become a Patron via my Patreon site. A supporter/patron gets direct answers on any of my blogs including code/data/source files whenever possible. At the very least, it’ll encourage me to share more posts/tips/knowledge in the future. 

Leave a Reply

Your email address will not be published. Required fields are marked *

Back To Top