STEM

Python in Excel — Introduction (how and why)

Python in Excel is a feature that allows users to write and execute Python code directly within Excel spreadsheets. This integration enables advanced data analysis, automation, and visualization without needing external add-ins or additional purchases.

Advanced, complex visualizations that are not possible directly in Excel is now possible with Python code that uses powerful libraries like pandas, matplotlib, seaborn, scikit-learn, numpy, and others. Those libraries also enables us to handle really large datasets within Excel that it is inherently not made to tackle by leveraging Python’s advanced features.

Going beyond Excel’s built-in models, being able to use Python’s AI and ML libraries allow us to build predictive models within Excel that were not possible before. Similarly, it opens up possibilities for utilzing more sophisticated statistical functions than Excel’s built-in formulas.

Some challenges exist, however. Getting used to the new paradigm of entering Python code in the Excel formula bar and understanding its output may seem quirky, not just to traditional Excel users but also to Python IDE users. Additionally, most Python scripts will run slower within Excel compared to standalone Python environments. And obviously, those completely unfamiliar with Python will not be able to leverage these features unless they learn the Python language.

In this post, I’ll share some practical uses of Python in Excel, share the outputs, purposes, and the Python code to make it all happen.


Car Dealership Example

Suppose we have a dataset from a small auto dealership that contains monthly sales of vehicles, types of vehicles sold, and basic buyer information (age, gender). The dataset is below.


We want to do some analyses and create some visualizations in Excel, but using Python.

The first thing we want to do is load the dataset into a pandas dataframe. That dataframe (let’s call it df) will be used in Python code. The code is writtten in the formula bar, but unlike regular Excel formulas, when we enter Python code in the formula bar, it allows multiple lines (with each ENTER), and to run the Python code, we press Ctrl-ENTER. The ouput will be placed in the same cell where the formula or code in this case, was entered for. The output will show up as a Python object by default and the cell will have a small prefix [ py ] in the cell but may not have show the actual ouput (depending on the output type: query results, charts, dataframes, etc.). To see the full output as in a normal Excel worksheet, we need to specify so by selecting from the little dropdown on the left of the formula bar and select “Excel Value”. To collapse it and get back to Python object, select the same dropdown and select “Python Object”. The UI looks like this along the formula bar:

But how does Excel know if the formula bar contains a standard Excel formula or Python code? That’s done by switching the mode of the formula bar, which is simple. For Excel formulas, we start with the = (equal sign) as usual. But to switch to Python code mode, instead we start with PY followed by the Tab key press. As soon as we do that, you’ll see the green square with  PY  show up on the left of the formula bar.

I have summarized all the codes to enter in the formula bar for all the examples used in this post in the following tables. (Use your browser or device’s Zoom feature for a closer look as needed)

Table continues below…


So, to create a dataframe for Python from a dataset in the range B7 to F37 (our dealership data), we select a cell (ensure there’s enough blank cells around it for it to display fully or you’ll get “#SPILL!” error), and go to the formula bar, enter “=PY” then Tab. The formula bar changes to Python code box: there we simply enter one line: df=xl("B7:F37", headers=True)

The xl() function is specific to Excel in Python feature. The second parameter tells it show the header row as well for readaiblity. If you entered the formula in a cell I8 for example, you’ll see “[PY] DataFrame” appear in that cell. This indicates that you have successfully created a Python dataframe from the Excel data range.

To get basic statistical information about our data, now all we need to do is to use pandas describe() function on the dataframe. We called our dataframe ‘df’ so, in a new cell (again, make sure you have some blank cells to the right and below as it’ll need the space to display the results on the sheet), we enter another Python line of code: df.describe()

Remember, to get into the Python code mode, in the formula bar, we have to type =PY then press Tab. We get various statistical metrics of the dataset in return such as: count, mean, min, quartile 1 to 3, max, and standard deviation for numeric or date columns such as SaleDate, Price, Age (compare them with the headers in the datarange and you’ll notice Gender metrics are not shown here. That’s because it’s categorical and that’s by design).

Similarly, we can get count of vehicles by type, sales by dates, most common types of vehicles sold, monthly average sales, and much more. See the tables above for their codes and try them out according to your data range. Notice, how we don’t refer to the cell addresses anymore as we would normally in Excel, instead we’re using the dataframe and Python variable names. The dataframe once created and saved in our variable (df), will persist and we can continue to work with it all we want including for creating visualizations!

Visualizations

Following are the outputs from various charts created with Python code and embedded straight into the Excel worksheet. For the exact Python code for each viz, refer to the tables above.

Average Sales By Month Column Chart (it’s the same as Bar chart, only moves up vertically instead of to the right, so the Python value for kind attribute is “bar”):


Customer Segmentation:

We can find clusters of similar vehicle sales based on attributes like price, gender, and age using K-Means Clustering. The clustering approach helps in understanding the customer base and their purchasing patterns. A cluster in K-means groups data points that are similar in characteristics such as price and age cluster for younger customers. And another cluster for higher priced autos and its associated age group as seen by the pattern. So, each cluster is telling us about the demographic of customers in a specific price range.


Customer Demographics:

Show vehicle types sold and the count of genders sold to, as well as the mean age of those genders.


Scatterplot:

How does the age of customers relate to the price of vehicles they bought? To answer that, we can create a scatterplot using seaborn library.

Queries:

With pandas query() function, we can query the dataframe for retrieving all sorts of information. For example: df.query('Price > 30000 and Gender.str.contains("Male")') will return all vehicle information that were over $30,000 bought by the male customers.

Gannt Chart Example

For ages, people without Microsoft Project, have been using stacked bar charts to create Gannt charts (which requires extra steps to simulate it, as Excel does not have a built-in Gannt chart type), but this feature has been in matplotlib library of Python for a long time. Now, using Python in Excel, we can integrate Gannt charts into our Excel workbooks.

For example, we have these columns: Task ID, Task Name, Start Date, End Date for a project. The Task Name column contains categorical values: Project Planning, Design Phase, Development, Testing, Deployment, Review, Documentation for example and their associated task Ids, start and end dates. We can create a Gannt chart as follows using some Python code (see Tables above for the code).


As you’ve gathered by now, the ability to have Python code refer to Excel cells (that includes drop-down lists) as if it’s a declared variable, opens up new creative potentials for Excel users; the possibilities are further expanded with the ability to bring in Python’s advanced statistical calculations and visualizations into Excel workbooks.

Nevertheless, there are some limitations. Currently, Python in Excel does not natively support TensorFlow or PyTorch, as it primarily focuses on data analysis libraries. While Excel is great for structured data analysis, deep learning models typically require more computational power and are better suited for standalone Python environments. The Python in Excel requires no local Python development environment setup from the users’ point of view, this is because, the calculations are done in Microsoft’s cloud environment (where our Python code are being interpreted and actual results returned from). This also means a few seconds delay in refreshing the results. Personally, in most cases, I use my local, dedicated Python development environment for most Python needs as it’s far more flexible and is faster. However, for certain scenarios that require integrating Python results within Excel workbooks, this is a welcome option.

To help me continue to produce and share more content that’s informative, and ad-free, please consider making a one-time or a recurring small donation by clicking here (securely transacted via PayPal™).

Back To Top