Reshaping data for Excel via Python – Musings by FlyingSalmon

In this post, I demonstrate an effective way to reshape a CSV or XLSX file that originally has multiple rows repeating for each column. For example, many raw data exports will include a country’s metrics (GDP for example) for the year in each row and it repeats for different years as rows.

Cntry	Year	GDP
USA	2000	20T
USA	2022	25T
Canada	2000	1.8T
Canada	2022	2.14T
…etc.

Or, a data set on sales from a store by date may be collected as this:

Date	Item	Qty
1/1/22	A	10
1/2/22	A	15
1/1/22	B	10
1/3/22	B	5
…etc.

But for analysis what we need is the following format, where each country (or item in second example) appears just once and its row has all the data separated into columns in the same row:

Cntry	2000	2022
USA	20T	25T
CAN	1.8T	2.14T
…etc.

Or, for the store dataset example:

Item	1/1/22	1/2/22	1/3/22
A	10	15	–
B	10	–	5
…etc.

In order to get this format, we need to convert the data into a pivot table. We can do this in Excel easily using PivotTable feature or, we can also do this using Python without needing to open Excel as this example shows.
The code below reads a XLSX file with the raw unshaped data, and then reshapes all that using pandas’ pivot_table() function into a dataframe and writes that dataframe into a new XLSX file. And you’re done!

<span style="color: #008800; font-weight: bold">import</span> <span style="color: #0e84b5; font-weight: bold">pandas</span> <span style="color: #008800; font-weight: bold">as</span> <span style="color: #0e84b5; font-weight: bold">pd</span>

input_file <span style="color: #333333">=</span> <span style="background-color: #fff0f0">&#39;somepath/dastafile.xlsx&#39;</span>
df <span style="color: #333333">=</span> pd<span style="color: #333333">.</span>read_excel(input_file)

<span style="color: #008800; font-weight: bold">print</span>((<span style="background-color: #fff0f0">&quot;</span><span style="color: #666666; font-weight: bold; background-color: #fff0f0">\n</span><span style="background-color: #fff0f0">Printing head of datafile</span><span style="color: #666666; font-weight: bold; background-color: #fff0f0">\n</span><span style="background-color: #fff0f0">&quot;</span>), df<span style="color: #333333">.</span>head(<span style="color: #0000DD; font-weight: bold">5</span>))

pivot_df <span style="color: #333333">=</span> df<span style="color: #333333">.</span>pivot_table(index<span style="color: #333333">=</span><span style="background-color: #fff0f0">&#39;Entity&#39;</span>, columns<span style="color: #333333">=</span><span style="background-color: #fff0f0">&#39;Year&#39;</span>, values<span style="color: #333333">=</span><span style="background-color: #fff0f0">&#39;GiniCoeff&#39;</span>) 

<span style="color: #008800; font-weight: bold">print</span>(f<span style="background-color: #fff0f0">&quot;</span><span style="color: #666666; font-weight: bold; background-color: #fff0f0">\n</span><span style="background-color: #fff0f0">pivot_df: {pivot_df.head(5)}&quot;</span>) 

output_file <span style="color: #333333">=</span> <span style="background-color: #fff0f0">&#39;somepath/Reshaped_datafile.xlsx&#39;</span>
pivot_df<span style="color: #333333">.</span>to_excel(output_file)
<span style="color: #008800; font-weight: bold">print</span>(f<span style="background-color: #fff0f0">&quot;Reshaped data is saved as: {output_file}&quot;</span>)

import pandas as pd

input_file = 'somepath/dastafile.xlsx'

df = pd.read_excel(input_file)

print(("\nPrinting head of datafile\n"), df.head(5))

pivot_df = df.pivot_table(index='Entity', columns='Year', values='GiniCoeff')

print(f"\npivot_df: {pivot_df.head(5)}")

output_file = 'somepath/Reshaped_datafile.xlsx'

pivot_df.to_excel(output_file)

print(f"Reshaped data is saved as: {output_file}")

All you have to do is change the path and file name for the input and outputfile files. And of course, you’ll need to have the pandas library installed.

The choice is yours whether you want to use Excel to do the reshaping, or this Python method. Hint: This takes no clicks and is very quick! Hope you found this helpful.

Related Posts:

Gini Coefficient and how to calculate it


▛Interested in creating programmable, cool electronic gadgets? Give my newest book on Arduino a try: Hello Arduino!  
▟

Leave a Reply Cancel reply