In this post, I demonstrate an effective way to reshape a CSV or XLSX file that originally has multiple rows repeating for each column. For example, many raw data exports will include a country’s metrics (GDP for example) for the year in each row and it repeats for different years as rows.
Cntry
Year
GDP
USA
2000
20T
USA
2022
25T
Canada
2000
1.8T
Canada
2022
2.14T
…etc.
Or, a data set on sales from a store by date may be collected as this:
Date
Item
Qty
1/1/22
A
10
1/2/22
A
15
1/1/22
B
10
1/3/22
B
5
…etc.
1
But for analysis what we need is the following format, where each country (or item in second example) appears just once and its row has all the data separated into columns in the same row:
Cntry
2000
2022
USA
20T
25T
CAN
1.8T
2.14T
…etc.
Or, for the store dataset example:
Item
1/1/22
1/2/22
1/3/22
A
10
15
–
B
10
–
5
…etc.
1
In order to get this format, we need to convert the data into a pivot table. We can do this in Excel easily using PivotTable feature or, we can also do this using Python without needing to open Excel as this example shows. The code below reads a XLSX file with the raw unshaped data, and then reshapes all that using pandas’ pivot_table() function into a dataframe and writes that dataframe into a new XLSX file. And you’re done!
<span style="color: #008800; font-weight: bold">print</span>(f<span style="background-color: #fff0f0">"Reshaped data issaved as:{output_file}"</span>)
All you have to do is change the path and file name for the input and outputfile files. And of course, you’ll need to have the pandas library installed.
The choice is yours whether you want to use Excel to do the reshaping, or this Python method. Hint: This takes no clicks and is very quick! Hope you found this helpful.