In this blog, I’ll share tips on:
- How to read data into a dataframe, create a subset of dataset in memory
- Read/write curated subset from/to clipboard as Text/CSV or XLSX
- Create new data files CSV/Excel
The Scenario
I have a large dataset. Checking its shape via Python, I see it has 18241 records with 9 columns. I want to only read a subset of the data (say, first 10 rows), copy it to clipboard in a format that can be pasted into any editor or spreadsheet as a table. Then I want to read the content from the clipboard and save it as a new dataset in a CSV as well as an Excel format.
The Solution
We can use pandas library for this, and will be using its dataframe data structure and methods for clipboard and file read/write. So, I import it and aliased it pd:
import pandas as pd df = pd.DataFrame()
Assume that my data file is ufo2.xlsx residing in a subdirectory called Data. Then the following 2 lines will load the entire data into a dataframe df.
sourcedata='Data//ufo2.xlsx' df=pd.read_excel(sourcedata)
We can quickly check its size by: df.shape attribute which returns (18241, 9)
To modify the dataframe to only contain the first 10 rows, we can reuse df variable and state:
df=df.head(10) print("Selected rows in dataframe:\n", df)
The print verifies that we have the selected rows in the dataframe.
Then we can send that subset to the clipboard by:
df.to_clipboard(excel = True, index=False)
Let's read back from the clipboard, and print its contents to verify: df=pd.read_clipboard() print("From clipboard\n:", df)
Now, we can write the subset to any persistent file in either CSV or Excel formatswith data from clipboard
output="Data//clipboardex.xlsx" df.to_excel(output, index=False)
or to a CSV file as:
output="Data//clipboardex.csv" df.to_csv(output, index=False)
The following on-screen print outputs verify that it’s working as intended…
Once we copied the subset to clipboard, we can verify by opening Notepad and pasting the content into it, which should look like this…
After creating the CSV and XLSX files, we can open in Excel to verify as below…
There you have it! For more Python and data tips, look here.
This post is not meant to be a step-by-step, detailed tutorial, instead to serve key concepts, and approaches to problem-solving. Basic->Intermediate technical/coding knowledge is assumed.
If you like more info or source file to learn/use, or need more basic assistance, you may contact me at tanman1129 at hotmail dot com. To support this voluntary effort, you can donate via Paypal from the button in my Home page, or become a Patron via my Patreon site. A supporter/patron gets direct answers on any of my blogs including code/data/source files whenever possible upon request.