In an earlier post, Word Cloud III – Python & Excel Together, I shared the techniques and code for creating a word cloud using Python and Excel together.
Today, building on that, I’ll analyze some of the most historic speeches and/or documents that every kid should read and even memorize part of. I love reading well-crafted, impactful speeches regardless of political affiliation or agreement, and I love finding patterns in things 🙂
I’ve picked the following incredibly powerful speeches and documents (the texts are available in the public domain) and used their transcripts so I could apply my code to get their metrics.
For example, I found that:
> MLK’s “I Have A Dream” speech (1963) was wordy, with deliberate repetition to hammer the message home, at 711 words.
> “We Shall Fight On The Beaches” by Winston Churchill (in 1940) had 311.
> “Gettysburg Address” by Abraham Lincoln (1863) had just 279.
> Franklin D. Roosevelt’s inaugural address (1933) had a whopping 1966 words.
> But Theodore Roosevelt’s “The Man With The Muck Rake” (in 1906) takes the cake at 3345!
> JFK’s inaugural address (1961) was 1397 words.
> Whereas the text of the Declaration of Independence (1776) contained slightly fewer words, at 1389.
The word frequencies come from the word cloud program mentioned above.
The word counts come from another little Python app of mine that takes any text file and shows its total word count (including “noise” words). The method is to read the content, split it into single words using a space as the delimiter, load the words into a list with each word as a single element, and count the number of elements in the list. The result is the total count of words in the file. I didn’t use any special library (in fact, I didn’t import a single library outside of Python’s built-ins). The core of that magic is below:
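num_words = 0  # running total across all lines
with open(fname, 'r') as f:
    for line in f:
        # split each line on the delimiter (a space) and count the pieces
        words = line.split(separator)
        num_words += len(words)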
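For anyone who wants to try the counting idea right away, here is a minimal, self-contained sketch of it. The function names `count_words_in_lines` and `count_words_in_file` are my own placeholders for illustration, not the names used in the actual app, and I am assuming the files are UTF-8 text. One deliberate swap: it calls `split()` with no argument, which splits on any run of whitespace, rather than splitting on a single space.

```python
def count_words_in_lines(lines):
    """Return the total number of whitespace-separated words in an iterable of lines."""
    num_words = 0
    for line in lines:
        # split() with no argument splits on any run of whitespace,
        # so blank lines and double spaces add nothing to the total.
        num_words += len(line.split())
    return num_words


def count_words_in_file(path, encoding="utf-8"):
    """Return the total word count of a text file (encoding is an assumption)."""
    with open(path, "r", encoding=encoding) as f:
        # A file object yields its lines one at a time, so large files are fine.
        return count_words_in_lines(f)
```

Because splitting on a single space counts a blank line as one element and double spaces as extra empty elements, totals from this sketch may come out slightly lower than totals from a strict single-space split.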
By the way, if you want to create audio (MP3, WAV) files using these speeches, take a look at my techniques in the following articles:
Interactive Text-To-Speech App with Presidents of USA
and Speech Recognition and back (using AI/Python). That way, you can create your own synthesized audio of Abraham Lincoln’s Gettysburg Address, for example. (I even used it to read my bio and resume aloud, and found oddities in grammar and flow much more easily than by proofreading!)
Here’s the overall process:
Write the Python code to get the stats. Once I have these stats generated, it’s relatively easy to create the word cloud. My Python code exported these stats to external CSV files (one for each speech transcript). Then I imported them all into an Excel workbook. A sample dataset looks like this in Excel:
Then each dataset is used to create its unique Word Cloud image.
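The stats-and-CSV step described above could be sketched roughly as follows. This is not my full program, just an illustration using only the standard library. The function names, the two-column `word`/`count` layout, and the tiny `NOISE_WORDS` set are all assumptions made up for this example.

```python
import csv
import re
from collections import Counter

# Hypothetical, deliberately tiny noise-word list; a real one would be much longer.
NOISE_WORDS = {"a", "an", "and", "the", "of", "to", "in", "that", "we", "is", "it"}


def word_frequencies(text, noise_words=NOISE_WORDS):
    """Count each non-noise word, ignoring case and punctuation."""
    # Match runs of letters, allowing straight or curly apostrophes inside a word.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return Counter(w for w in words if w not in noise_words)


def export_frequencies(freqs, csv_path):
    """Write word/count rows to a CSV file, most frequent first."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])  # header row for the Excel import
        writer.writerows(freqs.most_common())
```

Calling `export_frequencies(word_frequencies(text), "speech.csv")` once per transcript would give one CSV per speech, ready to be opened or imported in Excel.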
Here are the results for each:
[Word cloud images: Theodore Roosevelt’s “The Man With The Muck Rake” (1906), JFK’s inaugural address (1961), Declaration of Independence]
So, now we can clearly see which words appeared most in which speech (the bigger the font size, the more frequent the word; the same idea as in most bubble charts).
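To make the "bigger font, more frequent" idea concrete, here is one simple way a count could be turned into a font size. This is only a sketch of a linear scaling and not how the Excel add-in works internally (I don't know its formula). The function name and the 10 to 72 point range are assumptions chosen for illustration.

```python
def font_size(count, min_count, max_count, min_pt=10, max_pt=72):
    """Linearly map a word count onto a font size in points."""
    if max_count == min_count:
        # Every word is equally frequent, so there is nothing to scale.
        return float(max_pt)
    # Position of this count between the rarest (0.0) and most frequent (1.0) word.
    scale = (count - min_count) / (max_count - min_count)
    return min_pt + scale * (max_pt - min_pt)
```

With this mapping the rarest word gets `min_pt`, the most frequent gets `max_pt`, and everything else lands proportionally in between.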
There you have it. If you’re interested in the full end-to-end code, feel free to contact me as per below. (These Word Cloud images were generated from the data I fed to an Excel add-in from the Office Library. For creating your own from scratch in Excel, please see my methods described in this post.)
Interested in creating programmable, cool electronic gadgets? Give my newest book on Arduino a try: Hello Arduino!