Interactive Text-To-Speech App with Presidents of USA (Python)

In this blog, I’ll discuss how to build an interactive lookup of data and have the results spoken back to you in human language. Specifically, I have a list of all POTUS to date in a file (Excel, CSV, text…they’ll all work without any change to the code design). I want to be able to look up the first few presidents of the United States, or the last few, or the first 5 or last 5, or any combination thereof. Once I get the correct data, I want my program to read it aloud back to me with the correct pronunciation of each president’s full name (and also show it on screen for reading). There are a few interesting concepts here, so grab your favorite beverage and read on…

First, let’s talk about reading the data file containing the POTUS. To keep this example simple to explain, I’ll have only one column, with the header “Presidents” and all the full names under it as rows (first, middle, and last parts separated by spaces, as you’d normally write them). The source doesn’t have to be an actual CSV, Excel, or database file (although if it is, that’s fine too; my app will handle it); it can be as simple as a text file. Then we’ll use the basic interaction features provided in any language…I’ll use Python…to specify which chronologically ranked presidents you’re looking for, or all of them.

Lastly, with some massaging, I’ll convert the text results into speech (using a TTS engine), and optionally even save the exact result’s audio as an MP3 file that I can embed, attach to an email, caption, translate, etc.

Before we dive into the code design, it’s helpful to understand which libraries I found most useful for this project. To efficiently extract and show specific parts of the data, I’ll use the pandas library. For TTS, there are several options available for Windows and Python. I’ll use one that works offline, without making any web service calls or requiring an internet connection: pyttsx3. I read up on it here before I experimented with it.

Along the way, the other important concepts used here are dataframe <-> string conversions, type conversions, parsing and detection of numeric vs. alphanumeric vs. alphabetic input, looping, conditions, etc. But I won’t bore you with all the details, as you’ll encounter them when you need them.

Background & Setup

Text To Speech (TTS) is the software process of converting text into audio that sounds like human speech. The underlying mathematics is specialized enough to merit a degree of its own but, thankfully, developers don’t need that degree; they just need to learn how to use speech synthesis from a programming language. pyttsx3 is a cross-platform speech library that works very well on macOS, Windows, and Linux. I’ll be using Windows.

How to check what voices are installed on a system

First, for application development, we need to install the speech library mentioned above. Users of the application should not need anything special to run the program: its dependencies should either be installed separately on the user’s machine by the app or be linked (statically or dynamically) with the app, so users don’t have to bother with additional installations. So, as the person developing this, you should understand your environment first.

Run regedit and drill down to:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens

and there will be the information about voices and languages installed. In Windows 10, I see

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_DAVID_11.0

and

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_ZIRA_11.0

They happen to be male and female voices respectively, both in en-US (English with an American accent): the DAVID node is the male voice and the ZIRA node is the female one. In fact, once we have the TTS object created, we can do something like:

print(engine.getProperty('voice'))

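to see the registry path of the voice currently in use (engine.getProperty('voices') returns all the installed voices, each with its own id path). And to set it to a specific one, if I execute: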

engine.setProperty('voice', r'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_ZIRA_11.0')

it’ll actually pick the ZIRA (i.e. female) voice for the next speech. Here, engine is the name of the object I created in this example (more on that later). This detail is not always needed, but it helps to understand how the API connects to your OS capabilities. The API makes it easier to switch between the male and female voices with a call like the following, where voices is the list returned by engine.getProperty('voices'):

engine.setProperty('voice', voices[n].id)

where n is 0 (male) or 1 (female) on my machine; the order depends on which voices are installed.
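Here is a minimal sketch of that voice lookup, assuming only that each voice object exposes an id attribute (which pyttsx3 voice objects do). The helper names pick_voice_id and list_installed_voices are my own inventions for illustration, and the demo list uses stand-in objects carrying the two registry paths shown earlier, so the selection logic can be tried without a speech driver.

```python
from types import SimpleNamespace


def pick_voice_id(voices, n):
    """Return the id of the n-th voice, wrapping around if n is out of range."""
    if not voices:
        raise ValueError("no TTS voices are installed")
    return voices[n % len(voices)].id


def list_installed_voices():
    """Ask pyttsx3 for the installed voices (needs pyttsx3 and a speech driver)."""
    import pyttsx3  # imported lazily so pick_voice_id works without the library

    engine = pyttsx3.init()
    return engine.getProperty("voices")


# Stand-in voice objects for illustration only; real ones come from
# list_installed_voices(). The ids are the registry paths from this post.
demo_voices = [
    SimpleNamespace(
        id=r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_DAVID_11.0"
    ),
    SimpleNamespace(
        id=r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_ZIRA_11.0"
    ),
]

print(pick_voice_id(demo_voices, 1))
```

On a real machine you would pass list_installed_voices() instead of demo_voices and hand the returned id to engine.setProperty('voice', ...). Wrapping the index is just a design choice so that a machine with a single voice doesn’t raise an IndexError.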

With that interesting tidbit behind us, let’s see the steps to create a TTS object in our program and speak out a string. Later, we’ll feed it a string built with the pandas library.

To create the TTS engine/object, we can do:

engine = pyttsx3.init()

Then feed it a string to read aloud:

engine.say(s)

Finally, execute it with:

engine.runAndWait()

Optionally (decided at run time), I use the

save_to_file()

method to export the speech into a persistent file in MP3 format; like say(), it has to be called before runAndWait(). Very cool!

In between, we can configure it to pick the male or female voice, language, age, and volume, among other things. In my app, most of these are kept at their defaults, except that I switch the gender randomly at each run 🙂 and change the volume to 90%. The URL above links to details on the supported configurations.
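Putting the calls above together, a sketch of the speaking step might look like this. It is not my exact app code: speak_it and make_engine are hypothetical names, and the engine is passed in as a parameter (my choice for this sketch) so the function can be exercised with any object that offers the same methods as a pyttsx3 engine.

```python
from random import choice


def speak_it(text, engine, save_path=None, volume=0.9):
    """Queue text on a pyttsx3-style engine, optionally save it, then run the queue."""
    voices = engine.getProperty("voices")
    if voices:
        # Pick one of the installed voices at random for this run.
        engine.setProperty("voice", choice(voices).id)
    engine.setProperty("volume", volume)  # pyttsx3 expects 0.0 to 1.0
    engine.say(text)
    if save_path is not None:
        # Queued like say(); the file is written when the queue runs.
        engine.save_to_file(text, save_path)
    engine.runAndWait()  # processes everything queued above


def make_engine():
    """Create the real engine (needs pyttsx3 and a speech driver installed)."""
    import pyttsx3  # imported lazily so speak_it can be tried without it

    return pyttsx3.init()
```

Typical use would be speak_it("Hello there", make_engine(), save_path="hello.mp3"), where the file name is just an example.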

Let’s switch our attention back to the data source…for which we’re best off with the pandas library. I import pandas under the alias pd (import pandas as pd), so I’ll read the source file containing the names of ALL POTUS (to date) with:

pd.read_csv()

into a dataframe. Now, I can use

head()

and

tail()

methods freely on this dataframe! To show the first 5 records, I can do:

df.head(5)

and to show last 10, I can do:

df.tail(10)

(where df is the name I gave to my dataframe object).
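To make the head() and tail() calls concrete, here is a small sketch that reads a five-name stand-in for the data file from memory. The sample list and the file name mentioned in the comment are placeholders of mine; the one-column “Presidents” layout is the one described earlier.

```python
import io

import pandas as pd

# A tiny in-memory stand-in for the real file: one "Presidents" column.
sample = io.StringIO(
    "Presidents\n"
    "George Washington\n"
    "John Adams\n"
    "Thomas Jefferson\n"
    "James Madison\n"
    "James Monroe\n"
)

# With a real file this would be something like pd.read_csv("presidents.csv")
# (hypothetical file name).
df = pd.read_csv(sample)

first_two = df.head(2)  # rows counted from the top
last_two = df.tail(2)   # rows counted from the bottom

print(first_two)
print(last_two)
```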

The Source Code

The amount of code actually written doesn’t have to be very large, as most of the heavy lifting of TTS and dataframes is already done by the libraries the app uses. At any rate, here’s a glimpse of the entire code. If you’re serious about learning more, please see the bottom of this post (or any of my posts) to get hold of me for additional support/info. But notice the power with so little typing (mostly thinking :))!

The main code flow is:

Read the data file containing POTUS names and load into a dataframe

Offer user a menu to choose which ones they want to see (specific number (n) from past, some from recent, all, etc.)

Call my custom function show(num: string) passing the user-entry as argument (and processing conditionally, and error checking)

The function show() converts num to an int (positive to look from the first record onward, negative to look from the last record backward), extracts n records, and converts the resulting dataframe into a string using the

dataframe_obj.to_string()

method…this enables me to not show index or header rows for brevity in display.

That function then calls another custom function of mine,

speakIt(text: string, flag: int)

where the first parameter is sent to the TTS object to produce synthesized voice based on text. Within that function, I set a random variable that randomly switches between male and female at every call by

n=choice([0, 1])

(choice comes from Python’s random module) and passing n as the index into the voices array, as in:

engine.setProperty('voice', voices[n].id)

…this keeps the voices always fresh 🙂

If the 2nd parameter is non-zero, it’ll save the entire speech into an MP3 file. However, I also want an “intro” in the audio, such as “These are the first n American presidents” or “These are the last n American presidents”, followed by the actual names of those n presidents, where n is defined by the user’s input. To accomplish that, I build the introduction string at run-time (since we never know what n the user will enter), then glue on the rest of the data results and finally save the intro+data audio as one file.
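The flow above can be sketched roughly as follows. This is an approximation rather than my actual show() function: parse_count, select_rows, and build_script are hypothetical names, accepting the word “all” is an assumption about the menu, and the names are stripped after to_string() because pandas pads them for alignment.

```python
import pandas as pd


def parse_count(entry):
    """Turn user input like '3', '-5' or 'all' into an int, or None for 'all'."""
    entry = entry.strip().lower()
    if entry == "all":
        return None
    try:
        n = int(entry)
    except ValueError:
        raise ValueError(f"expected a whole number or 'all', got {entry!r}") from None
    if n == 0:
        raise ValueError("0 selects nothing; enter a positive or negative number")
    return n


def select_rows(df, n):
    """Positive n: first n rows. Negative n: last |n| rows. None: every row."""
    if n is None:
        return df
    return df.head(n) if n > 0 else df.tail(-n)


def build_script(df, n):
    """Return the intro sentence followed by the selected names, one per line."""
    if n is None:
        intro = "These are all the American presidents"
    else:
        which = "first" if n > 0 else "last"
        intro = f"These are the {which} {abs(n)} American presidents"
    # index=False and header=False drop the row numbers and the column title.
    text = select_rows(df, n).to_string(index=False, header=False)
    names = "\n".join(line.strip() for line in text.splitlines())
    return f"{intro}:\n{names}"
```

The string returned by build_script(df, parse_count(user_entry)) is what gets printed on screen and then handed to the speaking function, so the intro and the names end up in the same audio.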

Enough theory…let’s see it in action!

EX 1: This is the screen output when the user entered 3 (inputs denoted in blue):

As you can see, the output shows the FIRST 3 presidents’ full names. And since I chose to save the audio, the app generated the following audio file WITH the intro:

Audio output (Voice: Female. First 3 POTUS):

EX 2: This is the screen output when the user entered -3 (inputs denoted in blue) to show the most recent 3 presidents…

And here’s the audio…

Audio output (Voice: Female. Last 3 POTUS):

EX 3: This is the screen output when the user entered -5 (inputs denoted in blue) to show the most recent 5 presidents…

Audio output (Voice: Male. Last 5 POTUS):

Hope you enjoyed the tips here. If you need more help, read below. Some fun things you can try: Find the presidents by year…let the user enter a year and extract the president(s) in that timeline. Have fun 🙂

This post is not meant to be a step-by-step, detailed tutorial; instead, it presents key concepts and approaches to problem-solving. Basic to intermediate technical/coding knowledge is assumed.
If you’d like more info or the source files to learn from or use, or need more basic assistance, you may contact me at tanman1129 at hotmail dot com. To support this voluntary effort, you can donate via PayPal from the button on my Home page, or become a Patron via my Patreon site. A supporter/patron gets direct answers on any of my blogs, including code/data/source files whenever possible.
