
Creating Word Clouds from Google Trends

The Google Trends platform provides an interesting view into what the world is searching for online. In our age of digital information, it has become a critical tool for understanding the world's curiosities and behaviors, allowing us to explore the popularity of search queries across many regions and languages over time. Marketers looking to gauge consumer interest, journalists seeking data-driven stories, and simply curious individuals can all gain valuable insights into what captures people's attention. In this post, I share some charts and visualizations built from Google Trends data. Specifically, I focus on distilling the unstructured textual data into a concise, non-technical visualization such as a word cloud.

Word Clouds

By visually representing the frequency or importance of words within a body of text, word clouds provide a snapshot of the most significant terms, with the most prominent words appearing larger and bolder. From educators and researchers to marketers and content creators, individuals across various fields utilize word clouds to quickly identify key themes, trends, and sentiments within their data. Whether used to analyze survey responses, enhance presentations, or simply explore textual patterns, word clouds offer a visually engaging and easy-to-understand method for making sense of large volumes of text.

There are various tools to aid in generating word clouds; some are free, and some are paid with advanced features. We can also create word clouds using multiple general-purpose tools such as Excel, Python, and PowerPoint. If you're interested in one of those methods, check out my past article here. In this post, I'll focus on arranging the data from Google Trends and using a lightweight, specialized word cloud tool to generate a few visualizations.

Google Trends Data

While a word cloud provides a high-level view of the popular search phrases for a period in a region, it doesn't give the exact frequency of the searches. Google Trends data, however, does; even though not exact, it gives a good ballpark quantity for making comparisons and understanding trends. Having numeric data enables us to create additional charts, such as bar or column charts, that word clouds cannot. Used together, they can be very insightful. Let's take a look at what we get from Google Trends.

The Trends data downloaded as CSV looks something like this:
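In case the screenshot doesn't come through, here is a hypothetical excerpt illustrating the layout; the column order may differ in your export, and the dates and links are placeholders (the two phrases and volumes echo the examples discussed later in this post):

```
Trends,Search volume,Started,Ended,Explore link,Trend breakdown
plane crash,10M+,...,...,...,...
fda recalls,5K+,...,...,...,...
```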

Depending on the tool used, the words in a word cloud visualization could be clickable, and the 'Explore link' column values would be used for that. The 'Trend breakdown' column enables further drill-down on a word or phrase. We won't delve into those in this post; rather, we'll focus on the 'Trends' and 'Search volume' columns. The 'Started' and 'Ended' columns are determined by the time period filter we set in Trends.

It's clear that suffixes such as "M+" and "K+" need to be removed or replaced. If we open the exported CSV file in a spreadsheet app, the easiest way to turn those values into numeric types is to replace "M+" with six zeros and "K+" with three zeros (or script the cleanup; see the sketch after this paragraph). Once we have that, the next step is to understand the data a little better. We can try charting the volume per search phrase using a bar or column chart. But beware! In the example data above (taken on January 30, 2025 for the USA), the "plane crash" searches were off the chart and far greater than any other (due to the absolutely unfortunate incident that occurred that day). Charting such an outlier would leave the other data points virtually out of view. Therefore, we need to do some sort of normalization.
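If you'd rather script the suffix cleanup than do find-and-replace in a spreadsheet, here's a minimal Python sketch; the file name is an assumption, and the column names follow the export layout above:

```python
import pandas as pd

# Load the exported Trends CSV (the file name here is an assumption).
df = pd.read_csv("trending_searches.csv")

def to_number(volume: str) -> int:
    """Convert a Trends volume such as '10M+' or '5K+' to an integer."""
    volume = volume.strip().rstrip("+")
    if volume.endswith("M"):
        return int(float(volume[:-1]) * 1_000_000)
    if volume.endswith("K"):
        return int(float(volume[:-1]) * 1_000)
    return int(volume)

# Add a numeric column alongside the original text values.
df["Volume"] = df["Search volume"].apply(to_number)
print(df[["Trends", "Volume"]].head())
```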

Data Normalization

Normalization in this context means rescaling the data so that the maximum and minimum are brought closer together, or within a manageable range on a new scale, while applying the same scale to the rest of the data points. This is critical not just for plotting, but also for generating a word cloud, which requires giving a weight to each word or phrase. Imagine giving "plane crash" a weight of 10,000,000 and "fda recalls" just 5,000, as the search volumes state: many words would practically disappear or not render at all within the visualization! There's no one-size-fits-all method for normalizing, so we need to understand the spread of the data (maximum, minimum, median, and standard deviation, for example). Thankfully, there are several techniques to normalize data. One is to use a logarithmic scale (if the numbers are large enough); another is Min-Max normalization, or Min-Max scaling. Additional techniques are Z-score normalization, decimal scaling, and max absolute scaling. No matter which method is used, the goal is to ensure that the scaled data preserves the relationships and proportions of the original values while adjusting them to a common scale.
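As an illustration, here's a minimal Python sketch of two of those techniques, log scaling and Min-Max scaling, applied to illustrative volumes; the 1-100 target range is my own choice, not a requirement:

```python
import numpy as np

# Illustrative volumes after the suffix cleanup above.
volumes = np.array([10_000_000, 500_000, 50_000, 5_000])

# Logarithmic scale: compresses huge gaps while preserving order.
log_scaled = np.log10(volumes)  # -> [7.0, 5.7, 4.7, 3.7]

# Min-Max scaling into a 1-100 range (the range itself is a choice).
lo, hi = volumes.min(), volumes.max()
minmax_scaled = 1 + (volumes - lo) / (hi - lo) * 99  # -> [100.0, 5.9, 1.4, 1.0]
```

Notice how the log scale keeps the mid-sized values visible, while Min-Max still pushes everything below the outlier toward the bottom of the range; that difference is exactly why it pays to inspect the spread before choosing a method.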

Visualization Challenges

Another challenge in creating a word cloud from very disparate data stems from the fact that the visualization is not accurate, nor is it meant to be; still, the font size and weight must be sensible in relation to each word's frequency. That means that if, even after normalization, we end up with only values such as 10 and 5 across all the sample words, we'll have just two sizes for all the words, when in reality the volumes were not that similar. To address that, I recommend at least 3 different "bins" (more is preferred), so that the image conveys somewhat diverse frequency proportions across the collection of words, reflecting the data a little more honestly. If you're in such a quandary, I suggest trying different normalization methods, sometimes in sequence more than once, until you have enough "bins".
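One way to force a spread of bins, sketched here in Python with illustrative weights (the bin count and labels are my assumptions, not a standard), is to rank the values first and then cut the ranks into quantile bins, so ties and tight clusters still spread out:

```python
import pandas as pd

# Log-scaled weights from the previous step; "phrase b" and
# "phrase c" are placeholders for other trending searches.
weights = pd.Series(
    {"plane crash": 7.0, "phrase b": 5.7, "phrase c": 4.7, "fda recalls": 3.7}
)

# Rank first, then cut the ranks into three equal-sized quantile bins.
bins = pd.qcut(
    weights.rank(method="first"), q=3, labels=["small", "medium", "large"]
)
print(bins)
```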

I should mention yet another challenge: some word cloud generators let you simply paste text, and they determine the frequency from the content. That sounds great, until you realize that Google Trends doesn't give you the full text repeated that many times (that'd be utterly impractical), just the estimated volume per phrase. In that case, you don't have the text to paste, which leaves you two choices. One is to use a tool that allows you to provide a CSV (or data table) with each phrase and its corresponding frequency or volume; again, be sure to normalize those figures so the word cloud can handle both the maximum and the lower values. The other is to create the content yourself by repeating each phrase as per its volume, using a formula (in Excel, it's REPT()) or code (in Python, for example). Obviously, you're not going to do that 10 million times for "plane crash" or even 5 thousand times for "fda recalls"! You're going to repeat each phrase using its normalized value instead, as described above.
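Here's a minimal sketch of that second choice in Python, with illustrative normalized weights; in Excel, a formula along the lines of =REPT(A2 & " ", B2) filled down the column achieves the same:

```python
# Illustrative normalized weights (nowhere near the raw 10M / 5K volumes).
weights = {"plane crash": 10, "fda recalls": 1}

# Repeat each phrase per its weight, one phrase per line.
text = "\n".join(
    phrase for phrase, count in weights.items() for _ in range(count)
)
print(text)
```

One caveat: paste-based tools typically split on whitespace, so a multi-word phrase like "plane crash" may get broken apart; some tools offer a delimiter setting or accept hyphenated phrases to work around this.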

Examples

Okay, now that we know we won’t get stuck in the middle of generating charts or word clouds, let me share some actual outputs.

These are the top 20 search phrases taken on January 30th, 2025, covering the past 24 hours, in the USA:

The word cloud based on that data is:

These are the top 20 search phrases taken on January 30th, 2025, covering the past 24 hours, in Ireland:

The word cloud based on that data is:


The following day, on January 31, the search trends looked very different (again, the data used was for the past 24 hours). I'll just share that day's word clouds for both the USA and Ireland below.

I hope you found this post helpful and interesting. Explore this site for more tips and articles. Be sure to also check out my Patreon site where you can find free downloads and optional fee-based code and documentation. Thanks for visiting!
