In some of my previous posts, you’ve already seen my Python examples of how to count words accurately in a document or in blocks of text (search for: Wordcloud). It’s also possible to count words in Excel, but there are some gotchas to be aware of. In this post, I demonstrate some of them and show how to count words properly in different scenarios.
Let’s say you have a block of text that contains bullets, such as the one below.
To insert different page numbers for each page in word follow the procedure below: • Insert page numbers to all pages in the required format eg 1,2,3 a,b,c etc. • Go the end of the first page • Go to the layout tab, go to Breaks and select Next page and section breaks. • Click in the next page and specify the page number and format from the insert tab. • Create another section at the end of the second page. • Repeat the procedure for each new section created.
In Excel, we can get the number of words in a text (assume it’s in a cell) by taking the total length of the text, subtracting the length of the same text with all spaces removed, and finally adding 1, e.g. =LEN(A16)-LEN(SUBSTITUTE(A16," ",""))+1
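For readers who think in Python, the same arithmetic can be sketched as follows. This is only an illustration of the formula's logic, not how Excel implements it, and the function name `naive_word_count` is made up for this example.

```python
def naive_word_count(text: str) -> int:
    """Mirror =LEN(A16)-LEN(SUBSTITUTE(A16," ",""))+1: count spaces, then add 1."""
    without_spaces = text.replace(" ", "")  # plays the role of SUBSTITUTE(text, " ", "")
    return len(text) - len(without_spaces) + 1


# Single spaces between words: the result matches the real word count.
print(naive_word_count("Go to the layout tab"))   # 5

# A doubled space adds one to the result even though no word was added.
print(naive_word_count("Go to  the layout tab"))  # 6
```

Because the formula really counts spaces, anything that adds spaces (a bullet symbol set off by spaces, a doubled space, a trailing space) inflates the result.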
When we apply this to the above content, we get 96 (content is in cell A16 in this example). If we manually count it, or verify it in Microsoft Word, we see the count is actually 84! Obviously the bullet items are causing issues here. Let’s take the following example…
To insert different page numbers for each page in word follow the procedure below: ✓ Insert page numbers to all pages in the required format eg 1,2,3 a,b,c etc. ✓ Go the end of the first page ✓ Go to the layout tab, go to Breaks and select Next page and section breaks. ✓ Click in the next page and specify the page number and format from the insert tab. ✓ Create another section at the end of the second page. ✓ Repeat the procedure for each new section created.
Using the formula above, we get 90, whereas it’s actually 84. The tick marks are Unicode symbols and should not be counted as words. So, how do we get around this? Is there a way for Excel to report the correct count? Yes, there is. The trick is to apply TRIM, which removes leading and trailing spaces and collapses repeated spaces between words into a single space. So, if we apply this formula: =LEN(TRIM(A20))-LEN(SUBSTITUTE(A20," ",""))+1
we see the number is exactly 84, which is what Word reports as well.
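The TRIM-based formula can be sketched in Python as well. This is a rough approximation under my own assumptions: `excel_trim` and `trimmed_word_count` are hypothetical names, and the helper only imitates TRIM's handling of the ordinary space character (leading, trailing, and repeated spaces).

```python
import re


def excel_trim(text: str) -> str:
    """Approximate Excel's TRIM: drop leading/trailing spaces, collapse runs of spaces."""
    return re.sub(r" {2,}", " ", text.strip(" "))


def trimmed_word_count(text: str) -> int:
    """Mirror =LEN(TRIM(A20))-LEN(SUBSTITUTE(A20," ",""))+1."""
    return len(excel_trim(text)) - len(text.replace(" ", "")) + 1


sample = " Go to  the layout tab "  # leading, doubled, and trailing spaces
print(trimmed_word_count(sample))   # 5
```

In this sketch a symbol that is separated from its neighbours by spaces still counts as one token; only the surplus spaces are discounted.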
So far so good. What if we had a series of elements (words or values) in a cell that are separated by some delimiting character (comma, space, semicolon, whatever)? Can we get the number of elements from that cell in Excel? (If you search for my posts on “tkinter”, you will find some of my earlier posts where I save a bunch of coordinates and hex color codes in a file, delimited by a character, and then read the file, parse it, and draw them on screen using Python.) Yes, in Excel, we can also get the count of those elements and even parse them and put each element in its own cell (this is done by the Text to Columns feature in Excel). However, to get the count automatically, we again need to resort to some functions.
The idea is the same as above for counting words, except this time we need to split them using a different delimiter. So if the delimiting character is specified in cell B3, we can apply this formula to get the exact count of elements in the long text in cell A1 for example: =LEN(A1)-LEN(SUBSTITUTE(A1,$B$3,""))+1
Where A1 content may look like this:
#F0F8FF,#FAEBD7,#00FFFF,#7FFFD4,#F0FFFF,#F5F5DC,#FFE4E1,#000000,#FFEBCD,#0000FF,#8A2BE2,#A52A2A,#DEB887,#5F9EA0,#7FFF00,#D2691E,#FF7F50,#6495ED,#FFF8DC,#DC143C,#00FFFF,#00008B,#008B8B,#B8860B,#A9A9A9,#006400,#A9A9A9,#BDB76B,#8B008B,#556B2F,#FF8C00,#9932CC,#8B0000,#E9967A,#8FBC8F,#483D8B,#2F4F4F,#2F4F4F,#00CED1,#9400D3,#FF1493,#00BFFF,#696969,#696969,#1E90FF,#B22222,#FFFAF0,#228B22,#FF00FF,#DCDCDC,#F8F8FF,#FFD700,#DAA520,#808080,#008000,#ADFF2F,#808080,#F0FFF0,#FF69B4,#CD5C5C,#4B0082,#FFFFF0,#F0E68C,#E6E6FA,#FFF0F5,#7CFC00,#FFFACD,#ADD8E6,#F08080,#E0FFFF,#FAFAD2,#D3D3D3,#90EE90,#D3D3D3,#FFB6C1,#FFA07A,#20B2AA,#87CEFA,#778899,#778899,#B0C4DE,#FFFFE0,#00FF00,#32CD32,#FAF0E6,#FF00FF,#800000,#66CDAA,#0000CD,#BA55D3,#9370DB,#3CB371,#7B68EE,#00FA9A,#48D1CC,#C71585,#191970,#F5FFFA,#FFE4E1,#FFE4B5,#FFDEAD,#000080,#FDF5E6,#808000,#6B8E23,#FFA500,#FF4500,#DA70D6,#EEE8AA,#98FB98,#AFEEEE,#DB7093,#FFEFD5,#FFDAB9,#CD853F,#FFC0CB,#DDA0DD,#B0E0E6,#800080,#663399,#FF0000,#BC8F8F,#4169E1,#8B4513,#FA8072,#F4A460,#2E8B57,#FFF5EE,#A0522D,#C0C0C0,#87CEEB,#6A5ACD,#708090,#708090,#FFFAFA,#00FF7F,#4682B4,#D2B48C,#008080,#D8BFD8,#FF6347,#40E0D0,#EE82EE,#F5DEB3,#FFFFFF,#F5F5F5,#FFFF00,#9ACD32
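The delimiter formula translates to Python in much the same way. The sketch below is illustrative only: `count_elements` is a hypothetical name, and it assumes a single-character delimiter, because the length arithmetic no longer gives the element count once the delimiter is longer than one character.

```python
def count_elements(text: str, delimiter: str = ",") -> int:
    """Mirror =LEN(A1)-LEN(SUBSTITUTE(A1,$B$3,""))+1 for a one-character delimiter."""
    if len(delimiter) != 1:
        raise ValueError("this sketch assumes a one-character delimiter")
    return len(text) - len(text.replace(delimiter, "")) + 1


colors = "#F0F8FF,#FAEBD7,#00FFFF,#7FFFD4"  # first four entries of the list above
print(count_elements(colors))               # 4
print(len(colors.split(",")))               # 4, the more direct Python idiom
```

In Python itself, `split` is the natural choice; the length-difference version is shown only because it matches what the Excel formula does.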
So, essentially, we counted the number of colors specified in the long string in cell A1. For curious readers, below is the Python code where I enter a file name (with the path if it’s not in the current directory, or even on the network), specify any delimiter (default: space), and it spits out the total number of words in that text file.
and a sample session and output from that code:
Happy counting!