For most people working with data for a significant amount of time, formatting raw data from different sources are both a reality and a pain. Today, I show different ways to format an un-formatted string that contains phone numbers using Python, and Excel.
Our task is to format it (either for human-reporting, or consumption by a system that expects a formatted string) in various ways using both Excel formulas and Python code (as a bonus).
Let’s start with Python as it’s the easiest.
First, upon looking on the Net for ideas, I found that most of the solutions employ regEx or another external library called phonenumbers. While they work perfectly, I think in most cases, they’re overkill as formatting strings in Python is quite easy and powerful in the first place. However, I’ll touch on the phonenumbers library.
This library is available from https://pypi.org/project/phonenumbers/ and once installed in your system, you’ll need to import it to your code and use its parse() method. A snippet below…
Python Example 1:
In the first usage, we don’t supply any country code (there are many available, and documentation is available on the site listed above). However, it still was able to extract the country code and the local number into 2 parts as shown here:
Country Code: 44 National Number: 2083661177
In the second usage, we supply a different format and specify United States. It parses it as below correctly:
Country Code: 1 National Number: 4258366117
So, this is very useful for formatting phone numbers from around the globe.
However, in most of my cases, I deal with US data and I don’t want the values to be parsed out separately and have to reconstruct them into one variable…I want to just one variable to store the appropriately formatted string.
So, given a string "1234567890", I want outputs to be one of the following: (123)456-7890;123-456-7890
So, let’s tackle this in Python first, then in Excel formulas (NOTE: You can do most of these from the GUI in Excel also, but using the formulas makes it really scalable,efficient, less error-prone).
Python Example 2:
In the above example, we format the string to this: (123)456-7890
The key is concatenation (+) and Python’s syntax to move the pointer to a specific character location along the string easily. For example [0:3] means: start from the beginning of the string and up to (but not including) the 3rd character. [6:] means start at 6th character and until the end of string (Python calculates the length of the string automatically for this). Note, that these locations are 0-based as in C/C++, so to get to the second character, you’d specify 1.
In the next example, we format the same string to 123-456-7890
Python Example 3:
None of these used any external libraries.
Now on to Excel…
To follow along, I have the un-formatted string in A5, and A8. The results in dash and parentheses formats are shown in C5, and C8 where the formulas are entered. The formulas in those cells are shown in D5 and D8 for explanation.
EXCEL Examples:
The first example (row 5) is easier because we’re only inserting the same single character “-” in two places of A5 value. So, using nested REPLACE() does the job easily. We insert a “-” in the 4th position of the A5’s string. (NOTE: Excel is 1-based array, not 0-based like Python.) Then again with outer REPLACE() in the 8th character, but not the 8th character of the original of course! Because, the inside REPLACE() has modifed A5 value to “206-5556666”, we need to recount this new string’s position and insert the next “-” accordingly (which is exactly in 8th position, after “…555”).
The second example (row 8) is slightly more complicated because we have to insert three different characters: “(“, “)”, and “-“. So, we’ll need 3 REPLACE() functions to do the job.
The first REPLACE() changes the A8's value to "(2065556666".
The next REPLACE() takes that and changes it to "(206)5556666".
The third REPLACE() takes that and changes it to "(206)555-6666".
So, instead of using 3 different cells and formulas and recombining them into the final cell, we can enter this formula in C8: =REPLACE(REPLACE(REPLACE(A8,1,0,"("),5,0,")"),9,0,"-")
Hope you enjoyed these tips, now go play with formatting in Python and Excel 🙂