Ever wonder how similar two words are to each other? Quantifiably? As a percentage, like a 50% match or a 100% match? (Whoa, wouldn’t that be great if it worked in the dating world, eh?) Anyway, it’s okay if you haven’t wondered about it, because I did, and here I share a few results using Python.
So, I wrote this small script in Python that uses an amazing class (that’s why Data Scientists love Python!) called “SequenceMatcher”, which lives in the standard library’s difflib module, and with basically 1 line of code I get the answer! Almost cheating, right? Well, efficiency really. Here are the lines of code you can use:
Import the class from the correct module (which is difflib): from difflib import SequenceMatcher
Then I just defined a function so I can call it from code and change up the strings at will:
def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()
And boom! The result is a float between 0 and 1. So a returned value of 0.5 means a 50% match (to show it as a percentage you have to format it yourself; use your favorite syntax for that).
And to call the function, you can write something like: result = similar("abc", "adc")
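Putting the pieces above together, here is a minimal sketch of the whole script. The f-string percent format is just one way to do the formatting; the variable name percent is my own choice for illustration.

```python
from difflib import SequenceMatcher


def similar(a, b):
    # ratio() returns a float between 0.0 (nothing in common) and 1.0 (identical)
    return SequenceMatcher(None, a, b).ratio()


result = similar("abc", "adc")

# Turn the float into a whole-number percentage string
percent = f"{result:.0%}"
print(percent)
```

Swap in any two strings you like; the call stays the same.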
Some Outputs (for fun):
How similar are the following words?
Q: “America” and “Armenia”:
A: 71%
Q: “America” and “Germany”:
A: 43%
Q: “Tony” and “Elena”:
A: 22.2222%
Q: “human” and “mandril”:
A: 50%
Q: “Human” and “Mandril”:
A: 33%
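If you want to reproduce the numbers above yourself, a sketch like this should do it. The list name pairs and the rounding to whole percentages are my own choices, not part of the original script.

```python
from difflib import SequenceMatcher


def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()


# The word pairs from the list above
pairs = [
    ("America", "Armenia"),
    ("America", "Germany"),
    ("Tony", "Elena"),
    ("human", "mandril"),
    ("Human", "Mandril"),
]

for first, second in pairs:
    # .0% multiplies by 100 and rounds to a whole percentage
    print(f"Q: {first} and {second} -> A: {similar(first, second):.0%}")
```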
Explore and have fun with this idea! How will you expand this idea to find more similarities?
Exercise:
Why is the matching percentage for “human”-“mandril” different from the one for “Human”-“Mandril”? 😉
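If you would rather investigate than guess, SequenceMatcher can also tell you which pieces it matched, through get_matching_blocks(). Below is a small sketch; the helper name matched_pieces is hypothetical, something I made up for this post. According to the difflib documentation, ratio() is 2.0*M/T, where M is the number of matched characters and T is the total length of both strings.

```python
from difflib import SequenceMatcher


def matched_pieces(a, b):
    # Hypothetical helper: list the substrings of `a` that SequenceMatcher
    # lined up with `b`. The final block is always a zero-size sentinel,
    # so empty blocks are skipped.
    matcher = SequenceMatcher(None, a, b)
    return [a[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size]


pieces = matched_pieces("abc", "adc")
print(pieces)

# ratio() is 2.0 * M / T: matched characters over total characters
m_total = sum(len(p) for p in pieces)
print(2.0 * m_total / (len("abc") + len("adc")))
```

Try it on the two pairs from the exercise and compare what gets matched in each case.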