Thursday, September 28, 2023

How Similar are Two Words?

Ever wondered how similar two words are to each other? Quantifiably? As a percentage? As in a 50% match, a 100% match, and so on? (Whoa, wouldn’t it be great if that worked in the dating world, eh?) Anyway, it’s okay if you haven’t wondered about it, because I did, and here I share a few results using Python.

So, I wrote this small script in Python that uses an amazing class called “SequenceMatcher” from the standard library’s difflib module (that’s why data scientists love Python!), and with basically one line of code I get the answer! Almost cheating, right? Well, efficiency really. Here are the lines of code you can use:

Import the class from the correct module (which is difflib): from difflib import SequenceMatcher

Then I just defined a function so I can call it from code and change up the strings at will:

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

And boom! Your result comes back as a float between 0 and 1. So, a returned number of 0.5 (or “0.500” once you format it; look up your favorite syntax for that) means a 50% match.
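If you don’t have a favorite formatting syntax yet, here is a small sketch using f-strings. The variable names and the hard-coded 0.5 are just placeholders for illustration; in practice the number would come from ratio().

```python
# Stand-in for a value returned by SequenceMatcher(...).ratio()
score = 0.5

as_decimal = f"{score:.3f}"  # fixed to three decimal places
as_percent = f"{score:.0%}"  # multiplies by 100 and appends a % sign
as_precise = f"{score:.1%}"  # same, but keeps one decimal place

print(as_decimal, as_percent, as_precise)  # 0.500 50% 50.0%
```

The % format type does the multiply-by-100 for you, so there is no need to do that arithmetic by hand.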

And to call the function, you can write something like: result = similar("abc", "adc")
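Putting the pieces together, a complete script might look like the sketch below. The docstring and the print formatting are my additions for illustration; the import, the function and the call are the ones described above.

```python
from difflib import SequenceMatcher


def similar(a, b):
    """Return a similarity score between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


result = similar("abc", "adc")

print(f"{result:.3f}")  # 0.667
print(f"{result:.0%}")  # 67%
```

Two of the three characters line up (“a” and “c”), which is why this pair lands at about a 67% match.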

Some Outputs (for fun):

How similar are the following words?

Q: “America” and “Armenia”:

A: 71%

Q: “America” and “Germany”:

A: 43%

Q: “Tony” and “Elena”:

A: 22.2222%

Q: “human” and “mandril”:

A: 50%

Q: “Human” and “Mandril”:

A: 33%
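If you want to reproduce these numbers yourself, a loop like the following should do it. This is only a sketch: the names pairs, percentages, blocks and matched are mine. The second half peeks at where one score comes from, using the fact that difflib documents ratio() as 2*M/T, where M is the number of matched characters and T is the total length of both strings.

```python
from difflib import SequenceMatcher

pairs = [
    ("America", "Armenia"),
    ("America", "Germany"),
    ("Tony", "Elena"),
    ("human", "mandril"),
    ("Human", "Mandril"),
]

percentages = []
for a, b in pairs:
    score = SequenceMatcher(None, a, b).ratio()
    percentages.append(f"{score:.0%}")
    print(f"{a} and {b}: {score:.0%}")

# Break one score down: which chunks matched, and how many characters?
a, b = "America", "Armenia"
matcher = SequenceMatcher(None, a, b)
# The last block returned is always a zero-size sentinel, so skip it.
blocks = [m for m in matcher.get_matching_blocks() if m.size > 0]
matched = sum(m.size for m in blocks)

print([a[m.a:m.a + m.size] for m in blocks])  # matched chunks of "America"
print(2 * matched / (len(a) + len(b)))        # same value as matcher.ratio()
```

Running the same breakdown on other pairs is a handy way to see exactly which letters the matcher lined up.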

Explore and have fun with this idea! How will you expand it to find more similarities?

Exercise:

Why does the “human”-“mandril” matching percentage differ from the “Human”-“Mandril” one? 😉