Jun 28 2014

Comparing String Values for Similarities

Category: C# | SQL Server � Administrator @ 12:41

Many moons ago I embarked on a proof-of-concept project to see if I could use SQL Server to perform a matching process.

It was successful but I still had a lingering suspicion that the underlying algorithm (Levenshtein) used to determine the sameness between human full names was less than optimal.  So what ensued was a period where I would look around the web, once in a while, to see if I missed something. 

I can't tell you how many algorithms I researched, converted to C# and tested.  Don't ask me why, but anyone who starts with Levenshtein will most likely never find this algorithm since so many others like Levenshtein vie for attention.

I finally found what I was looking for purely by accident:

Strike-A-Match: http://www.catalysoft.com/articles/StrikeAMatch.html

and kudos to the paste bin for the C# version: http://pastebin.com/EfcmR3Xx#

So now if you ever need to compare human full names whether they be in any order like:

  • last name, first name compared to first name, last name

Strike-A-Match will do the trick. Take a look at this comparison of a human name: "Jimi Hendrix" to "Hendrix Jimi"

Using Strike-A-Match will compute that these two are exactly equivalent.

Enough said.

I embedded this into A SQLCLR function and it works like a charm.

Levenshtein and all your brethren really don't get the job done when all you really want to do is compare for similarity.

The web has the brightest ideas but try to look in all the dark corners.

Tags: , ,