Sql Server Full Text: Human names which sound alike
I have a database with lots of customers in it. A user of the system wants to be able to look up a customer’s account by name, amongst other things.
What I have done is create a new table called CustomerFullText, which just has a CustomerId and an nvarchar(max) field “CustomerFullText”. In “CustomerFullText” I keep concatenated together all the text I have for the customer, e.g. First Name, Last Name, Address, etc, and I have a full-text index on that field, so that the user can just type into a single search box and gets matching results.
I found this gave better results that trying to search data stored in lots of different columns, although I suppose I’d be interested in hearing if this in itself is a terrible idea.
Many people have names which sound the same but which have different spellings: Katherine and Catherine and Catharine and perhaps someone who’s record in the database is Katherine but who introduces themselves as Kate. Also, McDonald vs MacDonald, Liz vs Elisabeth, and so on.
Therefore, what I’m doing is, whilst storing the original name correctly, making a series of replacements before I build the full text. So ALL of Katherine and Catheine and so on are replaced with “KATE” in the full text field. I do the same transform on my search parameter before I query the database, so someone who types “Catherine” into the search box will actually run a query for “KATE” against the full text index in the database, which will match Catherine AND Katherine and so on.
My question is: does this duplicate any part of existing SQL Server Full Text functionality? I’ve had a look, but I don’t think that this is the same as a custom stemmer or word breaker or similar.
One Solution collect form web for “Sql Server Full Text: Human names which sound alike”
Rather than trying to phonetically normalize your data yourself, I would use the Double Metaphone algorithm, essentially a much better implementation of the basic SOUNDEX idea.
You can find an example implementation here: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=13574, and more are listed in the Wikipedia link above.
It will generate two normalized code versions of your word. You can then persist those in two additional columns and compare them against your search text, which you would convert to Double Metaphone on the fly.