Friday, August 28, 2009

Stemming in Search

Stemming is the process of reducing a word to its root form by removing a pattern such as a suffix. For example, 'Search', 'Searches' and 'Searching' have the same origin, and if a user searches with any of these words, content with the keyword 'search' should be included in the results. So when a user searches with the keyword 'Searching', how will you ensure that any content with the keyword 'search' is included in the results? You can do so by applying stemming to the user's search text. If the user searches with 'Searching', the stemming process removes the 'ing' from 'searching' and you get 'search'. You can then use the keyword 'search' to search the content in your system.
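To make the idea concrete, here is a minimal sketch of suffix stripping in Python. It is not the Porter algorithm, only a toy illustration: the function name `naive_stem`, the suffix list and the minimum stem length of three characters are all assumptions chosen for this example.

```python
# Toy suffix list (an assumption for this sketch), checked longest first.
SUFFIXES = ("ing", "es", "s")


def naive_stem(word: str) -> str:
    """Lowercase a word and strip at most one common English suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words such as "is" are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ("Search", "Searches", "Searching"):
    print(w, "->", naive_stem(w))  # all three print "search"
```

A real stemmer handles many more patterns and exceptions than this, so a rule set like the one above should only be treated as a demonstration of the principle.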

The process of applying Stemming is shown in the following diagram:

 

[Image: diagram of the stemming process]

So we can use a stemming library to parse the user's search text and get the root of each word the user provided. We can then search with the stemmed keywords, which increases the chance that the user gets the results they expect. An open source stemming library is available at the following link:

http://tartarus.org/~martin/PorterStemmer/
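The flow described above (stem each word of the search text, then search with the stemmed keywords) could look roughly like the following Python sketch. Everything here is hypothetical: `simple_stem` is only a stand-in for a real Porter stemmer from the library linked above, and `stem_text`, `search` and the sample documents are names and data made up for this example. The sketch also stems the words in the content, which is an assumption about how the matching would be done.

```python
import re


def simple_stem(word):
    """Stand-in stemmer (not Porter): strips 'ing', 'es' or 's'."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_text(text, stem=simple_stem):
    """Split text into lowercase words and return the set of their stems."""
    return {stem(w) for w in re.findall(r"[a-z]+", text.lower())}


def search(query, documents, stem=simple_stem):
    """Return the documents that share at least one stem with the query."""
    query_stems = stem_text(query, stem)
    return [doc for doc in documents if query_stems & stem_text(doc, stem)]


# Hypothetical content, for illustration only.
documents = [
    "How to search a large archive",
    "Cooking pasta at home",
    "Searches that return no results",
]

# Prints the first and third documents; the second has no matching stem.
print(search("Searching", documents))
```

Because `search` takes the stemmer as a parameter, a real Porter stemmer function can be passed in place of `simple_stem` without changing the rest of the code.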
