June 16th, 2007

There's a lot of talk about new search engines and the promising technologies behind them. One technology that has recently been applied to Web search is natural language processing (NLP). NLP allows search engines such as Hakia and Powerset to return results based on a query's meaning rather than relying on keyword distribution to identify relevant Web documents.

Stochastic search methods retrieve documents containing one or more words specified by the user. Keywords are typically drawn from the body text of a document or from metadata such as the title and author. Stochastic searches frequently employ Boolean strategies to narrow the search and return the best results, or to exclude results the user already knows to be unhelpful. Searches on the three major search engines are accomplished using some type of statistical method for calculating the relevancy of results.
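To make the Boolean keyword retrieval described above concrete, here is a minimal sketch in Python. The toy corpus, document IDs, and helper names (`boolean_and`, `boolean_not`) are invented for illustration; real engines use far larger inverted indexes with the same basic idea.

```python
# Toy corpus: document ID -> text (hypothetical examples).
docs = {
    1: "natural language processing for web search",
    2: "keyword search engines rank web documents",
    3: "semantic web and natural language",
}

# Build an inverted index: word -> set of document IDs containing it.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def boolean_and(*terms):
    """IDs of documents containing every term (Boolean AND)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_not(include, exclude):
    """Documents containing `include` but not `exclude` (AND NOT)."""
    return index.get(include, set()) - index.get(exclude, set())
```

For example, `boolean_and("natural", "language")` returns `{1, 3}`, and `boolean_not("web", "keyword")` excludes the one document that mentions "keyword".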

How does keyword search fall short? It falls short because the relevancy of documents is calculated in part from the occurrences and distribution of keywords, not from what the query actually means. Stochastic search methods return relevant results much of the time; however, there is an enormous amount of improvement still to be made. Those improvements will involve using natural language processing to extract meaning from search queries.
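The occurrence-and-distribution scoring mentioned above is often realized with a TF-IDF weighting. Here is a rough sketch, again with a made-up corpus and query; the function name `tf_idf_score` is our own, not any particular engine's API:

```python
import math

# Toy corpus: document ID -> text (hypothetical examples).
docs = {
    1: "natural language processing for search",
    2: "keyword search and keyword ranking",
    3: "semantic web technologies",
}

def tf_idf_score(query, doc_text, corpus):
    """Score a document against a query by summing TF-IDF per query term."""
    words = doc_text.split()
    n_docs = len(corpus)
    score = 0.0
    for term in query.split():
        tf = words.count(term) / len(words)          # term frequency
        df = sum(1 for d in corpus.values() if term in d.split())
        if df:
            idf = math.log(n_docs / df)              # inverse document frequency
            score += tf * idf
    return score
```

Ranking the corpus for the query "keyword search" puts document 2 first, purely because the keywords occur there most, illustrating how a statistically relevant document need not be the semantically best answer.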

About the author

James Simmons

It's my goal to help bring about the Semantic Web. I also like to explore related topics like natural language processing, information retrieval, and web evolution. I'm the primary author of Semantic Focus and I'm currently working on several Semantic Web projects.
