Stochastic (statistical) search is on the way out
Published 17 years ago by James Simmons
There's a lot of talk about new search engines and the promising technologies behind them. One technology that has recently been applied to Web search is natural language processing (NLP). NLP allows search engines such as Hakia and Powerset to return results based on a query's meaning rather than relying on keyword distribution to identify relevant Web documents.
Stochastic search methods retrieve documents containing one or more words specified by the user. Keywords are typically drawn from the body text of a document or from metadata such as its title and author. Stochastic searches frequently employ Boolean operators to maximize the efficiency of a search and return the best results, or to exclude results the user knows to be unhelpful. Searches on the three major search engines all rely on some type of statistical method for calculating the relevancy of results.
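To make this concrete, here is a minimal sketch of Boolean keyword retrieval over a toy inverted index. It is not how any of the major engines actually implement search; the function names, documents, and queries are all hypothetical, chosen only to illustrate AND and AND NOT matching:

```python
from collections import defaultdict

def build_index(docs):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Ids of documents containing every query term (Boolean AND)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_not(index, include, exclude):
    """Documents containing `include` but not `exclude` (Boolean AND NOT)."""
    return index.get(include, set()) - index.get(exclude, set())

# Hypothetical corpus for illustration only
docs = {
    1: "semantic web search",
    2: "keyword search engine",
    3: "natural language processing",
}
index = build_index(docs)
print(boolean_and(index, "search", "engine"))   # {2}
print(boolean_not(index, "search", "keyword"))  # {1}
```

Note that the matching is purely lexical: document 3 is about search in the broader sense, but it never contains the token "search", so no Boolean query for that word will ever find it.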
How does keyword search fall short? It falls short because the relevancy of documents is calculated, in part, from the occurrence and distribution of keywords. Stochastic search methods return relevant results much of the time, but there is an enormous amount of room for improvement. Those improvements will involve using natural language processing to extract meaning from search queries.
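One common statistical relevancy measure of this kind is TF-IDF, which scores a document by how often the query terms occur in it, discounted by how widely those terms are distributed across the corpus. The sketch below is an assumption about the general technique, not any particular engine's ranking formula, and the sample documents are invented:

```python
import math
from collections import Counter

def tf_idf_rank(query, docs):
    """Rank documents by summed TF-IDF of the query terms.

    TF counts a term's occurrences within a document; IDF down-weights
    terms that appear in many documents, so the score reflects both the
    occurrence and the distribution of keywords -- but never their meaning.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, words in tokenized.items():
        tf = Counter(words)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for w in tokenized.values() if term in w)
            if df:
                score += (tf[term] / len(words)) * math.log(n_docs / df)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical corpus for illustration only
docs = {
    1: "web search engine ranking",
    2: "search search search spam",
    3: "semantic meaning of a query",
}
print(tf_idf_rank("search engine", docs))
```

Document 2 scores well simply by repeating the keyword, while document 3, which is actually about query meaning, scores zero; that gap between counting words and understanding them is exactly what NLP-based search aims to close.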