Webscaled: Data marketplace - Buy and sell data

Tag: Open Calais

Open Calais

The Calais Initiative is almost one month old, and it has already received a large and welcoming response from the development community (1,113 early adopters)! When the team members weren't busy doing interviews or answering hundreds of emails and forum posts, they were coming up with ways to help spread the technology. They will soon be releasing a WordPress plugin, followed by plugins for Drupal, Plone and other content management systems. They also point out that Calais is not only good for named entity extraction, but can extract other facts from documents. An example they give is "what technologies are associated with what company in a document?" Good luck, Calais team!

Eleven months ago I posted a short entry that posed the question of whether the world needed a metadata extraction service. I stated that the service could quickly become the largest repository of metadata (in the form of named entities and facts) on the Web if it stored the resulting metadata from each request. Open Calais seems to me to be the "metadata extraction service" I had in mind; it is a Web service that allows you to automatically annotate content and extract information like facts and named entities (people, places, organizations, and much more) from unstructured text. If that weren't enough of a good thing, Open Calais returns the metadata in RDF.

Although the question of whether we need it still hasn't been answered, I believe this service could be a catalyst for change towards Semantic Web standards if it is integrated into (or used to create plugins for) the multitudes of open source blogs and other CMS software. Open Calais opens the door to the possibility of lowering the barrier enough for everyday users to publish semantic content.

FEB 14th 2008

Open Calais

Open Calais, a new and smart API from Reuters, finally tackles what critics call the greatest obstacle to the Semantic Web: it takes the metadata burden off the end user by providing an automatic meta-tagging tool. The principle behind Open Calais is simple: put in some unstructured text and get nicely structured RDF data in return. Backed by powerful text mining and machine learning techniques, the API automatically detects entities such as persons, events and countries, along with other facts.
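To make the "text in, RDF out" idea concrete, here is a toy sketch in Python. It is not the Open Calais API and does not call it: the lookup table, the `extract_triples` function and the `http://example.org/` namespace are all made up for illustration, and a simple dictionary match stands in for the text mining and machine learning a real service would use. It only shows what it might look like to turn plain text into RDF statements (here in N-Triples form).

```python
import re

# Hypothetical mini-gazetteer; a real service would use trained models instead.
GAZETTEER = {
    "Reuters": "Organization",
    "France": "Country",
    "Tim Berners-Lee": "Person",
}

BASE = "http://example.org/"  # placeholder namespace, not a Calais URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def extract_triples(text):
    """Return N-Triples lines that type each known entity found in text."""
    triples = []
    for name, kind in GAZETTEER.items():
        # Whole-word match so a name is not found inside a longer word.
        if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", text):
            subject = BASE + "entity/" + name.replace(" ", "_")
            triples.append(f"<{subject}> <{RDF_TYPE}> <{BASE}type/{kind}> .")
    return triples


if __name__ == "__main__":
    for line in extract_triples("Reuters opened an office in France."):
        print(line)
```

Each output line is one subject-predicate-object statement, which is the kind of structure that other programs can query and combine without reading the original prose.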

Open Calais takes account of the fact that the added value of content is hidden in its structure. Uncovering that structure and representing it in an interoperable format makes existing resources more programmable and reusable.

But what is in it for Reuters? Nothing less than the biggest structured content repository on the Web. Shouldn't we talk about this little fact as well?
