NOV 1st 2007

Image credit: Node Gardens

The more we use the Internet, the more we realize the necessity of finding new solutions to better organize the growing mass of information. Today we actually have a number of tools to add meaning to the information that we drop all over the Web. Giving that information a meaning that computers can understand allows them to help us better organize things. That's the big idea behind the Semantic Web, an idea which appears more and more obvious to us every day. In this field, we already have many advanced technologies, starting with those offered by the W3C itself: XML, RDF, OWL, etc.

But I have the feeling that something important is still missing to allow the Semantic Web to really take off. To understand what it is, we have to take into account that the Web is constantly changing and that data is continuously added, modified or deleted. Having semantic information doesn't change that; it just challenges the search engines a little more every day to crawl the whole Web constantly in order to detect the slightest change.

It would be so much more efficient (and simple) if Web sites could alert the search engines themselves: "Hi Google, I've changed, please visit me!". Furthermore, it would greatly reduce indexing time, to the point where we reach "real-time search engines." You might say that this is Google's problem and that they earn enough to endlessly expand their infrastructure. You'd be wrong, because it is not only Google's business. The expansion of the Semantic Web should give rise to various specialized Web sites that need to aggregate a large mass of information focused on one field but liable to appear anywhere on the Web. So what, is everybody going to crawl separately?
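To make the "please visit me" idea concrete, here is a minimal Python sketch of an indexer that only re-reads pages that have announced a change. Everything in it is an assumption made for illustration: the `Indexer` class, its `ping` and `process` methods, and the in-memory `web` dictionary that stands in for real HTTP requests. It is not an existing search engine API.

```python
from collections import deque
from typing import Callable, Deque, Dict


class Indexer:
    """Hypothetical search index that is told about changes instead of crawling."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch                    # how to retrieve a page (assumed)
        self.index: Dict[str, str] = {}       # url -> last indexed content
        self.pending: Deque[str] = deque()    # urls that announced a change
        self.fetch_count = 0                  # how many pages were re-read

    def ping(self, url: str) -> None:
        """Called by a Web site: 'I've changed, please visit me!'"""
        if url not in self.pending:           # ignore duplicate pings
            self.pending.append(url)

    def process(self) -> None:
        """Re-index only the pages that pinged since the last run."""
        while self.pending:
            url = self.pending.popleft()
            self.index[url] = self.fetch(url)
            self.fetch_count += 1


# A toy "Web" of three pages; a dict lookup stands in for an HTTP request.
web = {
    "http://example.com/a": "version 1",
    "http://example.com/b": "version 1",
    "http://example.com/c": "version 1",
}
indexer = Indexer(fetch=web.__getitem__)

# Only page "a" changes, and it says so itself.
web["http://example.com/a"] = "version 2"
indexer.ping("http://example.com/a")
indexer.process()
```

In this run the indexer reads one page instead of three. Nothing prevents several specialized indexers from receiving the same ping, which is the point: nobody has to crawl separately.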

But there is much more to this than the search engine issue. Let's imagine that I want to make an online address book. The easiest way consists of storing the data myself; I will save my addresses in any database and that's it. There is another approach, a little bit more complicated but so much more interesting: you can store some kind of links that make it possible to get the information back from the source at any moment. In my address book, for instance, I will have a record directly connected to my favorite restaurant. This restaurant has a Web site which includes exactly what I'm looking for: the restaurant exposes its details in a way that my address book recognizes, such as the hCard format. Therefore I will subscribe my address book to this Web site.

Then, thanks to the subscription, the restaurant's details would be available "on demand" and show up in my address book. Of course, we can optimize the process by keeping a copy of the information in a local database. But let's make it clear: it is only a temporary copy, a kind of "cache" if you like. The real data stays at the source, on the restaurant's Web site. If a change occurs (e.g. the restaurant moves), it will be automatically reflected in my address book.
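Here is a rough Python sketch of that subscription, under a few assumptions of mine: the `Source` and `AddressBook` classes are hypothetical, the contact record is a plain dictionary using the hCard property names `fn` and `adr` in a very simplified way, and a direct function call stands in for whatever notification would travel over the network.

```python
from typing import Callable, Dict, List

Card = Dict[str, str]  # a simplified contact record, loosely inspired by hCard


class Source:
    """Hypothetical stand-in for the restaurant's Web site, which owns the data."""

    def __init__(self, card: Card) -> None:
        self.card = card
        self.subscribers: List[Callable[[Card], None]] = []

    def subscribe(self, callback: Callable[[Card], None]) -> None:
        self.subscribers.append(callback)
        callback(dict(self.card))             # send the current details at once

    def update(self, **changes: str) -> None:
        self.card.update(changes)
        for notify in self.subscribers:       # push the change to every subscriber
            notify(dict(self.card))


class AddressBook:
    """Keeps only a temporary copy; the real data stays at the source."""

    def __init__(self) -> None:
        self.cache: Dict[str, Card] = {}

    def follow(self, name: str, source: Source) -> None:
        def refresh(card: Card) -> None:
            self.cache[name] = card           # overwrite the cached copy
        source.subscribe(refresh)


restaurant = Source({"fn": "Favorite Restaurant", "adr": "1 Old Street"})
book = AddressBook()
book.follow("restaurant", restaurant)

restaurant.update(adr="2 New Street")         # the restaurant moves
```

The address book never edits the record itself. It only replaces its cached copy whenever the source says something has changed, so after the move `book.cache["restaurant"]["adr"]` holds the new address.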

This "data subscription method" seems to be an interesting way to reach a kind of decentralized database able to work on a worldwide scale. But there is a much more essential aspect: the idea of "backlinks." Actually, a subscription comes down to weaving a bidirectional link from one piece of information to another. This very small concept has enormous consequences. The computer would now be able to determine how pieces of data are connected to each other and would suddenly become a lot more intelligent.

Let's take a look at another example to understand exactly what's going on. What if our restaurant wants to collect comments from its customers and show them on its Web site, in a sort of visitors' book for instance? The restaurant would simply add a form to its Web site so the customers could save their comments.

But the restaurant's owner is a lot more ambitious: he also wishes to show the comments that have been stored somewhere else, on other Web sites. Whether they are reviews by food critics or miscellaneous opinions given on the Web, our restaurant's owner would like to display all of them on his own Web site. Unfortunately, there is no easy solution for this on the current Web. There is no easy way to search for all the information related to our restaurant.

This is when the concept of "backlinks" could be very useful. Actually, chances are that the miscellaneous comments spread across the Web already include a link to the restaurant's Web site. But unfortunately, in our old Web, those links are one-way. The restaurant's Web site doesn't even know about them, unless it makes a "link:" request with a search engine or looks at the "HTTP Referers," and both remain unsatisfactory (in the world of blogs, there is something called the Pingback protocol). In any case, let's suppose that the links are bidirectional: when a comment is posted somewhere, the restaurant's Web site is alerted so that it can store the corresponding "backlink." Then you would only have to follow the different links back to find all the comments linked to our famous restaurant.
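As a small illustration, here is what bidirectional links could look like in Python. This is only a sketch under my own assumptions: the `LinkRegistry` class is hypothetical, the URLs are made up, and a single in-memory object stands in for the alert that each linked Web site would receive in a real, distributed setting.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class LinkRegistry:
    """Hypothetical store of two-way links between URLs."""

    def __init__(self) -> None:
        self.outgoing: DefaultDict[str, Set[str]] = defaultdict(set)
        self.incoming: DefaultDict[str, Set[str]] = defaultdict(set)

    def link(self, source: str, target: str) -> None:
        self.outgoing[source].add(target)     # the ordinary one-way link
        self.incoming[target].add(source)     # the backlink: target is "alerted"

    def backlinks(self, target: str) -> List[str]:
        """Return every page known to link to the target, in sorted order."""
        return sorted(self.incoming.get(target, set()))


registry = LinkRegistry()
restaurant = "http://restaurant.example/"
registry.link("http://critic.example/review", restaurant)
registry.link("http://blog.example/dinner", restaurant)
registry.link("http://blog.example/dinner", "http://other.example/")

# All the pages that talk about the restaurant, found without any crawling.
comments = registry.backlinks(restaurant)
```

The only extra work compared with an ordinary link is the second line of `link`, which records the link on the target's side as well. That single extra entry is what lets the restaurant find its comments by itself.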

Relational databases operate no differently: the "backlink" concept prevails there and we couldn't imagine it any other way. But this isn't how the Web works. Is that good or bad? I cannot say... However, if we want to someday achieve the World Wide Database dream, I think we should seriously consider mechanisms that provide subscriptions and "backlinks," thereby allowing semantic information to really "exist."
