Algorithms vs. Data: The Seesaw Effect
Published 1 year ago by James Simmons
Over the years I've noticed that the relative importance of algorithms and data tends to shift back and forth, depending on which is hardest to duplicate at the time (often from a business perspective). The effect seems to be caused by the availability of, or demand for, one side increasing or decreasing, which shifts the balance of importance to the other. At one point the world of software was dominated by the proprietary. The organization with the best software (backend, algorithms, etc.) was the dominant entity, and data (from, say, a Web 2.0 perspective) was generally not the focus. This may have been partly the product of a mindset formed during an era with very little storage space and before mass user activity on the Web.
Things have changed, and the word proprietary has become a sort of developer faux pas. Open source has caused a paradigm shift away from the old proprietary software models and has allowed organizations to focus their attention on the other side of the equation: data. As a result of this shift we saw the start of the Web 2.0 era (perhaps with a few years of padding before the phrase started floating around). Now many organizations focus on the data they acquire and how they can leverage it to their advantage, and many have built walled gardens in an attempt to preserve that advantage.
However, we may be seeing another shift, this time back to software once again. The Semantic Web calls for making data open and ubiquitous. This is a strong paradigm shift away from the walled garden mindset (and most people understand this, especially the business set). After writing about the cross-pollination of DBpedia and Freebase, it occurred to me that the project with the most advanced proprietary information extraction algorithms would, in a sense, be the "dominant" project, because it would be able to leverage its software in a space where data is becoming a commodity.
Freebase has a secret sauce, and that is probably its biggest advantage over competing projects. In the Semantic Web/Linked Data Web/Web 3.0 (whatever we feel like calling it at the time), data may decrease in value as it spreads and becomes more commoditized, at least in the original sense of the value it once had: as a tool that only the walled gardens could leverage.
We are seeing the walls come down, possibly to be replaced once again by proprietary algorithms.
Posted by Nick M on October 31, 2008 at 12:32pm
What's the secret sauce?
Posted by Yihong Ding on November 1, 2008 at 1:38pm
James,
It's great to see you coming back to blogging. I agree with you that there is a switch between data and algorithm (actually, I would rather call it a switch between data-focused and service-focused strategies).
In essence, data and algorithm/service are two typical aspects of the human mind. Data represents the static side, and algorithm/service represents the dynamic side. To improve the utility of the Web, we need to improve both sides rather than only one of them. At the same time, however, history never moves in a straight line. Hence the seesaw effect you described in the post.
:-)
Yihong
Posted by Dominiek ter Heide on November 18, 2008 at 11:30pm
Good stuff, I just rediscovered this blog and I'm hooked.
I guess the main role of today's algorithms is to 'do things' with the data. Often people scream that Google is hoarding all the data and that the data is so valuable. However, after the commoditization of data we now have value in the 'activity of data': e.g. the mapping of human attention on the data. So in a way, we have shifted from value in static data to dynamic activity streams.
I think the same thing can be said for software. We have been buying boxes of Microsoft CD-ROMs for 10 years and that era is over (and yes, Microsoft will probably miss the boat). The big thing that Open Source Software did was commoditize software. Big crowds of developers started building things that were equal in quality to proprietary code and for the lowest price possible: free!
So yes, there is definitely value in the 'secret sauce' of Powerset, Freebase and the like, but it will not last! Just as traditional media companies are feeling the huge pain of cheap information, so will 'algorithm protectionists'. How could a Powerset or a Freebase compete with a big developer community that evolves algorithms a lot quicker and in the open? I think that will happen :]
Posted by Ric Old on August 18, 2010 at 2:48pm
I think the secret sauce keeps changing.