Introduction to the Semantic Web Vision and Technologies - Part 2 - Foundations
Published 15 years ago by Cody Burleson
In Part 1 of this series, we introduced the Semantic Web vision set forth by Tim Berners-Lee. We also took a look at the famous layer cake diagram illustrating key technologies that make it possible. This week, we'll be munching around the bottom of the layer cake with a few important points about Unicode, URI, and XML. Below, you may notice that we are presenting a slightly different illustration of the layer cake than last week. The intent is not to confuse you, but rather to point out that there are a variety of interpretations floating around the Web.
Some versions of the cake combine layers into broad concepts for simplicity while others present more detail or include terminology that is less known to the average Joe (no offense, Joe - we know, you're fantastic). They all express the same general model of the core standards and technologies that can be used to develop Semantic Web systems.

If you think the ontology layer looks especially delicious, you are not alone. In my opinion, that is where things get really interesting, but the items below it are essential, and no comprehensive course would be complete without at least touching upon them. Though they are already ubiquitous in the existing Web, their relevance and significance are amplified in the context of the Semantic Web; so let us review.
Semantic Web Foundations
- URI/IRI
- URI is an acronym for Uniform Resource Identifier: a compact string of characters used to identify or name a resource. The URL of a web site (e.g. http://www.semanticfocus.com) is a popular example of a URI. IRI is an acronym for Internationalized Resource Identifier, which generalizes the URI by allowing Unicode characters beyond ASCII, thus becoming more useful in an international context.
- Unicode
- Unicode is the universal character encoding standard and provides a unified system for representing textual data. Its code space has room for more than a million code points (1,114,112 to be exact), enough to specify any character in any language without a single escape sequence or control code. Before Unicode, there were many different, mutually incompatible encoding systems, which made communication and integration across borders a big pain. Now it's so much easier. Shout out to my peeps in Bangalore, 'haaaay' (अरे, दोस्त)!
- XML
- XML is an acronym for Extensible Markup Language. With XML, we have a standard way to compose information so that it can be more easily shared. At the same time, it still affords the freedom to structure that information however the heck we want. It's kind of like HTML - only, you get to make up your own tags and attributes. How cool is that?
- Namespaces
- Namespaces (aka XML Namespaces) are integral to XML. Namespaces provide a means to qualify the tags and attributes in an XML document with URIs which then makes them truly unique on the Web and thus, universal (among other things).
- XML Schema
- XML Schema describes the structure of XML documents just like DTDs, only better: schemas are themselves written in XML, and they support data types and namespaces. A schema written in this language is commonly called an XML Schema Definition (XSD). Basically, if you're going to use XML to invent your own document structures, XSD provides the way to define your rules so that people and machines can understand them, adhere to them, and integrate with them.
- XML Query
- XML Query (aka XQuery) is a standardized W3C language for querying and combining data from XML documents and from other sources that can be viewed as XML, such as databases and Web pages. It is widely implemented and powerful. Its proponents present it as a replacement for proprietary middleware languages and some Web application development languages, and point out that a few lines of XQuery can sometimes do the work of a much longer Java or C++ program.
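To make the XML and Namespaces items above a little more concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The document, the tag names, and the two namespace URIs (http://example.org/library and http://example.org/shop) are made up purely for illustration. The point is that two tags with the same local name, "title", stay distinct because each one is qualified by a different URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tag names and namespace URIs are invented for this example.
DOC = """
<lib:book xmlns:lib="http://example.org/library"
          xmlns:shop="http://example.org/shop">
  <lib:title>Weaving the Web</lib:title>
  <shop:title>Paperback edition</shop:title>
</lib:book>
"""

root = ET.fromstring(DOC.strip())

# Map the prefixes we want to use in queries to the namespace URIs.
# The prefixes are only local shorthand; the URIs are what make a name unique.
NS = {
    "lib": "http://example.org/library",
    "shop": "http://example.org/shop",
}

# Same local name ("title"), different namespaces, so no collision.
lib_title = root.find("lib:title", NS).text
shop_title = root.find("shop:title", NS).text

# ElementTree stores the fully qualified name as "{namespace-uri}local-name".
print(root.tag)
print(lib_title)
print(shop_title)
```

Notice that the parser never sees the "lib:" or "shop:" prefixes as meaningful on their own. It expands them to full URIs, which is exactly the trick that lets vocabularies from different authors be mixed in one document.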
Personally, I think it is sufficient to refer to these foundational items with just a few broad concepts: Unicode, URI, and XML. Unicode gives us a universal system for encoding information in all of the world's writing systems. URI gives us a standard way to identify and locate resources. XML gives us a way to model information uniquely, yet still share it and integrate it in consistent ways. All together, they help us integrate content and services throughout the Web.
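If you would like to see Unicode and URI working together, the following is a small sketch in Python using only the standard library. The host example.org and the /wiki/ path are assumptions chosen for illustration. It takes a Hindi word, shows its Unicode code points and UTF-8 bytes, and then percent-encodes an IRI-style path so that it becomes a plain ASCII URI.

```python
from urllib.parse import quote, unquote, urlsplit

# A Hindi word ("friend") written in Devanagari script.
text = "दोस्त"

# Every character has a Unicode code point, regardless of language.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# UTF-8 is one way to serialize those code points as bytes.
utf8_bytes = text.encode("utf-8")

# An IRI may contain such characters directly. To get an ASCII-only URI,
# the non-ASCII characters are UTF-8 encoded and then percent-encoded.
# The host and path below are hypothetical.
iri_path = "/wiki/" + text
uri = "http://example.org" + quote(iri_path)

# A URI has a regular structure that software can take apart.
parts = urlsplit(uri)

print(code_points)
print(len(utf8_bytes), "bytes in UTF-8")
print(uri)
print(parts.scheme, parts.netloc, unquote(parts.path))
```

The round trip through quote and unquote loses nothing, so the same resource can be named in a human-friendly international form and in a form that older ASCII-only software can handle.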
Much like AJAX was not itself a new technology, but rather just a new paradigm, these foundational technologies are not new to the Web or particularly Semantic. When you get to the higher levels of the layer cake, however, you begin to see just how underutilized they are in the context of today's Web of documents. The way they are interrelated, their elegance, and their overall integration power then assure us that a Web of data was always meant to be.
In the next post in this series, we will start to see this in practice as we use Unicode, URI, and XML together in the Resource Description Framework (RDF). RDF provides a standard for describing resources on the Web which gets us into metadata (data about data) and that is where things start getting particularly Semantic and exponentially more exciting. Until then, enjoy your work and the Web.