Uniform Resource Identifiers

We have seen in previous lectures and exercises that identifiers are very important on the Semantic Web. Identifiers give the things we are describing an unambiguous "name" that is not hindered by the limitations of natural language; they are global and international, and they make sharing easier.

Even before the idea of the Semantic Web, there was an Internet standard for a "Uniform Resource Identifier," or "URI" (usually pronounced "U-R-I" but sometimes treated as a word pronounced "uri"). The rules for the structure of a valid URI are:

the first segment consists of a pre-defined URI scheme name. Some examples of these are: "http", "mailto", "ftp"
that segment is followed by a ":" character
the remainder is specific to the actual URI, and can be very simple or relatively complex.

Some examples of URIs are:

       ldap://[2001:db8::7]/c=GB?objectClass?one
       mailto:John.Doe@example.com 
       news:comp.infosystems.www.servers.unix 
       tel:+1-816-555-1212
       urn:oasis:names:specification:docbook:dtd:xml:4.1.2
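The structural rules above can be illustrated with a small Python sketch that splits each of these examples at its first ":" character. The function name split_uri is hypothetical, and the scheme pattern is written from the syntax in RFC 3986 (a letter followed by letters, digits, "+", "-" or "."); it checks only the scheme, not whether the remainder is valid for that scheme.

```python
import re

# Scheme syntax per RFC 3986: a letter, then letters, digits, "+", "-" or "."
SCHEME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def split_uri(uri):
    """Split a URI into (scheme, remainder) at the first ':' character.

    Hypothetical helper for illustration; it does not validate the remainder.
    """
    scheme, colon, remainder = uri.partition(":")
    if not colon or not SCHEME_PATTERN.fullmatch(scheme):
        raise ValueError(f"no valid scheme in {uri!r}")
    # Scheme names are case-insensitive, so normalize to lowercase.
    return scheme.lower(), remainder


EXAMPLES = [
    "ldap://[2001:db8::7]/c=GB?objectClass?one",
    "mailto:John.Doe@example.com",
    "news:comp.infosystems.www.servers.unix",
    "tel:+1-816-555-1212",
    "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
]

if __name__ == "__main__":
    for uri in EXAMPLES:
        scheme, remainder = split_uri(uri)
        print(f"{scheme:8} {remainder}")
```

Note that only the first ":" matters: the later colons in the "ldap" and "urn" examples belong to the scheme-specific remainder.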

Fortunately, the Semantic Web has settled on one particular URI scheme as its standard, and it is one we know quite well: the "http" URI.

       http://www.ietf.org/rfc/rfc2396.txt

You may be wondering why this is a URI and not a URL. In fact it may be both. Every URL is also a URI, but not every URI points to a location (the "L" in URL) on the web, and an http URI that is used as a name does not have to point to a location either. The reason that the http URI was chosen for linked data, however, is precisely that it can also be a web location. Here is how Tim Berners-Lee stated this succinctly in his "Four rules of linked data":

Use URIs as names for things
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information using the standards (RDF, SPARQL)
Include links to other URIs so that they can discover more things

One of the great advantages of the http URI is that the software of the web already knows what to do with it. When you click on an http URI (or query it with a program, such as when using an API), the web software attempts to resolve the URI to a location and then performs the expected activity, whether that is displaying data in a browser or responding to a program with the requested data. This means that there is a built-in mechanism for you to both name your "thing" and provide information about it with one identifier.

The information that you provide can be for machines (information that helps programs that are using your data) or for humans (explanations of what your terms mean) or, ideally, for both.

For example, if you (as a human) look up

   http://id.loc.gov/authorities/sh85007557

you see a display of the authoritative term, and an icon letting you know that this term is in English, along with a number of other details.

Art, Aboriginal Australian

When a program queries the site with that identifier, it can request the particular type of information and the specific encoding that it needs. The following is only a small part of the full data, but it is the code that corresponds to the human-friendly display above.

RDF code
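One common way for a program to ask for a particular encoding is the HTTP "Accept" header. The Python sketch below prepares two lookups of the same identifier, one asking for a web page and one asking for RDF. It is a minimal illustration and not a description of how this particular site is implemented: the helper names build_lookup and fetch_content_type are hypothetical, and it is an assumption that the server honors these media types. Nothing is sent over the network unless fetch_content_type is called.

```python
from urllib.request import Request, urlopen

# The identifier used in the example above.
THING = "http://id.loc.gov/authorities/sh85007557"


def build_lookup(uri, media_type):
    """Prepare (but do not send) an HTTP GET for `uri` that asks for `media_type`."""
    return Request(uri, headers={"Accept": media_type})


def fetch_content_type(request):
    """Send the request and report the media type the server actually returned.

    Needs network access; the server's response is an assumption, not a guarantee.
    """
    with urlopen(request) as response:
        return response.headers.get("Content-Type")


# Same name for the "thing", two different representations requested.
for_human = build_lookup(THING, "text/html")
for_program = build_lookup(THING, "application/rdf+xml")

if __name__ == "__main__":
    for request in (for_human, for_program):
        print(request.get_method(), request.full_url, request.get_header("Accept"))
```

The point of the design is that the identifier itself never changes; only the requested representation does.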

This lecture continues...
