The Way to Linked Data
by Karen Coyle
Part II: Tools and Techniques
Talk given as ASIST webinar, March 15, 2011
In the first presentation we covered the concept of "triples". Triples are the basic building blocks of semantic web metadata. An important thing to note is that the left-hand element (the SUBJECT) and the middle or linking element (the PREDICATE) must be in the form of Uniform Resource Identifiers, or URIs. These cannot be represented as anything but identifiers. The right-hand element (the OBJECT) is different, however. Its value can be of any data type.
The value of an object can be plain text, structured or typed text (such as dates or integers), a term chosen from a controlled list, or even a URI representing another thing or entity. (Note that the preference in Semantic Web (SW) implementations is to give a URI to each member of a controlled list. We'll see later why this is important.)
Any "thing" that has a URI can be used as a subject or object, and it is this characteristic that allows a web of data to form. Data that is not represented as a URI, however, forms a dead end in terms of linking. It is still valuable data and can be indexed, searched and displayed, but it has limitations in terms of linking.
Because of this you want to design your metadata to use URIs wherever possible. Some data, however, simply must be represented as text.
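A minimal sketch in Python may make the "dead end" point concrete. Everything here is an assumption made for illustration: the example.org URIs are placeholders, the sample values are invented, and the URI and Literal classes are simple stand-ins rather than part of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    """Stand-in for an identifier that other data can link to."""
    value: str


@dataclass(frozen=True)
class Literal:
    """Stand-in for plain text data; it identifies nothing."""
    value: str


# Hypothetical identifiers, for illustration only.
BOOK = URI("http://example.org/book/1")
AUTHOR = URI("http://example.org/person/42")
CREATOR = URI("http://example.org/terms/creator")
TITLE = URI("http://example.org/terms/title")
NAME = URI("http://example.org/terms/name")

# Each triple is (subject, predicate, object).
triples = [
    (BOOK, CREATOR, AUTHOR),                      # object is a URI: can be followed
    (BOOK, TITLE, Literal("An Example Title")),   # object is text: a dead end
    (AUTHOR, NAME, Literal("An Example Author")),
]


def follow(node, triples):
    """Return the triples whose subject is `node`.

    Only a URI can be a subject, so a Literal never leads anywhere.
    """
    if not isinstance(node, URI):
        return []
    return [t for t in triples if t[0] == node]


linked = follow(AUTHOR, triples)                          # more data about the author
dead_end = follow(Literal("An Example Title"), triples)   # nothing to follow
```

Following the author URI leads to more statements about the author. The title, as plain text, can still be indexed and displayed, but nothing can be attached to it.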
Semantic Web Fundamentals
The semantic web development at the World Wide Web Consortium (W3C) has already yielded a number of useful standards. I will present a few of the more common ones here.
RDF - The Resource Description Framework. This is the basic standard for the semantic web, describing the data model that all of the other semantic web standards are built upon. RDF defines the concept of the triple and basic rules that allow this data to function in web space.
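RDFS - The RDF Schema. The Framework is a set of rules but does not itself give you a vocabulary for describing your own data. RDFS provides the basic terms for this, such as classes, properties, and the domains and ranges of properties, so that RDF can be "made real" through applications.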
RDFa - RDF in Attributes. The article announcing Tim Berners-Lee's concept of the Semantic Web appeared in Scientific American in 2001 and described the SW as coded data in web pages. RDFa allows you to include SW data in an XHTML page, consistent with that original vision. (Much linked data today is not found in web pages but has been exported from traditional data stores such as database management systems, and lives on the web without being related to specific web documents.)
OWL - Web Ontology Language. OWL obviously should have been named "WOL" but apparently no one liked that acronym. OWL specifically allows you to define your metadata in a way compatible with the SW. An "ontology" in SW-speak is the description of the knowledge space that your metadata will address. Using OWL you define your entities and all of your elements and relationships. You can include rules governing your data, such as the ones described above on the values that can be used, and some relationships between elements that will facilitate understanding your data in a heterogeneous, mixed-data environment like the Web.
SKOS - Simple Knowledge Organization System. SKOS is a special language for encoding term lists and thesauri. It has built-in relationships like "broader than" and "narrower than." It also provides for preferred and alternate display forms.
SPARQL. SPARQL is the equivalent of SQL designed for the semantic web. It allows the construction of queries for linked data.
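To give a feel for how SKOS relationships and SPARQL-style queries fit together, here is a rough sketch in plain Python. It is not SPARQL and uses no RDF library. The `match` helper is a hypothetical function that imitates a single triple pattern with one variable, and the example.org concepts and their labels are invented. Only the SKOS namespace and the property names prefLabel, altLabel, and broader come from the SKOS standard.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/subjects/"  # hypothetical vocabulary namespace

# Each triple is (subject, predicate, object). Plain strings stand in for URIs,
# and (text, language) tuples stand in for language-tagged text values.
triples = [
    (EX + "cats", SKOS + "prefLabel", ("Cats", "en")),
    (EX + "cats", SKOS + "altLabel", ("Felines", "en")),
    (EX + "cats", SKOS + "broader", EX + "mammals"),
    (EX + "mammals", SKOS + "prefLabel", ("Mammals", "en")),
]


def match(pattern, triples):
    """Return the triples that fit a pattern; None plays the role of a query variable."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# Roughly what a query such as
#   SELECT ?broader WHERE { ex:cats skos:broader ?broader }
# asks for.
broader = [o for _, _, o in match((EX + "cats", SKOS + "broader", None), triples)]

# Because the answer is a URI, we can follow it and read its preferred label.
labels = [o[0] for _, _, o in match((broader[0], SKOS + "prefLabel", None), triples)]
```

A real SPARQL engine combines many such patterns and joins their variables. The sketch shows only the basic idea of leaving one position of a triple open and letting the data fill it in.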
Library Metadata on the Semantic Web
In order to make use of the Semantic Web, it is first necessary to define your metadata in a SW-compatible format, using RDFS and/or OWL. Some library data has been defined in this way, primarily the data elements defined in the RDA standard, as well as the IFLA standards: FRBR, FRAD, and ISBD. Other members of the "FR" family (or "framily") are in development.
These element sets have been defined using the Open Metadata Registry. There is a page listing all of the RDA elements and vocabularies. The IFLA standards can be located by clicking on "Elements" in the right-hand list. They will appear on the first page. (Look around, you might find other interesting metadata.)
Library Vocabularies in Semantic Web Format
Library metadata makes much use of controlled vocabularies, and these are an obvious and relatively simple set of data that can be added to the Semantic Web. Below are some sites with library vocabularies.
- Library of Congress Vocabularies
- RDA Vocabularies and Elements
- FRBR, FRAD, ISBD, and others
- Dewey Summaries
In particular, note how the Dewey summaries can express the different languages into which Dewey has been translated. There is also work being done to translate RDA properties and vocabulary lists. Here is a sample entry with both English and German terms:
The key thing about the RDA Carrier example is that the identifier for RDA Carrier remains the same for both language versions, but displays of terms and definitions (and any other information) can be specific to one language or the other. This brings us much closer to sharing our bibliographic data between communities because we can use exactly the same data without compromising the language needs of our users.
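The same idea can be sketched in a few lines of Python. The identifier below is a hypothetical placeholder, not the real RDA URI, and the labels are illustrative rather than copied from the registry. The `display` function is likewise an assumption about how an application might choose a label.

```python
# Hypothetical identifier for a carrier term; not the actual RDA URI.
CARRIER_TERM = "http://example.org/rda/carrier/1001"

# Illustrative labels only: one identifier, several display languages.
labels = {
    CARRIER_TERM: {"en": "volume", "de": "Band"},
}


def display(uri, lang, fallback="en"):
    """Pick the label for the user's language, falling back to English."""
    by_lang = labels[uri]
    return by_lang.get(lang, by_lang[fallback])


# The shared bibliographic record stores only the identifier.
record = {"carrier": CARRIER_TERM}

german_view = display(record["carrier"], "de")
english_view = display(record["carrier"], "en")
```

The record itself never changes. Only the label chosen at display time differs, which is what allows two communities to share exactly the same data.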
Bibliographic Metadata on the Semantic Web
There are many efforts to make use of bibliographic metadata on the semantic web. A number of these come out of academic communities that are creating repositories of academic works. Of interest is CiTO which is a list of relationships between a citing text and a cited resource. The links below lead to some bibliographic metadata schemas.
- Bibliographic Ontology (BIBO)
- Citation Typing Ontology (CiTO)
- FRBR-Aligned Bibliographic Ontology (FABiO)
There are some metadata sets that can be used widely for common data like geographic names, events, and persons. There isn't nearly enough of this type of metadata, and that means that for many common data types the same information is being coded in many different metadata schemas. Hopefully the range of commonly useful types will increase, because it will make it easier to create metadata for specific applications or functions.
Bibliographic Linked Data - Examples
These are all examples of linked bibliographic data that is visible on the web in both human-readable and machine-consumable formats.
Open Research Online
The Metadata Registry Sandbox
You can experiment with creating vocabularies and metadata elements in the Open Metadata Registry Sandbox. You will need to set up a logon id and password. After that you will see the "(add)" link beside "Vocabularies" and "Elements" on the upper right. Feel free to look at what others have done and to create your own metadata. Once you have filled in the information for an element or term and saved it, you will then be able to see the result in RDF by clicking on the link on the bottom right.