By Karen Coyle
Published in the Journal of Academic Librarianship, v. 33, n. 4, July, 2007, pp. 512-514
"Who are you?" said the Caterpillar.
This was not an encouraging opening for a conversation. Alice replied, rather shyly, "I - I hardly know, sir, just at present -- at least I know who I was when I got up this morning, but I think I must have been changed several times since then."1
In this era of remote services to library users, it is essential that libraries be able to identify their users, wherever they are and however they connect to the library's services. It is also essential that users be able to carry their identity with them as they move among information resources. The need to protect user privacy in accordance with state laws and library policy is a significant factor in developing appropriate identification technologies. Various efforts in the commercial web standards environment and in the academic standards around Internet2 are working to solve this thorny problem.
The Internet was designed as a computer-to-computer protocol. While we often talk of "being on the Internet," the fact is that, at least as far as the Internet is concerned, there are no persons on it; there are only computers. That any of those computers can have anywhere from none to thousands of humans using it is irrelevant to the Internet itself. So what does it mean to be "on" the Internet? And who are you?
Even if you log onto an account on your computer, that account information is not associated with any of the tasks that you perform on the Internet. When you request a Web page by typing "http://www.example.com" into your browser, the computer you are sitting at and the computer that will fulfill the request communicate by exchanging Internet addresses, and those addresses identify only the two computers. Even when you send email, the mail goes anonymously from one computer to another. This is one of the main Internet "flaws" that has allowed spam to happen: your email account is invisible to the Internet, and only becomes visible within the mail program of the receiving computer. The Internet itself sees email as just another packet of data to move along to its destination, and makes no connection between the origin of the packet and the email address of the sender. This is what allows services like web mail to work, since you can sign on to your email account from any computer on the Internet, but it also makes it very difficult to prevent spammers from falsifying email "from" addresses.
This anonymity of individual users on the Internet has led to some rather awkward work-arounds. The cookie file that many Internet sites store on your computer is needed to create an identity for you with that site.2 By placing a cookie on your computer with a unique identifier in it, the Internet site can read that file each time you visit the site and know, at least, that visitor 9836509234798 is the same 9836509234798 that visited two days ago and searched for cashmere sweaters in size large. You may not recognize yourself in the identifier, but for the company at the other end of the Internet connection, the customer identity is unique and precise.
Well, almost. Because the problem with that identity is that once again it is associated with a computer, not a person. If the whole family uses a single computer and a single computer account, then your family has a kind of composite identity. Your customer profile may depict someone who wears women's sweaters size large, smokes cigars, and plays with dolls. This identity can serve some purposes, but it wouldn't get you on an airplane in this day and age.
In most of our libraries, users have been authenticated at the time that they apply for their library card, or they have an institutional identity (student, staff) that is also used by the library. Libraries sign contracts for online access to materials that limit this access to the library's immediate users. It generally isn't terribly difficult to identify users who come into the library, whether physically or virtually. But what happens when the library and the user meet in cyberspace? How can the library know to provide its services to a user who hasn't entered through the library's virtual doors?
For users in the library or with access through the institution's local network, the Internet addresses of the computers on that network can identify the use as coming from the institution, and that is sufficient for some license agreements. For institutions that are not centralized, such as multi-campus universities or regional consortia, maintaining the list of their many different ranges of Internet addresses and conveying it to the vendor in a timely manner can be quite burdensome. Some institutions use a proxy server to funnel all outgoing traffic through a single point, so the vendor only needs to know the address of one or a few servers.
A proxy server takes care of users on the local area network, but it doesn't easily provide access for remote users of the library's services. Remote users almost always have to log on with a user name and password in order to be authenticated for the library's services. This log-on usually gets the user onto a server on the local institutional network, from which the user appears to the vendor system to be the same as someone using a computer that is physically on the local network. But this requires the library to manage user passwords, with all of the headaches that entails. It also often requires users to have multiple sign-ons, one for each of the institution's services: one for e-mail, another for the library services, yet another for course-specific sites. Keeping track of all of these passwords, not to mention the many non-institutional ones that we all have to manage, is an added complexity for our users.
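The address-based check a vendor performs can be sketched with Python's `ipaddress` module. The ranges below are hypothetical: they are taken from blocks reserved for documentation and stand in for the list an institution would register with a vendor.

```python
import ipaddress

# Hypothetical ranges a multi-campus institution registers with a vendor.
# All come from blocks reserved for documentation, not real networks.
LICENSED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # campus A
    ipaddress.ip_network("198.51.100.0/25"),  # campus B (first half of the block only)
    ipaddress.ip_network("203.0.113.10/32"),  # the library's proxy server
]


def is_licensed(client_ip: str) -> bool:
    """True if the request's source address falls inside any registered range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in LICENSED_NETWORKS)


for ip in ("192.0.2.44", "198.51.100.200", "203.0.113.10", "2001:db8::1"):
    print(ip, is_licensed(ip))
```

With a proxy server in place, the vendor's list could shrink to a single entry like the `/32` above, because every outgoing request appears to come from the proxy. A user working from home, outside all of these ranges, simply fails the check.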
In addition, the institution's authentication of legitimate users is a complex affair. The university population is quite fluid -- there are always some students and staff changing their institutional status on any day of the year. Within a consortium users may move from one institution to another in the course of their research. Institutions also have a variety of guest users, from visiting scholars to members of the general public. Managing this within multiple departments such as the library, admissions, and the campus technology service, is very inefficient. The holy grail of user authentication is a single sign-on (SSO) solution that provides a centralized authentication of users for all services.
We tend to think of our identity as some combination of our name, address, and various numbers like library card numbers and credit card numbers. Identity is who you are as an individual. This kind of identity is necessary for some transactions like purchases or managing your bank account. But what is the identity that is needed to make use of library services? It essentially consists of two parts. The first is authentication, that is, who you are in relation to the library. Are you, the individual, someone who belongs to the library's stated community? This is your entry into the library's services. It is also required for you, as an individual, to borrow physical items from the library.
The second part of this, however, is your identity to the vendor with whom the library has licensed services. In a sense, the library itself acts as a kind of proxy identity for its users. When a library user logs onto a vendor database, all that vendor needs to know is that the incoming user is a legitimate, authorized member of the library with which it has a contract. It is your group identity that gets you in the door. After you are in, you may engage in personalized services, such as setting up a personal profile or emailing results or documents to your personal email account, thus revealing your "true" identity, but you may also be able to remain anonymous if you eschew those personalizations.
So when we talk about identity in relation to library services, we are talking about a two-step process. The first step is between the individual and the library (or the larger institution, the university). The second step is between the authorized member of the group and non-local institutions. So we need an identity solution that can provide both of those steps, and, ideally, will give us the coveted single sign-on solution. The requirements for a viable identity solution are:
And in the academic world there is another requirement:
The Security Assertion Markup Language (SAML) is a standards project of the OASIS Security Services Technical Committee.3 OASIS (officially the Organization for the Advancement of Structured Information Standards) creates e-business standards that promote interoperability in the electronic marketplace. SAML is designed to be secure, to accurately identify users, and to carry the data elements necessary to provide a range of customer services. SAML is not an application; it is a standard for exchanging authentication and authorization data between security domains, that is, between an identity provider (a producer of assertions) and a service provider (a consumer of assertions).
SAML can be used to provide a variety of identity services, including single sign-on. When a user signs on to the identity provider service, that service creates an authentication assertion for that session. From that point on in the session, other services are able to query the identity provider service for the particular assertion that is needed to log the user on without having to re-authenticate him. Clearly all of this must take place within a secure environment where the communication between the identity provider and the service providers is trustworthy.
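The single sign-on flow can be illustrated with a toy, in-memory model. This is not SAML: the classes `IdentityProvider` and `ServiceProvider` and their methods are hypothetical, and the secure, signed channel that a real deployment requires is deliberately left out. What the sketch shows is the shape of the exchange: one password check, then an assertion issued to each service for the life of the session.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class IdentityProvider:
    """Toy identity provider: checks a password once, then vouches for the session."""

    # user -> password; kept in plain text only to keep the sketch short
    passwords: dict[str, str]
    # session id -> authenticated user
    sessions: dict[str, str] = field(default_factory=dict)

    def log_in(self, user: str, password: str) -> str:
        """Authenticate once and open a session; returns an opaque session id."""
        if self.passwords.get(user) != password:
            raise PermissionError("authentication failed")
        session_id = secrets.token_urlsafe(16)
        self.sessions[session_id] = user
        return session_id

    def assert_identity(self, session_id: str, audience: str) -> dict:
        """Issue an authentication 'assertion' for an open session, no password needed."""
        user = self.sessions.get(session_id)
        if user is None:
            raise PermissionError("no active session")
        return {"subject": user, "audience": audience, "authenticated": True}


class ServiceProvider:
    """A service that trusts the identity provider instead of keeping its own passwords."""

    def __init__(self, name: str, idp: IdentityProvider):
        self.name = name
        self.idp = idp

    def handle_request(self, session_id: str) -> str:
        assertion = self.idp.assert_identity(session_id, audience=self.name)
        return f"{self.name}: welcome, {assertion['subject']}"


idp = IdentityProvider(passwords={"alice": "rabbit-hole"})
sid = idp.log_in("alice", "rabbit-hole")  # the only password prompt
for name in ("email", "courseware", "library"):
    print(ServiceProvider(name, idp).handle_request(sid))
```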
Systems based on SAML can be used to serve authentications based on attributes as well as individual identity. This means that a specific identity can be served to an e-mail service, but an attribute that translates to "authenticated member of XY University library users" can be used to provide a sign-on to a remote database licensed by the library for its community.
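The difference between releasing an identity and releasing an attribute can be sketched as a per-service release policy at the identity provider. The attribute names, values, and service names below are hypothetical; production systems express such policies in their own configuration formats.

```python
# Hypothetical record held only by the identity provider (the university).
USER_RECORD = {
    "principal": "jdoe@xy.example.edu",
    "display_name": "Jane Doe",
    "affiliation": "member@xy.example.edu",
}

# Hypothetical release policy: the attributes each service may receive.
RELEASE_POLICY = {
    "campus-email": {"principal", "display_name"},
    "licensed-database": {"affiliation"},
}


def release_attributes(record: dict, service: str) -> dict:
    """Return only the attributes the policy allows for this service (none if unknown)."""
    allowed = RELEASE_POLICY.get(service, set())
    return {key: value for key, value in record.items() if key in allowed}


def database_authorizes(attributes: dict) -> bool:
    """The vendor checks group membership, never the individual's identity."""
    return attributes.get("affiliation") == "member@xy.example.edu"


print(release_attributes(USER_RECORD, "campus-email"))
vendor_view = release_attributes(USER_RECORD, "licensed-database")
print(vendor_view, database_authorizes(vendor_view))
```

Defaulting to an empty set means a service missing from the policy learns nothing, which is the privacy-preserving choice.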
Although designed for e-commerce, SAML has many of the elements that libraries and universities need to fulfill their identification functions. SAML breaks the identification process down into two roles: the creation of an identity, with information about that identity (called an assertion), and the use of the identity. In the academic environment, the university would most likely be the identity provider because it is the source of information about the person's affiliation with the institution and any relevant status information (for example, staff, student, or faculty). The consumers of that identity could be many, from the university systems themselves (e-mail, courseware, class assignment systems, the library) to affiliated systems outside the university, like the external database vendors whose services the library has licensed for institutional use. Once the user has logged on to a system that has implemented the SAML standard, each separate system that the user approaches is able to query the identity provider to obtain the information it needs to authorize that user for its service.
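On the consuming side, a service provider reads the assertion the identity provider sends. The sketch below parses a trimmed SAML 2.0 assertion with Python's `xml.etree.ElementTree`. The element names and namespace follow the SAML 2.0 assertion schema, but the issuer URL, ID, and attribute name and value are hypothetical, and a real assertion would also carry a digital signature, validity conditions, and an authentication statement, all of which a real consumer must check.

```python
import xml.etree.ElementTree as ET

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# Trimmed, unsigned assertion for illustration only (hypothetical values).
ASSERTION_XML = """\
<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
                ID="_a1b2c3" Version="2.0" IssueInstant="2007-07-01T12:00:00Z">
  <saml:Issuer>https://idp.xy.example.edu/shibboleth</saml:Issuer>
  <saml:Subject>
    <saml:NameID Format="urn:oasis:names:tc:SAML:2.0:nameid-format:transient">_f81d4fae</saml:NameID>
  </saml:Subject>
  <saml:AttributeStatement>
    <saml:Attribute Name="affiliation">
      <saml:AttributeValue>member@xy.example.edu</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>
"""


def read_assertion(xml_text: str) -> dict:
    """Extract the issuer, the (opaque) subject id, and the attributes from an assertion."""
    root = ET.fromstring(xml_text)
    issuer = root.findtext("saml:Issuer", namespaces=SAML_NS)
    subject = root.findtext("saml:Subject/saml:NameID", namespaces=SAML_NS)
    attributes = {}
    for attr in root.iterfind("saml:AttributeStatement/saml:Attribute", SAML_NS):
        values = [value.text for value in attr.iterfind("saml:AttributeValue", SAML_NS)]
        attributes[attr.get("Name")] = values
    return {"issuer": issuer, "subject": subject, "attributes": attributes}


print(read_assertion(ASSERTION_XML))
```

The transient `NameID` is an opaque, short-lived identifier, so the service can tell requests in one session apart without learning who the user is.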
Shibboleth4 is open source authentication and identification software being developed as part of the Internet2 Middleware Initiative.5 It uses the SAML standard to address the needs of academic institutions to authenticate their users and to communicate an identity to a variety of organizations in a way that protects the privacy of users. In particular, Shibboleth is designed to integrate with a web browser, since that is the primary portal through which users access services on the web.
With Shibboleth, authentication takes place at the home institution, which is the place where personally identifying information is stored, but the permissions associated with the log-on can be used anywhere in the network. This means that user information is held in only one place, and is not being promulgated around the network. Services receive only the authentication, not the user's information, and in most cases the remote service knows only that the user is a member of a particular institution or user group.
Shibboleth is more than a replacement for a proxy server. It can authenticate users at a more granular level than an IP address: any user can be a member of one or more groups that define levels or types of privilege. The "professor" privilege can allow posting to the curriculum portion of a courseware system, while the "Fall class 102B" privilege can give members of the class access to the course work online. Those same individuals will also be members of the "XY Institution" group for purposes of accessing online vendor databases, and when accessing those databases only their institutional affiliation will be communicated.
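Group-based privileges of this kind can be sketched as a small rule table. The user names, the rule keys, and the `EXTERNAL_SERVICES` set are hypothetical; the group names are the ones used in the paragraph above. Note that the external vendor is only ever shown the institutional group, so it can authorize access without seeing course or faculty membership.

```python
# Hypothetical group memberships asserted by the home institution.
GROUPS = {
    "prof_smith": {"professor", "Fall class 102B", "XY Institution"},
    "student_lee": {"Fall class 102B", "XY Institution"},
}

# Hypothetical rules: the group that grants each (service, action) pair.
RULES = {
    ("courseware", "post_curriculum"): "professor",
    ("courseware", "read_coursework_102B"): "Fall class 102B",
    ("vendor_database", "search"): "XY Institution",
}

EXTERNAL_SERVICES = {"vendor_database"}


def groups_released_to(user: str, service: str) -> set[str]:
    """External vendors learn only the institutional affiliation, not other groups."""
    groups = GROUPS.get(user, set())
    if service in EXTERNAL_SERVICES:
        return groups & {"XY Institution"}
    return groups


def is_allowed(user: str, service: str, action: str) -> bool:
    """Grant the action only if the group it requires is among those released to the service."""
    required = RULES.get((service, action))
    return required is not None and required in groups_released_to(user, service)


print(is_allowed("prof_smith", "courseware", "post_curriculum"))
print(is_allowed("student_lee", "courseware", "post_curriculum"))
print(groups_released_to("prof_smith", "vendor_database"))
```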
The services that share information through Shibboleth are members of a "federation," a group of services that agree to communicate the authentication information and trust each other's communications. This neatly matches the kinds of service and license agreements that libraries have with their data vendors, and makes Shibboleth a highly attractive solution in that environment.
Although not yet widespread, Shibboleth has been implemented on some university campuses,6 and continues to evolve as an application. If you want to try it yourself, the open source software is available for download from the Shibboleth site, http://shibboleth.internet2.edu.
1 Carroll, Lewis. Alice's Adventures in Wonderland. Sterling Publishing Company, 2004. p. 59
2 http://www.howstuffworks.com/cookie.htm
3 http://www.oasis-open.org/committees/security/
4 http://shibboleth.internet2.edu/
5 http://middleware.internet2.edu/
6 http://shibboleth.internet2.edu/community.html
The copyright in this article is NOT held by the author. For copyright-related permissions, contact Elsevier Inc.