Data, Raw and Cooked

By Karen Coyle

PREPRINT: Published in the Journal of Academic Librarianship, v. 33, n. 5, September, 2007, pp. 602-603

One advantage of computerization is that it allows us, at least potentially, to gather statistical data on a wide variety of operations that take place in or through the library. Whether we make use of them or not, our systems keep counts of actions such as the number of searches against the library catalog or the number of automated holds requested by users. Having this data does not automatically make it useful for library management, however. Fortunately, a number of standards and vendor initiatives aim to derive meaning from the numbers.

Traditional Statistics

Libraries have a fairly long tradition of putting numbers to their collections and services. The American National Standard for Information Services and Use: Metrics & Statistics for Libraries and Information Providers - Data Dictionary, ANSI/NISO Z39.7,1 was first published as a standard in 1968. This standard provides definitions for the data categories of library statistics so that reporting from different libraries will be comparable. These statistics have changed over time as libraries themselves have changed.

In the academic library world, the main statistical effort is that of the Association of Research Libraries (ARL),2 which has been gathering statistics from ARL member libraries since 1961. Some earlier statistics date back to 1908, making this the longest-running set of statistical data for libraries. The ARL statistics emphasize collection size, library expenditures, staffing, and services. A subset of the data is used to calculate the "ARL index," a numerical score that also defines the criteria for ARL membership. Sorting libraries by this score produces a ranking of North American research libraries, and a library's ARL rank is one important measure of its stature in the research library arena.

Measuring Quality

As we move further from traditional materials and services toward providing materials in digital formats and through licensed services rather than physical holdings, the traditional ARL statistics become less and less useful as a measure of a library's value to researchers. The emphasis is also shifting away from measuring the library as a set of objects and toward measuring the user base and user activities. ARL recognizes this and has launched additional data-gathering efforts, collectively called its "New Measures Initiative,"3 to respond to the new environment. These efforts cannot rely entirely on automated statistics; they use surveys to gather additional data from users.

Measuring the Impact of Networked Electronic Services (MINES™)4 collects data on the use of electronic resources and the demographics of users. It gathers user status (faculty, graduate student, etc.) and the research purpose that brought the user to the library, either physically or virtually. Data gathered in MINES allows libraries to measure the impact of grant-funded research on library use, a key measure for allocating the library's portion of grant-derived budgets.

Another ARL survey, LibQUAL+™,5 is designed to solicit the library user's view of service quality. Unlike the traditional ARL statistics, which serve mainly as a comparative ranking of libraries, the LibQUAL+ data provides feedback that helps the library understand how it is perceived by its users. Libraries can use this feedback to improve their services.

Networked Resources

The term "library holdings" originally meant those items sitting on the library's shelves. Today's library provides access to digital materials, only some of which are actually owned by the library. These digital, networked resources, however, are some of the most highly used materials in the academic library. Unlike physical materials, which the library purchases once, most digital resources are licensed for a period of time. Because of this, the library has the opportunity to make new decisions about which materials require its continuing investment and which materials are not providing the "bang for the buck."

Project COUNTER (Counting Online Usage of Networked Electronic Resources) was launched in 2002 to standardize the usage statistics of online materials, much as NISO Z39.7 and ARL have standardized statistics about physical libraries. This standardization means that libraries can combine and compare statistics from their online vendors and use them to make better licensing decisions. COUNTER is an international initiative supported by publisher and library organizations. COUNTER-compliant statistics are now provided by online data vendors as well as by intermediaries such as OpenURL link resolvers.
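To give a concrete (and hypothetical) sense of how comparable reports feed a licensing decision, the short Python sketch below computes a cost-per-use figure for two licensed packages. It assumes COUNTER-style journal reports already exported to CSV; the file names, the column label, and the cost figures are placeholders for illustration, not part of any COUNTER specification.

    import csv

    # Hypothetical annual license costs per package; real figures would
    # come from the library's acquisitions records.
    license_costs = {"PackageA": 12000.00, "PackageB": 8500.00}

    def total_fulltext_requests(report_path):
        # Sum the requests recorded in a COUNTER-style journal report
        # exported to CSV; the column name is an assumption for this sketch.
        total = 0
        with open(report_path, newline="") as report:
            for row in csv.DictReader(report):
                total += int(row["Total Full-Text Requests"])
        return total

    for package, cost in license_costs.items():
        uses = total_fulltext_requests(package.lower() + "_report.csv")
        if uses:
            print(f"{package}: {uses} uses, ${cost / uses:.2f} per use")
        else:
            print(f"{package}: no recorded use")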

Gathering the statistics, however, is only one step; the data also needs to reach libraries on a regular schedule and with the greatest efficiency. This problem is addressed by a recent NISO standard named SUSHI (Standard Usage Statistics Harvesting Initiative).6 SUSHI provides a protocol that allows libraries to create an automated service that retrieves their statistics on a regular schedule with no human intervention. SUSHI was designed expressly to deliver standard COUNTER statistics but has provisions for adding other forms of statistics that conform to its defined structure. With SUSHI, libraries have the possibility of getting an ongoing view of materials use.
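In practice, a SUSHI client is little more than a scheduled job that posts a report request to each vendor's SUSHI service and files away whatever report comes back. The Python sketch below is a simplified illustration, not a complete SUSHI client: the endpoint URL and identifiers are placeholders, and the request body reduces the protocol's full XML schema (and its SOAP envelope) to a few representative elements.

    import requests  # widely used third-party HTTP library

    # Placeholder endpoint and identifiers; a real harvester would read these
    # from a configuration file listing every vendor the library licenses.
    SUSHI_ENDPOINT = "https://vendor.example.com/sushi"
    REQUESTOR_ID = "my-library-id"
    CUSTOMER_ID = "my-customer-id"

    # A trimmed-down request body for illustration; the actual SUSHI schema
    # defines the report request, filters, and SOAP wrapper in more detail.
    REQUEST_TEMPLATE = """<ReportRequest>
      <Requestor><ID>{requestor}</ID></Requestor>
      <CustomerReference><ID>{customer}</ID></CustomerReference>
      <ReportDefinition Name="JR1" Release="2">
        <Filters>
          <UsageDateRange><Begin>{begin}</Begin><End>{end}</End></UsageDateRange>
        </Filters>
      </ReportDefinition>
    </ReportRequest>"""

    def harvest(begin, end, outfile):
        body = REQUEST_TEMPLATE.format(
            requestor=REQUESTOR_ID, customer=CUSTOMER_ID, begin=begin, end=end)
        response = requests.post(
            SUSHI_ENDPOINT, data=body, headers={"Content-Type": "text/xml"})
        response.raise_for_status()
        with open(outfile, "wb") as out:
            out.write(response.content)  # save the report for later analysis

    # Run monthly from cron or a similar scheduler, for example:
    # harvest("2007-01-01", "2007-01-31", "vendor_jr1_2007-01.xml")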

Analyzing Collections

The ARL statistics merely count items in a library's collection, and the information value of that count is fairly low. Other vendor services provide detailed reports on subject area coverage and on the overlap between the collections of selected libraries. This data can inform both future purchases and de-accessioning decisions. It can be a great aid to academic libraries whose institutions are moving into new areas of research where the library may not have collected extensively in the past.
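The heart of such a comparison can be sketched in a few lines, as the Python sketch below illustrates. It assumes two hypothetical exports of classification numbers, one number per line, for the local library and a peer library; it tallies titles by the leading characters of the class number and flags subject areas where the peer collection is markedly deeper. The services described next do this in far more refined form.

    from collections import Counter

    def class_counts(path, prefix_length=2):
        # Count titles by the leading characters of their classification
        # number, read from a hypothetical one-number-per-line export.
        counts = Counter()
        with open(path) as export:
            for line in export:
                number = line.strip()
                if number:
                    counts[number[:prefix_length].upper()] += 1
        return counts

    local = class_counts("local_class_numbers.txt")
    peer = class_counts("peer_class_numbers.txt")

    # Report subject areas where the peer library holds at least twice as
    # many titles as the local library: candidate areas for new collecting.
    for area in sorted(set(local) | set(peer)):
        if peer[area] >= 2 * max(local[area], 1):
            print(f"{area}: local {local[area]}, peer {peer[area]}")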

The WorldCat database at OCLC is an obvious source of data mining for collection comparison. The WorldCat Collection Analysis7 service allows comparisons between individual libraries, between groups of libraries, or among the members of a group. The analysis uses factors such as topic (based on classification numbers), date of publication, and language. As part of its interlibrary lending services, OCLC can also provide collection-based analyses of borrowing and lending patterns. The service is integrated into OCLC's member services.

The Spectra product from Library Dynamics8 can perform a single-library collection analysis based on an export of the library's records, or it can compare collections among libraries or consortia that use its service, or against its own databases of library holdings, including those of the Library of Congress and the National Library of Medicine and its North American Title Count. It, too, uses factors such as topic, language, and date to provide a variety of views of the data. Analysis can include circulation rates for topic areas as well as for individual titles, and can highlight books that have been reviewed in standard library resources. Library Dynamics will store data for comparison over time, giving a library not just a single snapshot but a picture of ongoing change.

Bowker provides analysis and reporting tools built around its book9 and serial10 products. Comparison here is against standard measures: the ISI Impact Factor in its Ulrich's Serials Analysis System™, and Resources for College Libraries (the successor to Books for College Libraries) in Bowker's Book Analysis System™.

Each of these services has unique qualities, but all of them make intelligent use of the collection data that is inherent in the machine-readable library catalog. Collection analysis can guide future purchases and inform detailed planning for consortial decision-making. It can lead to better decisions about off-site storage and can identify last copies in a library group. Ongoing collection analysis is an obviously desirable tool for today's library management.

Many Miles to Go

Of the areas where libraries are using system-generated data, collection development is the one where it is clearest what the data mean and what steps a library could take in response. Other data provides numbers but less obvious direction. Most online catalogs will generate reports of search and display activity, but it is not clear what one learns from the data. For example, what if the data indicates that the vast majority of searches take place on the general keyword index and that only a small percentage use a corporate heading browse? Unless you know which users chose those indexes and why, you cannot decide that one is more important than the other. Usability testing is a more reliable way to understand whether the interface serves the users, since qualitative factors can be taken into account. Catalog use data combined with usability testing may give better direction than either measure alone.
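For illustration, the kind of raw report described above amounts to a simple tally, as in the Python sketch below. It assumes a hypothetical transaction log with one index name per line and prints each index's share of searches; as noted, those percentages alone say nothing about who chose each index or why.

    from collections import Counter

    # Hypothetical log: one line per search, naming the index that was used,
    # e.g. "keyword", "title_browse", "corporate_heading_browse".
    with open("search_log.txt") as log:
        counts = Counter(line.strip() for line in log if line.strip())

    total = sum(counts.values())
    for index_name, n in counts.most_common():
        print(f"{index_name}: {n} searches ({100 * n / total:.1f}%)")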

We seem to be only beginning to use the data that our systems can and do generate with each user action. Better tools to manipulate that data are needed, especially in the area of user activity. If our vendors can provide the tools we need to gather and present the data in easy-to-read formats, then it is time for us to brush up on our statistical analysis skills so that we can use the data for better decision-making in our libraries.

1 The main NISO e-metrics page is at http://www.niso.org/emetrics/index.cfm. This points to the standard as well as to the web sites of the main library statistics efforts.

2 http://www.arl.org/stats/annualsurveys/arlstats/

3 http://www.arl.org/stats/newmeas/newmeas.html

4 http://www.arl.org/stats/initiatives/mines/index.shtml

5 http://www.libqual.org/

6 http://www.niso.org/committees/SUSHI/SUSHI_comm.html

7 http://www.oclc.org/collectionanalysis/

8 http://www.librarydynamics.com/

9 http://www.bowker.com/products/analysis.htm

10 http://www.bowker.com/catalog/000025.htm


The copyright in this article is NOT held by the author. For copyright-related permissions, contact Elsevier Inc.