FRBR Group 1 in practice

FRBR divides the bibliographic description into four entities: Work, Expression, Manifestation and Item (WEMI). Item is the most concrete of these entities, Work the least.

There are four main questions to ask about FRBR Group 1:

Does it serve user needs?
Is it efficient for catalogers?
Is it efficient for systems?
How will it work in an OPAC?

Although FRBR stands for "Functional Requirements for Bibliographic Records," the document spends little time on functional requirements (the four user tasks) and primarily defines an overall model for bibliographic data. However, it is not intended to represent bibliographic data in general, but bibliographic data in library catalogs. This is a particular use case, considerably different from other bibliographic use cases such as the citation of documents from within other documents. The questions here relate directly to the library catalog use case, and do not consider FRBR for other implementations (whether one should or not is another question).

The Bibliographic View Before and After FRBR

The library view of bibliographic data has always had the problem of fitting the individual library's holdings into the larger bibliographic universe. That universe is assumed to contain all items ever available for bibliographic description, of which the library's holdings are a subset. Even more problematic regarding this view is that the librarian working within an individual library does not have complete knowledge of that larger bibliographic universe. The approach to library cataloging has been to create rules that would allow items in individual libraries to be cataloged in such a way that if there are other instances of this item in other libraries, the two would be described in the same way. It also presumes to place the library item in relation to other versions of the same work even when those versions are not available or known.

Work/Expression

FRBR defines Work as: "a distinct intellectual or artistic creation." Barbara Tillett defines it as "the conceptual content that underlies all of the linguistic versions, the story being told in the book, the ideas in a person's head for the book." [btil]

This is still rather vague, and I generally use the following to illustrate what is meant by Work in FRBR:

A literary critic writes an essay on the book "War and Peace." This essay is about the work created by Tolstoy, which the critic may have read in the original or in translation, and in one or more particular printings.
You and I have a chat over dinner about our appreciation for the works of Thomas Mann, in particular his book "Magic Mountain." I read the book in English, and you read it in the original German. We can still discuss the work, even though the words we read were in different languages. Your appreciation of the subtleties of the author's use of language may be greater than mine, but we read the same story and presumably had access to the same subtext of meaning.
A professor re-orders a book for the students in his Chemistry 101 class. The bookstore places the order and receives this year's updated edition, which is what is available from the publisher. This is fine with the professor.

This is, therefore, the level of FRBR's concept of Work, although the FRBR definition focuses on the creation, and my examples focus on the experience of the audience. The Work in FRBR has a creator, a genre, and topics. It also has a Work title, although this appears to be a hold-over from current cataloging practice. Since the Work is essentially "un-expressed," giving it a title is a convention that provides a handle for communicating about the Work in normal circumstances. The Work does not exist until it is Expressed and Manifested. Presumably there is always a first Expression and first Manifestation that brings the Work to light in the real world.

In the library cataloging rules that govern the catalogs in existence in 2012, there is no separation between WEMI, and information about the FRBR Group 1 entities is mixed together in a single catalog record. However, the work, as defined by FRBR, is specifically mentioned only in certain cases, even though creator and subject terms implicitly refer to the Work using the FRBR terminology. To explain those cases one must first talk about the general concept of "collocation" that has governed library catalogs for nearly 150 years.

Collocation means literally "co-location," locating things together. In the library case, the location is the position of the items in the alphabetically ordered list of the catalog. Collocation is accomplished using "headings" which are controlled text strings for the parts of the bibliographic data that will be represented in the catalog, such as the names of authors, titles, and subjects. Collocation of authors is on the author's standardized name ("Tolkien, J. R. R. (John Ronald Reuel)"). Collocation of subjects generally uses LCSH.

Collocation for works would fail in some cases in spite of the normalization of author names because titles of manifestations for the same work can vary. In modern works this is most often true for translations:

The magic mountain
La montagne magique
Der Zauberberg

Older and ancient works, such as the works of Shakespeare or early sagas that were written before the language or dialect was normalized, may also have titles that have varied over time, like:

Hamlet
Hamlet, Prince of Denmark
The tragedy of Hamlet, Prince of Denmark

To bring these together in the catalog, an additional title is added between the author and the title of the printed book. This is called a "uniform title" and it serves as a single title for the work. Where known, the uniform title represents the title of the original publication of the work. In other cases, the title is a selected title, such as "Hamlet," that contains the commonly known name of a work that was published under many different names, especially in its early period. The uniform title can also contain the language of the translation and/or the date of publication, to distinguish between different versions.

Mann, Thomas
  [Zauberberg. English]
  The magic mountain
  
Mann, Thomas
  [Zauberberg. French]
  La montagne magique
  
Mann, Thomas
  Der Zauberberg


Shakespeare, William
  Hamlet

Shakespeare, William
  [Hamlet]
  Hamlet, Prince of Denmark
  
Shakespeare, William
  [Hamlet]
  The tragedy of Hamlet, Prince of Denmark

Shakespeare, William
  [Hamlet. Italian]
  Amleto

There are a number of things to note about this practice. One is that the practice sometimes collocates what FRBR would define as Expressions of a Work (those in a different language) and sometimes the Work (Shakespeare example). The uniform title represents the Work with a "Work title" combined with something that distinguishes between Expressions. In the above case that distinction is made with the language, but for some older works that appear in different versions in the same language, such as Shakespeare, the expression may be represented by either a date or both a language and a date.

Another interesting note is that any Manifestations whose title is the same as the uniform title are not given a uniform title. So the Work title is only applied to some Manifestations and represents some Expressions but not ALL Expressions of the Work. (This complicates the rules for sorting in catalogs since it requires a cascading sort of uneven membership, and therefore brings up the question of the order of sorting "Hamlet" vs. "[Hamlet]". The ALA Filing Rules, which were in force in the card catalog, is a 109-page book, and would answer this question, but not necessarily in a way that could be rendered as an algorithm.)
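
The cascading sort described above can be sketched in a few lines of Python. This is an illustration only, not the ALA Filing Rules: the record layout (author, optional uniform title, title), the tiny list of initial articles, and the decision to file a record without a uniform title under its own title are all assumptions made for the example.

```python
# Hypothetical unit records: (author heading, uniform title or None, title).
RECORDS = [
    ("Mann, Thomas", "Zauberberg. English", "The magic mountain"),
    ("Mann, Thomas", None, "Der Zauberberg"),
    ("Mann, Thomas", "Zauberberg. French", "La montagne magique"),
    ("Shakespeare, William", "Hamlet", "The tragedy of Hamlet, Prince of Denmark"),
    ("Shakespeare, William", None, "Hamlet"),
    ("Shakespeare, William", "Hamlet. Italian", "Amleto"),
]

ARTICLES = ("the ", "der ", "la ")  # illustrative only, far from complete


def normalize(text):
    """Lowercase and drop an initial article (a stand-in for real filing rules)."""
    text = text.lower()
    for article in ARTICLES:
        if text.startswith(article):
            return text[len(article):]
    return text


def filing_key(record):
    """Cascade: author, then work title, then expression qualifier, then title."""
    author, uniform_title, title = record
    if uniform_title is None:
        # Assumption: with no uniform title, the title doubles as the work title.
        work, qualifier = normalize(title), ""
    else:
        work, _, qualifier = uniform_title.partition(". ")
        work = normalize(work)
    return (author.lower(), work, qualifier.lower(), normalize(title))


filed = sorted(RECORDS, key=filing_key)
for author, uniform_title, title in filed:
    bracketed = f"[{uniform_title}]" if uniform_title else ""
    print(author, "|", bracketed, "|", title)
```

Under these assumptions "Hamlet" and "[Hamlet]" fall into the same group and are ordered by the printed title. Whether that agrees with any published filing rule is not something the sketch claims.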

(side note: in 1949, when the filing rules were developed, numbers in titles were filed thus: "Arrange numerals in the titles of book as if spelled out in the language of the rest of the title." Thus:

1812; ein historischer roman ... files as "achtzehnhundert zwölf..."
1812 ouverture ... files as "dix huit cent douze..."

/end side note)

While the practice of creating uniform titles for works is presented in the cataloging rules, in fact it is not always used in libraries that are unlikely to have the same work in multiple languages or versions, such as many public libraries. The practice also fell out of use as libraries began using online catalogs since those catalogs make use of direct keyword searching rather than alphabetic collocation as their main finding tool. An even more dysfunctional aspect of the online catalog, in this regard, is that in general each retrieved unit record is displayed only once in the display resulting from a search. This means that the library must decide whether record display will use the uniform title (thus collocating all versions of the above book under "Z," which will be a disservice to many users who are looking for the book under "M") or will not use the uniform title, in which case no collocation of the work takes place.

In some cases of differing expressions the titles of the expressions do not interfere with collocation, as in reprintings or updated editions:

Eysenck, Michael W., and Mark T. Keane. Cognitive Psychology: A Student's Handbook. Hove [u.a.]: Psychology Press, 2011. 6th edition
Eysenck, Michael W., and Mark T. Keane. Cognitive Psychology: A Student's Handbook. Hove [u.a.]: Psychology Press, 2007. 5th edition
Eysenck, Michael, and Mark T. Keane. Cognitive Psychology: A Student's Handbook. Hove: Psychology Press, 2003. 4th edition

These editions are not given a uniform title, even though by FRBR definitions they would be considered expressions of the same work.

The majority of works, however, do not appear in multiple versions. Most works are published in only one edition or version, and thus have only one FRBR expression. There is therefore no need to collocate these expressions of the work and no uniform title is created. For these items, the record for the manifestation is sufficient, and no separate indication is made of the Work or Expression.

Through its automated clustering of current bibliographic records into Works, OCLC shows:

WorldCat Statistics
As of June 30, 2011

Languages	485
E-books	14,089,964
Works	167,711,500
Manifestations (records)	235,822,950
Total holdings	1,735,365,613

from: OCLC Annual Report, 2010/2011, p. 14

Clustering the 235,822,950 Manifestation records into 167,711,500 Works reduces the file by about 29%.

This differs from the initial analysis of Manifestations and Works that was published in 2001 [lavoie], which at that time gave these numbers:

Manifestations (records)	46,767,913
Works	32,000,000
Average number of Manifestations per Work	1.5
Single manifestation	78%
Number of works with 7 manifestations or less	99%
Number of works with more than 20 manifestations	1%

At that time the file was reduced 32%. The enormous growth in OCLC during the period from 2001 to 2011 was due primarily to the batch loading of files from non-member libraries, in particular those of large non-US national and higher education libraries. This new figure appears to reflect the addition of many new Works resulting in a less uniform (and less US-centric) file. This gives some evidence that the statistics in OCLC may not provide a measure of "workness" for the "average" US library.

This does confirm, however, that a large number of published items appear in only one Expression and therefore the Manifestation-Expression-Work ratio is 1:1:1. Using the 2001 OCLC figures, 53% of the manifestations are "singlets" of this type. The figure for 2011 should be higher.
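
The percentages in this section can be recomputed from the quoted OCLC figures. The short calculation below uses only those figures. It also assumes, as the text does, that the 78% "single manifestation" figure is a share of Works, so that each such Work contributes exactly one Manifestation.

```python
# Figures quoted in the text (2001 analysis and the 2010/2011 annual report).
works_2001, manifestations_2001 = 32_000_000, 46_767_913
works_2011, manifestations_2011 = 167_711_500, 235_822_950
single_manifestation_share_2001 = 0.78  # share of Works with one Manifestation


def reduction(works, manifestations):
    """Fraction by which the file shrinks when records are clustered into Works."""
    return 1 - works / manifestations


reduction_2001 = reduction(works_2001, manifestations_2001)
reduction_2011 = reduction(works_2011, manifestations_2011)

# "Singlets": Manifestations that are the only Manifestation of their Work.
singlets_2001 = single_manifestation_share_2001 * works_2001
singlet_share_2001 = singlets_2001 / manifestations_2001

print(f"2001 reduction:     {reduction_2001:.1%}")      # 31.6%
print(f"2011 reduction:     {reduction_2011:.1%}")      # 28.9%
print(f"2001 singlet share: {singlet_share_2001:.1%}")  # 53.4%
```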

Expression/Manifestation/Item

Expression is defined in FRBR as "the intellectual or artistic realization of a work." It is at this point that we can see that Work is a very abstract concept, because it is not itself a realized thing. Expression takes a form such as text, sound, or a visualization (photograph, moving picture). The Expression is not, however, a physical realization -- it has no size, for example. It takes on some physical reality only when it is manifested, that is when it is published or produced. At that point you can describe it as a physical item with numbers of pages, a date of publication or release, and a publisher's identifier. The Manifestation, however, describes something that might be considered a product or a print run, and it is only when you arrive at the Item that you have actual physicality. The difference between Manifestation and Item is, however, somewhat academic because it is rare that the library user is interested in the item itself; instead, as with all mass-produced items the item in hand is no different to the user than any other exemplar of the product. (n.b. This is not true for unique items like original works of art, manuscripts, or certain rare books, which are all of interest as Items rather than Manifestations. This paper will not address those unique works.)

Bibliographic records in library catalogs today focus on the description of the Manifestation, although bibliographic records contain elements that are now in FRBR Expression and Work. The ISBD standard essentially covers the description of a Manifestation, and does not include headings, such as those for author or subject. In this sense there is nothing in ISBD that is aimed at facilitating discovery of content.

Under FRBR, all Manifestations will be required to have at least one Expression and all Expressions will have one (and only one) Work. Here the goals of FRBR are somewhat contradictory. The purpose of the Work-Expression-Manifestation is to aid the user in finding and identifying the item sought. Yet in the majority of cases, there will be no new information provided by the WEM triumvirate as compared to the bibliographic record that is created today. The argument for FRBR asserts that the publications with numerous Expressions and Manifestations are also those often sought by users -- popular items that have achieved a kind of "classic" status. There is also the argument that an increase in physical formats, in particular with electronic materials, will result in an increased number of manifestations for the same expression. I haven't found figures to support these arguments.
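
The cardinality rules in this paragraph can be written down as a small data model. The sketch below is one possible reading, not a schema taken from FRBR or RDA: the class and attribute names are invented for illustration, and the sample values are placeholders rather than real bibliographic data.

```python
from dataclasses import dataclass


@dataclass
class Work:
    title: str
    creator: str


@dataclass
class Expression:
    work: Work        # every Expression realizes one (and only one) Work
    language: str


@dataclass
class Manifestation:
    title: str
    publisher: str
    year: int
    expressions: list  # one or more Expression objects

    def __post_init__(self):
        if not self.expressions:
            raise ValueError("a Manifestation must embody at least one Expression")


@dataclass
class Item:
    manifestation: Manifestation
    barcode: str       # hypothetical local identifier


# The common 1:1:1 case: the Work and Expression add almost nothing that the
# Manifestation description does not already carry.
work = Work(title="An example title", creator="Author, An")
expression = Expression(work=work, language="English")
book = Manifestation("An example title", "Example Press", 2012, [expression])
copy_1 = Item(book, barcode="0001")
print(copy_1.manifestation.expressions[0].work.creator)  # Author, An
```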

1. Does FRBR Group 1 Serve Users?

Little study has been done of the user's view of the bibliographic universe. The studies by Pisanski and Zumer [pisanskizumer] were specifically directed at the question of whether users share the FRBR view. They made some key discoveries:

That users have many different views of the bibliographic universe, but where users had a common view it generally was a FRBR-like view of the "progression" from a general concept (Work) to individual publications (Manifestation) and lastly to specific items (such as signed copies).
That what mattered most to users was the language of the text, the form (book vs. DVD) and the contents (illustrations vs. no illustrations).
That users have a strong sense of "original work" which was the item that they placed at the "top" of the hierarchy, and for them took the place of the FRBR abstraction, Work. [1]
That users seek items at the FRBR level of Expression for the most part.

The conclusion that users seek at the Expression level is quite logical when you think about it. The Work is an abstraction without expression, and in practice in libraries it would represent, for example, all language versions of an oft-translated Work. Most users have a preference in reading language and therefore the set including all languages would not be useful. In the case of different revised editions, such as with textbooks or reference books, (which was not tested for in the Pisanski/Zumer study) it would be unusual for a user to be seeking all editions, and even more unusual for a user to be seeking something other than the most recent edition available.

I therefore conclude that FRBR Work, while it may be helpful in organizing displays or in creating a set from which users will select, is not in itself sought by users, although in the case of a work in the user's language the difference between work and expression is not significant. In the study, users generally regarded any manifestation of an expression to be acceptable. Note, however, that the study was done by giving users cards containing information about manifestations, not with a "FRBR-ized" display. Both WorldCat and Open Library have a somewhat FRBR-ized display, but they do it differently. See section 4, below.

2. Is FRBR Efficient for Catalogers?

Using the statistics that we have from OCLC, we know that less than half of the manifestations that catalogers have encountered appear in multiple expressions, and there are a small number of Works (1% of the total) that appear in a large number of Manifestations/Expressions. We also know that over half of the manifestations represent a single Expression and therefore a single Work. Regardless of the number of Expressions, every new Manifestation must be described. The model for this description is ISBD. Therefore there is no change in terms of cataloging efficiency in the description of Manifestations; the only new efficiencies would be found in the aspects of the bibliographic record that describe the Expression and the Work. There are three possible situations that a cataloger doing original cataloging may encounter:

A manifestation that represents a new Expression and a new Work. This would presumably be the case for newly published first editions. The cataloger creates the necessary data elements for the Manifestation, the Expression and the Work.
A manifestation that represents a new Expression of an existing Work. The cataloger creates the necessary data elements for the Manifestation, the new Expression, and links to the existing Work.
A manifestation that represents an existing Expression (and thus its related Work). The cataloger creates the Manifestation, and links the manifestation to the existing Expression.
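
The three situations amount to a simple decision about which entities must be newly described. A minimal sketch follows. It assumes the cataloger has already determined which Work and Expression the item in hand belongs to (the hard part in practice), and the identifiers and function name are hypothetical.

```python
def entities_to_create(work_id, expression_id, known_works, known_expressions):
    """Return the Group 1 entities to describe for a new Manifestation."""
    if work_id not in known_works:
        # Situation 1: new Work, therefore also a new Expression.
        return ["Work", "Expression", "Manifestation"]
    if expression_id not in known_expressions:
        # Situation 2: new Expression, linked to the existing Work.
        return ["Expression", "Manifestation"]
    # Situation 3: only the Manifestation, linked to the existing Expression.
    return ["Manifestation"]


known_works = {"zauberberg"}
known_expressions = {"zauberberg/german"}

print(entities_to_create("new-novel", "new-novel/english",
                         known_works, known_expressions))
print(entities_to_create("zauberberg", "zauberberg/french",
                         known_works, known_expressions))
print(entities_to_create("zauberberg", "zauberberg/german",
                         known_works, known_expressions))
```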

All of these represent the activities of original cataloging. It does not appear that a copy cataloger's activity is greatly changed from what it is today, which is to locate an existing bibliographic record for the Manifestation in hand and to utilize that data for the local catalog. [Note on "copy cataloging": once an item has been cataloged in a bibliographic database, like OCLC or LC's catalog, other libraries that purchase the same item re-use the bibliographic data from that database. This is called "copy cataloging," because the cataloger copies the bibliographic record rather than creating it. Creating a new bibliographic record is called "doing original cataloging."]

I have ignored here, quite consciously, the issue of manifestations containing more than one expression. This is a highly complex situation and would result in a long and non-conclusive side-bar to this current discussion.

Whether or not FRBR is efficient for catalogers, and whether the existence of FRBR Work data saves time, depends entirely on how often catalogers encounter the second and third situations above. We only have statistics for Manifestations and Works, not Expressions, from the OCLC study, and that study presents a snapshot of the WorldCat database, which may not represent the bibliographic situations encountered on a daily basis by today's cataloger. Obviously, most older published Works have already been described, so what interests us is how often those are re-issued as new manifestations, and whether today's publishing patterns are producing more or fewer situations of re-use of Expressions and Works. In addition, catalogers in different types of libraries (for example, medium-sized public libraries vs. large research libraries) will encounter a different pattern of publication types. None of the above can be applied to catalogers of unique archival materials. Note also that there are some studies on using FRBR for cataloging music materials, which have a high degree of repetition of Works.

Therefore, in relation to catalogers, we need much more data before we can answer the question of whether FRBR Work (and Expression) save cataloger time.

3. Is Work/Expression Efficient for Systems?

The treatment of the FRBR entity/relation model as a database design by the JSC for RDA appears to have some mild assumptions about systems efficiency, as well as improvement of services to users:

"The data structures used to store the data and to reflect relationships, however, will have a bearing both on the efficiency of data creation and maintenance, and on the ease and effectiveness with which users are able to access the data and navigate the database." [jsc5]

There are two ways in which efficiency of a database is commonly measured, and these ways may be partly in conflict. First, the storage requirements for Work data (and to a lesser extent Expression, because it contains few fields) would be reduced because there would be less duplication of these elements in the database. None of the statistics that we have on hand, however, measure data storage, so it remains an open question whether the savings is significant in some way.

Second, efficiency of a database often depends on how many joins and reads are required. Much depends on underlying design and the particular capabilities of the database management system, so there is little that one can say about efficiency in general, although one could test different solutions. The key question here is whether the majority situation of 1:1:1 entities creates a disadvantageous number of operations that is not offset by the gains of the many:many:1 of the minority.
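
To make the trade-off concrete, here is a minimal sketch of a normalized Work/Expression/Manifestation layout in SQLite. The table and column names are invented for the example, and the schema simplifies the model by giving each manifestation exactly one expression. It shows only that a full display line needs two joins where a unit record would need none, and that the creator is stored once. It measures nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work (id INTEGER PRIMARY KEY, creator TEXT, title TEXT);
    CREATE TABLE expression (
        id INTEGER PRIMARY KEY,
        work_id INTEGER REFERENCES work(id),
        language TEXT);
    CREATE TABLE manifestation (
        id INTEGER PRIMARY KEY,
        expression_id INTEGER REFERENCES expression(id),
        title TEXT);

    INSERT INTO work VALUES (1, 'Mann, Thomas', 'Zauberberg');
    INSERT INTO expression VALUES (1, 1, 'German'), (2, 1, 'English');
    INSERT INTO manifestation VALUES (1, 1, 'Der Zauberberg'),
                                     (2, 2, 'The magic mountain');
""")

# Rebuilding what a unit record holds in one row takes two joins here.
rows = conn.execute("""
    SELECT w.creator, w.title, e.language, m.title
    FROM manifestation AS m
    JOIN expression AS e ON e.id = m.expression_id
    JOIN work AS w ON w.id = e.work_id
    ORDER BY m.id
""").fetchall()

for row in rows:
    print(row)
```

In the 1:1:1 majority case the same two joins are paid for every record even though nothing is shared, which is the cost the paragraph above asks about.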

Then there is the question of maintenance, and this can vary greatly depending on the overall model for systems. Clearly it should be better to make changes to a single Work entity rather than proliferating that change to many copies of that information in multiple bibliographic records. This is true not only within a single system but also across the bibliographic universe. In today's model, changes made at the level of the shared bibliographic database (generally WorldCat, but there are others), must proliferate out to the hundreds or thousands of local databases that contain records for the same item. As local systems are often under-powered for massive data update, this method is costly, and some local systems are not able to accommodate such changes. While updating a single Work or Expression record is more efficient than updating a number of "unit records," the extent of gain depends greatly on the particular holdings of the library affected by the change. The alternative is a cloud-computing model where local systems hold only local information and thus bibliographic updates are "inherited" simply by use of the central file for access. This is the model used by WorldCat Local libraries, who theoretically no longer need to maintain a local copy that is up-to-date with bibliographic changes.

Thus, how efficient the Work/Expression is for maintenance depends on:

the extent of many:many:1 in the local catalog (which is often a function of the library size)
the extent to which bibliographic data is shared rather than copied to local systems
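
These two factors can be combined into a rough count of the records that must be rewritten when one Work-level element changes. The function below is a back-of-the-envelope sketch with made-up input numbers. It assumes, unrealistically, that every local database holds every manifestation of the Work.

```python
def records_touched(manifestations_of_work, local_databases,
                    work_entity, central_file):
    """Count record updates needed for one change to Work-level data."""
    # With a separate Work entity, one record per database carries the data;
    # with unit records, every manifestation record repeats it.
    per_database = 1 if work_entity else manifestations_of_work
    # With a shared central file there is one database to update;
    # otherwise the change must reach every local copy.
    databases = 1 if central_file else local_databases
    return per_database * databases


# A much-published Work held by 500 libraries (illustrative numbers only).
print(records_touched(20, 500, work_entity=False, central_file=False))  # 10000
print(records_touched(20, 500, work_entity=True, central_file=False))   # 500
print(records_touched(20, 500, work_entity=True, central_file=True))    # 1

# A 1:1:1 "singlet": the separate Work entity by itself saves nothing.
print(records_touched(1, 500, work_entity=False, central_file=False))   # 500
print(records_touched(1, 500, work_entity=True, central_file=False))    # 500
```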

4. How will it work in an OPAC?

Today's OPAC has manifestations and their related item information (the latter generally serving the circulation function). The result of any search is a set of manifestations, presented in some order (ranked, by date, or alphabetical). All searchable fields are in the manifestation record. With FRBR Group 1 there are options relating both to retrieval and to display.

First, we must assume that users will not be aware of the Group 1 structure, but will search as they do today, which means:

by keyword, which pools words from all of the searchable fields in the record
by author, title, or subject

In most OPACs, the title index includes all titles in the record, so the user searches the work title and the manifestation title with one search. The same is true of the various creators, who are searched together in the same index, yet in FRBR primary creators are linked to the FRBR Work and some secondary creators (translators, illustrators) are linked to the Expression. I am assuming that these searches will continue to work in this way, with the user not being required to know what Group 1 entity the search should go against.
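
A pooled title index of this kind is easy to sketch. In the example below the record layout and identifiers are hypothetical. The point is only that Work titles and Manifestation titles feed one index, so the user never has to choose an entity to search.

```python
from collections import defaultdict

# Hypothetical records: each manifestation carries its own title plus the
# title of the Work it is linked to through its Expression.
MANIFESTATIONS = [
    {"id": "m1", "title": "The magic mountain", "work_title": "Zauberberg"},
    {"id": "m2", "title": "Der Zauberberg", "work_title": "Zauberberg"},
    {"id": "m3", "title": "Amleto", "work_title": "Hamlet"},
]


def build_title_index(manifestations):
    """Map each title word, from either entity, to manifestation ids."""
    index = defaultdict(set)
    for record in manifestations:
        for field_name in ("title", "work_title"):
            for word in record[field_name].lower().split():
                index[word].add(record["id"])
    return index


def search_title(index, query):
    """Return ids of manifestations whose pooled titles contain every word."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_title_index(MANIFESTATIONS)
print(sorted(search_title(index, "zauberberg")))      # ['m1', 'm2']
print(sorted(search_title(index, "hamlet")))          # ['m3']
print(sorted(search_title(index, "magic mountain")))  # ['m1']
```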

Where changes are anticipated, however, are in the user displays. There is a general assumption that users will not be given a single, manifestation-level display as they are today, but will be given a view that takes advantage of the Work/Expression as a way to gather all versions of the Work together in a new kind of collocation using the primary author and the Work title. However, there may need to be decisions based on the language of the catalog or of the user. For example, the Work title "Война и миръ" (War and Peace) may not be useful in a catalog aimed at English-language speakers, yet that is the correct Work title. It is also unclear how Expressions should be used in display; both WorldCat and Open Library ignore the Expression level and display Works and Manifestations.

The "Scherzo" project at Indiana University developed a FRBR-ized catalog of music materials and did comparative user testing between the FRBR-ized and the traditional catalog. Although some of the results were mixed, they concluded that users preferred the FRBR-ized catalog. [scherzo]

Footnotes

[1] "We also saw that in both exercises the original expression was often given a special position, not bundled with the other expressions but rather much closer to the work in question. In fact, in some cases it seemed to be considered as a surrogate of work. This indicates that the original expression requires a special consideration within any conceptual model of the bibliographic universe and should be addressed in any future developments of FRBR." [pisanskizumer 1]

[btil] Tillett, Barbara. What is FRBR? Washington, DC, Library of Congress, 2003.

[jsc5] Delsey, T. (2009). RDA Database Implementation Scenarios. 5JSC/Editor/2/Rev, 1 July 2009

[lavoie] Bennett, Rick; Brian F. Lavoie; Edward T. O'Neill. The concept of a work in WorldCat: an application of FRBR. Library Collections, Acquisitions, and Technical Services 27,1 (Spring 2003).

[pisanskizumer] [1] http://www.ff.uni-lj.si/oddelki/biblio/oddelek/osebje/dokumenti/pisanskizumer1a.pdf, published in Journal of Documentation, 2010, vol. 66, no. 5, pp. 643-667 [2] http://www.ff.uni-lj.si/oddelki/biblio/oddelek/osebje/dokumenti/pisanskizumer2a.pdf, published in Journal of Documentation, 2010, vol. 66, no. 5, pp. 668-680

[scherzo] Juliet L. Hardesty, Steven Harris, Anna Coogan, Mark Notess. Scherzo Usability Test Report: Testing a FRBR Search Interface for Music. Indiana University. January 3, 2012.