Stakeholders and Standards in the E-book Ecology

Or, it's the Economics, Stupid!


Published in Library HiTech, volume 19, number 4, 2001. pp 314-324

When we think of the production of paper books, we generally think of publishers as the actors in that world. In fact, the production of books involves many players; a partial list would include authors, agents, editors, publishing houses, illustrators, paper-makers and binders, printers, distributors, and retailers. Although books are many hundreds of years old, the current book business has a distinct 20th century flavor with its outsourcing of functions like actual book printing to companies that can take advantage of economies of scale. There is a hidden business around the book business that is mainly invisible to us.

When people refer to "publishers" or "publishing" there's some confusion as to whether they are referring to the entire process with all of its players or to the process that takes place within the publishing houses, that is, the area of selection of content and nurturing of writers and their creative products, and trying to reconcile this with the bottom line of making enough sales to keep the business running. Two recent books [1] on the publishing business eloquently discuss the tension between the creative, intellectual process of publishing and the business aspects of selling books in a very competitive marketplace.

The emerging e-book business is a new entry in the chain of events that leads from the publishing house to the customer.[2] Electronic book production replaces the traditional services of printing and binding, shipping, warehousing, and retail. That these companies produce the book products after publishing houses have done their job of selection and editing is not new. What is new about e-books is that the e-book business produces a machine-readable rather than a human-readable product. While the end result of each of the e-book products is to display text or images on a screen, the underlying technology can be different for each product. In other words, there is no single technology that defines the e-book in the way that paper and binding defines the traditional book. This plasticity of computer technology, and its ability to achieve the same end through an endless variety of means, leaves us in the unenviable position of having rival products that are attempting to re-define the book for the digital age. After hundreds of years of stability the definition of "book" is being called into question.

To an outside observer, the nascent e-book business today is quite chaotic. Companies have appeared with seemingly viable products that have never caught on, such as the delightful tk3 multimedia e-book reader and development software. Others have announced their services months or even years before actually opening their virtual doors for business, such as Questia and eBrary, both online e-book startups that were late to leave the starting gate. Many of the initial start-up e-book companies, such as Glassbook and Peanut Press, have been purchased by already-established technology companies (Adobe and Palm, respectively). In some instances, those companies decided not to support the original product, as occurred with IT Knowledge, an online e-book company that had a very promising model of delivering computer technology books to the desktop until the parent company decided to terminate the service.[3] The transition from paper books to electronic books does not seem to be progressing smoothly, and this is clearly going to be detrimental to the sales of these products, at least until some product achieves sufficient dominance to define the e-book with its technology. Meanwhile, authors, publishers and readers are unsure of their own future as participants in the intellectual life of our civilization. These are trying and tense times.

There is, however, another option. Rather than wait for the market shake-out that will produce a single e-book format, the industry could define standards that allow a number of players to enter the business with interoperable but competitive products. Such standards would reduce the risk for all of the parties involved, especially the publishers who are otherwise faced with incompatible requests from competing technologies. Since no one wants to invest in the BetaMax of e-books, publishers, with their slim profit margins, have been reluctant to partner with particular players in the e-book market. With standards the industry could define a base technology that all players can adopt, and which hopefully would result in a marketplace where a number of companies can compete.[4]

In fact, standards in the e-book arena are being developed in five primary areas: e-book formats, digital audio formats, digital rights management languages, digital rights management systems, and distribution and promotion. Each of these has interesting technological aspects, but they are also interesting in terms of the stakeholders who promote them. Publishers have actually been somewhat slow in their involvement in the standards process, but this is not as surprising as it seems on the surface: the role of publishing houses themselves in relation to the e-book product is not terribly different to that of printed books. The changes that will take place are in the production of the final product and its distribution and sale, areas that publishing houses have outsourced in the paper world.

In general, the parties involved in developing the new business of e-books do not come from the traditional book business at all, not even from its periphery. What first interested me about the emerging e-book standards was that many of them come out of computer technology environments that have no inherent connection with the world of publishing. Researchers in encryption, trusted systems, and certification have moved into the development of e-book standards and are potentially defining the book of the future. In these circles a book is simply another technology, and the delivery of e-books is a technological problem. In the hours that I have spent attending meetings of e-book standards bodies I have never encountered discussions of authors or readers (the latter meaning the person who reads the book not the device that displays it), although the "consumer" is within the target range of some standards efforts.

There is also no talk of the intellectual content of e-books or of the knowledge that they might impart, not even over the traditional after hours drink in the hotel bar. Much like the discussion of e-commerce on the Internet, "content" is the name of the product yet it has no specific meaning.

I admit that it is possible that we do not need a humanistic approach to the middleware of e-book delivery, but like all technology developments, each decision being made opens up some possibilities and closes others. The consequences of these decisions are complex and perhaps even unknowable. They will definitely remain unknown, however, if we don't scrutinize the technology for potential impact on what we value as the human, social, intellectual side of the business of books.

In the sections below I enumerate and describe some of the primary standards efforts that are being undertaken in the area of electronic books. My emphasis, however, is not on the technologies that are being developed but on the business and market context within which these standards are evolving. In nearly all cases, the stated intention of standards and how they are eventually deployed vary in interesting ways. This is not in itself any indication of nefarious actions among e-book producers; instead, it shows that there is such unpredictability in this market that even the players themselves may not have a clear idea of what will result in a successful product. In any case, we have to face the fact that the needs of libraries are not driving the current market developments, even though some of the standards developers had libraries in mind as users of their future products.

Publication Structure

The Open eBook, by the Open eBook Forum

The Open eBook™ (OEB) effort got its start in 1998 at the world's first e-book conference in Gaithersburg, Maryland. In his keynote speech, Dick Brass of Microsoft called for the development of a common e-book format as a way to avoid the "standards war" phase that new products go through. As stated later in a Microsoft press release: "Without a common standard, publishers would have to format eBook titles separately for each electronic device and the number of titles available for any devices would be small... This would be a recipe for disaster." Early participants in the group included Nuvomedia, Overdrive and Softbook Press. The group constituted itself in 1999 as a non-profit industry standards group. The Open eBook publication structure version 1.0 was released in September of 1999, and version 2.0, a major upgrade to the standard, is in progress. You can think of the 1.0 version as being a minimum set of data elements for document structure, while 2.0 begins to add some of the bells and whistles that will make e-books an enhancement over the paper book. For example, new features allow greater linking between parts of documents, opening up the possibility of a non-linear approach to the content and greater integration of multi-media components.

Like most industry standards groups, the Open eBook Forum is a membership organization. This means that only members can attend meetings and participate in the development of the standards, although there are currently no restrictions on use of the standards, which are published and distributed on the OEBF Web site. The annual report for 2000 shows the membership of 130 organizations to be heavily weighted to technology companies (39%) and consulting and services (41%). Publishers make up only 9% of the membership, while universities and libraries are 8%. The Library of Congress has been a member since 1999, and the American Library Association joined in the fall of 2000. Of note is that 10% of the membership is organizations for the blind and visually handicapped, who are particularly interested in the development of mainstream standards that also serve their constituency well. Of all of the participants in the OEBF, this community is closest in its interests to the library community. Both have goals of widespread affordable access to the public through standards-based technology.

The motivation behind the OEB publication structure was to facilitate the rapid and wide-spread production of e-books. "[I]n order for electronic-book technology to achieve widespread success in the marketplace, reading systems must have convenient access to a large number and variety of titles."(OEB Publication Structure v. 1.0) Their FAQ states that the structure "... ensures that content can be viewed on any reading system which is OEB-compliant...."

The publication structure standard is in XML but version 1.0 is deliberately very HTML-like. This allows e-book producers to make use of current HTML authoring tools, not to mention current HTML skills. Although later versions will add features that are not available in HTML, the group developing the standards is committed to maintaining backward compatibility so that the standard continues to support the creation of basic, simple e-books that can be coded by anyone who knows how to create web pages.

From the above description you might expect to be able to purchase books today in the OEB format, but that is not the case. Although the Open eBook is used by the publishing and e-book companies, it is not the format that is sold to the reading public, nor are "raw" records in this format readable in the various devices and software. One possible reason for this is that the publication structure is only part of the e-book equation; specifically, it doesn't provide any protection for the files. Because of this the OEB format is primarily used as a single production standard for publishers who then transmit this format to e-book companies to be translated into the various proprietary formats that are actually sold to the public. In spite of this "standard," e-books are sold in over a dozen different proprietary formats, each corresponding to a proprietary hardware or software device. A partial list of current e-book formats includes: Adobe PDF, Adobe PostScript, AportisDoc, ASCII text, DAISY Digital Talking Book, eBookMall Instant eBook, ExeBook, Franklin Reader, Gemstar eBook, goReader, HieBook Reader, HTML, Microsoft Reader, Microsoft RTF, Microsoft Word, Nightkitchen TK3, Nitelinks NSR, Open eBook, Peanut Press, Rocket eBook, Softbook, TumbleBooks, VersaBook.

From what I can understand of the e-book market, another possible reason behind this proliferation of formats is the business model of e-book sales. Software readers are provided for free, so clearly the technology companies such as Microsoft and Adobe must make their money elsewhere. And although the price for hardware readers may seem steep, rumor has it that sales of those devices themselves are not profitable. It is the presentation of e-books in a proprietary format that will lead to the capture of market share for the technology companies that deliver the digital product. If the book formats were not proprietary, it would be more difficult for a particular company to distinguish its product in the market. (Think Sega vs. Nintendo.) Since the stakes are high for companies entering into a new and unproven technology, the goal of these companies is to capture a significant share of the market, not to share it with dozens of other companies.

The Open eBook publication structure is an interesting example of how the right technology standards do not necessarily translate into a viable product on the open market. While libraries -- and readers -- have a need for a single format that can be read in a variety of devices, the market requires just the opposite: product differentiation.[5] We can only hope that the addition of a standard rights management system will make the concept of "any book on any reader" a reality.

Digital audio books

The digital audio-based information system (DAISY) project has a long history, at least long for today's technology. It began in 1988 in Sweden as an attempt to define a better form of audio books for blind students. There was a need to develop a technology of spoken books that contained structure and navigation, allowing students quickly to locate chapters, pages, footnotes and other parts of a book that are readily available to sighted students. This concept became a real possibility when CD-ROM technology went mainstream in the early 1990s.

The DAISY Consortium was founded in Stockholm in 1996 by the agencies serving blind readers from Japan, Spain, Holland, the UK, Switzerland and Sweden. A prototype of the DAISY format was developed, but in May 1997 the consortium decided to change the file format to allow interaction between the spoken text and HTML. This has produced a technology that allows synchronization of the audio file and the printed text. It is this interaction that has moved the DAISY standard into the mainstream and has opened a host of possibilities not only for the standard but for readers, both sight-impaired and fully sighted. It allows blind readers to navigate the text to specific chapters or paragraphs, to locate footnotes or references, and even to stop the audio and request the spelling of terms, something that is very useful for students whose primary exposure to textbooks is auditory. It has the same potential to allow sighted readers who are reading the text of the work to request the pronunciation of written words. It also makes it possible for all readers, sighted and sight-impaired, to move back and forth between the audio and the text in whatever way serves them best.

The DAISY standard, now being considered for NISO standard status (NISO Group AQ), has consciously incorporated standards like HTML, the OEB publication structure and CD-ROM as a way of moving products for the blind into the general market. As stated by George Kerscher, chair of the DAISY Consortium and representing Recording for the Blind and Dyslexic, and Janina Sajka, of the American Foundation for the Blind, in their document Surpassing Gutenberg -- Access to Published Information for Blind Readers, the greatest advantage for the blind reader is this "historic opportunity to serve blind people's access needs through the general marketplace rather than through specialty publishing." What will make it possible to provide a wide variety of reading -- at a reasonable price -- to the visually handicapped is to employ the same reading system that is used by sighted readers. In fact, you can imagine a whole range of needs that can be served by the very same technology: standard e-books already can serve as large print books for those readers whose sight is poor (or whose eyes are tired), so why not add sound and capture the market for the audio books favored by commuters? And if it so happens that the same technology can incorporate the navigation needed by readers of audio textbooks, which is also the technology that will make it possible to skim through an audio version of a weekly news magazine, then you have a product that serves the full range of reading needs of both the general and the special needs public.

This is the dream of those who have worked to make DAISY a reality. Although coming from a special interest point of view, they hit at the core of the question: what is a book? If we can accept that a book is expression, not just text, then it seems natural that a book has both a textual and an audio expression in the same package.

This does not mean that you can expect that in the future every e-book will also be an audio book. What the DAISY standard does not resolve is, of course, that an audio book requires another step beyond the production of the text, and that it must be read out loud by a professional to produce a quality spoken version of the text. This adds a cost beyond the production of the textual e-book, which means that the audio versions will surely be more expensive than the text-only e-books. But the DAISY consortium will still have achieved its goal if those audio e-books can be read on a mass-produced, mainstream reading device. By tying their dream to the emerging standards they are, in essence, betting that the market will evolve from a standards-based solution.

Rights Management Systems

Electronic Book Exchange (EBX), by the EBX Working Group

Len Kawell, President of Glassbook, announced the formation of the EBX Working Group at the first NIST conference on e-books in 1998. Early members of the group, which had formed informally in the months preceding the conference, were Adobe, Philips, Houghton Mifflin, Amazon.com, Lightning Press, and the Coalition for Networked Information. The goal for the group was to develop standards that would allow consumers to read any e-book on any device. This required standardization in two areas: content format, and copyright protection and distribution. The content format goals were laudable: create an e-book equal in readability to paper books; preserve the design and aesthetics; allow quick translation from publishers' pre-press to the e-book format; get e-books out to retailers at the same time as the hardcopy book. The goals for copyright protection and distribution were equally ambitious: to prevent piracy; permit lending and giving yet make sure that one sale resulted in only one copy in circulation; perform royalty tracking; allow fair use copying and library conservation activities.

The EBX working group's first effort was to develop a standard for the management and distribution of protected e-books along the distribution chain. The working group recognized that the selling of e-books would require an entire electronic distribution system and that each node in that system had to be compatible and secure. They also assumed that they and others in the system would benefit from a standard format for this system so that all publishers, all distributors and all bookstores could interact much as they do today for paper books. Think of it this way: if books can sit on shelves equally well at a distributor's warehouse, a Barnes & Noble, and at your library, then e-books need to be able to sit equally well on e-shelves at the electronic equivalents of these locations.

While we tend to think of rights management in terms of encryption and rights languages, EBX takes on the definition of the trust management structure that would be necessary to implement a rights management system. Although other standards define rights management languages, only EBX tackles the difficult problem of how the rights can actually be managed. This system would allow publishers to hand over the production of consumer copies to distributors, would allow booksellers to sell copies to individuals, and would allow individuals to re-sell or give away copies of their books. It also is "library-aware" and includes provisions for lending and borrowing of books.

The great advantage of this model is that it allows multiple players to participate in the market and compete at various points along the distribution chain. The other option is the development of one or more proprietary networks that handle the entire system of distribution and delivery. This latter is the model of some existing e-book companies, such as netLibrary or Questia. Essentially, a subscription to such a service limits you to books licensed by that service even though there are other e-books available on the market. The potential of EBX is the creation of a universal system that can deliver all e-books through all distributors and retailers. This doesn't eliminate the need for aggregator services and libraries would probably continue to use them in such an environment. It should, however, expand the number and variety of titles they could provide through their service.

In the EBX system the management of the e-book product is done through a special digital object called a "voucher" that accompanies the e-book file. The voucher carries within it the full set of permissions related to the e-book. These permissions will vary for different parties along the fulfillment route: a distributor may have permission to make hundreds of copies of the book that are then distributed to retailers; retailers will have the permission to sell those copies; consumers may or may not have permission to lend or re-sell the books but they will not have permission to make copies; libraries will have permission to lend and perhaps make a limited number of archival copies. The types of permissions that are encoded in the voucher for each step in fulfillment and ownership are a business decision; the voucher is the vehicle for that decision and the systems used by all steps in the chain (including the reading devices of the end-user) are the enforcers.
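The role-dependent permissions described above can be pictured with a small sketch. This is only an illustration of the idea, not the EBX data model: the `Voucher` class, the `issue_voucher` helper, the role names and the permission words are all hypothetical, and the choice to let the consumer lend is an assumption, since the text leaves that to the business decision.

```python
from dataclasses import dataclass

# Hypothetical permission sets per party in the distribution chain.
# EBX defines its own vocabulary; these words are for illustration only.
ROLE_PERMISSIONS = {
    "distributor": {"copy", "distribute"},
    "retailer": {"sell"},
    "consumer": {"view", "lend"},            # assumed: lending allowed, copying not
    "library": {"view", "lend", "archive"},
}


@dataclass(frozen=True)
class Voucher:
    """A toy stand-in for the object that travels with the e-book file."""
    book_id: str
    holder_role: str
    permissions: frozenset

    def allows(self, action: str) -> bool:
        # The reading system or distribution node would consult this
        # before carrying out the requested action.
        return action in self.permissions


def issue_voucher(book_id: str, role: str) -> Voucher:
    """Create a voucher carrying the permissions assigned to the given role."""
    return Voucher(book_id, role, frozenset(ROLE_PERMISSIONS[role]))


consumer = issue_voucher("example-book", "consumer")
library = issue_voucher("example-book", "library")
```

In this sketch the voucher only records the decision; enforcement would still fall to every system that handles the file, which is where the real difficulty lies.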

This all sounds rather magical, and in fact the technology could use the application of some magic in its realization: EBX describes a complex trust management system that will be difficult to actually build and operate, especially because so many different components must act in concert to make this system work as desired. As the e-book passes through the many steps in the distribution chain, each component of the system must act on the encoded rights in such a way that each copy that is produced is entered into the system with a unique identity and an accurate generation of the copyright protection that is appropriate for the next recipient of the book along the chain. EBX also requires the implementation of a certificate system that accurately and securely identifies everyone who acts upon the voucher, which means that there needs to be a certificate authority for the e-book business, another technically difficult and expensive function.

For all that such a rigorous system will be difficult to create, not to mention costly, it has inner logic that has the potential to overcome some of the basic problems with the distribution of digital files. EBX gives each digital file an identity and a set of rules that it will follow. If the file is coded "do not copy," it will not allow itself to be copied. If the file is coded "ok to give" then the ownership of the file can be transferred to another user. Since "transfer" of a digital file means making a copy, the EBX system allows only one copy to be usable at a time, whichever is the current legitimate one. This system would allow digital files to act in some ways similar to analog products.
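A minimal sketch of the "only one usable copy" rule, under the assumption that some registry records who currently holds the legitimate copy. The `Registry` class and its method names are hypothetical and are not taken from the EBX draft; the real system relies on certificates and encrypted vouchers rather than a simple lookup table.

```python
class Registry:
    """Toy record of who holds the single usable copy of each file."""

    def __init__(self):
        self._owner = {}         # file_id -> current legitimate holder
        self._transferable = {}  # file_id -> True if coded "ok to give"

    def register(self, file_id, owner, ok_to_give):
        self._owner[file_id] = owner
        self._transferable[file_id] = ok_to_give

    def can_open(self, file_id, holder):
        # Stale copies may still exist on disk, but only the current
        # holder's copy is treated as usable.
        return self._owner.get(file_id) == holder

    def give(self, file_id, giver, recipient):
        if self._owner.get(file_id) != giver:
            raise PermissionError("giver does not hold the usable copy")
        if not self._transferable[file_id]:
            raise PermissionError("file is not coded 'ok to give'")
        # The transfer moves usability; it never creates a second usable copy.
        self._owner[file_id] = recipient


registry = Registry()
registry.register("book-1", "alice", ok_to_give=True)
registry.give("book-1", "alice", "bob")
```

After the transfer, the recipient can open the book and the giver no longer can, which is the sense in which the digital file behaves like a physical object.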

EBX, or a system like EBX, is one of the key components of the interoperability that we desire between proprietary devices. In theory, different reading devices could implement EBX and libraries could lend e-books to any EBX-compliant device. Combined with a standard e-book file format, such as the Open eBook publication structure, we would have a universal e-book.

With draft 0.9, the EBX Working Group had developed a strong system design and a basic rights management language. But there was a particular hurdle that the group had to get over in order to move to the 1.0 stage: acceptance by the corporate members of EBX and declaration of any patent interests. It was at this point that members of EBX, many of whom were also members of OEBF, became concerned about the overlap of mission between the two organizations. Although OEBF primarily had a publication structure, it was beginning to move into rights management, which is where EBX had begun its work. Both were looking at rights management languages and metadata. Both needed a clear system of identifiers.

In fall of 2000 the two groups decided to join forces. EBX was folded into OEBF where it would reconstitute itself as the working group on Rights and Rules and continue its work. It remains to be seen whether the new group will have an easier time making the leap to release 1.0, but EBX has provided the group with a rigorous design with which to carry on the work. Like the OEB publication structure, however, the EBX system must provide market incentives for it to be adopted. The big question is: can the logic of interoperability survive in a market that rewards proprietary solutions?

Digital Rights Languages

XrML by ContentGuard

XrML, which stands for eXtensible rights Markup Language, is owned by ContentGuard, a joint venture of the Xerox Corporation and Microsoft. It was developed through the research of Mark Stefik at Xerox PARC in Palo Alto. Stefik has long been a proponent of intellectual property protection through technology, with great enthusiasm for the advantages that technology can provide to content owners:

The XrML specification covers nearly every imaginable condition that could be placed on access to and use of intellectual property, from time constraints and payment options to rules on installation and deletion of the product. Because it is presented as an XML file, XrML allows the combination of these conditions in nearly infinite patterns. So a document protected by XrML could be licensed for access on alternate Thursdays between March 1, 2001 and September 30, 2002, for a maximum of five hours at one price and a different price for hours after that, and so on. Certain restrictions are referred to as "incentives," such as those that alter price based on use. For those of you who fear "pay per view" for books, I should note that among the restrictions (which the XrML document refers to as "rights") is one labeled "PERUSE" which is actually "per use."
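To make the combinatorial flavor of such conditions concrete, here is a rough sketch of the "alternate Thursdays" license evaluated in ordinary code rather than in XrML itself. The function names and the two hourly rates are invented for illustration, and "alternate" is assumed to mean every second Thursday counting from the first Thursday in the licensed period.

```python
from datetime import date, timedelta

START = date(2001, 3, 1)
END = date(2002, 9, 30)

# First Thursday on or after the start of the licensed period
# (date.weekday() returns 0 for Monday, so Thursday is 3).
FIRST_THURSDAY = START + timedelta(days=(3 - START.weekday()) % 7)


def access_allowed(day: date) -> bool:
    """True only on every second Thursday inside the licensed period."""
    if not (START <= day <= END):
        return False
    if day.weekday() != 3:
        return False
    weeks_since_first = (day - FIRST_THURSDAY).days // 7
    return weeks_since_first % 2 == 0


def usage_cost(hours: float, base_rate: float = 1.00,
               overage_rate: float = 2.50, base_hours: float = 5) -> float:
    """Charge one (hypothetical) rate up to five hours and another beyond."""
    base = min(hours, base_hours) * base_rate
    extra = max(hours - base_hours, 0) * overage_rate
    return base + extra
```

Each additional condition is one more test of this kind, which is why the number of possible license patterns grows so quickly.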

XrML is indeed the embodiment of the philosophy of "more control" for intellectual property, and not only in its own content but also in the business practices that surround it, for it seems that the marriage of Xerox and Microsoft has in essence wed Xerox technology and Microsoft market aggressiveness. There is also a strong element of that latter company's arrogance: ContentGuard offers XrML to the intellectual property community as a standard rights management language that anyone can license at no cost. Others have offered their products as industry standards, such as Adobe and the PDF format, so this isn't entirely unusual. What is unusual about XrML is the license that accompanies it. That, and the fact that the file, when the standard was first issued, was itself protected using the Adobe WebBuy content protection technology.[7]

In order to access the XrML document, a potential viewer must agree to a "click-wrap" contract, the kind that most of us pay little or no attention to as we click "I agree." This license, however, could be very dangerous for anyone whose agreement can be seen as binding on his or her employer. The license contains some language that could put you in a relationship with ContentGuard that you may not necessarily desire. For instance, if you make any modifications to XrML (and because XrML is in XML format it is very easily modified if you should find features that you wish to add), you must provide these to ContentGuard and give them license to use it. Then again, if your modifications are not approved for integration into XrML, you cannot say that your product is XrML compliant. The actual wording from the specification is:

  1. You promptly provide ContentGuard with copies of all XrML Modifications made by or on behalf of You;
  2. for any XrML Modification made by You which is incorporated by ContentGuard in the XrML Specifications, You will grant (and You agree this License Agreement constitutes such grant), to ContentGuard a perpetual, world-wide, royalty-free, irrevocable, non-exclusive license, including the right to sublicense, in the XrML Modification (and agree that this License Agreement shall constitute such assignment); and
  3. If ContentGuard informs You in writing that any XrML Modification made by You is not approved for integration into the XrML Specifications, You may use such XrML Modification but You must not designate, advertise or promote in any way that any such XrML Modification is compatible with the XrML Specifications and You agree that You will cease using the notation described in Section B5(ii) of this License Agreement.

At this time there is no e-book system that makes full use of the capabilities of the XrML standard. Even Microsoft's e-books, which presumably use the XrML structure, use only a tiny fraction of what the standard offers. This, however, is deceptive; the XrML internal rules say that any rights not explicitly granted are thus denied. In other words, an XrML-protected document must have at least the basic permission to view the e-book, but all other capabilities, such as fair use copying or first sale rights to lend the book, are not allowed unless coded specifically in the language of XrML. Other rights languages being developed, such as the permissions developed in the EBX standard, also take this approach. Admittedly it is one of the pitfalls of the computerization of the copyright function: you can't refer computer programs to Title 17, US Code, for their default action. Yet it is notable that where the copyright law gives us fair use and first sale rights as defaults, digital rights management languages take just the opposite approach: anything not permitted is forbidden. This necessarily shifts the balance between the copyright holder and the public in favor of the copyright holder. This is a prime example of what Lessig calls the effect of "West Coast code"; that is, where computer code regulates behavior that previously was regulated by legal code (which he refers to as "East Coast code"). Taken in their worst light, e-books become a means by which copyright holders can essentially take the law into their own hands.
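The difference between the two defaults can be shown in a few lines. This is a deliberately simplified sketch: the function names and action words are hypothetical, and reducing copyright law to a list of prohibitions is a caricature used only to contrast the two starting points.

```python
def permitted_by_rights_language(grants: set, action: str) -> bool:
    """Default-deny: an action is allowed only if the license grants it."""
    return action in grants


def permitted_by_default_allow(prohibitions: set, action: str) -> bool:
    """Default-allow: an action is allowed unless something forbids it."""
    return action not in prohibitions


# A license that grants only viewing, as the simplest e-books might.
license_grants = {"view"}

# A crude stand-in for a regime where lending and quoting are defaults
# and only wholesale copying is ruled out.
legal_prohibitions = {"copy_entire_work"}
```

Under the first function a request to lend fails simply because nobody thought to encode it; under the second it succeeds because nothing forbids it. The silence of the license is what shifts the balance.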

ODRL by IPR Systems Pty Ltd.

In spite of the almost overwhelming completeness of the XrML specification, other rights language efforts are being undertaken. One is the Open Digital Rights Language (ODRL), which was developed by IPR Systems in Australia. Although ODRL is not nearly as fully developed as the XrML rights language, it is being presented as an open alternative. Its relatively unfinished state is an invitation to others in the e-book community with an interest in sharing in the development of the final language. In January of 2001, a meeting was held at INRIA in Nice, France, with members of the World Wide Web Consortium (W3C) and others with an interest in digital rights management in attendance. The meeting was prompted by the developers of ODRL to discuss whether the W3C should consider adding a rights management language to its suite of World Wide Web standards.

Like other standards described here, ODRL is only a part of what would be needed to actually manage e-books. It claims to be focused on the semantics of rights languages. Or, as it is expressed in the ODRL document itself:

ODRL borrows from the Interoperability of Data in Ecommerce Systems (INDECS) for its basic vision of the e-publishing flow. INDECS introduced its concept of the sale of digital intellectual property as a flow chart with rights, assets and parties. It is a purely economic and highly rational approach to the problem, as is that of ODRL which illustrates its concepts with a series of precise flow charts and almost contentless language. There is little in the document to remind us that

refers to passing along a favorite book to a good friend. As with many other attempts to create functional systems out of human activities, this one reduces a complex social and economic interaction to a tinker-toy-like construction. What seems to be missing from this and other standards for rights management is that in the end the system needs to interact with people, not just other systems. It will be a great challenge for the developers of the user interface to such a system to make the transition from this underlying precision to a user friendly product.

At the time of this writing there are no specific conclusions on whether W3C will pursue the question of a rights management language, and very little traffic is taking place on the e-mail list set up to discuss the issue. The general consensus, however, is that a standard rights management language would be beneficial for commerce in digital intellectual property, and that the creation of a rights language standard for the World Wide Web would facilitate e-commerce in general. Significant work has already taken place in the development of standards by MPEG and by the EBX group, and perhaps the role of the W3C will be to provide an overall architecture that allows these systems to work together on the Web.

Promotion and Distribution

ONIX, by EDItEUR

The ONIX standard for book promotion comes directly out of the book industry itself. The "EDI" in EDItEUR refers to the Electronic Data Interchange standard used to exchange business information, and the "EUR" refers to Europe. EDItEUR describes itself on its home page as "Co-ordinating the development, promotion and implementation of Electronic Commerce in the book and serials sectors."

ONIX is supported by the Book Industry Communication (BIC), a UK industry group, the Association of American Publishers (AAP), and the Book Industry Study Group (BISG) of New York. EDItEUR itself is within the Book Industry Communication organization. The first meeting of these groups was called by the AAP in July 1999. As a result, AAP funded the development of version 1.0, which was released in January 2000.

ONIX can be seen as a companion format to EDI, and was developed in response to the increased computerization of the book trade. An ONIX record contains the kind of information that you see in an Amazon.com or Barnes & Noble online display. It allows the book companies themselves to provide the bibliographic and promotional content, such as cover art, blurbs, excerpts and reviews.[23] The first versions of ONIX were developed for the exchange of digital information about hard copy books. Because the e-books that publishers are producing today are generally electronic versions of current print products, the overlap between the data elements needed for the promotion and distribution of e-books and those already used for print books is very high. However, the addition of digital objects to the ONIX standard opens up the possibility for the treatment of other products such as music.

ONIX contains a rich set of bibliographic metadata, which has been mapped to MARC21 by the Library of Congress. This doesn't mean that there is an intention to convert ONIX metadata directly to MARC (MAchine Readable Cataloging) for use by the library community. ONIX data may or may not directly interact with library data but the standardization of the publisher's data makes at least plausible the enrichment of library catalogs with "Amazon-like" features such as cover displays and reviews.

The proposed ONIX standards for e-books bring up a question that should be familiar to those of us in the library world: the question of "work" vs. "manifestation." In the print world, a book was published in a small number of different formats: hardback, trade paperback, and popular paperback. Each of these was considered a different product and each was given a different product number (ISBN plus barcode). With e-books, any number of different e-book formats can be generated from the same basic file. For publishers, it isn't useful to consider each of these e-book formats a separate product; what the publisher actually produces is the file that feeds into a host of transformation programs that result in files for Microsoft Reader, Franklin Reader, Adobe E-book Reader, Peanut Reader, etc.

The ONIX standard attempts to deal with this by defining two different entities: the epublication abstraction, which is the electronic content independent of the rendering format, and the epublication manifestation, which is the actual deliverable in the format required by an electronic book reading system. This may seem a bit like splitting hairs but it has a great deal of effect on the business systems that track products and product sales, as well as on the identifier systems such as ISBN. An active controversy in the publishing world today is the requirement by the ISBN agency that each new format be given a unique ISBN. Large publishing houses claim to be quite close to exceeding the digits assigned to them by the ISBN authority due to this requirement. Unfortunately, the solution isn't obvious: systems all along the supply chain depend on the 10-digit ISBN as the key identifier for products so a change in the industry standard identifier to accommodate the many e-book manifestations would ripple throughout the industry and probably cost much more than any profit that is being made today off of the sales of e-books.
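The distinction between abstraction and manifestation can be sketched as a simple data model. The Python below is a hypothetical illustration only: the class and field names are not ONIX element names, and the identifiers are placeholders rather than real ISBNs. It shows why a rule of one ISBN per format multiplies the identifiers consumed by a single title.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    """One deliverable file in a specific reader format (hypothetical model)."""
    reader_format: str
    isbn: str  # placeholder identifier, not a real ISBN


@dataclass
class Abstraction:
    """The electronic content, independent of any rendering format."""
    title: str
    manifestations: List[Manifestation] = field(default_factory=list)

    def add_format(self, reader_format: str, isbn: str) -> None:
        self.manifestations.append(Manifestation(reader_format, isbn))

    def identifiers_consumed(self) -> int:
        # Under a one-ISBN-per-format rule, each manifestation uses one identifier.
        return len(self.manifestations)


work = Abstraction(title="Example Title")
formats = ["Microsoft Reader", "Franklin Reader", "Adobe E-book Reader", "Peanut Reader"]
for i, fmt in enumerate(formats):
    work.add_format(fmt, isbn=f"PLACEHOLDER-{i}")

print(work.title, "uses", work.identifiers_consumed(), "identifiers")
```

In this sketch the publisher's business systems would track one `Abstraction`, while the supply chain sees four separately numbered products.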

Because ONIX comes directly out of the publishing industry itself, it has a reasonably good chance of being widely adopted. Some publishers are already coding ONIX records for their new publications. It will probably be a while, however, before your local bookstore is making use of the rich content of the ONIX record. In the end it does promise to facilitate some valuable services to readers of both paper and electronic books.

Conclusion

The above is merely the tip of the standards iceberg. Hidden below each of these are efforts in areas like bibliographic and discovery metadata, identifier schemes, and attempts to standardize vocabulary so that cross-industry discussion can take place. One of the first things that seems to occur as each of these standards groups convenes is the need to define the boundaries of the problem within a diagram of the general publishing universe. Myriad flow charts and multi-dimensional representations of publishing have been developed and some patterns are emerging.

OEB and ODRL present the world of publishing as a triangle with technical, legal and social as its three points. Each of these is treated as a dimension or viewpoint relating to the standards. The best way to illustrate this is with the Napster example: the technical is the Napster technology itself; the legal is the question of whether the copying on Napster is infringing or is fair use copying; the social is that there are tens of millions of Napster users who are making copies and seem unconcerned about copyright law.

While it is admirable that the standards makers recognize that there are complex aspects to the activities they are trying to standardize, it is not clear that this knowledge will help them in practice. As I have shown in this short round-up of standards activities, both markets and technologies have their own inner logic, and as we have seen with the boom and bust of e-commerce, people do not always act in accordance with the diagrams of systems designers. No amount of standards development will make up for putting out a product and seeing if people really like it. I have heard e-book technologists themselves say that e-books are a product looking for a market; finding that market will take more than clear vocabulary and logical design. As is so often the case, it is the human element that is the deciding factor, and it is also the one least susceptible to standardization.



1) Epstein, J. The Book Business, Norton, New York, 2001; Schiffrin, A. The Business of Books, Verso, New York, 2000.
2) I am using the term e-book to refer to a literary work in digital form, not the software and/or hardware that renders it for reading. This is similar to the Association of American Publishers' definition of an e-book: "An ebook is a literary work in the form of a digital object consisting of one or more standard unique identifiers, metadata, and a monographic body of content, intended to be published and accessed electronically." in: Association of American Publishers. AAP Numbering Standards for Ebooks, New York, NY. p. 31.
3) http://www.itknowledge.com (no longer accessible)
4) That one of the players is Microsoft, a company known for its inability to tolerate competition, makes one wonder how long this competitive phase will last before we see a single "winner" monopolize the market. Although the development of standards may allow many companies to enter the business, it does not guarantee that the market will remain competitive.
5) Shapiro, Carl and Varian, Hal R. Information Rules. Harvard Business School Press, Boston, 1999. p. 26.
6) Mark Stefik. Shifting The Possible: How Trusted Systems And Digital Property Rights Challenge Us To Rethink Digital Publishing. In: Berkeley Technology Law Journal, 1997. http://www.law.berkeley.edu/journals/btlj/articles/12_1/Stefik/html/text.html
7) For a description of my experience with the XrML standard and its content protection, read (or listen to) my testimony before the Copyright Office.
©Karen Coyle, 2001
This work is licensed under a Creative Commons License.