First Monday


Oh What a Tangled Web We Weave: Opportunities and Challenges for Standards Development in the Digital Library Arena

Standards development is particularly difficult within the digital library arena, primarily because the most active players have not yet formed a true community in the sense of having evolved a common vocabulary, commonality of interest, or structures for collaboration and communication. Competition, the need to allow for innovation and experimentation, and the need for speed in a rapidly changing technological environment all help to foster a culture that does not necessarily value standards. Appropriate responses can help encourage the development of standards even in this environment. Among these are recognizing that collections of practices, best practices, local and global guidelines, normative standards and implementers' agreements can all be useful under different circumstances, and aiming for the level of standardization most appropriate to the situation.

Contents

Introduction
The Un-Community
Some Arguments Against Standards
Some Useful Reactions
The International Arena
Small Fish in Big Seas
Conclusion

Introduction

This paper will address in very general terms some of the challenges and opportunities for standards development in the digital library arena. Several of the papers in this Conference have referred to the need for standards; for example, Liz Bishoff wrote about the practicalities of getting a heterogeneous group of libraries and museums to follow common guidelines. In this paper I'd like to focus on some of the reasons why the development of standards related to digital libraries is particularly difficult, and some of the approaches we can take in response to these challenges.

I should point out that, unofficially at least, I'm wearing two hats. At the Florida Center for Library Automation (FCLA) I'm responsible for central systems support for digital library initiatives in the State University System of Florida. As such, I'm a consumer of standards. I am also on the Standards Development Committee of the National Information Standards Organization (NISO), which is an ANSI-accredited standards developer in the areas of libraries, publishing and information services. The Standards Development Committee has a number of jobs, but our three main responsibilities are evaluating proposals for new NISO standards, monitoring the work of standards committees to help them keep on track and on target, and keeping an eye out for areas where standards are needed. So in my NISO role I'm a producer of standards. Although I'm not speaking officially for either organization in this paper, I've tried to keep in mind the perspectives of both producers and consumers, and you should keep in mind that my background is with libraries and with NISO.

The Un-Community

I'd like to assert that the fundamental challenge for standards development is that the digital library community really isn't a community at all. When we speak of the "library community", this is not just a handy phrase; there is an actual communal infrastructure and commonality of interest. I certainly wouldn't go so far as to say that all librarians know all other librarians, but I would venture that most of us know the librarians active in our own areas of interest, regardless of where they are located geographically, and if we don't know them personally, we know where to find their names. We have frequent and regular meetings of regional, national, and international professional associations and membership organizations, and we have committees and mailing lists and interest groups at the national and international levels. In fact, we have so many opportunities and mechanisms for cooperation and communication that it's often hard to get our work done at our home institutions.

This is not at all the case in the digital library arena, where we have a far more heterogeneous set of players. Digital librarians come from libraries and museums and galleries and archives, each of which may have its own community, but these communities are only beginning to get acquainted with each other. One of the comments I've heard most frequently at this meeting is how interesting it is to be here with so many people we don't know. This is testimony to how much we have to share with and learn from each other, but also to how isolated we have been up to this point.

Even the interests represented here are only a small subset of the participants in the digital library arena. Publishers are major players, and they are a heterogeneous lot themselves, ranging from scholarly societies and academic presses to the large commercial STM publishers. And then there are hardware and software companies from Adobe to Xerox; computer scientists from any number of venues including corporations, academic departments, and government organizations; and research faculty in just about all disciplines. (One of the things that struck me at the Digital Libraries Conference sponsored by the National Science Foundation last fall was how many of the participants were scholars and researchers in fields nominally unrelated to computer science - faculty only involved in digital libraries because they need them as a tool toward pursuing their own ends.) There is also an interesting group of players I can only call "media types" - those whose primary interest is in digital media such as audio and video and the problems of how to store, index, search, organize, and otherwise use these formats, rather than in the subject content that they carry. All of these groups and several more are actively involved in digital library research, development, and applications. This diversity is terrific; it is wonderfully exciting. The breadth of knowledge we have to draw on, the possibilities for synergy, the potential in this hodge-podge of talents for creative and unexpected solutions are all tremendous. But we are not yet a community, and that has a downside, particularly in respect to standards.

We do not have a history of working together, or established structures and mechanisms for doing this. We don't read the same publications, or travel in the same circuits. I don't know who is working in my area of interest within this wider digital library un-community, and I likely don't know how to find out.

We do not share a common vocabulary. I have been to meetings involving two or more of these groups where the first half of the day was spent discovering that the same words had different meanings to different participants, and the remaining time was spent simply agreeing on terms and assumptions. Even a concept as basic as the "digital library" itself will invoke different assumptions. The computer scientist is likely to think of software and applications, the faculty member will think in terms of collections, and librarians and curators will think of organizations and services.

We look to different standards organizations. You have probably all heard the old joke that the nice thing about standards is that there are so many to choose from. This can be said of standards organizations as well. There are about 270 ANSI-accredited standards agencies. A 1996 directory lists 620 non-governmental and 80 governmental standards bodies in the United States alone, and this doesn't include ad hoc or single-focus groups like the Dublin Core Metadata Initiative (DCMI) [ 1]. Of course, by far the majority of these are not relevant to digital libraries, but a large number are. One of the first questions you have to ask when thinking about a standards need is who might already be working in this area, what organization might feel this is its bailiwick. In many cases these may turn out to be organizations that we are not accustomed to working with.

Finally, not all of these groups share the same goals and values. So much of the library infrastructure has been built on MARC and shared cataloging that librarians as a group have a deeply ingrained respect for standards. This is not necessarily true for all of the players in the digital library arena, and it is not necessarily benightedness that accounts for this. There are a number of legitimate counter-values that can work against any impulse to standardize.

Some Arguments Against Standards

Possibly the worst reason is the "not invented here" syndrome that I'm sure you've all encountered in one form or another. As an ex-programmer I can safely say that it is much more fun to design and develop an application from scratch than it is to follow someone else's specification. But there is some legitimacy to this as well. Standards are almost by definition the product of compromise, and they tend to reflect this. It is extremely unlikely that a standards-based solution, whether it be a protocol or data dictionary or procedure or format, can be as efficient and effective for any particular application as a solution tailored to that application. You nearly always give something up when you adopt a standard.

Another reason is what I'm calling by the awful term "pre-interoperability". The desire for standards seems to be strongest when there is a need for interoperability between multiple applications. When we want to share data, or search each others' servers, then we need to follow common practices. But many of the problems in digital library development precede the need for interoperability. For example, structural metadata (that is, metadata that describes how objects like page images are put together to form more complex units like chapters and volumes) is generally used only by the specific system that will deliver those objects. We do not yet share our digital objects much or ship them around for general use, so interoperability is not an issue. Consequently there is not much perceived need for a standard for structural metadata. Yet every archival, museum, or library project that digitizes and displays complex objects needs some form of structural metadata, and every project is inventing its own data elements and format. It's hard to believe this is the best use of our collective intellectual energy.
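
To make the idea of structural metadata concrete, here is a minimal sketch in Python of one way a single project might record how page images are assembled into chapters and volumes. The class names, field names, and file names are all hypothetical, invented for illustration; they are not drawn from any existing standard or project, which is rather the point: each project currently devises its own equivalent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageImage:
    """One scanned page; the file names used below are purely illustrative."""
    sequence: int      # position of the page within its chapter
    file_name: str     # where the delivery system finds the image


@dataclass
class Chapter:
    title: str
    pages: List[PageImage] = field(default_factory=list)


@dataclass
class Volume:
    title: str
    chapters: List[Chapter] = field(default_factory=list)

    def page_order(self) -> List[str]:
        """Return image file names in reading order across all chapters."""
        return [
            page.file_name
            for chapter in self.chapters
            for page in sorted(chapter.pages, key=lambda p: p.sequence)
        ]


# A hypothetical two-chapter volume; pages may be recorded out of order.
volume = Volume(
    title="Example Volume",
    chapters=[
        Chapter("Chapter 1", [PageImage(2, "0002.tif"), PageImage(1, "0001.tif")]),
        Chapter("Chapter 2", [PageImage(3, "0003.tif")]),
    ],
)
print(volume.page_order())
```

Nothing in this sketch is difficult, but a second project would almost certainly choose different names and a different nesting, and the two sets of objects could not be exchanged without conversion.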

Competition is another strong force working against collaborative standards development. Competition for market share is a factor for commercial enterprises such as hardware and software manufacturers, but there is also competition between cultural institutions for visitors and funding, and competition between educational institutions for faculty and funding. The publishing industry is very much aware of the interplay of competition and standards. Attempts to involve publishers in discussing best practices for library licensing have foundered on the publishers' own fears of anti-trust actions. At a recent NISO workshop on best practices for electronic journal publishing one publisher made a compelling plea for cooperation, arguing that there were far too many external threats to commercial publishing in today's environment for publishers to keep stabbing each other in the back. I thought both the acknowledgement of "back-stabbing" and the plea to rise above it were somewhat indicative of the problems publishers face.

One of the most difficult situations to deal with is when the primary beneficiaries of a standard are a different group than the primary implementers. This is not uncommon in NISO, which serves both publishers and libraries. It may happen for example that a library group will have a strong interest in defining a specification which will save significant time in cataloging or controlling a certain class of materials. Unfortunately, the materials producers who would have to follow the specification would have to retool their own operations at some inconvenience and expense for no direct benefit. The library constituency may not be a large enough part of the market for the standard to be successful. This is likely to be an increasingly important factor in the digital library area, where our applications are not necessarily going to be of significant import to designers of computing and communications infrastructures and tools.

Another argument against standardization is the need to allow for innovation and experimentation. This is particularly valid in the development of digital libraries, where the technological environment is rapidly changing and the most fruitful approaches to many of our problems are far from clear. Computer scientists, researchers, and developers are especially sensitive to the fact that too rigid standardization, standardization occurring too soon, or standards in inappropriate areas can stifle creativity and impede progress. Think of what the world would look like today if we had standardized the Gopher protocol in the early 1990s.

A related argument concerns the need for speed. Hardware and software development processes now run on Internet Time, while the standards development processes of most organizations do not. The time it takes to develop a NISO standard from the acceptance of a work item to approval by the Voting Members has traditionally averaged five to seven years. NISO recently made a number of changes in its Operating Procedures to speed things up, but these are too recent to have had an impact. A 1995 study of the major Internet standards organizations estimated that the average development time in the International Telecommunication Union Study Group 7 (ITU/SG7) and in the Internet Engineering Task Force (IETF), two major developers of international communications standards, was 3-5 years and 2-4 years respectively [ 2]. My own informal calculation is that it takes about two years to progress from start of work to official Recommendation of the World Wide Web Consortium (W3C). This seems to me to be about the right ballpark; I certainly don't believe we should strive for anything shorter than this. The standards process requires enough time to allow full participation of materially affected parties, to achieve consensus, and to allow the best to emerge. We will always have to balance the need for speed with the need to take the time to do it right.

Some Useful Reactions

Both the standards producers and the consumer community seem to be changing their own approaches in response to these issues. For one thing, I see a move towards tackling smaller, more focused topics. The day of monumental, comprehensive standards like Z39.50 or MARC may be over. We seem to be shifting towards simpler, single-purpose formats and protocols. Recent initiatives like Open Archives [ 3] have declined to specify services and applications level functionality and instead are concentrating on defining the minimum necessary for interoperability at a low level.

It also seems clear that encouraging use before a specification or practice is turned into a standard is an approach well suited to the digital library environment. This not only results in better standards, but has the advantage of getting them into use in the community more rapidly. The IETF has always required at least two interoperable implementations to advance from Proposed Standard to Draft Standard status, and a significant number of successful implementations to advance a Draft Standard to an Internet Standard. Recently NISO instituted the Draft Standard for Trial Use to encourage implementation experience with certain classes of draft standards before balloting. NISO also approved a "fast track" process where a standard or agreement coming out of another community can be formalized expediently through the NISO consensus process.

There is also, I think, a practical move to adopt the most "lightweight" standardization that will serve the purpose. It is very important to recognize that the road from chaos to control has a number of rest stops along the way. The first step may simply be to identify practices in a certain area: just what is it that people are doing? For example, the RLG-DLF Task Force on Policy and Practice for Long-term Retention of Digital Materials took it as their first task to gather institutional digital archiving policies and documentation of current digital archiving practices [ 4].

Once practices are gathered, the next step may be to identify best practices, which are not prescriptive and can vary from situation to situation. Best practices in turn can be codified into local or global guidelines or standards as appropriate. (In NISO there is a formal distinction between guidelines and standards. Both are consensus documents balloted by the Voting Membership but guidelines are non-normative, meaning that implementers can deviate from them when they have good reason to, while standards are normative and must be followed to be in conformance.) Finally, after a standard is established, there may well be a need for further refinement in the form of implementers' agreements; for example a group of institutions may get together to decide exactly how they are going to implement the EAD (Encoded Archival Description) to facilitate certain types of cross-system searching.

So: collected practices, best practices, guidelines, standards, and implementers' agreements can all be appropriate under different circumstances. Although we tend to speak of all of these generically as "standards", we need to analyze each situation to see which is most appropriate and, I would argue, aim for the lightest that will do [ 5]. This can help us reap some of the advantages of consistency while remaining nimble and quick to respond to changes in the digital landscape.

The International Arena

Another challenge is the fact that the network is inherently international, meaning network applications including digital libraries are inherently international. It is very difficult to think of local or even national standards in this context. This international environment serves to increase the diversity of the digital library players, with all of the advantages and disadvantages that entails. But it also presents very specific problems to standards developers.

For one thing, international collaboration is just plain hard. International travel may be routine but it is still expensive and time-consuming. Some people will tell you that e-mail and modern telecommunications and videoconferencing have made geographic barriers meaningless, but don't believe it. Standards committees find they cannot work by e-mail alone, and both voice and video conferencing still have to contend with time zones. I'm on one committee that has a weekly conference call at 7:00 in the morning for me in Florida, which is 11:00 in the evening for the member in Brisbane and 4:00 in the morning for the member in California, who usually doesn't participate. I know of another committee with international membership which has divided into temporally clustered subgroups. The subgroups each hold their own conference calls using the same agenda, and then the chairs share the results with each other.

It is tempting to want to develop a standard in the U.S. and then take it to the International Organization for Standardization (ISO) or some other international forum. However, this linear process is terribly slow and has any number of problems of its own. The DCMI and some other single topic groups which have developed de facto standards independently are now attempting to make them de jure in multiple forums simultaneously. In the case of the Dublin Core, this means pursuing a CEN Workshop Agreement, NISO Fast Track processing, and some non-U.S. national standards processes at the same time. Regardless of the formal approach taken, it is good to include international input from the start. Even NISO, an American national standards agency, now allows and encourages membership from outside of the U.S.

Finally, it is important to keep in mind that any number of organizations in Europe, Australia, and elsewhere are working on agreements of their own, from best practices to formal standards, in all areas pertaining to digital libraries. We need to pay attention to and participate in these efforts as if they were our own, as in a sense they are.

Small Fish in Big Seas

The last challenge I'll mention is the simple fact that many of the technologies we depend on are quite literally out of our control. If we are working with digital libraries, we're working with the Internet, the Web, multimedia, document imaging, and any number of technologies that serve far larger, richer, and more powerful constituencies than ourselves. We can agree upon the features we wish we had in our Web browser, but we shouldn't expect to see them in the next version of Internet Explorer. Nor should we expect to make much of an impact on the development of technologies for authentication, e-commerce, rights management, or other areas where large commercial interests are involved.

Even where we have some access to development channels, some of these technologies may be too technical or too commercial to encourage our participation. Different sectors of the digital library non-community may be more comfortable with different technologies, leading to a Balkanization of effort.

Nonetheless, we are all producers and consumers of standards, and we need to be as intelligent in consuming standards as we are in producing them. We may not be able to control all of the technology we depend on, but we can develop appropriate responses to it. If we don't trust PDF as an archival format, then we don't have to plan on using it that way.

Conclusion

One of the things that has impressed me at this Conference is how many of our projects are using some form of the Dublin Core Metadata Element Set for resource description. The Dublin Core was not until very recently an "official" standard, in the sense of having been developed or approved by an ANSI-accredited or formally recognized standards body [ 6]. So far as I know, use of Dublin Core isn't required by professional societies, funding agencies, or state or federal governments. The Dublin Core itself is not very well documented, does not yet have very wonderful tools to support it, and in fact, can hardly be used without some local qualifications and extensions. But I'd venture to guess that the majority of us in this room are using Dublin Core in some form in some project, completely voluntarily, using our own professional judgement. It is hard to think of more compelling proof of the fact that we need standards, we will develop the standards we need, and we will use the standards we develop.
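
The remark that Dublin Core can hardly be used without local qualifications and extensions can be illustrated with a small Python sketch. The fifteen element names are those of the Dublin Core Metadata Element Set; everything else is an assumption made for illustration: the `element.qualifier` naming convention, the `date.digitized` qualifier, the `shelfmark` extension, and the sample values are hypothetical and are not taken from any project or from DCMI documentation.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def split_field(name):
    """Split 'element.qualifier' into its parts; qualifier is None if absent."""
    element, _, qualifier = name.partition(".")
    return element, qualifier or None


def unqualified(record):
    """Collapse a locally qualified record to plain Dublin Core elements."""
    simple = {}
    for name, values in record.items():
        element, _ = split_field(name)
        if element not in DC_ELEMENTS:
            continue  # local extension with no plain Dublin Core equivalent
        simple.setdefault(element, []).extend(values)
    return simple


# A made-up record mixing plain, qualified, and purely local fields.
record = {
    "title": ["Map of Gainesville"],
    "date.digitized": ["2000-05-01"],  # hypothetical local qualifier
    "shelfmark": ["Box 12"],           # hypothetical local extension
}
print(unqualified(record))
```

Under these assumptions, the qualified date survives as a plain `date` and the purely local field drops out, which suggests why a small shared core can remain useful to others even when every project extends it in its own way.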

Standards exact a significant cost in the time and effort it takes to develop, document, publicize, and implement them; they may discourage innovation; they may slow us down. Standards also encourage stability, promote interoperability, and allow us to invent the wheel once collectively instead of many hundreds of times individually. We should recognize that there are many forms that standardization can take and look for the most appropriate and proportionate approach for any given situation. In this way we can attempt to minimize the costs and maximize the benefits of standardization. Importantly, the adoption of standards can provide a foundation of commonality in language, practice, and goals, while the process of developing standards can provide channels of communication and collaboration for the emerging digital library community.

About the Author

Priscilla Caplan is Assistant Director for Digital Library Services at the Florida Center for Library Automation in Gainesville, Florida.
E-mail: pcaplan@nersp.nerdc.ufl.edu

Notes

1. Robert Toth, 1996. Standards Activities of Organizations in the United States. Washington, D.C.: U.S. Department of Commerce, National Institute of Standards and Technology.

2. Jacob Palme, 1995. "Notes on Standards Making," at http://www.dsv.su.se/jpalme/standards-research.html

3. The Santa Fe Convention of the Open Archives Initiative "presents a simple technical and organizational framework to support basic interoperability among e-print archives." See the Open Archives Initiative Web site at http://www.openarchives.org/

4. See "RLG-DLF Task Force on Policy & Practice for Long-term Retention of Digital Materials" at http://www.rlg.org/preserv/digrlgdlf99.html

5. Anne Kenney has argued something similar in her wonderful keynote address to the Joint RLG and NPO Preservation Conference on Guidelines for Digital Imaging, at http://www.thames.rlg.org/preserv/joint/kenney.html

6. Dublin Core has recently been approved as a CEN/ISS Workshop Agreement; see http://www.cenorm.be/news/press_notices/metadata.htm


Editorial history

Paper received 1 May 2000; accepted 10 May 2000.



Copyright ©2000, First Monday

Oh What a Tangled Web We Weave: Opportunities and Challenges for Standards Development in the Digital Library Arena by Priscilla Caplan
First Monday, volume 5, number 6 (June 2000),
URL: http://firstmonday.org/issues/issue5_6/caplan/index.html