As technologies to digitize primary source content mature and become better understood, more widely accessible, and more efficient, the volume of available digital content increases and issues of integration and aggregation become more important. Today's digitization project managers must give high priority to factors such as reusability, persistence, interoperability, verification, and documentation when planning their projects. Digitization project funding agencies, like the Institute of Museum and Library Services (IMLS) and the National Science Foundation (NSF), must give substantial weight to these same factors when assessing programs and evaluating project proposals. A Digital Library Forum convened by the IMLS and working in collaboration with participants from the NSF's National Science, Mathematics, Engineering, and Technology Education Digital Library program has released a Framework of Guidance for Building Good Digital Collections to serve as a resource for practitioners and funding agencies. This Framework pays particular attention to digitization collection practices that facilitate integration and aggregation of digital information resources developed by museums, libraries, and similar institutions. To protect against obsolescence and to better accommodate the wide range of digitization projects funded by the IMLS, NSF, and other granting organizations, the Framework is not wedded to any particular set of standards or best practices. Rather it articulates principles fundamental to planning, implementation, and evaluation of digitization projects and links to specific resources and exemplary models that support and illustrate good application of these principles. 
This paper describes the context and development of the Framework, briefly presents the major principles articulated in the Framework, and concludes with remarks regarding the immediate impacts of the work accomplished by the IMLS Digital Library Forum and a call for the continued development and maintenance of the Framework.
In the spring of 2001 the Institute of Museum and Library Services (IMLS) convened an eight-member Digital Library Forum to discuss topics and issues relating to the implementation and management of networked digital libraries. Of particular interest were digital library technologies and protocols that would facilitate and encourage interoperability between and among digitization projects funded by IMLS and other federal agencies such as the National Science Foundation (NSF). This interest was stimulated in part by various executive branch and legislative branch actions, including a December 1999 Presidential Memorandum directing the NSF, Smithsonian Institution, National Park Service, and IMLS to work together "to create a Digital Library of Education to house this country's cultural and educational resources" (Clinton, 1999). Members of the IMLS Digital Library Forum were drawn from a broad range of libraries, museums, and allied organizations. All were experienced with digital library technologies and implementations. Most were current or prior IMLS grantees. The IMLS Digital Library Forum was chaired by Priscilla Caplan of the Florida Center for Library Automation.
To work with the IMLS Digital Library Forum, especially on issues relating to integration and interoperability with NSF digital library initiatives, the NSF convened a parallel group, drawing members from those involved in one of the largest of the NSF's current digital library programs, the National Science, Mathematics, Engineering and Technology Education Digital Library (NSDL). These two groups met together for the first time 1 June 2001 at the IMLS offices in Washington, D.C. Discussions during this initial day-long meeting were far-ranging and touched on issues relating to the creation of digital collections, the implementation of digital library services, and interoperability between digitization projects. Prior works, including Pathways to Progress (Manduca, McMartin, and Mogk, 2001), a white paper prepared for the NSDL program, and the final report of the Web Based Education Commission (2000), provided context and a starting point for our discussions.
An early and not too surprising consensus emerging from our in-person discussions and follow-up e-mail exchanges was that the issue of digital library interoperability is complex and multi-faceted. Though there was clearly strong interest in interoperability and the establishment of robust and effective links between digital collections, we recognized that engendering such functionality across the full spectrum of funded and planned digitization projects would require more than a simplistic, one-dimensional, prescriptive guideline wedded to a single set of existing best practices. We saw a need to articulate basic principles and identify exemplary resources and models in a way that would accommodate the unique needs and goals of the wide range of digitization projects funded by NSF, IMLS, and others. We concluded that a high-level framework of guidance for digitization project planning and evaluation was required. At a second meeting held during the 2001 Joint Conference on Digital Libraries in Roanoke, Virginia, the general outline and scope of this framework document was developed. A sub-group with members from both the IMLS Forum and the NSDL team then worked during the summer to fill out this outline and draft the complete document. Individuals who participated in drafting the full framework document are listed in the acknowledgements at the end of this article.
The "Framework of Guidance for Building Good Digital Collections" was published on the IMLS Web site in the fall of 2001 (Caplan et al., 2001a), along with a summary report describing Forum findings and recommendations in regard to the NSDL and its implications for IMLS digitization projects and programs (Caplan et al., 2001b). The Framework has been endorsed by the Digital Library Federation and The Chiefs of State Library Agencies, and has been favorably reviewed in the online publication Current Cites (Tennant, 2002). Comments about and responses to the Framework were solicited from the digital library community through 1 May 2002. This paper, presented during the Web-Wise 2002 conference in March, describes the thinking that went into the creation and organization of the Framework, summarizes briefly the essential principles articulated in the Framework, and concludes with a few observations about what remains to be done and about the immediate impacts of the Framework and other work accomplished to date by the IMLS Digital Library Forum.
Central to the design and purpose of the Framework is the observation that the nature of what makes a good and successful library or museum digitization project has changed over the course of the last decade. As available technologies have improved, digitization project objectives, as well as expectations for those projects, have evolved. Ten years ago, proof of concept alone was a sufficient objective. If a project could demonstrate a new and innovative way to digitize full content, primary source material, it was considered a success, almost regardless of what was then done with that digitized content. While we focused on getting the underlying digitization technologies right, considerations of scale, sustainability, and usability, though of interest, were of secondary significance.
Within a few years, however, as the fundamental technologies of digitization became more mature and better understood, issues of sustainability and utility to a specific and well defined target community of end users increased in importance. Digitization projects were no longer considered successful unless they succeeded in effectively delivering digitized content to a targeted community of users. This shift in focus has facilitated the initial, partial integration of digitized information resources with more traditional collections and has led to the development of many excellent stand-alone digital information resources.
We are now entering a third stage of development with regard to the construction of digital collections. Digital collections are no longer seen simply as self-contained, single-purpose entities. Practitioners now recognize the potential of digital collections to function as components and building blocks that can be reused by many different groups and upon which many kinds of advanced digital library services may be built. Seen in this light, digital collections as components of an institution's holdings have become more fully analogous to typical traditional collections. For instance, traditional academic library print collections have long been viewed as shared resources, available to and used by many individuals beyond those originally considered when the collection was initiated. In implementing a collection policy for a traditional print collection, a good academic library collection manager considers not only proximate core end users but also the wider and more diverse audience of both current and potential future end users, both near and far. Managers and implementers of digital collections are now expected to take much the same approach to digital collection development. This means that planners of digitization projects today must consider issues of reusability, persistence, interoperability, verification, and documentation. The resulting implications for project planning and evaluation are emphasized in the Framework.
Framework Scope & Organization
The design and scope of the Framework also was informed by a judgment that there exists an important distinction between digital collections and digital libraries. Digitization projects, for which the Framework is intended as a resource, deal with the creation of digital content, i.e., with digital collections, whether "born" digital or transformed from print or other analog format. A digital collection, as defined and addressed in the Framework, consists of an organized assembly of digital information objects, metadata describing those objects, and metadata describing the overall collection. Though exact working definitions of what a digital library is still vary, a consensus among most practicing librarians is that digital libraries encompass more than just raw digital content (Borgman, 1999). In this view digital libraries are seen as systems that incorporate not only digital content, but also many value-added services, ranging from search and discovery utilities, to browse and interpretative interfaces, to specialized preservation and dissemination protocols. These value-added services facilitate management of the digital content and make it more usable by and accessible to end users. In this view digital collections are the building blocks of digital libraries much as collections of printed books, journals, and similar materials are the building blocks on which traditional library services are built. The charter of the IMLS Digital Library Forum was to focus especially on projects designed to generate digital content and on ways to improve planning, implementation, and evaluation of such projects. This in turn implies a focus on digital collections rather than digital libraries per se. The Framework reflects this limited scope and intentionally does not address digital library services. Rather the Framework addresses considerations of what is needed to make digital collections useful as building blocks of digital libraries. 
It individually addresses the following three primary constructs of digitization projects:
- Digital Collections
- Digital Objects
- Metadata
For each of these constructs the Framework articulates relevant fundamental principles that should be considered in planning or evaluating a digitization project. To elaborate on each principle, pointers to online resources and examples are provided. Not all principles will apply equally to all kinds of digitization projects, but the principles are stated in general terms and are defined independent of any particular standard or best practice, though many such are cited as illustrative of how a particular principle may be applied in practice. The Framework also articulates principles and links to resources that deal with digitization projects generally.
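The definition of a digital collection given earlier in this section (objects, metadata describing those objects, and metadata describing the collection as a whole) can be pictured as a small data model. The Python sketch below is illustrative only: the class names, field names, and the `undescribed` helper are hypothetical and do not come from the Framework, which prescribes no particular representation.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """One item in a collection, carried together with its item-level metadata."""
    identifier: str
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class DigitalCollection:
    """An organized assembly of objects plus metadata about the collection itself."""
    collection_metadata: dict[str, str]
    objects: list[DigitalObject] = field(default_factory=list)

    def add(self, obj: DigitalObject) -> None:
        self.objects.append(obj)

    def undescribed(self) -> list[str]:
        """Identifiers of objects that have no item-level metadata yet."""
        return [o.identifier for o in self.objects if not o.metadata]
```

Keeping collection-level and item-level metadata as separate fields mirrors the distinction the Framework draws between describing a collection and describing the objects within it.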
This approach contrasts with more prescriptive digitization project guidelines, such as those often promulgated by a specific funding agency or for inclusion in a particular initiative. For instance, the Joint Information Systems Committee has issued prescriptive guidelines (Beagrie et al., 2001) for projects that wish to integrate as part of the UK's Distributed National Electronic Resource (DNER). Though thorough and well written, these guidelines are prescriptive and closely tied to a particular selection of standards and best practices. They make sense for a relatively homogeneous, top-down digital library development program.
Our review, however, indicated that guidance for interoperability with large-scale, emerging national and international digital library programs like the NSDL will require a framework with greater scope and flexibility. NSDL is being designed to accommodate and embrace digital collections built to a wide range of standards and best practices, including, it's reasonable to anticipate, some yet to be written. For compatibility with NSDL it's generally more important simply that recognized, high-quality standards and best practices are used in constructing a collection than it is that one particular standard or best practice is chosen in preference to another. In that sense the NSDL approach is bottom up rather than top down. For this reason our Framework was written less as a prescription and more as a summary of first principles for digital collection creation. As expressed in the Digital Library Federation's endorsement of the Framework:
"Where other such guidelines promote specific standards and good practices - mechanisms whose utility is likely to diminish with time and changing technology - the IMLS Framework provides a set of high-level principles by which specific standards and good practices may be judged" (Digital Library Federation, 2001).
What follows in the next four sections is a recap of the primary, high-level principles articulated in the Framework.
Principles for Good Digital Collections
Just as a library collection is more than a random assemblage of books and journals and a museum collection is more than a random assemblage of artifacts or specimens, a digital collection of information resources is more than a random assemblage of digital objects. Collections imply selection and organization. Collections typically also require descriptive, structural, and/or administrative context, typically in the form of metadata, usually at both the collection level and the item (object) level. The Framework principles for good digital collections derive from this understanding of the nature of collections. They specify what is most often necessary to create a good digital collection, but are not prescriptive about how such specifications must or should be satisfied.
Good Digital Collections
- A good digital collection is created according to an explicit collection development policy that has been agreed upon and documented before digitization begins.
- Collections should be described so that a user can discover important characteristics of the collection, including scope, format, restrictions on access, ownership, and any information significant for determining the collection's authenticity, integrity and interpretation.
- A collection should be sustainable over time. In particular, digital collections built with special funding should have a plan for their continued usability beyond the funded period.
- A good collection is broadly available and avoids unnecessary impediments to use. Collections should be accessible to persons with disabilities, and usable effectively in conjunction with adaptive technologies.
- A good collection respects intellectual property rights. Collection managers should maintain a consistent record of rightsholders and permissions granted for all applicable materials.
- A good collection provides some measurement of use. Counts should be aggregated by period and maintained over time so that comparison can be made.
- A good collection fits into the larger context of significant related national and international digital library initiatives. For example, collections of content useful for education in science, math and/or engineering should be usable in the NSDL.
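The principle on measurement of use asks that counts be aggregated by period and kept over time so that periods can be compared. A minimal Python sketch of that idea follows. It assumes that each use of the collection is available as a dated access event and that a calendar month is the reporting period. Both are illustrative choices, not requirements of the Framework, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_use_counts(access_dates):
    """Aggregate individual access events into counts keyed by 'YYYY-MM'."""
    return Counter(d.strftime("%Y-%m") for d in access_dates)


def change_between(counts, earlier, later):
    """Difference in use between two periods; a missing period counts as zero."""
    return counts[later] - counts[earlier]


# Hypothetical access log for a small collection.
events = [date(2002, 1, 5), date(2002, 1, 20), date(2002, 2, 3)]
counts = monthly_use_counts(events)
```

Storing the aggregated counts, rather than only the raw log, is what makes later period-to-period comparison cheap.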
Principles for Good Digital Objects
In the context of the Framework, digital objects are defined as the items that make up digital collections. Multi-part digital objects take on characteristics of collections, and so should follow principles for both, as applicable. The following principles are intended to apply both to digital information objects that are "born" digital (i.e., initially published in digital form) and digital objects that are surrogates for or representations of physical objects or texts. The principles also are generally applicable both to objects intended for routine dissemination (e.g., use or access copies) and to objects maintained for archival purposes (e.g., master or preservation copies).
Good Digital Objects
- A good digital object will be produced in a way that ensures it supports collection priorities.
- A good object is persistent. That is, it will be the intention of some known individual or institution that the good object will persist; that it will remain accessible over time despite changing technologies.
- A good object is digitized in a format that supports intended current and likely future use or that supports the development of access copies that support those uses. Consequently, a good object is exchangeable across platforms, broadly accessible, and will either be digitized according to a recognized standard or best practice or deviate from standards and practices only for well documented reasons.
- A good object will be named with a persistent, unique identifier that conforms to a well-documented scheme. It will not be named with reference to its absolute filename or address (e.g. as with URLs and other Internet addresses) as filenames and addresses have a tendency to change. Rather, the filename's location will be resolvable with reference to its identifier.
- A good object can be authenticated in at least two senses. First, a user should be able to determine the object's origins, structure, and developmental history (version, etc.). Second, a user should be able to determine that the object is what it purports to be.
- A good object will have and be associated with metadata. All good objects will have descriptive and administrative metadata. Some will have metadata that supplies information about their external relationships to other objects (e.g. the structural metadata that determines how page images from a digitally reformatted book relate to one another in some sequence).
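Two of the principles above lend themselves to a short illustration: naming an object with a persistent identifier that is resolved to its current location, and checking that an object is still what it was when it was recorded. The Python sketch below is one possible reading, not a method taken from the Framework. The `IdentifierResolver` class, the `example:obj/0001` identifier scheme, and the use of a SHA-256 digest as the fixity check are all assumptions made for illustration. A digest addresses only whether the bytes have changed, not the object's provenance or version history.

```python
import hashlib


class IdentifierResolver:
    """Maps persistent identifiers to whatever location currently holds the object."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        # Re-registering an identifier records a move; the identifier never changes.
        self._locations[identifier] = location

    def resolve(self, identifier):
        try:
            return self._locations[identifier]
        except KeyError:
            raise LookupError(f"unknown identifier: {identifier}") from None


def fixity_digest(content: bytes) -> str:
    """SHA-256 digest recorded at creation time and recomputed later to detect change."""
    return hashlib.sha256(content).hexdigest()


def is_unchanged(content: bytes, recorded_digest: str) -> bool:
    """True when the content still matches the digest recorded earlier."""
    return fixity_digest(content) == recorded_digest
```

Citations and metadata records would carry only the identifier. When a file moves, the resolver entry is updated and existing references continue to work.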
Principles for Good Metadata
Metadata, most generically defined simply as data about data, is an essential ingredient needed to support almost all current approaches to digital collection interoperability and aggregation. Metadata may be subclassed as descriptive, administrative, or structural. For some digitization projects full attention to all three subclasses of metadata will be required to ensure a successful project. In other situations one subclass (e.g., descriptive metadata) may be demonstrably more important than the other two. The metadata principles articulated in the Framework apply to all types of metadata and emphasize in particular the need for the digitization project manager to balance metadata benefits against the cost of generating the metadata. They also emphasize the importance of standards-based taxonomies and metadata schemas and the need for early planning of metadata strategies. Metadata is not an afterthought for a digitization project, but rather something that should be considered from the outset of project planning.
- Good metadata should be appropriate to the materials in the collection, users of the collection, and intended, current and likely use of the digital object.
- Good metadata supports interoperability.
- Good metadata uses standard controlled vocabularies to reflect the what, where, when and who of the content.
- Good metadata records are objects themselves and therefore should have the qualities of good objects, including archivability, persistence, unique identification, etc. Good metadata should be authoritative and verifiable.
- Good metadata supports the long-term management of objects in collections.
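As a concrete illustration of the principle on standard controlled vocabularies, the Python sketch below checks a descriptive record against one such vocabulary. It assumes records use Dublin Core element names and that the `type` element is drawn from the DCMI Type Vocabulary. The Framework does not mandate either choice. The list of required elements and the `check_record` function are hypothetical project policy.

```python
# DCMI Type Vocabulary terms, used here as the example controlled vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Hypothetical project policy: the elements every record must carry.
REQUIRED_ELEMENTS = ("identifier", "title", "type", "rights")


def check_record(record):
    """Return a list of problems found in one descriptive metadata record."""
    problems = [f"missing element: {name}"
                for name in REQUIRED_ELEMENTS if not record.get(name)]
    type_value = record.get("type")
    if type_value and type_value not in DCMI_TYPES:
        problems.append(f"type not in controlled vocabulary: {type_value}")
    return problems
```

Running a check of this kind while records are being created, rather than after digitization is complete, is in keeping with the point that metadata should be planned from the outset.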
Principles for Good Digitization Projects
Though often defined and developed in the context of a broader digital library program, efforts to initiate and construct collections of digital information resources are most frequently funded and managed as discrete digitization projects. Because most digitization projects are for finite terms, even though most digital collections created by digitization projects are intended to exist indefinitely, it is essential that digitization project design includes plans for collection maintenance and potentially ongoing digitization after the term of the startup grant and project has expired. This implies an institutional commitment at least on a par with the commitment made when new collections of traditional materials are created. Because the process of constructing collections of digital content is still novel for most institutions, digitization projects also require ongoing assessment and evaluation. The fundamental considerations to ensure good and successful digitization projects are commonsensical, but are overlooked often enough to warrant inclusion and discussion in the Framework.
Good Digitization Projects
- A good project has a substantial design component.
- A good project has an evaluation plan.
- A good project produces a project report.
In its report looking at what is needed to support inclusion of IMLS-funded digital content in online national library initiatives like the NSDL program, the IMLS Digital Library Forum recommended that topically appropriate digitization projects funded by IMLS be encouraged to take six specific steps to help ensure that content developed can be included in the NSDL (Caplan et al., 2001b). These suggestions for digitization project design features focus on development of collection and item-level metadata and support of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). They also call for project managers to address issues relating to interoperability, reusability, and persistence when planning digitization projects. These recommendations have raised consciousness regarding these issues among 2002 IMLS applicants and have resulted in a decision by IMLS to fund in 2002 a limited number of projects designed to add value (e.g., standard metadata and OAI-PMH metadata provider services) to digital collections already created under prior IMLS digitization grants.
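To give a sense of what supporting OAI-PMH involves on the harvesting side, the Python sketch below builds a `ListRecords` request and reads titles out of a response. It assumes version 2.0 of the protocol and the `oai_dc` metadata format. The base URL, the record identifier, and the sample response are invented for illustration, and the response is trimmed to the elements the sketch reads, so it is not a complete, valid OAI-PMH reply. No network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical, abbreviated response from a data provider.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:0001</identifier>
        <datestamp>2002-03-22</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample herbarium sheet</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def titles_by_identifier(response_xml):
    """Map each harvested record's OAI identifier to its first dc:title."""
    root = ET.fromstring(response_xml)
    result = {}
    for record in root.iter(f"{OAI_NS}record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        title = record.find(f".//{DC_NS}title")
        result[identifier] = title.text if title is not None else None
    return result
```

A real harvester would also need to follow resumption tokens for large result sets and handle protocol error responses, both of which are omitted here.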
The IMLS Digital Library Forum also articulated two general recommendations for the IMLS digitization program as a whole. The second of these recommended the creation of a registry of IMLS digitization projects (current and selected past) containing both collection level metadata about each digitized collection and a repository of item-level metadata describing the contents of each collection. This recommendation has resulted in the inclusion (as part of the IMLS National Leadership Grant program call for 2002) of a "Request for Proposals to Develop a Registry and Metadata Repository for Digital Collections."
The other general recommendation of the forum was for IMLS to adopt, promulgate, and maintain a Framework of Guidance for Building Good Digital Collections. In accord with this recommendation the IMLS has already disseminated and requested comments regarding the Framework draft as described and discussed above. The IMLS Digital Library Forum is very much aware that the first draft of the Framework as published last November on the IMLS Web site is just that, a first draft. Though the principles and links contained in this document represent considerable breadth and depth, the Framework can't be considered a complete and finished document. To be truly useful, the Framework must be maintained over time as a dynamic document. New links and potentially even new principles will need to be added and old ones discarded as technologies, protocols, and best practices change and evolve.
The means to maintain and update the Framework over time remain to be decided. However, the response and comments so far received since its initial publication last fall suggest that an effort to maintain and continue development of this resource for both project planners and project funding organizations will be well worth the effort. It is hoped that IMLS, possibly in conjunction with other organizations, will put in place the necessary structures to support the long-term continuation of the Framework.
About the Author
Timothy W. Cole has a BS in Aeronautical and Astronautical Engineering and a MS in Library and Information Science, both earned from the University of Illinois at Urbana-Champaign. He has been a member of the faculty of the Library of the University of Illinois at Urbana-Champaign since 1989, serving first as Assistant Engineering Librarian and later as Systems Librarian for Digital Projects. He currently holds the post of Mathematics Librarian and Associate Professor of Library Administration and is Principal Investigator for the Illinois Open Archives Initiative Metadata Harvesting Project, funded by the Andrew W. Mellon Foundation. He was a member of the IMLS Digital Library Forum.
This paper was presented 22 March 2002 at Web-Wise 2002, Johns Hopkins University, Baltimore, Md.
Acknowledgements
Individuals who contributed to the draft of the Framework of Guidance for Building Good Digital Collections as published by IMLS on 6 November 2001:
- Liz Bishoff, Colorado Digitization Alliance
- Priscilla Caplan (chair), Florida Center for Library Automation
- Tim Cole, University of Illinois Urbana-Champaign
- Anne Craig, Illinois State Library
- Daniel Greenstein, Digital Library Federation
- Doug Holland, Missouri Botanical Garden
- Ellen Kabat-Lensch, Eastern Iowa Community College
- Tom Moritz, American Museum of Natural History
- John Saylor, Cornell University
References
Neil Beagrie et al., 2001. "Working with the Distributed National Electronic Resource (DNER): Standards and Guidelines to Build a National Resource" (February), at http://www.jisc.ac.uk/dner/development/guidance/DNERStandards.html, accessed 2 April 2002.
Christine L. Borgman, 1999. "What Are Digital Libraries? Competing Visions," Information Processing and Management, volume 35, pp. 227-243.
Priscilla Caplan et al., 2001a. "A Framework of Guidance for Building Good Digital Collections" (6 November), at http://www.imls.gov/pubs/forumframework.htm, accessed 2 April 2002.
Priscilla Caplan et al., 2001b. "Report of the IMLS Digital Library Forum on the National Science Digital Library Program" (October), at http://www.imls.gov/pubs/natscidiglibrary.htm, accessed 2 April 2002.
William Jefferson Clinton, 1999. "Memorandum on the Use of Information Technology" (17 December), In: Public Papers of the Presidents of the United States: William J. Clinton, 1999, volume 2, Washington: Office of the Federal Register, National Archives Records Administration, p. 2316, and at http://www.gpo.gov/nara/pubpaps/srchpaps.html (from the U.S. Government Printing Office via GPO Access, DOCID:pap_text-672), accessed 2 April 2002.
Digital Library Federation, 2001. "DLF Endorsement for Framework of Guidance for Building Good Digital Collections," at http://www.diglib.org/standards/imlsframe.htm, accessed 2 April 2002.
Cathryn A. Manduca, Flora P. McMartin, and David W. Mogk (editors), 2001. "Pathways to Progress: Vision and Plans for Developing the NSDL" (20 March), at http://www.smete.org/nsdl/meetings/grantees0901/whitepaper.pdf, accessed 2 April 2002.
Roy Tennant, 2002. Current Cites, volume 13, number 2 (February), at http://sunsite.berkeley.edu/CurrentCites/2002/cc02.13.2.html, accessed 2 April 2002.
Web Based Education Commission, 2000. The Power of the Internet for Learning: Moving from Promise to Practice. Washington, D.C.: The Commission, and at http://interact.hpcnet.org/webcommission/index.htm, accessed 2 April 2002.
Paper received 7 April 2002; accepted 18 April 2002.
Copyright ©2002, First Monday
Creating a Framework of Guidance for Building Good Digital Collections by Timothy W. Cole
First Monday, volume 7, number 5 (May 2002)