The Colorado Digitization Project (CDP at http://coloradodigital.coalliance.org) is a collaborative initiative involving Colorado's archives, historical societies, libraries, and museums. The Project's goal is to increase access to the special collections and unique resources of the cultural heritage institutions through digitization. The CDP received a two-year IMLS grant, with the goal of further developing the collaborative, increasing the amount of digital resources linked through the CDP Web site, testing the guidelines and standards developed by the CDP and creating a model for statewide library/museum collaboration that can be adopted by other states.
To meet the objective of increasing access, development and adoption of standards or best practices for the description of digital objects was critical. The presentation identifies the key Project assumptions, the approach the CDP has taken to achieve the goal of increased access, and the barriers encountered. The second key area of standards for the Project is scanning. Establishing minimum scan standards, identifying quality control guidelines, and providing options for scanning are all needed to assure a degree of consistency and quality across these multi-institution projects.
CDP and Metadata
CDP and Scanning
Conclusion: What Have We Learned?
The development of the Web and the application of scanning technology to library and museum collections offer unprecedented opportunities for increasing access to library and museum special collections and unique resources. In the Fall of 1998 Colorado's archives, historical societies, libraries, and museums undertook an effort to develop a collaborative with the vision of creating a virtual collection of these unique resources and special collections, expanding access to information on Colorado's history, culture, and scientific heritage through digitization.
The library and museum leaders who envisioned this initiative developed it with several assumptions. First, to complete the digital library for the people of Colorado, the special collections held by Colorado's libraries had to be added to the existing finding tools, online catalogs, online databases, and Web resources. Digitization offered new opportunities for providing access to these collections. Second, because of the uniqueness of the materials in special collections and the fact that a range of cultural heritage institutions holds these resources, the collection couldn't just represent the holdings of libraries; it had to include the collections of archives, historical societies, and museums. As a result, the collaborative had to include these institutions from the beginning. Third, the collection would be a distributed virtual collection, not a single centralized database of images, taking advantage of the Internet and allowing maximum flexibility for local distribution options. Fourth, the emerging standards and best practices for metadata and scanning would be utilized for the Project. Lastly, the K-12 community, in meeting the state education standards, would incorporate digital objects in their lessons.
An important aspect of the Project was the environment in which the collaborative began its work. As with many other states and regions, Colorado's cultural heritage institutions hold significant amounts of primary source materials. These collections are widely dispersed, with limited access, sometimes poorly organized and generally underutilized. There is increased public demand for access to and use of primary source materials due to many of the pioneering projects, including those of the Library of Congress, Cornell University, and the Denver Public Library. Librarians and archivists indicated a need to bring together traditional library materials and digital content, while reducing physical use of the original materials.
Like many other states, we had many well-intentioned but undertrained individuals digitizing primary source materials and putting them on the Web. As a result, the quality of the images was inconsistent and the metadata frequently inadequate. Although we were creating a new collaborative, Colorado already had a strong collaborative library and archives environment. However, there was limited collaboration in the museum and historical society environment, with few generally adopted standards. In addition, standards on scanning and metadata were only just emerging, so the Project had to look to best practices or community standards to guide the standards initiative. Lastly, the traditional commercial vendors supporting the cultural heritage communities had not developed software to support the digital content metadata. Successfully addressing these environmental issues was crucial for the Project.
During the first year, the Project focused on development of the collaborative and on establishing standards and guidelines to inform the future digitization projects of the partner institutions. In a distributed networked environment, standards are the key to success. The International Federation of Library Associations document Digital Libraries: Definitions, Issues and Challenges notes that "The technical architecture will be a collection of disparate systems and resources connected through the Internet and integrated within one interface - a Web enabled interface. Common standards are needed to allow digital libraries to interoperate and share resources." [1]
Without standards the goals of the Project could not be met. However, the standards environment could best be described as challenging and ever changing.
As we examined the 15 existing digitization projects (http://coloradodigital.coalliance.org/browse.html) that were underway in the fall of 1998, we found that different cultural heritage institutions have different standards and different levels of adoption. Libraries have a long tradition of standards that are broadly adopted by all types of libraries for a wide range of materials. Archives have a tradition of standards in the areas of preservation and conservation and best practices for finding aids. Within the museum community, specific segments, such as art museums, have standards; however, across the museum community there are few commonly adopted standards. Additionally, museums and historical societies are faced with a wide range of non-textual, three-dimensional objects that provide additional difficulties in description and scanning. Historical societies frequently have a library, archives, and museum. Within each area, they are likely to use the standards for that function.
As the Colorado Digitization Project includes all four types of cultural heritage institutions, we found a wide range of standards and practices. Many of the museums and historical societies offered an exhibit approach to display their digital objects with no search capability to allow the user to locate an individual image. Several of the libraries used AACR-2 based MARC records to describe their digital objects, linking from the Web site to the library online public access catalog to offer searching that retrieves the individual object. Others converted their finding aids to HTML coded documents, linking from the collection level record in their online catalog to the HTML page on the Web site. The libraries are doing both item level and collection level cataloging, frequently depending on how the materials are organized. Item level metadata records were preferred where more in-depth description of the individual items was available, while collection level cataloging linked to finding aids is offered where the finding aid was in a format that could be converted to HTML. One museum, the Colorado Springs Pioneer Museum, offered a database of metadata describing their 40,000 three-dimensional artifacts, unique among Colorado's museums.
As the CDP was based on the strategy of distributed metadata and distributed images, Web search capability was a major concern. With the wide range of approaches taken by the Colorado institutions, Web searching was further compromised. For many of the individual collections, retrieval via a Web search was unsatisfactory, as the Web search engines cannot access local databases or online catalogs. In other cases, the Web search took the user to the highest level of the Web site, when the digital content was five levels lower on the site. To meet the CDP vision of enhancing access to this virtual collection of digital images, an approach to searching that went beyond the Web search engines had to be developed. The other major issue faced by the CDP was that there was limited software to support the new metadata standards. [2]
CDP and Metadata
The metadata working group, one of five working groups established to develop the Project, began addressing ways to improve access to digital objects within the current metadata environment. They had to address several issues:
- How do we improve on current Web searching options?
- How do we realize the goal of improved access in a distributed network environment approach?
- How do we deal with the diverse set of standards, diverse communities, diverse clientele, diverse missions, and diverse knowledge base?
- What approach can we take that will realize the goal of increased access, while allowing for local flexibility and autonomy?
After several months of exploration, the CDP metadata working group developed a set of assumptions upon which to make decisions. These included:
- The CDP could not mandate one metadata standard; rather we had to build on the standards already adopted by the particular community, offering a variety of standards.
- The CDP could not rely on the Web search engines to provide access at the desired level.
- The CDP wanted to offer searching across print and digital collections.
To address these assumptions, the CDP metadata working group recommended the development of a union catalog of metadata, bringing together the metadata from the various projects, creating a single physical union catalog and providing enriched access to the digital objects through this union catalog. The CDP union catalog would provide the expanded search capabilities that weren't available via Web searching as users would be searching this specialized union catalog rather than the more general Web. Metadata records from local online catalogs or databases would be loaded onto a system that would offer a physical union catalog. The system would have to support cross database searching, allowing the user to search online catalogs of library and archive collections, Web resources, and the CDP union catalog.
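The union catalog idea described above can be illustrated with a minimal sketch. It is not the CDP's software: the class names, field names, institutions, and URLs are all hypothetical, and a real system would also need Z39.50 and cross-database searching. The sketch only shows the core arrangement, in which metadata is gathered into one searchable catalog while each digital object stays on the contributing institution's own server.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    # All field names here are illustrative assumptions, not a CDP schema.
    institution: str   # contributing institution
    title: str         # may be empty for some artifacts
    description: str
    identifier: str    # URL of the digital object on the institution's server


@dataclass
class UnionCatalog:
    records: list = field(default_factory=list)

    def load(self, batch):
        """Add a batch of records exported from one local catalog or database."""
        self.records.extend(batch)

    def search(self, keyword):
        """Return records whose title or description contains the keyword."""
        needle = keyword.lower()
        return [
            r for r in self.records
            if needle in r.title.lower() or needle in r.description.lower()
        ]


catalog = UnionCatalog()
# Hypothetical batch from a library's online catalog.
catalog.load([
    MetadataRecord("Example Public Library", "Mining camp, 1890",
                   "Photograph of a silver mining camp",
                   "http://library.example.org/img/001"),
])
# Hypothetical batch from a museum's local database.
catalog.load([
    MetadataRecord("Example History Museum", "",
                   "Miner's lamp, brass, used in a silver mine",
                   "http://museum.example.org/obj/42"),
])

hits = catalog.search("silver")
for r in hits:
    # The catalog holds only metadata; the link leads back to the holder.
    print(r.institution, "->", r.identifier)
```

A single search reaches both institutions' records, which is the expanded access that general Web searching could not provide for content held in local databases.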
Following the policy of not requiring a single standard, the union catalog would have to be able to load records based on various metadata standards, including AACR-2/MARC and SGML/XML-based formats (e.g., Dublin Core (DC)), as well as records created on individual databases. To make this approach a reality, the metadata group believed that the CDP needed to define a set of mandatory metadata elements (http://coloradodigital.coalliance.org/standards.html). Using the Dublin Core framework as the basis, the working group identified seven of the 15 DC elements as mandatory: creator, title, subject, description, identifier, date, and format. The remaining eight elements are optional, but desirable. In addition to being able to load records from a variety of sources and in a variety of formats, the union catalog software also had to allow for online input of a MARC or Dublin Core record, support Z39.50 searching, be a production product with product support, and, in the future, support the Encoded Archival Description, and output a MARC or Dublin Core record.
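As an illustration of how a loader might apply the seven mandatory elements named above, the following is a minimal sketch. It assumes records arrive as plain dictionaries keyed by lowercase element name; that representation, the function name, and the sample records are hypothetical and are not drawn from the CDP's software.

```python
# The seven mandatory elements listed in the text, as lowercase keys.
MANDATORY = ("creator", "title", "subject", "description",
             "identifier", "date", "format")


def missing_mandatory(record):
    """Return the mandatory elements that are absent or blank in a record."""
    missing = []
    for name in MANDATORY:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical library record with every mandatory element present.
complete = {
    "creator": "Unknown photographer",
    "title": "Main Street, looking north",
    "subject": "Streets",
    "description": "Black-and-white photograph of a main street",
    "identifier": "http://library.example.org/img/002",
    "date": "1905",
    "format": "image/jpeg",
}

# Hypothetical museum record for an artifact that was never given a title.
artifact = dict(complete, title="")

print(missing_mandatory(complete))   # nothing missing
print(missing_mandatory(artifact))   # the title element is reported
```

Reporting the missing elements, rather than rejecting the record outright, leaves the decision about how to handle an incomplete record with the contributing institution.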
As of the summer of 1999, the OCLC SiteSearch software was the only system that met the above requirements. Working with the Colorado State Library, the CDP expanded the State Library's existing SiteSearch contract to include the SiteSearch database builder software and additional simultaneous users. As of April 2000, the Project has defined the indexed fields for AACR-2/MARC and Dublin Core and determined screen displays and qualifiers. Testing of records from the museum, archive, and library communities will begin soon.
We know that we will encounter new problems as we begin to load these individual databases. For example, many museums and historical societies do not include titles for their three-dimensional artifacts, relying instead on extensive description of the physical object for retrieval. In contrast, many in the library and archival community make up titles for items without a title. While this issue may be resolved through general keyword searching, a user doing a title keyword search won't retrieve objects that have no title. The CDP is looking at various options for dealing with this issue. Another key issue will be that of subject terminology. Not only are we facing the issue of various terminologies used by cultural heritage institutions, we are also facing different terminology in several of our Projects because subject-based experts are operating the databases. For example, the paleontologist at the Florissant Fossil Beds National Monument is the developer of that collection's database. Terms used in that database are very specific to the field and probably unknown to most general users. The CDP metadata working group will have to address this issue at some time in the near future.
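One conceivable option for the missing-title problem is sketched below. It is an assumption for illustration only, not an approach the CDP has chosen: when a record has no title, a display title is supplied from the first few words of its description and enclosed in square brackets to mark it as supplied. The function names, the word limit, and the sample records are hypothetical.

```python
def display_title(record, max_words=6):
    """Return the record's title, or a bracketed title built from its description."""
    title = record.get("title", "").strip()
    if title:
        return title
    words = record.get("description", "").split()
    if not words:
        return "[Untitled]"
    # Trim trailing punctuation left over from cutting the description short.
    supplied = " ".join(words[:max_words]).rstrip(",.;:")
    return "[" + supplied + "]"


def title_keyword_search(records, keyword):
    """Search titles, including supplied ones, for a keyword."""
    needle = keyword.lower()
    return [r for r in records if needle in display_title(r).lower()]


records = [
    # Hypothetical library record with a cataloger-assigned title.
    {"title": "Quilt-making in Pueblo",
     "description": "Photograph of women at a quilting frame"},
    # Hypothetical museum record described but never titled.
    {"title": "",
     "description": "Quilt, pieced cotton, log cabin pattern, circa 1880"},
]

found = title_keyword_search(records, "quilt")
for r in found:
    print(display_title(r))
```

With this fallback, the untitled artifact is retrieved by a title keyword search alongside the titled photograph. Whether a supplied title should be stored in the record or generated only at index time is a policy question the sketch leaves open.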
CDP and Scanning
In addition to metadata standards, the CDP had to address the scan standards (http://coloradodigital.coalliance.org/standards.html). The major issue in developing the standards or best practices for scanning was the lack of understanding of what decisions needed to be made. The first thing we had to do was get the participants to think beyond digitization as a means of increasing access. Several of the projects scanned images at a very low resolution, one appropriate to thumbnails but inadequate for even some Web viewing. This resulted in images that had limited application. On the other hand, the scanning working group understood that we couldn't 'sell' the participants on a higher resolution level solely on the grounds that it could serve as an archival version, since many of them didn't see their digitization initiatives as preservation projects. Like others, the Colorado participants were skeptical as to the viability of digitization as a preservation medium.
Key to establishing standards was the adoption of a set of principles for scanning. The CDP incorporated the following principles into their guidelines.
- Scanning at the highest resolution appropriate to the informational content of the originals
- Scanning at an appropriate level of quality to avoid rescanning and re-handling of the originals in the future - scan once
- Creating and storing a master image file that can be used to produce derivative image files and serve a variety of current and future user needs
- Using system components that are non-proprietary
- Using image file formats and compression techniques that conform to industry standards
- Creating backup copies of all files on a stable medium
- Creating meaningful metadata for image files or collections
- Storing media in an appropriate environment
- Monitoring and recopying data as necessary
- Outlining a migration strategy for transferring data across generations of technology
- Anticipating and planning for future technological developments
These principles are derived from a set of recommendations developed by Howard Besser, Best practices for image capture. California Digital Library at www.cdlib.org/standards/moaa=bp71w95.doc
Once these principles were in place, we reviewed the best practices of institutions such as the Library of Congress, the National Archives and Records Administration, OhioLINK, and others. As a result of this review, the CDP established a set of best practices and minimum recommended standards. Some of the areas addressed as best practices and minimum standards include:
- projects must create master, access, and thumbnail versions of each image
- different standards apply to different formats of materials, i.e., text, transparent images, and opaque images
- minimum standards are established, with the caveat that scanning be done at the level appropriate to the individual item
- quality standards are established
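The first requirement in the list above, that every image exist in master, access, and thumbnail versions, lends itself to a simple automated check. The sketch below assumes a hypothetical file-naming convention of the form item_version.extension; neither the convention nor the file types shown come from the CDP guidelines.

```python
REQUIRED_VERSIONS = ("master", "access", "thumbnail")


def missing_versions(filenames):
    """Map each item ID to the required versions it lacks.

    Assumes hypothetical names such as photo001_master.tif. Files that do
    not follow that pattern are ignored, so an item with no recognizable
    version at all will not appear in the result.
    """
    found = {}
    for name in filenames:
        stem = name.rsplit(".", 1)[0]              # drop the extension
        item, _, version = stem.rpartition("_")    # split at the last underscore
        if item and version in REQUIRED_VERSIONS:
            found.setdefault(item, set()).add(version)
    return {
        item: [v for v in REQUIRED_VERSIONS if v not in versions]
        for item, versions in found.items()
        if len(versions) < len(REQUIRED_VERSIONS)
    }


# Hypothetical inventory from one scanning session.
files = [
    "photo001_master.tif", "photo001_access.jpg", "photo001_thumbnail.gif",
    "photo002_master.tif", "photo002_access.jpg",
]

gaps = missing_versions(files)
print(gaps)
```

A check of this kind could be run as part of quality control before records are contributed, so that missing derivatives are caught while the originals are still at hand.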
To assure that the institutions had access to equipment that supported these standards, the CDP established five regional scan centers. These centers provide the Colorado institutions with relatively easy access to scanning equipment, assistance by trained staff in scanning, and access to the union catalog and local databases via the Web. Each institution has to do its own scanning. Training sessions on scanning and metadata are being conducted throughout the spring and summer of 2000 at these regional scan centers. It is hoped that the combination of consulting on scanning, training, and quality equipment will result in images of consistent quality, as well as developing expertise at the local institution level.
Conclusion: What Have We Learned?
What has been the result of our work to date? In general the institutions are pleased to have these standards and best practice guidelines, finding them useful as they develop their own plans for digitization. Key to the adoption of these standards and best practices was that we had demonstrated, through the working groups' activities, that the concerns of all parties were met. We wouldn't have been successful had we developed standards with one group, for example the libraries, and then dictated that they be used by all other institutions. Museums had to feel that their issues were addressed by the standards. The same goes for the archives and historical societies. It was critical to have representatives from all the cultural heritage institution types at the table from day one when determining the standards or best practices.
The standards had to accommodate all formats of materials: individual photographs, collections of letters, three-dimensional artifacts, textile items, etc. This issue was particularly important if we were going to be including archives, museums, and historical societies, whose collections contain many of these resources.
We found that within the state we had the expertise to undertake this type of project. The knowledge of catalogers and registrars transferred to the description and subject analysis of digital objects. We also recognized the need to think about the description of these digital objects in some different ways. We needed to consider the functional and administrative metadata, which is usually not relevant for print materials, to assure ongoing management of these digital assets. Some of the new or emerging metadata standards capture this data, while the means of incorporating it into MARC or Dublin Core records is still under development.
More importantly, we realized that we could learn much from the representatives from the different institution types. The library cataloger's understanding of the descriptive and analytical needs of three-dimensional objects was expanded. The approaches museums and historical societies take to describing their collections were new to many and had to be accommodated in the metadata. Similarly, the scanning needs of the different institutions had to be considered. For example, a digital image of a painting held by an art museum, which would be used for scholarly research or commercial publishing, required a different set of standards and skills than scans of photographs from a library collection intended for general user access.
After all this work, the adoption of the standards by institutions undertaking digitization projects will be the true test of whether the standards are appropriate, meet the needs of the institutions, and allow us to realize our goal of increased access. Through the IMLS grant, the CDP is awarding 19 grants to 27 institutions. These grantees must adopt the CDP standards for metadata and scanning. They must contribute records to the union catalog of metadata. These institutions will be the pilot projects for our standards. Based on their experience and the continued development of the standards by the standards-setting communities, the CDP will be modifying the standards and guidelines.
About the Author
Liz Bishoff is currently the Project Director of the Colorado Digitization Project. The Project, a collaborative among Colorado's libraries, museums, archives, and historical societies, is developing a virtual collection of Colorado's unique resources and special collections. Liz is the owner of The Bishoff Group, a management consulting organization specializing in library and library related organizations.
Prior to coming to Colorado, Liz was Vice President, Member Services at OCLC. Liz worked with many external organizations, including national libraries, professional library-related organizations, and government relations. She was responsible for OCLC's participation in national programs including CONSER, the Cooperative Cataloging Council, Commission on Preservation and Access, and the Digital Library Federation.
Previously with OCLC, Liz was Director of the Online Union Catalog Product Management Division, which included strategic product planning and product management for OCLC PRISM Cataloging, Interlibrary Loan, and Union List systems.
Liz is the immediate past President of the American Library Association, Association for Library Collections and Technical Services and is currently Treasurer-Elect and a member of the Board of the American Library Association.
Liz has extensive library experience in a range of school and public libraries in Illinois and California. She has taught in the graduate library programs at Rosary College and Emporia State University.
Liz holds an MLS from Rosary College, and has post-graduate work in public administration at Roosevelt University.
1. International Federation of Library Associations and Institutions, 1998. Digital Libraries: Definitions, Issues and Challenges. The Hague, Netherlands: IFLA.
2. The metadata standards considered by the CDP included Dublin Core, Encoded Archival Description, VRA, AACR-2/MARC, etc.
Paper received 1 May 2000; accepted 10 May 2000.
Copyright ©2000, First Monday
Interoperability and Standards in a Museum/Library Collaborative: The Colorado Digitization Project by Liz Bishoff
First Monday, volume 5, number 6 (June 2000),