First Monday


Digitizing and Preserving Plant Images: Linking Plant Images and Databases for Public Access

Our project continues to develop a database of plant images dynamically linked to associated scientific data about those images. We strive to research and improve image capture, retrieval, and delivery technologies. Our long-term objective is to create a digital library for botany that includes images, digitized botanical references, and "born-digital" data. All content and research resulting from this project is freely available through the Internet.

Contents

Introduction
Discussion
Conclusion

Introduction

The Missouri Botanical Garden (MBG), located in St. Louis, was established in 1859 and has one of the most active botanical research programs in the world. The Garden maintains the collections, staff, and infrastructure needed to fulfill its mission to discover and share knowledge about plants on a worldwide scale. Garden resources include a growing herbarium of more than five million dried plant specimens; the TROPICOS database, which currently has more than 850,000 botanical names records and more than 1,400,000 specimen records; a library of 126,000+ volumes; tens of thousands of live plant slide images; and a growing Web-based digital library program.

Discussion

A significant part of the Garden's Web-based digital library program is our IMLS-funded project to digitize and preserve plant images, link them to associated information, and make them available on the Web. Having this wealth of information available through the Web often obviates the need to borrow specimens or travel to the institution to examine them. The result is information delivered to an international audience more quickly, at less expense, and with less potential damage to unique materials. Examples of these images of live plants and herbarium specimens, along with associated information, can be seen on-line through the W3 TROPICOS database at http://mobot.mobot.org/Pick/Search/image/imagefr.html. To date, there are 198 plant families represented in the image database, out of 275 families recognized by MBG. We have approximately 10,000 images now online. Eventually, the Garden plans to make hundreds of thousands of botanical images and associated information available on the Web.

In addition to making the images and associated information available, the Garden is learning better and faster ways to build our digital plant library. Our project is designed to assist others as well as ourselves in learning to economically capture, store, preserve, and make available large image files.

The two procedures that form the basis of the imaging portion of the project are digitizing slides of live plants and digitizing herbarium specimens. Often, the live plant image and the herbarium specimen are created from the same plant, with the slide of the living plant taken in the field before it is processed and added to the Garden's 5-million-specimen herbarium. Having the live plant image, the herbarium specimen image, and the associated data available in one place is extraordinarily valuable to researchers.

The large numbers of plant images and herbarium specimens make it necessary for us to develop protocols that are simple, adaptable, and economical. To this end, the project is using readily available, inexpensive technologies wherever possible. It is producing protocols that are simple enough to become part of the normal workflow in most institutions without causing a major disruption. These protocols are adaptable to a variety of institutional environments, including different computer platforms and levels of user sophistication.

Below are protocols we are currently working with:

File formats:
Archive images are saved as TIFF (Tag Image File Format). TIFF is the standard for high quality uncompressed raster images captured by scanners and used in desktop publishing.

Full-sized and thumbnail Web images are saved as JPEG (Joint Photographic Experts Group). This is the dominant standard in imaging applications, such as the Internet, for continuous tone (photographic quality) images.

Image quality:
Archive Images: Resolution - 300 LPI. The file format we use to scan specimens is High Dynamic Range (HDR), which contains the whole dynamic range of the scanner in 48-bit color. By saving the HDR files as uncompressed TIFFs without editing tonal range, we preserve all the information the Kaiser Scando Camera can generate. Tone - no tonal correction performed. Pixel array of camera - 3648 x 4625 maximum resolution; actual typical pixel dimensions of scanned specimen - 3400 x 4570 pixels; average file size - 70 MB.

Rationale: the specifications chosen for the Archive images reflect a desire to balance the highest quality image against file size.

Full-sized and thumbnail Web images: Resolution - 72 PPI; Depth - 8 bit color; Tone - Full range. Four different histograms are used to create an image that approximates the original specimen, or slightly enhances the texture and form details. Pixel array of camera - 3648 x 4625 Maximum resolution; Actual typical pixel dimensions: Web ready full size image - 850 x 1225 pixels; Web ready thumbnails - 200 x 288 pixels. Average file size: Web ready full size image - 110 K; Web ready thumbnails - 15 K.

Rationale: the specifications chosen for the Web images reflect a desire to balance the highest quality image against file size.
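The relationship between the full-size and thumbnail Web dimensions quoted above can be checked with a short calculation. The following is a minimal sketch, not part of the project's protocols; the function name `fit_to_width` is hypothetical, and it assumes the thumbnail is produced by scaling the full-size Web image to a fixed width while preserving its aspect ratio.

```python
from typing import Tuple


def fit_to_width(width: int, height: int, target_width: int) -> Tuple[int, int]:
    """Scale (width, height) to target_width, preserving the aspect ratio."""
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("dimensions must be positive")
    scale = target_width / width
    return target_width, round(height * scale)


# Full-size Web image dimensions as given in the text.
full_size = (850, 1225)

# Scaling to a width of 200 pixels reproduces the stated thumbnail size.
thumbnail = fit_to_width(full_size[0], full_size[1], target_width=200)
print(thumbnail)  # (200, 288)
```

Under that assumption, a 200-pixel-wide thumbnail of an 850 x 1225 image comes out at 200 x 288 pixels, which agrees with the figures listed for the Web-ready thumbnails.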

Delivery, compression, and storage issues:
During the current project, we learned that advances in digital photographic equipment allow us to image the specimen directly and economically, and we have been doing so since early in the project. One difficulty encountered as a result of this change is the problem of storing and preserving large digital images. We have begun to study alternatives for economically storing and preserving these images.

Since the beginning of the IMLS1998 project, our philosophy has been to capture the most information-rich digital images possible, despite the limitations of current WWW bandwidth. Our current file size averages 70 MB per image. The images we currently deliver on the Internet, however, must be compressed to an average of 110 K. The hope is that technology will evolve that allows more efficient delivery of very high resolution images to the end user. This situation, however, presents two major technological challenges, delivery and storage, which demand further research.
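The gap between the archive and Web file sizes can be made concrete with a small calculation. This is an illustrative sketch only: the 70 MB and 110 K averages come from the text, while the 56 kbit/s connection speed and the use of decimal units (1 MB = 1,000,000 bytes) are assumptions introduced here.

```python
ARCHIVE_BYTES = 70 * 1_000_000   # average archive image, from the text
WEB_BYTES = 110 * 1_000          # average full-size Web image, from the text
MODEM_BITS_PER_SECOND = 56_000   # assumption: dial-up connection, not from the text


def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Idealized transfer time, ignoring protocol overhead and latency."""
    return size_bytes * 8 / bits_per_second


reduction = ARCHIVE_BYTES / WEB_BYTES
archive_hours = transfer_seconds(ARCHIVE_BYTES, MODEM_BITS_PER_SECOND) / 3600
web_seconds = transfer_seconds(WEB_BYTES, MODEM_BITS_PER_SECOND)

print(f"size reduction: about {reduction:.0f} to 1")
print(f"archive image over the assumed link: about {archive_hours:.1f} hours")
print(f"Web image over the assumed link: about {web_seconds:.0f} seconds")
```

On these assumptions the Web image is roughly 636 times smaller than the archive image, and an archive image would take hours rather than seconds to download, which is why delivery is treated as a research problem in its own right.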

We plan to research and implement high-resolution image delivery technologies such as fractal or wavelet compression. One promising technology that we are reviewing is the wavelet compression and viewing protocol Multi-resolution Seamless Image Database (referred to as 'MrSID'). This product is currently being used by the Library of Congress, University of Illinois at Urbana-Champaign, British Library, and others to deliver large digital aerial photographs. The Garden is investigating the adaptability of this technology to its large plant images.

In this project, we will also research data storage issues and the question of digital archiving or preservation. We anticipate generating up to a terabyte of image data per year, so we need to investigate options for "archiving" this amount of data that are economically feasible for small to mid-sized institutions such as ours. Our IMLS1998 proposal suggested the use of CD-ROM. The storage capacity of this medium does not seem sufficient to meet the needs of our current imaging configuration. Possible storage technologies include DVD-RAM, tape, and magneto-optical, but all of these methods require further research and testing to prove efficiency and feasibility.
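A back-of-the-envelope estimate shows why CD-ROM looks insufficient at this scale. The sketch below is illustrative only: the 70 MB average and the one-terabyte-per-year ceiling come from the text, while the 650 MB disc capacity, the decimal units, and the function name `discs_needed` are assumptions introduced here.

```python
import math

ARCHIVE_MB = 70          # average archive image size, from the text
TERABYTE_MB = 1_000_000  # upper bound on yearly output, decimal units assumed
CD_ROM_MB = 650          # assumption: nominal capacity of one CD-ROM


def discs_needed(total_mb: int, image_mb: int, disc_mb: int) -> int:
    """Discs required when each image must fit whole on a single disc."""
    images = total_mb // image_mb
    per_disc = disc_mb // image_mb
    if per_disc == 0:
        raise ValueError("a single image does not fit on the medium")
    return math.ceil(images / per_disc)


images_per_year = TERABYTE_MB // ARCHIVE_MB
images_per_disc = CD_ROM_MB // ARCHIVE_MB
discs_per_year = discs_needed(TERABYTE_MB, ARCHIVE_MB, CD_ROM_MB)

print(images_per_year, images_per_disc, discs_per_year)  # 14285 9 1588
```

On these assumptions, a full terabyte corresponds to roughly 14,000 archive images per year, only nine of which fit on one disc, so well over a thousand discs would have to be written, labeled, and tracked annually. Passing a different `disc_mb` value allows the same comparison for other media once their capacities are known.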

Work Flow:

Select Image:
Garden botanists determine the scientific value of slides and specimens being digitized, while the Garden archivist determines the archival value of the slides.

Digitize select slides and specimens:
Selection of images is based on demand, value, and availability of associated collection information.

Link digital images with associated information in the TROPICOS database:
TROPICOS contains nearly 850,000 botanical names records and over 1,400,000 specimen records. The digital images - whether from herbarium specimens or archival slides - will be directly linked to records in W3TROPICOS and will be accessible through either the name or the specimen portion of the system. In this way, a user can access the images by searching for a specific plant name or by searching for the collector name and number associated with the specimen. If several images exist for an individual name or specimen, the user will see an index and brief description of each.
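The two access paths described above, by plant name or by collector name and number, can be sketched with a small relational schema. This is a hypothetical illustration and not the actual TROPICOS structure: every table name, column name, file name, and sample record below is invented for the example, and SQLite stands in for whatever database system an institution uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE names (
    name_id INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL
);
CREATE TABLE specimens (
    specimen_id INTEGER PRIMARY KEY,
    name_id INTEGER NOT NULL REFERENCES names(name_id),
    collector TEXT NOT NULL,
    collector_number TEXT NOT NULL
);
-- Every image links to a name; specimen images also link to a specimen.
CREATE TABLE images (
    image_id INTEGER PRIMARY KEY,
    name_id INTEGER NOT NULL REFERENCES names(name_id),
    specimen_id INTEGER REFERENCES specimens(specimen_id),
    description TEXT NOT NULL,
    file_name TEXT NOT NULL
);
""")

# Placeholder records, invented purely for illustration.
conn.execute("INSERT INTO names VALUES (1, 'Examplea fictiva')")
conn.execute("INSERT INTO specimens VALUES (1, 1, 'A. Collector', '1234')")
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, None, "Live plant slide", "example_0001.jpg"),
        (2, 1, 1, "Herbarium sheet", "example_0002.jpg"),
    ],
)


def images_for_name(db, scientific_name):
    """Index of (description, file name) for all images of a plant name."""
    return db.execute(
        "SELECT i.description, i.file_name FROM images i "
        "JOIN names n ON n.name_id = i.name_id "
        "WHERE n.scientific_name = ? ORDER BY i.image_id",
        (scientific_name,),
    ).fetchall()


def images_for_specimen(db, collector, collector_number):
    """Index of (description, file name) for images of one specimen."""
    return db.execute(
        "SELECT i.description, i.file_name FROM images i "
        "JOIN specimens s ON s.specimen_id = i.specimen_id "
        "WHERE s.collector = ? AND s.collector_number = ? ORDER BY i.image_id",
        (collector, collector_number),
    ).fetchall()


print(images_for_name(conn, "Examplea fictiva"))
print(images_for_specimen(conn, "A. Collector", "1234"))
```

In this sketch a search by name returns both the slide and the herbarium sheet, while a search by collector and number returns only the image tied to that specimen, mirroring the index-with-description behavior described for multiple images.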

Preservation of analog images:
Preservation of the original 35 mm color transparencies is an important component of long-term preservation strategies. We are making duplicate copies of all original slides, which will be used as circulation copies. The original slides will be kept in cold "freezer" storage to minimize color degradation. We are basing our preservation strategy on research conducted by Wilhelm Imaging Research, Inc. Information on "Zero-Degree F, Cold Storage" for color photographs can be found at http://www.wilhelm-research.com/.

Develop protocols for use by other institutions:
Protocols will be developed and made available on our IMLS Web page to assist other institutions with setting up their own database using Microsoft Access software. Included in the protocols will be suggestions for purchasing equipment, capturing and preserving images, and other information discovered during our research that will help in developing an image database with associated information.

Provide ongoing evaluation:
Project staff meets regularly to evaluate the process, discuss unforeseen issues, and formulate solutions. An evaluation of the process will be included in the final IMLS1998 and IMLS2000 reports.

Write narrative report:
The narrative report that describes the process from start to finish, and provides valuable input to other institutions considering the implementation of such a system, will be written as part of the IMLS1998 Project. This narrative report will be updated and expanded as the project continues, even beyond the end of the IMLS2000 Project.

Disseminate results:
The results of the project, including recommendations for creating an in-house database, equipment specifications, and documentation, will be available on the Garden's IMLS Web page at www.hoya.mobot.org/IMLS/. Information concerning the project will be submitted for publication in printed and Web-based professional journals. The Council on Library and Information Resources (CLIR) is eager to work with us to develop standards and protocols for image repositories so that interoperability for scholars and students will be assured; they also encourage us to write about our findings for their publication, DigiNews. The Research Libraries Group (RLG) also encourages us to work with them.

Register the project:
The scale of digitizing entire collections remains an issue deserving national coordination to avoid costly duplication of effort. Therefore, we will register our project through such organizations as the CLIR and RLG. We also will catalog our image collections at the collection level into OCLC.

Work simultaneously on other digital library projects that, in aggregate, will form an ever-growing digital plant library:
The Garden is in the process of writing a proposal to digitize rare books as part of a collaborative that includes the Oak Spring Library (Mrs. Andrew Mellon's personal library), National Gallery of Art, New York Botanical Garden, and possibly others.

The Natural History Libraries Cooperative, a group of natural history libraries that want to cooperate on the digitization of natural history materials, is in the formative stages. The charter members include the libraries of the Missouri Botanical Garden, New York Botanical Garden, American Museum of Natural History, and Academy of Natural Sciences in Philadelphia.

Conclusion

We are at the beginning of a dynamic new era in libraries and museums. Within the community of systematic botany, these institutions are moving rapidly towards online access to important resources and collections. Procedures, strategies, and standards must be developed both outside and within the discipline. With the generous support of IMLS and other funding agencies, the Missouri Botanical Garden hopes to continue to be a leading partner in these endeavors.

About the Authors

Connie Wolf is Librarian at the Missouri Botanical Garden. She is the principal investigator for the Plant Image Digitization project.
E-mail: connie.wolf@mobot.org

Douglas Holland is the Archivist at the Missouri Botanical Garden. He oversees the preservation aspects of the Plant Image Digitization project.
E-mail: doug.holland@mobot.org

Notes

Missouri Botanical Garden: http://www.mobot.org

IMLS project page: http://hoya.mobot.org/IMLS
TROPICOS Image Index: http://mobot.mobot.org/Pick/Search/image/imagefr.html
TROPICOS Database: http://mobot.mobot.org/Pick/Search/pick.html


Editorial history

Paper received 1 May 2000; accepted 10 May 2000.


Copyright ©2000, First Monday

Digitizing and Preserving Plant Images: Linking Plant Images and Databases for Public Access by Connie Wolf and Douglas Holland
First Monday, volume 5, number 6 (June 2000),
URL: http://firstmonday.org/issues/issue5_6/wolf/index.html