First Monday

Digital Star Dust: The Hoagy Carmichael Collection at Indiana University

Funded in part by the Institute of Museum and Library Services (IMLS), the Indiana University Digital Library Program and Archives of Traditional Music are completing a two-year project to preserve and digitize the university's extensive Hoagy Carmichael collections. When the Project ends in September 2000, the Project team will have preserved thousands of items, including sound recordings, photographs, sheet music, lyric sheets, and more, pertaining to the life and work of this master of the American popular song. Much of this content is already accessible to the public through a multimedia Web site. More digital content and improved search capabilities will be added in the coming months. While the Project builds upon previous experience and expertise, the complexity of the Project has presented numerous challenges. This paper describes some of these challenges and their resolution, along with a brief discussion of remaining issues.




In October 1998, the Indiana University (IU) Digital Library Program (DLP) and the IU Archives of Traditional Music (ATM) formed a partnership and began an ambitious project to digitize the university's extensive and popular archival collections pertaining to the life and work of master songwriter Hoagland Howard Carmichael, better known as Hoagy. Partially funded by a National Leadership Grant from the Institute of Museum and Library Services (IMLS) and a grant from the Library Services and Technology Act (LSTA), this Project will create a model for integrating multimedia materials and distributing them via the World Wide Web. Scheduled to end in September 2000, this two-year project has made significant progress to date, although much work remains and many issues are still to be resolved. Like most digital library projects, "Digitizing and Preserving the Hoagy Carmichael Collections at Indiana University" requires a collaborative approach to the creation of a digital collection; neither of the partners could accomplish the Project alone. Experts in a variety of fields, including librarians, archivists, musicologists, information technologists, digital media specialists, and others are working together to create a lasting digital cultural resource.

This paper describes progress to date on the Hoagy Carmichael Project and what the Project team has learned from its work. While there is only one Hoagy Carmichael Collection, we believe that the issues related to this Project are the same ones that any team would confront when implementing a collaborative multimedia digital library project within the context of a research university. We will focus upon what we have accomplished, with a brief discussion of options and challenges. As we have not completed the Project, we will also outline remaining issues and our plans for resolving them. We stress that there are no right and wrong answers in creating a digital library, only options that present information more effectively to our users. We function within a library context, building upon our experience in providing usable collections to a varied population. However, we also recognize that the Carmichael Project breaks new ground, allowing us to present a portion of a rich virtual collection to users of all ages around the world.

The Hoagy Carmichael Project will accomplish two goals: first, it will preserve and digitize unique resources that appeal both to the general public and to scholars of twentieth-century American music; and second, the Project will provide a model for presenting these various media through a Web site that is meaningful to these distinct audiences. This "dual-target" model will contribute to the knowledge base of other museums and libraries that wish to increase access to their own diverse collections (which may include field recordings, manuscripts, images, and three-dimensional artifacts), and in so doing, help these institutions use technology to meet their complementary goals of outreach and research. The diversity of items within Indiana University's Hoagy Carmichael collections - from original musical scores and signed photographs to unreleased recordings of the composer's music and films in which he appeared - provides the inspiration as well as the intellectual framework for this undertaking.

During the Project, a team from the Archives of Traditional Music and the Indiana University Libraries will digitize every item related to Hoagy Carmichael from three university repositories: the ATM, the Lilly Library, and the University Archives. The collections include approximately 250 hours of sound recordings, 4,550 pages of printed and textual materials, and 1,070 photographs. These materials will be available through a Web site that will offer selections of the Collection to Internet users and the entire digitized collection to users at workstations in the Archives of Traditional Music connected to the University. Digitization will also prolong the life of these materials by providing surrogates that can be examined or listened to in lieu of the originals, thereby reducing deterioration. To provide access to both the physical collections and these digital surrogates, all items in the collections will be inventoried in finding aids or cataloged, with browsing and searching of these finding aids and catalog records available through the Carmichael Collection Web site.

Figure 1: Hoagy Carmichael Collection home page

Indiana University selected the Hoagy Carmichael collections for a digital project for a number of reasons. The first is the continuing undeniable appeal of Carmichael's music. Scholars believe that Carmichael offered unique contributions within the context of twentieth-century popular American music history. His songs were experimental and innovative, reflecting his interest in jazz, folk, and other vernacular American musics. And yet, despite the broad range of his influences and the sophistication of his songs, Carmichael's music remained accessible. A "musical democrat," Carmichael created songs that "communicated with Park Avenue society and Wall Street brokers as well as with small-town merchants and tenant farmers" [ 4]. Second, the timing of the Project was also important. The year 1999 marked the Centennial of Carmichael's birth, which focused increased attention on his life and work. Third, Carmichael had ties to both Indiana University and to Bloomington, where the University is located. Carmichael grew up in Bloomington and graduated from the IU School of Law in 1926. He composed his enduring pop standard "Star Dust" in Bloomington, and the story of its creation has become an integral part of local history. Fourth, IU's Archives of Traditional Music offers the largest grouping of materials pertaining to Carmichael's life available anywhere. Carmichael valued his association with the community and University, and as a result, he and his heirs have contributed to Indiana University many materials pertaining to Carmichael's career. Complementing this collection are Carmichael's manuscripts and sheet music at the Lilly Library, IU's principal special collections repository, as well as miscellaneous materials at the IU Archives.

The significance of Hoagy Carmichael to the history of American popular music, his connections with Indiana University, and the richness of the university's Carmichael collections were our first considerations. However, we also considered the University's ability to complete a multimedia digital library project. We had to realistically evaluate our digital library organizational structure, infrastructure, and technical expertise. We concluded that IU has the necessary capabilities to successfully complete the Project. IU has substantial experience in audio digitization, both in the Archives of Traditional Music and in the Music Library, where to date over 5,000 hours of audio have been digitized and stored for online access. The Carmichael Project builds upon the accomplishments of IU's VARIATIONS Project, a national model in distributing CD-quality sound via computer networks [ 2] [ 10]. The on-campus listening copies of the Carmichael materials are streamed from VARIATIONS for use on campus, and the low bit-rate selections are streamed to users over the Internet. Indiana University's Digital Library Program has experience in mounting large digital collections on the Web and providing sustained support for network access to these collections. The Library Electronic Text Resource Service (LETRS) publishes the Victorian Women Writers Project, a collection of SGML-encoded texts created and mounted on the Web at Indiana University [ 18]. This collection now includes approximately 175 volumes. Another related digital collection is The Frank M. Hohenberger Photograph Collection [ 7]. The finding aid for the complete collection of 9,600 photographs is offered on the Web, with a sample of 500 digital images available. This is an ongoing project, with the eventual goal of digitizing the entire collection.

Progress to Date

November 22, 1999, was the Centennial of Hoagy Carmichael's birth in Bloomington, Indiana. Although the Project to catalog, digitize, and preserve the artifacts continues through September 30, 2000, the Carmichael Project team wanted to provide a sample of our work to date by early November - to join in the celebration of Hoagy's birth. On November 5, 1999, we launched the Hoagy Carmichael Web site [ 8], with complete finding aids for items from the Archives of Traditional Music and as much digital content as we had completed by that time. As of April 2000, this content includes 1,016 music scores, 757 lyric sheets, 1,039 photographs, 222 pieces of correspondence, four scrapbooks, 623 sound files, and 122 images of personal effects. Users can search a complete inventory of the ATM's Carmichael Collection and access selected digital objects and supplemental research information, such as genealogy. For the general user the Web site includes a QuickTime VR virtual tour of the Hoagy Carmichael Room in the Archives of Traditional Music, highlights from the collection, and background information on Carmichael. Users can browse the various collections or they can search for specific information.

Challenges and Accomplishments

Administrative Challenges

The Carmichael Project presents interesting administrative challenges, as it involves many people working in different administrative units of the University. Months before submitting a grant proposal to fund the Project, a small planning team with staff from the Digital Library Program and the Archives of Traditional Music began meeting. Once funded, this planning team evolved into a project team, with responsibility for completing the Project. The Directors of the DLP and ATM serve as Co-Project Directors, and staff from DLP and ATM serve as Co-Project Managers, directing day-to-day operations of the Project. Other Project team members include both the director and technical specialist from the University's electronic text center, and the DLP's Digital Media Specialist, who oversees scanning and Web design for the Project. The Project has required the coordination of many people working on its various aspects. Staff at the Archives of Traditional Music are cataloging the sound recordings and digitizing them; they also prepared the inventories that were used to create EAD (Encoded Archival Description) finding aids. Staff in the Digital Media and Image Center are digitizing the photographs, music, letters, manuscripts, and scrapbooks. Staff in the Library Electronic Text Resource Service (LETRS) encoded the EAD finding aids, transcribed the letters, and encoded them in SGML. Programmers from the Digital Library Program set up servers and disk space to store digital objects and are developing the software to allow users to search, browse, and access items in the collection. In order to coordinate the Project, the Project team has met frequently, sometimes weekly, to ensure that we can anticipate upcoming work. Over the course of the Project more than 25 people have worked on various phases or aspects. Key staff took new jobs in the middle of the Project and we had to deal with conflicting priorities among other staff. However, none of these developments seriously impeded our progress.

The complex administrative arrangement has worked due to several factors. First, all participants realize that we need one another to ensure the success of the Project. Second, most of the key staff had pre-existing relationships. We had worked together on other projects and in other capacities. We trust one another's expertise and commitment to the Project. Third, we maintain open lines of communication. We agreed from the outset to be honest in our feedback on work to date or any other aspect of the Project. We attempt to solve problems before they become overwhelming. We encourage all participants to call someone if they need help and not to wait too long to seek assistance from another member of the Project team.

Technical Challenges

As noted earlier, the IU DLP has had significant previous experience in dealing with large text and audio collections, and to a somewhat lesser extent, image collections. However, IU has not previously undertaken a project involving all of these formats in a single collection. As might be expected, most of the technical challenges encountered in the Carmichael Project have been related to the diversity of materials in the collections being digitized and presented.

Image Scanning

Even in the area of image scanning, the diversity of the collection has posed interesting challenges: requirements for the scanning of photographs, manuscripts, printed music, scrapbooks, typescripts, etc., obviously differ from each other. Even within a single category such as photographs, the items vary greatly (e.g. a number of different sizes of negatives and positives, different paper and film types, etc.). While our primary goal in this Project is improved access to the collection, we wanted to use scan parameters that would produce images sufficiently detailed and accurate to avoid the need for rescanning in the future if at all possible. For handwritten and printed text and music, the primary goal has been legibility of the end product, but for photographs, other issues arise such as accuracy and consistency of tonal reproduction, detail and edge reproduction, and color reproduction [ 3]. In the case of photographs, it was difficult to synthesize the access and preservation goals into a single scanning procedure, so for photographs only, we have produced two scans: an access version, in which scanning parameters were adjusted somewhat subjectively to produce a "good looking" image, and an archive version, in which scanning parameters were set to known fixed values in order to produce consistency in tonal reproduction from one photograph to the next. To develop our scanning procedures, we relied on a number of recommendations from other institutions, including the National Archives and Records Administration [16].

Scanning of most items was performed on PCs running Windows NT 4.0 and 98 using Microtek 9600XL and Linotype-Hell Saphir Ultra2 flatbed scanners, with LaserSoft's SilverFast Ai scanning software and Adobe PhotoShop. Some oversized or fragile items were digitized using a Jenoptik ProgRes 3012 camera.

Color accuracy was important, particularly for photographs but also for other color items such as sheet music covers. We created color profiles for our scanners and displays using LaserSoft's SilverFast scanning software and Linotype-Hell's ViewOpen ICC monitor calibration/profiling package, and did all color scanning under Windows 98 in order to take advantage of its built-in color management capabilities. All images were saved as 8-bit grayscale or 24-bit sRGB color uncompressed TIFF files with embedded ICC profiles. These TIFF files were then transformed into lower resolution display and/or thumbnail versions in JPEG format for Web presentation. Of course, we could not rely on users having properly calibrated displays and Web browsers that support color correction, so we decided to save all color files using the sRGB [ 5] colorspace, which attempts to provide as accurate color reproduction as possible across a wide array of PC video hardware and displays. If a visitor to the Web site is using a properly calibrated system and a browser that supports color correction (such as Internet Explorer 4.5 or later on the Macintosh, which supports Apple's ColorSync color management system), color reproduction will be even more accurate. More details on our scanning procedures may be found on our Web site [9].

Text Digitization and Encoding

In the case of the correspondence items in the Carmichael Collection, we felt it would be beneficial to have both images and encoded text versions available, so that users could search for words and phrases to find letters that refer to particular individuals, songs, etc. In most of our past projects dealing with large bodies of text, we have relied on high-performance OCR software to transform page images into editable and searchable text. In the case of the Hoagy Carmichael correspondence, much of it was handwritten and/or of such a quality that it could not be accurately recognized by the OCR software. So in this case, we used hourly student employees to transcribe the documents by hand. The body of text was small enough (222 items) that this was a manageable task. The documents were then encoded using the TEI (Text Encoding Initiative) SGML DTD (document type definition). With the text transcribed and encoded, we implemented search capabilities using OpenText's SGML-aware search engine in combination with Perl scripts and modules originally developed at the University of Michigan [ 17].

Audio Digitization, Compression, and Streaming

The sound recordings in the Hoagy Carmichael Collection are in a variety of physical formats, including 280 commercial 78 rpm discs, 107 78 rpm test pressings, aluminum, glass, and acetate discs, 130 commercial 33 1/3 rpm discs, 90 open reel tapes, 15 cassette tapes, and one wire recording. As a sound archive, the Archives of Traditional Music has had much experience in the area of audio format conversion. Using IMLS funding, a digital lab was equipped in the ATM, with an Apogee PSX-100 analog-digital converter, Digidesign AudioMedia III digital audio interface, and Apple Power Macintosh G3 at its center, plus appropriate input and output devices.

Due to the fragile nature of most sound formats, copying for preservation as well as access purposes is very important for this category of materials. Three copies are made of each sound recording: a digital access version in WAV format for eventual networked delivery, a digital CD-R (CD-Recordable) version for access purposes in the ATM listening library, and an analog 1/4" open reel tape, which is still the most time-tested medium for audio preservation.

Because of the variation in format and quality of the originals, we encountered some challenges in undertaking the process of audio digitization. One issue always at the forefront has been whether and to what extent we should "clean up" the originals to make them easier to hear. Being an archive, the first concern of the ATM is to preserve the original recording as much as possible in the archival tape copy. On the other hand, the access copies (CD and digital file) produced are intended for public use. The original recordings range from professionally recorded broadcast material to amateur home recordings of personal letters. Volume level often varies tremendously in the amateur recordings. Rather than continually adjust the recording level during the digitization process, we tried to find an appropriate level for each recording to balance the highs and lows. In some cases, however, the volume was raised at certain points to aid in audibility. Equalization of old commercial 78 rpm and acetate recordings has also been a difficult issue, because altering the volumes of various frequencies in the original could risk the loss of valuable information in the "noisy" original. Finally, one of the most time-consuming aspects of digitizing the audio has been recording track or index times and descriptions to aid users' access to the recordings and allow users to locate particular songs or segments without having to constantly fast forward, rewind, etc. The level of documentation was very inconsistent from item to item, so a great deal of research was necessary in some cases to determine the true contents of a recording.

Two different digital audio formats are used for networked access. All recordings are being made available at selected locations on the Indiana University Bloomington campus via the university's VARIATIONS system, which provides streaming delivery of MPEG-1 layer 2 format audio at a data rate of 384 kilobits/second. MPEG audio at this data rate requires too much bandwidth to be streamed to most Internet users, and streaming rather than file download is desired because we are dealing with copyright protected compositions and recordings in most cases. For those recordings which we are able to make widely available to Internet users (based on copyright status), we needed to find another delivery method. We evaluated two options: Apple QuickTime Streaming using the QDesign Music 2 Professional compressor/decompressor (codec), and RealNetworks' RealSystem G2 using the RealAudio codec. QuickTime Streaming was more attractive to us, because we already had experience with streaming media server software (IBM VideoCharger) that could support this format, but the decision came down to the subjective issue of sound quality. As noted earlier, many of our source recordings contained some level of pops, clicks, and other noise. While we found similar results when using the QDesign and RealAudio codecs for "clean" source material, the RealAudio codec performed significantly better with our noisy items, reducing instead of exaggerating the background noise. Rather than expend staff time in cleaning up our source material, we elected to install a RealServer to provide Internet access to RealAudio versions of our recordings at rates appropriate to 14.4, 28.8, and 56 kilobit/second modem users.
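The bandwidth gap described above can be made concrete with a little arithmetic. The following Python sketch is illustrative only: it assumes a constant bit rate, takes one kilobit as 1,000 bits, and treats the modem rates as nominal line speeds. The function names are hypothetical and are not part of VARIATIONS or RealServer.

```python
# Illustrative arithmetic only; names are hypothetical, not project code.
VARIATIONS_KBPS = 384.0            # MPEG-1 layer 2 rate quoted in the text
MODEM_KBPS = (14.4, 28.8, 56.0)    # nominal modem line speeds from the text


def stream_size_megabytes(kilobits_per_second: float, seconds: float) -> float:
    """Size of a constant-bit-rate stream in megabytes (10**6 bytes)."""
    bits = kilobits_per_second * 1000 * seconds
    return bits / 8 / 1_000_000


def bandwidth_ratio(stream_kbps: float, link_kbps: float) -> float:
    """How many times the stream rate exceeds the link's nominal rate."""
    return stream_kbps / link_kbps


if __name__ == "__main__":
    one_hour = 60 * 60
    size = stream_size_megabytes(VARIATIONS_KBPS, one_hour)
    print(f"One hour at {VARIATIONS_KBPS:g} kbit/s is about {size:.1f} MB")
    for modem in MODEM_KBPS:
        ratio = bandwidth_ratio(VARIATIONS_KBPS, modem)
        print(f"{modem:g} kbit/s modem: stream needs {ratio:.1f}x the link rate")
```

Under these assumptions, even the fastest modem listed falls short of the campus stream rate by a factor of nearly seven, which is consistent with the decision to offer separate low bit-rate versions to Internet users.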

Providing Access: Catalog Records and Finding Aids

In addition to the diversity of formats in the Collection, we have confronted challenges due to having a variety of different forms of cataloging and descriptive metadata for segments of the Collection, including both archival finding aids and item-level bibliographic records. Some of these challenges have arisen from the fact that we are trying to create a single virtual collection from collections in three different physical repositories at IU, each with their own local standards and practices.

The vast majority of the Collection comes from the Archives of Traditional Music. The ATM has cataloged its sound recordings in OCLC, a national cataloging utility used by libraries and archives throughout the U.S., using the USMARC standard bibliographic data format [ 13]. All other materials in the Collection had been previously inventoried to varying degrees in alphabetical lists by title (music, printed lyrics), chronological and descriptive lists (photos), or some combination of the two (correspondence, awards, personal effects, and others). The biggest challenge was determining descriptive and organizational criteria to create a unified "finding aid" in EAD (Encoded Archival Description) format for the Collection, since various content formats dictated different types and levels of description. For example, the description needs for one format, such as photographs, were not necessarily useful for another, such as correspondence. Since ATM staff had not had previous experience in creating archival finding aids or in marking up finding aids using the EAD format, models had to be developed as the Project progressed.

Cataloging of the sound recordings was also a challenging aspect of the Project, due to the range of sound recording formats and the fact that most sound recordings in the Collection are either unique items or rare commercial recordings that have not previously been cataloged by any other institution. Ninety percent of the items required original cataloging. In many cases, the non-commercial field recordings lacked descriptive, accompanying documentation or labels on the items themselves. As a result, creation of original bibliographic records required hours of listening to be able to describe the intellectual content and provide access points to the items. National cataloging guidelines provide few examples or instructions for describing "field" materials (e.g. informal gatherings, interviews, and spoken letters). Thus new models needed to be established to provide uniformity for the Collection and adherence to relevant cataloging rules. Cataloging the commercial 78 rpm discs was also labor intensive, due to the discographical research involved in determining performing personnel and recording dates.

Extensive original authority work was a final cataloging challenge. In order to provide consistency, catalogers determine the authoritative form of a name to be used in all catalog records. Many names associated with the Carmichael sound recordings had no existing name authorities. The authority records created for performers' and songwriters' names will be of particular value for cataloging agencies that catalog jazz and popular music and for scholars involved in discographical, jazz, and popular music research. We originally underestimated the time needed to research name variations and create these authority records.

As for the other repositories contributing items to the "virtual" collection, IU's Lilly Library had print inventories for its manuscripts and some but not all of its Carmichael sheet music. The Lilly manuscript inventory has been transformed into an EAD finding aid, while the Lilly sheet music is being cataloged in MARC. The IU Archives did not have any real inventory for its Hoagy Carmichael items, so an EAD finding aid still needs to be created from scratch for these items.

Providing integrated browse and search access to multiple EAD finding aids as well as several sets of item-level MARC records remains an unresolved issue. At the moment, the EAD finding aid for the ATM collection has been indexed using OpenText 5 software, with Web browse and search access provided through locally modified versions of Perl scripts originally written at the University of Michigan. The MARC records for the ATM sound recordings have been extracted from IU's online catalog and transformed to SGML using the MARC-SGML conversion program developed by the Library of Congress Network Development and MARC Standards Office [ 12]. These bibliographic records are also indexed and searchable via OpenText on the Web site, but through a search form separate from the one used for the EAD finding aid. Work still remains to provide a single search form that can simultaneously search the ATM, Lilly, and IU Archives EAD finding aids, the ATM sound recording MARC records, the Lilly sheet music MARC records, and the full-text correspondence items mentioned earlier, which are in TEI format. The problem is not so much in simultaneous searching, but in creating displays for the varieties of SGML formats that may be returned from each search.
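One common way to approach the display problem described above is to tag each search hit with its source format and dispatch to a format-specific display routine. The Python sketch below is a minimal illustration under that assumption; it is not the Project's OpenText or Perl implementation, and the `Hit` type, the format labels, and the renderer functions are all hypothetical.

```python
# A minimal dispatch sketch; every name here is hypothetical.
from typing import Callable, Dict, List, NamedTuple


class Hit(NamedTuple):
    source_format: str  # assumed labels: "EAD", "MARC", or "TEI"
    title: str
    repository: str


def render_ead(hit: Hit) -> str:
    return f"Finding aid entry: {hit.title} [{hit.repository}]"


def render_marc(hit: Hit) -> str:
    return f"Catalog record: {hit.title} [{hit.repository}]"


def render_tei(hit: Hit) -> str:
    return f"Letter transcription: {hit.title} [{hit.repository}]"


# One display routine per SGML format returned by a combined search.
RENDERERS: Dict[str, Callable[[Hit], str]] = {
    "EAD": render_ead,
    "MARC": render_marc,
    "TEI": render_tei,
}


def render_results(hits: List[Hit]) -> List[str]:
    """Render a mixed result list, one line per hit, preserving order."""
    lines = []
    for hit in hits:
        renderer = RENDERERS.get(hit.source_format)
        if renderer is None:
            # Unknown formats are shown plainly rather than dropped.
            lines.append(f"Unrecognized format {hit.source_format}: {hit.title}")
        else:
            lines.append(renderer(hit))
    return lines


if __name__ == "__main__":
    sample = [
        Hit("EAD", "Photographs", "ATM"),
        Hit("MARC", "Star Dust", "Lilly"),
        Hit("TEI", "Correspondence item", "ATM"),
    ]
    for line in render_results(sample):
        print(line)
```

The design choice here is that the search layer stays format-agnostic, and supporting a new format means registering one more renderer rather than changing the search form.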

Administrative Metadata

In addition to the descriptive metadata for the Collection and its contents encoded in EAD and MARC, we also have a great deal of data that is often referred to as administrative metadata. This metadata relates to the digital objects themselves and the process by which they were created rather than the intellectual content of the objects. For images, this information includes things such as resolution, file format, colorspace, gamma, scanning hardware and software used, corrections applied in the scanning software, etc. For sound files, it includes sample rate, sample size, number of channels, compression, etc. Some of this information is actually represented in the image and sound files themselves, while other pieces have been recorded in spreadsheets by digitizing technicians. The questions related to what data elements we will retain permanently as administrative metadata and where we will store this metadata are important ones. This information may be of use in the future for purposes of digital format migration or determining the accuracy of the digital reproductions (particularly if the original items are lost or destroyed). For images, we plan to examine some of the recent work in this area [ 1] [ 14] [ 15] and make the best possible determination on what to retain until true standards emerge. There is less existing work on administrative metadata for music sound recordings than for other formats, so we may simply have to use our judgement in this area to determine the best course of action.
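Because administrative metadata currently lives in two places (the files themselves and the technicians' spreadsheets), any permanent record would need to combine the two. The Python sketch below shows one possible way to do so and to flag disagreements; the field names, the sample values, and the rule that embedded values take precedence are assumptions made for illustration, not decisions recorded by the Project.

```python
# Hypothetical merge of two administrative metadata sources.
from typing import Dict, List, Tuple


def merge_admin_metadata(
    embedded: Dict[str, str], logged: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Combine file-embedded and spreadsheet-logged metadata.

    Returns the merged record and a sorted list of fields on which the
    two sources disagree. Embedded values win (an assumed policy).
    """
    merged = dict(logged)
    conflicts = []
    for key, value in embedded.items():
        if key in logged and logged[key] != value:
            conflicts.append(key)
        merged[key] = value
    return merged, sorted(conflicts)


if __name__ == "__main__":
    # Sample values are invented for illustration.
    embedded = {"file_format": "TIFF", "colorspace": "sRGB", "resolution_dpi": "600"}
    logged = {
        "resolution_dpi": "300",
        "scanner": "flatbed",
        "corrections": "none",
    }
    record, conflicts = merge_admin_metadata(embedded, logged)
    print(record)
    print("Fields needing review:", conflicts)
```

Reporting conflicts rather than silently resolving them leaves the final judgement to a person, which seems appropriate while standards in this area are still emerging.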

Image Display, Navigation, and Repository Issues

Once a user locates an item, such as a music manuscript or piece of correspondence, by browsing or searching finding aids or MARC records, the system needs to be able to display that item to the user with means for navigation. If the item has more than one page, a page-turning mechanism is needed. If more than one version of a given page is available (such as for multiple resolutions), the system needs to provide a way for the user to switch between these versions. To provide these capabilities, as well as to store additional image-specific administrative metadata for objects in the Collection and potentially provide easier cross-collection image search mechanisms in the future, we elected to store access versions of images in our IBM Digital Library system. IBM Digital Library (recently renamed IBM Content Manager, but which we will refer to as IBM DL) provides services for storage of digital objects (images, audio, video, text, etc.) as well as associated metadata elements, and supports a variety of programming interfaces (including C, C++, and Java). What IBM DL does not provide is a good set of out-of-the-box user interface tools for loading, search, and delivery of digital objects. We decided to use the Carmichael Project as a basis for developing such tools that could be reused in future projects.

DL provides a relatively flexible data model based on parts, items, and folders. A part is an actual binary object such as an image file. An item is an object that contains one or more numbered parts. Folders are objects that act as containers for items and/or other folders. Each folder or item belongs to an index class, which defines a set of metadata fields associated with that folder or item. Our goal was to develop a data model for the Carmichael Collection that would suit both our presentation needs and our need to store administrative metadata for the images in the Collection. We initially developed a model in which objects in the Collection (i.e. musical scores, pieces of correspondence, photographs, etc.) are represented by DL folders. Within each folder is a series of DL items which represent the pages of that object (in the case of photographs, though, there is only one "page"). Each item then contains one or more DL parts that contain the different resolution images of that item. Appropriate metadata is attached at both the folder and item level. This model worked well for all items in the Collection that we were dealing with initially, but when we began thinking about scrapbooks we ended up having to introduce an additional level of nesting to represent the required scrapbook - scrapbook page - clipping within a page structure.
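The parts, items, and folders model can be pictured with a small Python sketch. This is not IBM DL's API: the class names mirror the terms used above, but the fields, the index class names, and the file names are hypothetical. In particular, representing a scrapbook as a folder of page folders that hold clipping items is only one assumed reading of the extra level of nesting.

```python
# Conceptual sketch of a parts/items/folders model; not IBM DL's API.
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Part:
    """A binary object, e.g. one image file at one resolution."""
    number: int
    filename: str


@dataclass
class Item:
    """One page (or clipping): numbered parts plus index-class metadata."""
    index_class: str
    metadata: Dict[str, str] = field(default_factory=dict)
    parts: List[Part] = field(default_factory=list)


@dataclass
class Folder:
    """A container for items and/or other folders."""
    index_class: str
    metadata: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Folder", Item]] = field(default_factory=list)


def count_parts(node: Union[Folder, Item]) -> int:
    """Total number of binary parts stored beneath a folder or item."""
    if isinstance(node, Item):
        return len(node.parts)
    return sum(count_parts(child) for child in node.children)


def nesting_depth(node: Union[Folder, Item]) -> int:
    """Levels from this node down to the deepest item, counting both ends."""
    if isinstance(node, Item):
        return 1
    return 1 + max((nesting_depth(child) for child in node.children), default=0)


def page(n: int) -> Item:
    """Hypothetical helper: a page with a thumbnail and a display image."""
    return Item(
        "Page",
        {"page": str(n)},
        [Part(1, f"p{n}_thumb.jpg"), Part(2, f"p{n}_display.jpg")],
    )


# A two-page letter: folder -> page items -> resolution parts.
letter = Folder("Correspondence", {"title": "Letter"}, [page(1), page(2)])

# A scrapbook: folder -> page folders -> clipping items -> parts.
scrapbook = Folder(
    "Scrapbook",
    {"title": "Scrapbook"},
    [Folder("ScrapbookPage", {"page": "1"}, [page(1), page(2)])],
)

if __name__ == "__main__":
    print(count_parts(letter), nesting_depth(letter))
    print(count_parts(scrapbook), nesting_depth(scrapbook))
```

A page-turning display follows naturally from this shape: moving between pages walks a folder's children in order, and switching resolution selects a different numbered part within the current item.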

We developed a command-line image and metadata loading utility in Java to load images, descriptive metadata (from the EAD finding aids), and administrative metadata (from the image files themselves and from spreadsheet logs created during the capture process) into IBM DL. To deliver Web access to these items stored in IBM DL, we developed a Java image display servlet as well as several JavaBeans that can be accessed from JavaServer Pages to display images and metadata from IBM DL. See Figure 2 for an example.

Figure 2: Object display page

Usability, Web Site Design

We originally set out to build two Web sites. One site would be designed for students and casual visitors and would feature background information on Carmichael, an exhibition with selected items from the Collection grouped together to illustrate particular topics, and access to sound recordings of Carmichael's "greatest hits." The other site would be designed for researchers and scholars, providing full access to finding aids and catalogs of the Collection, with access to as much digital content as copyright would allow for Internet users (and all digitized materials for on-campus users). However, as we designed the Carmichael Web sites, the two-site approach came to seem cumbersome. Instead, we developed a single menu-based home page with some selections more suited to casual users and other selections more suited to scholars, but with the ability for users in either group to "jump over" to any area of the site. We are currently engaged in task-based usability testing of the site with individuals representing secondary and undergraduate students, music scholars, and the general public. Based on what we learn from these tests, we will revise the site to improve its usefulness for these potentially diverse user populations.

Copyright

The vast majority of the materials in the IU Hoagy Carmichael Collections are covered by copyright. The copyrights for many of these materials, in fact the most important of these materials, are controlled by Hoagy Carmichael's heirs. The digital surrogates that we present on the Internet are distributed with the consent of Carmichael's heirs. Although preservation exemptions and fair use provisions of the copyright law might have allowed digitization of these collections without permission, we need permission of the copyright holders to distribute them in any way. Throughout the years since the collection came to Indiana University following Carmichael's death in 1981, Indiana University, in general, and the Archives of Traditional Music, specifically, have benefited from a positive working relationship with Hoagy Bix Carmichael, Hoagy's oldest son. Early in the Project planning stage, the Director of the Archives of Traditional Music contacted Mr. Carmichael about the Project and received his verbal agreement to make accessible on the Web everything for which the family holds the copyright. We proceeded with the understanding that this verbal agreement would eventually be formalized in writing.

The variety of material in the collections presents different copyright issues. For the unpublished material, including test pressings of sound recordings, home movies, photographs, manuscript music, letters written by Hoagy Carmichael, and early autobiographical manuscripts, the family clearly holds the copyright. For the published material, including sound recordings, films, sheet music, collections, manuscripts that were later published, and radio and television scripts, the situation is less clear. The family holds the copyright on some of this material but not on others. The first task has been to sort out the two groups. The most urgent aspect of this work has been to identify the songs for which the Carmichael family holds the copyright, in order to make decisions regarding which sound recordings we can offer on the Web.

Once the Project began in October 1998, a member of the Project team began working with Mr. Carmichael and his copyright attorney to resolve the copyright situation. We have worked with the list of songs, publishers, and lyricists on the official Hoagy Carmichael Web site [6], updated with a list obtained from Hoagy Bix Carmichael's attorney, to determine the songs controlled by the family. We have added to this list from the University's inventories. Creating a definitive list clearly stating which songs we may offer on the Web has turned out to be more difficult than we anticipated. This difficulty is due to complications related to the separate copyrights for the melody, the lyrics, and the sound recording. In order to determine which works could be presented on the Web when we launched the site in November 1999, we also worked with Indiana University legal counsel and the University's copyright attorney. We began with a group of songs that we believe to be the safest: the songs published by Hoagy Publishing, PSO Limited, or Pera Music, all of which are owned by Carmichael's heirs.

Our intent at this time is to present on the Web pre-1972 sound recordings for which the Carmichael family owns the copyright on the underlying work. We selected 1972 because that is the date when sound recordings were brought under federal copyright law. We will also create a list of important works for which the family does not own the copyright, usually on the lyrics, and we will seek permission from the copyright holders to present the sound recordings of these songs on the Web. We believe that we have a good chance for success in this endeavor, due to Hoagy Bix Carmichael's relationship with the heirs of his father's colleagues, who now hold these copyrights.
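
The selection rule stated above can be written out as a small Java sketch. It is an illustration of the rule as described in this paper, not a tool the Project used. The names WebStatusSketch and RecordingRuleSketch are hypothetical, the 1972 cutoff is simplified to a calendar year, and the sketch assumes that permission would be sought for any pre-1972 recording whose underlying work the family does not control, whereas the paper limits that effort to important works.

```java
/** Possible outcomes of the stated rule; the names are illustrative. */
enum WebStatusSketch { OFFER, SEEK_PERMISSION, OUT_OF_SCOPE }

final class RecordingRuleSketch {
    // Year-level simplification of the 1972 cutoff named in the text.
    static final int CUTOFF_YEAR = 1972;

    /**
     * Pre-1972 recordings are offered when the family owns the underlying
     * work; otherwise permission is sought. Later recordings fall outside
     * the intent described in the text.
     */
    static WebStatusSketch classify(int recordingYear, boolean familyOwnsUnderlyingWork) {
        if (recordingYear >= CUTOFF_YEAR) {
            return WebStatusSketch.OUT_OF_SCOPE;
        }
        return familyOwnsUnderlyingWork
                ? WebStatusSketch.OFFER
                : WebStatusSketch.SEEK_PERMISSION;
    }
}
```

A call such as RecordingRuleSketch.classify(1956, false) would place a recording on the list for which permission must be requested; the year here is an arbitrary example.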

Conclusion

Work on the creation of the Hoagy Carmichael digital collection continues, but the Project has already met its goals. As Project staff near completion of the digitization of Indiana University's rich and diverse collection of Hoagy Carmichael material, we have achieved our first goal of preserving its content long after the physical objects may have deteriorated to the degree that they are no longer usable. The University has a long-range plan for preserving these digital files and making them accessible to users in perpetuity. Although not all of the digital content that we plan to offer on the Web is available yet, users from around the world have accessed this content for a variety of purposes. We continue usability testing to ensure that the organization of information and its presentation on the Web are effective for all types of users. We are in the process of making changes to the site in response to preliminary user testing and will make final modifications before the Project concludes in September 2000. We hope that the Project is on the way to achieving its second goal of providing a model to other institutions that seek to digitize and present multimedia digital content on the Web to a broad range of users. By writing about our work, explaining our decisions and our solutions to problems, we believe that we can contribute to the body of knowledge about the administrative and technical challenges inherent in a project such as this one.

About the Authors

Kristine R. Brancolini is Associate Director/Acting Director of the Indiana University Digital Library Program. Previously, she was Chief Planner for the program. She is Co-Director of the Hoagy Carmichael Project and Co-Director of "The Russian Periodical Index Digital Project," a three-year project to digitize and publish on the Web a twenty-year run of Letopis' Zhurnal'nykh Statei.

Jon W. Dunn has been Manager of Digital Library Operations and Development for the Indiana University Digital Library Program since 1998. Previously, he was Technical Director for the university's VARIATIONS digital music library project. He is Co-Manager of the Hoagy Carmichael Project.

John A. Walsh is the Lead Electronic Text Specialist for University Information Technology Services at Indiana University and a member of the Indiana University Digital Library Program team. He has over five years of experience working with WWW/SGML/XML technologies to deliver large text collections via the Web. He is a member of the Hoagy Carmichael Project Team.

Acknowledgments

We would like to thank Perry Willett of the IU Digital Library Program and Suzanne Mudge and Ilze Akerbergs of the IU Archives of Traditional Music for their assistance with this paper, and the entire Carmichael Project team at Indiana University for their hard work on the Project. Finally, we would like to extend our thanks to IMLS and the Indiana State Library for their financial support for the Hoagy Carmichael Project.

References

1. D. Bearman, 1999. NISO/CLIR/RLG Technical Metadata for Images Workshop, (April 18-19). Bethesda, Md.: National Information Standards Organization, at

2. J.W. Dunn and C.A. Mayer, 1999. "VARIATIONS: A Digital Music Library System at Indiana University," In: DL '99: Proceedings of the Fourth ACM Conference on Digital Libraries, Berkeley, Calif. (August).

3. F.S. Frey and J.M. Reilly, 1999. Digital Imaging for Photographic Collections: Foundations for Technical Standards. Rochester, N.Y.: Image Permanence Institute, Rochester Institute of Technology, at

4. J.E. Hasse, 1988. The Classic Hoagy Carmichael. Indianapolis: Indiana Historical Society.

5. Hewlett-Packard Company. sRGB Web site, at

6. The Official Hoagy Carmichael Web site, at

7. Indiana University. Frank M. Hohenberger Photograph Collection Web page, at

8. Indiana University. Hoagy Carmichael Collection Web site, at

9. Indiana University. Technical Information for the Hoagy Carmichael Collection, 1999, at

10. Indiana University. VARIATIONS Project Web site, at

11. Library of Congress. Encoded Archival Description (EAD) Official Web site, at

12. Library of Congress. MARC SGML Web site, at

13. Library of Congress. MARC Standards Web site, at

14. Making of America II Web site, at

15. The Making of America II Testbed Project White Paper, Version 2.0 (September 15, 1998), at

16. S. Puglia and B. Roginski, 1998. NARA Guidelines for Digitizing Archival Materials for Electronic Access. Washington, D.C.: National Archives and Records Administration

17. University of Michigan. Digital Library Production Service Web site, at

18. Victorian Women Writers' Project home page, Perry Willett, General editor, at

Editorial history

Paper received 1 May 2000; accepted 10 May 2000.

Copyright ©2000, First Monday

Digital Star Dust: The Hoagy Carmichael Collection at Indiana University by Kristine R. Brancolini, Jon W. Dunn, and John A. Walsh
First Monday, volume 5, number 6 (June 2000),