This article describes how snowball sampling was applied in two different cases to evaluate digital collections. The first digital library was evaluated by conducting in-person interviews with survey participants. For the second digital library, a survey was sent to site users by email. The results are compared and a cost-benefit analysis is provided. The author concludes that the convenience of an email survey does not necessarily make it the most effective way to survey users.

Contents
Introduction
Methodology
Survey comparison
Conclusion
Introduction
There is increasing discussion in the library literature that digital libraries are in need of evaluation [1]. Web sites can be monitored by domain name, yielding the number of hits to the site and users’ affiliations. However, many digital library creators want more information about their users than log files can provide. Studies that are user-centered rather than system-centered are also beginning to appear [2].
During 2002-2003 I was part of a team that evaluated the Montana Natural Resource Information System (NRIS) Web site. NRIS is a digital collection with three primary components: the Natural Heritage Program (plants and animals), water information, and maps (including the Geographic Information System). The NRIS site receives over one million hits per month (http://nris.state.mt.us). The first phase of evaluation was conducted using snowball sampling and included in-person interviews. In the second phase of the NRIS evaluation the same group of people was polled, but the follow-up survey was sent only by email. Interviews were not conducted in the follow-up survey because of perceived considerations of time and money. The follow-up survey resulted in a drastic decrease in the number of respondents, even though they were the same people who had willingly participated in the original survey.
Montana Natural Resource Information System (NRIS) Web site (http://nris.state.mt.us).
After completing both NRIS surveys, the question remained whether snowball sampling could be applied successfully in a survey distributed only by email. It was puzzling that the response rate had dropped so dramatically between the NRIS interviews and the email survey. Another survey, of a different Web site, could be undertaken to see whether an email survey could be as successful as in-person interviews. The two approaches are described below.
Methodology
First survey: User evaluation
The purpose of the NRIS survey was to discern patterns of use and to collect qualitative statements regarding the use and improvement of the NRIS site. In discussions with NRIS staff, five broad user groups were identified: federal agencies, state agencies, academic users, local government agencies, and private users. For each group, individual names were suggested. In addition to the initial list of names, further suggestions of possible survey respondents came from the technique of "snowballing" (users recommending others to survey).
Each person was then contacted to arrange an in-person interview. In many cases, almost an hour was spent interviewing the respondent. In a few cases, personal visits could not be arranged, and in those instances the survey was administered over the telephone.
The survey of NRIS users was based on a combination of non-random stratified sampling and snowball sampling. Snowball sampling has usually been employed to access hard-to-reach populations. This sampling method elicited rich, in-depth responses. The purpose of the NRIS survey was to discern patterns of use and obtain qualitative statements from selected users, not to represent the global population of those who access the site. In all, 47 people were interviewed, representing 37 organizations. A complete copy of the survey and further details about it can be found online [3].
Approximately a year after the initial NRIS survey, a second, follow-up survey was sent out to see how the same users were receiving the redesigned site. Because of the perceived cost of repeating the survey through interviews, the survey was sent to all of the initial participants by email. The result was a sharp decline in responses, even after three reminders. The return rate for the follow-up survey was only 32 percent.
Second survey: Indian Peoples of the Northern Great Plains (IPNGP)
In 2004, the IPNGP survey was undertaken both to survey users and to pay careful attention to all of the time involved in conducting the survey. The hypothesis was that sending out a survey through the simplicity of email could work at least as well as the more time-consuming work of interviewing users.
Indian Peoples of the Northern Great Plains (IPNGP) Web site (http://www.lib.montana.edu/epubs/nadb).
The digital library selected was created by this author and is located at http://www.lib.montana.edu/epubs/nadb. IPNGP consists of images, primarily photographs, that are searchable in a variety of ways, including subject, tribe, photographer, and geographic area. User opinion about this site has been solicited at various times, but a formal survey had never been conducted.
In an attempt to simplify the survey and possibly increase the response rate, only five questions were asked:
- Do you use the database "Indian Peoples of the Northern Great Plains"? [Yes; No]
- Do you usually find what you are looking for when you visit the site? [Easily; Sometimes; Never]
- If you did not find what you needed, where do you look? [Google; Asked someone for help; Library/Archives; Other]
- What is the one thing you would most like to see improved/added to the site?
- Could you send me the email addresses of other individual users of this Web site who might wish to answer these questions and give me their input?
Since the first survey of NRIS users was conducted with 47 individuals, a target of 47 people was set for the IPNGP survey. The same process of snowball sampling was applied by embedding the fifth question in the survey.
The email survey was sent to six individuals employed by institutions that had been introduced to the Web site a few years earlier and who were thought to be users of the digital collection. The email was sent with all of the individual addresses visible to every recipient, and the survey was not addressed to participants by name.
After a week, only one of the six people had responded to the survey. That person was new to their institution, was not a user of the site, and answered No to the first question. A reminder was then sent to the other five people queried, and two more names were added to the list. This time the email survey went out separately to each individual, and each person was addressed by first name in the inquiry, in an attempt to boost the response rate.
The process continued over a period of nearly six weeks, until 47 surveys were finally completed. Once surveys were returned, tabulating the information was quite easy. The survey dragged on because of the ongoing effort required to administer it: reminder email notices went out continuously and, more importantly, it was difficult to apply snowball sampling because many participants seemed reluctant to give out another person’s email address. Factoring in all of this time to conduct and monitor the survey, an average of 30 minutes was spent on each completed survey.
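The procedure can be summarized as a simple loop: send the five questions to everyone currently on the contact list, wait, record completed surveys, add any users referred through the fifth question, and remind anyone who has not yet replied, stopping once 47 surveys are in hand. The Python sketch below is purely illustrative; the survey itself was run by hand through ordinary email, and seed_contacts, send_survey, and collect_responses are hypothetical stand-ins rather than tools actually used in this study.

    # Illustrative sketch only: seed_contacts, send_survey, and collect_responses
    # are hypothetical stand-ins; each returned response is assumed to expose
    # .respondent and .referrals. The actual survey was run by hand over email.
    TARGET_COMPLETED = 47            # same target as the NRIS interview survey

    def run_email_snowball(seed_contacts, send_survey, collect_responses):
        """Send the five-question survey, add users referred through question 5,
        and keep reminding non-respondents until the target is reached."""
        pending = list(seed_contacts)     # contacts still owing a completed survey
        contacted = set(seed_contacts)    # everyone ever approached
        completed = []                    # returned surveys

        while len(completed) < TARGET_COMPLETED and pending:
            for person in pending:        # first contact or, on later passes, a reminder
                send_survey(person)       # addressed individually, by first name

            returned = collect_responses(wait_days=7)     # roughly weekly rounds
            responders = {r.respondent for r in returned}
            pending = [p for p in pending if p not in responders]
            completed.extend(returned)

            # Snowball step: question 5 asks each respondent for other users.
            for response in returned:
                for referral in response.referrals:
                    if referral not in contacted:
                        contacted.add(referral)
                        pending.append(referral)
        return completed

In practice, as described below, the referral step was the bottleneck: few respondents supplied other users’ addresses, which is largely why the survey stretched to nearly six weeks.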
Survey comparison
Both surveys resulted in constructive and valuable comments. However, the survey delivered through email contained briefer comments. A productive dialogue about the Web site was difficult to achieve using just email.
The most important expected difference was the amount of time it took to conduct interviews versus an email survey. The time spent creating each survey and tabulating the data was not recorded, since the purpose of this study was to measure the time spent actually obtaining completed surveys.
The in-person interviews averaged around 50 minutes each. This included the time it took to schedule the meeting and then conduct the interview. Although some sessions took less time, many ran over an hour. The dynamic of having a person discuss their use and application of the Web site proved to be extremely valuable. The email survey, in contrast, averaged only 30 minutes per completed response.
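A rough back-of-the-envelope comparison, using only the averages just cited and the 47 completed surveys in each study (and ignoring survey creation and tabulation, which were not timed), gives a sense of the scale of the difference:

    # Approximate totals based on the averages reported above.
    respondents = 47            # completed surveys in each study
    in_person_minutes = 50      # average per in-person interview
    email_minutes = 30          # average per completed email survey

    in_person_total = respondents * in_person_minutes   # 2,350 minutes, about 39 hours
    email_total = respondents * email_minutes           # 1,410 minutes, about 23.5 hours
    difference = in_person_total - email_total          # 940 minutes, roughly 15.7 hours

Even by this simple estimate, interviewing cost roughly sixteen additional hours over the whole study, or about 20 minutes more per respondent.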
The solicitation of names of potential participants, a key component of snowball sampling, proved to be more difficult with the email survey. There was a marked difference in the cooperation of participants in suggesting other users. During the in-person interviews, participants often suggested, unprompted, another user who might be contacted. Also, in the initial contact with individuals to set up meeting times for interviews, the person would often state up front that perhaps they were not the best person to contact and immediately suggest another name.
Using email to gather names through snowball sampling proved to be the most time-consuming activity of that survey. Even participants who filled out the survey were often reluctant to give out another person’s email address. Studying that tendency is left to future research, but it was a problem in the mechanics of snowball sampling by email. Perhaps users feel more comfortable giving out another’s name in a verbal, face-to-face meeting rather than sending it over the Web.
Conclusion
As more digital libraries are created, evaluating their effectiveness for future development is becoming more important. Because of time constraints and the ease of the technology, it is tempting to rely on a survey sent out by email. The results of this research suggest that, given the time required to conduct a successful email survey, email is not necessarily the best option for digital library evaluation. Although they are more time-consuming, by as much as 20 minutes more per user, in-person interviews remain a viable option for the evaluation of Web sites.
More important, the time spent in an in-person interview session can be viewed as entirely productive. In contrast, much of the time devoted to the email survey went to survey reminders and to soliciting email addresses, tasks that probably could have been resolved more quickly through direct verbal contact with the participant.
The benefit of educating users is a clear advantage of in-person interviews that cannot be overlooked. Some respondents to the email survey simply answered No to the question of whether they were site users. Although a follow-up email was always sent to those respondents, the conversation was limited. The education and public relations aspect of an in-depth interview with a site user is an extremely valuable component of the survey.
In sum, based on these surveys, an in-person interview appears to offer the better value when taking the time to conduct a digital library evaluation, even though an email survey takes less time.
About the author
Elaine Peterson is Associate Professor/Catalog Librarian at Montana State University Libraries. She has worked with digital libraries since 1998 and is currently interested in evaluation methodologies.
Notes
1. Saracevic, 2000.
2. Dickstein and Mills, 2000, p. 149.
3. Peterson and York, 2003.
References
Ruth Dickstein and Vicki Mills, 2000. "Usability testing at the University of Arizona Library: how to let the users in on the design," Information Technology and Libraries, volume 19, number 3 (September), pp. 144-150.
Elaine Peterson and Vicky York, 2003. "User evaluation of the Montana Natural Resource Information System (NRIS)," D-Lib Magazine, volume 9, numbers 7/8 (July/August), at http://www.dlib.org/dlib/july03/peterson/07peterson.html, accessed 28 July 2004.
Tefko Saracevic, 2000. "Digital library evaluation: Toward an evolution of concepts," Library Trends, volume 49, number 2 (Fall), pp. 350-369.
Editorial history
Paper received 6 August 2004; accepted 14 April 2005.
HTML markup: Kyleen Kenney, Susan Bochenski, and Edward J. Valauskas; Editor: Edward J. Valauskas.
Copyright ©2005, Elaine Peterson
Evaluation of digital libraries using snowball sampling by Elaine Peterson
First Monday, volume 10, number 5 (May 2005),
URL: http://firstmonday.org/issues/issue10_5/peterson/index.html