First Monday

Mining the Web: Techniques for Bridging the Gap between Content Producers and Consumers
by K. Kris Hirst

The Internet is an enormous, chaotic mixture of information and disinformation, which the consumer finds difficult to traverse and in which the content producer finds it difficult to be heard. The Archaeology Guide for The Mining Company ® discusses the background and gate-keeping techniques used to "mine the Web."

Contents

Introduction
Finding Web-Based Information
A Mining Solution
Digging for Information Nuggets
Conclusion

Introduction

The growth of the Internet from a simple research tool to an information source for the masses has been little short of astonishing. Estimates of the number of Web pages in existence run into the millions, and early warnings that the Internet might crash under its own weight have not come to pass. Instead, as perhaps sociologists could have told us, the Internet has become the equivalent of the soap box in Hyde Park, where anyone with access to a computer and a modem may become a "content producer." It is a situation never dreamt of by Gutenberg.

Science, in particular the social sciences, has been slow to adopt this method of imparting information. Mass media portrayals of the Internet as a dark and dangerous place, replete with computer hackers, child molesters, and excesses of pornography, together with rampant urban legends of security leaks and computer viruses, have all played a role in slowing both the growth of the Web and its use by consumers. However, as these issues fade with time and familiarity, some researchers are finding that communicating information without the intervening noise of the press is too attractive to dismiss out of hand. Faculty are finding that the Internet allows them to make course-specific material available to everyone in their classes, without having to find a publisher, edit for the widest possible audience, and then wait two years for the book to come out.

Finding Web-Based Information

On the other hand, the growth in content on the Web has made it more difficult for consumers to access it. Whether they are using the Web as an educational resource or as entertainment, consumers are faced with the difficult, time-consuming, and frustrating task of sorting through the thousands of sites available on a particular topic. Search engines, while useful starting points, do not evaluate a site for its content; they merely recognize, mechanically, the keywords that a Web author has provided to them. Even the best of the search engines requires consumers to individually assess the value and validity of the information they discover, something that they do not always have the background to do. What is needed, then, is a bridge between the content producer and the consumer. In founding The Mining Company ®, Scott Kurnit is attempting to provide that bridge: a place for consumers to access the best of the Web data easily and with confidence, and a place for content producers to make their best efforts available.

A Mining Solution

The Mining Company, a division of General Internet Inc., a New York-based Internet service development company, was founded in December of 1996 by Kurnit and a team of experienced Internet professionals. Kurnit recognized that the evaluation process was arduous for the consumer, and so began building a network of specialists to evaluate and organize the resources in their own specialty. These specialists, called Guides in The Mining Company terminology, are chosen on the basis of their background and passion in a particular topic, their technical agility, and their ability to combine content production with gatekeeper functions.

The Mining Company went on-line in April of 1997, and by September of 1997 over five hundred Guides were active, in fields ranging from journalism and medicine to soap operas, gardening, and weaving. The specialists are loosely organized in groups called "Hubs," including Arts and Entertainment, Careers and Education, Hobbies and Games, Sports, and Travel. Approximately 60 individuals are in the Computing/Science Hub, the majority of whom are either scientists in their respective fields or technical writers with experience in those fields. Each Guide maintains contact with researchers in the field through electronic discussion lists and Usenet newsgroups, in addition to traditional news sources such as scientific journals and conferences. Each Guide also regularly searches the Internet for new sites, using a variety of search engines, link compilations, and other Web sites available to him or her.

The Web sites maintained by the Guides are identical in format, both to promote ease of use by the consumers, and to allow the specialists to concentrate on gate-keeping endeavors. Each Guide provides a front page, updated weekly, dozens of resource pages with annotated links (one Guide has over 200 resource pages), and a monthly "best of the net" list. In addition, Guides provide a weekly feature article, such as a commentary, book review, analysis, or discussion of some topic in their field of endeavor. The Guide also answers questions from the users, and maintains electronic relationships with the content producers and Webmasters in the field.

Digging for Information Nuggets

What a Guide looks for in a useful link depends on the audience. Most of the Computing/Science Hub Guides seek a balance in providing information for both professional scientists and the interested public. Regardless of the audience, however, the most important quality of a great Web site is meaningful content that is well-written. Academic jargon should be kept to a minimum, except when a resource page is aimed specifically towards professionals. The site should be visibly maintained and frequently updated. The pages should be signed by the Web authors, not just the Webmasters, and they should come from a reputable source, such as a university or professional society. If the source is a personal Web page, it should be the product of someone with credentials in the relevant field. As for illustrative material, pictures and graphics are very helpful, but they can be overdone.

Each Guide identifies links by systematic search, and no doubt each has a different methodology. A recent example for the Archaeology page will illustrate some of the process. On Arch-L, an electronic discussion group that I subscribe to, one user suggested to another a Web site that described the radiocarbon dating technique clearly. I looked at it, and indeed the Web authors handled a difficult topic quite well. I put it on my front page immediately, but I didn't have a place to permanently archive it. After some thought, I decided to build a resource page on technology - that is, a page of links that would have similarly clear discussions of technological issues in archaeology. I began a list of what issues those might be - use wear, taphonomy, seriation, geographic information systems, faunal and floral studies. I then used the MetaCrawler search engine and input various keywords. With the keywords "use wear archaeology" I found Roger Grace, who has specialized in the microscopic study of abrasion that appears on stone tools when they are used to scrape hides or work bone. Grace also dabbles in information technology, and he has developed a Web page for use wear analysis, with a skillful use of photographs to illustrate the different patterns.

Using a search engine such as MetaCrawler, or even a combination of available search engines, will only go so far, and so eventually I began a search of academic departments. Archaeology is a sub-discipline of anthropology in the United States and of history in Great Britain, which makes the search for appropriate departments somewhat complicated. Fortunately, others have compiled lists of university departments dealing with archaeology; Allen Lutins has the best compilation of anthropology-related sites I've seen.

By methodically visiting each site, I discovered Indiana University's photographic imaging techniques page, which describes a technique for underwater photography. After collecting ten or fifteen links, I compiled them in a resource page and put them on-line. I continue to keep an eye out for new links, so that the page continues to expand. Users are encouraged to write and contribute new links, and as I maintain communications with the various university departments, web authors notify me when new sites go on-line.

Conclusion

Although it is said that only death and taxes are certain, it seems most likely that the Internet and the World Wide Web will only continue to increase in size and complexity. As more and more individuals, universities, and businesses gain access, the amount and quality of information will continue to expand. At the same time, the amount of disinformation and propaganda will also expand. By using the skills and knowledge of a cadre of specialist gatekeepers, The Mining Company has opened a bridge between the everyday user and the producers of quality information on the Web.

The Author

K. Kris Hirst is a project archaeologist at The University of Iowa, Iowa City, Iowa in the United States, the current editor of The Journal of the Iowa Archeological Society, and the Guide for Archaeology at The Mining Company. Mail can be addressed to her at hirst@inav.net

Copyright © 1997, First Monday