First Monday Interviews

Cybrarian Reva Basch explores information and its uses in cyberspace

Reva Basch is a writer, researcher, and consultant to the online industry. Based in northern California, she was Vice President and Director of Research at Information on Demand, a pioneering independent research company. Reva has designed front-end search software for major online services, written and consulted on technical, marketing, and educational issues for both online services and database producers, and published extensively in information industry journals. She is a frequent speaker on topics related to information retrieval and the Internet.

Reva won the 1990 UMI/Data Courier Award for her two-part article on the user "wish list" in the magazine "Online." She was the 1993 recipient of the Dun and Bradstreet Online Champion Award. Reva is a Past-President (1991-1992) of the Association of Independent Information Professionals, a member of the Northern California Chapter of the Southern California Online Users Group (SCOUG) and a founding member of Information Bay Area. She has taught a course called "Information Brokering: Is this career for you?" at the University of California, Berkeley Extension.

Basch is the author of "Secrets of the Super Net Searchers" (Wilton, Conn.: Pemberton Press, 1996) and "Secrets of the Super Searchers" (Wilton, Conn.: Eight Bit Books, 1993). She was also editor of "Electronic Information Delivery: Evaluating Quality and Value" (Aldershot, Hampshire, England; Brookfield, Vt.: Gower, 1995). Reva is news editor of "Online," "Database," and "Online User" magazines, contributing editor of the "Information Advisor" newsletter, and writes the monthly Cybernaut column for "Computer Life" magazine.

Reva received her Master's in Library Science from UC Berkeley in 1971, began her career as a corporate librarian, and has been an online searcher since the mid-1970s.

She is an active participant in The WELL, a thriving virtual community, hosts several conferences there, and is an interested and enthusiastic observer and explorer of cyberspace in its various manifestations.


FM: In John Whalen's interview of you for "Wired" [1] and in Carla Sinclair's book "Net Chick" [2], there seems to be this element of surprise or dismay that thousands of dollars are paid for information. Does the abundance of information on the Internet make it difficult to convince others that intelligent filtering and searching take time and effort?

It depends on who those "others" are, and whether I have anything vested in convincing them that you sometimes do have to pay to get good information. Let's back up for a minute: What we're talking about here, fundamentally, is the distinction between finding information on the Net, which is nominally "free," and searching, or hiring an expert to search, the professional online services like Lexis-Nexis, Dialog, Dow Jones News/Retrieval and so on. These services typically charge by the hour, or by the amount of information you retrieve, or some combination of the two. A simple search may cost as much as $US 20, a more complex one several hundred dollars.

Is it worth it? Not always, and that's why I don't leap into "convincing" mode every time somebody tells me that they can find everything they want on the Net. Perhaps they can. There's certainly a lot of good, useful information there, especially if you're willing to invest the time it often takes to find exactly what you're looking for. In fact, there's some information you can only find on the Net, or that you can find more easily and cheaply there than you can in the professional online services. Examples that come to mind are product specs, corporate press releases, some government information, opinions and anecdotal data of all sorts.

But there's a huge volume of information, much of it historical, technical, highly specialized or all three, that you won't find on the Net, that exists only in the structured environment of a commercial database service. You might be able to find equivalent information, or some portions of it, at a Web site or on a gopher server or in an FTP archive somewhere. But you will pay a price, in time and effort. The Net is decentralized; the information itself is unstructured, unstandardized, and largely uncataloged; and the search protocols available to you, whether they be archie or veronica or the current bleeding-edge Web agent, are far less sophisticated than what you'll find on a professional, research-oriented database service like Dialog and its ilk.

It comes down to a time versus money tradeoff. Even if what you're looking for is available on the Internet, it will almost certainly take you more time to find it, and what you do find may be incomplete or out of date. What you're paying for when you search one of the commercial services is efficiency and, to a somewhat lesser degree, comprehensiveness and reliability. You get a Boolean search engine that typically allows you to specify not just ANDs and ORs, but exactly how many intervening words to allow between two search terms. You get the ability to truncate word stems, to account for plurals or alternate word endings, to search on as many synonyms as you can think of. You also get the benefit of field searching, which lets you search for a specific author, publication name or type of publication, such as conference proceedings or market research reports. You can also restrict search results by language or date, or a range of dates. You can look for a specific company or product name, without having to worry that you're going to retrieve a million references to "apples" when you're really interested in Apple computers. If you're doing a topic search, you can often take advantage of a database's "controlled vocabulary," which is essentially a thesaurus of standard indexing terms, with cross references to related concepts. That ensures that you don't have to guess, as you do on the Net, what words might be used to describe your subject. The vocabulary is documented. The Net, and the Web in particular, is notoriously weak on documentation.
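
To make those capabilities concrete, here is a minimal sketch, in Python, of fielded Boolean searching with stem truncation and date restriction. The records, field names, "?" truncation convention and function names are all invented for illustration; this is not Dialog's or any other service's actual command language.

```python
# A minimal, hypothetical sketch of fielded Boolean search with
# truncation -- illustrative only, not any real service's syntax.

records = [
    {"author": "Smith, J.", "pub": "MacWorld", "year": 1995,
     "text": "Apple computers dominate desktop publishing."},
    {"author": "Jones, K.", "pub": "Farm Journal", "year": 1994,
     "text": "Apples and pears are harvested in autumn."},
]

def matches(term, text):
    """Match a term against whitespace-delimited words; a trailing
    '?' truncates the stem, so 'comput?' hits 'computer(s)'."""
    words = text.lower().split()
    if term.endswith("?"):
        stem = term[:-1].lower()
        return any(w.startswith(stem) for w in words)
    return term.lower() in words

def search(records, all_of=(), none_of=(), field_equals=None, year_range=None):
    """AND together the all_of terms, exclude the none_of terms, and
    restrict by field value and publication year -- the kinds of
    constraints a commercial search engine exposes."""
    hits = []
    for r in records:
        if not all(matches(t, r["text"]) for t in all_of):
            continue
        if any(matches(t, r["text"]) for t in none_of):
            continue
        if field_equals and any(r.get(f) != v for f, v in field_equals.items()):
            continue
        if year_range and not (year_range[0] <= r["year"] <= year_range[1]):
            continue
        hits.append(r)
    return hits

# "apple? AND comput?" restricted to 1995: finds the Apple Computer
# story, not the orchard story.
for r in search(records, all_of=("apple?", "comput?"), year_range=(1995, 1996)):
    print(r["pub"], "-", r["text"])
```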

Professional search services offer a couple of other advantages as well. One you might call conglomeration, or concentration, of resources. By that I mean that a single database may comprise hundreds, even thousands, of newspapers, newswires, magazines or scholarly journals. You can cover them all, simultaneously, with a single search strategy. Compare that with having to visit individual Web sites and, depending on the protocols in place at each one, spend untold hours searching or browsing them title by title or, at best, a few publications at a time. This concentration of resources under the umbrella of a single searchable database is, in itself, an enormous advantage.

The other factors, currency and reliability, I've touched on briefly. Because these databases are commercial products, marketed at a premium price and with their producers' reputation behind them, they're subject to quality controls at every level. They're certainly not infallible; I've written several articles, and there's a body of literature by my colleagues, on database quality problems. But these files are updated, generally, on a regular schedule. They are quality-checked during the production cycle. The information itself is often derived from print journals, which are peer-reviewed or at least subject to some basic fact-checking and editorial control. It's a very different information environment from the Net, where anybody with access to a server and a rudimentary knowledge of HTML is automatically a publisher.

That element of "surprise or dismay" you mentioned is not an attitude I encounter when I deal with business people, scientists, or other serious and sophisticated users of information. These people understand the value of information and the reality that, as with other valuable commodities, there is a price tag attached.


FM: Many bemoan the lack of content on the Internet, and its increasing commercialization. Do you think there is a lack of real information on the Internet? Is the Internet becoming "over-developed"?

I have to smile when I recall the concern, several short years ago, about whether commercial entities had the right to participate on the Net at all, even as lurkers, and about what, exactly, constituted permissible, "non-commercial" speech. Obviously, what we thought was the big issue turned out not to be an issue at all. Once Webmania hit, there was no longer anything to debate.

With regard to content, commercialization and development, you're conflating two or three different concerns. First of all, I don't think the state of the Net's commercialization has anything to do with whether there are things of value there. There's certainly no inherent reason why commercialization should affect content in a negative way, although the lower signal-to-noise ratio might make it more of a challenge to find the good stuff. If anything, commercialization is attracting more useful content to the Net. There's a huge amount of substantial material online, with more migrating there every day. It's become clear to scholarly publishers, database producers and the professional online services that the Web is the way information is distributed these days, or one of the major ways. Everyone, not just the mass market media companies like Time-Warner that we think of as the "commercial" interests, is developing Web interfaces to their products and services. The Web may not yet be the default method of distributing information, but it is becoming one of the essential channels. Organizations, whether they be academic or consumer-oriented, pro bono or for-profit, simply can't afford to overlook it.

Thus, the question of whether the Net is becoming "over-developed" doesn't make sense to me. It's becoming "developed," and of course there's an ongoing concern about whether the infrastructure can continue to support it, but the phrase "over-developed" carries a negative value judgment that I don't think is appropriate or relevant under the circumstances. The Net is there, and it's being utilized. It's a resource for all sorts of communications, both commercial and non-commercial. We're not looking at an either/or situation here. The Net is, for all practical purposes, unlimited; there's no reason why bad information should necessarily drive out good.

What I do see happening is a growth in "branded" information, perhaps in the form of agent software or certified sites, that will make it easier to filter the high-quality data from the dross. Services like Yahoo! or the Argus Clearinghouse are tremendously valuable in this regard. Collections of expert links, like John Makulowich's Awesome Lists, will become more and more common. As more people come online and become acclimated to the Net as a social, professional and educational environment, they'll start using tools like these to carve out their own contexts, build their own interfaces, re-define this virtual space in a way that makes sense to them. The next generation of Web users is going to be far less overwhelmed by the cacophony than we are. They'll have internalized the fact that they don't have to deal with the whole thing. We're not at that point yet. This is all still too new to us; we're still on the outside looking in.


FM: Do you verify your electronically retrieved facts? How?

It depends. Some sources come "pre-verified." If I'm searching one of the professional online services, or a magazine or newspaper site on the Web, I'm generally dealing with information that's gone through an editorial review process of one sort or another. Obviously, that in itself is no guarantee of complete accuracy or objectivity. But it does imply a certain accountability. The source of the information is clear, and my clients can judge for themselves whether to accept the data, or accept it with some adjustment for the bias of the source, or reject it entirely. The same is true for material I find on Web sites; some sites - universities, research institutions, government agencies - carry an inherent credibility, or at least a clear identification. As long as the client knows the source of the information, and can go back and verify it, I figure that I'm covered.

Where it gets sticky, though, is when you move away from the institutional and print-equivalent realm, and toward personal home pages, 'zines, Usenet newsgroups, listservs and so on. I do try to "source" information that I find at a site, to find out who produced it, when they put it up, who this person or entity is and what they represent. Often, there's enough information at the site itself to give you some context. If not, there's usually a mailto link for the Webmaster or the creator of the page. If it's important, I'll pursue that, and ask some questions in e-mail.

When it comes to the conversational side of the Net, as opposed to the "documentation" side - listservs, newsgroups, conferencing systems and so on - it's very important to identify the source of a statement and to verify their credentials. You can do this in a number of ways - by using DejaNews to check out their postings in other newsgroups, running an author search to see if they've actually published in the areas where they claim to be experts, checking university phone directories to verify their affiliation, and so on. You can always e-mail the individual directly, you know: "I found what you said in such-and-such very interesting. I'm curious about your background. Have you written any papers you could send me?" Listservs, at least in my experience, present less of a problem than the wide-open world of netnews; the population is more stable and there's generally a collegial feel to them, a shared commitment to the subject under discussion. It's a more controlled environment than the Net at large. I find listservs to be a tremendously valuable source of reliable, targeted and timely information.

It all comes down to accountability. I can't verify the facts per se, but I do try to ascertain who reported them. If I've found something on the Net that seems useful, but I can't identify the source, I'll tell my client exactly that. "Here it is; here's where I found it. I have no idea where it came from or who's behind it. Caveat emptor." After that, it's up to them to decide how or whether to use it, or whether to investigate further on their own.


FM: What do you see as the biggest defect with Internet-based research?

Separating the wheat from the chaff - and, even more fundamental than that, recognizing that a lot of it is chaff. The basic problem with the Net as a research environment is the sheer volume and variety of information that's out there. It's grown, very quickly, from a medium for scholarly exchange to a mass medium for all kinds of communication, not all of it text-based. Right now, we're in a very early phase in our relationship with the Net. We're still quite naive about it. Those of us who are early adopters can remember when it was possible to check out every new Web site as it came up. For a very brief period, probably no more than a couple of months, you really could "read" the entire Web. That's an absurd concept now, of course. But most of us still believe on some level that the Net can be tamed, that the current generation of search engines is capable of searching the entire Internet and doing a credible job of it. The more the Web grows, the more that belief becomes analogous to thinking you can encompass all of human knowledge. We're eventually going to realize the absurdity of that expectation, and then the era of specialization will begin. I think it already has, actually.

FM: How would you design a software agent to hunt for information?

No bow ties, please. Seriously, if you're asking how I'd like an agent to operate, here's what I consider important: A built-in hierarchical thesaurus, so I don't have to think of every possible synonym and related concept. Enough intelligence to account for plurals, alternate spellings and punctuation, and variant forms of a word. The ability to specify the type of material I want searched - journal articles, current newspapers, Usenet discussions, personal Web pages, or the universe of Net resources. I'd like an agent that searched the "gated" sites, the ones like the New York Times or Wall Street Journal Interactive, that require registration and sometimes a fee, along with the open ones. And of course, as long as we're blue-skying here, I'd like not only relevance ranking based on my agent's intuitive understanding of what I need, but also some sort of reliability/authority filter, the ability to find the best sources, not just the ones that mention my search terms the most.
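
No single tool did all of that, but two items on the wish list - thesaurus expansion and an authority filter - are easy to sketch. The following Python fragment is purely hypothetical: the thesaurus entries, authority weights, documents and scoring scheme are invented to illustrate the idea, not any shipping agent's design.

```python
# Hypothetical sketch of an agent that expands a query through a
# small thesaurus and ranks hits by relevance times source authority.

THESAURUS = {  # synonym/related-concept expansion; invented entries
    "car": ["automobile", "vehicle"],
    "pollution": ["emissions", "smog"],
}

AUTHORITY = {  # invented reliability weights by source type
    "journal": 1.0, "newspaper": 0.8, "usenet": 0.3, "homepage": 0.2,
}

docs = [
    {"source": "journal", "text": "automobile emissions and urban smog"},
    {"source": "usenet", "text": "my car is great, no pollution at all"},
]

def expand(terms):
    """Add thesaurus synonyms so the user needn't think of them all."""
    expanded = set(terms)
    for t in terms:
        expanded.update(THESAURUS.get(t, []))
    return expanded

def score(doc, terms):
    """Crude relevance (term overlap) scaled by source authority, so
    the best sources outrank the ones that merely mention the terms."""
    words = set(doc["text"].lower().split())
    return len(terms & words) * AUTHORITY[doc["source"]]

query = expand({"car", "pollution"})
for doc in sorted(docs, key=lambda d: score(d, query), reverse=True):
    print(round(score(doc, query), 2), doc["source"], "-", doc["text"])
```

The journal article wins here even though the Usenet post matches the literal query terms more directly, which is exactly the behavior the wish list asks for.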

I'm intrigued by the Firefly concept, the idea of building a collective, associative database from a community of shared or related interests, of educating one's personal agent by letting it mingle with its "peers" who are out on similar missions. I'm not sure how workable it will be in the long run, but it is an attempt to address some of the less quantifiable aspects of how we think, the oblique, synergistic, associative way that each of us builds our sphere of expertise.
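
Firefly's own algorithms were proprietary; the generic technique behind the concept is what's now called collaborative filtering: recommend the items favored by users whose expressed tastes overlap yours. Here is a minimal sketch of that idea, with invented users, items and ratings, and a deliberately simple similarity measure.

```python
# A generic collaborative-filtering sketch in the spirit of the
# Firefly idea -- invented data, not Firefly's actual method.

ratings = {
    "alice": {"jazz-site": 5, "search-faq": 4, "recipe-hub": 1},
    "bob":   {"jazz-site": 4, "search-faq": 5, "zine-x": 4},
    "carol": {"recipe-hub": 5, "zine-x": 2},
}

def similarity(a, b):
    """Agreement over items both users rated (simple inverse distance;
    real systems typically use correlation or cosine measures)."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    dist = sum(abs(ratings[a][i] - ratings[b][i]) for i in shared)
    return 1.0 / (1.0 + dist)

def recommend(user):
    """Score unseen items by the ratings of similar 'peer' agents."""
    scores = {}
    for peer in ratings:
        if peer == user:
            continue
        w = similarity(user, peer)
        for item, r in ratings[peer].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + w * r
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(recommend("alice"))  # bob's tastes match alice's, so zine-x leads
```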


FM: What advice would you give a youngster searching the Internet for the first time?

Have fun. Pay attention, click on those hot links, see where they take you. Try to get a sense of "the lay of the land," the way pieces of information relate to each other. One of the reasons kids do take so readily to the Net - and by the Net, I assume we mean the Web; the two are practically synonymous by now - is that it mirrors the way they first explore the world. Unlike books, it's a non-linear way of learning. I would just turn the child loose and let his or her curiosity do the rest. Technique can come later.


FM: The Internet works well if you have the income to buy a personal computer and a modem, and have access to a phone line. It also works well if you are literate and moderately educated. The Internet is certainly not designed for the handicapped. Those at the periphery - the poor, the ill-educated, the physically handicapped - seem to be ignored by the Internet. What would you do to bring the Internet to these audiences?

Each of those populations - the poor, the un- or under-educated, and the disabled - presents a different set of barriers to Internet access. I can't possibly get into all the complexities associated with each. I will say that I know blind people, and people with debilitating muscular conditions, who are avid, and heavy, Net users. They use voice I/O devices very effectively. I also know deaf people for whom the Net is a very effective medium for communications, certainly better than the telephone or face-to-face conversation, as well as wheelchair-bound individuals whose mobility problems, at least in terms of education, information-gathering and socializing, have effectively been transcended by their use of the Internet. So, in some ways, the Net is an enabling device, not a barrier at all.

Computer ownership presents an economic barrier to the poor. But I suspect that this, too, is starting to change. Equipment prices are falling, as always happens with technology. There are agencies and organizations that refurbish used machines and donate them to individuals and organizations that serve the poor. Libraries and other agencies offer Net access. There are cyber-cafes where 25 cents buys you 15 minutes of time online. Public Internet kiosks are starting to appear. WebTV may make the Net both economically and conceptually more viable for millions of additional households; who knows? There are certainly more TVs in American homes, across the economic spectrum, than there are computers. I'm not saying the problem is solved, but in an evolutionary sense, we're making some progress.

As for the ill-educated, the problem isn't lack of Internet access, it's lack of education - plus whatever personal difficulties and/or social inequities lie behind that lack. Both the poor and the ill-educated face a constellation of problems, all of them a lot more immediate than Internet access. The Internet is an enormously educational tool, but it's absurd to worry about access to the Net for someone who can't read or write. Remember Maslow's hierarchy of needs? You've got to solve the fundamental problems before you spend a lot of time thinking about how to get the poor and the uneducated on the Net.

The Internet is not for everyone. I don't mean that in an elitist sense. I realize that there's a big difference between choosing to live "off the grid," informationally, and not being in a position to make that choice in the first place. Computers and use of the Net are already a part of our educational system. The current generation is growing up with them, and will be unable to imagine a society without them. They know what this technology is about, and why it's so important. If people have an incentive to get online, they will find ways of doing so, regardless of their educational level or socio-economic status. There are plenty of examples of this, ranging from Eastern Europe to Africa, Latin America and Southeast Asia. I have no doubt of this: The tools will find their way into the hands of those who need them.


© 1996 Reva Basch. One-time publication rights granted to First Monday. Further quotation or redistribution for non-commercial use is okay, as long as my authorship and the context are explicitly acknowledged. I do appreciate the courtesy of prior notification. Contact reva@well.com prior to any contemplated commercial re-publication or re-use.

Notes
1. John Whalen, 1996. "Super Searcher: Cybrarian Reva Basch is the ultimate intelligent agent," Wired, at http://www.hotwired.com/wired/3.05/features/searcher.html
2. Carla Sinclair, 1996. Net Chick: A Smart-Girl Guide to the Wired World. New York: Holt, pp. 192-197.