The processed book

The "processed book" is about content, not technology, and contrasts with the "primal book"; the latter is the book we all know and revere: written by a single author and viewed as the embodiment of the thought of a single individual. The processed book, on the other hand, is what happens to the book when it is put into a computerized, networked environment. To process a book is more than simply building links to it; it also includes a modification of the act of creation, which tends to encourage the absorption of the book into a network of applications, including but not restricted to commentary. Such a book typically has at least five aspects: as self-referencing text; as portal; as platform; as machine component; and, as network node. An interesting aspect of such processing is that the author's relationship to his or her work may be undermined or compromised; indeed, it is possible that author attribution in the networked world may go the way of copyright. The processed book, in other words, is the response to romantic notions of authorship and books. It is not a matter of choice (as one can still write an imitation, for example, of a Victorian novel today) but an inevitable outcome of inherent characteristics of digital media.

Update to "The processed book" — 23 October 2005

In March 2003 First Monday published "The processed book," a speculative essay about the future of electronic texts. The thesis of that essay was that the concept of the book would change in a world of networked information and computer processes and that books would come to be seen as nodes on a network consisting of other books, commentary, and various kinds of meta-information. The essay came to the attention of the William and Flora Hewlett Foundation (http://www.hewlett.org), which funded a software demonstration of some of the ideas in the essay. That demonstration project is now "live" on the Internet and can be found at http://prosaix.com/pbos. An essay on the background of the project, including a survey of the technical challenges and accomplishments, also appears on the site under the title "The Processed Book Project." In keeping with the spirit of the project, that essay has been placed in the "library" section of the project, where it can itself be read as a Processed Book: it can be read, annotated, linked to and linked from, analyzed linguistically, measured quantitatively, and subjected to anything else that the Processed Book tool set permits. We named the tool set "Processed Book Operating System" or PBOS to convey the sense that it comprised a set of capabilities that could be continuously added to and "called" by other developers and annotators. PBOS is available as open source code from SourceForge: http://sourceforge.net/projects/pbos.

During the course of this project, the three of us (Lynn Brock, who designed the software; Wayne Davison, who programmed it; and I, who mostly worried) learned a great deal about how books might evolve in the future. The most obvious lesson is the great distance that must be traversed between talking about an idea in the abstract and actually instantiating that idea as a software service. Beyond that, however, there were some interesting issues that will likely require further elaboration in the future:

Where does a Processed Book end and a wiki begin? Conceptually, a wiki is a subset of the Processed Book idea, but we chose to implement PBOS without wiki capability. PBOS puts the original book or text at the center of a universe of annotation, but the text itself remains inviolate. Wikis, on the other hand, allow for the communal rewriting of a text. Not all Processed Books need be wikis, any more than all hardcopy books must have color illustrations. It seems probable that a taxonomy of subgenres of Processed Books will emerge over time.

How best to accommodate the range of thinking about interactive features? The sheer mass of the literature on different ways of enhancing a text (via hyperlinks, multimedia, smart processes, etc.) makes it difficult to come up with a workable overview. So, with PBOS, we have not tried to anticipate all the kinds of interaction that various people have come up with, but to create an environment where all kinds of interaction could find a home. The measure of success of this strategy will be in the number and kind of features that other developers add to PBOS.

What is the most effective means to communicate the distinction between features of a Processed Book and the software platform (PBOS) upon which those features can be built? This is a source of frustration for us, as we have not always been able to convey to early reviewers of PBOS that the most important thing is not what PBOS does but what it enables other people to do.

While the Processed Book revolves around a single text, what kind of issues would arise if instead of a single text, we worked with a collection? It seems probable that a collection would yield certain emergent properties that were not anticipated in PBOS. This is worth investigating, but it would be a separate project.

What will it take to effect a transition from "primal" books to Processed Books? We believe that all books will someday be Processed Books, but getting there from where we are today is a difficult challenge. Our view is that this transition will require attention and investment from all sectors, and we thus designed PBOS to accommodate both for-profit and not-for-profit activities.

With PBOS now up and running for all to see and use (and download and enhance), what we hope to see is for others who are interested in interactive texts to poke around and to enlarge our understanding of the properties of digital media. Any user can now go to the Processed Book Project site and engage it in any number of ways: by simply reading a text that has been deposited there; by adding new titles to the library; by annotating the texts (both in the sense of adding comments but also in the special sense of adding software features); and, by downloading the open source code from SourceForge and starting another Processed Book site.

We wish to acknowledge the help and support of many people and organizations in developing the Processed Book Project. Most importantly, we wish to thank the William and Flora Hewlett Foundation for its financial support. Michael Carter, the project’s "godfather," provided the initial spur to create PBOS. And we wish to thank Ed Valauskas of First Monday, who published the original essay and whose own work in online media is an inspiration to the community at large. Other individuals are cited in the more extensive acknowledgments that appear on the Processed Book Project site (that is, http://prosaix.com/pbos).

Joseph J. Esposito is President of Portable CEO, an independent consultancy focusing on digital media. He can be reached at espositoj [at] gmail [dot] com

The electronic book or ebook has arrived, but it has not come very far. Optimistic expectations of the rate of ebook acceptance have been dashed, and numerous people are debating why something as obviously useful as the digital display of text has not already begun to replace paper. It may be that the current debate about electronic publishing is missing the point, however; it may be too focused on devices (however amazingly cool these devices may be) and is not reaching to the heart of the matter, which is why we care about books in the first place. We care about books because of what's inside them, because of what they mean. The intriguing aspect of electronic publishing is not simply whether we will all someday dump print in favor of screens or what file format will become the standard, but how electronic publishing will affect what goes inside of books. It is my view that our current notion of books is naïve, raw, and that what electronic publishing will give us is something that is highly thought out, cooked and processed. To the world of processed food and processed hair, we now add the processed book.

Some definitions are in order. Usually books are identified with their physical package. That package is generally between four and six inches in one dimension and seven and nine inches in the other; it is printed on paper; and it is the product of an author (usually one). The content of such a package, however, is also called a book, and that is the kind of book I wish to discuss here. As we begin to publish some books in electronic form, the print package gets tossed out and only the content remains. Is an ebook (or e-book or eBook) the content, the device that displays it, or both? Some interested parties now use the term etext to distinguish the content from its package. This would be more helpful if enough people subscribed to the convention. What I will call books are texts or etexts or content. This kind of book is the same whether it is displayed in a handsomely bound hardcover book, within a Web browser, as an Adobe PDF file, or in Microsoft Reader (among a multitude of other formats). By this definition, the book of the future will be a ... book!

Here we should note that once we separate a book's content from its hardcopy package, the notion of what is "book-length" disappears. A very short book is likely to be about 120 pages, which comes to about 40,000 words. Most books are roughly three times that length, and those long, gooey novels you curl up with on the beach can be twice that again — well over 200,000 words. I am drafting this document just after completing a commercial novel of about a half-million words, and I enjoyed every one of them. The connection between the physical book and our sense of a book-length idea is important because the literal physical package has come to define what we mean by a well-thought-out argument or story — because such an idea would fill the pages of a book. In other words, the accident of the convenient size of a single volume has served to create an arbitrary image of an intellectual category; the medium in this case has served to define the message. But in electronic form, anything goes. A book (probably better to refer to it as a text, though that term lacks the historical resonance of book) could be millions of words long or it could be a simple e-mail of a few lines: No particular length serves to define what is meant by a complete idea and the physical display of such a book or text (whether on a computer screen, a personal digital assistant, or whatever) is thoroughly agnostic when it comes to meaning. It would be interesting to speculate what it will mean culturally to lose the sense of a well-developed idea when such an idea is no longer hardwired to paper and ink. Throughout this essay I use the term book to refer to texts of any length.

I

Before we have a processed book, however, we must have a traditional book, a primal book, an utterance that precedes or has escaped the bureaucratization and systematization of the modern world. The primal book is a curiously romantic myth that a number of otherwise skeptical and dispassionate people (mostly authors) cling to unreflectively. The primal book is usually written by a single author, someone who has Something to Say. The author's job is to get it out, to get it on paper. It is a serious task. It requires a serious person. To assert the seriousness of the effort, the author may rent a garret and embrace poverty; even more reckless souls may teach at a university. It is a spiritual mission. It is hoped that the author's creation will ultimately be wrapped in the appropriate robes of ritual: A stiff hardcover binding with a glossy dust jacket, acid-free paper, perhaps a colophon page, with extra points for deckle edge. The most important aspect of the primal book, however, is its air of authenticity. The author, the creator, has made the book in his image. Such a book is a bit of the inner life of the author brought into the world for all to admire.

The notion of authenticity is insidious and, apparently, resistant to all attempts to stamp it out. Perhaps one could identify a young acolyte by the expression of an early interest in the poetry of Wordsworth and drag the reprobate to the woodshed for improvement; or the ungrateful cur could be presented with the complete works of Kerouac and admonished: Do you see now? Do you see where this could lead? It is to no avail. The author has Something to Say and the book is Where It Is Said. This myth can emerge unexpectedly. So, for example, I am drafting this essay in the wake of a well-publicized plagiarism scandal. The crime of plagiarism is an assault on the church of authenticity. It threatens to undo the primal book.

People familiar with the book industry, especially those few who pay attention to the numbers, are aware that the primal book is a myth. For starters, many books are not created by a single author or even a dynamic duo but by teams of writers, who may be writing to scripts created by someone else. This is virtually always the case with reference books and for many textbooks and is often true of paperback series publishing, where legions of writers (we probably should not call them authors) "fill in" the details of plot and character that have been outlined by a project head. Then there are all the books that only pretend to be books to get bookstore distribution. (The late Hayward Cirker, founder of Dover Publications, used to publish juvenile titles that he called "toys within covers.") And there are books that are compilations of the works of others, sometimes of people whose identity is unknown (e.g., The Darwin Awards). Celebrity books represent a particularly cynical attack on authenticity. Such a book has a prominent personality "tell" his or her story with the assistance of a ghost writer; and in a twist that almost seems like postmodern wish-fulfillment, often the "ghost" is no longer invisible but is cited on the dust jacket right below the name of the celebrity, whose true role is that of marketing lightning rod. But the myth of the primal book is too potent to be troubled by the facts. Books are what authors write. Books express authors' ideas. Books have a certain integrity born of the fact that they are the authentic manifestations of the serious men and women who create them.

I offer this caricature of the traditional book in order to more easily contrast it with the evolving forms of electronic publishing, where the author's authentic voice is buried within a network of references and interpretations. This is the world of the processed book, the book where the primal utterance of the author gives rise to hyperlinks and paralinks and neural networks and whatever other kinds of connections and cross-connections computer scientists come up with. Do a Google search on "computational linguistics" and you will never again think of books in the same way. A processed book is processed in two senses: The original utterance undergoes a series of transformations before the end-point is reached, and it is micro-processed, that is, it uses the astounding capabilities of computers to augment the original text.

The current crop of electronic books is often criticized as being dressed up with bells and whistles (to which a programmer might say: Whistles! Why didn't I think of that!?). Well, you ain't seen nothing yet. The list of features differs from format to format (you can do more with an etext on a personal computer than you can with the relatively weaker processing power of a handheld reader), but generally includes such worthwhile things as bookmarking, highlighting, perhaps hyperlinks, and the integration of a mediocre dictionary. We are often told of the wonders of not having to carry six heavy books on a plane when you can put that much and more into a handheld reader's memory. This is perhaps not as wonderful a feature as some (nonreaders) might suppose, as virtually no one would dream of carrying six books on a plane (you can't even finish a novel in the time it takes to fly from San Francisco to New York and there are hardcopy bookstores everywhere). Despite these added features, though, all the current ebooks brag that they preserve the text and spirit of the original, the printed book; one device even looks like a classy leather-bound book! Now, when in the last century has anyone routinely read a leather-bound book? The trap that the ebook publishers and the device manufacturers have fallen into is the myth of the primal book. The ebook is supposed to have the same aura as a tony printed book. To my mind, this is an insult to the digital medium.

The processed book is not boundless, but it is vast. It is not limited to dedicated handheld devices but can be displayed on any computer screen. Indeed, some of the most interesting examples of the processed book are difficult to use except on a personal computer, as they require a large screen for display and insist on being read with a Web browser (see, for example, the outstanding search engine and links of the Reed Elsevier collection of scientific, technical, and medical journals). But as we project into the future, we assuredly will see a multitude of computing devices, some that sit on desks, some that slip into our pockets, some that are combined with wireless phones, and perhaps some that are surgically implanted at the base of our skulls. And the good news is: they will all display books. These books will be everything books have always been in the past and more. By "everything" from the past, I mean just that. If you want the smell of a leather binding, it can be programmed in. If you want the complete text of The Adventures of Huckleberry Finn or Little Dorrit, you can have it. You can have the text of Crime and Punishment in Russian and English; for that matter, you could see an exact copy of the manuscript displayed in a small window. Critical commentary? Pack it in. Full text of footnoted sources? Included. It does not matter where some of this data resides physically, whether it sits in the memory of the reading device or must be instantly retrieved from a remote outpost on the Internet. The processed book collapses time and space and makes all the civilization's documents available in the palm of your hand. The processed book is thus an assault on the natural rhythm of things; it occupies a deracinated world of ideas. We have exchanged the garret for the microprocessor.

When placed into the context of the processed book, the primal book doesn't disappear; rather, it is stripped of its air of being a vital expression of a human being and is reduced to its text. If this is beginning to sound like some abstruse critical theories of literature and texts, this is because aspects of those theories have proved to be predictive. This is painful to behold for someone who prayed earnestly that Isabel Archer would not return to Osmond, but words are symbols and are ideally suited for the manipulations of the symbolic logic of computers. The processed book takes Isabel Archer and shows her to be the collection of words that she is. She then can be processed.

II

Above and beyond the text of the primal book that serves as its staging point, the processed book has at least five aspects, which may overlap; and some of these aspects are more developed today than others:

A. The book as portal. This is the aspect of ebooks that most people are familiar with. A book becomes a specialized portal by encouraging readers to click through to other sources of information. The most primitive example of this is a book with a built-in dictionary. Every word in an ebook can be linked to its definition, pronunciation, etymology, etc., which augment the reading experience. Some ebooks link to proper names or Web sites where background information on the primary topic can be found. With hyperlinked footnotes an ebook can point a reader to its sources, including in some instances the full text of those sources. The ebook thus becomes a window on a bigger, interpretively supportive world of data.

Can't you do this with even our lowly primal book? You can do some of it. Printed books can have footnotes and bibliographies; they may have other metadata as well, such as an author's preface, an afterword by a scholar, or even a collection of critical essays (see, for example, the excellent Norton Critical Editions). Publishers have done great things with print, and they have every reason to be proud. But the primal book breaks down in the face of the microprocessor much as a horse would in a race with an SUV. The printed book has a footnote, but the processed book can have the full text of the citation. A bibliography in a processed book can be tantamount to a library dedicated to a particular subject. You can read a printed book with a good dictionary at your side, but with the processed book you can look up every entry in Webster's Third New International Dictionary with a mere click or two; and the lucky members of some academic institutions can now read a word's entire history in the OED.

But the electronic portal goes far beyond even this, connecting readers to specialized databases of information and online services. A bibliography in an ebook can link to the online catalog of a nearby university library, where you can determine if a particular book is in the collection. Or readers of Books in Print, a huge reference work marketed mostly to publishing-industry professionals, can look up any title they desire and then check the inventory status of that book with two of the nation's leading book wholesalers. This particular reference book, in other words, has become a "front end" to a business process by which booksellers can restock their stores. The processed book has the potential to make the contents of a book actionable, not merely readable.

One aspect of the book as portal is that it undermines the reading experience even as it augments it. Reading is linear and requires concentration. A portal link takes the reader away from the author's linear design and focuses his or her attention on other text. While that text may enrich the meaning of the original book, it also distracts the reader, who then must reorient himself or herself upon returning to the primary material. As authors become increasingly aware of the potentialities of the processed book, we should expect that they will begin to write with these jumps in attention in mind. Perhaps they will encourage leaping, perhaps not; or perhaps they will learn to accommodate this aspect of the medium, just as audiobook publishers have learned to give their listeners special cues to help them with the transition from print to sound ("This is Moby-Dick by Herman Melville, cassette four, side two").

What will be especially interesting to see in the years ahead is whether authors will begin to regard their own work as portals and begin to write with an eye toward extending the book beyond its own contours. They could never do this in the printed form; it would be peculiar to ebooks. Perhaps some authors will be more open to including obscure references, knowing that the reader can obtain a gloss with a simple click. One wonders what T. S. Eliot would say if he were alive today and could view Ray Parker's online annotated version of The Waste Land (at http://world.std.com/~raparker/exploring/thewasteland/explore.html). On the other hand, it may be that some authors will resent the ease with which obscure references can be glossed electronically. For example, part of the meaning of the notoriously elliptical poetry of Ezra Pound lies in its very obscurity, in the sheer difficulty of catching all the allusions to other works. (For those unfamiliar with Pound, think of the allusive music of Elvis Costello or Smash Mouth.) As a processed book, Pound's Cantos would lose some of its aesthetics of difficulty as every allusion is presented to the reader with helpful commentary. If Pound had been aware of the possibility of the processed book, he might have written a different kind of poetry altogether. I am inclined to think, though, that the processed book will make all writing, from serious literature to notes to the baby-sitter, more Pound-like for everyone except Pound's disciples, who, perversely, will seek to distinguish themselves by the clarity and completeness of their expression. Ebooks are likely to become increasingly compressed as the need to spell everything out in the primary text is lessened by the one-click availability of explanatory texts. Writing, in other words, becomes not simple expression but also computer-assisted calculation.
It is not too much to say that Pound, with his enormous influence on Modernist aesthetics, helped create the intellectual pressure that made the development of electronic publishing necessary.

B. The book as self-referencing text. Books consist of words that are organized in a particular way. Change the organization and you change the book's meaning, but some changes reveal some of the original meaning that previously had been obscure. You can try this with a printed nonfiction book with an index. Read the book through without looking once at the index. Now turn to the index and read all of the headwords. New patterns will appear. By taking the distillate of the book, an index becomes an interesting heuristic device.

It goes without saying that computers can do this better. The processed book can index a book in any number of ways, and each method will highlight a different aspect of the primary text. This capability is beautifully spoofed in Italo Calvino's If on a Winter's Night a Traveler, where a literary critic proclaims that she no longer reads literature, preferring instead to study a computer print-out of word-frequency counts. The processed book can show us word frequencies; it can map such frequencies against a statistically determined dictionary of "normal" usage, note the standard deviation, and output the result visually; it can associate certain words with specific characters; it can identify webs of metaphors that even the most attentive of readers may have missed. By identifying these patterns (or, it would be more accurate to say, by revealing these patterns for us to see and interpret), the processed book is doing some of the reader's work. A reader of The Scarlet Letter notes that there is a red "A" sewn on Hester Prynne's bosom, and later notes a passage where Hester walks in a garden of red roses: the connection noted, interpretation is possible. A computer can pick up this connection and many, many more, potentially providing us with some of the richness that we usually associate with rereading. The processed book is spatial: it takes the linear progression of a book and makes events from different times spring to mind simultaneously. It takes the primary book and makes it comment upon itself.
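
For readers who want to see what such a frequency comparison might look like in practice, here is a minimal sketch in Python. It is an illustration only, not a description of any existing product: the function names, the threshold, and above all the table of "normal" usage (a mean and standard deviation for each word) are assumptions made for the example. A real system would estimate that table from a large body of text.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return each word's relative frequency in the text (lowercased)."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


def unusual_words(text, reference, threshold=2.0):
    """List words whose frequency departs from the reference usage.

    `reference` maps a word to a (mean, standard deviation) pair for its
    relative frequency in "normal" usage. These numbers are hypothetical
    inputs; nothing here says where they come from.
    """
    flagged = []
    for word, freq in word_frequencies(text).items():
        if word not in reference:
            continue  # no baseline for this word, so nothing to compare
        mean, std = reference[word]
        z = (freq - mean) / std  # distance from normal, in standard deviations
        if abs(z) >= threshold:
            flagged.append((word, round(z, 2)))
    # Most unusual words first
    return sorted(flagged, key=lambda pair: -abs(pair[1]))
```

Given a passage and a baseline, `unusual_words` returns the words that stand out, which is the raw material for the visual output described above. Deciding what the pattern means remains the reader's job.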

While this particular aspect of the processed book is generally unavailable in the current generation of commercial ebooks, it has been in use in research institutions for several decades. (Indeed, many of the features of the processed book mentioned in this essay will not become widespread for years.) I first saw word-frequency counts used in the study of literature over twenty years ago at an exhibit at the Modern Language Association annual convention. There was an alphabetized list with corresponding frequencies of every term in Joyce's Ulysses — one way to help to understand a notoriously difficult book. Dictionary-makers now routinely search through large content databases in this way, seeking to isolate new words and new meanings for old terms. (The latter — new meanings — requires a bit of artificial intelligence to be effective.) Dictionaries can also be made to "read" themselves, a good way to check for spelling errors and to make sure that every word that appears in a definition also is given its own headword, making the text of a good dictionary into a closed hermeneutic circle. For the most part, when fiction writers, as in the example from Calvino above, study this self-referencing characteristic, they treat it humorously. For example, a character, a writer, in a novel by David Lodge gets writer's block when he sees one of his own books processed in this way. He simply couldn't bear the self-consciousness that comes with knowledge. In this case, humor is the revenge of the primal upon the processed.
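
The idea of a dictionary "reading" itself can be made concrete with a short sketch. The Python below assumes, purely for illustration, that a dictionary is held as a simple mapping from headword to definition text; the function name is hypothetical, and the check ignores inflected forms (so "cats" would not match the headword "cat"), which a working lexicographic tool would have to handle.

```python
import re


def missing_headwords(dictionary):
    """Return words used in definitions that have no entry of their own.

    `dictionary` maps each headword to its definition text.
    """
    headwords = {word.lower() for word in dictionary}
    missing = set()
    for definition in dictionary.values():
        for word in re.findall(r"[a-z]+", definition.lower()):
            if word not in headwords:
                missing.add(word)
    return sorted(missing)
```

An empty result means the circle is closed: every word used to define another word is itself defined.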

C. The book as platform. There is a simple and a complex form of the book as platform. The simple form is where commentary is heaped upon the poor unsuspecting text of the original work. This is not peculiar to electronic books, of course; hardcopy books often feature extensive critical commentary. To some extent this simple form of platform overlaps with the book as portal, in that the commentary is often found by clicking through the primary text. The commentary need not be restricted to formal criticism; it could include such things as a student's highlighted text or notes provided by an instructor. There are a number of companies exploring the simple platform now. We should expect this technical capability to become widespread, especially (at least initially) in higher education, where many students are required to have laptop computers and many instructors supplement classroom activity with online communication. The complex form of the book-as-platform, however, may be a bit obscure. Currently, it is much less developed than the processed book's portal and self-referencing qualities. It is the opposite of the book as portal. As a portal, a processed book points to other things; as a platform, a book invites other things to point to it.

Books want to be pointed to for the same reason that people want to be the center of attention. In a hardcopy book this desire may take the form of crafting curmudgeonly aphorisms, which lend themselves to quotation. In scientific work, the production of primary data can place a particular publication at the center of a huge web of citations. This is an extraordinary aspect of the processed book and bears some reflection. ISI publishes quantitative reports on how often particular scientific articles have been cited by other articles. A high score is presumed to indicate a good paper, which reflects favorably on the author. Well, do more citations mean that a paper is better? And what do we mean by "better" anyway? Or is it simply that we have thrown up our hands at the really hard question, the determination of value (an artifact, we should note, of the primal book), and have chosen instead to use a computer-assessable mechanism, a simple count, as a proxy for the hard question? This is not to say that this measure of the processed book is wrong; it simply isn't exactly right. We don't use it because it gives us the right answer; we use it because it gives us an easily derived answer, not unlike the boy in the old joke who loses a quarter on one end of a dark street but chooses to look for it on the other end, because the light is better.
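
How easily derived that answer is can be seen in a few lines of Python. The sketch below is not ISI's method, which I do not claim to know in detail; it is simply one plausible way to count, under the assumption that citations are available as a mapping from each citing work to the works it cites. The function name and the decision to ignore self-citation and repeated citations within one work are choices made for the example.

```python
from collections import Counter


def citation_counts(citations):
    """Count how many distinct works cite each work.

    `citations` maps a citing work to the list of works it cites.
    """
    counts = Counter()
    for citing, cited_works in citations.items():
        for cited in set(cited_works):  # count each citing work once
            if cited != citing:  # ignore self-citation
                counts[cited] += 1
    return counts
```

The brevity is the point. The count is trivial to compute, and it says nothing about why a work was cited or whether the citation was approving.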

A platform is a specific and important thing in the computer industry; Bill Gates owes his wealth to having overseen the development of one of the most significant, the Windows operating system. A platform is what other things rest upon. Those other things (called applications) draw on or "call" the resources of the platform to perform certain tasks. So, for instance, software developers don't have to teach computers how to output color on the display; they simply invoke the platform's capability to display in color.

In the world of books, reference books most readily lend themselves to being reinvented as platforms. Dictionaries are now being created with software tools to allow them to be "called" from any word displayed on the screen: highlight the term and click and the definition appears. Encyclopedias are being used as platforms as well, though the implementation is generally limited to manually inserting hyperlinks in the primary text, links that then "call" the encyclopedia database. We shouldn't limit our thinking about platforms to natural-born backgrounders like reference books, however. For example, some books, in their primal form, come to be thought of as seminal. On the Internet something that is seminal can be instantiated as a parent text that links to all its offspring. When a book achieves seminal status, the publisher may then provide tools to make it easier for other works to link to it, converting it from a primal text to a platform. The ultimate book-as-platform is the Bible, which serves as a platform for a large swath of Western civilization. The Bible has yet to be published as a platform, however, though it has been published as a limited portal.

To publish the Bible as a platform means not only getting the content "right" (which in this context means having content that other people want to build upon) but also providing tools for other developers, whom we are likely to call authors or publishers, that make it easy to build on that platform. Metatags, information that helps to define the components of the documents that they are attached to, are such tools; they can identify such things as graphical categories ("this is a picture"; "this is a paragraph indentation"), rhetorical categories ("this is a paragraph"; "this is the beginning of a chapter"), and topical categories ("this passage is about cats"; "this passage is about dogs"), even when obvious keywords ("cats" and "dogs") are missing ("this is a passage about household pets"). Metatags can be weighted, which means that their importance can be ranked. This paragraph, for example, includes the keyword Bible, but the passage is not about the Bible, so a metatag for the Bible should be given a low weighting. A metatag for literary theory would be given a higher ranking, even though that term does not appear here; and a metatag for a consultant's marketing tool would get the highest ranking of all. At the risk of pushing the metaphor too far, the publisher of a book-as-platform needs to "expose the API," the application programming interface, allowing other authors and publishers to write to the platform. The content of the platform is then conceived of as information objects, defined and discernible modules that can be invoked by other works.
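
To make the idea of weighted metatags concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the InfoObject class, the find_by_topic function, and the numeric weights are hypothetical and do not describe any existing publishing tool. It shows only how a passage can be retrieved by topic even when the keyword is incidental or absent.

```python
from dataclasses import dataclass, field


@dataclass
class InfoObject:
    """A hypothetical 'information object': a passage plus weighted metatags."""
    text: str
    # Maps a topic label to a weight from 0.0 (incidental) to 1.0 (central).
    metatags: dict = field(default_factory=dict)


def find_by_topic(objects, topic, min_weight=0.5):
    """Return objects tagged with `topic` at or above `min_weight`, highest first."""
    hits = [o for o in objects if o.metatags.get(topic, 0.0) >= min_weight]
    return sorted(hits, key=lambda o: o.metatags.get(topic, 0.0), reverse=True)


# Illustrative weights only; the ranking mirrors the example in the text.
passage = InfoObject(
    text="A paragraph that mentions the Bible in passing.",
    metatags={"Bible": 0.1, "literary theory": 0.6, "consultant's marketing tool": 0.9},
)
pets = InfoObject(
    text="A passage about household pets.",
    metatags={"cats": 0.7, "dogs": 0.7},
)
library = [passage, pets]

print(find_by_topic(library, "Bible"))            # low weight, so nothing is returned
print(find_by_topic(library, "literary theory"))  # the passage qualifies
```

A work that "calls" the platform would query by topic and weight rather than by keyword, which is the sense in which the content becomes a set of invocable modules.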

The book-as-platform strains the traditional sense of what a book is, making it hard to reinvent or resuscitate a traditional book for platform work. For this reason, some of the work currently being done to create content platforms is original to the Internet, though the business prospects for the entities in this area are still uncertain. One venture is producing a set of reference data keyed to news items. For example, a reader who comes upon a reference to Ariel Sharon can click to a brief article about this figure. Similarly, a text reference to petroleum will link to an article about oil and the oil industry. The content created by this venture differs from traditional reference works in that it was designed with the Web in mind from the outset. The articles are short and can be easily displayed in a window on a computer screen, and the article topics are generated by scanning items that actually appear in the news (unlike a traditional encyclopedia, many of whose entries may be obscure to people who only read newspapers). A related venture has chosen not to create new content for a platform but has developed a database of Web site entries. So, when a reader comes upon a reference to Sharon, he or she can link to a small group of Web sites that contain information about Sharon, rather than to a specific article. As more and more reference information is published on the Web, the split between content that is made for the Web and content that is made with another medium primarily in mind will close.

There is, I believe, a very large business opportunity in creating books-as-platforms, especially by concentrating on reference material in specialized markets. General reference works — a new version, say, of Encyclopaedia Britannica, but with much more extensive coverage and an atomistic, short-entry editorial strategy — are tempting, but the cost of creating and maintaining such a work is staggering and the economic prospects discouraging. Better to work in vertical markets, whether for consumers or professionals. A definitive online encyclopedia of garden flowers, organized as information objects, would be a good project, but even better would be a highly technical encyclopedia of the genomes of garden flowers, including the genetic maps of each plant and flower, with downloadable files of data for simulation of genetic engineering. From a business point of view, as a rule, the narrower, the better; the more technical, the better; and if the data can be made instrumental — how things work — as opposed to interpretive — what things mean — better yet.

Publishers will devise various means to monetize their investments in books-as-platforms, but finding the right economic model (that is, the one that provides the highest return on capital) will be a process of trial and error. Publishers with seminal content may charge other publishers for the right to "call" the seminal property, but, on the other hand, if the costs are perceived to be too high, the seminal work may not be able to generate a substantial network around it, thereby undermining its seminal status. There is a trade-off here between short-term and long-term economic gain, which is another reason that publishers will continue to get bigger and bigger, as only large organizations can finance a long-term vision. An analogous situation exists today in the library world, especially the public library segment, about which hardcopy trade or consumer publishers have always had ambivalent feelings. Most trade books are sold through retail outlets (bookstores, discount clubs, and online), but a book placed in a library's collection has the potential to cannibalize retail sales. On the other hand, there is strong evidence that library collections serve as a marketing mechanism to stimulate retail sales. Publishers therefore support public library sales provided that they don't become a substitute for retail distribution. This leads to a paradox: If we ever developed a fully-funded public library system in the United States, where everyone looked to their local libraries as the centerpiece of civic life, publishers would stop selling books to them.

I suspect that a split will develop between marketing books-as-platforms and books-as-applications (that is, mere books). Books-as-platforms, seeking ubiquity and determined to keep their transactional costs down, are likely to be marketed to pre-existing communities such as the faculty and students of a university, the employees of a corporation, or a special-interest group (the local chess club licensing an encyclopedia of chess openings). Books that draw on these content platforms will be marketed both on a community basis and to individuals. This two-track marketing structure will encourage communities to take a larger role in their members' informational needs, which in turn will encourage closer community involvement. This is not a "one world" scenario, but rather one of tribal associations born of common economic interests.

It is worth noting a curious aspect of the book-as-platform, namely, that books that are created with this quality in mind are something of a self-fulfilling prophecy. A book-as-platform announces its availability to be invoked by other books in part through the suite of tools it makes available to third parties. A good book with no tools will not get invoked often. A bad book with good tools will not, one hopes, be invoked at all. But a good-enough book with good tools is likely to get invoked more often than the tool-less good book. Certain network effects — things external to a particular product or service that tend to support and even reinforce the original product, as the huge quantity of third-party software supports the Microsoft Windows platform — may then kick in, which will tend to strengthen the platform aspirations of the good-enough book. The Google search engine, for example, ranks Web sites in part by counting the number of links other sites have to the primary site; and since search engines, of which Google is the current leader, are a major source of traffic to Web sites, a large number of inbound links can result in even more inbound links. This means that the creation of a successful book will increasingly involve an awareness of what tools are necessary to inspire invocation. It is not enough to say something; it must be said in a way that others will choose to say it as well.
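
The link-counting idea can be sketched in a few lines of Python. This is a deliberately naive illustration, assuming a plain count of inbound links; it is not Google's actual ranking method, and the function name and node names are hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank nodes by how many distinct other nodes link to them.

    `links` is an iterable of (source, target) pairs. Self-links and
    duplicate pairs are ignored so a node cannot inflate its own count.
    """
    unique = {(s, t) for s, t in links if s != t}
    counts = Counter(target for _, target in unique)
    # Sort by count descending, then by name for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


links = [
    ("commentary-1", "good-enough-book"),
    ("commentary-2", "good-enough-book"),
    ("syllabus", "good-enough-book"),
    ("commentary-1", "tool-less-good-book"),
    ("good-enough-book", "good-enough-book"),  # self-link, ignored
]
ranking = rank_by_inbound_links(links)
print(ranking)
```

In this toy data the good-enough book with tools outranks the tool-less good book simply because more nodes point to it, which is the self-reinforcing effect described above.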

But how about the outstanding book, the book whose felt force is so great that it demands that we pay attention, despite an absence of platform tools and inept publication? Provided that we understand that the club for outstanding books is a small one indeed, the exceptional book will foster its own followers, who will assemble the network of processing tools around it. Despite its efforts, the processed book cannot ultimately do away with the exceptional primal book, whose very intensity exposes the limitations of computing.

D. The book as machine component. We have been spoiled by books. We believe that they have been written for us to read, that their ultimate goal is to reach us, that as readers we occupy a central place in the drama of culture. If the processed book attempts to separate the author from the text of his or her own work, we should not be surprised that the reader will soon fall under attack. One aspect of the processed book is to create books that are intended to be read by machines and embedded within machine processes. It is only a matter of time before books will be created with a machine audience in mind. Considering the slow growth of the publishing industry today, the future of publishing may be to serve this new constituency.

Research into the use of aspects of human culture (books, for instance) as parts of computer algorithms has been going on for decades; some examples of this work are now finding their way into consumer devices and services. We commonly encounter text-to-speech synthesis (TTS) technology, for example, when we dial an information operator and are greeted with a robotic voice. TTS works by developing a collection of sounds that are mapped onto the letters and words of the text in question. While there are only about forty such sounds (phonemes) in English, most TTS engines generate more sounds than that in order to reduce the choppiness of pronouncing one letter at a time; indeed the technical sophistication that "sits behind" what would seem to be a simple sound is dazzling and tends to overshadow the lexical content that it generates. Millions of personal computers now come with this technology built in. You can have your e-mail read to you in a robotic voice, if you want to, which may not make much sense for someone sitting at a computer, but is a great convenience for someone driving a car who can't or shouldn't take his or her eyes off the road; such mobile TTS is now available. One Silicon Valley company has developed a TTS tool for reading books to the blind, which is a wonderful addition to the world's media, as only a small portion of published books are ever recorded as audiobooks (and even then, for reasons of cost, mostly in abridged format). TTS will eventually find its way to all books, giving the reader of a processed book the choice of reading or listening. (This, by the way, will diminish or even destroy the US$2 billion audiobook business as we know it today, as the rights for a book's text intended to be read and the rights for audio will converge.)

The reverse of TTS is voice-recognition technology, though this technology is not as far along as TTS. A voice-recognition system incorporates a dictionary, which helps identify the words being spoken. Some of the current systems require that a particular speaker "train" the system for a period of time to make it work effectively, but even so, the principle of embedding a dictionary is unchanged. One way to improve the accuracy of voice recognition is to restrict the vocabulary of the system. This is what is going on when you are talking to an automated voice mail system, which may tell you something like: "At the tone please say your Social Security Number or you may key it in using your phone's dialpad." Creating such a restricted vocabulary is the equivalent of making an abridged paperback version of a dictionary, except that the requirements of voice-recognition technology are largely determined empirically, by studying the words users actually employ and expanding the vocabulary if users effectively demand it. This feedback mechanism, which is peculiar to the processed book, can take place quickly, even instantly. The processed book, in other words, "learns" and adapts itself to the actual circumstances of its use. Traditional books, on the other hand, like diamonds, are forever.
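
The feedback mechanism can be sketched as follows. This is a toy illustration under stated assumptions, not how any real voice-recognition product works: the RestrictedVocabulary class and its promote_after threshold are hypothetical, and recognition is reduced to a simple lookup of already-transcribed words.

```python
from collections import Counter


class RestrictedVocabulary:
    """Hypothetical restricted vocabulary that grows from observed usage."""

    def __init__(self, words, promote_after=3):
        self.words = {w.lower() for w in words}
        self.promote_after = promote_after  # assumed threshold, purely illustrative
        self.misses = Counter()

    def recognize(self, word):
        """Return True if `word` is in the vocabulary; otherwise record the miss."""
        word = word.lower()
        if word in self.words:
            return True
        self.misses[word] += 1
        # Feedback step: enough users have effectively demanded this word.
        if self.misses[word] >= self.promote_after:
            self.words.add(word)
            del self.misses[word]
        return False


vocab = RestrictedVocabulary(["zero", "one", "two", "three"], promote_after=2)
for spoken in ["one", "operator", "Operator", "operator"]:
    print(spoken, vocab.recognize(spoken))
```

After two failed attempts the word "operator" is promoted into the vocabulary, so the third attempt succeeds; the abridged dictionary has adapted itself to the circumstances of its use.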

It may appear that reference books and dictionaries in particular have an advantage over other books in becoming machine components, but in fact all books aspire to the perfection that is a machine. It may require a multistep process, however. Let's take romance fiction, for example. How could we possibly make a machine want anything to do with a romance novel? The first step is to convert the novel into a collection of indexes through text analysis, much as described above in the section on the self-referencing book. Such indexes are intermediate documents that could be of value to marketers, who might extract word-frequency lists (or some other underlying textual pattern) to assist in crafting copy for advertising. But the processed book can do more. Why one romance novel when you can have one thousand? And let's link the indexes of each novel to a number of fields of metadata such as author, date of publication, rate of sale, and the geographical distribution of sales. Let's also capture data at the point of sale, directly from the cash register, and update our metadocument in real-time. Now we have a dynamic database that can tell us how the moods and tastes of a particular market segment are changing minute by minute. (We could get even better information if these books were being read online, where each page view could be assessed.) We may as well disintermediate the copywriter and have the processed data tweak the Web site of the client company; or we could have the dynamic data feed a digital printing press, where last-minute changes to a marketing brochure can be made.
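
The first steps of that process, turning novels into word-frequency indexes and attaching metadata, might look something like the Python sketch below. The field names ("title", "author", "year", "text") and the function names are assumptions chosen for the illustration, and the sample texts are invented.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, split it into words, and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def build_metadocument(novels):
    """Pair each novel's word index with its metadata fields."""
    return [
        {
            "title": n["title"],
            "author": n["author"],
            "year": n["year"],
            "index": word_frequencies(n["text"]),
        }
        for n in novels
    ]


def top_words(metadocument, k=3):
    """Sum the indexes across all novels and return the k most frequent words."""
    total = Counter()
    for record in metadocument:
        total.update(record["index"])
    return total.most_common(k)


novels = [
    {"title": "A", "author": "X", "year": 2001,
     "text": "Her heart raced. His heart stopped."},
    {"title": "B", "author": "Y", "year": 2002,
     "text": "The heart wants what the heart wants."},
]
metadocument = build_metadocument(novels)
print(top_words(metadocument, 1))
```

Sales figures captured at the cash register would be further fields on each record; the sketch stops at the static index, which is the intermediate document a marketer might consult.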

Outside the world of books, something like this is already underway. The Benetton organization captures data on every garment it sells, data that is then sent to a database and evaluated for trends in fashion. These evaluations are then moved to the production line, where they can influence the dye lots. It is quite possible that the color of shirts in the Benetton inventory could change from one week to the next as a direct response to the information being fed to the factory floor from the cash register. The introduction of the processed book to such a system will represent a refinement, the addition of a culturally based weighting mechanism to optimize the effectiveness of the inventory management and merchandising system.

E. The book as network node. The primal book is a discrete item. The processed book is a node on a network. Now we know what has happened to the primal book: it resides as a node, linked to other nodes, many of which themselves are primal books. Compare this to DNA: Each individual has his or her own unique DNA, but this DNA has much in common with that of all other people, living and dead — and, for that matter, not yet born. All men are cousins. And this is true of books as well: By being placed within a network, where it is pointed to and pointed from, where it is analyzed and measured and processed and redistributed, a book reveals its connections to all other books. When Hemingway remarked that all American literature can be traced back to Huckleberry Finn, he was acknowledging kinship. When a text analysis program determines that writers from one region use more dependent clauses than writers from another region, it is defining kinship.

The relationships between the nodes of the network can be multiple. One node can be used as a machine component and aid in creating another node, which serves as a platform for a third, which supports the first. The network map — how one node connects to another — is a portrait of the processed book, showing its ancestry, its descendants, and the relationships between the entire family. This map is itself a document — may we call it a book? — or metadocument, which derives from the very field it comments on and in turn influences that field, much as consciousness influences the behavior of a human being.
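
A network map of this kind can be represented as a small labeled graph. The Python sketch below is only an illustration: the NetworkMap class, the relation labels, and the node names are hypothetical. It encodes the circular case just described, in which a first node is a machine component for a second, which is a platform for a third, which in turn supports the first.

```python
from collections import defaultdict, deque


class NetworkMap:
    """Hypothetical map of labeled, directed relationships between book nodes."""

    def __init__(self):
        self.edges = defaultdict(list)  # source -> [(relation, target), ...]

    def connect(self, source, relation, target):
        self.edges[source].append((relation, target))

    def reachable_from(self, start):
        """Return every node reachable from `start`, following any relation."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for _, target in self.edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen


net = NetworkMap()
net.connect("dictionary", "machine component for", "annotated novel")
net.connect("annotated novel", "platform for", "study guide")
net.connect("study guide", "supports", "dictionary")
print(net.reachable_from("dictionary"))
```

Because the relationships loop back, the dictionary is reachable from itself: the map is a document derived from the very nodes it describes.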

This is all pretty abstract for someone whose ambition is to write a simple, self-contained text such as a memoir or a category novel (mystery, romance, science fiction, etc.). The problem is that the idea of a self-contained text is a product of the fixed medium of print on paper. The challenge that the processed book puts to writers is that of working with a double consciousness, as primal authors looking over their own shoulders as they see the book being processed even as it is written. The primal book lives under surveillance. It is hard to imagine many authors whose work will not be influenced by the fact of being observed by a camera. And it is important to note that this is not a matter of choice. While some romantic writers will try to bat away the intrusions of processed media, those who embrace the network will be the most successful, success being determined by the survival and "pointability" of the text or node.

It is worth noting that the nodal aspect of the processed book has very important business implications, which are likely to reshape the publishing industry in the years to come. The threat to copyright, for example, may be pushed back. Publishers have been watching the tribulations of their brethren in the music industry and fear copyright piracy like nothing else. Piracy is not peculiar to digital books, of course, as any publisher who has travelled in Asia can tell you, but it is much more extensive when copies of books can be sent around the world on the Internet. While Napster, the first mass-market file-sharing service, has had its wings clipped by the music industry, file-sharing still takes place on Napster's underground successors (LimeWire, Aimster, etc.) and within the universe of Usenet. Many books are now being pirated on these services, which has led not a few publishers to steer clear of distributing digital copies of their products, for fear that they will end up in the file-sharing underground.

A processed book, however, can be published as a node in a network, with connections to other books, commentary, online library card catalogues, teachers’ recommendations, and so forth. If the network is usefully developed (and this is an important "if," for links and other connections for their own sake can be a distraction), the value of the book-as-node is greatly enhanced by being part of it. Pirated copies of the primary book, the node, would not have all the network connections, making the pirated copies less valuable. This would serve to bring readers back to the nodal book, not for its primal value (because the content is elsewhere available for free in pirated copies) but for its processed, networked value. This would embolden more publishers to make their books available electronically, provided that they had the means to plug the book into a network quickly.

Something like this is already going on in the world of academic journals. As noted earlier, Reed Elsevier, the leading publisher of journals, has built a powerful search engine for its collection of academic research. This is a shrewd move, and it may be that Reed's customers, primarily research libraries, don't yet see what is going on. There is a growing movement in the research community for changes in the way research is published, with not a few people arguing that all academic research should be made available online for free to anyone with an Internet connection. Reed has been the whipping boy for this movement, perhaps with some reason, as it has pushed through aggressive price increases on many of its journals. Many articles are now appearing in various pre-publication forms on the Internet, which could undermine the value of Reed's journals, and there is a move afoot (see, for example, the writings on this subject by Stevan Harnad) to have researchers self-archive their work prior to submitting it to a journal. This would give a publisher pause. If an article is self-archived on the Internet, then anyone with a Web browser can read that article for free. Why then would librarians continue to purchase journals, especially if the prices continue to rise?

Reed's answer is to create a processed book, though, of course, it is never phrased this way, nor is the underlying strategy ever expounded on. Reed's search engine adds value to the journals indexed and searched; the extensive links add more value. To the database of journals are added many public domain documents, all of which can be searched at the same time. The database gets bigger, thus the need to have a good search engine becomes greater. Now the value of any one document is significantly augmented by virtue of its being part of the network that Reed has created. If copies of these articles are placed in a self-archive, what value they have is theirs alone, assuming that anyone can find them; but placed within the network, the value of the nodes rises. The inclusion of public domain documents is particularly crafty. Reed has migrated the value from the public domain documents themselves to the search engine, the dynamic metadocument, which helps a reader find the underlying documents. In some respects Reed is coopting the public domain. So who needs copyright? The economic challenge for content creators and publishers is to create content that requires its incorporation into a network and to make sure that the network's domain-specific search capability is always a step ahead of general-interest search engines such as Google.

Can this work for books as well as journals? It can and it will. By the time we get to the twenty-third title in the Harry Potter series, ebooks may be ubiquitous. The new title will be published electronically and will have built into it such things as links to key passages and characters from the previous books, a Harry Potter dictionary, connections to Web sites for Harry Potter clubs, and much, much more. There will be a temptation to pirate the text, but the pirate won't get the built-in links to the trailer to the next movie — not cool, as any kid will tell you. Piracy will be kept in check by reinventing the highly primal Harry Potter titles as processed books. The economics of publishing will demand it.

III

Who is going to implement these aspects of the processed book? Authors? Their agents? Or is there hope for publishers, those benighted organizations that steal the souls of authors?

The answer is, All of the above and more. The processed book is not one thing, nor will it be implemented as a single coherent system whenever a new book is written or published. The processing of the book will be organic. Authors will pitch in, as will their readers, critics, publishers, librarians, and anyone else who touches a book. There is obviously a place here for software vendors as well, and the telecommunications companies that ship this content around will benefit regardless of what a particular node of the network is about. Not all processed books are created equal. Books that lend themselves to linkages to other texts will be processed more. There will be a new quantitative measure: a book is equal to the sum of things that can be linked to it and through it.

Authors will process their own books in the act of writing; indeed they have been doing so for some years now, as word-processing is an aspect of the processed book. They will increasingly write with the processed potential in mind, changing the nature of their texts, as reflected in their style, choice of topic, etc. As they help to bring their books to the attention of others, they will process their books further, perhaps by creating Weblogs (see, for example, www.andrewsullivan.com), which can help promote a book and encourage others to add new layers of processing to it. They will choose their agents and publishers in part for how much processing they can bring to the task, just as they now inquire into who has the best relationships with television talk show producers.

Will they self-publish and disintermediate publishers entirely? Some will, but disintermediation is overblown. Publishers are in a better position than individual authors to develop the network of essential processed relationships. It is hard to build and maintain a single Web site, but it is trivial to build and maintain a thousand. The scale of a publishing house will benefit all the authors, books, and nodes that connect to it. Publishers will acquire software tools for all their titles that can be brought to bear for any individual author, tools to create portals and platforms, for books to be made self-referencing and to be converted into machine components, and ultimately to take their positions as nodes on the network. These tools will be distributed to others in the value chain, to readers and critics, who will add new links to each text. We can see the beginnings of this on Amazon.com, where each title is surrounded by copy provided by publishers and reviews by amateurs and professionals. Because of the cost to create the processed book, size matters: the processed book will contribute to the ongoing consolidation of the publishing industry, as fewer but bigger houses take a larger share of the market.

In the final indignity to authors, it seems likely that the creators of books will begin to lose control of the editorial entity. This has always been true to some extent, of course, as bad or misguided reviews have influenced the reception of many books over the years. With the processed book, however, matters change in both degree and kind. A book can find itself overwhelmed with linked commentary, and if the commentary is irresponsible, how does one correct that except by overlaying even more commentary? Worse (or better, depending on one's bias), multiple versions of a work can circulate independently, each developing its own network around it; and if one or more of these versions do not precisely represent the author's original intention, well, who is to say? What is to stop someone from making some changes in the original work (I never did like Desdemona's eyes) before forwarding the revised text to the next person, and the next person, to everyone, and forever? Or perhaps, by virtue of the sheer accessibility of all texts at all times, with all their relationships mapped in exquisite and excruciating detail, there is an Invisible Hand in critical commentary that ensures that over time the "right" text and the "right" interpretation will prevail.

IV

Suppose this essay were written and augmented as a processed book? How would it differ from what you are reading right now?

To begin with the obvious point, this essay is a processed book, though the extent of its processing is too conventional to catch anyone's attention. This document was created in Microsoft Word, which is word-processing (good term) software. It has been edited and reedited, moved around, and spliced and diced. It has also been sent over the Internet any number of times to friends who agreed to provide comments. In sending it around, I chose to leave it in the Word format, which enables editing, rather than freezing the text as a PDF file. Indeed, one of the alleged virtues of PDF files, namely, that they can prevent tampering with the text, may in the context of a processed book prove to be a liability, as uneditable files are less likely to have a network of comments built around them. (Some new variants and add-ons to PDF technology permit a base file to remain unaltered even as edited versions, including those with commentary, are displayed alongside.) An early version of this essay has even been mounted on a Web server, which may in time raise a practical problem for me: How will I make sure readers find their way to the current version? The answer, for better or worse, is that the processed book inevitably leads to a loss of editorial control. This makes me wonder if in a world without editorial control, authors may cease to write for attribution.

But let's work through the five aspects of the processed book and see how they apply to this very document.

As a portal, "The Processed Book" would include links to many Web sites for further information on people and ideas discussed here. For example, there would be links to further information on Stevan Harnad, Alan Kay, and Ted Nelson. There might also be links to earlier drafts of this essay, perhaps including the many e-mails I collected from people who have commented on it, not all of them favorable. The reference to doing a Google search on "computational linguistics" would be enacted right within the text: click and the search results would appear.

The essay has already served as a makeshift portal for one reader, a longtime friend who went through it carefully. He came upon the references to Isabel Archer and Osmond and didn't know what they were. So he proceeded to Google and soon found himself reading about a novel by Henry James entitled The Portrait of a Lady, whose protagonist, Isabel, marries the scheming Osmond. Why were these references not spelled out in the text to begin with? For several reasons. First, in my (East Coast) circle almost everyone knows who Isabel Archer is, though references to Calvino (for example) require greater explanation. But you never can tell what the range of reference of a particular reader is; there is, after all, no agreed-upon culture to draw upon, no canon of bedrock ideas; in such a world, which is the one we live in right now, the processed book becomes a means of cultural unification. One reader (West Coast) will be puzzled by Isabel Archer, another (East Coast) by references to algorithms derived from Bayesian statistics. A writer has to work with all such possibilities, which is why the processed book-as-portal is inevitable. Another reason Isabel was not explained in the text was my own anticipation of making the very point I am making now; in other words, I left the reference slightly obscure in order to demonstrate the need for the book-as-portal. I chose to write the essay under my own surveillance.

The book-as-portal will become more robust in time. It is one thing to look up keywords in Google (e.g., proper names), but it is still a considerable challenge to capture some allusions. The phrase "all men are cousins" appears early in this essay, but what is a search engine to do with it? An alert reader will pick up the allusion to "all men are brothers," but Google also gives us "all men are pigs," "all men are scum," and "all men are created equal." (Presumably for some audiences, all these phrases are equivalents.) The alert reader will also note that the phrase "all men are cousins" is the only time in the document that so-called sexist terms are used — but, the writer protests, I couldn't very well have said "all men and women are cousins," as that would have obscured the allusion to "all men are brothers." There is cultural content here that would make it very hard for a fully automated process to generate a meaningful link. But this will improve, and probably soon.

There is also a reference to Walter Pater in this essay, which I will decline to highlight. Literary students will pick it up, but for everyone else it will remain hidden. When the time comes when everyone can find the allusion, the processed book-as-portal will truly have arrived.

As a self-referencing text this essay could provide comprehensive indexes, which are currently not included. Besides the aid such indexes would provide to readers, they would also make it easier for search engines to find and classify this essay, which would in turn potentially bring more readers to it, assuming it were posted on the Web, as inevitably it will be. A self-referencing text would also clear up some possible confusion in the preceding section, where I referenced Harnad, Kay, and Nelson. Only Harnad has been discussed up to this point; Kay and Nelson are yet to come. A self-referencing text would permit a reader to see all references to Kay and Nelson simultaneously. Such a text would also cluster metaphors and categories of information together. So, for example, all literary references could be highlighted, as could all references to the computer industry. A harder trick would be to identify all the metacomments, of which there are many. How can a machine tell the difference between saying something and saying something about saying something? It may be that self-commentary will be the last bastion of the purely human.
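
Gathering all references to a name is the easy part, and a minimal sketch shows how little it takes. The function name, the convention that paragraphs are separated by blank lines, and the sample text are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_name_index(text, names):
    """Map each name to the 1-based numbers of the paragraphs mentioning it."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    index = defaultdict(list)
    for number, paragraph in enumerate(paragraphs, start=1):
        for name in names:
            # Whole-word match, so "Kay" does not match "Kayak".
            if re.search(rf"\b{re.escape(name)}\b", paragraph):
                index[name].append(number)
    return dict(index)


if __name__ == "__main__":
    sample = (
        "Harnad writes about journals.\n\n"
        "Kay imagined the Dynabook.\n\n"
        "Nelson and Kay both appear here."
    )
    print(build_name_index(sample, ["Harnad", "Kay", "Nelson"]))
```

An index of this kind handles proper names well. It does nothing for the harder cases named above, such as clustering metaphors or recognizing metacomments.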

A self-referencing text could also provide quantitative information. For example, a publishing friend asked if I intended to publish this essay in book form, which brought up the question of its length. A word or byte count is a trivial exercise for a machine. Although it isn't clear why anyone would want to do it, this text could also be analyzed to determine how much space was given to each topic or the distance between literary allusions or the frequency of quotation marks and special characters.
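
How trivial these counts are can be shown in a few lines. The sketch below is illustrative only: the function names are invented here, words are simply whitespace-separated tokens, and the "allusions" are found by matching against a hand-supplied set of marker words.

```python
def text_statistics(text):
    """Word count, UTF-8 byte count, and number of quotation marks."""
    return {
        "words": len(text.split()),
        "bytes": len(text.encode("utf-8")),
        # Count straight and curly double quotation marks.
        "quotation_marks": sum(text.count(q) for q in '"\u201c\u201d'),
    }


def gaps_between(text, markers):
    """Distances, in words, between successive occurrences of marker words."""
    positions = [
        i for i, word in enumerate(text.split())
        if word.strip('.,;:"()') in markers
    ]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


if __name__ == "__main__":
    print(text_statistics('He said "hi" twice.'))
    print(gaps_between("Borges met Faulkner and later Borges left.",
                       {"Borges", "Faulkner"}))
```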

The value of a self-referencing text grows with the length of the work. For this essay, self-referencing is not particularly revealing; for Moby Dick, it would be breathtaking. On the other hand, if all the comments made on the drafts of this essay were to be included as part of the text, self-referencing would become more valuable, as it would trace the evolution of ideas. This raises the question of whether self-referencing of a text should apply only to a particular network node or to the entire processed network.

It is very interesting to think of this essay as a platform. To a small extent, it has already served in that capacity. One early reader asked me for permission to use one section for a project he was working on, a major reference work in botany. He wanted to present the book-as-platform idea to the writers; he wanted, in other words, to use a section of "The Processed Book" as material for his own work — he wanted to use the essay as a platform. Well, this is only a tiny matter of technology: all he needs to do is copy the relevant section and paste it into his document. But what he perceived is that a book is often copyrighted and that he needed more than cut-and-paste technology to use the essay as a platform.

If this essay were a platform, it would include tools to enable other writers to "call" its text or a section of the text. These tools would necessarily include copyright information, without which clearing permissions can become tiresome. (I don't want to get into the fair use aspect of copyright law, though it is relevant here, as it is complicated and certain to provoke much unproductive argument.) One company (now defunct) had a technology, likely to be imitated, that made the copyright policies of a particular work pop up on the screen when the mouse passed over the object in question. What would those policies be? It depends. A writer or publisher could take a tough stance on copyright, requiring all uses of the platform to involve permission and fees. Or there might be a matrix for copyright questions, depending on the size and nature of the use: free for schools, costly for corporations, and so forth. For that matter, the work could simply be put into the public domain.
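
One way to picture such a matrix is as a lookup table that a platform could consult before releasing text. The sketch below is purely hypothetical: the user categories, the rates, and the per-thousand-words pricing are invented for illustration and do not describe any actual policy.

```python
from dataclasses import dataclass

# Hypothetical fee schedule, in dollars per 1,000 words. A user type that is
# absent from the table must seek permission from the rights holder directly.
FEE_MATRIX = {
    "school": 0.0,
    "nonprofit": 0.0,
    "corporation": 50.0,
}


@dataclass
class UsageRequest:
    user_type: str  # who wants to use the text
    words: int      # how much of the text they want to use


def quote_fee(request, matrix=FEE_MATRIX):
    """Return the fee for a use, or None when permission must be requested."""
    rate = matrix.get(request.user_type)
    if rate is None:
        return None
    return rate * request.words / 1000


if __name__ == "__main__":
    print(quote_fee(UsageRequest("school", 500)))
    print(quote_fee(UsageRequest("corporation", 2000)))
    print(quote_fee(UsageRequest("individual", 100)))
```

A tough stance on copyright corresponds to an empty table, in which every request is referred to the owner. A public domain work corresponds to a table in which every rate is zero.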

Fascinating work in this area is being put together by Hal Abelson and Lawrence Lessig at their public service organization, Creative Commons. Among other things, Creative Commons proposes to "brand" the public domain, that is, it is developing a set of signposts so that users will know whether or not a particular information object is under copyright. As part of this project, a series of intellectual property contract templates is being developed, which will allow the owner of a creative work to determine the copyright status of his or her work. This is important. Prior to the work of Creative Commons, much intellectual property was either totally controlled by its owner or not controlled at all, that is, it was in the public domain. The contracts being developed by Creative Commons would allow me, as the author of this essay, to choose an intermediate position. I might assert the right for all commercial uses of this essay (not many and not worth much), but I might also stipulate that noncommercial uses require no fees or permissions. If this were a novel, I might insist that I controlled everything in it, but I might make the characters available to others for free or for a fee for derivative works.

The point here is that as we think of the processed book, we are dealing not only with what technology can do with content but also with the total set of social and legal issues that surround a work. Social and business rules can be codified and instantiated within technology. A reader or user can then draw on these rules without fear of violating anyone's rights. The book-as-platform may have more to do with copyright law and marketing strategy than with bits and bytes.

With the book as a machine component, things really begin to get interesting. Try as I may to hide, this essay says a lot about me. The word choice and syntax are mine, the allusions part of my mental framework. Words and ideas don't have to be original to say something about the person who uses them. For example, the fact that I prefer the work of Borges to that of Faulkner, though Faulkner is arguably the superior writer, says something about me, even though I couldn't hope to write a line like Borges; we are, after all, our tastes as well as our expression. The works of Marshall McLuhan and Ted Nelson are as much a part of me as extraordinary tales of growing up in Fort Lee, N.J. Computers can take this essay and convert it into a proxy for me through various analyses. In other words, "The Processed Book" is the raw material that can result in a computer agent.

What would an agent do? Just about anything. I would like a well-crafted agent that would regularly poll the Internet for things of interest and that would also filter out a number of related things. For example, I am interested in copyright issues on the Internet (as this essay reveals), but hardly want to read all the manifestoes of the information-wants-to-be-free crowd: perhaps an agent can find information on copyright and weed out the histrionics. An agent could also be used to find things that I don't even know I care about by identifying themes in my writing (e.g., submerged metaphors) and matching them to related themes found on servers anywhere.
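
A crude version of the matching step can be sketched with word-frequency vectors and cosine similarity, a standard technique from information retrieval. Everything here is an assumption made for illustration: the function names, the use of raw word counts as a stand-in for "themes," and the sample documents. It would not detect submerged metaphors, and it cannot tell argument from histrionics.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_interest(profile_text, documents):
    """Return (score, document) pairs, most similar to the profile first."""
    profile = term_vector(profile_text)
    scored = [(cosine_similarity(profile, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    my_writing = "copyright issues on the internet"
    found = ["recipes for sourdough bread", "internet copyright disputes"]
    for score, document in rank_by_interest(my_writing, found):
        print(f"{score:.2f}  {document}")
```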

Computer agents are not new. What is new is the increasing sophistication with which they are being built and their purposes. All Internet users are familiar with the kind of profiling that ecommerce sites habitually engage in, profiling that says something about the kind of merchandise to offer particular users. Most of these agents are put together, however, in fairly clumsy ways. So, for example, the all-important Zip Code is likely to say something about one's household income and education level and many other things besides. But we all know how imperfect Zip Code analysis is. On my street we have university faculty, Silicon Valley executives, and (apparently) a couple of New Age households made up of students and former students. And let's not forget the transplanted retirees (this is a beach town). But what, someone is bound to ask, does that say about me? By taking a statistical abstract of a person's writings, these profiles can become more intimate, and their uses can become more interesting than determining which digital camera I am likely to buy.

One intriguing application of the use of personal content is to create spam filters. Paul Graham (see http://www.paulgraham.com/spam.html) has written a white paper on the use of Bayesian statistics to develop highly accurate filters to catch unwanted unsolicited e-mail. This works by breaking a user's incoming e-mail into spam and not-spam (the user determines which is which). Then a statistical abstract is taken from both groups and all further incoming e-mail is measured against these abstracts. An additional feature is that the filter becomes better the more you use it, as you continue to build a larger database, which makes the statistical measures increasingly accurate. It is not hard to imagine similar processes being applied to the content of "The Processed Book."

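
The general idea can be sketched with a textbook multinomial naive Bayes classifier. This is not Graham's own formula, which differs in its details; the class name, the tokenizer, the add-one smoothing, and the training messages are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class NaiveBayesFilter:
    """Keeps one word-frequency 'abstract' for spam and one for ham."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add a user-labeled message ('spam' or 'ham') to its abstract."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that a new message is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common is this kind of message?
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing, so unseen words do not zero out the score.
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        difference = log_scores["ham"] - log_scores["spam"]
        difference = max(min(difference, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(difference))


if __name__ == "__main__":
    mail_filter = NaiveBayesFilter()
    mail_filter.train("cheap pills buy now", "spam")
    mail_filter.train("win money now", "spam")
    mail_filter.train("meeting agenda for monday", "ham")
    mail_filter.train("draft of the essay attached", "ham")
    print(mail_filter.spam_probability("buy cheap pills"))
    print(mail_filter.spam_probability("essay draft for the meeting"))
```

Each call to train enlarges the two abstracts, which is the sense in which such a filter improves with use.
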
Were this essay to become a machine component, its task would be to serve as my virtual representative — it would become, in other words, the soul of the machine. Such a machine would incorporate human culture (mine) into its processes and thus become more human-like in the tasks it can take on. And why stop with this essay? We could add all the e-mail I write (and give it a high ranking), all the Web pages I view (and give them a lower ranking, because reading is not as close to the bone as writing), and anything that is my personal expression. This is the ultimate goal of the processed book: to inform a generation of robots, not to make the world more machine-like but to make machines more human.
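
The ranking of sources can be pictured as a weighted word profile. In the sketch below the source labels and the weights are hypothetical; all the essay specifies is that writing should count for more than reading.

```python
import re
from collections import Counter

# Hypothetical weights: what I write counts for more than what I read.
SOURCE_WEIGHTS = {"essay": 1.0, "email": 1.0, "web_page": 0.25}


def build_profile(documents, weights=SOURCE_WEIGHTS):
    """Combine (source_type, text) pairs into one weighted word profile."""
    profile = Counter()
    for source_type, text in documents:
        # Sources of unknown type contribute nothing to the profile.
        weight = weights.get(source_type, 0.0)
        for word in re.findall(r"[a-z']+", text.lower()):
            profile[word] += weight
    return profile


if __name__ == "__main__":
    profile = build_profile([
        ("email", "copyright copyright"),
        ("web_page", "copyright borges"),
    ])
    print(profile.most_common())
```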

It should be clear by now how "The Processed Book" would serve as a network node. All four of the other aspects of processing would apply here: the portal, self-referencing text, platform, and machine component. Each of these aspects contributes to the network. Commentary would sit somewhere between the portal and platform aspects, depending on which text is doing the pointing and which is being pointed to. "The Processed Book," in other words, like any written document, develops a community around it. The relative size of that network depends on the importance of any particular book: a small network for this essay, an enormous one for Ted Nelson's Literary Machines. It is noteworthy that such a network has in fact not sprung up around Literary Machines, despite that work's enormous importance, almost certainly as a result of the author's eccentric decision to self-publish, denying Literary Machines the marketing clout of even a modestly sized publisher.

It is an interesting marketing exercise to consider how to build such a network for "The Processed Book." Most obviously, the paper should be mounted on a Web server, where it will be indexed by search engines, which will in turn point users to it. It can also be distributed in various pre-publication forms, some of which will inevitably end up on the Web as well (this is already happening). It can be sent around to interested (and uninterested) parties as an e-mail attachment. Links to it can be posted in newsgroups. The way to market this book, or any book, in networked mode is to let the network do the work. This means relaxing some common controls. Digital Rights Management (DRM), for example, which can reduce or eliminate the copying of digital works, may be a good economic decision for Stephen King and John Grisham, but unknown authors — like that of "The Processed Book" — are better off allowing their work to be copied and sent around — and even in some cases to be changed somewhat. Since a friend posted a draft of this essay on a Web site, which I noted in two newsgroups, I have been astounded by the number of responses I have received. The network is working.

Of course, not all network nodes are created equal. (Imagine for a moment what a computer could do with that sentence. Besides picking up the reference to "The Gettysburg Address," it would also note the earlier passage in this essay where the phrase "all men are created equal" appears and then back into "all men are scum," etc. The poor machine!) The book-as-network is a new phenomenon and we still don't know what the inherent rules for building out such networks are. Does every node have the potential of building an ever-growing network, or do some nodes have the potential to diminish or even wipe out the network aspirations of other nodes, as the wake of a large ship will overwhelm that of a tiny rowboat? We don't know the answer to this at this time, but my guess is that in a networked world, the big shall rule and that the diversity of voices that currently characterizes the Internet will increasingly become dominated by the roars of a handful of media empires, barring a regulatory regime. The processed book of tomorrow will have to fight for attention just as much as yesterday’s primal book.

V

Although I noted at the outset that the processed book was not to be confused with physical devices, it is useful to see how the advent of the processed book will help to influence the shape of such devices.

The concept of the processed book complements such ideas about electronic publishing as Alan Kay's Dynabook and Ted Nelson's hypertext and networked information. The Dynabook is essentially a hardware concept: a portable hand-held computer that could serve as a viewing device for the world's knowledge. It differs in an obvious way from the processed book in that the processed book is not hardware but content in digital form, content whose creation is shaped by the presence of ubiquitous computing. Of course, many of the ideas of the Dynabook have now found their way into the marketplace in the form of personal digital assistants and some aspects of wireless phones. These devices, among many others, play a role in the development of the processed book: for viewing, editing, linking, and communicating or transmitting.

Ted Nelson's vision of non-linear writing closely resembles the concept of the book-as-network-node, though Nelson ultimately became devoted to building a system to enable his vision and focused less on the creation of content. The distinguishing aspects of the processed book are that (a) it is about content; (b) it outlines how the creation of content changes in a digital environment; (c) it implies a certain business dimension (who will build these tools and why); and, (d) it points to the increasing alienation of an author from his or her work as the act of processing serves to separate the wellsprings of creativity from all the acts of summarizing, indexing, and abstracting that automation is heir to. Ultimately Kay and Nelson are humanists, but the processed book is a post-modern development.

Among the many competing visions for electronic publishing today, one (mostly favored by established media companies) wants electronic publishing to look very much like hardcopy publishing, but without the expense of managing physical inventory. To which I say: Nice work if you can get it. This vision usually concerns itself with such things as copyright protection and is inclined to support electronic publishing initiatives where properties are kept distinct. One outcome of this vision is a generation of ebooks — in this case, hardware devices — that are dedicated book readers. The word "dedicated" is important: An ebook that is only an ebook is fundamentally different from a digital cell phone or personal computer, which have multiple applications. A dedicated ebook is a separate device. Most importantly, it is not designed as a computer peripheral because to do so would mean that the content would be copied in the process of moving from the computer to the peripheral, and if it can be copied in that way, it can be copied in many others.

This vision is bound to fail, not because copyright is dead but because all books published in this manner will have to compete with books that draw on the resources of the processed book. A book that stands by itself literally stands by itself. It competes with an army of networked information. This is not to say that some individual books are not better in some important way than a book that is a network node, but to make the obvious marketing point: The real challenge for creative people is to get others to pay attention to their work. This is why publishers exist and why they will continue to exist. Without the support of a network, most books will get lost amidst the huge outpouring of new material.

So we shouldn't expect to see dedicated ebook readers. Instead, we will have reading devices that connect to other computing devices: The wireless phone with a bigger and better display, for example. In the technology world, these are called convergence devices. In time we should expect that we will all carry one — and only one, serving multiple uses.

This means tradeoffs. While a dedicated ebook reader would naturally be optimized for the reading experience, convergence devices will please no one entirely. The importance of this is that it will slow down the acceptance of digital readers as hardcopy simply continues to do a better job for some functions. Over time, even as we see the hardcopy world shrink, certain areas will remain mostly in ink and paper, literature in particular. The processed book will invade professional information first, college texts second, and then begin to nibble at the edges of consumer or trade publishing. The inroads of the processed book will be gradual enough that many people will not notice it happening, even as they now happily and innocently purchase DVDs of movies that include all sorts of "non-primal" elements such as previously deleted scenes and interviews with directors and actors. The processed book will inevitably take its place on the virtual bookshelf, where it will be read in front of the fireplace, while the genetically engineered dog snoozes on its pillow.

About the author

Joseph J. Esposito is President of Portable CEO, a Bay Area independent consultancy specializing in digital media. Mr. Esposito has extensive experience in the publishing and software industries, and has served as CEO three times: at Encyclopaedia Britannica, where he led the company to create the first online encyclopedia; at Tribal Voice, an Internet Communications company; and at SRI Consulting. In all three instances he developed and consummated exit strategies for the shareholders. Currently his clients include both commercial and not–for–profit organizations. He is actively engaged in researching business models for a post–copyright age. He can be reached at espositoj@gmail.com.

Note

This material is copyright © 2003, 2005 by Joseph J. Esposito. It may be used for non–commercial purposes provided the copies bear the full copyright notice and the author’s e–mail address: espositoj@gmail.com

Editorial history

Paper received 30 December 2002; revised 9 February 2003; accepted 26 February 2003; latest additions 23 October 2005.

Copyright ©2003, 2005 Joseph J. Esposito

The processed book by Joseph J. Esposito
First Monday, volume 8, number 3 (March 2003),
URL: http://firstmonday.org/issues/issue8_3/esposito/index.html