Four sets of Web pages previously visited in the summer of 2000 were revisited one year later. Of 707 pages containing metatag descriptions in 2000, 586 retained descriptions in 2001, and, of 1,230 pages lacking descriptions in 2000, 101 had descriptions in 2001. Home pages appeared to both lose and change descriptions more than other pages, with about 19% of descriptions changed in the two sets where home pages predominated versus about 12% in the other two sets. About two-thirds of changes involved minor revisions, and changes fell into a wide variety of categories. Some implications for software to assist in description revision are discussed.
The research reported in this paper originates in the author's ongoing work on developing a computerized abstractor's assistant (Craven, 1988, 1991, 1993, 1996, 1998). In addition to a simple word processor and other general writer's tools, the assistant integrates tools, such as an automatic extractor, related specifically to the task of summarizing. Apart from the author's own work, Paice (1994) has given a list of desirable features for such a package. A hybrid system, in which some tasks are performed by human abstractors and others by software, appears to be an appropriate short-term goal, since purely automatic abstracting methods (Endres-Niggemeyer, 1998; Paice, 1990, 1994; Pinto and Galvez, 1999) do not show immediate promise of totally superseding human effort.
With a view specifically to applying the computerized assistant to summarizing Web pages, this development work led to an investigation (Craven, 2000) into how people and organizations in fact summarize their own Web pages, specifically in metatags on the pages themselves. An underlying assumption of that investigation was that the author would create descriptions that would reflect features that authors and other users might consider desirable.
Other aspects of the content of Web pages have been studied by various researchers; for example, page layout of home pages (King, 1998); characteristics of anchors (Haas and Grams, 2000); informetric measures (Almind and Ingwersen, 1997); links to e-journals and their articles (Harter and Ford, 2000). Little investigation had been done into descriptions in metatags. Turner and Brackbill (1998) have, however, reported results of a small experiment that showed that addition of a description did not improve retrievability of Web pages on Infoseek and Altavista, and similar results have been reported for these two search engines and five others by Henshaw and Valauskas (2001).
Another article by the author (Craven, 2001) has reviewed advice given in both printed and Web-based sources on the function, content, structure, and style of metatag descriptions.
Unlike scholarly articles and other traditional published documents, Web pages are frequently dynamic, subject to regular, or irregular, updating. Thus, authors of Web pages may benefit from assistance, not only with the initial creation of metatag descriptions, but also with the revision of these descriptions as pages evolve and are revised over time.
One question regarding such revision is how often it is in fact required. A Web page may be revised frequently, and yet its overall description may remain entirely valid and in no need of further attention. A possible indication of the frequency with which Web page descriptions should be revised is the frequency with which they are in fact revised. We might conjecture that the latter figure would form a lower bound for the former, on the assumption that desirable revisions are often not carried out but that revisions that are carried out are generally needed.
A related question is whether revisions are more commonly needed for some kinds of pages than others. If certain kinds of pages need more frequent description updates, these should be flagged for special attention by description authors.
A further question regarding description revision is whether there are particular types or patterns of revision that are common and for which it might be worthwhile developing special assistance tools. For instance, is more information simply appended to the end of a description frequently enough to warrant the design of a method of simplifying this procedure? Are changes to names of associated companies or organizations often the reason for a revision, or is content more important? It may also be more helpful to concentrate any revision assistance tool development efforts on types of pages that need more frequent revision.
Four sets of URLs for pages had been identified in the summer of 2000 through the Yahoo! random page service. The four sets consisted of two derived directly from the random page service (level 1: sets 1a and 1b), one derived indirectly by following a random link from a page returned by the service (level 2: set 2), and one derived by following a chain of two random links from a page returned by the service (level 3: set 3). Level-1 pages tended to be home pages to a significantly greater degree than pages at levels 2 and 3.
Each set was divided into two categories: those having metatag descriptions in the summer of 2000 and those not having metatag descriptions at that time. The pages were then revisited a year later, to determine the types of changes that might have taken place: what proportion had lost descriptions, what proportion had gained descriptions, and what changes had been made to descriptions. When a requested page was returned, specially designed software logged data that included the metatag description and the URL.
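The logging software is not described further here. As a minimal sketch of the kind of record it produced, the following Python uses the standard-library HTML parser to pull the title and the metatag description from a returned page. The class and function names, and the choice to keep only the first description found, are assumptions made for illustration and are not details of the original software.

```python
from html.parser import HTMLParser


class DescriptionExtractor(HTMLParser):
    """Collects the <title> text and the content of <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None  # None means no description metatag was found
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = {name: (value or "") for name, value in attrs}
            # The NAME value is compared case-insensitively (DESCRIPTION, Description, ...).
            is_description = attr.get("name", "").strip().lower() == "description"
            if is_description and self.description is None:
                self.description = attr.get("content", "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def log_entry(url, html):
    """Return one log record (URL, title, description) for a fetched page."""
    parser = DescriptionExtractor()
    parser.feed(html)
    return {"url": url, "title": parser.title.strip(), "description": parser.description}
```

A record whose `description` is `None` would count as a page lacking a description; the title is kept because it is used later to screen out error pages.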
The following statistics were computed for each of the eight sets (the four original sets divided by presence or absence of metatag descriptions in 2000):
- number of URLs in the original list;
- number of items in the log file;
- number of items in the log file as a percentage of the number of URLs in the original list;
- number of items in the log file to be disregarded on account of titles indicating errors, etc., broken down as follows:
  - a. number with empty titles;
  - b. number with titles containing various "not-found" types of message;
  - c. number with titles indicating a moved or redirected page;
  - d. number with titles containing various "non-compliant-browser" messages;
  - e. number with largely or entirely unreadable titles;
- number of items remaining after disregarding the above;
- number of these containing descriptions;
- proportion containing descriptions out of those not disregarded.
The "not-found" category included a large variety of specific texts that might occur in the title: "401.2", "404", "access denied", "bad page", "bad request", "cannot be found", "cannot find", "cannot be displayed", "CGI Script Error", "couldn't find", "directory listing denied", "error", "does not exist", "doesn't exist", "existiert nicht mehr", "index of", "invalid URL", "lost", "most likely you have typed the wrong URL", "no longer available", "no site selected", "not available", "not found", "out of date", "server notice", "server report", "service unavailable", "site unknown", "we apologize for the inconvenience", "we are currently doing maintenance on our servers", "wrong page", "yet to be created". Moved or redirected pages were marked with titles containing the texts "moved", "redirect", "redirecting", or "site jump". Noncompliant browser messages included the following texts: "browser error", "browser not compliant", "HTTP version not supported", "requires internet explorer 4", "update your browser".
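The title screening described above can be sketched as a simple substring test. The marker lists below are copied from the text; the function name, the case-insensitive matching, and the order in which the categories are checked are assumptions (browser messages are tested before "not-found" messages only because "browser error" would otherwise match the generic marker "error"). Unreadable titles were presumably judged by eye and are not handled here.

```python
NOT_FOUND = [
    "401.2", "404", "access denied", "bad page", "bad request", "cannot be found",
    "cannot find", "cannot be displayed", "cgi script error", "couldn't find",
    "directory listing denied", "error", "does not exist", "doesn't exist",
    "existiert nicht mehr", "index of", "invalid url", "lost",
    "most likely you have typed the wrong url", "no longer available",
    "no site selected", "not available", "not found", "out of date",
    "server notice", "server report", "service unavailable", "site unknown",
    "we apologize for the inconvenience",
    "we are currently doing maintenance on our servers", "wrong page",
    "yet to be created",
]
MOVED = ["moved", "redirect", "redirecting", "site jump"]
BROWSER = [
    "browser error", "browser not compliant", "http version not supported",
    "requires internet explorer 4", "update your browser",
]


def classify_title(title):
    """Return 'empty', 'moved', 'browser', 'not-found', or 'ok' for a page title."""
    text = title.strip().lower()
    if not text:
        return "empty"
    # Assumed precedence: more specific categories are checked first.
    for category, markers in (("moved", MOVED), ("browser", BROWSER), ("not-found", NOT_FOUND)):
        if any(marker in text for marker in markers):
            return category
    return "ok"
```

Short markers such as "lost" or "error" can also occur in the titles of perfectly valid pages, so a screen of this kind may disregard some pages unnecessarily.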
In cases where pages containing descriptions could be found in 2001 with the same URLs as pages that had contained descriptions in 2000, the description pairs were compared. The number and the percentage with changed descriptions were determined for each set. Any specific changes were noted for each pair, and changes were also more broadly classified as follows, where applicable:
- word-for-word reduplication of the description;
- appending to an otherwise unchanged description;
- dropping a proper name;
- adding a proper name;
- change in reference (different company, etc.).
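Of the categories listed, the first two are mechanical enough to be detected automatically, whereas changes involving proper names or references call for human judgment. The following is only a sketch of how the first two might be recognized; the function names and the decision to ignore capitalization and punctuation when comparing are assumptions, not the procedure used in the study.

```python
import re


def tokens(text):
    """Lower-cased word tokens, so that case and punctuation differences are ignored."""
    return re.findall(r"\w+", text.lower())


def classify_change(old, new):
    """Return 'unchanged', 'reduplication', 'appending', or 'other' for a description pair."""
    a, b = tokens(old), tokens(new)
    if a == b:
        return "unchanged"
    if a and len(b) > len(a):
        repeats, remainder = divmod(len(b), len(a))
        # The new description is the old one repeated word for word.
        if remainder == 0 and b == a * repeats:
            return "reduplication"
        # The old description survives intact at the start of the new one.
        if b[:len(a)] == a:
            return "appending"
    return "other"
```

Pairs labelled "other" would still need to be examined individually, for example to decide whether a proper name was added or dropped.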
Table 1 shows the statistical summary for the four sets that contained descriptions in 2000.
Table 1: Analysis of Four Sets Containing Descriptions from Yahoo! Random Page Service in 2000
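| | Set 1a | Set 1b | Set 2 | Set 3 |
|---|---|---|---|---|
| number logged in 2000 | 319 | 406 | 213 | 227 |
| number logged in 2001 | 264 | 329 | 163 | 170 |
| percentage of 2000 items logged in 2001 | 82.8% | 81.0% | 76.3% | 74.9% |
| number disregarded | 58 | 82 | 32 | 47 |
| empty titles | 23 | 42 | 5 | 16 |
| errors | 31 | 37 | 27 | 30 |
| moved/redirect | 4 | 2 | 0 | 1 |
| unreadable | 0 | 1 | 0 | 0 |
| remaining | 206 | 247 | 131 | 123 |
| containing descriptions | 163 | 196 | 116 | 111 |
| percentage remaining containing descriptions | 79.1% | 79.4% | 88.6% | 90.2% |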
It may be noted that the percentages still containing descriptions after one year are higher for sets 2 and 3 than for sets 1a and 1b. The previous research had shown that sets 1a and 1b appeared to consist much more of home pages than did sets 2 and 3; thus, it would appear that home pages may be somewhat less likely to retain descriptions than are other pages on a Web site.
Table 2 shows the results for the pages that did not contain descriptions in 2000.
Table 2: Analysis of Four Sets Not Containing Descriptions from Yahoo! Random Page Service in 2000
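| | Set 1a | Set 1b | Set 2 | Set 3 |
|---|---|---|---|---|
| number logged in 2000 | 513 | 633 | 607 | 600 |
| number logged in 2001 | 420 | 510 | 475 | 482 |
| percentage of 2000 items logged in 2001 | 81.9% | 80.6% | 78.3% | 80.3% |
| number disregarded | 139 | 177 | 163 | 178 |
| empty titles | 79 | 79 | 69 | 50 |
| errors | 55 | 85 | 92 | 122 |
| moved/redirect | 4 | 10 | 1 | 6 |
| unreadable | 1 | 3 | 0 | 0 |
| remaining | 281 | 333 | 312 | 304 |
| containing descriptions | 22 | 41 | 15 | 23 |
| percentage remaining containing descriptions | 7.8% | 12.3% | 4.8% | 7.6% |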
In the level-1 sets, the loss of descriptions from pages that previously had descriptions is not made up by a gain in descriptions on pages that previously lacked them, with net changes of -21 (=163-206+22) and -10 (=196-247+41) respectively. In the other sets, by contrast, losses are matched or more than made up by gains, with net changes of 0 (=116-131+15) and +11 (=111-123+23) respectively. Overall, the proportion of pages containing descriptions appears to remain fairly steady.
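The arithmetic behind these net changes, and behind the overall proportion, can be reproduced from the "remaining" and "containing descriptions" rows of Tables 1 and 2. The sketch below does so; the variable and function names are illustrative only.

```python
# For each set: (pages remaining that had descriptions in 2000,
#                of these, pages still containing descriptions in 2001,
#                pages remaining that lacked descriptions in 2000,
#                of these, pages containing descriptions in 2001)
SETS = {
    "1a": (206, 163, 281, 22),
    "1b": (247, 196, 333, 41),
    "2": (131, 116, 312, 15),
    "3": (123, 111, 304, 23),
}


def net_change(had, kept, lacked, gained):
    """Gains minus losses in the number of pages carrying a description."""
    lost = had - kept
    return gained - lost


net = {name: net_change(*row) for name, row in SETS.items()}
total = sum(net.values())

# Share of all remaining pages that carried a description in each year.
pages = sum(had + lacked for had, _, lacked, _ in SETS.values())
share_2000 = sum(had for had, _, _, _ in SETS.values()) / pages
share_2001 = sum(kept + gained for _, kept, _, gained in SETS.values()) / pages

print(net, total)
print(f"{share_2000:.1%} in 2000, {share_2001:.1%} in 2001")
```

Taken over all four sets, the share of remaining pages with descriptions changes by only about one percentage point, which is consistent with the steady state described above.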
Table 3 shows the results for 2000-2001 description pairs for pages with the same URLs.
Table 3: Analysis of 2000-2001 Description Pairs for Pages with the Same URLs from Yahoo! Random Page Service
| | Set 1a | Set 1b | Set 2 | Set 3 |
|---|---|---|---|---|
| number of pairs | 162 | 194 | 116 | 110 |
| number with change in description | 32 | 37 | 13 | 14 |
| percentage with change in description | 19.8% | 19.1% | 11.2% | 12.8% |
| word-for-word reduplication | 1 | 1 | 0 | 0 |
| appending | 3 | 3 | 0 | 0 |
| dropping proper name | 8 | 3 | 1 | 2 |
| adding proper name | 6 | 3 | 2 | 3 |
| change in reference | 2 | 11 | 1 | 3 |
The higher proportion of changed descriptions at level 1 when compared to the other two levels is statistically significant according to a chi-square test (p=0.0185).
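The test can be checked against the counts in Table 3 by pooling sets 1a and 1b (level 1) and sets 2 and 3 (the other levels) into a two-by-two table of changed versus unchanged pairs. The sketch below assumes a Pearson chi-square test without continuity correction, which the text does not specify; under that assumption it gives a p-value very close to the one reported. The function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Table 3: level 1 = sets 1a + 1b; other levels = sets 2 + 3.
changed_l1, pairs_l1 = 32 + 37, 162 + 194
changed_other, pairs_other = 13 + 14, 116 + 110

chi2, p = chi_square_2x2(
    changed_l1, pairs_l1 - changed_l1,
    changed_other, pairs_other - changed_other,
)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

Applying a continuity correction would give a somewhat larger p-value, so the agreement with the reported figure suggests, but does not prove, that none was used.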
In addition to the categories of change enumerated above, various others were observed. About 29 pairs showed a major modification to the description as a whole, but with the site affiliation remaining the same (though a company's name might have changed); 3 others involved changes in the overall description of services. A few changes were relatively minor: four involved only removal of something from the end of the description; two, the addition of a single word ("wooden", "brown").
It would appear that home pages may be somewhat less likely to retain descriptions than are other pages on a Web site. A likely cause is the frequent redesign of home pages, possibly using various page creation software; because the descriptions are invisible when previewing a page, they are easily overlooked.
When home pages do retain descriptions, these are more likely to change. This makes sense in terms of search engines, since the home page descriptions can be viewed as more important advertisements for the sites than descriptions of other pages; site administrators might therefore tend to pay more attention to updating.
Overall, there is no indication from the present research of either a net decline or a net increase in the use of metatag descriptions, at least over the time period covered. Various possible developments might cause a disturbance in this apparent steady state: changes in search engine policies; addition of metatag display to browsing software (Beagle, 1999); the advent of page editing software that makes metatags more prominent or assists in their creation; inclusion of the metatag description as a required element in HTML, the omission of which would be flagged by validation services (we may compare here the requirement of the ALT attribute for images in HTML 4); supplanting of present metatag descriptions by another kind of metadata, such as Dublin Core (Dublin Core Metadata Initiative, 2000) or by external descriptions generated by commercial indexing services (Thomas and Griffin, 1998).
In terms of assisting authors in updating descriptions, about one third of the changes in descriptions observed involved major rewriting, but about two thirds involved lesser modifications. Software should thus certainly make it easy for authors to access an existing description and make small modifications to it. The types of minor changes made varied considerably, and no common pattern emerged, suggesting that support for modification should be of a fairly general kind; for example, simply appending to the old description is rare, and a feature to facilitate this function would likely receive little or no use. Another observation supporting the idea of keeping revision functions fairly general is the likelihood that individuals use quite different approaches in revising, just as they have been reported to do in writing abstracts of scholarly articles (Endres-Niggemeyer, Waumans, and Yamashita, 1991).
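In this study the distinction between major rewriting and lesser modification was a matter of judgment. A revision tool might nonetheless offer a rough automatic indication of how far a new description departs from the old one. The sketch below uses word-level matching from Python's standard difflib module; the function names and the threshold of 0.5 are arbitrary assumptions and do not reproduce the classification used here.

```python
from difflib import SequenceMatcher


def word_similarity(old, new):
    """Similarity of two descriptions by matching words, from 0.0 (none) to 1.0 (identical)."""
    a, b = old.lower().split(), new.lower().split()
    return SequenceMatcher(None, a, b).ratio()


def revision_size(old, new, threshold=0.5):
    """Label a revision 'minor' or 'major'; the threshold is illustrative only."""
    return "minor" if word_similarity(old, new) >= threshold else "major"
```

A general measure of this kind fits the suggestion that support for modification should not be tied to any one pattern of change.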
The assumption was mentioned earlier that desirable revisions are often not carried out but that revisions that are carried out are generally needed. Future research might address the extent to which this assumption is in fact true. For example, if an older and a newer description are equally applicable to the same version of the corresponding Web page, was the change actually needed?
The converse question is whether needed revisions are not being performed. Over a longer period of time, one might attempt to determine the extent to which descriptions become stale and no longer valid. A related question is how good the descriptions are in the first place. Previous investigations by the author have shown that they are not particularly consistent in content or form. Inconsistencies and other defects have also been demonstrated in published author abstracts (Pitkin, Branagan, and Burmeister, 1999).
This study examined internal self-descriptions, contained within the Web pages themselves. The author has recently begun to analyze external descriptions, on Web pages that link to the Web pages described. Some earlier work in this area was also done by Wheatley and Armstrong (1997). Looking at patterns of changes in external descriptions could also be of interest. This might be especially true of large collections of descriptions by single authors, which could serve as case study material to see whether assistance tools could be customized to suit the individual styles of composition and revision. Patterns of change may also be important in descriptions by multiple authors which are nevertheless intended to aim at a certain consistency.
About the Author
Timothy C. Craven is a Professor in the Faculty of Information and Media Studies, The University of Western Ontario. He has published some 50 articles since 1976 in the areas of computer-assisted indexing, abstracting, and thesaurus construction and is the author of String Indexing (New York: Academic Press, 1986). He currently teaches courses in the Graduate Library and Information Science program in research methods, Internet information services, database management systems, software evaluation, and subject analysis and thesaurus construction.
Research reported in this article was supported in part by the University of Western Ontario Office of Research Services with funds provided by the Natural Sciences and Engineering Research Council of Canada.
The extensive assistance of research assistant Emmett Macfarlane in data gathering is also acknowledged.
T.C. Almind and P. Ingwersen, 1997. "Informetric analyses on the World Wide Web: Methodological approaches to 'Webmetrics'," Journal of Documentation, volume 53, number 4, pp. 404-426.
D. Beagle, 1999. "Visualization of metadata," Information Technology and Libraries, volume 18, number 4, pp. 192-199.
T.C. Craven, 2001. "'DESCRIPTION' META tags in locally linked Web pages," Aslib Proceedings, volume 53, number 6, pp. 203-216.
T.C. Craven, 2000. "Features of DESCRIPTION META tags in public home pages," Journal of Information Science, volume 26, number 5, pp. 303-311.
T.C. Craven, 1998. "Human creation of abstracts with selected computer-assistance tools," Information Research, volume 3, number 4, paper 47, at http://www.shef.ac.uk/~is/publications/infres/paper47.html.
T.C. Craven, 1996. "An Experiment in the use of tools for computer-assisted abstracting," In: S. Hardin (editor). ASIS '96: Proceedings of the 59th ASIS Annual Meeting 1996, Baltimore, Maryland, October 21-24, 1996, volume 33. Medford, N.J.: Information Today. pp. 203-208.
T.C. Craven, 1993. "A Computer-aided abstracting tool kit," Canadian Journal of Information Science, volume 18, number 2, pp. 19-31.
T.C. Craven, 1991. "Algorithms for graphic display of sentence dependency structures," Information Processing and Management, volume 27, number 6, pp. 603-613.
T.C. Craven, 1988. "Text network display editing with special reference to the production of customized abstracts," Canadian Journal of Information Science, volume 13, numbers 1/2, pp. 59-68.
Dublin Core Metadata Initiative, 2000. Dublin Core Element Set, Version 1.1, at http://purl.oclc.org/dc/documents/rec-dces-19990702.htm, accessed 24 April 2000.
B. Endres-Niggemeyer, 1998. Summarizing information. Berlin: Springer-Verlag.
B. Endres-Niggemeyer, W. Waumans, and H. Yamashita, 1991. "Modelling summary writing by introspection: A Small-scale demonstrative study," Text, volume 11, number 4, pp. 523-552.
S.W. Haas and E.S. Grams, 2000. "Readers, authors, and page structure: A Discussion of four questions arising from a content analysis of Web pages," Journal of the American Society for Information Science, volume 51, number 2, pp. 181-192.
S.P. Harter and C.E. Ford, 2000. "Web-based analyses of e-journal impact: Approaches, problems, and issues," Journal of the American Society for Information Science, volume 51, number 13, pp. 1159-1176.
R. Henshaw and E.J. Valauskas, 2001. "Metadata as a catalyst: Experiments with metadata and search engines in the Internet journal, First Monday," Libri, volume 51, number 2, pp. 86-101.
D.L. King, 1998. "Library home page design: A Comparison of page layout for front-ends to ARL library Web sites," College and Research Libraries, volume 59, number 5, pp. 458-465.
C.D. Paice, 1990. "Constructing literature abstracts by computer: Techniques and prospects," Information Processing and Management, volume 26, number 1, pp. 171-186.
C.D. Paice, 1994. "Automatic abstracting," In: A. Kent and C.M. Hall (editors). Encyclopedia of Library and Information Science, volume 53 (supplement 16). New York: Dekker, pp. 16-27.
M. Pinto and C. Galvez, 1999. "Paradigms for abstracting systems," Journal of Information Science, volume 25, number 5, pp. 365-380.
R.M. Pitkin, M.A. Branagan, and L.F. Burmeister, 1999. "Accuracy of data in abstracts of published research articles," Journal of the American Medical Association, volume 281, number 12, pp. 1110-1111.
C.F. Thomas and L.S. Griffin, 1998. "Who will create the metadata for the Internet?" First Monday, volume 3, number 12, at http://firstmonday.org/issues/issue3_12/thomas/.
T.P. Turner and L. Brackbill, 1998. "Rising to the top: Evaluating the use of the HTML meta tag to improve retrieval of World Wide Web documents through Internet search engines," Library Resources and Technical Services, volume 42, number 4, pp. 258-271.
A. Wheatley and C.J. Armstrong, 1997. "Metadata, recall, and abstracts: Can abstracts ever be reliable indicators of document value?" Aslib Proceedings, volume 49, number 8, pp. 206-213.
Appendix: Examples of Description Changes
Capitalization and punctuation have been standardized to facilitate reading. Changed passages are marked in italics.
2000 version 2001 version URL
The Job Resource, the tool of choice for online college recruiting, where college students and recruiters come together. We offer an array of features not found on any other site. AfterCollege, formerly The Job Resource, the tool of choice for online college recruiting, where college students and recruiters come together. We offer an array of features not found on any other site. http://www.thejobresource.com
Marin County, California, real estate brokerage. A highly skilled team of real estate professionals assist you when you buy sell or lease through our organization. From the simplest short term rental to the most complicated multi property tax deferred exchange transactions, you will receive the same high level of care and attention. http://www.pegcopple.com
Yale Systems, Inc.: applied intelligence solutions for industry. Yale Systems .com: applied intelligence for the 21st century. http://www.yale-systems.com
Total Sports. We change the way you will experience the game. Complete coverage of all sports, updated continuously, plus cybercasts, personality sites, and official school sites. FansOnly is your ticket to college sports. News, stats, scores, recruiting information, and more. http://www.uhcougars.com
PT Grand Kartech is formed in 1993; specializes in the fabrication of pressure vessel, manufacturing of steam, as well as hot water boiler and heat exchanger. PT Grand Kartech is formed in 1993; specializes in the technical service maintenance as well as fabrication of pressure vessel, manufacturing of steam, as well as hot water boiler, autoclave, and heat exchanger. http://www.e-steamboilers.com
Emphasis Technography provides medicolegal investigation of death and injury for individuals, insurance companies, law firms, medical examiner, coroner offices, and local governments, including independent autopsy service. Emphasis Technography provides medicolegal investigation of death and injury for individuals, insurance companies, law firms, local governments, including independent autopsy service. http://www.nwrain.com/~tardieu/
Caribbean Vacation Villas. Caribbean Vacation Villas: St. Martin/Maarten, Barbados, and St. Lucia. http://www.hotcarib.com
Web hosting. Web design at the best rates. Shopping carts. We'll help you improve your business. Web site development. Web hosting, with no contracts, at $6.95 per month. Custom Internet programming. http://www.access2001.com
Stunned artzine is an independent Irish online space dedicated to Irish and international contemporary art and culture. Stunned is an independent Irish online space dedicated to Irish and international contemporary art and culture. http://www.stunned.org
Kal Log, online source for advice on the construction of your log home or cabin. Ask Kal any questions about building or manufacturing a log home. Online source for advice on the construction of your log home or cabin. Speak to a consultant for any questions concerning log homes. http://www.kal-log.com
Paper received 11 September 2001; accepted 21 September 2001.
Copyright ©2001, First Monday
Changes in Metatag Descriptions Over Time by Timothy C. Craven
First Monday, volume 6, number 10 (October 2001),