First Monday

Building semantic bridges between museums, libraries and archives: The CIDOC Conceptual Reference Model

by Tony Gill

Abstract

The CIDOC Conceptual Reference Model is an object–oriented domain ontology for the interchange of rich and heterogeneous cultural heritage information from museums, libraries and archives. It is the evolutionary result of over two decades of collaborative international standards work by ICOM/CIDOC, the Comité International pour la Documentation of the International Council of Museums. This paper briefly explains the purpose, scope, structure and history of the CIDOC CRM, and outlines how it could be used as a building block in a global Semantic Web of culture.

Contents

Information diversity in museums, libraries and archives
The CIDOC Conceptual Reference Model
Applications
Building bridges with the CIDOC CRM
A global Semantic Web of culture

 


 

++++++++++

Information diversity in museums, libraries and archives

Around the world, there is both increasing demand and unprecedented funding available for high quality digital cultural content from trusted information providers such as museums, libraries and archives.

Unlike much of the information available on the Web, digital cultural content from so–called "memory institutions" (Dempsey, 1999) is typically authoritative, high quality, useful for education and research and has broad appeal to a wide range of audiences. As a result, there is a big impetus to make interoperable digital cultural content available via the Internet (Gill and Miller, 2002).

However, the curatorial information managed by museums, libraries and archives tends to be somewhat heterogeneous in structure and content, although there are significant conceptual overlaps; collections and the descriptions of their contents can vary according to factors such as:

Because of this diversity, no single descriptive schema has been devised that meets the needs of all museums, libraries [ 1] and archives. Instead, a plethora of localized domain and application–specific standards and practices has evolved for the documentation of collections.

This diversity of descriptive information can be compared to biodiversity in a natural ecosystem — "infodiversity" is the natural and appropriate evolutionary response of a flourishing, diverse and dynamic information ecosystem.

However, these differences in descriptive schema across museums, libraries and archives, although necessary for individual applications, can seriously hinder cross–domain discovery and interoperability of cultural information resources in the global context of the Internet.

Fortunately, the differences are primarily at the level of data structure and syntax. Significant conceptual overlaps exist between the descriptive schema used by memory institutions; elemental concepts such as objects, people, places, events, and the interrelationships between them are almost universal.

The traditional compromise for providing access across heterogeneous information sources is to map everything to a simple schema with broad and universal semantics for the purposes of initial resource discovery. These simple descriptions — sometimes called resource discovery metadata — help the researcher discover and evaluate resources, and can sometimes also direct them to the full, rich descriptions in the native format.

This approach to cross–domain resource discovery certainly has some merit, since it can potentially yield more precise search results than simple keyword searching. The Dublin Core Metadata Element Set [ 2] was originally intended solely for resource discovery — although it has been used (some would argue misused!) for many other purposes.

However, there are a number of drawbacks with this approach. First, because the source descriptions are "dumbed down" to the broad universal semantics of the resource discovery schema, access is reduced to the level of the lowest common denominator; it is the semantic equivalent of lossy compression, and may not provide adequate support for sophisticated queries or search precision across large datasets.

Secondly, it is very difficult (not to mention frustrating and tedious!) to map multiple rich data sources to a much simpler schema in a consistent manner, because of the various semantic compromises that must be made.

Thirdly, even if users successfully discover relevant resources using the simple resource discovery metadata, there is no guarantee that they will then access the richer and more useful source descriptions — if they are available at all. As a result, collections, and by extension their custodial institutions, are often represented and judged online by simple descriptions that may fall short of users’ expectations.

The CIDOC Conceptual Reference Model offers an alternative solution to the challenge of providing meaningful integrated access to heterogeneous cultural heritage information.

 

++++++++++

The CIDOC Conceptual Reference Model

The CIDOC Conceptual Reference Model (Crofts et al., 2003) is an object–oriented domain ontology for exchanging rich cultural heritage data. This means that it employs object–oriented data modeling techniques to formalize the semantic concepts used in museum, library and archive documentation, with the aim of facilitating information interchange.

As the name suggests, the CIDOC CRM is a reference model, not a data or metadata standard, although it is intended to be used both to express existing metadata standards, and as the basis for future metadata standards.

The CIDOC CRM is maintained by the CIDOC CRM Special Interest Group [ 3], a diverse international group of museum information professionals with an official mandate from ICOM/CIDOC (the Comité International pour la Documentation of the International Council of Museums) to develop and promote the standard.

Work on the CIDOC CRM formally began in 1996, although it is the evolutionary descendant of CIDOC data standards initiatives going back to 1980.

The primary function of the CIDOC CRM Special Interest Group (CIDOC CRM–SIG) over the last five years has been to test and refine the CIDOC CRM in preparation for publication by ISO, the International Organization for Standardization. The CIDOC CRM was accepted by ISO as "Committee Draft" ISO/CD 21127, and we anticipate publication as a full International Standard late in 2004.

Publication of the CIDOC CRM as an ISO standard is helpful because it demonstrates that the standard has undergone a rigorous peer review process and guarantees the stability of the model, both of which encourage active use.

Scope of the CIDOC CRM

A clear definition of an ontology’s scope is vital, both as a useful constraint during the development process and to help ensure that the ontology is used correctly.

The scope of the CIDOC CRM is defined by both the intended scope, expressed as a set of principles, and the practical scope, the set of relevant domain data structures.

The intended scope is defined as the integration and exchange of heterogeneous information required for the scientific documentation of cultural heritage collections.

Some of the terms in this definition require further clarification:

The intended scope also explicitly includes information interchange between museums, libraries and archives.

The practical scope of the CIDOC CRM is defined as the set of extant data sets, data structures and data structure standards used in the practice of museum, library and archive documentation.

Mappings to the CIDOC CRM have already been created for a number of relevant data structures and data standards, for example SPECTRUM, the U.K. Museum Documentation Standard, the Art Museum Image Consortium Data Dictionary, the Encoded Archival Description document type definition, the Dublin Core Metadata Element Set and the International Federation of Library Associations and Institutions’ Functional Requirements for Bibliographic Records. The mapping documents are available from the Technical Papers section of the CIDOC CRM SIG Web site [ 4].

Structure of the CIDOC CRM

The CIDOC CRM comprises a class hierarchy of 81 named classes, interlinked by 132 named properties. Because it follows object–oriented design principles, the classes in the hierarchy inherit properties from their parents, also known as superclasses.
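To make property inheritance concrete, here is a minimal Python sketch over a small, simplified excerpt of the class hierarchy. The class and property labels follow CIDOC CRM naming of this period, but the excerpt itself, the single-parent simplification and the helper names (`SUPERCLASS`, `DOMAIN`, `ancestors`, `applicable_properties`) are assumptions for illustration only; the official model definition is authoritative, and the real hierarchy allows multiple inheritance.

```python
# Simplified, illustrative excerpt of the CIDOC CRM class hierarchy.
# Each class maps to a single superclass here; the real model permits
# multiple inheritance, so treat this as a sketch only.
SUPERCLASS = {
    "E77 Persistent Item": "E1 CRM Entity",
    "E70 Stuff": "E77 Persistent Item",
    "E72 Legal Object": "E70 Stuff",
    "E18 Physical Stuff": "E72 Legal Object",
    "E19 Physical Object": "E18 Physical Stuff",
    "E22 Man-Made Object": "E19 Physical Object",
    "E2 Temporal Entity": "E1 CRM Entity",
}

# Domain (the class a property starts from) for a few properties.
DOMAIN = {
    "P1 is identified by": "E1 CRM Entity",
    "P43 has dimension": "E70 Stuff",
    "P4 has time-span": "E2 Temporal Entity",
}


def ancestors(cls):
    """Return cls followed by all of its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUPERCLASS:
        chain.append(SUPERCLASS[chain[-1]])
    return chain


def applicable_properties(cls):
    """Properties usable on instances of cls, including inherited ones."""
    lineage = set(ancestors(cls))
    return sorted(p for p, domain in DOMAIN.items() if domain in lineage)


print(applicable_properties("E22 Man-Made Object"))
# ['P1 is identified by', 'P43 has dimension']
```

Nothing in the excerpt declares P43 directly on E22 Man–Made Object; the object acquires it by inheritance from E70 Stuff, just as every class acquires P1 from E1 CRM Entity.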

Expressing the CIDOC CRM as an object–oriented semantic model yields a number of significant benefits:

Ontologically, the CIDOC CRM is empirical and descriptive; it formalizes the semantics necessary to express stated observations about the world in the domain of museum documentation — even if the observations are contradictory or nonsensical.

The following diagram (Figure 1) illustrates how the CIDOC CRM conceptualizes the domain of museum documentation at a very general level. Actors (i.e. people, either individually or in groups) participate in Temporal Entities (e.g. events), which are affected by Physical Entities (i.e. material things) and Conceptual Objects (i.e. ideas and concepts), and occur at Places within certain Time–Spans. Appellations (names) can be used to identify any of these entities, and Types can be used to classify them to the appropriate level of detail.


Figure 1: Generalized "meta–model" of the CIDOC CRM’s world view.

This diagram also clearly illustrates that the CIDOC CRM is event–centric; people, things, ideas, places and time–spans are all inter–related through common events.
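The event–centric approach can be sketched as a handful of (subject, property, object) statements in Python. The property labels are CIDOC CRM properties (P108 has produced, P14 carried out by, P7 took place at, P4 has time–span, P24 transferred title of, P22 transferred title to); the identifiers and the helper `related_via_events` are hypothetical, chosen only to show that people, things, places and time–spans are connected through shared events rather than directly.

```python
# Statements as (subject, property, object) triples. The property labels are
# CIDOC CRM property names; every identifier is a hypothetical example.
TRIPLES = [
    ("event:production-1", "P108 has produced", "obj:vase-1"),
    ("event:production-1", "P14 carried out by", "actor:potter-a"),
    ("event:production-1", "P7 took place at", "place:athens"),
    ("event:production-1", "P4 has time-span", "span:c-500-bce"),
    ("event:acquisition-1", "P24 transferred title of", "obj:vase-1"),
    ("event:acquisition-1", "P22 transferred title to", "actor:museum-b"),
]


def related_via_events(entity):
    """Return the entities that share at least one event with `entity`.

    Every statement in TRIPLES has an event as its subject, so the events
    involving `entity` are simply the subjects that point at it.
    """
    events = {s for s, _, o in TRIPLES if o == entity}
    return sorted({o for s, _, o in TRIPLES if s in events and o != entity})


print(related_via_events("obj:vase-1"))
# ['actor:museum-b', 'actor:potter-a', 'place:athens', 'span:c-500-bce']
```

Starting from the vase, both its maker and its later owner are reached, but only via the production and acquisition events that connect them.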

Because observations about the world are typically incomplete, the CIDOC CRM also includes a number of short–cuts — properties that allow direct relationships between people, things, ideas, and places to be expressed (short–cuts are not shown in Figure 1).

For example, museums commonly record the dimensions of the objects in their collections, but generally do not record information about the measurement event, such as when and where an object was measured, and by whom. Although the CIDOC CRM supports the modeling of measurement events by providing a specific E16 Measurement Event class, it also provides the direct short–cut P43 has dimension property between things and their dimensions. This can simplify mapping and result in a much more economical serialization of data instances when fully qualified information about the associated event is not available or required.
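The contrast between the fully qualified path and the short–cut can be shown in a few Python triples. P39 measured, P40 observed dimension, P43 has dimension, P90 has value and P91 has unit are CIDOC CRM properties; the identifiers and the helper `deduce_shortcuts` are hypothetical. The sketch also illustrates one consequence of the design: where the measurement event is recorded, the short–cut can be derived from it, so nothing is lost by choosing the economical form when the event is unknown.

```python
# Two ways to record that an object is 32 cm high. Property labels are
# CIDOC CRM names; all identifiers are hypothetical.

# Fully qualified path through a measurement event.
FULL = [
    ("event:measurement-1", "P39 measured", "obj:vase-1"),
    ("event:measurement-1", "P40 observed dimension", "dim:height-1"),
    ("event:measurement-1", "P14 carried out by", "actor:registrar-c"),
    ("dim:height-1", "P90 has value", "32"),
    ("dim:height-1", "P91 has unit", "unit:cm"),
]

# Short-cut: the same dimension attached directly to the object.
SHORTCUT = [
    ("obj:vase-1", "P43 has dimension", "dim:height-1"),
    ("dim:height-1", "P90 has value", "32"),
    ("dim:height-1", "P91 has unit", "unit:cm"),
]


def deduce_shortcuts(triples):
    """Derive P43 statements implied by P39/P40 pairs on the same event.

    Assumes each measurement event measures a single object.
    """
    measured = {s: o for s, p, o in triples if p == "P39 measured"}
    return [
        (measured[s], "P43 has dimension", o)
        for s, p, o in triples
        if p == "P40 observed dimension" and s in measured
    ]


print(deduce_shortcuts(FULL))
# [('obj:vase-1', 'P43 has dimension', 'dim:height-1')]
```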

The CIDOC CRM has been constructed according to contemporary knowledge representation principles, and it is therefore relatively straightforward to express both compliant data instances and the CIDOC CRM itself using any of the emerging Web standards for Semantic Web applications, such as Resource Description Framework (RDF) [ 6], Resource Description Framework Schema Language (RDFS) [ 7], DARPA Agent Markup Language + Ontology Inference Layer (DAML+OIL) [ 8] and Web Ontology Language (OWL) [ 9].
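As a minimal sketch of what this looks like, the following Python writes one schema–level statement (a subclass relationship, in RDFS) and one instance–level statement (a data instance typed by a CRM class, in RDF) as N–Triples text, using only the standard library. The rdf: and rdfs: URIs are the standard W3C ones; the namespace `http://example.org/crm#`, the underscore style of the class identifiers and the object URI are placeholders assumed for illustration, not an official RDFS encoding of the model.

```python
CRM = "http://example.org/crm#"  # hypothetical namespace, not an official one
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def ntriple(s, p, o):
    return f"{s} {p} {o} ."


lines = [
    # Schema level: part of the class hierarchy expressed in RDFS.
    ntriple(iri(CRM + "E22_Man-Made_Object"), iri(RDFS_SUBCLASS),
            iri(CRM + "E19_Physical_Object")),
    ntriple(iri(CRM + "E22_Man-Made_Object"), iri(RDFS_LABEL),
            literal("Man-Made Object")),
    # Instance level: a data instance typed by a CRM class.
    ntriple(iri("http://example.org/obj/vase-1"), iri(RDF_TYPE),
            iri(CRM + "E22_Man-Made_Object")),
]
print("\n".join(lines))
```

The same vocabulary describes both the model and the data, which is why a single RDF toolchain can carry both.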

 

++++++++++

Applications

"The primary role of the CRM is to serve as a basis for mediation of cultural heritage information and thereby provide the semantic 'glue' needed to transform today’s disparate, localised information sources into a coherent and valuable global resource." (Doerr and Crofts, 1999).

Conceptual reference

As the name suggests, one of the main roles of the CIDOC CRM is to promote a shared understanding of the concepts used in cultural heritage documentation by acting as a conceptual reference model.

This is particularly important in cultural technology development projects, where it is vital to disambiguate dialogue between domain experts and systems implementers to avoid misunderstandings that can lead to costly development mistakes.

Because the CIDOC CRM is expressed as a robust object–oriented model that includes comprehensive and carefully edited descriptive scope notes, cross–references and examples, it can provide the basis for mutual comprehension.

The CIDOC CRM can also be used to validate and enhance existing descriptive schema; mapping an existing cultural data standard or database schema to the CIDOC CRM will rapidly highlight any logical or structural shortcomings in the source schema, and suggest ontologically consistent remedies.

Information exchange

The CIDOC CRM is specifically designed to promote the meaningful exchange of heterogeneous digital cultural content from museums, libraries and archives.

Because it consists of a "superset" of the concepts used in the cultural heritage domain and is also coherently extensible, the CIDOC CRM is ideally suited as a universal semantic target to which any cultural source data can be mapped, with minimal or no loss of semantic precision.
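A minimal sketch of such a mapping, assuming a flat source record with Dublin Core–style title, creator and date fields that describes a physical object. The function `map_record` and all identifiers are hypothetical, and treating creator and date as the actor and time–span of the object's production is an assumption that a real mapping would have to confirm for each source. Values are kept as plain strings rather than instances of E35 Title or E39 Actor, purely for brevity. The point is that the date keeps its specific meaning (when the object was produced) instead of collapsing into an unqualified date.

```python
def map_record(record_id, record):
    """Map a flat title/creator/date record to CRM-style triples.

    Assumes the record describes a physical object whose creator and date
    refer to its production. Values stay as plain strings for brevity.
    """
    obj = f"obj:{record_id}"
    production = f"event:production-{record_id}"
    span = f"span:production-{record_id}"
    triples = [(production, "P108 has produced", obj)]
    if "title" in record:
        triples.append((obj, "P102 has title", record["title"]))
    if "creator" in record:
        triples.append((production, "P14 carried out by", record["creator"]))
    if "date" in record:
        triples.append((production, "P4 has time-span", span))
        triples.append((span, "P82 at some time within", record["date"]))
    return triples


source = {"title": "Red-figure vase", "creator": "Unknown potter", "date": "c. 500 BCE"}
for triple in map_record("vase-1", source):
    print(triple)
```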

The CIDOC CRM does not specify or require any particular encoding for compatible data instances, but because it is based on current practice in knowledge representation and ontology design, it is easy to serialize CIDOC CRM–compatible data instances using languages such as eXtensible Markup Language [ 10] or Resource Description Framework for information interchange purposes.
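Since no encoding is prescribed, the following Python sketch shows just one possible XML serialization of CRM–style statements, built with the standard library's `xml.etree.ElementTree`. The element and attribute names (`crmData`, `entity`, `property`, `id`, `name`) are invented for illustration and belong to no published schema; statements are grouped under their subject in the order first seen.

```python
import xml.etree.ElementTree as ET


def to_xml(triples):
    """Serialize (subject, property, object) triples as a simple XML tree.

    The element and attribute names are hypothetical; the CIDOC CRM does
    not prescribe any particular XML encoding.
    """
    root = ET.Element("crmData")
    entities = {}
    for subject, prop, obj in triples:
        if subject not in entities:
            entities[subject] = ET.SubElement(root, "entity", id=subject)
        statement = ET.SubElement(entities[subject], "property", name=prop)
        statement.text = obj
    return ET.tostring(root, encoding="unicode")


print(to_xml([("event:production-1", "P108 has produced", "obj:vase-1")]))
```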

System and schema design

The CIDOC CRM provides a valuable conceptual reference for information architecture when designing systems and schema for managing and delivering digital cultural content from museums, libraries and archives.

A number of systems and schema influenced by the CIDOC CRM have already been implemented, and more are in development (the following list is illustrative, rather than comprehensive):

Potential future applications

The CIDOC CRM has been designed to support the next generation of advanced digital cultural heritage applications, for example mediation systems, data warehouses and agents.

Mediation systems are likely to become increasingly important because, for a variety of economic, technical and organizational reasons, it is often not practical to physically aggregate cultural heritage data from multiple heterogeneous sources. Instead, mediation systems would facilitate federated searching by translating queries and results between the various distributed data sources and the canonical common semantics of the CIDOC CRM.
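The translation step of such a mediator can be sketched in Python as follows. Everything here is hypothetical: two in–memory sources with different local field names, a per–source mapping table, and a query key that stands, for brevity, for the full CRM path from an object through its production event (P108 has produced) to the actor who carried it out (P14 carried out by). A real mediator would also translate the query into each source's native query language rather than filtering records in memory.

```python
# Two hypothetical sources with different local field names.
SOURCES = {
    "museum_db": [
        {"obj_id": "M1", "maker": "Potter A"},
        {"obj_id": "M2", "maker": "Potter B"},
    ],
    "archive_db": [
        {"ref": "A7", "creator_name": "Potter A"},
    ],
}

# Per-source mapping from common (CRM-level) notions to local fields.
MAPPINGS = {
    "museum_db": {"id": "obj_id", "P14 carried out by": "maker"},
    "archive_db": {"id": "ref", "P14 carried out by": "creator_name"},
}


def federated_search(crm_property, value):
    """Translate one CRM-level query into each source's terms and merge results."""
    results = []
    for source, records in SOURCES.items():
        mapping = MAPPINGS[source]
        local_field = mapping.get(crm_property)
        if local_field is None:
            continue  # this source has no mapping for the requested notion
        for record in records:
            if record.get(local_field) == value:
                results.append((source, record[mapping["id"]]))
    return results


print(federated_search("P14 carried out by", "Potter A"))
# [('museum_db', 'M1'), ('archive_db', 'A7')]
```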

Cultural data warehouses, on the other hand, might take the opposite approach, by physically aggregating data from multiple sources periodically into a CIDOC CRM compatible data warehouse and "merging" instances of identical entities, to provide both a coherent consolidated knowledge base and a cross–referenced index to the original sources.
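A minimal Python sketch of the merging step, under one strong assumption: that records describing "the same" entity can be recognized by a shared key, here a hypothetical authority identifier. In practice, deciding when two descriptions refer to one entity is the hard part. The function name and data are illustrative only. All values are kept rather than overwritten, so differing or contradictory statements from different sources survive side by side, and a separate index records which source records contributed to each merged entity.

```python
from collections import defaultdict


def merge_into_warehouse(source_records):
    """Merge records from several sources that describe the same entity.

    source_records: iterable of (source, local_id, entity_key, facts), where
    entity_key is assumed to identify the same entity across sources and
    facts maps property names to values.
    Returns (consolidated, index): merged facts per entity, and a
    cross-reference index back to the original source records.
    """
    consolidated = defaultdict(dict)
    index = defaultdict(list)
    for source, local_id, entity_key, facts in source_records:
        for prop, value in facts.items():
            consolidated[entity_key].setdefault(prop, set()).add(value)
        index[entity_key].append((source, local_id))
    return dict(consolidated), dict(index)


records = [
    ("museum_db", "M1", "auth:potter-a", {"P1 is identified by": "Potter A"}),
    ("archive_db", "A7", "auth:potter-a", {"P1 is identified by": "A. Potter"}),
    ("library_db", "L3", "auth:scribe-b", {"P1 is identified by": "Scribe B"}),
]
consolidated, index = merge_into_warehouse(records)
print(index["auth:potter-a"])
# [('museum_db', 'M1'), ('archive_db', 'A7')]
```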

Software agents for digital cultural content are perhaps the most ambitious application envisaged for the CIDOC CRM. The term "software agent" is used in computer science to refer to pieces of autonomous or semi–autonomous, proactive and reactive computer software [ 11].

Intelligent agents for digital cultural content would be able to parse complex user queries into a CIDOC CRM compatible form, and would then search across a global cultural Semantic Web of CIDOC CRM compatible data sources and query mediators to locate and return the most relevant results to the user.

The CIDOC CRM is already being deployed in a number of advanced research and development applications. For example:

 

++++++++++

Building bridges with the CIDOC CRM

Although the CIDOC CRM originated in the museum community, it has been designed from the outset to promote rich information exchange between museums, libraries and archives.

In addition to the cross–disciplinary mapping activities described previously, members of the CIDOC CRM Special Interest Group are also actively engaged in ontology harmonization efforts with other communities.

A particularly significant effort is currently underway to harmonize the CIDOC CRM with a comparable model from the library community, the IFLA Functional Requirements for Bibliographic Records, or FRBR (IFLA, 1998). The FRBR is a conceptual framework for bibliographic description, expressed as an entity–relationship (E–R) model.

The DELOS Network of Excellence on Digital Libraries provides funding for the Ontology Harmonization Working Group [ 13] as part of the European Union’s Sixth Framework Programme for Research and Technological Development.

The mapping and harmonization effort has already revealed some interesting comparisons between the approaches to description taken by museums and libraries. For example, the primary objects of documentation for most museums are the unique individual objects in their collections, whereas libraries primarily catalogue the properties of classes of objects, such as editions of a book, rather than specific instances (Lebœuf, 2003).

However, there are exceptions in both communities; descriptive methods in library special collections are often more akin to museum practice, since they focus more on the unique properties of specific items, whereas the taxonomic approach to description so essential in natural history museums shares many similarities with library cataloguing.

It is becoming clear that there is much conceptual overlap between the descriptive approaches used by museums, libraries and archives, and significant resources are available for leveraging those similarities in order to make digital cultural content more accessible online.

 

++++++++++

A global Semantic Web of culture

The CIDOC CRM is a robust, stable and mature domain ontology for the interchange and mediation of richly detailed cultural information from museums, libraries and archives.

It is the product of many years of collaborative development by a diverse international group of experts, is promoted by an active and inclusive user community, and is shortly to be published by the International Organization for Standardization as an ISO standard.

It has been developed and tested according to current best practices in knowledge representation and ontology construction, and is compatible with the technologies and methodologies of the emerging Semantic Web (Berners–Lee et al., 2001).

The CIDOC CRM has already been proven to have utility in a number of areas, and the Special Interest Group is confident that it will continue to have significant value for a wide range of digital cultural heritage applications for many years to come.

The real prize, however, is for the CIDOC CRM to become one of the foundation stones in a global Semantic Web of culture. So far, the prospects look good.

 

About the Author

Tony Gill is the Director of Metadata & Cataloguing for ARTstor Inc. [ 14]. He is also an editor of the CIDOC Conceptual Reference Model, and represents the United States on ISO TC46 SC4 WG9, the working group that is guiding the CRM through the ISO standardization process.
E–mail: tg@artstor.org

 

Notes

1. Across the memory institution sector, the library community has undoubtedly made the most significant progress towards the widespread adoption of shared descriptive practices, due primarily to the compelling economics of the co–operative cataloguing model: the MARC family of standards can trace its ancestry back 35 years. However, even with a strong economic incentive for sharing information, there are still myriad versions and flavours of MARC.

2. Dublin Core Metadata Element Set, Version 1.1: Reference Description, at http://www.dublincore.org/documents/dces/, accessed 13 April 2004.

3. CIDOC CRM–SIG official Web site at http://cidoc.ics.forth.gr/, accessed 10 April 2004.

4. Mappings to the CIDOC CRM can be accessed from a list of Technical Papers on the CIDOC CRM SIG Web site, at http://cidoc.ics.forth.gr/technical_papers.html, accessed 11 April 2004.

5. In fact, CIDOC produced a relational data model to support the CIDOC Information Categories in the mid–1990s. Although highly valuable as an intellectual exercise, it was too complex to be implemented as a working system, which was in large part the motivation for beginning work on the CIDOC CRM in 1996. Information about the CIDOC Relational Data Model is available at http://www.willpowerinfo.myby.co.uk/cidoc/model/relational.model/, accessed 13 April 2004.

6. Resource Description Framework (RDF), at http://www.w3.org/RDF/, accessed 13 April 2004.

7. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation 10 February 2004, at http://www.w3.org/TR/rdf-schema/, accessed 13 April 2004.

8. DAML+OIL (March 2001), at http://www.daml.org/2001/03/daml+oil-index.html, accessed 13 April 2004.

9. OWL Web Ontology Language Overview, W3C Recommendation 10 February 2004, at http://www.w3.org/TR/owl-features/, accessed 13 April 2004.

10. Extensible Markup Language (XML), at http://www.w3c.org/XML/, accessed 13 April 2004.

11. This definition of software agents is paraphrased from the definition in Wikipedia, at http://en.wikipedia.org/wiki/Software_agent, accessed 13 April 2004.

12. SIMILE: Semantic Interoperability of Metadata and Information in unLike Environments, at http://web.mit.edu/simile/www/, accessed 15 April 2004.

13. DELOS Network of Excellence on Digital Libraries: Ontology Harmonization Working Group, at http://delos-noe.iei.pi.cnr.it/activities/standardizationforum/ontology/ontology.html, accessed 15 April 2004.

14. ARTstor Inc. at http://www.artstor.org/, accessed 15 April 2004.

 

References

Tim Berners–Lee, James Hendler and Ora Lassila, 2001. "The Semantic Web," Scientific American (May), at http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21, accessed 13 April 2004.

Nicholas Crofts, Martin Doerr, Tony Gill, Stephen Stead and Matthew Stiff, 2003. "Definition of the CIDOC Conceptual Reference Model," at http://cidoc.ics.forth.gr/definition_cidoc.html, accessed 10 April 2004.

Lorcan Dempsey, 1999. "Scientific, Industrial, and Cultural Heritage: a shared approach. A research framework for digital libraries, museums and archives," Ariadne, issue 22 (December), at http://www.ariadne.ac.uk/issue22/dempsey/, accessed 10 April 2004.

Martin Doerr and Nicholas Crofts, 1999. "Electronic Esperanto: The Role of the Object Oriented CIDOC Reference Model," In: David Bearman and Jennifer Trant (editors). Cultural Heritage Informatics 1999: Selected papers from ichim99, pp. 157–173, and at http://cidoc.ics.forth.gr/docs/doerr_crofts_ichim99_new.pdf, accessed 10 April 2004.

Tony Gill, 2002. "Touring the Information Landscape: Designing the Data Model for RLG Cultural Materials," RLG Focus, issue 58 (October), at http://www.rlg.org/r-focus/i58.html#touring, accessed 13 April 2004.

Tony Gill and Paul Miller, 2002. "Re–inventing the Wheel? Standards, Interoperability and Digital Cultural Content," D–Lib Magazine, volume 8 number 1 (January), at http://www.dlib.org/dlib/january02/gill/01gill.html, accessed 9 April 2004.

Jane Hunter, 2002. "Combining the CIDOC CRM and MPEG7 to Describe Multimedia in Museums," at http://www.archimuse.com/mw2002/papers/hunter/hunter.html, accessed 10 April 2004.

ICOM, 1946–2001. Definition of a Museum in "ICOM Statutes," at http://icom.museum/definition.html, accessed 10 April 2004.

IFLA Study Group on the Functional Requirements for Bibliographic Records, 1998. "Functional Requirements for Bibliographic Records: Final Report," at http://www.ifla.org/VII/s13/frbr/frbr.htm, accessed 11 April 2004.

Sanghee Kim, Paul Lewis and Kirk Martinez, 2004. "SCULPTEUR D7.1, Semantic Network of Concepts and their Relationships – Public Version," at http://www.sculpteurweb.org/html/events/D7.1_Public.zip, accessed 15 April 2004.

Patrick Lebœuf, 2003. "The Book, the Bug & the Bangle: A parallel & a paradox," at http://cidoc.ics.forth.gr/docs/symposium_presentations/leboeuf_bookbugbangle_revised.doc, accessed 9 April 2004.


Editorial history

Paper received 19 April 2004; accepted 22 April 2004.



Copyright ©2004, First Monday

Copyright ©2004, Tony Gill

Building semantic bridges between museums, libraries and archives: The CIDOC Conceptual Reference Model by Tony Gill
First Monday, volume 9, number 5 (May 2004),
URL: http://firstmonday.org/issues/issue9_5/gill/index.html