CGKAT: a Knowledge Acquisition Tool and
a Precision Oriented Information Retrieval Tool
Which Exploits a Structured Document Editor,
a Conceptual Graph Workbench and Ontologies

During the knowledge extraction and modelling phases, a knowledge engineer often has to look for pieces of information in documents which are expertise sources, e.g. interview transcriptions and technical reports. S/he then often represents some of these pieces of information in a Knowledge Base (KB). If links are set between the pieces of information and their representations, the information is indexed by the KB and may be retrieved via searches in the KB. Queries may thus be written for generating parts of documents, e.g. specification documents and technical documentation. More and more information retrieval systems, e.g. hypertext systems, are knowledge-based although they are not intended to be used for knowledge acquisition.

CGKAT (Martin, 1995a, 1996) is a tool which integrates the Conceptual Graph (CG) workbench CoGITo (Haemmerlé, 1995) and the structured document editor Thot (previously named Grif) (Quint & Vatton, 1992). As a result, the user may use and combine 1) an advanced technique for organizing, accessing and handling knowledge, and 2) an advanced technique for organizing, accessing and handling information.

1) Building and Organizing a Knowledge Base with a Structured Document Editor

In structured documents, document elements (DEs) are typed and may be organised by structural links and hypertext links (structural links are embeddings of DEs). Examples of DEs are paragraphs, sections, notes, images and graphics. Any DE, including a whole document, must follow a structure model called a document type definition (DTD). One or several presentations for a kind of DE may be specified in presentation models. Each presentation model defines a default layout and presentation constraints for later modifications of this layout.

We have defined DTDs and presentation models that allow CGs (including embedded CGs and type definitions using CGs) to be stored and displayed inside Thot documents. Thot can thus act as a graphic and syntactic editor for those kinds of DEs. All the facilities of Thot may be exploited for building DEs displaying CGs: contextual editing menus, presentation menus (formats, colors, zoom, multiple views, etc.), search menus (search on strings and/or on DE attributes and/or on DE types), and so on.

Furthermore, we used the functional interfaces of Thot and CoGITo so that when a DE displaying a CG is built in Thot, or when a Thot document including such DEs is opened by a CGKAT user, the CGs are also automatically created in the base of CoGITo. (A creation or a modification of a DE displaying a CG is not accepted in Thot if the corresponding operation cannot be done in CoGITo, i.e. if it violates some constraint, e.g. a relation signature.) The CGs are removed from CoGITo when the Thot documents which include them are closed. Thus, Thot documents may be used as a graphical interface for the CoGITo base.
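The accept-or-reject coupling described above can be sketched as follows. This is only an illustrative sketch: the class and function names (`CGBase`, `edit_de`, `SignatureError`) and the toy type hierarchy are assumptions made for the example and are not the functional interfaces of Thot or CoGITo.

```python
# Hypothetical sketch: none of these names come from Thot or CoGITo.

class SignatureError(ValueError):
    """Raised when a relation is used with arguments that violate its signature."""

# Toy concept type hierarchy (assumed data): type -> direct supertype.
SUPERTYPE = {"Cat": "Animal", "Animal": "Entity", "Mat": "Entity"}

# Toy relation signatures (assumed data): relation -> maximal type of each argument.
SIGNATURES = {"on": ("Entity", "Entity"), "agent": ("Animal", "Entity")}


def is_subtype(type_name, ancestor):
    """True if type_name equals ancestor or is below it in the hierarchy."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = SUPERTYPE.get(type_name)
    return False


class CGBase:
    """Stand-in for the CG base that mirrors the graphs shown in open documents."""

    def __init__(self):
        self.graphs = {}  # DE identifier -> list of (relation, argument types)

    def add_graph(self, de_id, relations):
        # Validate every relation first, so a rejected graph is never stored.
        for rel, args in relations:
            signature = SIGNATURES.get(rel)
            if signature is None or len(signature) != len(args):
                raise SignatureError(f"unknown relation or wrong arity: {rel}")
            for arg, maximal in zip(args, signature):
                if not is_subtype(arg, maximal):
                    raise SignatureError(f"{arg} is not a kind of {maximal} in {rel}")
        self.graphs[de_id] = list(relations)

    def close_document(self, de_ids):
        # Graphs are removed when the document that includes them is closed.
        for de_id in de_ids:
            self.graphs.pop(de_id, None)


def edit_de(base, de_id, relations):
    """Editor side: accept the edit only if the base accepts the graph."""
    try:
        base.add_graph(de_id, relations)
    except SignatureError:
        return False
    return True
```

With this sketch, `edit_de(base, "de1", [("on", ("Cat", "Mat"))])` is accepted, whereas an edit using `("agent", ("Mat", "Cat"))` is refused because `Mat` is not a kind of `Animal` in the toy hierarchy.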

Inside Thot documents, DEs displaying CGs may be organized by structural and hypertext links, between them or with other DEs (navigation is possible in both directions on these links). For instance, a section may mix DEs displaying CGs, textual paragraphs and images. A DE may also be included (i.e. reused and shared) by other DEs. An inclusion is a "living copy" of a DE and is connected to this DE by a hypertext link. (Inclusions are "living copies" in the sense that all changes made in the source DE are automatically reflected in the copies.) Inclusions allow the building of "virtual documents" or "views" on parts of other documents. We recommend that CGKAT users use inclusions of concepts when building CGs. All the CGs which share the same concept may then be found by hypertext navigation (whatever documents the CGs are included in), and a user may thus search for information by navigating from CG to CG according to the relations which connect their concepts.
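As a rough illustration of why shared concepts make this navigation possible, the sketch below models an inclusion as a reference to a single concept object that keeps back-links to the graphs using it. The `Concept` and `Graph` classes and the `graphs_sharing` helper are hypothetical names chosen for the example; they do not describe how Thot stores inclusions.

```python
# Hypothetical sketch of "living copies": graphs hold references to one shared
# concept object, so a change to the source is visible everywhere it is included.

class Concept:
    def __init__(self, type_name, referent=None):
        self.type_name = type_name
        self.referent = referent
        self.included_in = []  # back-links to the graphs that include this concept


class Graph:
    def __init__(self, name, document):
        self.name = name
        self.document = document
        self.relations = []  # (relation type, source concept, target concept)

    def relate(self, relation, source, target):
        self.relations.append((relation, source, target))
        for concept in (source, target):
            if self not in concept.included_in:
                concept.included_in.append(self)


def graphs_sharing(concept):
    """All graphs, in any document, that include the given concept."""
    return list(concept.included_in)


# Usage: one concept included by two graphs stored in different documents.
cat = Concept("Cat", "Tom")
g1 = Graph("g1", "doc A")
g2 = Graph("g2", "doc B")
g1.relate("on", cat, Concept("Mat"))
g2.relate("agent", Concept("Chase"), cat)
```

Here `graphs_sharing(cat)` returns both graphs, and changing `cat.referent` is seen from `g1` and `g2` alike, which is the behaviour the term "living copy" refers to.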

2) Exploitation of links between document elements and knowledge representations

In a DTD, we have defined hypertext links which may be used for connecting DEs displaying CGs to other DEs, and which express the fact that these CGs represent or annotate these DEs. We have put some semantic constraints on what a DE representation should be (see Martin & Alpay, 1996) but any kind of CG may be used as a DE annotation.

When such links are created, the CGs index the DEs they represent or annotate, and conversely, these DEs document the CGs. A user may navigate from a DE to one of its indexations, then navigate between CGs by following semantic links, and thus find other interesting indexed DEs. CGKAT is therefore based on the same principle as knowledge-based hypertext systems. However, the CG formalism allows more precise representations of DEs than the knowledge representation languages used in these systems; the semantic network may be divided into separate graphs and graphs may be embedded.
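The two-way indexing described above can be pictured as a small bidirectional index. The sketch below is an assumption-laden illustration: `LinkIndex` and its method names are invented for the example, and the identifiers are plain strings rather than real document elements.

```python
# Hypothetical sketch of links between CGs and the DEs they represent or annotate.

class LinkIndex:
    KINDS = ("represents", "annotates")

    def __init__(self):
        self._cgs_of = {}  # DE id -> set of (CG id, kind)
        self._des_of = {}  # CG id -> set of (DE id, kind)

    def link(self, cg_id, de_id, kind):
        if kind not in self.KINDS:
            raise ValueError(f"unknown link kind: {kind}")
        self._cgs_of.setdefault(de_id, set()).add((cg_id, kind))
        self._des_of.setdefault(cg_id, set()).add((de_id, kind))

    def indexations(self, de_id):
        """CGs that index (represent or annotate) the given DE."""
        return sorted({cg for cg, _ in self._cgs_of.get(de_id, ())})

    def documented_by(self, cg_id, kind=None):
        """DEs that document the given CG, optionally restricted to one link kind."""
        return sorted({de for de, k in self._des_of.get(cg_id, ())
                       if kind is None or k == kind})


# Usage: navigate from a DE to its indexations, then on to other indexed DEs.
index = LinkIndex()
index.link("cg1", "para3", "represents")
index.link("cg1", "fig2", "annotates")
index.link("cg2", "para3", "annotates")
```

Starting from `para3`, `index.indexations("para3")` gives `cg1` and `cg2`; from `cg1`, `index.documented_by("cg1")` leads on to `fig2` as well.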

Like most other knowledge-based hypertext systems, CGKAT also allows searches by conceptual queries. In these systems, queries are logical expressions on 1) attributes of nodes and links, and/or 2) sequences of nodes and links (i.e. paths) inside the whole network (regular expressions are generally used for specifying a path). In CGKAT, a query is expressed using a CG, and specialisations or generalisations of this CG are searched for in the CG base. We have specified a query language that combines searches for specialisations with searches for paths specified by regular expressions, but this has not yet been implemented.
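A search for specialisations of a query CG can be sketched as a projection test: the query projects into a graph if its concepts can be mapped to concepts of equal or more specific type while preserving every relation. The sketch below is a deliberately simplified assumption: it ignores referents, relation type hierarchies and embedded graphs, uses naive backtracking, and is not the algorithm used by CoGITo. The data layout and the function names are hypothetical.

```python
# Simplified, hypothetical projection test over toy graphs.

# Toy concept type hierarchy (assumed data): type -> direct supertype.
SUPERTYPE = {"Cat": "Animal", "Dog": "Animal", "Animal": "Entity", "Mat": "Entity"}


def is_subtype(type_name, ancestor):
    """True if type_name equals ancestor or is below it in the hierarchy."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = SUPERTYPE.get(type_name)
    return False


def projects_into(query, graph):
    """True if some mapping of query concepts to graph concepts keeps all relations."""
    edges = set(graph["relations"])
    query_nodes = list(query["concepts"])

    def extend(mapping, remaining):
        if not remaining:
            return all((rel, mapping[a], mapping[b]) in edges
                       for rel, a, b in query["relations"])
        node = remaining[0]
        for candidate, candidate_type in graph["concepts"].items():
            # The image must have the same type or a more specific one.
            if is_subtype(candidate_type, query["concepts"][node]):
                mapping[node] = candidate
                if extend(mapping, remaining[1:]):
                    return True
                del mapping[node]
        return False

    return extend({}, query_nodes)


def search_specialisations(query, base):
    """Names of the graphs in the base that are specialisations of the query."""
    return [name for name, graph in base.items() if projects_into(query, graph)]


# Usage: which graphs say that some animal is on some entity?
base = {
    "g1": {"concepts": {"c1": "Cat", "c2": "Mat"}, "relations": [("on", "c1", "c2")]},
    "g2": {"concepts": {"c1": "Dog", "c2": "Mat"}, "relations": [("near", "c1", "c2")]},
}
query = {"concepts": {"x": "Animal", "y": "Entity"}, "relations": [("on", "x", "y")]}
```

In this toy base, only `g1` is returned by `search_specialisations(query, base)`: `g2` has concepts of suitable types but no `on` relation between them.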

For each CG which is an answer to a conceptual query, CGKAT may present 1) a "graphical representation" of this CG using the DE with which the CG has been built, and/or 2) the DEs which are represented by the CG, and/or 3) the DEs which are annotated by the CG. The user may choose the kind of result s/he wants. In any case, the presented DEs are not simple copies of DEs but inclusions, i.e. living copies. Thus, the result of a query is a generated DE which is a view on the knowledge base or on other documents, and this view corresponds to the criteria chosen by the user. Such query results may constitute a whole document or may be combined and completed for producing documentation. They may also be the basis for generating explanations.

In addition to menus, CGKAT includes textual commands for CG manipulation (e.g. directed join, maximal join, etc.) and also a script language (the standard UNIX system command interpreter) to let the user combine the results of commands or queries, and write scripts for generating views and answering "frequently asked questions". Such scripts may be associated with a DE and may be explicitly or implicitly activated (scripts may be considered as dynamic hypertext links). Generated views really enable a user to combine searches by queries and by hypertext navigation, since a view is the starting point of many hypertext links toward the sources and contexts of the DEs collected by the view.

CGKAT also exploits the Thot index facility (Richy, 1994) and the representations of occurrences of terms in a document, for generating a glossary of these terms which synthesizes their representations (the terms and their representations are alphabetically sorted and duplicates are eliminated).
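The sorting and de-duplication step of such a glossary can be sketched in a few lines. The function name `build_glossary` and the input format (pairs of a term and a textual rendering of its representation) are assumptions made for the example; the Thot index facility itself is not modelled.

```python
# Hypothetical sketch: build a glossary from (term, representation) occurrences.

def build_glossary(occurrences):
    """Group representations by term, drop duplicates, and sort alphabetically."""
    grouped = {}
    for term, representation in occurrences:
        representations = grouped.setdefault(term, [])
        if representation not in representations:
            representations.append(representation)
    return [(term, sorted(grouped[term]))
            for term in sorted(grouped, key=str.lower)]


# Usage with toy occurrences (illustrative data only).
occurrences = [
    ("mat", "[Mat]"),
    ("cat", "[Cat]->(on)->[Mat]"),
    ("cat", "[Cat]->(on)->[Mat]"),
    ("cat", "[Cat: Tom]"),
]
glossary = build_glossary(occurrences)
```

The repeated representation of "cat" appears only once in `glossary`, and the entry for "cat" precedes the entry for "mat".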

3) A default ontology for guiding knowledge modelling and information retrieval

To ease knowledge reuse or information retrieval (IR) by many users, the types used in the concepts or relations of DE representations should be derived from an ontology which is shared and understandable by all the users. To this end, CGKAT exploits the semantic dictionary WordNet (Miller et al., 1990) to provide 1) a default browsable and updatable hierarchy of 90,000 concept types, and 2) a facility for accessing the concept types in this ontology using lexical terms (the concept types corresponding to the known meanings of a term are returned). This facility eases the use of the ontology and guides knowledge representation. CGKAT also proposes a default hierarchy of 200 relation types: thematic, mathematical, spatial, temporal, rhetorical and argumentative relations (see (Martin, 1995b) or (Martin, 1995a) for more details on the content of these ontologies).
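The term-based access to concept types can be illustrated with a toy lexicon. Everything in the sketch below is assumed for the example: the entries are invented and are not taken from WordNet, and `concept_types_for` and `supertype_chain` are hypothetical helper names, not CGKAT commands.

```python
# Toy data only: these entries are illustrative and are not taken from WordNet.
LEXICON = {
    "bank": ["Bank_institution", "Bank_slope"],
    "report": ["Report_document"],
}

# Toy hierarchy (assumed data): concept type -> direct supertype.
SUPERTYPE = {
    "Bank_institution": "Organization",
    "Bank_slope": "Land",
    "Report_document": "Document",
    "Organization": "Entity",
    "Land": "Entity",
    "Document": "Entity",
}


def concept_types_for(term):
    """Concept types corresponding to the known meanings of a lexical term."""
    return list(LEXICON.get(term.strip().lower(), []))


def supertype_chain(concept_type):
    """Supertypes of a concept type, from the most specific to the most general."""
    chain = []
    parent = SUPERTYPE.get(concept_type)
    while parent is not None:
        chain.append(parent)
        parent = SUPERTYPE.get(parent)
    return chain
```

A user typing "bank" would be offered both candidate types and could browse upward from each one, for instance from `Bank_slope` to `Land` and then `Entity`, before choosing the type to use in a concept.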

4) References

Haemmerlé O. (1995). CoGITo: une plate-forme de développement de logiciels sur les graphes conceptuels. Ph.D thesis, Montpellier II University, France, January 1995.

Martin Ph. (1995a). Knowledge Acquisition Using Documents, Conceptual Graphs and a Semantically Structured Dictionary. In Proc. of KAW'95, Gaines, B.R. Eds, University of Calgary, Banff, Alberta, Canada, February 26-March 3, 1995.

Martin Ph. (1995b). Using the WordNet Concept Catalog and a Relation Hierarchy for Knowledge Acquisition. Proceedings of Peirce'95, 4th International Workshop on Peirce, University of California, Santa Cruz, August 18, 1995.

Martin Ph. & Alpay L. (1996). Conceptual Structures and Structured Documents. In Proc. of ICCS'96, 4th International Conference on Conceptual Structures, Sydney, Australia, August 19-22, 1996.

Martin Ph. (1996). Exploitation de graphes conceptuels et de documents structurés et hypertextes pour l'acquisition de connaissances et la recherche d'informations. Ph.D thesis, University of Nice - Sophia Antipolis, France, October 14, 1996.

Martin Ph. (1997). CGKAT: a Knowledge Acquisition Tool and an Information Retrieval Tool Which Exploits Structured Documents, Conceptual Graphs and Ontologies. In Proc. of CGTOOLS'97, 2nd Conceptual Graph Tools Workshop, University of Washington, Seattle Washington USA, August 8, 1997.

Miller G.A., Beckwith R., Fellbaum C., Gross D. & Miller K. (1990). Five Papers on WordNet. CSL Report 43, Cognitive Science Laboratory, Princeton University, July 1990.

Quint V. & Vatton I. (1992). Hypertext Aspects of the Grif Structured Editor: Design and Applications. R.R. 1734, INRIA, Rocquencourt, July 1992.

Richy H. (1994). A hypertext electronic index based on the Grif structured document editor. In Proc. of Electronic Publishing, vol. 7, num. 1, pp. 21-34, March 1994.