Incremental Integration of Fragmented Knowledge Via the Edition Protocol of a Shared Knowledge Base


Ph. A. Martin

University of La Réunion, France


www.webkb.org/doc/papers/ickg23/ickg23_slides.html



Plan

1.  Scalable "knowledge integration", i.e., general Knowledge Sharing
      → "shared knowledge-representation servers" that are non-restricting

2.  → shared-KB edition protocol with at least the described 2 main kinds of rules

/* Hello. My talk has 2 main points:
1st, that for a scalable integration of knowledge, 
                that is, for general Knowledge Sharing,
       Web users need to exploit "shared knowledge-representation servers"
                that are non-restricting, and
2nd, that these servers need to have an edition protocol that
       enforces at least the 2 main kinds of rules that I describe in the article.
(35s -> 40s) */

1. General Knowledge Sharing: Needed Approach

     KNOWLEDGE-REPRESENTATION_sharing = KR_sharing = KrS
                {complete, disjoint}
             ________________
            |                |
Restricted_KrS        GENERAL_KrS   //e.g. Fragmented_knowledge_sharing_and_integration
// ← via most tools              |tool                         |   |part
// for the W3C                v 1..*                   |  v 0
// Semantic Web vision   Networked-or-not_shared-KBMS   | Non-domain-related_content-restriction
                          |tool                                | //→ e.g. not just consensual knowledge
                          v 1                                  v 0
EDITION_PROTOCOL_of_a_shared_KB        Creation_of_independent-or-competing_KBs
//→ relations between competing objects    //KBs with non inter-related objects (as in one bigger KB)
// → organization of knowledge             //     especially mutually contradictory/redundant objects 
//  → the inference engine can make        //   since then each new created KB would lead to
//    choices between competing objects    //       less and less inter-related objects between the KBs.
//    according to user preferences        //The fewer the KBs, the better.

/* To explain - or at least illustrate - the 1st point, here is a UML diagram representing
      some knowledge about Knowledge-representation_sharing.
   This diagram first represents the fact that the notion or type of KNOWLEDGE-REPRESENTATION_sharing
      can be partitioned into the two disjoint notions of Restricted_KrS and GENERAL_KrS.
   The Restricted one is about representing and sharing knowledge only for particular applications and
      it requires knowledge engineers to communicate directly with each other.
      This is all that most tools for the W3C Semantic Web vision aim for.
   General_KrS does not have these restrictions and hence is required for
      supporting Fragmented_knowledge_sharing_and_integration, in the general case.
   General_KrS has to exploit at least one shared-KBMS, and
      typically a network of shared KBs, each one focusing on a particular domain and
      all exchanging knowledge or queries between themselves.
   General_KrS cannot restrict the content
                                         that Web users store in the shared KBs,
                                         except to focus on a particular domain.
        For instance, General_KrS cannot exclude non-consensual knowledge since this would prevent some
           people or applications from retrieving this knowledge. This would literally not be general KS.
           However, General_KrS can represent that some statements are non-consensual.
   General_KrS cannot rely on the creation of independently developed KBs, i.e.,
                                            KBs whose objects are not inter-related (as they would be in one bigger KB),
                                               especially mutually contradictory/redundant objects,
                                                  since then each newly created KB would lead to
                                                  fewer and fewer inter-related objects between the KBs.
                                           For the same reason, the fewer the KBs, the better.
   And each shared-KBMS must have at least one Edition_protocol to enforce an explicit organization of
   at least the competing knowledge objects - within each shared KB and between them -
   so that the inference engine of each KB can make choices between competing objects
                 according to user-provided rules, hence according to user preferences.

(3'40/4' -> 4'30) */

2.1. Top-level Rules of the Edition Protocol of a Shared KB

        Edition_protocol_of_a_shared_KB
        |part
        v 1..* 
        Rule_for_the_edition_protocol_of_a_shared_KB
            ↑                            ↖ {complete, disjoint}
        __________________________________               __________________________________
       |                                  |             |                                  |
"Rule_enforcing_the_representation_of     |    Shared-KB-edition-protocol-rule_for_terms   |
 the_source-and-owner(s)_of_each_object"  |                                                |
 //→ contradictions are not inconsistencies      |   Shared-KB-edition-protocol-rule_for_statements
                                          |              ↑{complete, disjoint}
 "Rule enforcing the existence of relations              __________________________________
  of correction/specialization/equivalence              |                                  |
  between each pair of objects             Shared-KB-edition-protocol-rule_for_definitions |
  within a shared KB"                                                                      |
  // updates are additive (i.e. via relations)           Shared-KB-edition-protocol-rule_for_beliefs
  //  loss-less "knowledge integration"

/* An Edition_protocol_of_a_shared_KB has rules of at least two kinds:
1st, the rules enforcing the representation of the source and owner(s) of each object.
    That way, contradictory statements are, more precisely, contradictory beliefs.
      One of the advantages is that a KB may include contradictory beliefs and still be consistent:
      "A believes p" and "B believes not p" can both hold.
2nd, there should also be rules enforcing the representation of relations of 
     correction or exclusion or specialization between
     each pair of objects within a shared KB, in order to enforce its good organization.
  Because all updates are made by adding such relations, knowledge-integration is loss-less,
     which it has to be in general KS.

All of this implies that there are different kinds of rules for different kinds of objects:
* the rules handling terms, i.e., identifiers or expressions
* the rules handling statements.
  - Within them, the rules handling definitions, that is, statements that are "always true by definition",
  - and the rules handling statements that are not definitions, here named "beliefs", 
                                                in other words, statements that can be false. 
(2' -> 6'30) */
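
The two kinds of rules above can be illustrated with a minimal Python sketch. It is not the protocol of the article: the names Belief, SharedKB, add_belief and corrected_ids are hypothetical, and the relation labels are plain strings chosen for the example. The sketch only checks that each added object has a source and that the relations stated with it are well-formed; detecting that a relation is missing would require an inference engine, which is assumed away here.

```python
from dataclasses import dataclass, field

# Relation kinds named in the talk; the string labels are illustrative only.
RELATION_KINDS = {"correction", "specialization", "equivalence", "exclusion"}


@dataclass
class Belief:
    ident: str
    content: str
    source: str  # author/owner, required by the first kind of rule


@dataclass
class SharedKB:
    beliefs: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (from_id, kind, to_id)

    def add_belief(self, belief, relations=()):
        # First kind of rule: every object must carry its source.
        if not belief.source:
            raise ValueError("rejected: no source/owner given")
        # Updates are additive: an existing object is never overwritten.
        if belief.ident in self.beliefs:
            raise ValueError("rejected: identifiers cannot be redefined")
        # Second kind of rule (partially): stated relations must be valid.
        for kind, target in relations:
            if kind not in RELATION_KINDS:
                raise ValueError(f"rejected: unknown relation kind {kind!r}")
            if target not in self.beliefs:
                raise ValueError(f"rejected: unknown target {target!r}")
        self.beliefs[belief.ident] = belief
        self.relations.extend((belief.ident, k, t) for k, t in relations)

    def corrected_ids(self):
        # Objects that some other object claims to correct.
        return {to for (_, kind, to) in self.relations if kind == "correction"}


kb = SharedKB()
kb.add_belief(Belief("b1", "birds fly", "alice"))
# Bob does not edit or delete b1; he adds b2 and relates it to b1.
kb.add_belief(Belief("b2", "most birds fly", "bob"), [("correction", "b1")])
```

Both beliefs stay in the KB, so nothing is lost, and the correction relation is what later lets a user-provided rule choose between them.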

2.2. ... For Terms


/* Now, for each of these kinds of objects, and first, for terms:

- the representation_of_the_sources-and-owners_of_each_term can use relations or,
  more commonly, a prefix or suffix in each identifier, as in the W3C languages,
  e.g. in the term "dc:creator", "dc" is an abbreviation of a Web address for the Dublin Core ontology.

- the most important relations for organizing terms
  – and hence, for example, for retrieving them – are those of
  exclusion, specialization and equivalence. All such relations
  represent the existence or not of full or partial redundancies between the connected terms.
  Connecting each pair of terms, e.g. types, via such relations is easy when
  subtype sets like "subtype partitions" are systematically used
  because then the inference engine can deduce all these relations.

(1'10 -> 7'40) */
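
As an illustration of why subtype partitions make these relations deducible, here is a small Python sketch. The type names and the functions ancestors and relation are hypothetical, and the sketch assumes that each type belongs to at most one partition (a tree); with multiple supertypes, a real inference engine would need more care.

```python
# Hypothetical toy hierarchy: each entry is one subtype partition
# (complete and disjoint) of the supertype given as key.
PARTITIONS = {
    "Thing": ["Entity", "Situation"],
    "Entity": ["Physical_entity", "Abstract_entity"],
    "Situation": ["State", "Process"],
}

# Direct supertype of each type (assumes a tree, see the lead-in).
PARENT = {sub: sup for sup, subs in PARTITIONS.items() for sub in subs}


def ancestors(t):
    """Return t followed by all its supertypes, nearest first."""
    chain = [t]
    while t in PARENT:
        t = PARENT[t]
        chain.append(t)
    return chain


def relation(a, b):
    """Deduce the relation from type a to type b."""
    if a == b:
        return "equivalence"
    if b in ancestors(a):
        return "specialization"   # a is a subtype of b
    if a in ancestors(b):
        return "generalization"   # a is a supertype of b
    # Otherwise a and b sit under different members of some partition,
    # and the members of a partition are disjoint.
    return "exclusion"
```

For instance, relation("State", "Entity") yields "exclusion" although no exclusion relation between these two types was ever stated: it follows from the partition of Thing.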

2.2. ... For Definitions


/* About definitions now. 
 
Their sources and owners are represented via the defined term
     or via a relation between the defined term and the definition.

Similarly,
the exclusion or specialization or equivalence relations between definitions
are represented via relations from the defined terms.

To sum up, the rules for handling definitions reuse the rules for handling terms.

(30s -> 8'10) */

2.3. ... For Beliefs

3. Conclusion


/* To conclude:

1st, about implementation methods:
- for representing the sources and owners of statements and the relations between statements,
  all that is required is that meta-statements be usable;
- regarding the edition protocol, it can be represented or implemented in many different ways:
  rules, functions, queries or constraints.

The consequences of the basic kinds of edition protocol rules that I described are that
- the KB is "complete with respect to the advocated relations", that is,
   results for queries on these relations are the same
   with or without the "closed-world hypothesis" and the "unique name assumption";
- the approach is generic since it is usable with any logic, hence any inference engine;
- the approach is scalable since there are no content restrictions, hence
    any user can see and exploit all the knowledge according to provided
    preferences, that is, knowledge inferencing/filtering rules or criteria.
(1:30' -> 11'40) */
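
The last point, exploiting all the knowledge according to user preferences, can be sketched as a filter over stored beliefs. This is only an assumed, simplified reading: the data, the function select and its parameters trusted and hide_corrected are hypothetical, and real preferences could be arbitrary inferencing or filtering rules.

```python
# Hypothetical data: (id, content, source) plus "correction" relations.
BELIEFS = [
    ("b1", "birds fly", "alice"),
    ("b2", "most birds fly", "bob"),
    ("b3", "penguins fly", "carol"),
]
CORRECTIONS = [("b2", "b1")]  # b2 corrects b1


def select(beliefs, corrections, trusted=None, hide_corrected=True):
    """Return the ids of the beliefs that match one user's preferences.

    trusted: set of accepted sources, or None to accept every source.
    hide_corrected: if True, skip beliefs that another belief corrects.
    """
    corrected = {old for _, old in corrections}
    result = []
    for ident, _content, source in beliefs:
        if hide_corrected and ident in corrected:
            continue
        if trusted is not None and source not in trusted:
            continue
        result.append(ident)
    return result
```

The KB itself is never restricted: select(BELIEFS, CORRECTIONS, hide_corrected=False) still returns every belief, while another user may prefer select(BELIEFS, CORRECTIONS, trusted={"alice", "bob"}).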