Incremental Integration of Fragmented Knowledge Via the Edition Protocol of a Shared Knowledge Base

Ph. A. Martin

University of La Réunion, France

www.webkb.org/doc/papers/ickg23/ickg23_slides.html

Plan

1. Scalable "knowledge integration", i.e., general Knowledge Sharing
→ "shared knowledge-representation servers" that are non-restricting

2. → shared-KB edition protocol with at least the described 2 main kinds of rules

/* Hello, My talk has 2 main points: 
1st, that for a scalable integration of knowledge, 
                that is, for general Knowledge Sharing,
       Web users need to exploit "shared knowledge-representation servers"
                that are non-restricting, and
2nd, that these servers need to have an edition protocol that
       enforces at least the 2 main kinds of rules that I describe in the article.
(35s -> 40s) */

1. General Knowledge Sharing: Needed Approach

     KNOWLEDGE-REPRESENTATION_sharing = KR_sharing = KrS
               ↑ {complete, disjoint}
             ________________
            |                |
Restricted_KrS        GENERAL_KrS   //e.g. Fragmented_knowledge_sharing_and_integration
// ← via most tools              |tool                         |   |part
// for the W3C                v 1..*                   |  v 0
// Semantic Web vision   Networked-or-not_shared-KBMS   | Non-domain-related_content-restriction
                          |tool                                | //→ e.g. not just consensual knowledge
                          v 1                                  v 0
EDITION_PROTOCOL_of_a_shared_KB        Creation_of_independent-or-competing_KBs
//→ relations between competing objects    //KBs with non inter-related objects (as in one bigger KB)
// → organization of knowledge             //     especially mutually contradictory/redundant objects 
//  → the inference engine can make        //   since then each new created KB would lead to
//    choices between competing objects    //       less and less inter-related objects between the KBs.
//    according to user preferences        //The fewer KBs, the better.

/* To explain - or at least illustrate - the 1st point, here is a UML diagram representing
      some knowledge about Knowledge-representation_sharing.
   This diagram first represents the fact that the notion or type of KNOWLEDGE-REPRESENTATION_sharing
      can be partitioned into the two disjoint notions of Restricted_KrS and GENERAL_KrS.
   The Restricted one is about representing and sharing knowledge only for particular applications and
      it requires knowledge engineers to communicate directly with each other.
      This is all that most tools for the W3C Semantic Web vision aim for.
   General_KrS does not have these restrictions and hence is required for
      supporting Fragmented_knowledge_sharing_and_integration, in the general case.
   General_KrS has to exploit at least one shared-KBMS, and 
      typically a network of shared KBs, each one focusing on a particular domain and 
      all exchanging knowledge or queries between themselves.
   General_KrS cannot perform content restrictions 
                                         about what is stored by Web users in the shared KBs
                                         except to focus on a particular domain.
        For instance, General_KrS cannot exclude non-consensual knowledge since this would prevent some
           people or applications from retrieving this knowledge. This would literally not be general KrS.
           However, General_KrS can represent that some statements are non-consensual.
   General_KrS cannot rely on the creation of independently developed KBs, i.e.
                                            KBs whose objects are not inter-related (as they would be in one bigger KB)
                                               especially mutually contradictory/redundant objects 
                                                  since then each newly created KB would lead to 
                                                  fewer and fewer inter-related objects between the KBs.
                                           For the same reason, the fewer KBs, the better. 
   And each shared-KBMS must have at least one Edition_protocol to enforce an explicit organization of
   at least competing knowledge objects - within each shared KB and between them -
   so that the inference engine of each KB can make choices between competing objects
                 according to user-provided rules, hence according to user preferences.

(3'40/4' -> 4'30) */

2.1. Top-level Rules of the Edition Protocol of a Shared KB

        Edition_protocol_of_a_shared_KB
        |part
        v 1..* 
        Rule_for_the_edition_protocol_of_a_shared_KB
            ↑                            ↖ {complete, disjoint}
        __________________________________               __________________________________
       |                                  |             |                                  |
"Rule_enforcing_the_representation_of     |    Shared-KB-edition-protocol-rule_for_terms   |
 the_source-and-owner(s)_of_each_object"  |                                                |
 //→ contradictions are not inconsistencies      |   Shared-KB-edition-protocol-rule_for_statements
                                          |              ↑{complete, disjoint}
 "Rule enforcing the existence of relations              __________________________________
  of correction/specialization/equivalence              |                                  |
  between each pair of objects             Shared-KB-edition-protocol-rule_for_definitions |
  within a shared KB"                                                                      |
  // updates are additive (i.e. via relations)           Shared-KB-edition-protocol-rule_for_beliefs
  // ↔ loss-less "knowledge integration"

/* An Edition_protocol_of_a_shared_KB  has rules,  of at least two kinds:
1st, the rules enforcing the representation of the source and owner(s) of each object.
    That way, contradictory statements are, more precisely, contradictory beliefs.
      One of the advantages is that a KB may include contradictory beliefs and still be consistent. 
2nd, there should also be rules enforcing the representation of relations of 
     correction or exclusion or specialization between
     each pair of objects within a shared KB, in order to enforce its good organization.
  Because all updates are made by adding such relations, knowledge-integration is loss-less,
     which it has to be in general KS.

All of this implies that there are different kinds of rules for different kinds of objects:
* the rules handling terms, i.e., identifiers or expressions
* the rules handling statements.
  - Within them, the rules handling definitions, that is, statements that are "always true by definition",
  - and the rules handling statements that are not definitions, here named "beliefs", 
                                                in other words, statements that can be false. 
(2' -> 6'30) */
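
The two kinds of top-level rules can be illustrated with a minimal Python sketch. It is not the protocol from the article: the names `KbObject`, `SharedKB`, `add` and `ALLOWED_RELATIONS` are hypothetical, and the second rule is simplified to "a new object must be related to at least one existing object" instead of "each pair of objects". The point shown is that updates are additive: nothing is overwritten or deleted, only objects and relations are added.

```python
from dataclasses import dataclass, field

# Hypothetical relation kinds, named after the wording of the slides.
ALLOWED_RELATIONS = {"correction", "specialization", "equivalence", "exclusion"}


@dataclass(frozen=True)
class KbObject:
    ident: str    # e.g. "u1#b1"
    source: str   # source/owner of the object (mandatory under the 1st rule)
    content: str


@dataclass
class SharedKB:
    objects: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (from_id, kind, to_id, author)

    def add(self, obj, links=()):
        """Add an object; `links` is an iterable of (relation_kind, target_id)."""
        links = list(links)
        if not obj.source:
            raise ValueError("rule 1: every object needs a source/owner")
        if obj.ident in self.objects:
            raise ValueError("objects are never overwritten: add a new one and relate it")
        for kind, target in links:
            if kind not in ALLOWED_RELATIONS:
                raise ValueError(f"unknown relation kind: {kind}")
            if target not in self.objects:
                raise ValueError(f"unknown target object: {target}")
        # Simplified 2nd rule (assumption of this sketch): once the KB is
        # non-empty, a new object must be related to an existing one.
        if self.objects and not links:
            raise ValueError("rule 2: relate the new object to an existing one")
        self.objects[obj.ident] = obj
        for kind, target in links:
            self.relations.append((obj.ident, kind, target, obj.source))


kb = SharedKB()
kb.add(KbObject("u1#b1", "u1", "every Bird is agent of a Flight"))
kb.add(KbObject("u2#b1", "u2", "every Bird can be agent of a Flight"),
       [("correction", "u1#b1")])
```

In this sketch u1's object stays in the KB after u2's correction; only a relation between the two objects is added, which is what makes the integration loss-less.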

2.2. ... for Terms

Rule_enforcing_the_representation_of_the_sources-and-owners_of_each_term → via (a relation or) a prefix/suffix in the identifier, e.g.: dc:creator
Rule_enforcing_the_existence_of_relations_of_exclusion/specialization/equivalence_between_each_pair_of_terms → e.g., (inferred or explicitly stated) subtype/exclusion relations between each type (easy to enter when subtype sets like "subtype partitions" are systematically used)

/* Now, for each of these kinds of objects, and first, for terms:

- the representation_of_the_sources-and-owners_of_each_term can use relations or,
  more commonly, a prefix or suffix in each identifier, as in the W3C languages,
  e.g. in the term "dc:creator", "dc" is an abbreviation of a Web address for the Dublin Core ontology.

- the most important relations for organizing terms
  – and hence, for example, for retrieving them – are those of
  exclusion, specialization and equivalence. All such relations
  represent the existence or not of full or partial redundancies between the connected terms.
  Connecting each pair of terms, e.g. types, via such relations is easy when
  subtype sets like "subtype partitions" are systematically used
  because then the inference engine can deduce all these relations.

(1'10 -> 7'40) */
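
As an illustration of why subtype partitions make these relations easy to obtain, here is a small Python sketch. It assumes a toy type hierarchy that is a tree (one supertype per type) in which every subtype set is a partition; the type names, `PARTITIONS`, `build_parent_map`, `lineage` and `relation` are all hypothetical and not taken from the article. Under these assumptions, a relation of equivalence, specialization, generalization or exclusion can be deduced for every pair of types.

```python
from itertools import combinations

# Toy hierarchy: supertype -> list of subtype partitions (assumed data).
PARTITIONS = {
    "Thing": [["Animal", "Plant"]],
    "Animal": [["Bird", "Mammal"]],
    "Bird": [["Flying_bird", "Non_flying_bird"]],
}


def build_parent_map(partitions):
    """Map each subtype to (supertype, index of the partition it belongs to)."""
    parent = {}
    for supertype, subtype_sets in partitions.items():
        for index, subtypes in enumerate(subtype_sets):
            for subtype in subtypes:
                parent[subtype] = (supertype, index)
    return parent


def lineage(t, parent):
    """Return the chain [t, supertype of t, ..., root]."""
    chain = [t]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]][0])
    return chain


def relation(a, b, parent):
    """Deduce the relation from type a to type b."""
    if a == b:
        return "equivalence"
    la, lb = lineage(a, parent), lineage(b, parent)
    if b in la:
        return "specialization"   # a is a (direct or indirect) subtype of b
    if a in lb:
        return "generalization"   # b is a (direct or indirect) subtype of a
    common = next((t for t in la if t in lb), None)
    if common is None:
        return "unknown"
    # The two branches leaving the closest common supertype.
    branch_a = la[la.index(common) - 1]
    branch_b = lb[lb.index(common) - 1]
    if parent[branch_a] == parent[branch_b]:  # same supertype, same partition
        return "exclusion"
    return "unknown"


PARENT = build_parent_map(PARTITIONS)
ALL_TYPES = sorted(set(PARENT) | set(PARTITIONS))
PAIR_RELATIONS = {
    (a, b): relation(a, b, PARENT) for a, b in combinations(ALL_TYPES, 2)
}
```

With this toy data no pair ends up as "unknown", so the hierarchy is complete with respect to these relations. With several partitions per type, or several supertypes per type, some pairs could remain "unknown" in this sketch and would have to be stated explicitly.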

2.2. ... For Definitions

Rule_enforcing_the_representation_of_the_sources-and-owners_of_each_definition → via the defined term or a relation between the defined term and the definition
Rule_enforcing_the_existence_of_relations_of_exclusion/specialization/equivalence_between_each_pair_of_definitions → relations between the defined terms

/* About definitions now. 
 
Their sources and owners are represented via the defined term
     or via a relation between the defined term and the definition.

Similarly,
the exclusion or specialization or equivalence relations between definitions
are represented via relations from the defined terms.

To sum up, the rules for handling definitions reuse the rules for handling terms.

(30s -> 8'10) */

2.3. ... For Beliefs

Rule_enforcing_the_existence_of_the_sources-and-owners_of_each_belief
  → use of contextualizing meta-statements (i.e. "contexts")
  Not a problem: meta-statements can be represented directly or via cumbersome ad-hoc ways
                 in any Knowledge Representation Language (KRL)

Rule_enforcing_the_existence_of_relations_of_correction/specialization/equivalence_between_each_pair_of_beliefs

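Diagram: three competing beliefs and the relations stated between them
  u1#`every Bird is agent of a Flight´
  u2#`every Bird can be agent of a Flight´
  u3#`at least 75% of Healthy Flying_bird can be agent of a Flight´
  u1's belief  --c_=>_/^ _[u2]-->  u2's belief
  u1's belief  --c_=> _[u3]-->     u3's belief
  u2's belief  --c_=> _[u3]-->     u3's belief   ⇐ ...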

/* Handling beliefs is more complex.

- Representing the_sources-and-owners_of_each_belief
requires the use of contextualizing meta-statements, that is, "contexts",
but this is not an obstacle since
meta-statements can be represented directly or via cumbersome ad-hoc ways
in any Knowledge Representation Language.

- Regarding the enforcement of the_representation_of_relations_of_correction/specialization/equivalence_between_each_pair_of_beliefs,
here is an example.
Let us assume that the user u1 has represented his belief that every bird flies.
In the FE notation, this can be represented that way: ...
Now, let us assume that the user u2 believes that, more precisely, every bird can fly, i.e. ...
To allow u2 to enter this belief, the protocol will ask for the relations of ...
Then, u2 will enter that his belief is a generalization and a logical consequence of u1's belief,
but also a more correct version of u1's belief.
Then, u3 wants to enter that ...
To do so, u3 states that this new belief is a logical consequence and a more correct version of
u2's belief and u1's belief.
And, again using meta-statements, anyone can argue for or against this relation of u3.
Then, if and when the inference engine has to choose between these 3 beliefs to make an inference,
it can do so according to default rules or rules provided by each user.
E.g., a good default rule is to choose the most corrected and most specialized version of the
competing beliefs.
(2' -> 10'10) */
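
The choice between competing beliefs can be sketched in Python as follows. This is only an illustration under stated assumptions: the identifiers (`BELIEFS`, `CORRECTIONS`, `most_corrected`, `ignored_authors`) are hypothetical, only correction relations are handled (the specialization part of the default rule is left out), and the user preference is reduced to "ignore what some authors have entered".

```python
# Beliefs of the example: id -> (author, statement).
BELIEFS = {
    "u1#b": ("u1", "every Bird is agent of a Flight"),
    "u2#b": ("u2", "every Bird can be agent of a Flight"),
    "u3#b": ("u3", "at least 75% of Healthy Flying_bird can be agent of a Flight"),
}

# Correction relations: (corrected belief, correcting belief, author of the relation).
CORRECTIONS = [
    ("u1#b", "u2#b", "u2"),
    ("u1#b", "u3#b", "u3"),
    ("u2#b", "u3#b", "u3"),
]


def most_corrected(beliefs, corrections, ignored_authors=frozenset()):
    """Return the ids of the beliefs that no retained belief corrects.

    Beliefs and correction relations entered by `ignored_authors` are
    filtered out first; this stands for a (very simple) user preference.
    """
    kept = {
        belief_id
        for belief_id, (author, _statement) in beliefs.items()
        if author not in ignored_authors
    }
    corrected = {
        old
        for old, new, author in corrections
        if author not in ignored_authors and old in kept and new in kept
    }
    return sorted(kept - corrected)


default_choice = most_corrected(BELIEFS, CORRECTIONS)
choice_without_u3 = most_corrected(BELIEFS, CORRECTIONS, ignored_authors={"u3"})
```

Here the default choice is u3's belief, while a user who decides to ignore u3's contributions gets u2's belief instead. No belief is removed from the KB in either case; only the selection differs.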

3. Conclusion:

Implementation methods:
- for representing the sources/owners of statements and the relations between statements: meta-statements
- for the edition protocol: rules, functions, queries or constraints

Consequences of the described basic kinds of edition protocol rules:
- the KB is "complete with respect to the advocated relations"
  → same results with or without the closed-world hypothesis and the "unique name assumption"
    (details in the article)
- genericity ← usable with any logic, hence any inference engine
- scalability ← no content restriction
              ← any user can see and exploit the knowledge according to the provided preferences
                (knowledge inferencing/filtering rules or criteria)

/* To conclude:

1st about implementation methods:
- for representing the sources and owners of statements and the relations between statements,
  all that is required is that meta-statements be usable;
- regarding the edition protocol, it can be represented or implemented in many different ways:
  rules, functions, queries or constraints.

The consequences of the basic kinds of edition protocol rules that I described are that
- the KB is "complete with respect to the advocated relations", that is, 
   results for queries on these relations are the same
   with or without the "closed-world hypothesis" and the "unique name assumption"
- the approach is generic since usable with any logic, hence any inference engine
- the approach is scalable since there is no content restriction, hence
    any user can see and exploit all the knowledge according to provided
    preferences, that is, knowledge inferencing/filtering rules or criteria.
(1'30 -> 11'40) */