2. Rule-based inferencing (I): Knowledge-bases and rules
2.1 Introduction
The DataLex inferencing software is an Internet-based expert system shell for the development of inferencing systems (sometimes called ‘expert systems’) in the legal domain. It may be used to develop systems incorporating rule-based inferencing (discussed in this Chapter and the two following chapters), example or case-based inferencing (Chapter 7), and automated document assembly (Chapter 6). The User Interface Manual is in Chapter 8.
2.1.1 Levels of complexity with DataLex
DataLex is very simple to use to create small practice expert systems, at least for most types of statute-based applications. To get a small system up and running, all you have to do is paraphrase a section or two of an Act into a somewhat strict logical form, using logical connectors such as IF, THEN, AND and OR. The result is a knowledge-base, expressed in DataLex's ‘English like’ knowledge representation language. The DataLex inference engine then does the rest, running your knowledge-base to generate a dialogue with the user, asking questions and giving answers. You don't write any of the questions or answers – DataLex generates them automatically from your knowledge-base.
However, while DataLex can be used easily by relying only on a small number of its features, the DataLex engine has a very powerful and complex range of features which can be used as you proceed to develop more sophisticated applications.
The easiest way to understand how a DataLex knowledge-base is written is to study the examples given in this and the following Chapters, and then to use this Manual to explain features that you don't understand fully.
2.1.2 Main features
The main features of DataLex are:
- a 'quasi-natural-language' or English-like syntax, which encourages isomorphism (similarity between the structure of a knowledge-base and source legal documents), transparency (purpose of rules is relatively obvious) and rapid prototyping (easy to get small systems running);
- rules of any degree of complexity may be written, using propositional logic;
- backward and forward chaining rule-based inferencing;
- conventional procedural code including mathematical calculations;
- a form of reasoning by analogy, or example-based reasoning; and
- a document generation facility.
DataLex is therefore a fairly versatile tool with which a variety of inferencing applications may be created.
2.1.3 User commands and the DataLex User Interface Manual
The DataLex User Interface Manual, in Chapter 8 of this Manual, explains the interface to DataLex applications when they are running, from a user perspective. It should be read either before or in conjunction with this Chapter.
2.1.4 Developing a DataLex application – Where is the developer’s interface?
See Chapter 1, sections 1.4 and 1.5, for the two ways to develop a DataLex application.
The developer’s interface for DataLex applications is primarily within the DataLex section of the AustLII Communities environment <http://austlii.community/wiki/DataLex/>. There is also a development environment located outside AustLII Communities, which is used for teaching and development of test knowledge-bases: the ‘DataLex knowledge-base Development Tools’ <http://www.datalex.org/dev/import/>.
2.2 Knowledge-bases and rules
A knowledge-base is a set of declarations, so called because they 'declare' items of knowledge about a subject area. This type of programming is therefore called 'declarative' programming, in contrast to 'procedural' programming, which is of the form 'first do this step; then do this step ....'. The author of a DataLex application therefore creates a 'knowledge-base' rather than a 'program'.
2.2.1 Knowledge-bases as sets of rules
The most important category of declarations in DataLex is rules (so knowledge-bases are often called rule-bases). A knowledge-base, at its simplest, is therefore a set of rules. When DataLex 'runs' the rule-base, it attempts to establish the truth of a specified fact.
It does this by going to a rule which has that fact as its conclusion and examining each of the premises of that rule to determine whether the conclusion of the rule is true. In evaluating the premises of a rule DataLex uses any other rules which have any of the premises as their conclusion. DataLex repeats this process along each branch of reasoning until it reaches a premise for which there is no rule to derive a conclusion. At this point, DataLex interrogates the user about the truth of the premise. It does not generally matter, therefore, in which order the rules occur. Rather, DataLex searches the knowledge-base for relevant rules in relation to each fact or premise which it is evaluating.
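The goal-driven search described above can be illustrated outside DataLex. What follows is a minimal Python sketch under stated assumptions, not the DataLex engine: the `RULES` table, the `prove` function and the shortened fact names are hypothetical, each rule is reduced to a conclusion with a single AND or OR connector, and the `answers` dictionary stands in for the user's replies.

```python
# Illustrative sketch only: a tiny backward chainer, not the DataLex engine.
# Each conclusion maps to (connector, premises); all names are hypothetical.
RULES = {
    "s11 right exists": ("OR", ["s11(a) applies", "s11(b) applies"]),
    "s11(a) applies": ("AND", ["agency document", "not exempt"]),
    "s11(b) applies": ("AND", ["minister document", "not exempt"]),
}


def prove(fact, answers, asked):
    """Return the truth of `fact`, chaining backwards through RULES."""
    if fact in RULES:
        connector, premises = RULES[fact]
        results = (prove(p, answers, asked) for p in premises)
        # any()/all() stop early, so unnecessary questions are not asked.
        return any(results) if connector == "OR" else all(results)
    # No rule concludes this fact, so "ask the user" (here: look it up).
    if fact not in asked:
        asked.append(fact)
    return answers[fact]


answers = {"agency document": False, "minister document": True, "not exempt": True}
asked = []
result = prove("s11 right exists", answers, asked)
print(result, asked)
```

Because rules are looked up by their conclusion, rearranging the entries in `RULES` does not change the result, which mirrors the point that the order of rules does not generally matter.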
In its simplest form, a rule contains four elements:
(i) the keyword ‘RULE’, indicating the start of a new rule (in the absence of any specification otherwise, the rule will be both backward chaining and forward chaining);
(ii) the name of the rule (usually just the name of the Act and section that it paraphrases), which should differ from the name of any other rule in the rule-base;
(iii) the keyword ‘PROVIDES’, indicating the start of the body of the rule; and
(iv) the statement(s) which make up the inferencing content of the rule (one or more statements). Statements consist of declarations. One of the simplest forms of a statement is 'IF condition THEN conclusion'.
The simplest syntax for a rule is therefore as follows:
RULE name
PROVIDES statements
The example below shows a rule with one moderately complex set of statements:
RULE Freedom of Information Act 1982 (Cth) s11 PROVIDES
a person has a legally enforceable right under s11 to obtain access to a document ONLY IF
s11(a) applies OR
s11(b) applies
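The RULE name PROVIDES statements layout is regular enough to be split mechanically. The Python sketch below is offered only as an illustration of that structure; it is not the DataLex parser, and `RULE_PATTERN` and `split_rules` are hypothetical names. It assumes, as stated above for descriptors, that the upper-case word RULE never occurs inside a rule body.

```python
import re

# Illustrative sketch only: splits knowledge-base text into (name, body) pairs
# using the RULE and PROVIDES keywords. Not the DataLex parser.
RULE_PATTERN = re.compile(
    r"\bRULE\s+(.*?)\s+PROVIDES\s+(.*?)(?=\bRULE\b|\Z)", re.DOTALL
)


def split_rules(text):
    """Return a list of (rule name, rule body) tuples found in `text`."""
    return [(name.strip(), body.strip()) for name, body in RULE_PATTERN.findall(text)]


kb_text = """RULE Freedom of Information Act 1982 (Cth) s11 PROVIDES
a person has a legally enforceable right under s11 to obtain access to a document ONLY IF
s11(a) applies OR
s11(b) applies
"""

rules = split_rules(kb_text)
name, body = rules[0]
print(name)
print(body.splitlines()[-1])
```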
2.3 Content of rules - keywords and descriptors
Knowledge-base rules consist of keywords and descriptors.
Keywords are used to join together, in a logical form, a number of descriptors, which are simply terms or phrases used to describe some object, event etc.
2.3.1 Keywords
Keywords give rules the logical structure used by DataLex to draw inferences. They are written in FULL UPPER CASE so DataLex can distinguish them from their equivalents in ordinary words (which may occur in descriptors).
Some examples of important keywords, or sets of keywords, are: ONLY IF; IF ... THEN; IF ... THEN ... ELSE; IS; AND; OR; PLUS; MINUS; PERSON; THING.
These and other keywords have functions in a DataLex knowledge-base which are very similar to their normal linguistic functions as words. This correspondence is a large part of what gives DataLex a 'quasi natural language' or 'English like' syntax.
There is a list of keywords which may be used with DataLex at the end of this Chapter.
DataLex is very case-sensitive. It expects keywords to be in FULL UPPER CASE.
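The effect of case-sensitivity can be seen in a small Python sketch. It is an illustration only, not how DataLex tokenises rules: `KEYWORDS` contains just a handful of the keywords named above, and `split_statement` is a hypothetical helper that separates upper-case keywords from the descriptors around them.

```python
import re

# Illustrative sketch only. KEYWORDS holds a few of the keywords named above;
# the full DataLex keyword list is longer and is not reproduced here.
KEYWORDS = ["ONLY IF", "IF", "THEN", "ELSE", "AND", "OR", "IS"]

# Longer keywords first so that "ONLY IF" is not split into "ONLY" + "IF".
_pattern = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, KEYWORDS), key=len, reverse=True)) + r")\b"
)


def split_statement(text):
    """Split text into ('keyword', ...) and ('descriptor', ...) pieces."""
    pieces = []
    for part in _pattern.split(text):
        part = part.strip()
        if part:
            kind = "keyword" if part in KEYWORDS else "descriptor"
            pieces.append((kind, part))
    return pieces


pieces = split_statement("s11(a) applies ONLY IF the document is a document of an agency")
print(pieces)
```

In this sketch the lower-case 'is' inside the second descriptor is left alone, while the upper-case ONLY IF is recognised as a keyword. The same text with 'only if' in lower case would come back as a single descriptor.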
2.3.2 Descriptors
Descriptors may be any sequence of words or symbols but must not contain keywords (although they can contain the lower case versions of them). Descriptors are generally written in lower case, with normal capitalisation. See Chapter 3 for details of how descriptors should be written in order to work best in DataLex.
In the example below, some descriptors used are ‘a person has a legally enforceable right under s11 to obtain access to a document’, ‘s11(a) applies’ and ‘the document is not an exempt document’. These are all attributes.
There are a number of varieties of descriptors, of which the most important are (i) constants, (ii) facts or attributes, (iii) named subjects (a special type of attribute) and (iv) rule names. Each is discussed in detail in the following chapter. A type of attribute used only in documents (see Chapter 6) is called ‘text’.
First, however, a simple example of a rule, and how to make it run, is given.
2.4 Example of a rule – FOI Act s11
2.4.1 The section
The Freedom of Information Act 1982 (Cth) s11 reads:
11. Subject to this Act, every person has a legally enforceable right to obtain access in accordance with this Act to –
(a) a document of an agency, other than an exempt document; or
(b) an official document of a Minister, other than an exempt document.
A rule-base of 3 rules consisting solely of this section could read as follows. The rules have been (over-)simplified, for demonstration purposes, by ignoring the words ‘subject to this Act’ in s11.
2.4.2 A rule-base of 3 rules
RULE Freedom of Information Act 1982 (Cth) s11 PROVIDES
a person has a legally enforceable right under s11 to obtain access to a document ONLY IF
s11(a) applies OR
s11(b) applies
RULE Freedom of Information Act 1982 (Cth) s11(a) PROVIDES
s11(a) applies ONLY IF
the document is a document of an agency AND
the document is not an exempt document
RULE Freedom of Information Act 1982 (Cth) s11(b) PROVIDES
s11(b) applies ONLY IF
the document is an official document of a Minister AND
the document is not an exempt document
2.5 Running and de-bugging a DataLex application
Text, such as that above, is all that is needed for a valid knowledge-base. The knowledge-base can be invoked as a DataLex session by selecting the ‘Run Consultation’ button.
If a knowledge-base does not behave as intended, go back to the editing page, edit the rule, and run it again. The main purpose of the type/paste window on the manual start page is to allow the developer to test minor changes to rules without having to create a new web page each time in order to do so.
2.5.1 Debugging
In addition to the ‘Run Consultation’ button, there are two further buttons which allow you to check for some types of errors in your knowledge-base, either before you try to run it, or after you do so and it does not perform quite as expected. They are ‘Check Fact Cross References’ and ‘Check Fact Translations’.
There is also another debugging tool that can be used while the application is running, Verbose Mode (see 6.11.2).
2.5.2 Check Fact Cross References
Use of similarly named but not identically named attributes is one of the main causes of errors in DataLex knowledge-bases, particularly where rules which are supposed to chain do not do so. The ‘Check Fact Cross References’ button allows you to check for such errors.
The ‘Check Fact Cross References’ button causes each fact/attribute to be printed (in alphabetical order, except where it begins with a hypertext link), showing the names of rules which set (*) and rules which use (-) the attribute. Named subjects are also listed.
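To see why such a listing exposes mismatched attribute names, consider the following Python sketch. It is not the DataLex report itself: the `rules` data is hand-entered and hypothetical (including a deliberate misspelling), and `cross_reference` is an assumed helper that only borrows the (*) and (-) markers described above.

```python
# Illustrative sketch only: a cross-reference listing in the spirit of the
# report described above. The rule data below is hand-entered and hypothetical.
rules = {
    "s11": {"sets": ["a person has a right of access"],
            "uses": ["s11(a) applies", "s11(b) applies"]},
    "s11(a)": {"sets": ["s11(a) applies"],
               "uses": ["the document is not an exempt document"]},
    "s11(b)": {"sets": ["s11(b) applys"],  # deliberate misspelling
               "uses": ["the document is not an exempt document"]},
}


def cross_reference(rules):
    """Map each attribute to the rules that set (*) and use (-) it."""
    table = {}
    for rule_name, refs in rules.items():
        for marker, key in (("*", "sets"), ("-", "uses")):
            for attribute in refs[key]:
                table.setdefault(attribute, []).append(marker + " " + rule_name)
    return dict(sorted(table.items()))


table = cross_reference(rules)
for attribute, refs in table.items():
    print(attribute, refs)
```

In the output, 's11(b) applies' is used but never set, and 's11(b) applys' is set but never used. A pair like this is the typical sign that two rules which were meant to chain will not do so.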
2.5.3 Check Fact Translations
Use of the ‘Check Fact Translations’ button enables you to check that your attributes are expressed correctly.
For each attribute in the knowledge-base, in the order in which they occur, it shows: (i) prompts (questions); (ii) a translation in positive form; and (iii) a translation in negative form. For example, the interrogative, positive and negative translations of the attribute ‘s11(a) applies’ are as follows:
- Does s11(a) apply?
- S11(a) applies.
- S11(a) does not apply.
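The three forms can be imitated, for two very simple attribute shapes, with the Python sketch below. This is an assumption-laden illustration and not DataLex's translation logic, which this Chapter does not describe: `BASE_FORMS` is a hand-made verb table, and `translate` handles only attributes of the form 'X is Y' or 'X applies'.

```python
# Illustrative sketch only: a naive way to derive the three forms for two
# simple attribute shapes. DataLex's own translation logic is not shown in
# this Manual and is certainly more general than this.
BASE_FORMS = {"applies": "apply"}  # hypothetical, hand-made verb table


def capitalise(text):
    """Upper-case the first character and leave the rest unchanged."""
    return text[0].upper() + text[1:]


def translate(attribute):
    """Return (question, positive, negative) for 'X is Y' or 'X <verb>'."""
    words = attribute.split()
    if "is" in words:
        i = words.index("is")
        subject, rest = " ".join(words[:i]), " ".join(words[i + 1:])
        question = f"Is {subject} {rest}?"
        negative = f"{capitalise(subject)} is not {rest}."
    elif words[-1] in BASE_FORMS:
        subject = " ".join(words[:-1])
        question = f"Does {subject} {BASE_FORMS[words[-1]]}?"
        negative = f"{capitalise(subject)} does not {BASE_FORMS[words[-1]]}."
    else:
        raise ValueError("attribute shape not handled by this sketch")
    return question, capitalise(attribute) + ".", negative


print(translate("s11(a) applies"))
```

Attributes that fall outside these two shapes raise an error in the sketch, which is a reminder that awkwardly worded attributes are the ones most worth checking with the real tool.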
2.6 Some style guidelines for DataLex applications
Although DataLex is designed to be fairly flexible, it is worth bearing in mind the following guidelines for developing rule-bases:
2.6.1 Simplicity
Aim for simplicity wherever possible. Complicated kludges and workarounds detract from the readability of the code and can have unexpected repercussions, particularly when the knowledge-base is later expanded or changed. Don't use facilities simply because they are available.
2.6.2 Isomorphism
Where the knowledge-base represents rules from a legal source document such as a piece of legislation, try to directly translate the statutory rules into DataLex rules, observing as far as possible the order and grouping of the legislative rules, and adding as little interpretation as possible. Keep other rules, such as interpretation or 'common sense' rules which do not derive directly from the legislation, in a separate part of your rule-base.
2.6.3 Small rules
Avoid large and complicated rules. Small rules are easier to understand and will assist with automatic explanations.
2.6.4 Attribute Names
Include the legal basis for attributes in their descriptors, as in ‘the layout is in "material form" as defined in s.5’. This will make for more meaningful explanations. Avoid using unnecessarily long descriptors, which make for convoluted questions and explanations. Do not use the translation and prompt options unnecessarily: first try changing the attribute name to get DataLex to handle it properly. Avoid use of embedded attributes.
2.6.5 Rule Types
Use only the default rule type unless you have a good reason for doing otherwise. Forward chaining rules and daemons should generally only be used to alter the operation of rules encompassing knowledge rather than to embody knowledge themselves.
2.6.6 Declarative Representation
Do not represent knowledge procedurally using DETERMINE and CALL statements except where unavoidable. Avoid being concerned about the actual operation of knowledge-rich rules and instead concentrate on describing the item of knowledge with which you are dealing.
Avoid relying on comments to understand your code. The code should largely be transparent. However, you can use comments to indicate what legislative provisions you have omitted from your DataLex representation.