3. Rule-based inferencing (II): Descriptors, keywords and expressions
Rules in knowledge-bases consist of keywords and descriptors.
Keywords are used to join together, in a logical form, a number of descriptors, which are simply terms or phrases used to describe some object, event etc.
Descriptors may be any sequence of words or symbols but must not contain keywords (although they can contain the lower case versions of them). Descriptors are generally written in lower case, with normal capitalisation. The most important types of descriptors, discussed in this Chapter, are (i) constants, (ii) facts or attributes, (iii) named subjects (a special type of attribute) and (iv) rule names.
3.1 Attributes (or ‘Facts’)
An attribute is any descriptor (a sequence of words or symbols which does not contain a keyword) which is not a constant (see below). The purpose of attributes is to hold values which are determined during the evaluation of a knowledge-base. Every sequence of text in a knowledge-base which is not a keyword or a constant must therefore make sense as something which has a value. (Chapter 6 explains an exception, where text follows the keyword TEXT.)
Attributes are also referred to as ‘facts’ in the DataLex error messages.
Attributes and other descriptors have a maximum length of 4096 characters.
3.1.1 Consistent naming of attributes
Consistent naming of attributes, including consistency in capitalisation and punctuation, is vital to DataLex's operation. DataLex does not forgive inconsistency.
Lack of consistency is the principal cause of DataLex applications running other than as expected. Use the ‘Check Fact Cross References’ button to check for possible inconsistencies in naming of attributes.
DataLex cannot tolerate inconsistencies in either capitalisation or punctuation.
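The kind of inconsistency at issue can be illustrated with a minimal Python sketch. It is not how the ‘Check Fact Cross References’ button works internally; the function name `find_near_duplicates` and the normalisation rule (ignore case, punctuation and extra spaces) are assumptions made purely for illustration.

```python
import re
from collections import defaultdict

def find_near_duplicates(names):
    """Group attribute names that differ only in case, punctuation or spacing."""
    groups = defaultdict(set)
    for name in names:
        # Hypothetical normalisation: lower-case, drop punctuation, collapse spaces.
        key = re.sub(r"[^\w\s]", "", name.lower())
        key = " ".join(key.split())
        groups[key].add(name)
    # A key with more than one distinct spelling is a likely naming slip.
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}

names = [
    "the claimant satisfies s23(1)",
    "the Claimant satisfies s23(1)",
    "the claimant satisfies s.23(1)",
    "section 9 applies",
]
suspects = find_near_duplicates(names)
for variants in suspects.values():
    print("Possibly the same attribute:", variants)
```

Each of the three spellings of the first attribute would be treated by DataLex as a separate attribute, which is why such slips cause unexpected behaviour.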
3.1.2 Boolean (true/false) attributes and their names
The default attribute type is boolean (that is, true/false). When naming boolean attributes, you should choose a name starting with a subject, then a verb (expressed in the positive or negative) and, optionally, an object.
For example, each of the following is a boolean attribute, correctly expressed:
Subject | Verb | Object |
the claimant | satisfies | s23(1) |
the circuit layout | is | in material form |
section 9 | applies | |
section 9 | does not apply | to bills of exchange |
The purpose of the recommended subject/verb/object form is explained below in relation to the generation of questions and explanations (see 3.3 Generating questions and explanations).
3.1.3 Non-boolean attributes - types
DataLex recognises the following attribute types:
Type | Values | Example |
BOOLEAN | T/F/U (default type) | See above |
INTEGER | whole numbers only | the number of applicants |
REAL | fractions accepted | the number of degrees tolerance |
STRING | a string of text | the alleged defamatory statement |
GENDER | male, female or unspecified | the gender of the claimant |
DOLLAR | dollars and cents | the value of the estate |
DATE | a date | the date of the intestate's death |
Non-boolean attributes are introduced in one of two ways: (i) automatically by use; or (ii) formally by a declaration.
3.1.4 Automatic type recognition of non-boolean attributes
If the first use of an attribute in a knowledge-base requires DataLex to recognise it as something other than boolean, that type is automatically associated with it. From then on, you must use the attribute consistently or an error message will result. In other words, DataLex is able to make an 'intelligent guess' about the type of non-boolean attribute that is intended, based on other aspects of the expression it is first found in. For example, in the expression 'IF the date of arrival IS GREATER THAN 1 May 1977', DataLex is able to work out that the attribute 'the date of arrival' is probably a non-boolean attribute of type DATE, because another date (1 May 1977) appears in conjunction with a relational operator.
While DataLex is generally accurate in recognising non-boolean attributes, it sometimes makes an error. This may be avoided or corrected by an explicit declaration of the type of the attribute.
The syntax for formal declarations is:
TYPE attribute-name
optionally followed by a list of translations and valid ranges (discussed below).
For example, to declare attributes to be of the types ‘DATE’ and ‘DOLLAR’:
DATE the date of the intestate's death
DOLLAR the value of the estate
Because of these declarations, or because of automatic recognition, DataLex would only accept responses from a user that were of the specified types.
Attribute declarations should appear outside of rules and procedures. Otherwise, they can appear anywhere in a knowledge-base, provided they appear somewhere in the knowledge-base prior to where the attribute is first used. It is often convenient to group them all at the start.
3.1.6 Range limitation of attribute values [advanced]
If there is a need to further limit the range of acceptable responses from the user (eg to dates only within a specified period, or to amounts less than a certain maximum), a RANGE statement is available.
The syntax is:
RANGE expression [TO expression]
The statement should appear immediately after a fact declaration. It may be used multiple times if there are many valid ranges. Where the optional TO expression is used, it indicates that the value for the fact should be between the result of the first expression and the result of the second expression. However, in this case the expressions must produce numeric results.
Some examples of RANGE statements are:
STRING the name of the intelligence agency
RANGE "ASIO"
RANGE "ASIS"
RANGE "DSD"
DOLLAR the value of the household chattels
RANGE 0 TO the value of the estate
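The effect of several RANGE statements on one attribute can be pictured with a minimal Python sketch. It only models the idea described above and is not DataLex code; the function name `in_ranges`, the representation of a range as a `(low, high)` pair, and the treatment of "between" as inclusive of both end points are all assumptions.

```python
def in_ranges(value, ranges):
    """Return True if value satisfies at least one declared range.

    Each range is a (low, high) pair; high is None for a single-value RANGE.
    """
    for low, high in ranges:
        if high is None:
            if value == low:          # RANGE expression
                return True
        elif low <= value <= high:    # RANGE expression TO expression (assumed inclusive)
            return True
    return False

# STRING the name of the intelligence agency, with three single-value ranges
agency_ranges = [("ASIO", None), ("ASIS", None), ("DSD", None)]

# DOLLAR the value of the household chattels, RANGE 0 TO the value of the estate
estate_value = 250000
chattel_ranges = [(0, estate_value)]

print(in_ranges("ASIO", agency_ranges))    # accepted response
print(in_ranges(300000, chattel_ranges))   # rejected: exceeds the estate
```

Note that the upper bound in the second example is itself an attribute, so in this model it has to be known before the response can be checked.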
3.1.7 Attribute names for non-boolean attributes
You must choose an attribute name which can be followed by an ‘is’ then a value so that DataLex can correctly provide prompts and translations. For example, the non-boolean attribute declarations given above will correctly result in the following prompts and (when answered) translations:
DATE the date of the intestate's death
What is the date of the intestate's death ?
The date of the intestate's death is 1st January 1991.
DOLLAR the value of the estate
What is the value of the estate ?
The value of the estate is $250,000.
3.2 Constants
Whereas attributes have a variable value which is determined during the evaluation of a knowledge-base, a constant has a fixed value. DataLex recognises any of the following descriptors as constants: an integer (eg 1000), a real number (eg 7.15), a dollar amount (eg $950 or $950.00), the words ‘true’ and ‘false’ (boolean constant) and the words ‘male’ and ‘female’ (gender constant), a date (in any sensible format including the word today), and any descriptor placed in double quotes (a string constant).
DataLex is generally able to automatically recognise constants, and to give them the correct type. If DataLex does not recognise a descriptor as being in any of these categories of constant, it assumes that the descriptor is an attribute.
Constants are used primarily in expressions which use binary operators (eg PLUS; EQUALS; IS LESS THAN; IN) and in assignment statements (see below).
3.3 Generating questions and explanations
One of DataLex's main features is its capacity to automatically generate questions (prompts) by re-parsing the attribute that it is attempting to find a value for, into an interrogative form (ie by re-parsing the part of the rule it is at present evaluating). Similarly, it can provide explanations by re-parsing rules that it has previously evaluated, substituting the values that it has established for those rules.
3.3.1 Automatic generation of questions (prompts) and explanations
Provided that boolean attribute names appear in the subject/verb/object form explained above (see 3.1.2 Boolean (true/false) attributes and their names), or non-boolean attribute names appear in the 'is' form explained above (see 3.1.7 Attribute names for non-boolean attributes), DataLex will normally be able to effect sensible translations automatically, for use during problem sessions. For the above examples, the following automatic prompts and translations would be generated by DataLex:
Does the claimant satisfy s23(1) ?
The claimant satisfies s23(1).
The claimant does not satisfy s23(1).
Is the circuit layout in material form ?
The circuit layout is in material form.
The circuit layout is not in material form.
Does section 9 apply ?
Section 9 applies.
Section 9 does not apply.
What is the date of the intestate's death ?
The date of the intestate's death is 1st January 1991.
Use the ‘Check Fact Translations’ button to check whether sensible prompts and translations are being generated.
DataLex re-parses all boolean attribute names into a consistent positive form for storage purposes, and so recognises different grammatical forms of the same attribute. For example, the following statements all refer to the same attribute:
the Act applies
the Act does not apply
the Act does apply
It therefore does not matter which form you use in a rule, as DataLex will normally understand that you are referring to the same attribute. In other words, different forms of the attribute can be used in different rules.
3.3.3 Verbs declarations - correcting DataLex's grammar
While DataLex's ability to ‘understand’ and generate different grammatical forms of the same attribute is reasonably sophisticated, it sometimes makes errors in translating verbs into different tenses.
For boolean attribute names, the most important component of the describing sentence is the verb. DataLex only knows about a list of about 1,000 common verbs. For the remainder it simply makes an educated guess (ie it uses a fairly simple set of heuristic rules concerning the behaviour of verbs). DataLex has to be able to locate the verb and transform its plurality and tense.
Where DataLex makes a mistake, its behaviour can be altered by a declaration specifying that a word is a verb and giving the appropriate forms. The syntax for this is:
VERBS { present-tense|plural|past-tense|past-participle }
as in:
VERBS
oversee|oversees|oversaw|overseen
sell|sells|sold|sold
This declares that the forms of ‘oversee’ are ‘oversees’ (plural), ‘oversaw’ (past), and ‘overseen’ (past-participle).
This verbs declaration should appear outside of other declarations such as rules and procedures and should appear at the start of a knowledge-base.
Where DataLex cannot recognise a verb, this can sometimes be remedied by putting the word 'will' or 'does' in front of the verb in the attribute, because DataLex recognises 'will .......' as a compound verb.
3.3.4 Adding your own attribute translations - PROMPT and TRANSLATE [advanced]
One of the main purposes of DataLex's automatic re-parsing of rules to produce prompts and explanations is so that there is normally no need to maintain separate bodies of text for each attribute, with all the complications this implies for development and maintenance.
However, if the automatic parsing performed by DataLex is inadequate for some reason, it is possible to ‘override’ it and to declare what the prompt and translation should be for a particular attribute.
For example, the attribute ‘the date of death of the intestate’ would normally generate the prompt ‘What is the date of death of the intestate?’ and the translation would be ‘The date of death of the intestate is ....’. This can be altered by adding PROMPT and TRANSLATE statements after an attribute type declaration for the attribute. For example:
DATE the date of death of the intestate
PROMPT when did the intestate die
TRANSLATE AS the intestate died on <>
The use of angle brackets (ie <>) without an attribute name causes the value of the attribute being evaluated to be substituted.
Where an attribute has more than one possible value, different translations for each value may be provided. For example:
INTEGER the number of surviving children
PROMPT how many children survived the intestate
TRANSLATE 0 AS no children survived the intestate
TRANSLATE 1 AS one child survived the intestate
TRANSLATE AS <> children survived the intestate
Where no value appears (as in the last TRANSLATE statement above) this is used as the default translation for values which do not match any of the other TRANSLATE statements.
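The way value-specific and default TRANSLATE statements interact can be sketched in a few lines of Python. This is only a model of the behaviour described above, not DataLex's implementation; the function name `translate` and the dictionary layout are assumptions.

```python
def translate(value, specific, default):
    """Pick the translation for a value, falling back to the default template.

    specific maps particular values to translations (TRANSLATE value AS ...);
    default is the template from a TRANSLATE AS statement with no value.
    """
    template = specific.get(value, default)
    # <> stands for the value of the attribute being evaluated.
    return template.replace("<>", str(value))

specific = {
    0: "no children survived the intestate",
    1: "one child survived the intestate",
}
default = "<> children survived the intestate"

for n in (0, 1, 3):
    print(translate(n, specific, default))
```

Only the value 3 falls through to the default template, where it replaces the angle brackets.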
Avoid using your own attribute prompts or translations if possible. DataLex knowledge-bases are easier to maintain if translations are automatic.
3.4 Named subjects - names of people and things
Attribute descriptors often contain references to persons and things as their subjects (eg ‘the intestate’, ‘the property’). By default, the generated prompts and translations just use these embedded subject descriptions literally. If you wish, you can have these automatically replaced with names, pronouns and possessives. Subjects which are to be treated in this way are referred to as named subjects.
The use of named subjects allows you to instantiate the dialogues that DataLex generates, making them appear much more responsive to the answers you have already given.
Use named subjects wherever possible, as they improve communication.
3.4.1 Named subject declarations
Named subjects are a set of special attributes. They are declared in the same way as attributes, but are given the types PERSON, THING or PERSONTHING. When an attribute containing a defined subject is first evaluated, automatic prompts for a subject name and, in the case of persons, the subject's sex, will be issued. Where the type is PERSONTHING, the subject may be either a person or a thing (eg where either a natural person or a company may be a subject). A prompt (Is x a natural person ?) will be issued to determine this.
Examples:
PERSON the claimant
THING the agreement
PERSONTHING the first party
PERSON the intestate
Once a named subject is declared, DataLex will recognise it as a named subject in any subsequent part of the rule-base, without need for any further identification of it as such. Named subjects referred to in other attributes are recognised automatically, and their values are substituted in the other attributes.
For example, where there have been named subject declarations such as the ones above, an attribute in a rule such as 'the claimant has made a statutory declaration concerning the agreement' would generate a prompt such as 'Has John Smith made a statutory declaration concerning the Contract of Insurance?'.
3.4.2 The automatic attribute declarations [advanced]
When a named subject is declared, it results in up to another three automatic attribute declarations. These take the following forms:
the name of subject (set for all types)
the gender of subject (set for PERSONs and PERSONTHINGs)
subject is a natural person (set only for PERSONTHINGs)
These automatically declared attributes can be manipulated just like normal ones. The types are STRING, GENDER and BOOLEAN respectively. This allows you to work out whether or not a PERSONTHING is a natural person.
It also allows you to change the default prompts and translations, as in:
PERSONTHING the claimant
STRING the name of the claimant
PROMPT please enter the claimant's name
TRANSLATE AS the claimant's name is <>
BOOLEAN the claimant is a natural person
TRANSLATE true AS the claimant is a natural person
TRANSLATE false AS the claimant is a company
GENDER the gender of the claimant
PROMPT what is the claimant's preferred gender
TRANSLATE male AS the claimant identifies as being male
TRANSLATE female AS the claimant identifies as being female
TRANSLATE unspecified AS the claimant's preferred gender is non-binary or unspecified
3.5 Variable attributes [advanced]
An important aspect of DataLex is that it allows legal knowledge to be represented in something approaching English ('quasi natural language' knowledge representation). This is one reason why propositional logic is used as the form of representation, as opposed to predicate calculus. Predicate logic is, however, more powerful. One of its advantages is that it allows rules where there may be a number of instances of an attribute which need to be considered in the one problem session (eg the attribute 'is a child of the intestate' may be satisfied by three children, all of whom may need to be considered).
Variable attributes have been introduced into DataLex as an experimental way of dealing with such problems. However, they detract from the 'English-like' nature of the syntax and should only be used sparingly.
A variable attribute is allowed to contain one (only) variable element, which is represented as <>. Whenever DataLex encounters this <> symbol in a rule, it looks for instances of the attribute in other rules which are identical except that they have the variable element 'filled in'. These instances of the variable element are then 'read into' the rule under consideration. In effect, DataLex creates multiple versions of the rule under consideration, one for each instance of the variable element being satisfied. In any expression containing the <> variable, each instance of the <> variable will be given the same value. DataLex then proceeds to process whichever version of the rule is satisfied on the facts given. A variable attribute is therefore a shorthand way of writing multiple rules with slightly different wordings.
For example, s32(4) of the Copyright Act 1968 (Cth) specifies whether a person is a 'qualified person' in determining whether a work is protected by copyright. Various different timing and other conditions can satisfy the requirements for a 'qualified person'. The rule below shows that only one rule need be written to capture this.
RULE Copyright Act 1968 s32(4) PROVIDES
the author was a 'qualified person' <> under s32(4) ONLY IF
the author was an Australian citizen <> OR
the author was an Australian protected person <> OR
the author was a person resident in Australia <>
If the system needs to determine at any time a value for the attribute 'the author was a 'qualified person' at the time the work was made under s32(4)' (emphasis added), in order to process another rule, the above rule will cause the following questions to be asked:
Was the author an Australian citizen at the time the work was made under s32(4)? [emphasis added]
Was the author an Australian protected person at the time the work was made under s32(4)? [emphasis added]
Was the author a person resident in Australia at the time the work was made under s32(4)? [emphasis added]
If the answer to any of these is 'yes', the rule will fire and the attribute 'the author was a 'qualified person' at the time the work was made under s32(4)' will obtain a 'true' value.
Similarly, if the system needs to know a value for the attribute, 'the author was a 'qualified person' for a substantial part of the period during which the work was made under s32(4)' (emphasis added), the rule will ask the appropriate questions to obtain a value for this attribute.
In other words, one variable rule can be used to obtain values for numerous similar but not identical attributes which have similar conditions for their satisfaction.
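The idea of one variable rule standing in for several concrete rules can be modelled with a minimal Python sketch. It is not DataLex's matching algorithm; the helper names `match_variable` and `instantiate` are hypothetical, and the sketch carries across only the text matched by <>, so the exact wording DataLex gives its generated prompts may differ.

```python
import re

def match_variable(template, attribute):
    """Return the text that fills <> in template, or None if there is no match."""
    head, tail = template.split("<>")   # a variable attribute has exactly one <>
    pattern = re.escape(head) + r"(.+)" + re.escape(tail)
    m = re.fullmatch(pattern, attribute)
    return m.group(1).strip() if m else None

def instantiate(template, filler):
    """Fill the <> element of a condition with the matched text."""
    return template.replace("<>", filler).strip()

conclusion = "the author was a 'qualified person' <> under s32(4)"
conditions = [
    "the author was an Australian citizen <>",
    "the author was an Australian protected person <>",
    "the author was a person resident in Australia <>",
]

wanted = "the author was a 'qualified person' at the time the work was made under s32(4)"
filler = match_variable(conclusion, wanted)
concrete = [instantiate(c, filler) for c in conditions]
for c in concrete:
    print(c)
```

Asking for a differently worded instance of the conclusion (for example, one referring to a substantial part of the period during which the work was made) would yield a different filler and therefore a different set of concrete conditions from the same rule.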
Variable attributes should only be used sparingly and with considerable care.
3.6 Expressions - the use of operators
An expression consists of attribute and constant references, connected by operators (types of keywords). Expressions are used to build more complex statements. Attribute names and constants have already been discussed. Operators either describe relationships between two attributes or constants (in the case of binary operators), or transform an existing attribute (in the case of unary operators). The available operators (in order of precedence) are:
3.6.1 (Pre) Unary Operators
Operator | Meaning |
NOT | boolean NOT |
DAY | extract day from date |
MONTH | extract month from date |
YEAR | extract year from date |
3.6.2 (Post) Unary Operators
Operator | Meaning |
DAYS | date days multiplier |
WEEKS | date weeks multiplier |
MONTHS | date months multiplier |
YEARS | date years multiplier |
3.6.3 Binary Operators
Operator | Meaning |
DIVIDED BY | arithmetic division |
TIMES | arithmetic multiplication |
PLUS | arithmetic addition |
MINUS | arithmetic subtraction |
IN | relational in (substring) |
EQUALS | relational equality |
NOT EQUALS | relational inequality |
IS GREATER THAN | relational greater than |
IS LESS THAN | relational less than |
IS GREATEREQUAL THAN | relational greater than or equal |
IS LESSEQUAL THAN | relational less than or equal |
AND | boolean conditional AND |
OR | boolean conditional OR |
(The normal AND and OR; AND has higher binding strength than OR; DataLex ceases evaluation of expressions where an ‘AND’ condition fails or an ‘OR’ condition is satisfied, and does not evaluate the other arguments in the expression.)
AND/OR | boolean conditional OR (high binding) |
(A special OR with a higher binding strength than AND; use instead of BEGIN-END pairs to ensure the order of evaluation.)
AND/WITH | boolean non-conditional AND |
OR/WITH | boolean non-conditional OR |
AND/OR/WITH | boolean non-conditional OR (high binding) |
(Special AND and OR operators where DataLex continues to evaluate the other arguments in the expression even though an ‘AND’ condition fails or an ‘OR’ condition is satisfied; used to force DataLex to evaluate all alternatives.)
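The difference between the conditional operators (AND, OR) and the non-conditional ones (AND/WITH, OR/WITH) can be pictured with a minimal Python sketch. It is a model of the behaviour described above, not DataLex code; the function name `evaluate` and the use of zero-argument callables to stand for attributes that may need to be asked are assumptions.

```python
def evaluate(conditions, op, conditional=True):
    """Evaluate a list of conditions joined by AND or OR.

    Returns (result, number of conditions actually evaluated).
    conditional=True  models AND / OR (stop as soon as the result is settled).
    conditional=False models AND/WITH / OR/WITH (evaluate every argument).
    """
    stop_on = (op == "OR")      # OR is settled by a True, AND by a False
    result = not stop_on
    evaluated = 0
    for cond in conditions:
        evaluated += 1
        if cond() == stop_on:
            result = stop_on
            if conditional:
                break
    return result, evaluated

fails_first = [lambda: False, lambda: True, lambda: True]

print(evaluate(fails_first, "AND"))                      # stops after the first condition
print(evaluate(fails_first, "AND", conditional=False))   # evaluates all three
```

In both calls the result is the same; what changes is how many conditions are evaluated, which in a consultation corresponds to how many questions the user may be asked.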
3.6.4 Examples of the use of expressions
the year in which the layout was made PLUS 10
the date of death PLUS 50 YEARS
YEAR the date of death
the value of the estate IS GREATER THAN 0
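For readers more familiar with a general-purpose language, the four expressions above correspond roughly to the following Python. The sample values are invented for illustration, and the helper `plus_years` is hypothetical; in particular, how DataLex itself treats 29 February when adding years is not stated here, so the fallback to 28 February is an assumption.

```python
from datetime import date

def plus_years(d, years):
    """Add whole years to a date (assumed behaviour for 29 February: use 28 February)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

# Invented sample values standing in for attribute values.
year_layout_made = 1985
date_of_death = date(1991, 1, 1)
value_of_estate = 250000

expiry_year = year_layout_made + 10            # the year in which the layout was made PLUS 10
fifty_years_on = plus_years(date_of_death, 50) # the date of death PLUS 50 YEARS
year_of_death = date_of_death.year             # YEAR the date of death
estate_positive = value_of_estate > 0          # the value of the estate IS GREATER THAN 0

print(expiry_year, fifty_years_on, year_of_death, estate_positive)
```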