A Practical Method of Developing Natural Language Rule Statements (Part 5)
What is this series of articles about?
This is the fifth article in a series in which I describe a practical method of developing unambiguous natural language rule statements. I've developed this method for a large Australian government agency that has selected the Business Rules Approach and the Object Management Group's Semantics of Business Vocabulary and Business Rules (SBVR)[1] as representative of best rules practice.
The story so far
We've been looking at some of the rules governing an online "Book Flights" facility provided by an airline. So far we've created a set of rule statements, the fact types on which the rules are based, and some rule statement templates and sub-templates for generating rule statements. Note that a number of rule statements in the previous article were provided without the associated fact types: as an exercise you might like to work out for yourselves what those fact types should be. All fact types, along with the rule statements, templates, and sub-templates are listed in the [sidebar].
We also made a start on a taxonomy of rules (see [2]) as well as a rule statement development method based on selection of the appropriate template and sub-template(s) for each type of rule. We shall extend this taxonomy in this article.
The font and colour conventions used in these articles reflect those in the SBVR.[3]
An erratum: enhancing a template
Up to now I've been asserting that template RT3 (reproduced below) can be used to generate rule statements RS19 to RS24 inclusive. However, as the eagle-eyed among you may have already noticed, RS20 (also reproduced below) with its (if any) phrase requires an enhancement to RT3, in the form of RT9 below.
RT3. The <term 1> that <verb phrase> each <term 2>
must be
<predicate>.
RS20. The return date (if any) specified by each flight booking request must be no earlier than the departure date specified by that flight booking request.
RT9. The <term 1> {(if any)|} that <verb phrase> each <term 2>
must be
<predicate>.
The (if any) phrase should be used in rule statements generated from this template where the subject term is optional (e.g., return date) but not where the subject term is mandatory (e.g., departure date).
Subject and focus
From examination of the templates in the [sidebar] it should be clear that each template has:
- a subject phrase, in each case <term 1> preceded by an article (The, A, or An), the quantifier Each, or The <set function> of {the|}
- a verb phrase in the modal form must <verb phrase>, must be, or must specify
- an object phrase.
It is important that the subject of each rule statement refer to the focus of the rule (i.e., what is governed by the rule) rather than some other aspect of the rule.
In the case of a cardinality rule (for which the rule statement will be based on template RT1, RT2, RT6, RT7, or RT8), it might be thought that the rule governs the data item that must or must not be present. However, as we saw in the first article in this series (see [4]), making the data item the subject of the rule statement makes the rule statement ineffective. This is because, as worded, it appears to place an obligation on where to specify the data item (e.g., departure date) if there is one, but doesn't actually place any obligation on the content of a transaction or form (e.g., flight booking request) in which that data item is required.
The subject of a cardinality rule must always be the transaction or form in which the data item is required to be present or absent. By contrast, the subject of a data content rule statement (using template RT4 or RT9) must always be the data item (or a set function of that data item) that is constrained.
The importance of making the appropriate concept the subject of a rule statement is highlighted by the following malformed rule statement, which might be produced to ensure that heavy bags are suitably marked so as to avoid injury to baggage handlers:
RS36. A 'heavy bag' label must be affixed to a bag that weighs more than 20kg.
What this rule statement actually mandates is what to do if you have a 'heavy bag' label rather than what to do if you have a bag weighing over 20kg. To put it another way, a 'heavy bag' label not affixed to a bag violates this rule statement, whereas a bag weighing over 20kg without a label doesn't violate this rule statement. The correct formulation of a rule statement to reflect the actual rule is:
RS37. Each bag that weighs more than 20kg must be labelled with a 'heavy bag' label.
Although a 'heavy bag' label can be considered as an item of data, this is in fact a process rule rather than a data rule. We shall look at process rules and their templates in a later article. In the meantime, however, remember that each rule statement is based on fact types, in this case:
FT34. bag weighs weight
FT35. bag is labelled with label
FT36. 'heavy bag' label is a category of label
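The difference between RS36 and RS37 can be sketched in code. The following is only an illustration, not part of the method: the `Bag` class, its field names, and both function names are assumptions made for this example. A check that starts from each bag (the subject of RS37) reports an unlabelled heavy bag, whereas a check that starts from each label (the RS36 reading) never sees it.

```python
from dataclasses import dataclass, field

HEAVY_BAG_THRESHOLD_KG = 20  # threshold stated in RS36 and RS37


@dataclass
class Bag:
    # Hypothetical representation of FT34 (bag weighs weight)
    # and FT35 (bag is labelled with label).
    weight_kg: float
    labels: set = field(default_factory=set)


def violates_rs37(bag):
    """RS37 reading: the bag is the subject, so every heavy bag is examined."""
    return bag.weight_kg > HEAVY_BAG_THRESHOLD_KG and "heavy bag" not in bag.labels


def violations_under_rs36(bags):
    """RS36 reading: the label is the subject, so only labelled bags are examined."""
    return [b for b in bags
            if "heavy bag" in b.labels and b.weight_kg <= HEAVY_BAG_THRESHOLD_KG]


bags = [Bag(25.0), Bag(25.0, {"heavy bag"}), Bag(10.0)]
print([violates_rs37(b) for b in bags])  # the unlabelled 25kg bag is flagged
print(violations_under_rs36(bags))       # the same bag goes unnoticed here
```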
More complex range predicates
If a flight booking request complies with all relevant rules, it is acceptable as input to a process that will display at least some of the flights between the specified cities on the specified date(s). The flight arranger can then fill in a flight booking confirmation, specifying, among other things, the actual flights.
If the return flight is on the same day as the outgoing flight, there must be a minimum interval (typically 1 hour) between the arrival of the outgoing flight and the departure of the return flight. This can be expressed using rule statement RS38. To help readers understand the structure of this rule statement, "that is" has been included, in square brackets, in each place from which it might be omitted; the rule statement reads more naturally if it is omitted:
RS38. The departure time [that is] of the return flight (if any) [that is] specified by each flight booking confirmation must be no earlier than 1 hour after the arrival time [that is] of the outgoing flight [that is] specified by that flight booking confirmation.
The most likely template from which to generate this rule statement is RT9 (see above) but it requires:
- an enhancement (to RT9 itself) to allow for the term representing the subject data item to be qualified;
- an enhancement to the range predicate syntax provided by sub-template ST5 (reproduced below).
ST5. <range predicate> ::=
{no|} {greater|less|later|earlier} than
{<literal>|the <term> {<qualifying clause>|}}
Template RT9 only allows for a subject phrase consisting of a term representing a data item (e.g., departure time) qualified by a term representing a transaction (e.g., flight booking confirmation). Rule statement RS38 requires that the term representing the data item be firstly qualified in terms of something other than the transaction (in this case, which of the two flights we are talking about).
It is therefore necessary to replace RT9 with RT10, which provides for the additional qualifying clause.
RT10. The <term 1> {<qualifying clause>|} {(if any)|}
that <verb phrase> each <term 2>
must be
<predicate>.
Sub-template ST5 allows for comparison with either a literal (e.g., one or today) or a term, possibly qualified, referring to some other data item. Rule statement RS38 requires comparison with a time displaced from a particular defined time, i.e.,
{no|} {later|earlier} than <literal> {after|before} the <term> {<qualifying clause>|}.
The most natural formulation for non-time comparisons of this kind is, however, slightly different, taking the form
at {least|most} <literal> {more|less} than the <term> {<qualifying clause>|}.
It is therefore necessary to replace ST5 with ST16 – ST18, which provide for the different formulations for time and non-time predicates.
ST16. <range predicate> ::=
{<time range predicate>|<non-time range predicate>}
ST17. <time range predicate> ::=
{no|} {later|earlier} than
{<literal>|
{<literal> {after|before}|} the <term> {<qualifying clause>|}}
ST18. <non-time range predicate> ::=
{{no|} {more|less} than <literal>|
{{no|} {more|less} than|at {least|most} <literal> more than} the <term> {<qualifying clause>|}}
Sub-template ST17 allows for the following broad types of time range predicate:
• {no|} {later|earlier} than <literal>
• {no|} {later|earlier} than the <term> {<qualifying clause>|}
• {no|} {later|earlier} than <literal> {after|before} the <term> {<qualifying clause>|}
By contrast sub-template ST18 allows for the following broad types of non-time range predicate:
• {no|} {more|less} than <literal>
• {no|} {more|less} than the <term> {<qualifying clause>|}
• at {least|most} <literal> more than the <term> {<qualifying clause>|}
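The brace notation used in these sub-templates, in which {a|b} offers a choice and a trailing | makes the choice optional, is regular enough to expand mechanically. The following is a minimal sketch of such an expansion, useful for checking that a sub-template generates exactly the wordings intended. The function names are hypothetical, and placeholders such as <literal> are simply left unexpanded.

```python
from itertools import product


def expand(template):
    """Enumerate every wording that a {a|b|} style sub-template can generate."""
    results, pos = _alternatives(template, 0)
    if pos != len(template):
        raise ValueError("unbalanced braces in template")
    return [" ".join(r.split()) for r in results]  # tidy up whitespace


def _alternatives(text, pos):
    """Parse 'sequence|sequence|...' up to an unmatched '}' or the end of text."""
    options = []
    while True:
        sequences, pos = _sequence(text, pos)
        options.extend(sequences)
        if pos < len(text) and text[pos] == "|":
            pos += 1
            continue
        return options, pos


def _sequence(text, pos):
    """Parse plain text and nested {...} groups; return every combination."""
    parts = [[""]]
    while pos < len(text) and text[pos] not in "|}":
        if text[pos] == "{":
            group, pos = _alternatives(text, pos + 1)
            pos += 1  # step over the closing '}'
            parts.append(group)
        else:
            end = pos
            while end < len(text) and text[end] not in "{|}":
                end += 1
            parts.append([text[pos:end]])
            pos = end
    return ["".join(combo) for combo in product(*parts)], pos


# ST17, written on one line
ST17 = ("{no|} {later|earlier} than "
        "{<literal>|{<literal> {after|before}|} the <term> {<qualifying clause>|}}")

for wording in expand(ST17):
    print(wording)
```

Running the same function over ST18 gives a quick way to confirm that "at {least|most} <literal> less than" is not among the wordings generated.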
A similar constraint applies to the departure time of the outgoing flight, in that an outgoing flight that departs in less than a certain interval from the time of making the booking cannot be selected. This interval will, in practice, vary depending on whether the booking is made online, over the phone, or at the airport (in the latter two situations some discretion on the part of the sales representative may be allowed). Considering for the moment only the online situation, the necessary constraint can be expressed using rule statement RS39:
RS39. The departure time of the outgoing flight specified by each flight booking confirmation that is made online must be no earlier than 3 hours after the booking confirmation time of that flight booking confirmation.
Template RT10 only allows for a subject phrase consisting of an optionally qualified term representing a data item (e.g., departure time), qualified in turn by a term representing a transaction (e.g., flight booking confirmation). Rule statement RS39 requires that the term representing the transaction also be qualified (in this case, in terms of the channel by which the transaction is executed).
It is therefore necessary to replace RT10 with RT11, which provides for the additional qualifying clause.
RT11. The <term 1> {<qualifying clause>|} {(if any)|}
that <verb phrase> each <term 2> {<qualifying clause>|}
must be
<predicate>.
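To see how the optional elements of RT11 fit together, the template can be treated as a function whose optional parts default to nothing. This is an illustrative sketch only; the function and parameter names are assumptions, and the predicate is passed in as ready-made text rather than generated from a sub-template.

```python
def render_rt11(term1, verb_phrase, term2, predicate,
                term1_qualifier="", optional=False, term2_qualifier=""):
    """Fill in RT11, leaving out whichever optional elements are not supplied."""
    parts = [
        "The", term1,
        term1_qualifier,                  # {<qualifying clause>|}
        "(if any)" if optional else "",   # {(if any)|}
        "that", verb_phrase, "each", term2,
        term2_qualifier,                  # {<qualifying clause>|}
        "must be", predicate,
    ]
    return " ".join(p for p in parts if p) + "."


print(render_rt11(
    "departure time",
    "is specified by",
    "flight booking confirmation",
    "no earlier than 3 hours after the booking confirmation time "
    "of that flight booking confirmation",
    term1_qualifier="of the outgoing flight",
    term2_qualifier="that is made online",
))
```

The output is RS39 in its unabbreviated form, that is, with "that is" retained before "specified by"; dropping it, as RS39 does, is an editorial step that this sketch does not attempt.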
By the way, rule statement RS38 is based on the following fact types:
FT37. flight has departure time
FT38. flight booking confirmation specifies flight
FT39. outgoing flight is a category of flight
FT40. return flight is a category of flight
FT41. flight has arrival time
Similarly, rule statement RS39 is based on the following fact types in addition to those underlying RS38:
FT42. flight booking confirmation is made online
FT43. flight booking confirmation has booking confirmation time
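For readers who think in code, RS38 and RS39 might be checked along the following lines. This is a sketch under stated assumptions rather than a prescribed implementation: the class and attribute names are invented for the example (loosely following FT37 to FT43), and "no earlier than" is taken to allow equality.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MIN_TURNAROUND = timedelta(hours=1)        # interval stated in RS38
MIN_ONLINE_LEAD_TIME = timedelta(hours=3)  # interval stated in RS39


@dataclass
class Flight:
    departure_time: datetime  # FT37
    arrival_time: datetime    # FT41


@dataclass
class FlightBookingConfirmation:
    outgoing_flight: Flight
    return_flight: Optional[Flight]      # optional, hence "(if any)" in RS38
    booking_confirmation_time: datetime  # FT43
    made_online: bool                    # FT42


def complies_with_rs38(confirmation):
    """Return flight (if any) departs no earlier than 1 hour after the outgoing arrival."""
    if confirmation.return_flight is None:
        return True  # no return flight, so there is nothing to constrain
    earliest = confirmation.outgoing_flight.arrival_time + MIN_TURNAROUND
    return confirmation.return_flight.departure_time >= earliest


def complies_with_rs39(confirmation):
    """Online confirmations: outgoing flight departs no earlier than 3 hours after confirmation."""
    if not confirmation.made_online:
        return True  # RS39 governs only confirmations that are made online
    earliest = confirmation.booking_confirmation_time + MIN_ONLINE_LEAD_TIME
    return confirmation.outgoing_flight.departure_time >= earliest
```

Note that each qualifying clause in the rule statement surfaces as a guard at the top of the corresponding function.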
Another ambiguous formulation
It is important to note that sub-template ST18 explicitly excludes an ambiguous formulation, namely "at {least|most} <literal> less than". Consider "at least 2 less than the number of passengers": does the quantity that is 3 less than the number of passengers comply with that predicate or not? If this predicate is intended to mean "(at least 2) less than the number of passengers", that quantity does comply with the predicate, but, if it means "at least (2 less than the number of passengers)", it doesn't. However, despite their use in mathematical formulae, parentheses are not used in this way in natural language.
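The two readings can be made concrete with a little arithmetic. In the sketch below the function names and the passenger count of 10 are assumptions chosen purely for illustration; the quantity tested is 3 less than the number of passengers, as in the example above.

```python
def complies_reading_a(quantity, passengers):
    # "(at least 2) less than": the shortfall must be 2 or more
    return passengers - quantity >= 2


def complies_reading_b(quantity, passengers):
    # "at least (2 less than ...)": the quantity must reach passengers - 2
    return quantity >= passengers - 2


passengers = 10
quantity = passengers - 3  # the quantity that is 3 less than the number of passengers

print(complies_reading_a(quantity, passengers))
print(complies_reading_b(quantity, passengers))
```

The same quantity satisfies one reading and fails the other, which is why the formulation is excluded.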
More alternative wordings
Note that in sub-template ST18 I surreptitiously replaced greater (as in ST5) with more. While greater is acceptable in a simple numeric comparison (and might be favoured by mathematicians or programmers), more is more natural in a wider variety of contexts. There are some other alternative wordings that your business stakeholders might favour. Recall ST1:
ST1. <cardinality> ::=
{exactly|at {least|most}|at least <positive integer> and at most} <positive integer>
At the government agency I've been working for I found that some preferred no more than to at most (although there was no corresponding preference for no less than rather than at least). Others preferred just rather than exactly. If consensus can be obtained, all that is required is to modify the relevant (sub-)templates accordingly, although I advise against including both alternatives in the one (sub-)template, for two reasons:
- The (sub-)templates become unnecessarily complex.
- If both wordings are allowed, it is harder to compare rule statements for duplication, overlap, or conflict.
The rule taxonomy extended
Of the many types of rules that may be required, here are those we have looked at so far:
- Cardinality rules: data must be present or absent, and/or is restricted in terms of the number of instances.
- Mandatory data rules: one or more data items are required in a particular context.
- Mandatory data item rules: a particular single data item must be present.
- Mandatory option selection rules: one of a set of pre-defined options must be specified.
- Mandatory group rules: at least one of a group of data items must be present.
- Prohibited data rules: a particular data item is not allowed in a particular context.
- Singular data rules: only one instance of a particular data item is allowed in a particular context.
- Dependent cardinality rules: the number of instances of a data item depends on some other data, as in rule statement RS25.
- Data content rules: data is constrained to certain values.
- Value set rules: a data item must have a value from a discrete set.
- Match rules: a data item must be the same as or different to some other data item.
- Range rules: a data item must have a value from within a continuous range.
For each type of rule there is a particular template or templates to be used. There may be metarules as to which particular template should be used in what circumstance or what substitution(s) of particular syntactic elements are allowed. These are set out in the accompanying table.
Rule type | Template | Metarules |
Mandatory data item rule | RT1 | <cardinality> ::= |
Mandatory option selection rule | RT5 | |
Mandatory group rule | RT6 | if either or both of 2 items in the group |
 | RT7 | if only 1 of 2 items in the group |
 | RT8 | if more than 2 items in the group |
Prohibited data rule | RT2 | |
Singular data rule | RT1 | <cardinality> ::= {exactly|at most} one |
Dependent cardinality rule | RT4 | <set function> ::= number |
Value set rule | RT11 | <predicate> ::= <value set predicate> |
Match rule | RT11 | <predicate> ::= <match predicate> |
Range rule | RT11 | <predicate> ::= <range predicate> |
To be continued...
In subsequent articles we will look at further rule types, templates, and sub-templates, including those for conditional clauses. We will also explore some techniques for rule statement quality assessment, including identification of redundant and conflicting rule statements.
References
[1] Semantics of Business Vocabulary and Business Rules (SBVR), v1.0. Object Management Group (Jan. 2008). Available at http://www.omg.org/spec/SBVR/1.0/PDF
[2] Graham Witt, "A Practical Method of Developing Natural Language Rule Statements (Part 2)," Business Rules Journal, Vol. 10, No. 3 (Mar. 2009), URL: http://www.BRCommunity.com/a2009/b468.html
[3] The font and colour conventions used in these articles reflect those in the SBVR, namely underlined teal for terms, italic blue for verb phrases, orange for keywords, and double-underlined green for names and other literals. Note that, for clarity, less than well-formed rule statements will not use these conventions.
[4] Graham Witt, "A Practical Method of Developing Natural Language Rule Statements (Part 1)," Business Rules Journal, Vol. 10, No. 2 (Feb. 2009), URL: http://www.BRCommunity.com/a2009/b461.html