A Practical Method of Developing Natural Language Rule Statements (Part 2)
What is this series of articles about?
This is the second article in a series in which I describe a practical method of developing unambiguous natural language rule statements. I've developed this method for a large Australian government agency that has selected the Business Rules Approach and the Object Management Group's Semantics of Business Vocabulary and Business Rules (SBVR)[1] as representative of best rules practice.
We have been looking at some of the rules governing an online "Book Flights" facility provided by an airline. So far we have identified the following rule statements:[2]
RS8. Each flight booking request must specify exactly one departure date.
RS9. Each flight booking request for a return journey must specify exactly one return date.
RS11. A flight booking request for a one-way journey must not specify a return date.
These are based on the following fact types:
FT1. flight booking request specifies departure date
FT2. flight booking request is for journey
FT3. return journey is a category of journey
FT4. flight booking request specifies return date
FT5. one-way journey is a category of journey
Derivable Fact Types
If two fact types have a term in common, it may be appropriate to derive a fact type from them. For example,
FT6. flight booking request is for return journey
can be derived from FT2 and FT3, and
FT7. flight booking request is for one-way journey
can be derived from FT2 and FT5.
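The derivation can be sketched in a few lines of Python. This is only an illustration: the `FactType` tuple and the `derive` function are hypothetical names introduced here, not part of SBVR or of the method itself, and the sketch assumes that the shared term is the object of the first fact type.

```python
from typing import NamedTuple

class FactType(NamedTuple):
    """Hypothetical binary fact type: <subject> <verb phrase> <object>."""
    subject: str
    verb_phrase: str
    obj: str

    def __str__(self) -> str:
        return f"{self.subject} {self.verb_phrase} {self.obj}"

CATEGORY_OF = "is a category of"

def derive(base: FactType, categorisation: FactType) -> FactType:
    """Replace the general term in `base` with the category named in `categorisation`."""
    if categorisation.verb_phrase != CATEGORY_OF:
        raise ValueError("second fact type must be a categorisation")
    if categorisation.obj != base.obj:
        raise ValueError("the two fact types have no term in common")
    return FactType(base.subject, base.verb_phrase, categorisation.subject)

ft2 = FactType("flight booking request", "is for", "journey")
ft3 = FactType("return journey", CATEGORY_OF, "journey")
ft5 = FactType("one-way journey", CATEGORY_OF, "journey")

ft6 = derive(ft2, ft3)  # flight booking request is for return journey
ft7 = derive(ft2, ft5)  # flight booking request is for one-way journey
print(ft6)
print(ft7)
```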
Rule Statement Templates
We noted that rule statements RS8 and RS9 are of similar form. This is a good thing: if all rule statements of a particular type use the same syntax it is easier to check whether any rules in our rulebook duplicate, overlap, or conflict with each other. The rule statement template
RT1. Each <term 1> {<qualifying clause>|} must <verb phrase 1> <cardinality> <term 2>.
can be used to generate rule statements RS8 and RS9, whereas
RT2. {A|An} <term 1> {<qualifying clause>|} must not <verb phrase 1> {a|an} <term 2>.[3]
can be used to generate rule statement RS11. In each of these statement templates:
- <term 1> and <term 2> are placeholders, in place of which terms may be substituted;
- <qualifying clause> is a placeholder, in place of which a qualifying clause may be substituted: one form of qualifying clause is "that <verb phrase 2> a <term 3>" but if <verb phrase 2> starts with "is", "that is" can be omitted;
- the symbols { | } allow for alternatives within a template: in these templates either <qualifying clause> or null may appear, i.e., <qualifying clause> is optional;
- <verb phrase 1> is a placeholder, in place of which a verb phrase may be substituted;
- <cardinality> is a placeholder in place of which a cardinality may be substituted, e.g., exactly one, at least one, at most one.
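To make the substitution concrete, here is a minimal Python sketch that treats RT1 and RT2 as string templates. The function names `rt1`, `rt2`, and `article` are assumptions made for this illustration, and the choice between "a" and "an" uses a naive first-letter test that would need refining for real vocabularies.

```python
def article(term: str) -> str:
    """Naive choice of indefinite article based on the first letter only."""
    return "an" if term[0].lower() in "aeiou" else "a"

def rt1(term1: str, verb_phrase: str, cardinality: str, term2: str,
        qualifying_clause: str = "") -> str:
    """RT1: Each <term 1> {<qualifying clause>|} must <verb phrase 1> <cardinality> <term 2>."""
    clause = f" {qualifying_clause}" if qualifying_clause else ""
    return f"Each {term1}{clause} must {verb_phrase} {cardinality} {term2}."

def rt2(term1: str, verb_phrase: str, term2: str,
        qualifying_clause: str = "") -> str:
    """RT2: {A|An} <term 1> {<qualifying clause>|} must not <verb phrase 1> {a|an} <term 2>."""
    clause = f" {qualifying_clause}" if qualifying_clause else ""
    lead = article(term1).capitalize()
    return f"{lead} {term1}{clause} must not {verb_phrase} {article(term2)} {term2}."

rs8 = rt1("flight booking request", "specify", "exactly one", "departure date")
rs9 = rt1("flight booking request", "specify", "exactly one", "return date",
          qualifying_clause="for a return journey")
rs11 = rt2("flight booking request", "specify", "return date",
           qualifying_clause="for a one-way journey")
print(rs8)
print(rs9)
print(rs11)
```

The qualifying clause is passed in its shortened form (with "that is" already omitted), which keeps the sketch small at the cost of leaving that elision to the caller.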
Fact Types required by Rule Statement Templates
In both RT1 and RT2, <term 1> <verb phrase 1> <term 2> should be a fact type (possibly derived). Thus rule statement RS8 requires fact type FT1, while RS9 and RS11 require FT4.
If <qualifying clause> in RT1 and RT2 is "that <verb phrase 2> a <term 3>", <term 1> <verb phrase 2> <term 3> should be a fact type (possibly derived). Thus rule statement RS9 requires fact type FT6, while RS11 requires FT7.
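This requirement lends itself to a mechanical check. The sketch below assumes a hypothetical fact model held as a set of (term, verb phrase, term) tuples, with verb phrases stored in their fact type form; `missing_fact_types` is a name invented for this example, and "seat preference" is a made-up term used only to show a failing case.

```python
from typing import List, Optional, Tuple

Fact = Tuple[str, str, str]

# Hypothetical fact model containing only the fact types needed by RS8, RS9, and RS11.
FACT_MODEL = {
    ("flight booking request", "specifies", "departure date"),  # FT1
    ("flight booking request", "specifies", "return date"),     # FT4
    ("flight booking request", "is for", "return journey"),     # FT6 (derived)
    ("flight booking request", "is for", "one-way journey"),    # FT7 (derived)
}

def missing_fact_types(term1: str, verb_phrase1: str, term2: str,
                       verb_phrase2: Optional[str] = None,
                       term3: Optional[str] = None) -> List[Fact]:
    """Return the fact types a rule statement needs that are absent from the fact model."""
    required: List[Fact] = [(term1, verb_phrase1, term2)]
    if verb_phrase2 is not None and term3 is not None:
        # The qualifying clause contributes a second required fact type.
        required.append((term1, verb_phrase2, term3))
    return [fact for fact in required if fact not in FACT_MODEL]

# RS9 needs FT4 and FT6, both of which are present.
rs9_missing = missing_fact_types("flight booking request", "specifies", "return date",
                                 "is for", "return journey")
# A rule about a term that is not yet in the fact model is flagged.
unknown_missing = missing_fact_types("flight booking request", "specifies",
                                     "seat preference")
print(rs9_missing)
print(unknown_missing)
```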
Generating more Rule Statements
These templates can be used for statements of many other rules governing a Flight Booking Request or Flight Booking Confirmation (in which, having selected a flight or flights returned by the Flight Booking Request, the customer submits passenger identification, payment details, and contact details):
RS12. Each flight booking request must specify exactly one origin city.
RS13. Each flight booking request must specify exactly one destination city.
RS14. Each flight booking request must specify exactly one number of passengers.
RS15. Each flight booking request must specify at most one frequent flier membership.
RS16. Each flight booking confirmation must specify at least one passenger name.
RS17. Each flight booking confirmation must specify exactly one payment arrangement.
These are based on the following additional fact types:
FT8. flight booking request specifies origin city
FT9. flight booking request specifies destination city
FT10. flight booking request specifies number of passengers
FT11. flight booking request specifies frequent flier membership
FT12. flight booking confirmation specifies passenger name
FT13. flight booking confirmation specifies payment arrangement
Information Cardinality Rules — a Summary
All of the rules we've looked at so far are concerned with two types of requirement:
- the requirement for certain data to be present (or, in the case of RS11, absent), i.e., 'mandatory data' and 'prohibited data' rules;
- the requirement that there be only one instance of a particular data item in any given transaction.
The keywords exactly one, at most one, and at least one correspond of course to the 'mandatory data' and 'cardinality' rules well known to database designers, but other forms of <cardinality> may be used, such as exactly two, at most two, at least two, at least one and at most three, and so on.
Sub-templates
We can formalise this by establishing a template for <cardinality> as follows:
ST1. <cardinality> ::=
{exactly|at {least|most}|at least <positive integer> and at most} <positive integer>
Of course we could also allow for alternative expressions such as up to three, either one or two, from one to three inclusive, and so on, but unless business stakeholders insist on these forms it is better to keep alternative ways of expressing the same rule to a minimum.
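One benefit of a single sanctioned form is that cardinalities become easy to recognise mechanically. The following Python sketch, offered only as an illustration, parses the ST1 forms into a (minimum, maximum) pair. The name `parse_cardinality`, the restriction to number words from one to ten, and the use of `None` for "no upper limit" are all assumptions of this example.

```python
import re
from typing import Optional, Tuple

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
_NUM = "|".join(NUMBER_WORDS)

# Mirrors ST1: {exactly|at {least|most}|at least <n> and at most} <n>
CARDINALITY = re.compile(
    rf"^(?:(?P<kind>exactly|at least|at most)"
    rf"|at least (?P<low>{_NUM}) and at most)"
    rf" (?P<n>{_NUM})$"
)

def parse_cardinality(text: str) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum); a maximum of None means no upper limit."""
    match = CARDINALITY.match(text)
    if match is None:
        raise ValueError(f"not a well-formed cardinality: {text!r}")
    n = NUMBER_WORDS[match["n"]]
    if match["low"] is not None:
        low = NUMBER_WORDS[match["low"]]
        if low > n:
            raise ValueError(f"minimum exceeds maximum in {text!r}")
        return (low, n)
    if match["kind"] == "exactly":
        return (n, n)
    if match["kind"] == "at least":
        return (n, None)
    return (0, n)  # "at most"

print(parse_cardinality("exactly one"))
print(parse_cardinality("at least one and at most three"))
```

An alternative expression such as "up to three" is rejected by this parser, which is the behaviour we want if such forms are to be kept out of the rulebook.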
More Complex Rules
Those among you who've been paying attention may have noticed that I've skirted around a few complexities in the rules I've listed so far.
Firstly, while the terms flight booking request, flight booking confirmation, return journey, one-way journey, payment arrangement, and frequent flier membership correspond to what a data modeller might call entities (or a UML modeller might call object classes), not all of the remaining terms correspond to simple attributes. A passenger name consists of family and given names and a salutation. And references to other entity instances may require multiple data items: for example, the airline may well reward members of multiple frequent flier schemes, in which case frequent flier membership would require not only a membership number but a loyalty scheme identifier. Similarly a payment arrangement requires specification of the type of card, the issuing company, the card number, the name on the card, the expiry (expiration) date, and the security number printed on the reverse of the card. We will look at such complex attributes and references in a subsequent article.
Another complexity is that the number of passenger names must be equal to the number of passengers. While it is tempting to try to state this rule using a variant of template RT1, this would yield a formulation that only a mathematician would love:
RS18. Each flight booking confirmation must specify exactly n passenger names where n is number of passengers.
There is a more natural formulation that we will look at in a subsequent article.
Yet another complexity is that while at least one contact number must be provided, this may be either a mobile (cellphone) number or a combination of landline numbers in the origin and destination cities. In a subsequent article we will look at rules requiring the presence of at least one (or exactly one) of a number of alternatives.
'Data Content' Rules
Of course there are other rules about the data in a Flight Booking Request:
- The departure date must be no earlier than today (some airlines may require that it be later than today).
- The return date must be no earlier than the departure date.
- The origin and destination cities must both be cities served by the airline (possibly via a code share scheme).
- The origin and destination cities must be different.
- The number of passengers must be positive.
etc., etc.
These rules are concerned with the content of data items rather than whether or not they must be present and how many there may or must be. Each of these rules constrains the set of allowable values for a particular data item, and therefore the statement of each of these rules should have as its focus the name of that data item. In each case, the constraint applies to that data item in the context of Flight Booking Request rather than any other form, so that the rule statement in each case needs to qualify the name of the data item accordingly. That is, each rule statement will have the form:
RT3. The <term 1> that <verb phrase> each <term 2> must be <predicate>.
For each of the rules we are currently working with, <verb phrase> will be "is specified by" and <term 2> will be "flight booking request". In each rule statement, as previously, "that is" can be omitted.
These rules are of three types.
- The rule requiring that both cities must be served by the airline is a common type of rule that constrains a data item to have a value from some set. Sometimes that set may be static and have only a few members (e.g., payment methods) but the set of cities served by the airline may change over time and has more than a few members, so we need to specify rather than enumerate the set.
- The rule requiring that the two cities be different is one of a common class of rules that require either that two things or parties are the same or are different.
- Each of the other rules listed above constrains some data item to have a value within a particular range.
Each of these three types of rule will need a slightly different form of predicate in its rule statement:
- A rule constraining a data item to a value from a discrete set needs a predicate of the form
"one of the <term> {<qualifying clause>|}", e.g.,
"one of the cities served by the airline".
- A rule requiring a data item to be the same as or different to some other data item needs a predicate of the form
"{the same as|different to} the <term> {<qualifying clause>|}", e.g.,
"different to the origin city specified by that flight booking request".
- A rule constraining a data item to a value within a continuous range needs a predicate of the form
"{no|} {greater|less|later|earlier} than
{<literal>|the <term> {<qualifying clause>|}}", e.g.,
"no earlier than today", "no less than one".
This can be expressed formally using sub-templates:
ST2. <predicate> ::=
{<value set predicate>|<match predicate>|<range predicate>}
ST3. <value set predicate> ::=
one of the <term> {<qualifying clause>|}
ST4. <match predicate> ::=
{the same as|different to} the <term> {<qualifying clause>|}
ST5. <range predicate> ::=
{no|} {greater|less|later|earlier} than
{<literal>|the <term> {<qualifying clause>|}}
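As a rough illustration, each predicate sub-template can be written as a small Python function that returns the predicate text. The function names and parameters below are assumptions made for this sketch; in particular, `bound_is_literal` is simply one way of choosing between the <literal> alternative and the "the <term>" alternative of the range predicate.

```python
def qualified(term: str, qualifying_clause: str = "") -> str:
    """Append the optional qualifying clause to a term."""
    return f"{term} {qualifying_clause}" if qualifying_clause else term

def value_set_predicate(term: str, qualifying_clause: str = "") -> str:
    """ST3: one of the <term> {<qualifying clause>|}"""
    return f"one of the {qualified(term, qualifying_clause)}"

def match_predicate(same: bool, term: str, qualifying_clause: str = "") -> str:
    """ST4: {the same as|different to} the <term> {<qualifying clause>|}"""
    relation = "the same as" if same else "different to"
    return f"{relation} the {qualified(term, qualifying_clause)}"

COMPARATIVES = ("greater", "less", "later", "earlier")

def range_predicate(comparative: str, bound: str, negated: bool = False,
                    bound_is_literal: bool = True,
                    qualifying_clause: str = "") -> str:
    """ST5: {no|} {greater|less|later|earlier} than {<literal>|the <term> {<qualifying clause>|}}"""
    if comparative not in COMPARATIVES:
        raise ValueError(f"unsupported comparative: {comparative!r}")
    limit = bound if bound_is_literal else f"the {qualified(bound, qualifying_clause)}"
    prefix = "no " if negated else ""
    return f"{prefix}{comparative} than {limit}"

print(value_set_predicate("cities", "served by the airline"))
print(match_predicate(False, "origin city", "specified by that flight booking request"))
print(range_predicate("earlier", "today", negated=True))
print(range_predicate("less", "one", negated=True))
```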
Inserting the appropriate predicates into template RT3 gives us the following rule statements:
RS19. The departure date specified by each flight booking request must be no earlier than today.
RS20. The return date (if any) specified by each flight booking request must be no earlier than the departure date specified by that flight booking request.
RS21. The origin city specified by each flight booking request must be one of the cities served by the airline.
RS22. The destination city specified by each flight booking request must be one of the cities served by the airline.
RS23. The destination city specified by each flight booking request must be different to the origin city specified by that flight booking request.
RS24. The number of passengers specified by each flight booking request must be no less than one.
These are based on the following fact type in addition to FT1, FT4, FT8, FT9, and FT10:
FT14. airline serves city
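To show how rule statements RS19 to RS24 might eventually be enforced, here is a minimal Python sketch that checks a request against them. Everything about the representation is an assumption made for illustration: the `FlightBookingRequest` class, the `violations` function, and the sample `CITIES_SERVED` set are hypothetical, and "today" is passed in as a parameter so that the check is repeatable.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical sample of the cities served by the airline (FT14).
CITIES_SERVED = {"Sydney", "Melbourne", "Perth"}

@dataclass
class FlightBookingRequest:
    departure_date: date
    origin_city: str
    destination_city: str
    number_of_passengers: int
    return_date: Optional[date] = None  # absent for a one-way journey

def violations(request: FlightBookingRequest, today: date) -> List[str]:
    """Return the identifiers of the data content rules that the request breaks."""
    broken = []
    if request.departure_date < today:
        broken.append("RS19")  # range rule
    if request.return_date is not None and request.return_date < request.departure_date:
        broken.append("RS20")  # range rule, applies only if a return date is specified
    if request.origin_city not in CITIES_SERVED:
        broken.append("RS21")  # value set rule
    if request.destination_city not in CITIES_SERVED:
        broken.append("RS22")  # value set rule
    if request.destination_city == request.origin_city:
        broken.append("RS23")  # match rule
    if request.number_of_passengers < 1:
        broken.append("RS24")  # range rule
    return broken

today = date(2009, 1, 2)
good = FlightBookingRequest(date(2009, 2, 1), "Sydney", "Perth", 2,
                            return_date=date(2009, 2, 8))
bad = FlightBookingRequest(date(2009, 1, 1), "Sydney", "Sydney", 0,
                           return_date=date(2008, 12, 31))
print(violations(good, today))
print(violations(bad, today))
```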
Alternative Verb Phrase Forms
By convention all verb phrases in fact types are in the third person singular present indicative form ("3PSPI" for short), e.g., "specifies". Some alternative forms are however required in rule statements:
- After "must", the infinitive form is required, e.g., "specify" in the case of "specifies".
- If the rule is more easily stated using the terms of the fact type in reverse sequence, a reversed (often passive) form is required, either "3PSPI" or infinitive, e.g., "is specified by" or "be specified by" in the case of "specifies", "is categorised as" or "be categorised as" in the case of "is a category of".
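One simple way to keep these forms together is a lookup table keyed by the fact type form. The Python sketch below is an assumption about how such a table might look, not part of the method: the `VERB_FORMS` dictionary and the `verb_form` function are hypothetical, and the infinitive "be a category of" is inferred from ordinary grammar rather than taken from the examples above.

```python
# Hypothetical table of alternative forms, keyed by the 3PSPI form used in fact types.
VERB_FORMS = {
    "specifies": {
        "infinitive": "specify",
        "reversed": "is specified by",
        "reversed_infinitive": "be specified by",
    },
    "is a category of": {
        "infinitive": "be a category of",  # inferred, not listed in the article
        "reversed": "is categorised as",
        "reversed_infinitive": "be categorised as",
    },
}

def verb_form(verb_phrase: str, reverse: bool = False, after_must: bool = False) -> str:
    """Pick the form of a fact type verb phrase needed at a given point in a rule statement."""
    if not reverse and not after_must:
        return verb_phrase  # the fact type form itself
    forms = VERB_FORMS[verb_phrase]
    if reverse:
        return forms["reversed_infinitive"] if after_must else forms["reversed"]
    return forms["infinitive"]

print(verb_form("specifies", after_must=True))
print(verb_form("specifies", reverse=True))
print(verb_form("is a category of", reverse=True, after_must=True))
```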
A Reminder — Elision of "that is"
Wherever a term is immediately followed by a verb phrase in a rule statement, the words "that is" can be elided (removed without loss of meaning) and have been in each applicable rule statement so far in this article. Thus, for example, RS9
Each flight booking request for a return journey must specify exactly one return date.
is actually a shortened form of
Each flight booking request that is for a return journey must specify exactly one return date.
Similarly, RS21
The origin city specified by each flight booking request must be one of the cities served by the airline.
is actually a shortened form of
The origin city that is specified by each flight booking request must be one of the cities that is served by the airline.
While the shorter form is preferred, the job of establishing the underlying fact types can be made easier if "that is" is mentally re-inserted into each rule statement wherever a term is immediately followed by a verb phrase.
Toward a Rule Taxonomy
We have now looked at a few of the many types of rules that may be required:
- Cardinality rules: these state how many of a particular data item are required or allowed in a particular context.
  - Mandatory data rules: these state that a particular data item is required in a particular context.
  - Prohibited data rules: these state that a particular data item is not allowed in a particular context.
  - Singular data rules: these state that only one instance of a particular data item is allowed in a particular context.
- Data content rules: these state constraints on the values that a data item may hold.
  - Value set rules: these state that a particular data item must have a value from a particular discrete set.
  - Match rules: these state that a particular data item must be the same as or different to some other data item.
  - Range rules: these state that a particular data item must have a value from within a particular continuous range.
All of these are data rules: they place constraints on data supplied to or stored by a system.
Toward a Rule Statement Development Method
The essence of the method is quite simple:
- For each rule required:
  - If the rule has been stated already in some form, ensure that it is stated only using standard terms from our fact model, either by replacing each non-standard term (after possibly making it a synonym of the corresponding standard term) or by adding the new term to our fact model. If the rule has not yet been stated, select the appropriate standard terms from our fact model to express each concept to which the rule statement will need to refer.
  - Ensure that there is an unbroken chain of fact types in our fact model that link the terms needed in our rule statement. If necessary add any missing fact types.
  - Establish what type of rule it is we are dealing with.
  - Choose the appropriate template and substitution metarules (see the table below).
  - Create your rule statement by substituting actual terms, verb phrases, keywords, names and/or literals in the chosen template and sub-templates.
Rule type | Template | Substitution Metarules
Mandatory data rule | RT1 | <cardinality> ::=
Prohibited data rule | RT2 |
Singular data rule | RT1 | <cardinality> ::=
Value set rule | RT3 | <predicate> ::=
Match rule | RT3 | <predicate> ::=
Range rule | RT3 | <predicate> ::=
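The template-selection step can be reduced to a lookup. The Python sketch below records only what the taxonomy and the first two columns of the table state, namely each rule type's parent category and its template. The names `RuleType`, `PARENT`, and `TEMPLATE_FOR` are assumptions of this example, and the substitution metarules are deliberately left out.

```python
from enum import Enum

class RuleType(Enum):
    MANDATORY_DATA = "mandatory data rule"
    PROHIBITED_DATA = "prohibited data rule"
    SINGULAR_DATA = "singular data rule"
    VALUE_SET = "value set rule"
    MATCH = "match rule"
    RANGE = "range rule"

# Parent category of each rule type in the taxonomy.
PARENT = {
    RuleType.MANDATORY_DATA: "cardinality rule",
    RuleType.PROHIBITED_DATA: "cardinality rule",
    RuleType.SINGULAR_DATA: "cardinality rule",
    RuleType.VALUE_SET: "data content rule",
    RuleType.MATCH: "data content rule",
    RuleType.RANGE: "data content rule",
}

# Rule statement template to use for each rule type.
TEMPLATE_FOR = {
    RuleType.MANDATORY_DATA: "RT1",
    RuleType.PROHIBITED_DATA: "RT2",
    RuleType.SINGULAR_DATA: "RT1",
    RuleType.VALUE_SET: "RT3",
    RuleType.MATCH: "RT3",
    RuleType.RANGE: "RT3",
}

for rule_type in RuleType:
    print(f"{rule_type.value}: {PARENT[rule_type]}, template {TEMPLATE_FOR[rule_type]}")
```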
To be continued...
In the next article we will look at some more rule types and templates as well as examine qualifying clauses in detail.
References
[1] Semantics of Business Vocabulary and Business Rules (SBVR), v1.0. Object Management Group (Jan. 2008). Available at http://www.omg.org/spec/SBVR/1.0/PDF
[2] The font and colour conventions used in this and other well-formed rule statements and fact types in these articles reflect those in the SBVR, namely underlined teal for terms, italic blue for verb phrases, orange for keywords, and double-underlined green for names and other literals. Note that, for clarity, less than well-formed rule statements will not use these conventions.
[3] This template has been corrected since its initial posting. (04-01-09)