A Practical Method of Developing Natural Language Rule Statements (Part 6)

Graham Witt, Consultant / Author
What is this series of articles about?

This is the sixth article in a series in which I describe a practical method of developing unambiguous natural language rule statements.  I've developed this method for a large Australian government agency that has selected the Business Rules Approach and the Object Management Group's Semantics of Business Vocabulary and Business Rules (SBVR)[1] as representative of best rules practice.

The story so far

We've been looking at some of the rules governing an online "Book Flights" facility provided by an airline.  So far we've created a set of rule statements, the fact types on which the rules are based, and some rule statement templates and sub-templates for generating rule statements.

We've also been developing a taxonomy of rules as well as a rule statement development method based on selection of the appropriate template and sub-template(s) for each type of rule.

The taxonomy, templates, and sub-templates are listed in the [sidebar] along with all valid rule statements and supporting fact types.

The font and colour conventions used in these articles reflect those in the SBVR.[2]

Complex data items

Some data items in a form may have internal structure.  Postal addresses are good examples of this.  Although addressing standards vary from country to country, there is typically a requirement that a postal address contain a street name and number (typically the first of two or more "address lines"), a placename (city, town, suburb, or locality name), a postal code, and a country name.  Rule statements RS40 to RS43 express these requirements.

RS40. The postal address (if any) specified in each flight booking confirmation must include at least one address line.
RS41. The postal address (if any) specified in each flight booking confirmation must include exactly one placename.
RS42. The postal address (if any) specified in each flight booking confirmation must include exactly one postal code.
RS43. The postal address (if any) specified in each flight booking confirmation must include exactly one country name.

Similarly, airline reservation systems require that each passenger name have both first and last names (even though individuals in some cultures have only one name and other individuals may formally choose to be known by only one name).

Unlike the postal address, there can be more than one person included in a flight booking and hence more than one passenger name.  Rule statements RS44 and RS45 express these requirements.

RS44. Each passenger name specified in each flight booking confirmation must include exactly one first name.
RS45. Each passenger name specified in each flight booking confirmation must include exactly one last name.

These rule statements all express cardinality rules, but the standard template for such rules (RT1, reproduced below) cannot be used as it stands, since it does not cater for additional initial clauses such as "The postal address (if any) specified in" or "Each passenger name specified in".  Such additional clauses can be accommodated if an enhancement is made to template RT1, in the form of RT12 below.

RT1.  Each <term 1> {<qualifying clause>|}
       must <verb phrase>
       <cardinality> <term 2>.
RT12.  {{The|Each} <term 1> {(if any)|} that <verb phrase 1>|}
       each <term 2> {<qualifying clause>|}
       must <verb phrase 2>
       <cardinality> <term 3>.

Our new rule statements require the following fact types:

FT44. postal address includes address line
FT45. postal address includes placename
FT46. postal address includes postal code
FT47. postal address includes country name
FT48. passenger name includes first name
FT49. passenger name includes last name
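
As a rough illustration, the cardinality requirements in RS40 to RS45 could be checked along the following lines.  This is only a sketch: the representation of a postal address or passenger name as a list of (kind, value) pairs, and the function names, are assumptions made for the example rather than anything prescribed by the method.

```python
from collections import Counter

# (minimum, maximum) per component kind; None means no upper limit.
# The cardinalities follow RS40 to RS45; the labels are hypothetical.
POSTAL_ADDRESS_RULES = {
    "address line": (1, None),  # RS40: at least one
    "placename": (1, 1),        # RS41: exactly one
    "postal code": (1, 1),      # RS42: exactly one
    "country name": (1, 1),     # RS43: exactly one
}
PASSENGER_NAME_RULES = {
    "first name": (1, 1),       # RS44: exactly one
    "last name": (1, 1),        # RS45: exactly one
}


def cardinality_violations(components, rules):
    """Return the component kinds whose count falls outside (min, max)."""
    counts = Counter(kind for kind, _value in components)
    violations = []
    for kind, (low, high) in rules.items():
        n = counts[kind]
        if n < low or (high is not None and n > high):
            violations.append(kind)
    return violations


def check_confirmation(postal_address, passenger_names):
    """Check the optional postal address and every passenger name."""
    problems = []
    if postal_address is not None:  # "(if any)": applies only when present
        problems += cardinality_violations(postal_address, POSTAL_ADDRESS_RULES)
    for name in passenger_names:    # "Each passenger name"
        problems += cardinality_violations(name, PASSENGER_NAME_RULES)
    return problems
```

Note how "(if any)" becomes a test for presence, while "Each passenger name" becomes a loop over all names in the booking.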

Note, by the way, that while it is permissible to divide a Person Name into First Name and Last Name, you cannot infer which of those names is the family name (surname), since many cultures (e.g., Asian and Eastern European) place the family name before the given names.  If it is important to know customers' family names, forms, rule statements, and the fact and data models should divide the Person Name data item into Family Name and Given Name data items.

Don't divide the Person Name data item into First Name and Surname data items (since for some they are one and the same!).  And don't use the term Christian Name, which may exclude a significant proportion of your customer or employee population.

Repeating groups of data items

A form may allow for more than one instance of a group of data items.  For example, a flight booking must identify, among other things, not only the origin and destination cities and the flight dates and times but also which flight(s) are required and, for each flight:

  1. which travel class is required, i.e., First, Business, or Economy (Coach)
  2. where more than one fare class (e.g., Full Fare, Discounted) is available within that travel class, which fare class is required.

Note that the travel class and fare class must be specified for each flight since it is possible (at least with some airlines) to fly, say, Business Class out and First Class home, or Discounted Economy Class out and Full Fare Economy Class home.

This is rather different conceptually from internal structure, such as in a person's name; "each flight must include a travel class" doesn't quite convey the association between flights and travel classes.

Let us first look at the cardinality rules governing data about flights in a flight booking confirmation.  Rule statements RS46 and RS47 state the rules about how many flights must be specified; these can be generated from template RT12.

RS46. Each flight booking confirmation for a one-way journey must specify exactly one flight.
RS47. Each flight booking confirmation for a return journey must specify exactly two flights.

These rule statements require the following fact types:

FT38. flight booking confirmation specifies flight
        (previously used)
FT50. flight booking confirmation is for return journey
        (new, derived from fact types FT6 and FT15)
FT51. flight booking confirmation is for one-way journey
        (new, derived from fact types FT7 and FT15)

The cardinality rules for travel class and fare class need more work.  You might, of course, be tempted to write rule statements RS48 and RS49 by analogy with RS46 and RS47.  Alternatively you might create a single rule statement as in RS50.  While a reasonable person might infer that, if two travel classes are specified, one applies to each flight, neither RS49 nor RS50 actually states that.

RS48. Each flight booking confirmation for a one-way journey must specify exactly one travel class.
RS49. Each flight booking confirmation for a return journey must specify exactly two travel classes.
RS50. The number of travel classes specified in a flight booking confirmation must be equal to the number of flights specified in that flight booking confirmation.

One tempting solution to that problem is to state clearly which flight each specified travel class refers to, as in rule statements RS51 and RS52.

RS51. Each flight booking confirmation must specify exactly one travel class for the outgoing flight.
RS52. Each flight booking confirmation that is for a return journey must specify exactly one travel class for the return flight.

This introduces another problem: RS52 clearly does not preclude the specification of an additional travel class for other than the return flight (i.e., for the outgoing flight): this is because of the qualifying clause that follows "travel class".  By analogy RS51 does not preclude the specification of an additional travel class either, so it allows a flight booking confirmation for a one-way journey to specify more than one travel class!

RS53 solves all of these problems with a single rule statement!

RS53. Each flight booking confirmation must specify exactly one travel class for each flight.

Similarly:

RS54. Each flight booking confirmation must specify exactly one fare class for each flight.
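
The difference between RS50 and RS53 can be made concrete with a small sketch.  It assumes, purely for illustration, that a booking records its travel class choices as (flight, travel class) pairs; the function names and sample data are hypothetical.

```python
def satisfies_rs50(flights, class_choices):
    """RS50: as many travel classes as flights, with no link between them."""
    return len(class_choices) == len(flights)


def satisfies_rs53(flights, class_choices):
    """RS53: exactly one travel class for each flight."""
    return all(
        sum(1 for flight, _travel_class in class_choices if flight == f) == 1
        for f in flights
    )


flights = ["outgoing", "return"]
# Two travel classes, both attached to the outgoing flight.
lopsided = [("outgoing", "Business"), ("outgoing", "First")]
# One travel class per flight.
balanced = [("outgoing", "Business"), ("return", "First")]
```

Under these assumptions the lopsided booking passes the RS50 check, because the counts match, but fails the RS53 check, because the return flight has no travel class and the outgoing flight has two.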

These require an enhancement to template RT12 above, in the form of RT13:

RT13.  {{The|Each} <term 1> {(if any)|} that <verb phrase 1>|}
       each <term 2> {<qualifying clause>|}
       must <verb phrase 2>
       <cardinality> <term 3> {for each <term 4>|}.
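
To show how a template drives rule statement generation, here is a minimal sketch that fills in RT13 for the case where the optional leading clause is omitted.  The function name and parameter names are hypothetical, and capitalising the first word is an assumption, since the template itself shows a lower-case "each".

```python
def fill_rt13(term2, verb_phrase2, cardinality, term3,
              qualifying_clause="", term4=None):
    """Fill RT13 with the optional leading clause omitted."""
    subject = f"each {term2}"
    if qualifying_clause:
        subject += f" {qualifying_clause}"
    sentence = f"{subject} must {verb_phrase2} {cardinality} {term3}"
    if term4:
        sentence += f" for each {term4}"
    # Assumption: the sentence starts with a capital letter.
    return sentence[0].upper() + sentence[1:] + "."


rs53 = fill_rt13("flight booking confirmation", "specify",
                 "exactly one", "travel class", term4="flight")
rs46 = fill_rt13("flight booking confirmation", "specify",
                 "exactly one", "flight",
                 qualifying_clause="for a one-way journey")
```

With these substitutions the function reproduces the wording of RS53 and RS46 respectively.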

Ternary fact types

Note that the fact types linking flight booking confirmation with travel class and fare class are not binary (FT52 and FT53) but ternary (FT54 and FT55):

FT52. flight booking confirmation specifies travel class
FT53. flight booking confirmation specifies fare class
FT54. flight booking confirmation specifies travel class for flight
FT55. flight booking confirmation specifies fare class for flight
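
One way to see why the binary fact types are insufficient is to record instances of FT54 as triples and then discard the flight role.  The sketch below does this; the class name, field names, and sample record locator are hypothetical and chosen only for the example.

```python
from typing import NamedTuple


class SpecifiesTravelClassForFlight(NamedTuple):
    """One instance of ternary fact type FT54 (hypothetical field names)."""
    confirmation: str
    travel_class: str
    flight: str


def binary_projection(facts):
    """Drop the flight role, leaving only what FT52 could record."""
    return {(f.confirmation, f.travel_class) for f in facts}


booking_a = [
    SpecifiesTravelClassForFlight("ABC123", "Business", "outgoing"),
    SpecifiesTravelClassForFlight("ABC123", "First", "return"),
]
booking_b = [
    SpecifiesTravelClassForFlight("ABC123", "First", "outgoing"),
    SpecifiesTravelClassForFlight("ABC123", "Business", "return"),
]
```

The two bookings differ as sets of ternary facts, yet their binary projections are identical: once the flight role is dropped, there is no way to tell which travel class applies to which flight.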

Generalising data items

RS53 and RS54 also rely (as would RS50) on the fact that outgoing flight and return flight can be generalised as flight, as already established in fact types FT39 and FT40:

FT39. outgoing flight is a category of flight
FT40. return flight is a category of flight

Uniqueness constraints

An important class of data content rule requires all instances of a data item to be different.  For example, each confirmed flight booking is allocated an alphanumeric identifier of 5 to 7 characters, known as the record locator.  Obviously each booking must have a different record locator.  This can be expressed using rule statement RS55:

RS55. The record locator allocated to each confirmed flight booking must be different to the record locator allocated to each other confirmed flight booking.
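
A check for violations of RS55 might look like the following sketch.  It assumes each confirmed flight booking is held as a dictionary with a "record_locator" key; that representation and the function name are illustrative assumptions only.

```python
from collections import Counter


def duplicated_record_locators(bookings):
    """Return record locators allocated to more than one confirmed booking."""
    counts = Counter(b["record_locator"] for b in bookings)
    return sorted(locator for locator, n in counts.items() if n > 1)
```

An empty result means that every confirmed flight booking has a record locator different to that of each other confirmed flight booking.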

Constraints on combinations of data items

Of course a uniqueness constraint can apply across a combination of data items as in rule statement RS56:

RS56. The combination of departure date, flight number, departure city and seat number specified by each seat allocation must be different to the combination of departure date, flight number, departure city and seat number specified by each other seat allocation.

A value set rule may also apply across a combination of data items as in rule statement RS57:

RS57. The combination of placename and postal code included in the postal address (if any) specified by each flight booking confirmation must be one of the combinations of placename and postal code listed in the postal code file obtained from the postal authority.
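
Both kinds of combination rule can be sketched by treating the combination as a tuple of values.  In the sketch below the key names, function names, and dictionary representation of a seat allocation or postal address are assumptions made for illustration; the set of allowed combinations stands in for the postal code file.

```python
from collections import Counter

# Data items making up the combination in RS56 (hypothetical key names).
SEAT_KEY = ("departure_date", "flight_number", "departure_city", "seat_number")
# Data items making up the combination in RS57 (hypothetical key names).
POSTAL_KEY = ("placename", "postal_code")


def combination(record, keys):
    """Extract the values of the named data items as a single tuple."""
    return tuple(record[k] for k in keys)


def duplicated_combinations(records, keys):
    """Return combinations that occur in more than one record (RS56)."""
    counts = Counter(combination(r, keys) for r in records)
    return sorted(c for c, n in counts.items() if n > 1)


def in_value_set(record, keys, allowed):
    """True if the record's combination is one of the allowed ones (RS57)."""
    return combination(record, keys) in allowed
```

The point of testing the tuple rather than each value separately is that a placename and a postal code may each be valid on their own without being a valid pair.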

Before we look at the additional templates and sub-templates we require, have a think about the issues raised by that last rule statement (RS57).  We'll discuss these in the next article.

Templates and sub-templates needed

Rule statement RS55 can be generated from template RT11 (reproduced below) but requires an additional predicate type and hence an enhancement to sub-template ST8 (also reproduced below):

RT11.  The <term 1> {<qualifying clause>|} {(if any)|}
       that <verb phrase> each <term 2> {<qualifying clause>|}
       must be
       <predicate>.
ST8. <predicate> ::=
  {<value set predicate>|<match predicate>|<range predicate>|
         <equality predicate>}

It is necessary to replace ST8 with ST19 and ST20, which provide for the additional predicate type.

ST19. <predicate> ::=
  {<value set predicate>|<match predicate>|<range predicate>|
         <equality predicate>|<uniqueness predicate>}
ST20. <uniqueness predicate> ::=
  different to the <term 1> that <verb phrase> each other
         <term 2> {<qualifying clause>|}

Rule statements RS56 and RS57 need a new template and sub-templates:

RT14.  The combination of <simple and-list>
       {<qualifying clause>|} {(if any)|}
       that <verb phrase> each <term 2> {<qualifying clause>|}
       must be
       <combination predicate>.
ST21. <combination predicate> ::=
  {<combination value set predicate>|
        <combination uniqueness predicate>}
ST22. <combination value set predicate> ::=
  one of the combinations of <simple and-list>
            {<qualifying clause>|}
ST23. <combination uniqueness predicate> ::=
  different to the combination of <simple and-list>
         that <verb phrase> each other <term>
         {<qualifying clause>|}

Note that value set and uniqueness constraints are the only data content rules that can apply across a combination of data items.  Match and range rules can only apply to single data items.

So how does the rule taxonomy look now?

Here are the rule types we have looked at so far:

  1. Cardinality rules: data must be present or absent, and/or is restricted in terms of the number of instances.

    1. Mandatory data rules:  one or more data items are required in a particular context.
      1. Mandatory data item rules:  a particular single data item must be present.
      2. Mandatory option selection rules:  one of a set of pre-defined options must be specified.
      3. Mandatory group rules:  at least one of a group of data items must be present.

    2. Prohibited data rules:  a particular data item is not allowed in a particular context.

    3. Singular data rules:  only one instance of a particular data item is allowed in a particular context.

    4. Dependent cardinality rules:  the number of instances of a data item depends on some other data, as in rule statement RS25.

  2. Data content rules:  data is constrained to certain values.

    1. Value set rules:  a data item (or combination of data items) must have a value from a discrete set.

    2. Match rules:  a data item must be the same as or different to some other data item.

    3. Range rules:  a data item must have a value from within a continuous range.

    4. Uniqueness constraints:  a data item (or combination of data items) must be different from other instances of the same data item.

The template(s) to be used for each type of rule, and the associated metarules (which particular template should be used when, and/or what substitution(s) of particular syntactic elements are allowed) are set out in the accompanying table.

Rule type                        Template  Metarules
Mandatory data item rule         RT13      <cardinality> ::= {exactly <positive integer>| at least <positive integer> {and at most <positive integer>|}}
Mandatory option selection rule  RT5
Mandatory group rule             RT6       if either or both of 2 items in the group
                                 RT7       if only 1 of 2 items in the group
                                 RT8       if more than 2 items in the group
Prohibited data rule             RT2
Singular data rule               RT13      <cardinality> ::= {exactly|at most} one
Dependent cardinality rule       RT4       <set function> ::= number
Value set rule                   RT11      <predicate> ::= <value set predicate> (if only 1 data item)
                                 RT14      <combination predicate> ::= <combination value set predicate> (if combination of data items)
Match rule                       RT11      <predicate> ::= <match predicate>
Range rule                       RT11      <predicate> ::= <range predicate>
Uniqueness constraint            RT11      <predicate> ::= <uniqueness predicate> (if only 1 data item)
                                 RT14      <combination predicate> ::= <combination uniqueness predicate> (if combination of data items)



To be continued...
In subsequent articles we will look at further rule types, templates, and sub-templates, including those for conditional clauses.  We will also explore some techniques for rule statement quality assessment, including identification of redundant and conflicting rule statements.

References

[1]  Semantics of Business Vocabulary and Business Rules (SBVR), v1.0.  Object Management Group (Jan. 2008).  Available at http://www.omg.org/spec/SBVR/1.0/PDF

[2]  The font and colour conventions used in these articles reflect those in the SBVR, namely underlined teal for terms, italic blue for verb phrases, orange for keywords, and double-underlined green for names and other literals.  Note that, for clarity, less than well-formed rule statements will not use these conventions.

# # #

Standard citation for this article:


Graham Witt, "A Practical Method of Developing Natural Language Rule Statements (Part 6)" Business Rules Journal, Vol. 10, No. 7, (Jul. 2009)
URL: http://www.brcommunity.com/a2009/b489.html

About our Contributor:


Graham Witt has over 30 years of experience in assisting organisations to acquire relevant and effective IT solutions. NSW clients include the Department of Lands, Sydney Water, and WorkCover while Victorian clients include the Departments of Sustainability & Environment, Education & Early Childhood Development, and Human Services. Graham previously headed the information management and business rules practice in Ajilon's Sydney (Australia) office.

Graham has developed specialist expertise in business requirements, architectures, information management, user interface design, data modelling, relational database design, data quality, business rules, and the use of metadata repositories & CASE tools. He has also provided data modelling, database design, and business rules training to various clients including NAB, Telstra, British Columbia Government, and ASIC and in the form of public courses run by Simsion Bowles and Associates (Australia) and DebTech (USA).

He is the co-author, with Graeme Simsion, of the widely-used textbook "Data Modeling Essentials" and is the author of the newly published book, "Writing Effective Business Rules" (published by Elsevier). Graham has presented at conferences in Australia, the US, the UK, and France. Contact him at gwitt@pacific.net.au.
