Writing Natural Language Rule Statements — a Systematic Approach Part 6: Uniqueness Constraints

Graham Witt, Consultant / Author

About this series of articles

While my first series of articles on writing natural language rule statements[1] explored a wide variety of issues in a rather organic and hence random manner, this series takes a more holistic and systematic approach and draws on insights gained while writing my recently-published book on the same topic.[2]  Rule statements recommended in these articles are intended to comply with the Object Management Group's Semantics of Business Vocabulary and Business Rules (SBVR) version 1.0.[3]

The story so far

In previous articles[4] we looked at a variety of data cardinality rule statements[5] and two types of data content rule statement,[6] namely range rule statements[7] and value set rule statements.[8]  In this article we will look at another type of data content rule statement, the uniqueness constraint statement.[9]  However, we need to first explore some subtle distinctions between rules governing information supplied to an organisation, rules governing information recorded in a database, and rules governing the real world.

What do our rules govern?

One type of constraint that can be imposed on a database to ensure its integrity (i.e., consistency) is the uniqueness constraint.  The most common type requires that each row of a table (or record in a file) have a unique identifier (e.g., an order number in the case of a table of purchase orders), otherwise rows (or records) cannot be distinguished from each other.  For example, we could create a rule statement, such as RS76, to govern a table in which travel insurance applications are recorded.

RS76. The Application ID that identifies each Travel Insurance Application
must be different to the Application ID that identifies any other Travel Insurance Application.[10]

Of course, such identifiers are almost always automatically generated by either the DBMS (database management system) or the application logic between the user interface and the database, rather than selected by the person providing information via the user interface (although we shall see an exception shortly).  Furthermore, such identifiers are frequently invisible to users of the interface.  A rule statement such as RS76 is therefore only of interest to the relevant developers and database designers.  Some identifiers are, however, made available to users, such as (in this case) policy numbers.  RS77 is a variation on RS76 that reflects the fact that policy numbers are available to applicants once their policies have been issued.

RS77. The Policy Number issued for each Travel Insurance Application
must be different to the Policy Number issued for any other Travel Insurance Application.
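In a relational database, rules such as RS76 and RS77 are typically enforced by declaring key or uniqueness constraints. The following is a minimal sketch using Python's built-in sqlite3 module; the table name, column names, and the record_application helper are hypothetical, and the policy number is assumed to be empty until a policy is issued.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE travel_insurance_application (
        application_id INTEGER PRIMARY KEY,  -- RS76: generated by the DBMS
        policy_number  TEXT UNIQUE           -- RS77: NULL until a policy is issued
    )
    """
)


def record_application(policy_number=None):
    """Insert one application row and return its generated identifier."""
    cursor = conn.execute(
        "INSERT INTO travel_insurance_application (policy_number) VALUES (?)",
        (policy_number,),
    )
    return cursor.lastrowid


first_id = record_application("P-1001")
second_id = record_application()  # no policy issued yet

try:
    record_application("P-1001")  # same policy number as the first row
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Here the person entering data never chooses the application identifier, which is why a rule such as RS76 matters mainly to developers and database designers.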

Can uniqueness constraints govern data capture user interfaces?  Where a user interface provides an administrator with a means of creating new records and choosing their identifiers, we need a uniqueness constraint such as RS78.

RS78. The Policy Type Code specified in each Policy Type Creation Form
must be different to the Policy Type Code that identifies any existing Policy Type.
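A check of this kind could be made when the form is submitted. The sketch below is only illustrative: the function name and the idea of holding existing codes in a set are assumptions, and codes are compared exactly as typed.

```python
from typing import Optional, Set


def check_policy_type_code(new_code: str, existing_codes: Set[str]) -> Optional[str]:
    """Return an error message if RS78 is violated, otherwise None."""
    if new_code in existing_codes:
        return f"Policy Type Code {new_code!r} already identifies an existing Policy Type"
    return None


existing = {"SINGLE", "ANNUAL"}
rejected = check_policy_type_code("ANNUAL", existing)
accepted = check_policy_type_code("FAMILY", existing)
```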

In fact, there are many other circumstances in which we require uniqueness constraints governing user interfaces, as we shall see shortly.  Before that, however, I would like to briefly explore whether mandatory data item rules (introduced in Article 2) can govern databases or even the real world in which the organisation operates (as distinct from user interfaces).

When governing a user interface, a mandatory data item rule ensures that a value is provided for a mandatory data item during capture of particular transaction data.  Such rules are typically motivated by the need to ensure that all necessary information is available to enable a system or human being to make appropriate decisions.  An example of such a rule (from Article 2) is RS15.

RS15. Each Travel Insurance Application must specify exactly one Birth Date for each Passenger.

One might expect that, if travel insurance applications are stored in a database, the column in which birth dates are stored would be mandatory — or "not null" in DDL (Data Definition Language).  However, it is entirely possible that information about passengers covered by travel insurance is stored in a table that also stores information about other customers or parties covered.  If, say, the insurance company also issues home & contents insurance, it may not require birth dates for holders of home & contents policies, in which case the database column in which passenger birth dates are stored would need to be optional ("null" in DDL).

For this reason, it may be useful to record additional mandatory data item rules that govern the organisation's databases, such as RS79.

RS79. Each Party Covered Record must specify exactly one Birth Date.
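To illustrate the difference between a mandatory data item in a user interface and an optional database column, here is a minimal sketch using Python's sqlite3 module. The table, its columns, and the policy_kind values are hypothetical, and the conditional CHECK constraint is just one possible way of requiring a birth date for travel insurance passengers while leaving the column optional for other parties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE party_covered (
        party_id    INTEGER PRIMARY KEY,
        policy_kind TEXT NOT NULL,  -- hypothetical values: 'travel', 'home_contents'
        family_name TEXT NOT NULL,
        birth_date  TEXT,           -- optional column: no NOT NULL clause
        CHECK (policy_kind <> 'travel' OR birth_date IS NOT NULL)
    )
    """
)

INSERT = "INSERT INTO party_covered (policy_kind, family_name, birth_date) VALUES (?, ?, ?)"

# A home & contents policy holder may be recorded without a birth date.
conn.execute(INSERT, ("home_contents", "Nguyen", None))

# A travel insurance passenger without a birth date is rejected.
try:
    conn.execute(INSERT, ("travel", "Smith", None))
    passenger_without_birth_date_rejected = False
except sqlite3.IntegrityError:
    passenger_without_birth_date_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM party_covered").fetchone()[0]
```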

Can such rules be used to govern the real world phenomena described by the organisation's information?  We have established that in at least one situation it is obligatory to supply a person's birth date in a particular data capture user interface but that a Birth Date column in the corresponding database table is not necessarily mandatory.  In the real world, every passenger (and indeed every human being) must have a birth date.  However, even if we need to document rules governing the real world, I would not recommend a rule statement of the form "Each Customer must have exactly one Birth Date."  This is because the rule in question is a definitional (or structural) rule — a rule that cannot be contravened — by contrast with RS15 and RS79, which are statements of behavioural (or operative) rules — rules that can be contravened and that prompt appropriate action when contravened.  Definitional rule statements should be clearly distinguishable from behavioural rule statements and therefore require different wording, which we shall explore in a later article in this series.

More uniqueness constraints

Uniqueness constraints are typically also required where a data capture form includes repeatable data items, such as (in our travel insurance application form) passengers, regions, high-value items, and passengers' medical conditions.  In such cases it makes no sense to enter details of the same passenger, region, or high-value item more than once, or to enter details of the same medical condition more than once for the same passenger.  If such duplication were to occur, it would suggest an error had been made by the applicant; if it were to be accepted, it would lead to inconsistency in data recording.  To prevent such duplication, we need the following rule statements:

RS80. Each Region (if any)
            specified in each Travel Insurance Application
must be different to any other Region
            specified in that Travel Insurance Application.
RS81. The Description
            specified for each High Value Item (if any)
            in each Travel Insurance Application
must be different to the Description
            specified for any other High Value Item
            in that Travel Insurance Application.
RS82. Each Medical Condition (if any)
            specified for each Passenger
            in each Travel Insurance Application
must be different to any other Medical Condition
            specified for that Passenger
            in that Travel Insurance Application.
RS83. The Frequent Flier Membership (if any)
            specified for each Passenger
            in each Travel Insurance Application
must be different to the Frequent Flier Membership
            specified for any other Passenger
            in that Travel Insurance Application.
RS84. The combination of Family Name, Given Names and Birth Date
            specified for each Passenger
            in each Travel Insurance Application
must be different to the combination of Family Name, Given Names and Birth Date
            specified for any other Passenger
            in that Travel Insurance Application.

Notice how these rule statements have some common content but also differ in various ways.  Each of these statements includes:

  1. the qualifying clause 'in each Travel Insurance Application' at the end of the subject,

  2. 'must' followed by a predicate starting with 'be different to',

  3. the same term (or combination of terms) at the start of the predicate (i.e., after 'be different to') as at the start of the subject,

  4. the qualifying clause 'in that Travel Insurance Application' at the end of the predicate.

In addition, for each qualifying clause in the subject there is a corresponding qualifying clause — using the same verb phrase and term(s) — in the predicate.

Two obvious differences are:

  1. RS80 has no other qualifying clauses;

  2. RS84 uses the construction 'combination of'.

The reason RS80 differs from the others is that it governs a simple data item (Region) rather than a complex data item (High Value Item or Passenger).  It therefore has a simpler form in both the subject ('Each Region (if any) specified in each Travel Insurance Application') and the predicate ('be different to any other Region specified in that Travel Insurance Application').  By contrast, each of the other rule statements includes an additional qualifying clause in both the subject and the predicate:

  1. 'specified for each High Value Item (if any)' and 'specified for any other High Value Item' in RS81,

  2. 'specified for each Passenger' and 'specified for that Passenger' in RS82,

  3. 'specified for each Passenger' and 'specified for any other Passenger' in RS83 and RS84.

Note that RS82 differs from the other rule statements governing complex data items in that the additional qualifying clause in the predicate uses the determiner[11] 'that' rather than 'any other'.  This is because the repeatable data item governed by this rule statement (Medical Condition) is part of a complex data item (Passenger):  the intention of this rule is to prevent a medical condition being specified twice for the same passenger rather than to prevent the same medical condition being specified for different passengers.  By contrast, the other rule statements governing complex data items exist to prevent the same information being specified for different passengers (or high value items in the case of RS81).

However, every uniqueness constraint requires the determiner 'any other' in the predicate:  in RS82 this precedes the term signifying the governed data item (Medical Condition) rather than the containing complex data item (Passenger).

Note also differences in the placement of the clause '(if any)', which indicates an optional data item:

  1. after the term signifying the governed data item in RS80, RS82, and RS83,

  2. after the term signifying the complex data item containing the governed data item in RS81.

The difference in the case of RS81 is that the governed data item (Description) is mandatory but is contained in an optional complex data item (High Value Item) whereas in the others the containing complex data item is mandatory and the governed data item is optional.  Furthermore, RS84 has no '(if any)' clause since the governed data items (Family Name, Given Names, and Birth Date) and the containing complex data item (Passenger) are all mandatory.

The reason RS84 differs from the others is that it governs a combination of data items rather than a single data item.  This is because it is quite legitimate for passengers with the same family names or the same given names or even the same birth dates to travel together (and hence take out travel insurance), whereas it is unlikely that passengers travelling together would have the same family names, given names, and birth dates.  However, it is possible (albeit remotely) and should therefore be permitted, so this rule should not be strictly enforced, i.e., if the rule were to be violated a message along the lines of "are you sure?" should be displayed, and the person entering the application should be allowed to answer "yes" and continue.
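The checks described in this section could be sketched as follows. This is an illustration only: the Passenger and Application classes, their field names, and the function names are assumptions, values are compared exactly as entered, and RS84 is reported as a warning rather than an error so that the person entering the application can confirm and continue.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Passenger:
    family_name: str
    given_names: str
    birth_date: str
    medical_conditions: List[str] = field(default_factory=list)
    frequent_flier_membership: Optional[str] = None


@dataclass
class Application:
    passengers: List[Passenger]
    regions: List[str] = field(default_factory=list)
    high_value_item_descriptions: List[str] = field(default_factory=list)


def duplicates(values):
    """Return each value that occurs more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value in counts if counts[value] > 1]


def check_uniqueness(app: Application) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for rules RS80 to RS84."""
    errors, warnings = [], []
    for region in duplicates(app.regions):
        errors.append(f"RS80: Region {region!r} specified more than once")
    for description in duplicates(app.high_value_item_descriptions):
        errors.append(f"RS81: High Value Item {description!r} specified more than once")
    for passenger in app.passengers:
        # RS82 compares conditions only within the same passenger.
        for condition in duplicates(passenger.medical_conditions):
            errors.append(f"RS82: Medical Condition {condition!r} repeated for one Passenger")
    memberships = [
        p.frequent_flier_membership
        for p in app.passengers
        if p.frequent_flier_membership is not None  # optional data item
    ]
    for membership in duplicates(memberships):
        errors.append(f"RS83: Frequent Flier Membership {membership!r} shared by Passengers")
    identities = [(p.family_name, p.given_names, p.birth_date) for p in app.passengers]
    for identity in duplicates(identities):
        # RS84 is not strictly enforced: ask "are you sure?" instead of rejecting.
        warnings.append(f"RS84: more than one Passenger specified as {identity}; are you sure?")
    return errors, warnings
```

Note that two passengers sharing a family name, or a birth date, produce no message at all; only the full combination triggers the RS84 warning.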

Common formulation

Like other types of rule statement, uniqueness constraint statements have a common formulation:

  1. the subject, identifying the governed data item, consisting of:

    1. a determiner:  'Each' if the data item can be repeated, otherwise 'The',

    2. either the name of the data item (e.g. Region) or 'combination of' followed by a list of the data items making up the combination,

    3. '(if any)' if the data item is optional;

  2. 'specified';

  3. if the governed data item is part of a complex data item, a qualifying clause having the following form:

    1. 'for the' (if there can only be one of the complex data item) or 'for each' (if there can be more than one of the complex data item),

    2. the name of the complex data item,

    3. '(if any)' if the complex data item is optional;

  4. a qualifying clause identifying the type of transaction, in this case 'in each Travel Insurance Application';

  5. 'must', followed by a uniqueness constraint predicate.

Note that, with the exception of the predicate, the common formulation is identical to that of a value set rule statement, set out in the previous article in this series.
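As a rough illustration of how mechanical this formulation is, the steps can be sketched as a small function that assembles everything up to and including 'must'. The function, the ComplexItem class, and the parameter names are hypothetical; the sketch simply follows the five steps listed above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ComplexItem:
    """The complex data item containing the governed data item (hypothetical type)."""
    name: str
    repeatable: bool = True
    optional: bool = False


def build_subject(
    item_names: Sequence[str],
    transaction: str,
    repeatable: bool = False,
    optional: bool = False,
    parent: Optional[ComplexItem] = None,
) -> str:
    """Assemble a uniqueness constraint statement up to and including 'must'."""
    # Step 1: determiner, data item name (or combination), optional "(if any)".
    determiner = "Each" if repeatable else "The"
    if len(item_names) == 1:
        term = item_names[0]
    else:
        term = "combination of " + ", ".join(item_names[:-1]) + " and " + item_names[-1]
    parts = [determiner, term]
    if optional:
        parts.append("(if any)")
    # Step 2: the word "specified".
    parts.append("specified")
    # Step 3: qualifying clause for a containing complex data item, if there is one.
    if parent is not None:
        parts.append("for each" if parent.repeatable else "for the")
        parts.append(parent.name)
        if parent.optional:
            parts.append("(if any)")
    # Step 4: qualifying clause identifying the type of transaction.
    parts.append(f"in each {transaction}")
    # Step 5: "must" introduces the predicate, which is not built here.
    parts.append("must")
    return " ".join(parts)


rs81_subject = build_subject(
    ["Description"],
    "Travel Insurance Application",
    parent=ComplexItem("High Value Item", optional=True),
)
```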

Uniqueness constraint predicates

These also have a common formulation:

  1. 'be different to',

  2. the same term (or combination of terms) as at the start of the subject preceded by 'the' or 'any other',

  3. for each qualifying clause in the subject, a corresponding qualifying clause, using the same verb phrase and term(s), with the following determiners:

    1. if the initial term (or combination of terms) in the predicate is preceded by 'any other':  'that',

    2. if the initial term (or combination of terms) in the predicate is preceded by 'the':

      1. in exactly one qualifying clause (but not the last):  'any other',

      2. in all other qualifying clauses:  'that'.
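The predicate formulation can be sketched in the same spirit. The function below is hypothetical and leaves two decisions to the caller, since the formulation itself does not fix them: whether the initial term is preceded by 'the' or 'any other', and (in the former case) which qualifying clause receives 'any other'.

```python
from typing import Sequence, Tuple


def build_predicate(
    term: str,
    clauses: Sequence[Tuple[str, str]],
    initial_determiner: str,
    any_other_index: int = 0,
) -> str:
    """Assemble a uniqueness constraint predicate.

    clauses holds (verb phrase, term) pairs in the order they appear in the
    subject, e.g. ("specified for", "Passenger").
    """
    if initial_determiner not in ("the", "any other"):
        raise ValueError("initial determiner must be 'the' or 'any other'")
    if initial_determiner == "the" and not 0 <= any_other_index < len(clauses) - 1:
        raise ValueError("'any other' must go in one qualifying clause, but not the last")
    parts = ["be different to", initial_determiner, term]
    for index, (verb_phrase, clause_term) in enumerate(clauses):
        if initial_determiner == "the" and index == any_other_index:
            determiner = "any other"
        else:
            determiner = "that"
        parts.append(f"{verb_phrase} {determiner} {clause_term}")
    return " ".join(parts)


rs82_predicate = build_predicate(
    "Medical Condition",
    [("specified for", "Passenger"), ("in", "Travel Insurance Application")],
    initial_determiner="any other",
)
rs83_predicate = build_predicate(
    "Frequent Flier Membership",
    [("specified for", "Passenger"), ("in", "Travel Insurance Application")],
    initial_determiner="the",
)
```

Swapping the initial determiner is all that distinguishes the RS82 form (no repeats within one passenger) from the RS83 form (no repeats across passengers).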

Insights from the Business Rules Forum

On November 1st I spoke at the 2012 Business Rules Forum in Fort Lauderdale, FL.  In conversations with other delegates, two views that I have formed while working with rules were confirmed.

One is that there is a spectrum of languages that can be used to state rules.  The most constrained languages are those required to specify rules to a rule engine, while the least constrained languages are natural languages as used by the population at large, e.g., English.  Two features of natural languages are:

  1. the same rule can be stated in many ways;

  2. there is considerable scope for ambiguity: if a statement is ambiguous, there is a risk that different people will interpret it differently.  It was interesting to observe the many variations on the rules governing check-in at the different US airports we flew out of, such as whether, as international travellers, we could use kerbside check-in or had to pay for checked bags.

On the other hand, many rule engine languages exhibit either or both of the following features:

  1. like programming languages, they use formulations not used in natural language;

  2. often the words, phrases, or clauses exist within the context of a non-verbal environment — such as a spreadsheet or diagram — without which they cannot be fully interpreted.

Between these two ends of the spectrum can be found the various constrained natural languages developed by various authors for the expression of rules.  These include:

  1. "verbalisations" of the constraints that can be modelled in ORM (Object Role Modelling), presented to a 1993 conference by Halpin & Harding,[12]

  2. a technique for modelling business rules as natural language statements which I presented to a 1999 conference,[13]

  3. RuleSpeak™, a widely-used language for expression of rules, described in 2001 by Ross & Lam,[14]

  4. a set of "rule patterns" included by Morgan in his 2002 book,[15]

  5. the set of templates included in my book (early forms of which appeared in my previous set of articles),

  6. a set of rule patterns, based (at least in part) on these templates, provided in RuleXpress, a fact model and rule repository tool.

These languages vary in terms of:

  1. the range of rule types accommodated,

  2. the granularity with which different rule types are distinguished,

  3. the number of different ways the same rule can be expressed.

My approach has been to accommodate a wide range of rule types, make reasonably fine distinctions between different rule types, and provide only a few ways of expressing each rule.  This makes the resulting language rather more constrained than some others.

The principal advantage of a highly-constrained language is that it is much easier to parse each rule statement unequivocally.  Ideally, the number of possible parse trees[16] per rule statement is one, i.e., there is only one possible interpretation.  (As an example of a sentence that can be parsed in more than one way, consider the WW2 newspaper headline "British push bottles up Germans"!)  We haven't yet achieved the ideal of automatic implementation of natural language rule statements since this requires automated parsing of those rule statements; the complexity, and hence the development and running costs, of an automatic parsing engine are reduced if the language is more constrained.
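To make the idea of counting parse trees concrete, here is a small sketch that counts the parses of the headline under a toy grammar. The grammar and lexicon are invented purely for this illustration (they are not drawn from any rule language), and the counting uses the standard CYK chart technique for grammars whose rules each have two right-hand symbols.

```python
from collections import defaultdict

# Toy lexicon: each word may belong to several categories (an assumption).
LEXICON = {
    "british": ["Adj", "NP"],
    "push": ["N", "V"],
    "bottles": ["NP", "V"],
    "up": ["P", "Prt"],
    "germans": ["NP"],
}

# Toy binary rules: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("NP", "Adj", "N"),
    ("VP", "V", "NP"),
    ("VP", "VPrt", "NP"),   # phrasal verb plus object: "bottles up" + "Germans"
    ("VPrt", "V", "Prt"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]


def count_parses(words, start="S"):
    """Count the distinct parse trees for the word list under the toy grammar."""
    n = len(words)
    # chart[(i, j)][category] = number of trees of that category over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON[word.lower()]:
            chart[(i, i + 1)][category] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    left_count = chart[(i, k)].get(left, 0)
                    right_count = chart[(k, j)].get(right, 0)
                    if left_count and right_count:
                        chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)].get(start, 0)


headline_parses = count_parses("British push bottles up Germans".split())
simple_parses = count_parses("Germans push bottles".split())
```

Under this toy grammar the headline has two parse trees (one in which a British push traps the Germans, one in which the British push bottles), whereas the simpler sentence has one.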

On the other hand, a more constrained language has the disadvantage of requiring many more patterns, templates, or sentence forms.  Dr Silvie Spreeuwenberg, another contributor to BRCommunity.com, told me at the Forum that the developers of RuleXpress, having recognised that many of my templates are variants of each other, have overcome this disadvantage by providing a smaller set of simpler but less-constrained patterns.

To be continued...
The next article in this series will discuss these alternative approaches to rule statement patterns in more detail as well as at least one additional type of data content rule statement.

References

[1]  The first of which is:  Graham Witt, "A Practical Method of Developing Natural Language Rule Statements (Part 1)," Business Rules Journal, Vol. 10, No. 2 (Feb. 2009), URL:  http://www.BRCommunity.com/a2009/b461.html

[2]  Graham Witt, Writing Effective Business Rules.  Morgan Kaufmann (2012).

[3]  Semantics of Business Vocabulary and Business Rules (SBVR), v1.0.  Object Management Group (Jan. 2008).  Available at http://www.omg.org/spec/SBVR/1.0/
     The font and colour conventions used in these rule statements reflect those in the SBVR, namely underlined teal for terms, italic blue for verb phrases, orange for keywords, and double-underlined green for names and other literals.  Note that, for clarity, these conventions are not used for rule statements that exhibit one or more non-recommended characteristics.

[4]  Graham Witt, "Writing Natural Language Rule Statements — a Systematic Approach:  Part 1 — Basic Principles," Business Rules Journal, Vol. 13, No. 7 (Jul. 2012), URL:  http://www.BRCommunity.com/a2012/b660.html
     Graham Witt, "Writing Natural Language Rule Statements — a Systematic Approach:  Part 2 — Mandatory Data Rules," Business Rules Journal, Vol. 13, No. 8 (Aug. 2012), URL:  http://www.BRCommunity.com/a2012/b665.html
     Graham Witt, "Writing Natural Language Rule Statements — a Systematic Approach:  Part 3 — Other Data Cardinality Rules," Business Rules Journal, Vol. 13, No. 9 (Sept. 2012), URL:  http://www.BRCommunity.com/a2012/b669.html
     Graham Witt, "Writing Natural Language Rule Statements — a Systematic Approach:  Part 4 — Some Data Content Rules," Business Rules Journal, Vol. 13, No. 10 (Oct. 2012), URL:  http://www.BRCommunity.com/a2012/b674.html
     Graham Witt, "Writing Natural Language Rule Statements — a Systematic Approach:  Part 5 — Value Set Rules," Business Rules Journal, Vol. 13, No. 11 (Nov. 2012), URL:  http://www.BRCommunity.com/a2012/b677.html

[5]  Statements of rules that require the presence or absence of a data item and/or place a restriction on the maximum or minimum number of occurrences of a data item.

[6]  A statement of a rule that places a restriction on the values contained in a data item or set of data items (rather than whether or not they must be present and how many there may or must be).

[7]  A statement of a rule that requires that the content of a data item be a value within a particular range.

[8]  A statement of a rule that requires that the content of a data item be (or not be) one of a particular set of values (either a fixed set or a set that may change over time), or that the content of a combination of data items match or not match a corresponding combination in a set of records.

[9]  A statement of any of the following:

  1. an integrity constraint by which a DBMS ensures that a particular column (or combination of columns) in a table has different values in every row,

  2. in ORM (Object Role Modelling), a constraint in which each instance of a particular object type may participate in no more than one instance of a particular fact type,

  3. a rule that requires that the content of a data item (or combination of data items) be different to that of the corresponding data item(s) in the same or other records or transactions.

[10]  You may prefer 'different to' or (in American English) 'different than'.

[11]  A word or phrase used before a noun to provide some information as to which instance (or instances) of the noun's concept are being referred to, such as 'the', 'that', 'each', 'any other'.

[12]  Terry Halpin & J. Harding, "Automated support for verbalization of conceptual schemas," Proceedings of the 4th Workshop on Next Generation CASE Tools, Paris, France, Twente Memoranda Informatica (1993).

[13]  Graham Witt, "Modelling Business Rules for School Student Administration: a Case Study," ER99, Paris, France (1999).

[14]  Ronald Ross & Gladys Lam, RuleSpeak Sentence Templates — Developing Rule Statements Using Sentence Patterns (2001) and Ronald Ross, RuleSpeak Sentence Forms — Specifying Natural-Language Business Rules in English (2009).

[15]  Tony Morgan, Business Rules and Information Systems. Addison-Wesley (2002).

[16]  An ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar. (Wikipedia).


Standard citation for this article:


Graham Witt, "Writing Natural Language Rule Statements — a Systematic Approach Part 6: Uniqueness Constraints" Business Rules Journal, Vol. 13, No. 12, (Dec. 2012)
URL: http://www.brcommunity.com/a2012/b682.html

About our Contributor:



Graham Witt has over 30 years of experience in assisting organisations to acquire relevant and effective IT solutions. NSW clients include the Department of Lands, Sydney Water, and WorkCover while Victorian clients include the Departments of Sustainability & Environment, Education & Early Childhood Development, and Human Services. Graham previously headed the information management and business rules practice in Ajilon's Sydney (Australia) office.

Graham has developed specialist expertise in business requirements, architectures, information management, user interface design, data modelling, relational database design, data quality, business rules, and the use of metadata repositories & CASE tools. He has also provided data modelling, database design, and business rules training to various clients including NAB, Telstra, British Columbia Government, and ASIC and in the form of public courses run by Simsion Bowles and Associates (Australia) and DebTech (USA).

He is the co-author, with Graeme Simsion, of the widely-used textbook "Data Modeling Essentials" and is the author of the newly published book, "Writing Effective Business Rules" (published by Elsevier). Graham has presented at conferences in Australia, the US, the UK, and France. Contact him at gwitt@pacific.net.au.
