Writing Natural Language Rule Statements — a Systematic Approach Part 11: Data Item Format Rules
About this series of articles
While my first series of articles on writing natural language rule statements[1] explored a wide variety of issues in a rather organic and hence random manner, this series takes a more holistic and systematic approach and draws on insights gained while writing my recently published book on the same topic.[2] Rule statements recommended in these articles are intended to comply with the Object Management Group's Semantics of Business Vocabulary and Business Rules (SBVR) version 1.0.[3]
The story so far
In previous articles in this series (see the "Language Archives" sidebar) we have looked at a variety of data cardinality rule statements,[4] as well as the following data content rule statements:[5] range rule statements,[6] value set rule statements,[7] uniqueness constraint statements,[8] (in)equality rule statements,[9] data consistency rule statements,[10] and temporal data constraint statements.[11] Each type of rule statement has a common formulation, which we have discussed in both a relatively informal way and by way of rule statement patterns.
We will now look at an often overlooked type of rule, the data item format rule.
Valid and Invalid Quantities
A number of rules we have encountered so far govern quantities of some kind: Travel Duration, Number of Passengers, Age, Value (of a High Value Item), Landing Distance Required, Length (of a Runway), or Base Premium. So far the only rule statements governing these quantities have been either:
- data cardinality rule statements, each requiring that one of these quantities be mandatory, or
- range rule statements, each requiring that one of these quantities be (no) greater than or (no) less than some value, e.g., a passenger's age must be less than 80 years, a high value item's specified value must be no less than $500.
For a range rule statement to be meaningful, the value actually specified in each transaction must be numeric. While a range rule statement assumes that the governed data item is numeric, it does not itself require that data item to be numeric. Furthermore, data cardinality rule statements are not concerned with the content actually specified in each transaction, only with whether there is (or is not) some content. We therefore need, for each of these data items, an additional rule statement requiring that the value actually specified in each transaction be numeric. In case you're not convinced: I have encountered application user interfaces in which a data item is required to be greater than or less than some value but for which there is no explicit test that the value entered is numeric. When testing such user interfaces I deliberately enter non-numeric data into each apparently numeric field; this sometimes causes the application to display a screen full of technobabble, or even crash, rather than produce a meaningful error message.
Most numeric data items are required to be unsigned integers, which means that those data items can only include numerals (i.e., the digits 0 – 9), with no decimal point or minus sign. This includes many monetary amounts such as values of high value items in a travel insurance application.
For many numeric quantities, zero doesn't make sense (for example, Travel Duration in a travel insurance application); these data items are therefore limited not only to unsigned values but to positive values (as in rule statement RS109). However, only those who remember the mathematics they learnt at school will recall that positive numbers do not include zero, so I always include a range rule statement for such items (as in RS110): admittedly redundant, but included to aid communication.
RS109. | The Travel Duration specified in each Travel Insurance Application must be a valid Positive Integer. |
RS110. | The Travel Duration specified in each Travel Insurance Application must be at least 1 day. |
There are, however, numeric quantities that are allowed to be specified as zero, such as the number of child passengers in a flight booking request (as in rule statement RS111).
RS111. | The Number of Child Passengers specified in each Flight Booking Request must be a valid Unsigned Integer. |
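To show why the format rule needs to be checked before any range rule, here is a minimal Python sketch for hypothetical text-entry fields. The definitions of Unsigned Integer and Positive Integer are assumptions (numerals 0 to 9 only, no sign or decimal point, Positive Integer additionally greater than zero), and the function names are illustrative rather than part of any rule book.

```python
import re

# Assumed term definitions (a real rule book would define these):
#   Unsigned Integer: one or more numerals 0-9, with no sign or decimal point.
#   Positive Integer: an Unsigned Integer whose value is greater than zero.
_UNSIGNED_INTEGER = re.compile(r"[0-9]+")


def is_unsigned_integer(text: str) -> bool:
    return _UNSIGNED_INTEGER.fullmatch(text) is not None


def is_positive_integer(text: str) -> bool:
    # int() is only reached once the text is known to contain numerals alone.
    return is_unsigned_integer(text) and int(text) > 0


def check_travel_duration(text: str) -> list[str]:
    """Apply RS109 (format) before RS110 (range) to a hypothetical input field."""
    if not is_positive_integer(text):
        return ["RS109: Travel Duration must be a valid Positive Integer"]
    # Once RS109 holds, RS110 can never fail: redundant, kept only to aid communication.
    if int(text) < 1:
        return ["RS110: Travel Duration must be at least 1 day"]
    return []


def check_child_passengers(text: str) -> list[str]:
    """RS111: zero is allowed, so Unsigned Integer rather than Positive Integer."""
    if not is_unsigned_integer(text):
        return ["RS111: Number of Child Passengers must be a valid Unsigned Integer"]
    return []
```

Calling `int("fourteen")` directly inside a range check would raise `ValueError`, which is exactly the kind of unhandled failure described above; checking the format first turns it into a meaningful message.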
Occasionally a numeric data item allows for non-integer quantities, a common example being the amount to be paid or transferred in an online banking transaction, which may, of course, have a decimal point (or comma) followed by two digits (as in rule statement RS112). As there may be numerous monetary amounts in some environments, it is better to define the term Currency Amount to be "either an integer or a decimal fraction with up to two digits after the decimal point" and Unsigned Currency Amount as "a Currency Amount with no sign" rather than use such cumbersome phrases in rule statements.
RS112. | The Amount specified in each Online Banking Transaction must be a valid Unsigned Currency Amount. |
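A minimal sketch of the Currency Amount and Unsigned Currency Amount terms, assuming "up to two digits" means one or two digits after the separator, that an integer part is always required, and that no thousands separators are allowed. The `decimal_separator` parameter is a hypothetical addition to cover locales that use a comma.

```python
import re


def _currency_pattern(decimal_separator: str) -> str:
    # An integer, optionally followed by the separator and one or two digits.
    return r"[0-9]+(?:" + re.escape(decimal_separator) + r"[0-9]{1,2})?"


def is_currency_amount(text: str, decimal_separator: str = ".") -> bool:
    # A Currency Amount may carry a leading sign.
    return re.fullmatch(r"[+-]?" + _currency_pattern(decimal_separator), text) is not None


def is_unsigned_currency_amount(text: str, decimal_separator: str = ".") -> bool:
    # An Unsigned Currency Amount is a Currency Amount with no sign.
    return re.fullmatch(_currency_pattern(decimal_separator), text) is not None
```

Defining the unsigned term by reusing the signed one mirrors the way the two definitions relate in the text, so a change to what counts as a Currency Amount flows through to both.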
Very occasionally a numeric data item may allow for negative quantities to be specified. The only system in which I've encountered this recently is one for the recording of meteorological data; temperatures measured in degrees Celsius may, of course, be negative (as in rule statement RS113).
RS113. | The Temperature specified in each Set of Meteorological Readings must be a valid Unsigned Decimal Number or Negative Decimal Number. |
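RS113 refers to a list of data types rather than a single one. The following sketch assumes hypothetical definitions for Unsigned Decimal Number and Negative Decimal Number, and treats a data type list as satisfied when the value is valid for any one of the listed types.

```python
import re

# Hypothetical term definitions (the article leaves these to the rule book):
#   Unsigned Decimal Number: numerals 0-9, optionally followed by a decimal point and more numerals.
#   Negative Decimal Number: a minus sign followed by an Unsigned Decimal Number, value below zero.
_UNSIGNED_DECIMAL = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def is_unsigned_decimal_number(text: str) -> bool:
    return _UNSIGNED_DECIMAL.fullmatch(text) is not None


def is_negative_decimal_number(text: str) -> bool:
    # float() is safe here because the remainder has already matched the pattern;
    # the final test rejects "-0" and "-0.0", whose value is not below zero.
    return (text.startswith("-")
            and is_unsigned_decimal_number(text[1:])
            and float(text) < 0)


def is_valid_any(text: str, *data_types) -> bool:
    """A data type list ("X or Y") is satisfied if the value is valid for any listed type."""
    return any(is_valid(text) for is_valid in data_types)


def check_temperature(text: str) -> bool:
    # RS113: a valid Unsigned Decimal Number or Negative Decimal Number.
    return is_valid_any(text, is_unsigned_decimal_number, is_negative_decimal_number)
```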
You should ensure that the rule book[12] includes definitions (or references to definitions) for terms such as Positive Integer, Unsigned Integer, Unsigned Currency Amount, Unsigned Decimal Number, and Negative Decimal Number; we shall discuss documentation of term definitions in a future article.
Valid and Invalid Textual Data Items
Textual data items may also be subject to constraints on their length and/or the characters they may contain. For example, a Credit Card Number must contain only digits and be from 12 to 19 digits long. Within-country Phone Numbers must also contain only digits, although the minimum and maximum lengths can vary from country to country, perhaps with different lengths for mobile (cell) and fixed phones. All personal phone numbers in Australia, whether mobile (cell) or landline, are 10 digits long; the first digit is 0, while the second digit is (currently) 2, 3, 7, or 8 for fixed phones and 4 for mobile phones. Note that some business phone numbers may have fewer digits or a first digit other than 0.
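The length and character constraints above translate directly into patterns. This is a minimal sketch under the assumption that no spaces or hyphens are allowed; it checks only the characters and lengths stated in the text (no check-digit validation), and the function names are hypothetical.

```python
import re

# Hypothetical term definitions transcribing the rules stated above.
# The Australian prefixes are described as current, so they may need revising.
_CREDIT_CARD_NUMBER = re.compile(r"[0-9]{12,19}")          # 12 to 19 digits only
_AUSTRALIAN_PERSONAL_PHONE = re.compile(r"0[23478][0-9]{8}")  # 10 digits in total


def is_valid_credit_card_number(text: str) -> bool:
    return _CREDIT_CARD_NUMBER.fullmatch(text) is not None


def australian_personal_phone_kind(text: str):
    """Return 'fixed', 'mobile', or None if not a valid Australian Personal Phone Number."""
    if _AUSTRALIAN_PERSONAL_PHONE.fullmatch(text) is None:
        return None
    # Second digit 4 denotes a mobile; 2, 3, 7 or 8 a fixed phone.
    return "mobile" if text[1] == "4" else "fixed"
```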
Data items that allow for overseas as well as local phone numbers may, in addition, allow for an initial '+' to indicate the international access code for overseas numbers and parentheses to enclose a digit that is only dialled when calling from within the same country. Thus an Australian phone number would be represented globally as "+61 (0) 2 8123 4567". Not all systems allow spaces in phone numbers, however.
Rather than try to capture all these rules in a single rule statement for each phone number data item, or create a set of rules that needs to be replicated for each phone number data item, it is better to create (and define) a term to represent each relevant standard for representing phone numbers and refer to that term in data item format rule statements.
RS114. | The Contact Phone Number (if any) specified in each Travel Insurance Application must be a valid Australian Personal Phone Number. |
RS115. | Each Overseas Contact Phone Number (if any) specified in each Travel Insurance Application must be a valid International Phone Number. |
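The advantage of defining a term once and referring to it from each rule statement can be sketched as a lookup table: each rule names a data item and a term, and the term's definition lives in one place. Everything here is hypothetical, including the International Phone Number definition, which is based loosely on the conventions described above (a leading "+", a country code, an optional parenthesised digit, then digit groups separated by single spaces or no spaces at all).

```python
import re

# Hypothetical term definitions, each written once and referred to by name.
FORMAT_TERMS = {
    "Australian Personal Phone Number": re.compile(r"0[23478][0-9]{8}"),
    "International Phone Number": re.compile(
        r"\+[1-9][0-9]{0,2}(?: ?\([0-9]\))? ?[0-9]+(?: [0-9]+)*"),
}

# Data item format rule statements reduced to (rule id, data item, term) triples.
RULES = [
    ("RS114", "Contact Phone Number", "Australian Personal Phone Number"),
    ("RS115", "Overseas Contact Phone Number", "International Phone Number"),
]


def check_application(application: dict) -> list[str]:
    """Return violations of RS114 and RS115 for a hypothetical application record."""
    violations = []
    for rule_id, item, term in RULES:
        values = application.get(item)
        if values is None:
            continue  # "(if any)": these rules do not require the item to be present
        if isinstance(values, str):
            values = [values]  # allow one value or, for "Each ...", several
        for value in values:
            if FORMAT_TERMS[term].fullmatch(value) is None:
                violations.append(f"{rule_id}: {value!r} is not a valid {term}")
    return violations
```

If the conventions for phone numbers change, only the entry in `FORMAT_TERMS` changes; the rule statements themselves stay as they are.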
While these constraints reflect current real-world restrictions on phone numbers, many systems impose constraints on text fields that are much more restrictive than those that apply in the real world. Most systems impose restrictions on the entry of Personal Names that do not reflect reality. Many systems do not allow "accented characters" (those with diacritics such as 'é', 'ç', 'õ', 'ü', or 'č') in data items for the entry of surnames (family names) or given names. While people with German names can replace 'ä', 'ö', or 'ü' by 'ae', 'oe', or 'ue' respectively, many other people have to misspell their names when providing data to automated systems. Some systems do not even allow apostrophes or spaces in personal names, creating problems for some people of Irish, Portuguese, Spanish, or Dutch background, among others. These are not really business rules: the only benefit to the business is that it does not have to update its systems to reflect the real world. For this reason, such rules may be referred to as system rules.
Ironically, there are real-world rules that are often not implemented in systems, such as those that require individual parts of a surname or given name to be separated by a single space, a single hyphen, or a single apostrophe (as in "de Sousa", "Jean-Paul", or "O'Brien" respectively).
Again, whatever rules apply to each personal name, rather than try to capture them all in a single rule statement for each personal name data item, or create a set of rules that needs to be replicated for each personal name data item, it is better to create (and define) a term to represent each relevant standard for personal names and refer to that term in data item format rule statements.
RS116. | Each Passenger Name specified in each Flight Booking Confirmation must be a valid Personal Name. |
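A Personal Name definition that reflects the real-world rule above, rather than a system rule, might look like the following sketch. It assumes only the ASCII apostrophe is accepted, and uses `[^\W\d_]` (word characters other than digits and underscore, which in practice means letters, accented ones included).

```python
import re
import unicodedata

# Hypothetical definition of Personal Name: one or more parts made of letters,
# adjacent parts separated by exactly one space, hyphen, or apostrophe.
_PERSONAL_NAME = re.compile(r"[^\W\d_]+(?:[ '-][^\W\d_]+)*")


def is_valid_personal_name(text: str) -> bool:
    # Normalise to composed form (NFC) so that 'e' followed by a combining acute
    # accent is treated as the single letter 'é' before matching.
    return _PERSONAL_NAME.fullmatch(unicodedata.normalize("NFC", text)) is not None
```

A typographic apostrophe (as in "O’Brien") would be rejected by this sketch; whether to accept it is a decision for whoever defines the term.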
Valid and Invalid Dates
Other rules we have encountered govern dates — Departure Date, Return Date, Birth Date, or Expiry Date. In each situation in which one of these dates is to be specified, the date specified must be a valid date; for example, 29th February in other than a leap year and 31st April are invalid dates and should be rejected. This is in addition to any other requirements, such as that the date is before or after some other date — for example, the Return Date of an itinerary must be no earlier than the Departure Date of the same itinerary.
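Whether a date exists is independent of how it is written. The following sketch assumes the Gregorian calendar and integer day, month, and year values already extracted from the input; any ordering rule (such as Return Date no earlier than Departure Date) would be a separate temporal data constraint, checked only once both dates are known to be valid.

```python
def is_leap_year(year: int) -> bool:
    # Gregorian calendar: divisible by 4, except century years not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year: int, month: int) -> int:
    if month == 2:
        return 29 if is_leap_year(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def is_valid_date(year: int, month: int, day: int) -> bool:
    """True only for dates that exist, e.g. rejects 29 Feb 2013 and 31 Apr 2013."""
    return year >= 1 and 1 <= month <= 12 and 1 <= day <= days_in_month(year, month)
```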
Dates should also be appropriately formatted. There are various date formatting conventions available, including:
- numbers representing the month, the day of the month (each 2 digits), and the year (4 digits) respectively, separated by "slashes" (obliques or solidi), e.g., 03/17/2013, as commonly used in the US
- numbers representing the day of the month, the month (each 2 digits), and the year (4 digits) respectively, separated by slashes, e.g., 17/03/2013, as commonly used in Australia and the UK
- numbers representing the year (4 digits), the month, and the day of the month (each 2 digits) respectively, separated by hyphens, e.g., 2013-03-17, as in the ISO 8601 standard[13]
- variations of the above conventions with one or more of the following alternatives:
- 1 digit for month and/or day numbers less than 10, e.g., 17/3/2013
- 2 digits for year numbers, e.g., 17/3/13
- month names rather than numbers (either in full or abbreviated) , e.g., 17 March 2013 or 17 Mar 2013
- Roman numerals for month numbers, e.g., 17.III.2013 (as in some European countries)
- alternative separators, such as dots or spaces, as in the previous 3 examples.
Most (if not all) user interfaces requiring that a date be specified (or providing for the specification of an optional date) accept only dates that comply with one of these date formatting conventions. In general, the same standards will apply to all dates in a given operating environment so again, rather than spell out each of the relevant rules governing day number, month number, year number, sequence, and separators for each date that may be specified, choose (and define) an appropriate term to signify the type of date representation (US Date, Australian Date, ISO 8601 Date) and write a single rule statement for each date limiting it to that date representation, such as RS117.
RS117. | The Departure Date specified in each Travel Insurance Application must be represented using a valid Australian Date. |
Note that the verb phrase in the predicate is "be represented using" rather than simply "be", since it is not only the value but also the representation that is constrained.
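The distinction between value and representation can be seen in a short sketch. It assumes hypothetical definitions of Australian Date (DD/MM/YYYY, two-digit day and month, four-digit year, slashes) and ISO 8601 Date (YYYY-MM-DD); each parser returns the date only if the text has the required representation and denotes a real date.

```python
import re
from datetime import date

# Hypothetical term definitions based on the conventions listed earlier.
_AUSTRALIAN_DATE = re.compile(r"([0-9]{2})/([0-9]{2})/([0-9]{4})")
_ISO_8601_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def _parse(pattern: re.Pattern, text: str, order: str):
    """Return a date if text has the required representation and denotes a real date."""
    match = pattern.fullmatch(text)
    if match is None:
        return None  # the representation is wrong, whatever value was intended
    parts = dict(zip(order, (int(g) for g in match.groups())))
    try:
        return date(parts["y"], parts["m"], parts["d"])
    except ValueError:
        return None  # the representation is right, but no such date exists


def parse_australian_date(text: str):
    return _parse(_AUSTRALIAN_DATE, text, "dmy")


def parse_iso_8601_date(text: str):
    return _parse(_ISO_8601_DATE, text, "ymd")
```

Both parsers accept the same value, 17 March 2013; only the representation differs. Note that a US-style "03/04/2013" would be accepted as an Australian Date meaning 3 April, which is one reason to pin down the representation for each date.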
Note also that some dates may omit the year or the day number. For example, expiry dates of credit cards include only month and year numbers, whereas recurrent dates (such as those of public holidays) include only day and month numbers. For these it will be necessary to define alternative terms such as "Month and Year" and "Day and Month".
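The two partial-date terms need slightly different validity tests. This sketch assumes hypothetical representations MM/YY for Month and Year and DD/MM for Day and Month. Since a Day and Month has no year, 29/02 has to be accepted, so the sketch checks the day against the leap year 2000.

```python
import re
from datetime import date


def is_valid_month_and_year(text: str) -> bool:
    # Hypothetical representation MM/YY: month 01-12, any two-digit year.
    return re.fullmatch(r"(0[1-9]|1[0-2])/[0-9]{2}", text) is not None


def is_valid_day_and_month(text: str) -> bool:
    # Hypothetical representation DD/MM; the day must exist in that month of some year.
    match = re.fullmatch(r"([0-9]{2})/([0-9]{2})", text)
    if match is None:
        return False
    day, month = int(match.group(1)), int(match.group(2))
    try:
        date(2000, month, day)  # 2000 is a leap year, so 29/02 passes
    except ValueError:
        return False
    return True
```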
I have occasionally heard from business or IT stakeholders that, since the user interface will employ a calendar gadget to help users enter dates, such rules are unnecessary. However, many user interfaces allow direct entry of dates as well as a calendar gadget, or allow overwriting of the date returned by the calendar gadget; in these, the rule is still necessary. In any case, an on-screen gadget is a means of implementing the rule (in the same way that a 'combo box' providing a 'drop down list' or 'pick list' can be used to implement a value set rule); no on-screen gadget replaces the rule it implements. I have also encountered people who believe you only need a business rule for each error message that might be output from a system. Again, if you can prevent users from breaking a rule, you don't need an error message, but the rule still exists.
Some user interfaces represent dates using separate data items for day, month, and year numbers. You can still use a rule statement like RS117 in this situation unless you would rather refer to individual data items in your rule statements, in which case RS118 is an appropriate alternative. Note that while rule statements like RS117 could be used for Departure Month and Departure Year, an analogous rule statement is not possible for Departure Day, since its validity depends on the values of Departure Month and (for 29th February) Departure Year.
RS118. | The combination of Departure Day, Departure Month and Departure Year specified in each Travel Insurance Application must form a valid Date. |
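With separate data items, each part can be checked on its own for well-formedness, but only the combination can show whether the day exists. A minimal sketch, assuming text inputs and hypothetical term names Month Number and Year Number; the RS118 check is skipped until each part is well formed.

```python
import re
from datetime import date


def check_departure_date(day_text: str, month_text: str, year_text: str) -> list[str]:
    """Hypothetical checks for separate Departure Day, Month and Year data items."""
    violations = []
    if not re.fullmatch(r"[0-9]{1,2}", day_text):
        violations.append("Departure Day must be a valid Unsigned Integer")
    if not (re.fullmatch(r"[0-9]{1,2}", month_text) and 1 <= int(month_text) <= 12):
        violations.append("Departure Month must be a valid Month Number")
    if not re.fullmatch(r"[0-9]{4}", year_text):
        violations.append("Departure Year must be a valid Year Number")
    if violations:
        return violations  # the combination cannot be assessed until each part is well formed
    # RS118: only the combination shows whether the day exists in that month and year.
    try:
        date(int(year_text), int(month_text), int(day_text))
    except ValueError:
        violations.append("RS118: Departure Day, Month and Year must form a valid Date")
    return violations
```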
There are analogous requirements for data items in which times of day can be specified. There are various conventions available for representing times of day, including:
- The numbers representing the hours and minutes may be separated by a colon or by some other character.
- The number representing the hours may or may not require a leading zero if less than 10.
- A 12-hour or 24-hour clock may be used.
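As with dates, each combination of these conventions would be captured in a defined term. The sketch below is a hypothetical validator parameterised by the three conventions listed above; it assumes minutes always have two digits and that a 12-hour time ends with an "am" or "pm" marker, optionally preceded by a space.

```python
import re


def is_valid_time_of_day(text: str, *, clock: int = 24, separator: str = ":",
                         leading_zero: bool = True) -> bool:
    """Hypothetical validator parameterised by the conventions listed above."""
    if clock not in (12, 24):
        raise ValueError("clock must be 12 or 24")
    # With leading_zero=True the hour must have two digits; otherwise one or two.
    hour = r"([0-9]{2})" if leading_zero else r"([0-9]{1,2})"
    pattern = hour + re.escape(separator) + r"[0-5][0-9]"
    if clock == 12:
        pattern += r" ?(?:[AaPp][Mm])"  # 12-hour times need an am/pm marker
    match = re.fullmatch(pattern, text)
    if match is None:
        return False
    hours = int(match.group(1))
    return hours <= 23 if clock == 24 else 1 <= hours <= 12
```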
The pattern for data item format rule statements
P32. | <data item format rule statement> ::= {Each | The} {<data item term> {(if any) |} | combination of <item combination list>} specified {for {the | each} <complex data item term> {(if any) |} |} {in | on} each <transaction term> {<qualifying clause> |} must <data item format predicate> {{if | unless} <conditional clause> |}. |
P33. | <data item format predicate> ::= {be | be represented using | form} a valid {<data type term> | <data type list>} |
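To make the structure of P32 and P33 concrete, here is a hypothetical helper that instantiates a subset of the patterns: it omits the complex data item, qualifying clause, and conditional clause options, always uses "in", and assumes a data type list is joined with "or". It reproduces several of the rule statements above exactly.

```python
def format_rule_statement(determiner: str, data_item: str, transaction: str,
                          data_types: list[str], verb: str = "be",
                          optional: bool = False) -> str:
    """Hypothetical helper instantiating a subset of P32/P33."""
    if determiner not in ("Each", "The"):
        raise ValueError("P32 allows only 'Each' or 'The'")
    if verb not in ("be", "be represented using", "form"):
        raise ValueError("P33 allows only 'be', 'be represented using' or 'form'")
    subject = f"{determiner} {data_item}" + (" (if any)" if optional else "")
    data_type_list = " or ".join(data_types)
    return f"{subject} specified in each {transaction} must {verb} a valid {data_type_list}."
```

Passing "combination of Departure Day, Departure Month and Departure Year" as the data item, with the verb "form", yields RS118.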
To be continued...
The next article in this series will briefly look at rules governing spatial data and discuss data update rules.
References
[1] The first of which is: Graham Witt, "A Practical Method of Developing Natural Language Rule Statements (Part 1)," Business Rules Journal, Vol. 10, No. 2 (Feb. 2009), URL: http://www.BRCommunity.com/a2009/b461.html
[2] Graham Witt, Writing Effective Business Rules. Morgan Kaufmann (2012).
[3] Semantics of Business Vocabulary and Business Rules (SBVR), v1.0. Object Management Group (Jan. 2008). Available at http://www.omg.org/spec/SBVR/1.0/
The font and colour conventions used in these rule statements reflect those in the SBVR, namely underlined teal for terms, italic blue for verb phrases, orange for keywords, and double-underlined green for names and other literals. Note that, for clarity, these conventions are not used for rule statements that exhibit one or more non-recommended characteristics.
[4] Statements of rules that require the presence or absence of a data item and/or place a restriction on the maximum or minimum number of occurrences of a data item.
[5] A statement of a rule that places a restriction on the values contained in a data item or set of data items (rather than whether or not they must be present and how many there may or must be).
[6] A statement of a rule that requires that the content of a data item be a value within a particular range.
[7] A statement of a rule that requires that the content of a data item be (or not be) one of a particular set of values (either a fixed set or a set that may change over time), or that the content of a combination of data items match or not match a corresponding combination in a set of records.
[8] A statement of any of the following:
- an integrity constraint by which a DBMS ensures that a particular column (or combination of columns) in a table has different values in every row,
- in ORM (Object Role Modelling), a constraint in which each instance of a particular object type may participate in no more than one instance of a particular fact type,
- a rule that requires that the content of a data item (or combination of data items) be different to that of the corresponding data item(s) in the same or other records or transactions.
[9] A statement of a rule that requires that the content of a data item be the same as (or different to) that of some other data item (or some specific value).
[10] A statement of a rule that requires the content of multiple data items to be consistent with each other, other than as provided for by a value set rule, range rule, or equality rule.
[11] A statement of a rule that constrains one or more temporal data items (dates or times).
[12] A collection of rule statements expressing the rules governing an organization.
[13] ISO 8601:2004 Data elements and interchange formats — Information interchange — Representation of dates and times.