Integrity Constraints (Part 3)
This is the last installment of my miniseries on the topic of integrity constraints. I looked at type constraints in the first installment and column and table constraints in the second; now I want to conclude by considering database constraints.
Like a table constraint, a database constraint can be as complicated as you like, provided only that it refers to at least two distinct tables (it must be at least two, because otherwise the constraint isn't a database constraint after all but a table constraint).
Here's an example, based (like my examples last time) on the suppliers and parts database:
IF S.STATUS < 20 AND S.S# = SP.S# THEN SP.QTY
£
QTY(500)
("suppliers with status less than 20 must not supply any part in a quantity
greater than 500"; by the way, observe the QTY
literal, QTY(500)
).
Note that this constraint certainly does refer to two tables, S
and
SP
(note in particular the "join term" that interrelates them).
Note too that the constraint can be regarded as shorthand for a more formal version
that looks something like this:
FORALL s# IN S#, sn IN NAME, st IN INTEGER, sc IN CHAR,
p# IN P#, q IN INTEGER
( IF { S# s#, SNAME sn, STATUS st, CITY sc } IN S
AND { S# s#, P# p#, QTY q } IN SP AND st < 20
THEN q£
QTY(500) )
As I noted last time, however, I'm not going to worry too much about such more formal expressions in this series.
A couple of points arise from this example:
- First, observe that we distinguish between single- and multi-table constraints,
not between single- and multi-row constraints.
For example, the superkey constraint shown last time for table
S
is certainly a single-table constraint, even though it talks about two rows (SX
andSY
). In the past, by contrast, taxonomies for integrity constraints have typically drawn a dividing line between single- and multi-row constraints However, that distinction, it seems to me, is one that's more important from an implementation point of view than it is from a logical one. - Second, database constraints, like column and table constraints, are checked
immediately -- another (and major) point of difference between the taxonomy under
discussion and others in the past.
In previous schemes, multi-table constraints, at least (and possibly others too), were checked at
COMMIT
time (that is, at end-of-transaction) instead of immediately. In particular, the SQL standard includes something it calls "DEFERRED checking" (though in SQL the checking is done atCOMMIT
time at the latest -- it might actually be done before end-of-transaction -- but that's a detail we can ignore here). Deferred checking leads to major problems, however, of both a theoretical and a practical nature; in The Third Manifesto, therefore, we take the position that all checking must be immediate.[1]
Note: It's true that certain constraints (especially database ones) involve keeping distinct data items "in synch," as it were, and hence that certain data items sometimes need to be updated at the same time, in effect. The Third Manifesto provides a parallel form of assignment in order to take care of this requirement. I might say more about that operation (which is of fundamental importance) in some future installment of this regular column.
Here are a couple more examples of database constraints:
EXISTS S ( S.S# = SP.S# )
This example defines the foreign key constraint from shipments to suppliers ("for
every shipment SP
, there must exist a supplier S
with the same supplier number"; note the use of the existential quantifier,
EXISTS
). Note that the constraint
is indeed a database constraint, because it spans two tables.
EXISTS P ( P.P# = SP.P# )
This example is similar to the previous one.
Certain pragmatically important special cases arise in connection with database constraints. For example, consider the notions of kernel, characteristic, and associative entities, which are intuitively useful concepts in connection with logical database design. (The terms are taken from Codd's extended relational model RM/T.[2] See also my own tutorial description.[3] Briefly:
- A kernel entity is one that has independent existence
(kernels are "what the database is really all about" -- they're the entry
points into the database, in effect). In the
suppliers and parts database, suppliers and parts are kernels.
- A characteristic entity is one that further describes
or "characterizes" some other entity (and is existence-dependent on that
other entity). There aren't any characteristics
in the suppliers and parts database, but order line items might provide an example
in a customers and orders database (order line items are certainly existence-dependent
on the relevant order).
- An associative entity is a many-to-many (or many-to-many-to-many ...) relationship that serves to link other entities together, logically speaking. (Relationships are entities, of course!) Shipments provide an example in the suppliers and parts database.
Kernel entity types are represented by base tables (typically; sometimes they're represented by views instead, but I don't want to get into details of that possibility right now). The same is true of characteristic and associative entity types as well, of course, but the tables corresponding to such entity types have foreign keys to represent the characteristic or associative relationship. So it is indeed the case that database constraints are relevant here.
The other pragmatically important case I want to mention
in connection with database constraints is "HAS A
"
(not "IS A
"!) relationships.
For example, a part-time employee "has a" hourly wage instead of a monthly
salary. So, first, there'll be a table for employees
in general, with common information such as employee number, department number, and
so forth; second, there'll be a table for part-time employees specifically, giving
information specific to part-time employees (for example, hourly wage); and third,
there'll be a foreign key constraint -- a database constraint, in fact -- from the
part-time employees table to the "employees in general" table.
Note: There's a great deal of
confusion in the industry over IS A
vs. HAS A
.
IS A
is subtyping and inheritance, and I talked
about it briefly a couple of installments back ("a circle IS A
ellipse"). HAS A
is quite different! -- in particular, it isn't
subtyping, and it isn't inheritance. See the
book by Hugh Darwen and myself.[4]
If you want to investigate this distinction in depth; here
I just want to make it clear that both IS A
and HAS A
relationships
can be handled straightforwardly within the constraint taxonomy under discussion.
I'll close this miniseries with some further examples of constraints for the suppliers and parts database and showing how they would look under the scheme I've been describing. I'll use Tutorial D (again, see a couple of installments back) to state the constraint formally in each case. Here goes:
- Legal supplier numbers are of the form Snnnn (nnnn = up to four decimal digits).
TYPE S#
£
POSSREP { X CHAR WHERE
SUBSTR ( X, 1, 1 ) = 'S' AND
IS_INTEGER ( SUBSTR ( X, 2 ) ) AND
CHAR_LENGTH ( X )5 } ;
This first one is a domain or type constraint. I've assumed the availability of certain operators for use in expressing that constraint:
SUBSTR
, for extracting a substring of a given string;IS_INTEGER
, for testing whether a given string denotes an integer; andCHAR_LENGTH
, for obtaining the length of a given string.
- Red parts weigh less than 50 pounds.
IF P.COLOR = COLOR ( 'Red' ) THEN P.WEIGHT < WEIGHT ( 50 )
This one is a table constraint -- actually a "single-row" constraint. Note the
COLOR
andWEIGHT
literals.
- Suppliers
S1
andS4
are always in the same city.
IF SX.S# = S# ( 'S1' ) AND SY.S# = S# ( 'S4' )
THEN SX.CITY = SY.CITYThis example is also a table constraint, but a "multi-row" one.
- No shipment quantity can be greater than twice the average of all such quantities.
SPX.QTY £ 2 * AVG ( SPY, QTY )
Another multi-row table constraint, this one involving aggregates. The expression
AVG(SPY,QTY)
returns the average of the values currently appearing in theQTY
column of theSP
table.
- Every London supplier supplies part
P2
.
IF S.CITY = 'London' THEN EXISTS ( SP WHERE SP.S# = S.S#
AND SP.P# = P# ( 'P2' ) )This one is a database constraint, involving exactly two tables,
S
andSP
.
- Suppliers in Athens can move only to London or Paris.
IF S'.CITY = 'Athens' THEN S.CITY = 'Athens'
OR S.CITY = 'London'
OR S.CITY = 'Paris'This one needs a little more explanation. First of all, note that it's a transition constraint (a table transition constraint, to be precise); all prior examples have been examples of state constraints instead. The primed table name
S'
refers to the value of tableS
before the update, the unprimed nameS
refers to the value after the update. The constraint can thus be read as follows: "If the supplier city was Athens before the update, then after the update it must be London or Paris" (unless it hasn't changed at all, in which case it's still Athens, of course).
- Supplier cities and part cities are disjoint.
FORALL S ( NOT EXISTS P ( P.CITY = S.CITY ) )
In very stilted English: "For all suppliers, there doesn't exist a part with the same city" -- another database constraint.
- Average shipment quantities never decrease.
No solution provided.
This is a transition constraint involving aggregates (though only one table, so it's a table constraint, not a database one): "The average quantity of any given part, taken over all suppliers, can never decrease." This constraint is quite complicated! No solution is provided here, because a proper formulation would require rather more apparatus than I've had room to describe in this short series; I'll have to ask you to take it on trust that the constraint can indeed be expressed in Tutorial D, as required.
References
[1] C. J. Date and Hugh Darwen, Foundation for Future Database Systems: The Third Manifesto (2nd edition), Addison-Wesley, 2000.
[2] E. F. Codd, "Extending the Database Relational Model to Capture More Meaning," ACM Transactions on Database Systems 4, No. 4, December 1979.
[3] C. J. Date: " The Extended Relational Model RM/T," Relational Database Writings 1991-1994 (Addison-Wesley, 1995).
[4] C. J. Date and Hugh Darwen, op.cit.
Copyright (c) 2001 C.J. Date
# # #
About our Contributor:
Online Interactive Training Series
In response to a great many requests, Business Rule Solutions now offers at-a-distance learning options. No travel, no backlogs, no hassles. Same great instructors, but with schedules, content and pricing designed to meet the special needs of busy professionals.