Integrity Constraints (Part 2)

C.J. Date Author, Read Author Bio || Read All Articles by C.J. Date

This is the second of a short series of installments of this regular column on the important topic of integrity constraints. Just to remind you, an integrity constraint (or just constraint for short) is basically a conditional expression that must evaluate to true. Last time, I introduced a classification scheme for such constraints, dividing them into domain (or type) constraints, column constraints, table constraints, and database constraints, and I gave a number of examples of type constraints specifically. Now I want to consider some examples of other kinds of constraints.

The examples that follow (in this installment and in remaining installments in this short series) are all based on the well-known suppliers and parts database. Here's an SQL definition for that database (in outline):

CREATE TABLE S --------------------------------- suppliers

			( S# ... , 

			  SNAME ... , 

			  STATUS ... , 

			  CITY ... , 

			    PRIMARY KEY ( S# ) ) ;

			

			CREATE TABLE P ------------------------------------- parts

			( P# ... , 

			  PNAME ... , 

			  COLOR ... , 

			  WEIGHT ... , 

			  CITY ... , 

			    PRIMARY KEY ( P# ) ) ; 

			

			CREATE TABLE SP -------------------------------- shipments

			( S# ... , 

			  P# ... , 

			  QTY ... , 

			    PRIMARY KEY ( S#, P# ) , 

			    FOREIGN KEY ( S# ) REFERENCES S , 

			    FOREIGN KEY ( P# ) REFERENCES P ) ;

As you can see, first of all, there's a suppliers table S, with supplier number {S#} as primary key and some other items (supplier name, status, and city or location). Second, there's a parts table P, with part number {P#} as primary key and some other items (part name, color, weight, and city or location). And third, there's a shipments table SP, which shows which suppliers are shipping, or supplying, which parts in which quantities. The primary key for shipments is the combination {S#,P#} of supplier number and part number (because there's an assumption that there can't be more than one shipment of any given part by any given supplier at any given time). Note that shipments can be thought of as a relationship between suppliers and parts: a many-to-many relationship, to be more specific. Note also the foreign key specifications.

Column Constraints

Now I want to focus on column constraints. Like domain or type constraints, column constraints are fairly simple; the column constraint for some given column C is essentially just a statement to the effect that values of that column are drawn from some given domain or type D. Here are some examples -- to be specific, the column constraints for the columns of the suppliers table S:

S.S# IN S# S.SNAME IN NAME S.STATUS IN INTEGER S.CITY IN CHAR

The symbol "S" in these constraints denotes an arbitrary suppliers row. For simplicity (though perhaps a little unrealistically), I'm assuming that columns STATUS and CITY are defined on some simple system-defined types (INTEGER and CHAR, respectively), instead of on user-defined types as the other two columns are (see the previous installment in this series). Note that I'm using "S#" as the name of a domain as well as the name of a column.

Column constraints are effectively checked "immediately," or in other words at the end of each update statement, loosely speaking. For example, whenever the user tries to place a value somewhere in column SNAME (say), the system immediately checks to see whether that value is a legal NAME value; if it isn't, the statement is rejected, and no update is done.

By the way, note that column constraints don't apply just to columns in base tables! -- because (as we saw a couple of installments back) it must be possible to update views as well as base tables.

Table Constraints

Now let's turn to table constraints. A table constraint can be as complicated as you like, provided only that it refers just to the individual table in question and no others. Here's an example:

S.STATUS > 0 AND S.STATUS £100

("table S can never have a value in which some row has a status value that's outside the range 0 to 100 inclusive"). Note that this constraint certainly does refer to just the relevant table (namely, table S). Note too that it can be regarded as shorthand for a more formal version that looks something like this:

FORALL s# IN S#, sn IN NAME, st IN INTEGER, sc IN CHAR ( IF { S# s#, SNAME sn, STATUS st, CITY sc } IN S THEN st > 0 AND st£100 )

("for all S# values s#, all NAME values sn, all INTEGER values st, and all CHAR values sc, if a row with S# component s#, SNAME component sn, STATUS component st, and CITY component sc appears in table S, then st must be in the range 0 to 100 inclusive"). For obvious reasons, I'm not going to worry too much about such more formal expressions in this series; I merely remark that we'd have to worry about them if we were trying to develop this theory formally (which isn't what I'm trying to do here, of course).

By the way, please understand that the example is indeed an example of a table constraint, not a column constraint, even though it mentions just one column. Column constraints, to repeat, are constraints of the form "column C takes its values from domain D." The example is also, obviously, not a domain constraint either (domain constraints have to do with the legal values in a domain, not in a column or a table).

Note: Like column constraints, table constraints don't apply just to base tables, because (to repeat) we must be able to update views as well as base tables.

Here's another example of a table constraint:

IF S.CITY = 'London' THEN S.STATUS = 20

("suppliers in London must have status 20"). This example refers to two distinct columns (in the same table, of course).

And one more example:

IF SX.S# = SY.S# THEN SX.SNAME = SY.SNAME AND SX.STATUS = SY.STATUS AND SX.CITY = SY.CITY

("if supplier rows SX and SY have the same supplier number, then they also have the same name, status, and city; hence they're really the same row, and {S#} is a superkey"). Note: In case you're not familiar with the term "superkey," let me explain:

First, a candidate key is a set of columns that can serve as a unique identifier for the table in question and includes no column that's unnecessary for unique identification purposes. For example, {S#} is a candidate key for suppliers, but {S#,CITY} isn't (because of the CITY column, which isn't needed to identify suppliers).
Second, a superkey is a superset of a candidate key (so a candidate key is always a superkey, but the converse is false). For example, {S#,CITY} is a superkey for suppliers. So is {S#}, of course.
Finally, a primary key is a candidate key that's chosen for some reason to be "more equal than the others." (If indeed there are any others, that is. If there's just one candidate key, no harm is done by calling it primary. But if there are two or more, then which if any is to be regarded as primary is essentially a matter of psychology, not logic, and I certainly don't want to get into that kind of issue here.)

Like column constraints, table constraints are checked immediately (where, to say it again, "immediately" means at the end of each update statement, loosely speaking). Thus, whenever the user tries to update table S, for example, the system immediately checks to see whether the new table value is a legal value for S; if it isn't, the statement is rejected, and no update is done.

(To be continued.)

# # #

Standard citation for this article:

C.J. Date, "Integrity Constraints (Part 2)" Business Rules Journal, Vol. 2, No. 2, (Feb. 2001)
URL: http://www.brcommunity.com/a2001/b055.html

About our Contributor:

C.J. Date Author,

C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database technology. He is best known for his book An Introduction to Database Systems (eighth edition, Addison-Wesley, 2004), which has sold some 725,000 copies and is used by several hundred colleges and universities worldwide. He is also the author of many other books on database management.

Read All Articles by C.J. Date