Null (SQL)
From Wikipedia, the free encyclopedia
In SQL:2003 NULL is a special marker used to indicate that a data value is unknown. NULL
is also an SQL reserved keyword used to identify the NULL special marker.
In SQL database management systems that follow the SQL standard, columns in tables can be designated as nullable or non-nullable through the use of SQL NOT NULL
constraints. Columns that are designated nullable may contain the NULL marker in place of actual data, non-nullable columns cannot.
NULL is untyped in SQL. Therefore, it is sometimes mandatory to explicitly cast NULLs so that the DBMS can evaluate results to the intended data type. For example, if overloaded functions are supported, the DBMS cannot resolve to the correct function without knowing the data types of all paramaters.
Contents |
[edit] Comparisons and Three-Valued Logic
NULL is not considered a value because it lacks an inherent property of values: NULLs cannot be compared with other values of the same data type. NULL is an untyped marker indicating the absence of a value, and not an actual data value. Comparisons of actual data values with NULL never result in True or False, but always in a third logical result: Unknown. Comparison of a NULL with any another NULL also results in Unknown, as shown below:
10 = NULL -- Results in Unknown NULL <> 'Hello' -- Results in Unknown NULL > NULL -- Results in Unknown
Because SQL implements a third logical result (Unknown), SQL implementations must provide for a specialized three-valued logic (3VL). SQL three-valued logic is shown in the table below:
p | q | p AND q | p OR q |
---|---|---|---|
True | True | True | True |
True | False | False | True |
True | Unknown | Unknown | True |
False | True | False | True |
False | False | False | False |
False | Unknown | False | Unknown |
Unknown | True | Unknown | True |
Unknown | False | False | Unknown |
Unknown | Unknown | Unknown | Unknown |
p | NOT p |
---|---|
True | False |
False | True |
Unknown | Unknown |
To test if the values in a column are NULL or are not NULL, the IS NULL and IS NOT NULL operators must be used in SQL comparison predicates. The syntax looks like the following:
column IS NULL column IS NOT NULL
Because of this, special care must be taken when joining tables on columns that are declared nullable.
[edit] Math and String Concatenation Operators
Because NULL is not a data value, but a marker for an unknown or not-existent value, using mathematical operators on NULL results in NULL. For instance, all of the following examples result in NULL:
NULL + NULL -- Result is NULL 10 * NULL -- Result is NULL 0 / NULL -- Result is NULL
String concatenation operations, common in SQL, also result in NULL when one of the operands is NULL. The following example demonstrates the result of using NULL with the SQL ||
string concatenation operator:
'Fish ' || NULL || 'Chips' -- Result is NULL
(Note: some DBMS systems use the +
operator for string concatenation instead of ||
or CONCAT
. This is not conforming to the SQL standard, however.)
[edit] Aggregate Functions and NULL
SQL defines aggregate functions to simplify server-side aggregate calculations on data. Almost all aggregate functions perform a NULL elimination step, so that NULL values are not included in the final result of the calculation. This implicit NULL elimination, however, can have an impact on aggregate function results. The only aggregate function that does not implicitly eliminate NULL is the COUNT(*)
function.
[edit] Grouping and Sorting NULL
Because SQL:2003 defines all NULL markers as being unequal to one another, a special definition was required in order to group NULLs together when performing certain operations. SQL defines any two values that are equal to one another, or any two NULLs, as not distinct. The definition of not distinct allows SQL to group and sort NULLs when the GROUP BY
clause (and other keywords that perform grouping) are used.
The SQL standard does not explicitly define a default sort order for NULLs. Instead, on conforming systems, the user can specify for each element in the order by list whether NULLs are to be sorted before or after all values using the NULLS FIRST
or NULLS LAST
clauses, respectively. Not all DBMS vendors implement this functionality, however. Vendors who do not implement this functionality may specify different treatments for NULL sorting in the DBMS.
[edit] NULL-Handling Functions
SQL defines two functions to explicitly handle NULLs. The COALESCE
function accepts a list of parameters, returning the first non-NULL value from the list:
COALESCE(value1, value2, value3, ...)
COALESCE
is defined as shorthand for the following SQL CASE
expression:
CASE WHEN value1 IS NOT NULL THEN value1 WHEN value2 IS NOT NULL THEN value2 WHEN value3 IS NOT NULL THEN value3 END
Some SQL DBMS's implement vendor-specific functions similar to COALESCE
. Microsoft SQL Server, for instance, implements an ISNULL
function which is similar in functionality to COALESCE
.
The NULLIF
function accepts two parameters. If the first parameter is equal to the second parameter, NULLIF
returns NULL. Othewise, the value of the first parameter is returned.
NULLIF(value1, value2)
Thus, NULLIF
is an abbrevation for the following CASE
expression:
CASE WHEN value1 = value2 THEN NULL ELSE value1 END
[edit] Controversy
In The Relational Model for Database Management: Version 2, Dr. E.F. Codd suggested that SQL NULL was flawed and should be replaced by two different NULL-type markers. The markers he proposed were to stand for "Missing but Applicable" and "Missing but Inapplicable", known as A-values and I-values, respectively. Codd's recommendation, if accepted, would have required the implementation of a four-valued logic in SQL. Others have suggested adding additional NULL-type markers to Codd's recommendation to include even more reasons that a data value might be "Missing", increasing the complexity of SQL.
At various times, proposals have been put forth to implement multiple user-defined NULL markers in SQL. Because of the complexity of the NULL-handling and logic systems required to support these NULL markers, none of these proposals have gained widespread acceptance.
Some Relational Management experts like the authors of The Third Manifesto, Chris Date and Hugh Darwen, have suggested that SQL NULL is inherently flawed and should be eliminated altogether. These experts often point to inconsistencies and flaws in the implementation of SQL NULL-handling (particularly in aggregate functions) as proof that the entire concept of NULL is flawed and should be removed from the Relational Model. Others, like author Fabian Pascal, have stated "How the function calculation should treat missing values is not governed by the relational model."