Data validation

From Wikipedia, the free encyclopedia

In computer science, data validation is the process of ensuring that a program operates on clean, correct and useful data. It uses routines, often called "validation rules" or "check routines", that check for correctness, meaningfulness, and security of data that are input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by the inclusion of explicit application program validation logic.

The simplest data validation verifies that the characters provided come from a valid set. For example, telephone numbers should include only digits and possibly the characters +, -, (, and ) (plus, minus, and parentheses). A more sophisticated data validation routine would check that the user had entered a valid country code and that the number of digits entered matched the convention for the country or area specified.
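The two levels of telephone-number checking described above could be sketched roughly as follows. This is only an illustration: the function names, the decision to allow spaces, and the small country-code table are assumptions made for the example, not part of any standard or library.

```python
# Characters permitted in a telephone number, per the text: digits plus + - ( ).
# Allowing spaces as well is an assumption made for this sketch.
ALLOWED_PHONE_CHARS = set("0123456789+-() ")

# Hypothetical lookup of national-number lengths keyed by country code.
# Illustrative values only, not a complete or authoritative numbering plan.
DIGITS_BY_COUNTRY_CODE = {"1": 10, "33": 9}


def has_valid_characters(number: str) -> bool:
    """Simplest check: every character comes from the allowed set."""
    return bool(number) and all(ch in ALLOWED_PHONE_CHARS for ch in number)


def has_valid_length(number: str) -> bool:
    """More sophisticated check: known country code and matching digit count.

    Expects international notation starting with '+'.
    """
    if not number.startswith("+"):
        return False
    digits = "".join(ch for ch in number if ch in "0123456789")
    for code, national_length in DIGITS_BY_COUNTRY_CODE.items():
        if digits.startswith(code):
            return len(digits) == len(code) + national_length
    return False  # country code not in the (hypothetical) table


def validate_phone(number: str) -> bool:
    """Apply the character-set check first, then the length check."""
    return has_valid_characters(number) and has_valid_length(number)
```

For instance, validate_phone("+1 (555) 010-9999") passes both checks, while a number containing the letter 'O' in place of a zero fails the character-set check before the length is ever examined.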

Incorrect data validation can lead to data corruption or a security vulnerability. Data validation checks that data are valid, sensible, reasonable, and secure before they are processed.

Some methods used for validation are:

  • Format or picture check
    Checks that the data are in a specified format (template), e.g., dates must be in the format DD/MM/YYYY.
  • Data type checks
    Checks the data type of the input and gives an error message if the input does not match the chosen data type, e.g., in an input box accepting numeric data, if the letter 'O' were typed instead of the digit zero, an error message would appear.
  • Range check
    Checks that the data lie within a specified range of values, e.g., the month of a person's date of birth should lie between 1 and 12.
  • Limit check
    Unlike a range check, the data are checked against one limit only, upper or lower, e.g., data should not be greater than 2 (≤ 2).
  • Presence check
    Checks that important data are actually present and have not been missed out, e.g., customers may be required to have their telephone numbers listed.
  • Check digits
    Used for numerical data. An extra digit, calculated from the other digits, is appended to a number. The computer repeats this calculation when the data are entered, e.g., the ISBN of a book: the last digit of a 10-digit ISBN is a check digit calculated using a modulus 11 method.
  • Batch totals
    Checks for missing records. Numerical fields may be added together for all records in a batch. The batch total is entered and the computer checks that the total is correct, e.g., add the 'Total Cost' field of a number of transactions together.
  • Hash totals
    A batch total computed on one or more numeric fields that appear in every record, e.g., adding the telephone numbers of a number of customers together. The sum has no meaning in itself and serves only as a check.
  • Spelling check
    Looks for spelling and grammar errors.
  • Consistency Checks
    Checks fields to ensure that the data in them correspond, e.g., if Title = "Mr.", then Gender = "M".
  • Cross-system Consistency Checks
    Compares data in different systems to ensure they are consistent, e.g., the address for the customer with the same ID is the same in both systems. The data may be represented differently in different systems and may need to be transformed to a common format before being compared. For example, one system may store the customer name in a single Name field as 'Doe, John Q', while another stores it in three fields: First_Name (John), Last_Name (Doe) and Middle_Name (Quality). To compare the two, the validation engine would have to transform the data from the second system to match the first, for example using the SQL expression Last_Name || ', ' || First_Name || ' ' || substr(Middle_Name, 1, 1), which yields 'Doe, John Q'.
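Several of the methods listed above can be sketched in a few lines each. The following is a minimal illustration rather than a reference implementation: all function names are hypothetical, records are assumed to be plain dictionaries, monetary totals are assumed to be stored as integers (for example, in cents) so that sums compare exactly, and the check digit example assumes the 10-digit ISBN form with weights running from 10 down to 1.

```python
import re
from datetime import datetime


def format_check(value: str) -> bool:
    """Format check: a real calendar date written exactly as DD/MM/YYYY."""
    if not re.fullmatch(r"[0-9]{2}/[0-9]{2}/[0-9]{4}", value):
        return False
    try:
        datetime.strptime(value, "%d/%m/%Y")
    except ValueError:  # e.g. 31/02/2020 has the right shape but is not a date
        return False
    return True


def range_check(month: int) -> bool:
    """Range check: both a lower and an upper bound."""
    return 1 <= month <= 12


def limit_check(value: float, upper: float = 2) -> bool:
    """Limit check: one bound only (here, an upper limit)."""
    return value <= upper


def presence_check(record: dict, field: str) -> bool:
    """Presence check: the field exists and is not blank."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""


def isbn10_check(isbn: str) -> bool:
    """Check digit: weighted sum of a 10-digit ISBN must be divisible by 11."""
    chars = isbn.replace("-", "").replace(" ", "")
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch in "0123456789":
            digit = int(ch)
        elif ch in "Xx" and position == 9:
            digit = 10  # 'X' stands for 10, allowed only as the check digit
        else:
            return False
        total += (10 - position) * digit
    return total % 11 == 0


def batch_total_check(records: list, field: str, entered_total: int) -> bool:
    """Batch total: the sum over all records must equal the entered total."""
    return sum(record[field] for record in records) == entered_total


def consistency_check(record: dict) -> bool:
    """Consistency check: if Title is "Mr.", Gender must be "M"."""
    if record.get("Title") == "Mr.":
        return record.get("Gender") == "M"
    return True


def to_single_name(first: str, last: str, middle: str) -> str:
    """Cross-system transform: three name fields into 'Last, First M' form."""
    return f"{last}, {first} {middle[:1]}".rstrip()
```

With these definitions, isbn10_check("0-306-40615-2") succeeds because the weighted sum is 132, a multiple of 11, and to_single_name("John", "Doe", "Quality") produces 'Doe, John Q', which can then be compared directly with the single Name field of the other system.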

External links