Data validation
From Wikipedia, the free encyclopedia
In computer science, data validation is the process of ensuring that a program operates on clean, correct and useful data. It uses routines, often called "validation rules" or "check routines", that check for correctness, meaningfulness, and security of data that are input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by the inclusion of explicit application program validation logic.
The simplest data validation verifies that the characters provided come from a valid set. For example, telephone numbers should include the digits and possibly the characters +
, -
, (
, and )
(plus, minus, and parentheses). A more sophisticated data validation routine would check to see the user had entered a valid country code, i.e., that the number of digits entered matched the convention for the country or area specified.
Incorrect data validation can lead to data corruption or a security vulnerability. Data validation checks that data are valid, sensible, reasonable, and secure before they are processed.
Some methods used for validation are:
- Format or picture check
- Checks that the data is in a specified format (template), e.g., dates have to be in the format DD/MM/YYYY.
- Data type checks
- Checks the data type of the input and give an error message if the input data does not match with the chosen data type, e.g., In an input box accepting numeric data, if the letter 'O' was typed instead of the number zero, an error message would appear.
- Range check
- Checks that the data lie within a specified range of values, e.g., the month of a person's date of birth should lie between 1 and 12.
- Limit check
- Unlike range checks, data is checked for one limit only, upper OR lower, e.g., data should not be greater than 2 (>2).
- Presence check
- Checks that important data are actually present and have not been missed out, e.g., customers may be required to have their telephone numbers listed.
- Check digits
- Batch totals
- Checks for missing records. Numerical fields may be added together for all records in a batch. The batch total is entered and the computer checks that the total is correct, e.g., add the 'Total Cost' field of a number of transactions together.
- Hash totals
- This is just a batch total done on one or more numeric fields which appears in every record, e.g., add the Telephone Numbers together for a number of Customers.
- Spelling check
- Looks for spelling and grammar errors.
- Consistency Checks
- Checks fields to ensure data in these fields corresponds, e.g., If Title = "Mr.", then Gender = "M".
- Cross-system Consistency Checks
- Compares data in different systems to ensure it is consistent, e.g., The address for the customer with the same id is the same in both systems. The data may be represented differently in different systems and may need to be transformed to a common format to be compared, e.g., one system may store customer name in a single Name field as 'Doe, John Q', while another in three different fields: First_Name (John), Last_Name (Doe) and Middle_Name (Quality); to compare the two, the validation engine would have to transform data from the second system to match the data from the first, for example, using SQL: Last_Name || ', ' || First_Name || substr(Middle_Name, 1, 1) would convert the data from the second system to look like the data from the first 'Doe, John Q'