Raw data
Raw data (also known as primary data) is data collected directly from a source that has not been subjected to processing or any other manipulation. Raw data is a relative term (see data): the output of one processing step may be treated as raw data by the next. Raw data can be input to a computer program or used in manual procedures such as analyzing statistics from a survey. The term can also refer to the binary data on electronic storage devices such as hard disk drives (also referred to as low-level data).
In computing, raw data may have the following attributes: possibly containing errors, not validated; in different (colloquial) formats; uncoded or unformatted; and suspect, requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January 1999", "31/01/1999", "31/1/99", "31 Jan", or "today". Once captured, this raw data may be processed and stored in a normalized format, perhaps a Julian date, so that it is easier for computers and humans to interpret during later processing.
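The normalization step in the date example above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name `normalize_date`, the `FORMATS` list, and the optional `today` parameter are assumptions made for the example, and ISO 8601 is used as the normalized format in place of the Julian date mentioned in the text. Month names are assumed to be in English.

```python
import re
from datetime import date, datetime

# Hypothetical set of layouts a data input sheet might contain.
# Day-first order is assumed, as in the examples in the text.
FORMATS = ("%d %B %Y", "%d/%m/%Y", "%d/%m/%y")


def normalize_date(raw, today=None):
    """Return an ISO 8601 string for a raw date entry, or None if unresolved."""
    text = raw.strip()

    # A relative entry such as "today" only has meaning at capture time.
    if text.lower() == "today":
        return (today or date.today()).isoformat()

    # Strip ordinal suffixes: "31st" becomes "31".
    text = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text)

    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue

    # Entries such as "31 Jan" lack a year; flag them rather than guess.
    return None
```

Under these assumptions, "31st January 1999", "31/01/1999", and "31/1/99" all normalize to the same value, while "31 Jan" is returned as `None` so that it can be confirmed manually, which reflects the "suspect, requiring confirmation" attribute of raw data.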
Raw data (sometimes called "sourcey" data or "eggy" data) is the data input to processing. A distinction is sometimes made between data and information, to the effect that information is the end product of data processing. Raw data that has undergone processing is sometimes referred to as "cooked" data.
Although raw data has the potential to become "information," extraction, organization, and sometimes analysis and formatting for presentation are required for that to occur.
For example, a point-of-sale terminal (POS terminal) in a busy supermarket collects huge volumes of raw data each day, but that data does not yield much information until it is processed. Once processed, the data may indicate the particular items that each customer buys, when they buy them, and at what price. Such information could then become input data for further processing, such as predictive marketing campaigns. As a result of processing, raw data sometimes ends up in a database, which makes it accessible for further processing and analysis in any number of different ways.
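The point-of-sale example can be illustrated with a short Python sketch. The record layout (customer, item, timestamp, price in cents), the sample values, and the function names are all hypothetical and chosen only for illustration; a real POS system would record far more detail.

```python
from collections import Counter, defaultdict

# Hypothetical raw POS records: (customer_id, item, timestamp, price_cents).
raw_sales = [
    ("c1", "milk", "1999-01-31T09:15", 120),
    ("c2", "bread", "1999-01-31T09:17", 250),
    ("c1", "bread", "1999-01-31T09:15", 250),
    ("c1", "milk", "1999-02-01T18:40", 120),
]


def items_per_customer(records):
    """Count how many times each customer bought each item."""
    counts = defaultdict(Counter)
    for customer, item, _when, _price_cents in records:
        counts[customer][item] += 1
    return {customer: dict(items) for customer, items in counts.items()}


def spend_per_customer(records):
    """Sum the price of every purchase, in cents, for each customer."""
    totals = Counter()
    for customer, _item, _when, price_cents in records:
        totals[customer] += price_cents
    return dict(totals)
```

Each raw record says little by itself; the two summaries are the kind of information that emerges only after processing, and they could in turn serve as input data for a later analysis.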
Tim Berners-Lee (inventor of the World Wide Web) argues that sharing raw data is important. Inspired by a post by Rufus Pollock of the Open Knowledge Foundation, his call to action is "Raw Data Now", meaning that everyone should demand that governments and businesses share their information as raw data. He points out that "data drives a huge amount of what happens in our lives… because somebody takes the data and does something with it." To Berners-Lee, it is essentially from this sharing of raw data that advances in science will emerge.
Further reading
- Give Us the Data Raw, and Give it to Us Now - the blog post from Rufus Pollock that inspired Tim Berners-Lee
- Tim Berners-Lee Gives the Web a New Definition