Unstructured data

From Wikipedia, the free encyclopedia

Unstructured data (or unstructured information) refers to large volumes of (usually) computerized information that lack a data structure easily readable by a machine. Examples include audio, video and unstructured text, such as the body of an email or a word-processor document.

Merrill Lynch estimates that more than 85 percent of all business information exists as unstructured data.[1]

Data with some form of structure may also be called unstructured if that structure does not help with the desired processing task. For example, an HTML webpage is highly structured, but its markup is mostly oriented toward formatting rather than toward more complex tasks that depend on the meaning of the page's content.
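To make the HTML point concrete, here is a minimal Python sketch using the standard-library `html.parser` module on a made-up product snippet (the `PAGE` string and the `TagAndTextCollector` class are hypothetical, written only for this illustration). The parser recovers the tags easily, but those tags only say "paragraph", "bold" and "italic"; nothing in the markup marks which words are the product name or the price, so answering that kind of question still requires analysing the text itself.

```python
from html.parser import HTMLParser

# Hypothetical sample page: the markup describes how text should look
# (paragraph, bold, italic), not what the text means.
PAGE = "<html><body><p><b>Acme Widget</b> costs <i>$19.99</i> and ships in 3 days.</p></body></html>"


class TagAndTextCollector(HTMLParser):
    """Records every start-tag name and every run of text content."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Only the tag name is kept; these tags carry formatting, not meaning.
        self.tags.append(tag)

    def handle_data(self, data):
        # Text between tags arrives in separate chunks.
        self.text.append(data)


collector = TagAndTextCollector()
collector.feed(PAGE)
collector.close()

print(collector.tags)           # ['html', 'body', 'p', 'b', 'i']
print("".join(collector.text))  # Acme Widget costs $19.99 and ships in 3 days.
```

The structure the parser exposes is real, but it answers "how should this look?" rather than "what is the price?", which is why such a page can still be treated as unstructured for content-oriented processing.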

Dealing with unstructured data

Data mining and text analytics are methods used to find patterns in, or otherwise interpret, this information. UIMA (Unstructured Information Management Architecture) provides a common framework for processing such information to extract meaning and to create structured data about it.
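As a very small illustration of turning unstructured text into structured data, the Python sketch below applies hand-written regular expressions to a made-up email body and collects the matches into a record with named fields. The sample text, the `extract` function and the `ExtractedRecord` type are hypothetical; plain regular expressions stand in for real text analytics, and the sketch does not use UIMA, which is a separate framework with its own interfaces.

```python
import re
from dataclasses import dataclass, field

# Hypothetical unstructured input: the free-text body of an email.
EMAIL_BODY = """Hi team,
Please call Dana Smith at 555-0142 before 2003-02-14 about the
renewal. Copy billing@example.com and legal@example.com on the reply.
Thanks!"""

# Simple, illustrative patterns (assumptions, not a standard);
# production text analytics would use far more robust extractors.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
PHONE_RE = re.compile(r"\b\d{3}-\d{4}\b")


@dataclass
class ExtractedRecord:
    """Structured data derived from one unstructured document."""
    emails: list[str] = field(default_factory=list)
    dates: list[str] = field(default_factory=list)
    phones: list[str] = field(default_factory=list)


def extract(text: str) -> ExtractedRecord:
    """Pull pattern matches out of free text into named fields."""
    return ExtractedRecord(
        emails=EMAIL_RE.findall(text),
        dates=DATE_RE.findall(text),
        phones=PHONE_RE.findall(text),
    )


record = extract(EMAIL_BODY)
print(record)
```

The resulting record can be stored in a database table or queried like any other structured data. Note that the person's name ("Dana Smith") is not captured: recognizing names, organizations or intentions needs techniques beyond fixed patterns, which is where richer text analytics and frameworks such as UIMA come in.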

Notes

  1. "The problem with unstructured data", DMReview, February 2003.
