Representation term
From Wikipedia, the free encyclopedia
A representation term is a word, or a combination of words, used as part of a data element name. Representation class is sometimes used as a synonym for representation term, but the ISO/IEC 11179 standard for metadata registries distinguishes the two terms.
In ISO/IEC 11179, while a representation term is a part of a data element name, a representation class provides a way to classify or group data elements. ISO/IEC 11179-5:2005 defines representation term as a designation of an instance of a representation class.
A Representation Term may be thought of as an attribute of a data element in a metadata registry that classifies the data element according to the type of data stored in the data element.
There is only a small list of approved representation terms (around a dozen), and some standards, such as the Universal Data Element Framework, assign numeric codes to these terms.
Complex Data Elements (sometimes called domains or concepts) typically do not have a representation term in metadata systems such as GJXDM or NIEM. The presence of a representation term is therefore usually a good indication that the data element is NOT complex and can be considered a property of another object.
Use cases for representation term
Finding equivalent properties
When a person or software agent analyzes two separate metadata registries to find equivalent properties, the Representation Term can be used as a guide. For example, if system A has a Data Element such as PersonGenderCode and system B has a Data Element such as PersonSexCode, the shared "Code" suffix lets the two systems restrict candidate matches to data elements whose names end in "Code".
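The suffix-based matching described above can be sketched in Python. This is a minimal illustration, not part of any standard: the term list is a subset of the sample terms in this article, and the function names and the two example registries are assumptions made for the sketch.

```python
# Illustrative subset of representation terms (see the sample table in this article).
REPRESENTATION_TERMS = ("DateTime", "Date", "Time", "Code", "ID", "Name", "Indicator")


def representation_term(element_name):
    """Return the representation term that ends element_name, or None."""
    # Check longer terms first so "DateTime" is not reported as "Time".
    for term in sorted(REPRESENTATION_TERMS, key=len, reverse=True):
        if element_name.endswith(term):
            return term
    return None


def candidate_matches(registry_a, registry_b):
    """Pair up element names from two registries that share a representation term."""
    pairs = []
    for a in registry_a:
        term = representation_term(a)
        if term is None:
            continue
        for b in registry_b:
            if representation_term(b) == term:
                pairs.append((a, b))
    return pairs


# Hypothetical registries for two systems.
system_a = ["PersonGenderCode", "PersonBirthDate", "PersonGivenName"]
system_b = ["PersonSexCode", "PersonFamilyName", "PersonBirthDate"]

matches = candidate_matches(system_a, system_b)
for a, b in matches:
    print(a, "<->", b)
```

The shared suffix only narrows the search: PersonGivenName and PersonFamilyName are paired here because both end in "Name", so a person or a further rule still has to decide which candidates are truly equivalent.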
Inference
The Representation Term can be used in many ways to draw inferences about data sets. Representation Terms tell the observer of a data stream about the data types and indicate how each Data Element can be used. This is critical when mapping metadata registries to external Data Elements. For example, if you are sent a record about a person, you may look for any "ID" suffix to understand how the remote system differentiates two distinct records.
Required fields
Representation Terms are also used to make inferences about the requirements of a property. For example, if a data stream had the Data Element PersonBirthDateAndTime, you would know that BOTH the date AND time are available and relevant, not just the date. If the birth time were optional, separate data elements such as PersonBirthDate and PersonBirthTime should be used.
Finding data warehouse dimensions and measures
When creating a data warehouse, a business analyst can use the Representation Terms to quickly find the dimensions and measures of a subject matter in order to build OLAP cubes. For example:
- Indicator or Code terms are used to create data warehouse dimensions
- Date or DateTime terms relate to the time dimension, which is frequently shared between cubes using conformed dimensions
- Amount, Number, Measure or Value terms (which can be added together) are candidates for measures
- Name and Text are used for screen labels or other descriptive elements
- Percent terms need to be analyzed, since percentages cannot be added together with clear meaning
- ID is used to remove duplicate records
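The list above amounts to a lookup from representation term to warehouse role, which can be sketched in Python. The role labels, the function name and the example element names are assumptions chosen for illustration; a real analysis would still need a person to confirm each assignment.

```python
# Role suggested by each representation term, following the list above.
ROLE_BY_TERM = {
    "Indicator": "dimension",
    "Code": "dimension",
    "DateTime": "time dimension",
    "Date": "time dimension",
    "Amount": "measure",
    "Number": "measure",
    "Measure": "measure",
    "Value": "measure",
    "Name": "label",
    "Text": "label",
    "Percent": "needs review",
    "ID": "deduplication key",
}


def warehouse_role(element_name):
    """Suggest a warehouse role from the representation-term suffix of a name."""
    # Longer terms are tried first so a short term can never shadow a longer one.
    for term in sorted(ROLE_BY_TERM, key=len, reverse=True):
        if element_name.endswith(term):
            return ROLE_BY_TERM[term]
    return "unclassified"


# Hypothetical data element names from an order-processing subject area.
elements = [
    "OrderID",
    "OrderStatusCode",
    "OrderDate",
    "OrderTotalAmount",
    "OrderDiscountPercent",
    "CustomerName",
    "OrderNotes",
]

roles = {name: warehouse_role(name) for name in elements}
for name, role in roles.items():
    print(f"{name}: {role}")
```

Elements without a recognized suffix, such as the hypothetical OrderNotes, fall through to "unclassified" rather than being guessed at.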
Universal Data Element Framework
Some informal standards, such as the Universal Data Element Framework (which refers to a Representation Term as a "Property Word"), assign a unique integer ID to each Representation Term. This allows metadata mapping tools to map one set of data elements into other metadata vocabularies. An example of these mappings can be found at Property word ID. Note that as of November 2005 the UDEF concepts have not been widely adopted.
Example of representation terms as an XML suffix
For example, consider the following XML data fragment:
<Person>
  <PersonID>123-45-6789</PersonID>
  <PersonGivenName>John</PersonGivenName>
  <PersonFamilyName>Smith</PersonFamilyName>
  <PersonBirthDate>1990-08-14</PersonBirthDate>
</Person>
In the example above, the Representation Terms are "ID" for <PersonID>, "Name" for the given and family names, and "Date" for <PersonBirthDate>.
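As a rough sketch of how software might read these suffixes, the Python below parses the same fragment with the standard library and takes the last UpperCamelCase word of each element name as its representation term. The regular expression and the helper name are assumptions for this sketch. Taking only the last word would report "Time" for a compound term such as DateTime, so a real tool would more likely check names against a list of known terms.

```python
import re
import xml.etree.ElementTree as ET

FRAGMENT = """
<Person>
  <PersonID>123-45-6789</PersonID>
  <PersonGivenName>John</PersonGivenName>
  <PersonFamilyName>Smith</PersonFamilyName>
  <PersonBirthDate>1990-08-14</PersonBirthDate>
</Person>
"""

# Split an UpperCamelCase name into words; runs of capitals such as "ID" stay together.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z]+")


def last_word(tag):
    """Return the final word of an element name, treated here as its representation term."""
    words = WORD.findall(tag)
    return words[-1] if words else None


root = ET.fromstring(FRAGMENT.strip())

# Map each child element name to the suffix found in it.
terms = {child.tag: last_word(child.tag) for child in root}
for tag, term in terms.items():
    print(f"{tag}: {term}")
```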
Sample representation terms
The following are samples of Representation Terms that have been used for the exchange of electronic messages in systems such as NIEM, ebXML 1.9 or GJXDM 3.0:
Term | Usage |
---|---|
Amount | Monetary value with units of currency. |
BinaryObject | Set of finite-length sequences of binary octets used to represent sound, images and other structures. |
Code | An enumerated list of all allowable values. Each enumerated value is a string that for brevity represents a specific meaning. For example for a PersonGenderCode the valid values might be "male", "female" or "unknown". |
Date | An ISO 8601 date usually in the format YYYY-MM-DD |
DateTime | An ISO 8601 date (in the format YYYY-MM-DD) AND time structure. Note: Do not use unless BOTH the date AND time are REQUIRED fields. If one OR the other is optional always specify the data elements as separate date and time elements. |
Graphic | Used to store images. Secondary to BinaryObject. |
ID | Abbreviation for Identifier |
Identifier | A language-independent label, sign or token used to establish identity of, and uniquely distinguish one instance of an object within an identification scheme. |
Indicator | Boolean, exactly two mutually exclusive values (true or false). A precise definition must be given for the meaning of a true value. |
Measure | Numeric value determined by measurement with units. Typically used with items such as height or weight. If the unit of measure is not clear, it should be specified. |
Name | A textual label used as identification of an object. A name is usually meaningful in some language, and is the primary means of identification of objects for humans. Unlike an identifier, a name is not necessarily unique. |
Number | Assigned or determined by calculation. |
Text | Character string generally in the form of words. |
Time | An ISO 8601 time structure. |
Value | A type of Numeric. |
Percent | A type of Numeric that traditionally is the result of a ratio calculation, ranging from 0 to 1 for values of 0% to 100%. |
Quantity | Non-monetary numeric value or count with units. |
Rate | A type of Numeric. |
Year | An ISO 8601 year. |
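Several of the usages in the table imply a rough check that a receiving system could apply to incoming values. The Python sketch below covers only a few terms. Its function name, the `allowed_codes` parameter and the "true"/"false" spelling for Indicator are assumptions rather than part of ISO/IEC 11179 or any of the systems named above.

```python
import re
from datetime import date


def is_valid(term, value, allowed_codes=()):
    """Check a string value against the rough shape its representation term implies."""
    if term == "Date":
        # Require the YYYY-MM-DD layout, then let the standard library
        # reject impossible dates such as month 13.
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is None:
            return False
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
        return True
    if term == "Year":
        return re.fullmatch(r"\d{4}", value) is not None
    if term == "Indicator":
        # Assumed lexical form for the two mutually exclusive values.
        return value in ("true", "false")
    if term == "Percent":
        # The table describes a ratio from 0 to 1 for 0% to 100%.
        try:
            return 0.0 <= float(value) <= 1.0
        except ValueError:
            return False
    if term == "Code":
        # A coded value must come from the enumerated list.
        return value in allowed_codes
    raise ValueError(f"no check defined for representation term {term!r}")


gender_codes = ("male", "female", "unknown")
print(is_valid("Date", "1990-08-14"))
print(is_valid("Code", "male", allowed_codes=gender_codes))
print(is_valid("Percent", "25"))
```

Terms such as Amount, Measure and Quantity are left out because they need a unit or currency that the value alone does not carry.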
Pros of representation terms
- Use of representation terms in a data element name is a convention adopted by several large systems such as GJXDM and ebXML.
- Many data architects who are responsible for mapping XML from foreign sources find Representation Terms very useful.
- Standards such as the UDEF depend on accurate coding of Representation Terms.
- Tools that validate against enumeration lists can distinguish coded values quickly by looking for the "Code" suffix.
Cons of representation terms
- People frequently complain that adding a representation term to each XML data element makes the XML data too bulky.
- The actual representation terms used are not a required part of the ISO/IEC 11179 standard, which offers only a recommended set of words. The actual words used, and the integer codes associated with them, depend on the implementation.
- Although the Representation Term can give the importer of a data stream a hint as to the data type, it does not contain enough information to choose a low-level computer datatype, such as an 8, 16, 32 or 64 bit integer. In practice many people feel that Representation Terms are simply a way to classify data elements and should be called a Classification scheme.
See also
- ISO/IEC 11179
- Metadata
- Data element
- Representation class
- Universal Data Element Framework
- XML
- XML Schema
External links
- ebXML: Electronic Business using eXtensible Markup Language Core Components Naming Conventions See page 8.
- ISO/IEC 11179-3:2003 Metadata registries (MDR) — Part 3: Registry metamodel & basic attributes (546K zip file)
- ISO/IEC 11179-5:2005 Metadata registries (MDR) — Part 5: Naming and identification principles (238K zip file)
- ISO/IEC TR 11404:1996 Language-independent datatypes (14 MB zip file)
- ISO/IEC TR 20943-1:2003(E) Procedures for Achieving Metadata Registry Content Consistency — Part 1: Data elements See page 84.(700K zip file)
- DOJ and GJXDM training slides on naming
- ISO/IEC JTC 1/SC 32/WG 2 Metadata
- US Department of Interior representation terms
Notes
- ISO/IEC 11179-5 3.11 (238K zip file)
- In ISO/IEC 11179-3:2003 5.4 (546K zip file) it is actually representation class which is specified as an attribute of a data element.