Union (computer science)

From Wikipedia, the free encyclopedia

In computer science, a union is a data structure that stores one of several types of data at a single location. There are only two safe ways of accessing a union object. One is to always read the field of a union most recently assigned; tagged unions enforce this restriction. The other is to only access functionality common to all types in the union. For example, if the fields are all subtypes of a common supertype, then it is always legal to perform operations on the union object that one can perform on the supertype.

The remainder of this article will refer strictly to primitive untagged unions, as opposed to tagged unions.

Because of the limitations of their use, untagged unions are generally only provided in untyped languages or in an unsafe way (as in C). They have the advantage over simple tagged unions of not requiring space to store the tag. Most type inference algorithms cannot cope with untagged union types.[citation needed]

The name "union" stems from the type's formal definition. If one sees a type as the set of all values that type can take on, a union type is simply the mathematical union of its constituting types, since it can take on any value any of its fields can. Also, because a mathematical union discards duplicates, if more than one field of the union can take on a single common value, it is impossible to tell from the value alone which field was last written.

Contents

[edit] Unions in various programming languages

[edit] C/C++

In C and C++, untagged unions are expressed nearly exactly like structures (structs), except that each data member begins at the same location in memory. The data members, as in structures, need not be primitive values, and in fact may be structures or even other unions. However, C++ does not allow for a data member to be any type that has "a non-trivial constructor, a non-trivial copy constructor, a non-trivial destructor, or a non-trivial copy assignment operator." In particular, it is impossible to have the standard C++ string as a member of an union. The union object occupies as much space as the largest member, whereas structures require space equal to at least the sum of the size of its members. This gain in space efficiency, while valuable in certain circumstances, comes at a great cost of safety: the program logic must ensure that it only reads the field most recently written along all possible execution paths.

The primary usefulness of a union is to conserve space, since it provides a way of letting many different types be stored in the same space. Unions also provide crude polymorphism. However, there is no checking of types, so it is up to the programmer to be sure that the proper fields are accessed in different contexts. The relevant field of a union variable is typically determined by the state of other variables, possibly in an enclosing struct.

One common C programming idiom uses unions to perform what C++ calls a reinterpret_cast, by assigning to one field of a union and reading from another, as is done in code which depends on the raw representation of the values. This is not, however, a safe use of unions in general, and is forbidden by the C language standard.

Note that the safer tagged unions can be constructed from untagged unions (see tagged union). The safe C dialect Cyclone encourages the preference of tagged unions to untagged.

Structure and union specifiers have the same form. [ . . . ] The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.
- ANSI/ISO 9899:1990 (the ANSI C standard), section 6.5.2.1

[edit] See also

[edit] External links

In other languages