Data set (IBM mainframe)

From Wikipedia, the free encyclopedia

The term data set (or dataset) refers to a file on an IBM mainframe computer, typically stored on DASD (direct-access storage device) or magnetic tape. Data sets are record-oriented files. The term originated with the IBM mainframe operating system OS/360 and has remained in use in later systems based on that heritage, including MVS, OS/390, and z/OS.

Unlike files on UNIX systems, data sets are not unstructured streams of bytes. They are organized in logical record and block structures determined by the DSORG (data set organization) and RECFM (record format) parameters of the DCB (Data Control Block), the data structure used to access datasets. These parameters may also be specified on the Job Control Language (JCL) DD statements that are used to allocate datasets.

Dataset Organization

In OS/360, the DCB's DSORG parameter specifies how the dataset is organized. It may be physically sequential ("PS"), indexed sequential ("IS"), partitioned ("PO"), or direct access ("DA"). Datasets on tape may only be DSORG=PS. The choice of organization depends on how the data is to be accessed and, in particular, on how it might be updated.

Record Format (RECFM)

Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the dataset. This structure is specified in the DCB RECFM parameter. RECFM=F means that the records are of fixed length, specified via the LRECL parameter, and RECFM=V specifies variable-length records. Each variable-length record is prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes.

Records of format FB and VB are fixed-blocked and variable-blocked, respectively. This means that multiple logical records are grouped together into a single physical block on tape or disk. The BLKSIZE parameter specifies the maximum length of the block. RECFM=FBS means fixed-blocked-standard: all blocks except the last one must be full-length. RECFM=VBS means variable-blocked-spanned: a logical record may span two or more blocks, with flags in the descriptor word of each record segment indicating whether the segment is continued into the next block and/or was continued from the previous one.
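The length-prefix idea behind RECFM=VB can be illustrated with a short Python sketch. It is not an IBM access method, only a model of the byte layout, and it assumes the following: each record carries a 4-byte RDW whose first two bytes hold a big-endian length that includes the RDW itself, and the block as a whole carries a similar 4-byte block descriptor word. The function names are hypothetical, and spanned (VBS) records are not handled.

```python
import struct

def build_vb_block(records):
    """Pack byte strings into one variable-blocked (VB) block (illustrative layout)."""
    body = b""
    for rec in records:
        # Assumed RDW layout: 2-byte big-endian length (data + 4-byte RDW), then 2 zero bytes.
        body += struct.pack(">HH", len(rec) + 4, 0) + rec
    # Assumed block descriptor word: 2-byte big-endian block length (body + 4), then 2 zero bytes.
    return struct.pack(">HH", len(body) + 4, 0) + body

def parse_vb_block(block):
    """Recover the logical records from a block produced by build_vb_block."""
    block_len, _ = struct.unpack_from(">HH", block, 0)
    records = []
    pos = 4  # skip the block descriptor word
    while pos < block_len:
        rec_len, _ = struct.unpack_from(">HH", block, pos)
        if rec_len < 4:
            raise ValueError("record length smaller than its descriptor word")
        records.append(block[pos + 4 : pos + rec_len])
        pos += rec_len  # the length field tells us where the next record starts
    return records

# Records of different lengths, one of them containing arbitrary binary data.
sample = [b"ALPHA", b"\x00\x01\n\xff"]
block = build_vb_block(sample)
print(parse_vb_block(block) == sample)
```

Because every record announces its own length, the reader never has to scan the data for a terminator.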

This mechanism eliminates the need for any "delimiter" byte value to separate records. The file is an abstraction of a collection of records, in contrast to the unstructured "stream" of bytes used by operating systems such as Unix, Windows, or Mac OS. This allows data to be of any type, including binary integers, floating point, or characters, without introducing a false end-of-record condition.
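The false end-of-record problem can be shown with a small Python sketch. It assumes a hypothetical LRECL of 8 and records made of two 32-bit big-endian integers; the helper name `split_fixed` is invented for illustration. Splitting by record length recovers the records exactly, while splitting on a newline byte breaks them apart because the value 10 is encoded with the byte 0x0A.

```python
import struct

LRECL = 8  # hypothetical logical record length for this illustration

def split_fixed(block, lrecl):
    """Split a fixed-blocked (FB) block into records of exactly lrecl bytes."""
    if len(block) % lrecl != 0:
        raise ValueError("block length is not a multiple of LRECL")
    return [block[i:i + lrecl] for i in range(0, len(block), lrecl)]

# Two records, each holding a pair of 32-bit big-endian integers.
# The values 10 (0x0A) and 2570 (0x0A0A) contain the newline byte.
records = [struct.pack(">ii", 10, 2570), struct.pack(">ii", -1, 0)]
block = b"".join(records)

by_length = split_fixed(block, LRECL)   # record-oriented view
by_delimiter = block.split(b"\n")       # stream view with a newline delimiter

print(by_length == records)   # lengths alone recover both records
print(len(by_delimiter))      # the delimiter view yields a different number of pieces
```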

Partitioned Datasets

A partitioned data set (PDS) is a dataset containing multiple members, each of which holds a separate sub-dataset, similar to a directory in other types of file system. This type of dataset is used to hold executable programs, or load modules. PDSs are also used to store source program libraries, especially Assembler macro definitions.

A PDS consists of a directory and a group of small, related sequential files stored together in a single dataset. Each sequential file is known as a member of the PDS and is accessed directly using the directory structure. Once a member is located, the data stored in that member is handled in the same manner as a PS (sequential) file.

Whenever a member is deleted, the space it occupied becomes unusable for storing other data. Likewise, if a member is updated, it is stored in a new spot at the back of the PDS and leaves wasted “dead” space in the middle. The only way to recover “dead” space in a PDS is to compress the file, which moves all members to the front of the data space and leaves free usable space at the back; heavily updated PDSs need this done frequently. Because individual members are accessed through the directory structure, a PDS can reside only on disk. PDSs are most often used for storing job JCL, utility control statements and executable modules.
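The way dead space accumulates and is reclaimed can be modelled with a toy Python class. This is only a sketch of the behaviour described above, not of the real on-disk format: the class `ToyPDS` and its methods are hypothetical, the directory is a plain dictionary, and space is counted in bytes rather than in tracks or blocks.

```python
class ToyPDS:
    """Toy model of a PDS: a directory plus one append-only data area."""

    def __init__(self):
        self.data = bytearray()
        self.directory = {}  # member name -> (offset, length)

    def write(self, name, content):
        # New and updated members always go at the back; an old copy becomes dead space.
        self.directory[name] = (len(self.data), len(content))
        self.data += content

    def delete(self, name):
        # Only the directory entry is removed; the member's bytes stay in place.
        del self.directory[name]

    def read(self, name):
        offset, length = self.directory[name]
        return bytes(self.data[offset:offset + length])

    def dead_space(self):
        live = sum(length for _, length in self.directory.values())
        return len(self.data) - live

    def compress(self):
        # Copy live members to the front in their current order, dropping dead space.
        packed = bytearray()
        new_directory = {}
        by_offset = sorted(self.directory.items(), key=lambda item: item[1][0])
        for name, (offset, length) in by_offset:
            new_directory[name] = (len(packed), length)
            packed += self.data[offset:offset + length]
        self.data, self.directory = packed, new_directory


pds = ToyPDS()
pds.write("JOB1", b"A" * 10)
pds.write("JOB2", b"B" * 20)
pds.write("JOB1", b"C" * 15)   # update: the old 10 bytes become dead space
pds.delete("JOB2")             # delete: another 20 bytes become dead space
print(pds.dead_space())
pds.compress()
print(pds.dead_space())
```

In this model, reading a member after compression returns the same bytes as before; only the offsets in the directory change.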

Since MVS/XA there is also the Partitioned Data Set Extended (PDSE).

The PDSE file structure is similar to that of a PDS and is used to store the same types of data. However, a PDSE has a better directory structure, which does not require pre-allocation of directory blocks when the PDSE is defined (and therefore cannot run out of directory blocks if too few were specified). A PDSE also stores members in such a way that no compression is needed to reclaim dead space. Like a PDS, a PDSE can reside only on disk, because individual members are accessed through the directory structure. PDSE files are also referred to as Libraries.
