Hartmann pipeline
Paradigm | Dataflow programming |
---|---|
Designed by | John P. Hartmann |
Developer | IBM |
First appeared | 1986 |
Stable release | 110C0006 / 2011-06-06 |
Website | http://vm.marist.edu/~pipeline |
Influenced by | Pipeline (Unix), APL |
A Hartmann pipeline is an extension of the Unix pipeline concept that provides for more complex paths, multiple input and output streams, and other features. It is both an example and an extension of pipeline programming.
A Hartmann pipeline is a non-procedural representation of the solution to a data processing problem as a dataflow. Because the dataflow specification is itself executable, the error-prone step of translating the dataflow into a traditional procedural programming language is eliminated. Hartmann pipelines may thus be considered an executable specification language.
The concept was developed by John Poul Hartmann (born 1946), a Danish engineer with IBM. It is available as the software product CMS/TSO Pipelines for a number of IBM platforms. A somewhat older ("backlevel") version is included with every level of VM/ESA and z/VM.
Overview
A pipeline consists of a collection of stages, joined together by stage separators. Stages can be written in a variety of languages, and are either filters that process data records or device drivers (sources and sinks) that read data into the pipeline or write data out of it. Unlike other implementations of pipeline programming, Hartmann's design allows multiple streams in and out of each stage and can interconnect them non-sequentially. Unlike many programming languages, pipelines have a very small amount of notation, limited to stage separators (typically "|"), pipeline separators (typically ";" or "?"), and label separators (":"). Due to common usage, the diskread stage is also known as "<" and diskwrite as ">"; however, every stage has a name that is an English word or makes some sense in English.[1]
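To make the notation concrete, the following Python sketch splits a specification into pipelines, stages, and labels using only the three separators described above. It is an illustrative model, not the parser used by CMS/TSO Pipelines: the function name `parse_pipeline_set` and the sample specification are hypothetical, and the sketch assumes that a leading `(end X)` option declares the pipeline separator and that separators never occur inside stage arguments.

```python
import re

def parse_pipeline_set(spec, stage_sep="|"):
    """Split a specification into pipelines, each a list of (label, stage).

    Simplified model: only a leading "(end X)" option is recognised, and
    separators are assumed never to appear inside stage arguments.
    """
    end_char = None
    match = re.match(r"\s*\(\s*end\s+(\S)\s*\)", spec, flags=re.IGNORECASE)
    if match:
        end_char = match.group(1)      # character that separates pipelines
        spec = spec[match.end():]      # drop the option prefix
    pipelines = spec.split(end_char) if end_char else [spec]

    result = []
    for pipeline in pipelines:
        stages = []
        for text in pipeline.split(stage_sep):
            text = text.strip()
            label = None
            first_word = text.split(None, 1)[0] if text else ""
            if first_word.endswith(":"):            # label separator
                label = first_word[:-1]
                text = text[len(first_word):].strip()
            stages.append((label, text))
        result.append(stages)
    return result

# Hypothetical specification, used only to exercise the sketch.
spec = "(end ?) < in.txt | L: locate /x/ | > hit.txt ? L: | > miss.txt"
for stages in parse_pipeline_set(spec):
    print(stages)
```

A label that appears with no stage after it, as in the second pipeline of the sample, is modelled here as an empty stage string that refers back to the stage carrying the same label.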
A simple example that reads a disk file, separates records containing the string "Hello" from those that do not, and writes both sets of records to different disk files can be written as:
(end ;) < input.txt | A: locate /Hello/ | > found.txt ; A: | > notfound.txt
where the "<" stage reads the input disk file, the two ">" stages write the output disk files, and the locate stage separates the input stream into two output streams. The primary output of locate (records containing Hello) is passed to the first ">" stage, and its secondary output (records not containing Hello) is passed through the A: connector to the second ">" stage. The ";", declared as the pipeline separator by the (end ;) option at the start of the specification, divides it into two pipelines. The collection of pipelines is called a pipeline set.
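The dataflow of this example can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not of how the stages are implemented: the function names are hypothetical, and in-memory lists stand in for the input and output disk files.

```python
def locate(records, needle):
    """Tag each record with an output stream number.

    0 = primary output (record contains the string),
    1 = secondary output (record does not contain it).
    """
    for record in records:
        yield (0 if needle in record else 1), record

def run(records, needle="Hello"):
    found, not_found = [], []      # stand-ins for found.txt and notfound.txt
    outputs = (found, not_found)
    for stream, record in locate(records, needle):
        outputs[stream].append(record)
    return found, not_found

found, not_found = run(["Hello world", "Goodbye", "Say Hello"])
print(found)
print(not_found)
```

Each input record ends up on exactly one of the two outputs, which mirrors the way the primary and secondary streams of the selection stage partition the input.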
Features
Some of the salient characteristics that distinguish Hartmann pipelines from ordinary Unix pipes are:
- Filters may have multiple inputs and multiple outputs. For example, a selection filter can send the found records down one output pipe and the not found records down another.
- (Key feature) A pacing strategy in the Pipeline supervisor that allows, for example, a stream to be split by a selection filter, the records on each output leg to be processed by other stages, and the legs then to be merged by a join stage with the original record order preserved in the result stream.
- As implied by the previous item, data streams are (generally) not simply buffered and passed along to the next stage. The stages operate in parallel with input and output records handled by the Pipeline supervisor.
- A linear notation for representing pipeline networks.
- An interface that allows REXX programs to act as stages.
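The effect of the pacing item above can be illustrated with a toy Python model. It is not the dispatching algorithm of the Pipeline supervisor, whose details are not given here; all function names are hypothetical. The first function keeps a single record in flight, so the merged output keeps the input order. The second buffers each leg completely before merging, which is the behaviour that pacing avoids.

```python
def paced_split_merge(records, predicate, on_match, on_miss):
    """Route one record at a time through its leg, then merge.

    Because the next record is not read until the current one has reached
    the merge point, the output order equals the input order.
    """
    merged = []
    for record in records:
        leg = on_match if predicate(record) else on_miss
        merged.append(leg(record))
    return merged

def buffered_split_merge(records, predicate, on_match, on_miss):
    """Contrast case: each leg is drained fully before the merge."""
    matches = [on_match(r) for r in records if predicate(r)]
    misses = [on_miss(r) for r in records if not predicate(r)]
    return matches + misses

records = ["a1", "b2", "a3", "b4"]
is_a = lambda r: r.startswith("a")
paced = paced_split_merge(records, is_a, str.upper, lambda r: r + "!")
buffered = buffered_split_merge(records, is_a, str.upper, lambda r: r + "!")
print(paced)     # input order is kept
print(buffered)  # all matches come before all misses
```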
Similarity to APL
Programmers familiar with the APL programming language will see some similarities in Hartmann pipelines. The influence of APL on the design is evident: some of the filters have names and functions similar to specific APL primitive functions. Examples include the TAKE filter, which passes a specified number of records, and the DEAL filter, which spreads its input records out across its output streams, in imitation of the APL deal function.
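The behaviour of these two filters, as described above, can be sketched in Python. The sketch rests on two assumptions that the text does not state: that TAKE passes records from the front of its input, and that DEAL distributes records to its output streams in round-robin order. The function names mirror the filter names but are otherwise hypothetical.

```python
from itertools import islice

def take(records, n):
    """Pass the first n records (assumption: counted from the front)."""
    return list(islice(records, n))

def deal(records, n_streams):
    """Spread records across n_streams outputs (assumption: round-robin)."""
    streams = [[] for _ in range(n_streams)]
    for i, record in enumerate(records):
        streams[i % n_streams].append(record)
    return streams

records = ["r1", "r2", "r3", "r4", "r5"]
print(take(records, 2))
print(deal(records, 2))
```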
References
1. Melinda Varian (November 1995). "Plunging Into Pipes" (PDF). Retrieved 2006-11-08.