Batch processing
Batch processing is the execution of a series of programs ("jobs") on a computer without manual intervention.
Jobs are set up so they can be run to completion without human interaction. All input parameters are predefined through scripts, command-line arguments, control files, or job control language. This is in contrast to "online" or interactive programs, which prompt the user for such input. A program takes a set of data files as input, processes the data, and produces a set of output data files. This operating environment is termed "batch processing" because the input data are collected into batches (sets of records) and each batch is processed as a unit. The output is another batch of records that can itself serve as input to further processing.
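As an illustration, the following minimal Java sketch behaves as such a job: all parameters come from the command line, every input record is processed the same way, and the result is itself a batch of output records. The file names and the per-record transformation are placeholders, not part of any particular system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// A minimal non-interactive batch job: all parameters are supplied up front,
// every input record is processed identically, and the output is written as
// a new batch of records that a later job could consume.
public class BatchJob {
    public static void main(String[] args) throws IOException {
        Path input = Path.of(args[0]);   // e.g. transactions.txt (hypothetical)
        Path output = Path.of(args[1]);  // e.g. transactions.out (hypothetical)

        List<String> results = Files.readAllLines(input).stream()
                .map(record -> record.trim().toUpperCase())  // placeholder per-record processing
                .collect(Collectors.toList());

        Files.write(output, results);    // the output batch can feed further jobs
    }
}
```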
Benefits
Batch processing has these benefits:
- It can shift the time of job processing to when the computing resources are less busy.
- It avoids idling the computing resources with minute-by-minute manual intervention and supervision.
- By keeping the overall rate of utilization high, it amortizes the cost of a computer, especially an expensive one.
- It allows the system to use different priorities for interactive and non-interactive work.
- Rather than running a program once per transaction, a batch process runs the program once for many transactions, reducing system overhead.
History
Batch processing has been associated with mainframe computers since the earliest days of electronic computing in the 1950s. There were a variety of reasons why batch processing dominated early computing. One reason is that the most urgent business problems, for reasons of profitability and competitiveness, were primarily accounting problems such as billing. Billing may conveniently be performed as a batch-oriented business process, and practically every business must bill, reliably and on time. Also, every computing resource was expensive, so the sequential submission of batch jobs on punched cards matched the resource constraints and technology of the time. Later, interactive sessions with text-based computer terminal interfaces or graphical user interfaces became more common. However, early computers were not even capable of holding multiple programs in main memory at once.
Batch processing is still pervasive in mainframe computing, but practically all types of computers are now capable of at least some batch processing, even if only for "housekeeping" tasks. That includes UNIX-based computers, Microsoft Windows, Mac OS X (which is built on a Unix foundation), and even smartphones. Even as computing in general becomes more pervasive, batch processing is unlikely to lose its significance.
Modern systems
Batch applications are still critical in most organizations in large part because many common business processes are amenable to batch processing. While online systems can also function when manual intervention is not desired, they are not typically optimized to perform high-volume, repetitive tasks. Therefore, even new systems usually contain one or more batch applications for updating information at the end of the day, generating reports, printing documents, and other non-interactive tasks that must complete reliably within certain business deadlines.
Modern batch applications make use of modern batch frameworks such as Jem The Bee, Spring Batch, or implementations of JSR 352[1] written for Java, and other frameworks for other programming languages, to provide the fault tolerance and scalability required for high-volume processing. To ensure high-speed processing, batch applications are often integrated with grid computing solutions that partition a batch job over a large number of processors, although doing so poses significant programming challenges. High-volume batch processing also places particularly heavy demands on system and application architectures. Architectures that feature strong input/output performance and vertical scalability, including modern mainframe computers, tend to provide better batch performance than alternatives.
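As a rough illustration of the JSR 352 programming model, the sketch below shows an ItemProcessor and a launcher that starts a job through the standard JobOperator. The class name, job name, and business logic are hypothetical; a complete job would also declare a reader and a writer in a job XML under META-INF/batch-jobs/.

```java
import java.util.Properties;
import javax.batch.api.chunk.ItemProcessor;
import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;
import javax.inject.Named;

// One piece of a JSR 352 chunk step: the processor that transforms each item
// handed to it by the step's ItemReader before it reaches the ItemWriter.
@Named
public class AccountProcessor implements ItemProcessor {   // class name is hypothetical
    @Override
    public Object processItem(Object item) throws Exception {
        // Placeholder business logic applied to every record in the batch.
        return item.toString().toUpperCase();
    }
}

// Launching the job; "endOfDayJob" would refer to a job XML under
// META-INF/batch-jobs/ that wires a reader, this processor, and a writer.
class Launcher {
    public static void main(String[] args) {
        JobOperator operator = BatchRuntime.getJobOperator();
        long executionId = operator.start("endOfDayJob", new Properties());
        System.out.println("Started batch execution " + executionId);
    }
}
```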
Scripting languages, which are well suited to expressing non-interactive jobs, became popular as they evolved alongside batch processing.
Batch window
A batch window is "a period of less-intensive online activity",[2] when the computer system is able to run batch jobs without interference from online systems.
Many early computer systems offered only batch processing, so jobs could be run any time within a 24-hour day. With the advent of transaction processing, online applications might only be required from 9:00 a.m. to 5:00 p.m., leaving two shifts available for batch work; in this case the batch window would be sixteen hours. The problem is not usually that the computer system is incapable of supporting concurrent online and batch work, but that batch systems usually require access to data in a consistent state, free from online updates, until the batch processing is complete.
In a bank, for example, so-called end-of-day (EOD) jobs include interest calculation, generation of reports and data sets to other systems, printing statements, and payment processing.
As requirements for online system uptime expanded to support globalization, the Internet, and other business needs, the batch window shrank, and increasing emphasis was placed on techniques that keep online data available for as much of the time as possible.
Common batch processing usage
Databases
Batch processing is also used for efficient bulk database updates and automated transaction processing, as contrasted to interactive online transaction processing (OLTP) applications. The extract, transform, load (ETL) step in populating data warehouses is inherently a batch process in most implementations.
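A common technique for bulk updates is JDBC statement batching, in which many row changes are sent to the database and committed as one unit rather than one round trip per transaction. The sketch below illustrates the idea; the table, columns, and interest-posting scenario are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Bulk database update using JDBC batching: many rows are updated and
// committed as one unit instead of one interactive round trip per transaction.
public class BulkAccountUpdate {
    // Table and column names are illustrative only.
    private static final String SQL =
            "UPDATE accounts SET balance = balance + ? WHERE account_id = ?";

    public static void applyInterest(String jdbcUrl, List<long[]> postings) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement stmt = conn.prepareStatement(SQL)) {
            conn.setAutoCommit(false);              // commit the whole batch at once
            for (long[] posting : postings) {
                stmt.setLong(1, posting[0]);        // interest amount (e.g. in cents)
                stmt.setLong(2, posting[1]);        // account id
                stmt.addBatch();
            }
            stmt.executeBatch();                    // single bulk round trip
            conn.commit();
        }
    }
}
```

Committing the whole batch at once also fits the batch-window requirement that the data remain in a consistent state until the job completes.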
Images
Batch processing is often used to perform various operations with digital images such as resize, convert, watermark, or otherwise edit image files.
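For example, a simple batch resize can be written with the standard Java ImageIO API, processing every file in a directory without user interaction. The directory names and target size below are illustrative only.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Resizes every JPEG in a directory to a fixed width, with no user interaction.
public class BatchResize {
    public static void main(String[] args) throws IOException {
        File inDir = new File("originals");    // hypothetical input directory
        File outDir = new File("thumbnails");  // hypothetical output directory
        outDir.mkdirs();

        for (File file : inDir.listFiles((dir, name) -> name.endsWith(".jpg"))) {
            BufferedImage src = ImageIO.read(file);
            int width = 200;
            int height = src.getHeight() * width / src.getWidth();  // keep aspect ratio

            BufferedImage dst = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = dst.createGraphics();
            g.drawImage(src, 0, 0, width, height, null);
            g.dispose();

            ImageIO.write(dst, "jpg", new File(outDir, file.getName()));
        }
    }
}
```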
Conversions
Batch processing may also be used for converting computer files from one format to another. For example, a batch job may convert proprietary and legacy files to common standard formats for end-user queries and display.
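A minimal sketch of such a conversion, assuming a hypothetical fixed-width legacy record layout, might look like the following.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Converts a legacy fixed-width record file into CSV for end-user tools.
// The 10-character id / 30-character name layout is purely illustrative.
public class LegacyToCsv {
    public static void main(String[] args) throws IOException {
        List<String> csv = new ArrayList<>();
        csv.add("id,name");
        for (String record : Files.readAllLines(Path.of("legacy.dat"))) {
            String id = record.substring(0, 10).trim();
            String name = record.substring(10, 40).trim();
            csv.add(id + "," + name);
        }
        Files.write(Path.of("records.csv"), csv);
    }
}
```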
Notable batch scheduling and execution environments
Unix systems provide the cron and at facilities for scheduling complex job scripts. Windows has the Task Scheduler. Most high-performance computing clusters use batch processing to maximize cluster usage.
The IBM mainframe z/OS operating system or platform has arguably the most highly refined and evolved set of batch processing facilities owing to its origins, long history, and continuing evolution. Today such systems commonly support hundreds or even thousands of concurrent online and batch tasks within a single operating system image. Technologies that aid concurrent batch and online processing include Job Control Language (JCL), scripting languages such as REXX, Job Entry Subsystem (JES2 and JES3), Workload Manager (WLM), Automatic Restart Manager (ARM), Resource Recovery Services (RRS), DB2 data sharing, Parallel Sysplex, unique performance optimizations such as HiperDispatch, I/O channel architecture, and several others.
See also
- Background process
- Processing modes
- Batch file
- Batch renaming - for renaming many files automatically without human intervention, to save time and effort
- Batch-queuing system - for schedulers that plan the execution of batch jobs
- Job Processing Cycle - for a detailed description of batch processing in the mainframe field
- BatchPipes - for a utility that increases batch performance
- Production support - for batch job/schedule/stream support
- Jem The Bee - Job Entry Manager, the Batch Execution Environment
References
- ↑ "Batch Applications for the Java Platform". Java Community Process. Retrieved 2015-08-03.
- ↑ IBM Corporation. "Mainframes working after hours: Batch processing". Mainframe concepts. Retrieved June 20, 2013.