Spreadsheet

From Wikipedia, the free encyclopedia

Screenshot of a spreadsheet made with OpenOffice.org.

A spreadsheet is a rectangular table (or grid) of information, often financial information. The word came from "spread" in its sense of a newspaper or magazine item (text and/or graphics) that covers two facing pages, extending across the center fold and treating the two pages as one large one. The compound word "spread-sheet" came to mean the format used to present bookkeeping ledgers—with columns for categories of expenditures across the top, invoices listed down the left margin, and the amount of each payment in the cell where its row and column intersect—which were traditionally a "spread" across facing pages of a bound ledger (book for keeping accounting records) or on oversized sheets of paper ruled into rows and columns in that format and approximately twice as wide as ordinary paper.

1 History
- 1.1 Early implementations
- 1.2 VisiCalc
2 Programming issues
3 Shortcomings
4 Web-based spreadsheets
5 See also
6 References
7 External links
- 7.1 General information
- 7.2 Research organisations

History

Early implementations

Batch spreadsheets

One of the first commercial uses of computers was in processing payroll and other financial records, so the programs (and, indeed, the programming languages themselves) were designed to generate reports in the standard "spreadsheet" format bookkeepers and accountants used. As computers became more available and affordable in the last quarter of the 20th century, more software became available for them, and programs to keep financial records and generate spreadsheet reports were always in demand. Those spreadsheet programs can be used to tabulate many kinds of information, not just financial records, so the term "spreadsheet" has developed a more general meaning as information presented in a rectangular table, usually generated by a computer.

The concept of an electronic spreadsheet was outlined in the 1961 paper "Budgeting Models and System Simulation" by Richard Mattessich. Some credit for the computerized spreadsheet perhaps belongs to Rene K. Pardo and Remy Landau, who filed U.S. Patent 4,398,249 on some of the related algorithms in 1970. While the patent was initially rejected by the patent office as being a purely mathematical invention, Pardo and Landau won a court case in 1983 establishing that "something does not cease to become patentable merely because the point of novelty is in an algorithm." This case helped establish the viability of software patents.

Interactive spreadsheets

Fully interactive spreadsheets did not become possible until visual display units (VDUs) were readily available. Earlier implementations were mainly designed around batch programs. In the early 1970s, text-based VDUs began to be used as input/output devices for interactive transaction processing. Full-function graphical user interfaces for spreadsheets did not become available until several years later.

Dan Bricklin is generally recognized as the inventor of the spreadsheet as a commercial product for the personal computer. However, a fully interactive implementation produced in the United Kingdom at Imperial Chemical Industries (ICI), running on an IBM mainframe under CICS, pre-dated Bricklin's version by several years and featured shared public spreadsheets from the outset.^{[citation needed]}

Works Records System

Screenshot of the ICI Works Records System.

The system, known as "The Works Records System", was designed by Robert Mais, then an employee of ICI Mond Division in the UK. It was implemented in 1974 by a team that included Ken Dakin, the author of several successful CICS debugging products. These products were used extensively during development to maximize performance by detecting "hot spots" (frequently executed locations) during code execution.^{[citation needed]}

All operations were performed using double-precision floating-point arithmetic. Formulae, which performed calculations and linked cells either in the same spreadsheet or in completely separate spreadsheets, could be entered on multiple lines to aid comprehension. Formulae were converted (compiled) to machine language "on the fly" on first use and stored for subsequent executions. This technique is now known as just-in-time (JIT) compilation or, more specifically, "incremental compilation", but it had no label at the time. Data, including "aged" values, was stored in an Adabas database (which has been described as a "relational-like" database), although the database was not fundamental to the operation of the system.

The IBM 3270 terminal chosen for the implementation belonged to what was then a new breed of "not so dumb" terminals, which had some basic built-in hardware validity checking such as "numeric only" input fields.

Despite the limitations of the device, the input screens could nevertheless be designed interactively by non-programmers, using simple "<" and ">" characters as field (cell) delimiters during the design phase (building the spreadsheet). Like tab characters in modern word processors, these delimiters would not be visible during normal use. The same technique was used to define, on screen, the layouts of printed reports, which were not limited to the 80-column screen width of the 3270.

The system was capable of detecting some illogical operations because numeric values carried a "units" attribute (such as "kilograms", "ounces", "feet" or "inches"), analogous to currency symbol attributes in today's spreadsheets. It was therefore impossible to multiply kilograms by ounces or commit similar logic errors.

By contrast, today's commercial spreadsheets will allow a column of mixed currencies (say, pounds and dollars) to be summed or multiplied without even a warning.
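A "units" attribute of this kind can be sketched in a few lines of Python. The exact rules the Works Records System applied are not described here, so this example makes an assumption: it only rejects addition of values whose units differ. The class name `Quantity` and the unit strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A numeric value tagged with a unit (or currency) label."""
    value: float
    unit: str

    def __add__(self, other):
        # Assumed rule: values may only be added when their units match.
        if self.unit != other.unit:
            raise ValueError(f"cannot add {other.unit} to {self.unit}")
        return Quantity(self.value + other.value, self.unit)
```

With this check in place, `Quantity(2.0, "kg") + Quantity(3.0, "kg")` succeeds, while adding a value in "GBP" to one in "USD" raises an error instead of silently producing a meaningless total.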

The Works Records System represents the first known use of a shared public spreadsheet, since it allowed multiple users to access the linked spreadsheets across a private online network covering many remote locations.

APLDOT

Another example of an "industrial weight" spreadsheet was produced two years later, in 1976, at the United States Railway Association on an IBM 360/91 running at the Johns Hopkins University Applied Physics Laboratory in Laurel, MD. The application, named APLDOT, was used successfully for many years in developing such applications as financial and costing models for the US Congress and for Conrail. All software development was in the public domain. The software system underwent a court challenge in US Government vs PennCentral et al. in 1978 and 1979. It was dubbed a "spreadsheet" because that was what the financial analysts and strategic planners called the green pads they used for planning in 1976.

VisiCalc

Dan Bricklin has spoken of watching his university professor create a table of calculation results on a blackboard. When the professor found an error, he had to tediously erase and rewrite a number of sequential entries in the table. This prompted Bricklin to think that he could replicate the process on a computer, using the blackboard as the model for viewing the results of underlying formulas. His idea became VisiCalc, the first application that turned the personal computer from a hobby for computer enthusiasts into a business tool.

Screenshot of VisiCalc, the first spreadsheet.

VisiCalc went on to become the first "killer app", an application so compelling that people would buy a particular computer just to use it. In this case the computer was the Apple II, and VisiCalc played no small part in that machine's success. The program was later ported to a number of other early computers, notably CP/M machines, the Atari 8-bit family and various Commodore platforms. Nevertheless, VisiCalc remains best known as "an Apple II program".

The acceptance of the IBM PC following its introduction in August, 1981, began slowly, because most of the programs available for it were ports from other 8-bit platforms. Things changed dramatically with the introduction of Lotus 1-2-3 in November, 1982, and release for sale in January, 1983. It became that platform's killer app, and drove sales of the PC due to the improvements in speed and graphics compared to VisiCalc. VisiCorp was unable to respond competitively, and disappeared within a few years.

Lotus 1-2-3 underwent an almost identical cycle with the introduction of Windows 3.x, beginning in 1990. Microsoft had been developing Excel on the Macintosh platform for several years at this point, and it had developed into a fairly powerful system. A port to Windows 3.1 resulted in a fully functional Windows spreadsheet which quickly took over from Lotus in the early 1990s. By the time Lotus responded with a usable Windows version of its own, Microsoft had started assembling its Office suite, which still dominates the industry.

A number of companies have attempted to break into the spreadsheet market with programs based on very different paradigms. Lotus introduced what is likely the most successful example, Lotus Improv, which saw some commercial success, notably in the financial world, where its powerful data mining capabilities remain well respected to this day. Spreadsheet 2000 attempted to dramatically simplify formula construction, but was generally not successful. Stories attempted to make it easier to deal with 3-D blocks of data (as opposed to the 2-D nature of most spreadsheets), but appears to have seen little or no use.

Programming issues

Just as the early programming languages were designed to generate spreadsheet printouts, programming techniques themselves have evolved to process tables (also known as spreadsheets or matrices) of data more efficiently in the computer itself.

Spreadsheets have evolved into powerful programming languages; specifically, they are functional, visual, and multiparadigm languages.

Many people find it easier to perform calculations in spreadsheets than by writing the equivalent sequential program. This is due to two traits of spreadsheets.

- They use spatial relationships to define program relationships. Like all animals, humans have highly developed intuitions about spaces, and of dependencies between items. Sequential programming usually requires typing line after line of text, which must be read slowly and carefully to be understood and changed.
- They are forgiving, allowing partial results and functions to work. One or more parts of a program can work correctly, even if other parts are unfinished or broken. This makes writing and debugging programs much easier, and faster. Sequential programming usually needs every program line and character to be correct for a program to run. One error usually stops the whole program and prevents any result.

A spreadsheet program is designed to perform general computation tasks using spatial relationships rather than time as the primary organizing principle. Many programs designed to perform general computation use timing, the ordering of computational steps, as their primary way to organize a program. A well defined entry point is used to determine the first instructions, and all other instructions must be reachable from that point.

In a spreadsheet, however, a set of cells is defined, with a spatial relation to one another. In the earliest spreadsheets, these arrangements were a simple two-dimensional grid. Over time, the model has been expanded to include a third dimension, and in some cases a series of named grids. The most advanced examples allow inversion and rotation operations which can slice and project the data set in various ways.

The cells are functionally equivalent to variables in a sequential programming model. Cells often have a formula, a set of instructions which can be used to compute the value of a cell. Formulas can use the contents of other cells or external variables such as the current date and time. It is often convenient to think of a spreadsheet as a mathematical graph, where the nodes are spreadsheet cells, and the edges are references to other cells specified in formulas. This is often called the dependency graph of the spreadsheet. References between cells can take advantage of spatial concepts such as relative position and absolute position, as well as named locations, to make the spreadsheet formulas easier to understand and manage.

Spreadsheets usually attempt to automatically update cells when the cells on which they depend have been changed. The earliest spreadsheets used simple tactics like evaluating cells in a particular order, but modern spreadsheets compute a minimal recomputation order from the dependency graph. Later spreadsheets also include a limited ability to propagate values in reverse, altering source values so that a particular answer is reached in a certain cell. Since spreadsheet cell formulas are not generally invertible, though, this technique is of somewhat limited value.
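Recalculation driven by the dependency graph can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of any particular product: the function name `recalculate` is hypothetical, a sheet is assumed to be a dictionary holding only numbers and formulas that start with "=", formulas are assumed to use plain arithmetic over upper-case cell names, and every cell is recomputed rather than only those downstream of a change.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Matches simple references such as A1 or AB12 (no ranges, no "$" markers).
REF = re.compile(r"\b[A-Z]{1,2}[0-9]+\b")


def recalculate(sheet):
    """Return cell values, evaluating each cell after the cells it uses."""
    # Dependency graph: each cell maps to the set of cells its formula reads.
    graph = {
        cell: set(REF.findall(content)) if isinstance(content, str) else set()
        for cell, content in sheet.items()
    }
    values = {}
    # static_order() yields every cell after all of its dependencies.
    for cell in TopologicalSorter(graph).static_order():
        content = sheet.get(cell, 0)  # assumption: empty cells count as 0
        if isinstance(content, str):
            # Strip the leading "=" and evaluate against known values.
            content = eval(content.lstrip("="), {"__builtins__": {}}, values)
        values[cell] = content
    return values
```

For the sheet `{"A1": 2, "B1": 3, "C1": "=A1*B1", "D1": "=C1+A1"}`, C1 is evaluated before D1 regardless of the order in which the cells were entered. A circular reference makes `TopologicalSorter` raise `graphlib.CycleError`, which corresponds to the circular-reference warnings spreadsheets display.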

A cell may contain a value or a formula, or be empty. In addition it can contain information about the data type of the data it holds, or expects when a value is entered. This may determine the format in which a value is displayed, and the allowed operations on it. A formula often contains references to other cells. Such a cell reference is a kind of variable. Its value is the value of the referenced cell. If that cell in turn references other cells, the value depends on the values of those. Note that in general the cell content should be distinguished from the cell value.

A typical cell reference consists of one or two case-insensitive letters to identify the column (if there are up to 256 columns: A-Z and AA-IV) followed by a row number (e.g. in the range 1-65536). Either part can be relative (it changes when the formula it is in is moved or copied), or absolute (indicated with $ in front of the part concerned of the cell reference).
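The reference scheme described above can be made concrete with a short Python sketch. The helper names (`column_number`, `column_letters`, `shift_reference`) are hypothetical; the code assumes one- or two-letter columns and does no bounds checking against the 256-column or 65536-row limits.

```python
import re

# Optional "$", column letters, optional "$", row digits.
REF = re.compile(r"(\$?)([A-Za-z]{1,2})(\$?)(\d+)")


def column_number(letters):
    """Convert column letters to a 1-based index (A is 1, Z is 26)."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def column_letters(number):
    """Convert a 1-based column index back to letters."""
    letters = ""
    while number > 0:
        number, rem = divmod(number - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def shift_reference(ref, rows, cols):
    """Adjust `ref` as if its formula were copied `rows` down, `cols` right.

    Parts marked with "$" are absolute and stay fixed; the others move.
    """
    col_abs, col, row_abs, row = REF.fullmatch(ref).groups()
    col_n = column_number(col) + (0 if col_abs else cols)
    row_n = int(row) + (0 if row_abs else rows)
    return f"{col_abs}{column_letters(col_n)}{row_abs}{row_n}"
```

Here `column_number("IV")` gives 256, matching the A-Z and AA-IV range mentioned above, and copying a formula two rows down and three columns right turns `A$1` into `D$1` while leaving `$A$1` unchanged.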

Many of the concepts common to sequential programming models have analogues in the spreadsheet world. For example, the sequential model of the indexed loop is usually represented as a table of cells, with similar formulas.
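As an illustration of that analogy, the Python sketch below places a running total written as an indexed loop next to the column of near-identical formulas a spreadsheet would use for the same job. The function names and the choice of columns A (amounts) and B (totals) are assumptions made for the example.

```python
def running_total_loop(amounts):
    """Sequential version: one loop body executed once per item."""
    totals, total = [], 0
    for amount in amounts:
        total += amount
        totals.append(total)
    return totals


def running_total_formulas(n):
    """Spreadsheet version: the formulas held in B1..Bn for amounts in A1..An.

    Each row repeats the same formula with its references shifted down by
    one, which plays the role of the loop index.
    """
    return ["=A1"] + [f"=B{i - 1}+A{i}" for i in range(2, n + 1)]
```

Each iteration of the loop corresponds to one row of the table: the loop variable becomes the row position, and the accumulated `total` becomes a reference to the cell directly above.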

Shortcomings

While extremely popular, spreadsheets are not without their downsides. Some of the problems associated with spreadsheets include^[1]^[2]:

- Lack of auditing and revision control. This makes it difficult to determine who changed what and when, which can cause problems with regulatory compliance, among other things.
- Lack of security. Generally, if one has permission to open a spreadsheet, one has permission to modify any part of it. This, combined with the lack of auditing above, can make it easy for someone to commit fraud.
- Lack of concurrency. Unlike databases, spreadsheets typically allow only one user to be making changes at any given time.
- Because they are loosely structured, it is easy for someone to introduce an error, either accidentally or intentionally, by entering information in the wrong place or expressing dependencies among cells (such as in a formula) incorrectly.
- The result of a formula (for example "=A1*B1") applies only to a single cell (the cell in which the formula is located, in this case perhaps C1), even though it can "extract" data from many other cells, and even real-time dates and times. To perform a similar calculation on an array of cells, an almost identical formula (residing in its own "output" cell) must therefore be repeated for each row of the "input" array. This differs from a "formula" in a conventional computer program, which would typically have one calculation that is applied to all of the input in turn. With current spreadsheets, this forced repetition of near-identical formulae can have detrimental consequences from a quality assurance standpoint and is often the cause of spreadsheet errors.

This last problem could be solved conceptually by permitting the specification of a new category of "spatially independent" formula, in which the "left hand" (target) of the formula is entered explicitly and combined with "indexed cell addressing" of the generic form:

WHILE (COUNT(A1:A20) > 0), C(i) = A(i)*B(i)     where i = incremented row number (1-20)

This theoretical category of formula could reside anywhere within the spreadsheet since its target cell(s) are specified independently of their location in the spreadsheet. (However, for clarity, the "cloned" formula could optionally be shown in each target cell, any change to one affecting all its clones automatically, thereby reducing errors).

Alternatively, to conform more closely to current spreadsheet syntax, the formula might be written:

=IF(COUNT(A1:A20) > 0, A(i)*B(i), "")

where the second parameter represents the formula to be applied to each occurrence. It would be entered only in the first cell, with the remaining cells displaying the cloned formula.
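The cloning behaviour proposed here can be sketched in Python. Since the indexed formula is a theoretical construct, everything in the example is an assumption: the function name `expand_indexed_formula`, the `C(i) = A(i)*B(i)` template syntax, and the decision to expand the template into ordinary per-row formulas.

```python
import re


def _at_row(text, i):
    """Replace every indexed reference such as A(i) with a concrete one (A3)."""
    return re.sub(r"([A-Z]+)\(i\)", rf"\g<1>{i}", text)


def expand_indexed_formula(template, rows):
    """Expand one indexed formula into a per-row formula for each target cell.

    `template` has the form "TARGET(i) = EXPRESSION", with (i) as the row index.
    """
    target, expression = (part.strip() for part in template.split("=", 1))
    return {_at_row(target, i): "=" + _at_row(expression, i) for i in rows}
```

Because every target cell is generated from the single template, changing the template changes all of its clones together, which is the error-reducing property the proposal is after.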

With the recent advent of remote data update of cells, the need to specify conditional formulae of this type will assume a new urgency, since the precise contents and extents of external spreadsheets may not be fully discernible before execution.

While there are built-in and third-party tools for desktop spreadsheet applications that address some of these shortcomings, awareness of these is generally low, and usage lower still. However, many of these earlier shortcomings can be handled by online spreadsheets such as EditGrid and Google Docs & Spreadsheets.

Web-based spreadsheets

The advent of advanced web technologies such as Ajax and XUL, circa 2005, has propelled the emergence of a new generation of online spreadsheets. Equipped with a rich Internet application user experience, many web-based spreadsheets offer many of the features seen in desktop spreadsheet applications, and some already surpass them by offering real-time updates from remote sources such as stock prices and currency exchange rates.