Superimposed code

A superimposed code, such as Zatocoding, is a kind of hash code that was popular in marginal punched-card systems.

Marginal punched-card systems

Many names, some of them trademarked, have been used for marginal punched-card systems: edge-notched cards, slotted cards, E-Z Sort, Zatocards, McBee, McBee Keysort, Flexisort, Velom, Rocket, etc. The center of each card held the relevant information: typically the title and author of a book, research paper, or journal article on a nearby shelf, and a list of subjects and keywords. Some sets of cards contained all the information required by the user on the card itself, handwritten, typewritten, or on microfilm (aperture card).

Every card in a stack had the same set of pre-punched holes. The user found the cards relevant to a search by aligning the holes in the set of cards (using a card holder or card tray) and inserting one or more knitting-needle-like rods all the way through the stack. The desired cards (which had been notched or cut open) fell out, while the irrelevant cards in the collection (left un-notched) remained on the needle(s). A user could repeat this selection many times to form a complex Boolean search query. A card that was relevant to two or more subjects had the slot for each of those subjects cut out, so that the card dropped out when any one of those subjects was selected.

"Superimposed code" systems, such as Zatocoding, saved space by entering several or all subjects in the same field; such a code stores much more information in less space, but at the cost of occasional "false" selections.[1]

Suppose you have a collection of index cards, one per book, research paper, or journal article in a library, with the keywords (subjects) discussed in each book written on that book's card. The "obvious way" to code those subjects is to count the total number of subjects used in the entire collection, R, make a row of R holes near the top of every card, and, for each subject actually discussed in a particular book, cut a slot from the hole corresponding to that subject in that book's card.[2] Naturally, this also requires a separate list of every subject used in the collection that indicates which hole is punched for each subject. Unfortunately, there may be thousands of distinct subjects in the collection, and it is impractical to punch thousands of holes in every card. While it may not seem possible to use fewer than one hole per subject, superimposed code systems can solve this problem.
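The one-hole-per-subject scheme described above can be sketched in a few lines of Python. This is only an illustration: the card titles, the subjects, and the names `hole_for`, `notched`, and `select` are hypothetical, and sets of integers stand in for the notched hole positions.

```python
# Hypothetical example data: each card lists the subjects of one book.
cards = {
    "Book A": {"chemistry", "history"},
    "Book B": {"physics"},
    "Book C": {"chemistry", "physics"},
}

# The code book: one hole position per distinct subject (R holes in total).
subjects = sorted(set().union(*cards.values()))
hole_for = {subject: position for position, subject in enumerate(subjects)}

# Notch each card at the hole of every subject it discusses.
notched = {
    title: {hole_for[s] for s in card_subjects}
    for title, card_subjects in cards.items()
}

def select(needle_subject):
    """Return the titles that drop off a needle pushed through one subject's hole."""
    hole = hole_for[needle_subject]
    return sorted(title for title, slots in notched.items() if hole in slots)

print(len(hole_for))        # R, the number of holes every card needs
print(select("chemistry"))  # the cards notched at the "chemistry" hole
```

Because `hole_for` grows by one entry for every new subject, the number of holes per card grows with R, which is the limitation the superimposed codes address.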

Superimposed codes

The Zatocoding system of information retrieval was developed by Calvin Mooers in 1947.[3]

At M.I.T., Calvin Mooers invented Zato Coding, a mechanical information retrieval system based on superimposed codes, and formed the Zator Company in 1947 to commercialize its applications.[4] The particular superimposed code used in that system is called Zatocoding, while the marginal punched-card information retrieval system as a whole is called "Zator".[5]

Setting up a superimposed code for a particular library goes something like this:

Later, when we need to find books on some particular subject, we look up that subject in our list of all R subjects, find the corresponding pattern of n slots, and put n needles through the whole stack in that pattern. All of the cards that have been cut with that pattern will fall out. It is possible that a few other, undesired cards may also fall out: cards that have several subjects whose hole patterns overlap in such a way as to mimic the desired pattern. The probability F that an undesired card with v slots cut in it falls through when we select some pattern of n needles is approximately F = (\frac{v}{N})^n, where N is the number of hole positions on each card. Most systems have N large enough and r, the number of subjects coded on each card, small enough that v < N/2 (i.e., the card is less than half-punched), so the probability of an undesired card falling through is F < (\frac{1}{2})^n.[2]
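The selection step and the false-drop estimate above can be illustrated with a minimal Python sketch. It assumes that each subject is given a random pattern of n of the N hole positions, which is only one of the possible ways to choose patterns; the values N = 40 and n = 4, the subject names, and all function names are hypothetical.

```python
import random

N = 40   # hole positions per card (assumed value)
n = 4    # holes notched per subject (assumed value)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def make_code_book(subjects):
    """Assign each subject a random pattern of n of the N hole positions."""
    return {s: frozenset(rng.sample(range(N), n)) for s in subjects}

def notch_card(card_subjects, code_book):
    """Superimpose (take the union of) the patterns of all subjects on a card."""
    slots = set()
    for s in card_subjects:
        slots |= code_book[s]
    return slots

def drops_out(slots, pattern):
    """A card falls off the needles only if every needled hole is notched."""
    return pattern <= slots

def false_drop_estimate(v, N, n):
    """Approximate chance that an unrelated card with v notches falls out."""
    return (v / N) ** n

subjects = [f"subject-{i}" for i in range(200)]
code_book = make_code_book(subjects)
card = notch_card(subjects[:5], code_book)   # a card covering five subjects

# Every subject actually on the card selects it: there are no missed cards.
print(all(drops_out(card, code_book[s]) for s in subjects[:5]))
# Number of notches v on this card and the resulting false-drop estimate.
print(len(card), false_drop_estimate(len(card), N, n))
```

For a half-punched card (v = 20, N = 40) and n = 4 needles, `false_drop_estimate` gives (1/2)^4 = 0.0625, the bound quoted above; a card with only v = 10 notches gives (1/4)^4, about 0.0039.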

There are several different ways to choose which holes will be slotted for each subject.

Zatocoding

Setting up a Zato code for a particular list of R subjects goes something like this:[2]

Other superimposed codes

A Zatocode requires a code book that lists every subject and a randomly generated notch code associated with each one. Other "direct" superimposed codes have a fixed hash function for transforming the letters in (one spelling of) a subject into a notch code. Such codes require a much shorter code book that describes the translation of letters in a word to the corresponding notch code, and can in principle easily add new subjects without changing the code book.[5]

A Bloom filter can be considered a kind of superimposed code.
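The idea of a "direct" code, and its resemblance to a Bloom filter, can be sketched as follows. Note the substitution: a SHA-256 digest is used here as a modern stand-in for the fixed hash function, not the letter-based translation tables of the historical systems. The values N = 40 and n = 4 and the function names are assumptions made for illustration.

```python
import hashlib

N = 40   # hole positions per card (assumed value)
n = 4    # hash positions derived per subject (assumed value)

def notch_code(subject):
    """Derive up to n hole positions from the subject's spelling alone."""
    normalized = subject.strip().lower()
    positions = set()
    for i in range(n):
        digest = hashlib.sha256(f"{i}:{normalized}".encode("utf-8")).digest()
        positions.add(int.from_bytes(digest[:8], "big") % N)
    return frozenset(positions)  # fewer than n positions if two hashes collide

def notch_card(card_subjects):
    """Superimpose the notch codes of all subjects on one card."""
    slots = set()
    for s in card_subjects:
        slots |= notch_code(s)
    return slots

def may_contain(slots, subject):
    """True for every subject on the card; occasionally True for others."""
    return notch_code(subject) <= slots

card = notch_card(["chemistry", "history"])
print(may_contain(card, "chemistry"))   # True: a coded subject is never missed
print(may_contain(card, "Chemistry "))  # True: same spelling after normalizing
```

No code book of subjects is needed, because the notch code is recomputed from the spelling each time; a new subject can be added without changing anything else. Each card's set of notches behaves like a small Bloom filter over that card's subjects.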

References

  1. Robert V. Williams. "Punched Cards: A Brief Tutorial". computing now 2002.
  2. W. Ross Ashby. W. Ross Ashby's Journal: Zato-coding. 1960 Sep. 22. pp. 6208-6222.
  3. "About the Cover". College and Research Libraries News, April 2008.
  4. Eugene Garfield. "Continuing relevance of superimposed coding". Journal of Information Science 8 (1984) 181.
  5. Herbert Marvin Ohlman. "Subject-Word Letter Frequencies with Applications to Superimposed Coding". Proceedings of the International Conference on Scientific Information (1959).