chomski

pp, chomski virtual machine
Major implementations
Paradigm	scripting language
Designed by	mj bishop
First appeared	2007
Typing discipline	none; all data is treated as a string
OS	Cross-platform
Website	bumble.sourceforge.net/pp/

Influenced by
Sed, Awk

The chomski virtual machine (named after the linguist Noam Chomsky) and pp (the pattern parser) refer to both a command-line language and the utility that interprets it, which together are used to parse and transform text patterns. The utility reads its input sequentially, one character at a time, applies the operations specified on the command line or in a pp script, and writes the result to its output. Development began in 2006 as a Unix and Windows utility, and the tool is available today for Windows and Linux systems. pp derives a number of ideas and syntax elements from sed, a command-line text stream editor.

Features

The chomski language uses many ideas taken from sed, the Unix stream editor. For example, sed includes two virtual variables or data buffers, known as the "pattern space" and the "hold space". These two buffers constitute an extremely simple virtual machine. In the chomski language this virtual machine has been augmented with several new buffers or registers, along with a number of commands to manipulate them.

The chomski virtual machine includes a tape data structure and a stack, along with a "workspace" (the equivalent of the sed "pattern space") and a number of other buffers of lesser importance. The machine is designed specifically for parsing formal languages. Parsing traditionally involves two phases: the lexical analysis phase and the formal grammar phase. During the lexical analysis phase a series of tokens is generated. These tokens are then used as the input for a set of formal grammar rules. The chomski virtual machine uses the stack to hold these tokens and the tape to hold the attributes of the tokens. In a pp script the two phases, lexing and parsing, are combined in one script file. A series of command words is used to manipulate the different data structures of the virtual machine.
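The arrangement of workspace, stack and tape described above can be pictured with a small Python model. This is only an illustrative sketch: the class name `Machine`, the field `cell`, and the methods `store_attribute` and `push_token` are hypothetical names chosen for this example, and their exact behaviour is an assumption rather than a description of the real pp implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Machine:
    """Hypothetical model of the buffers described in the text (not pp's real internals)."""
    workspace: str = ""                                     # plays the role of sed's pattern space
    stack: List[str] = field(default_factory=list)          # holds parse tokens
    tape: List[str] = field(default_factory=lambda: [""])   # holds token attributes
    cell: int = 0                                           # index of the current tape cell

    def store_attribute(self) -> None:
        """Copy the workspace into the current tape cell (assumed behaviour)."""
        self.tape[self.cell] = self.workspace

    def push_token(self, token: str) -> None:
        """Push a token onto the stack and move to a fresh tape cell (assumed behaviour)."""
        self.stack.append(token)
        self.cell += 1
        if self.cell == len(self.tape):
            self.tape.append("")
        self.workspace = ""


m = Machine()
for text in ["hello", "world"]:
    m.workspace = text      # pretend the lexing phase has just read this text
    m.store_attribute()     # keep the text as the attribute of the token
    m.push_token("word")    # record the token itself on the stack

print(m.stack)      # ['word', 'word']
print(m.tape[:2])   # ['hello', 'world']
```

In this model the stack and the tape advance together, so the attribute of the n-th token on the stack sits in the n-th tape cell, which is one plausible way to keep tokens and their attributes aligned.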

Purpose and Motivation

The purpose of the pp tool is to parse and transform text patterns. The text patterns conform to the rules of a formal language and include many context-free languages. Whereas traditional Unix tools (such as awk, sed and grep) process text one line at a time and use regular expressions to search or transform it, the pp tool processes text one character at a time and can use context-free grammars to transform (or compile) the text. However, in keeping with the Unix philosophy, the pp tool works on plain text streams, encoded according to the locale of the local computer, and produces another plain text stream as output, so that it can be used as part of a standard pipeline.

The motivation for the creation of the pp tool and the chomski virtual machine was to allow the writing of parsing scripts, rather than having to resort to traditional parsing tools such as Lex and Yacc.

Usage

The following example shows a typical use of chomski, where the -s option indicates that the chomski expression follows:

cat inputFileName | chomski -s  '/(/ { until ")"; print; } clear;' > outputFileName

In the above script, only text enclosed in parentheses is written to the output file.
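The effect of that script can be approximated in Python by reading the input one character at a time. This is a rough sketch, not a translation of the interpreter: the function name `extract_parenthesised` is hypothetical, and it assumes that the parentheses themselves are kept in the output and that an unterminated group is emitted as far as it was read, neither of which the text above specifies.

```python
import io


def extract_parenthesised(stream):
    """Yield each '(' ... ')' span, reading one character at a time."""
    while True:
        ch = stream.read(1)
        if ch == "":
            return                  # end of input
        if ch != "(":
            continue                # discard characters outside parentheses
        workspace = ch
        while not workspace.endswith(")"):
            ch = stream.read(1)
            if ch == "":
                break               # input ended before a closing parenthesis
            workspace += ch
        yield workspace


print("".join(extract_parenthesised(io.StringIO("a (b) c (de) f"))))   # (b)(de)
```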

Under Unix (and Windows), chomski can be used as a filter in a pipeline:

generate_data | chomski -s '/x/{clear;add "y";}print;clear;'

That is, generate the data, and then make the small change of replacing x with y.
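For comparison, the same character-level substitution might look like this in Python. The function name `replace_x_with_y` and the use of a local `workspace` variable are illustrative assumptions; the sketch only mirrors the idea of testing each character, replacing it if it matches, and printing whatever is in the workspace.

```python
import io


def replace_x_with_y(stream):
    """Read one character at a time, replacing each 'x' with 'y'."""
    out = []
    while True:
        workspace = stream.read(1)
        if workspace == "":
            break                   # end of input
        if workspace == "x":
            workspace = "y"         # the role of: clear; add "y";
        out.append(workspace)       # the role of: print; clear;
    return "".join(out)


print(replace_x_with_y(io.StringIO("xylophone box")))   # yylophone boy
```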

Several commands can be put together in a file called, for example, substitute.chom and then be applied using the -f option to read the commands from the file:

cat inputFileName | chomski -f substitute.chom > outputFileName

Besides substitution, other forms of simple processing are possible. For example, the following uses the plus and count commands to count the number of lines in a file:

cat inputFileName | chomski -s '[-n]{plus;} <>{count;print;}'

This example used some of the following metacharacters and language features:

The square brackets ([]) indicate the matching of a character class.
The -n string matches a newline character.
The <> string matches the end of the input stream (text file).
The curly braces ({}) follow tests and group multiple statements.
The semicolon (;) terminates all statements.
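The logic of the line-counting script can be sketched in Python as follows. The name `count_newlines` and the local `counter` variable are assumptions made for illustration; the sketch increments a counter on each newline and reports it once the end of the stream is reached, returning a string because the language treats all data as strings.

```python
import io


def count_newlines(stream):
    """Count newline characters, one character at a time."""
    counter = 0
    while True:
        ch = stream.read(1)
        if ch == "":                # end of stream, the role '<>' plays in the script
            return str(counter)
        if ch == "\n":              # the role of the [-n] test followed by plus
            counter += 1


print(count_newlines(io.StringIO("one\ntwo\nthree\n")))   # 3
```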

Complex chomski constructs are possible, allowing it to serve as a simple, but highly specialised, programming language. Chomski has only one flow control statement (apart from the test structures <>, [], // etc.), namely the check command, which jumps back to the @@ label (no other labels are permitted).

History

The idea for chomski arose from the limitations of regular expression engines that use a line-by-line paradigm, and from the difficulty of parsing nested text patterns with regular expressions. chomski evolved as a natural progression from the grep and sed commands. Development began in approximately 2006 and continues sporadically.^[1]

Limitations

Chomski is not a general-purpose programming language. Like sed, it is designed for a limited type of usage. It does not currently support Unicode strings, since the current implementation uses standard C character arrays. Chomski also does not currently have a debugger for complex scripts.

References

↑ Developer’s (M.J. Bishop) personal recollection

This article is issued from Wikipedia - version of the Friday, November 20, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.