Substitute character

A substitute character (SUB) is a control character that is used in place of a character that is recognized to be invalid or erroneous, or that cannot be represented on a given device. It is also used as an escape sequence in some programming languages.

In the ASCII and Unicode character sets, this character is encoded by the number 26 (1A hex). Standard keyboards transmit this code when the Ctrl and Z keys are pressed simultaneously (Ctrl+Z, by convention often described as ^Z).
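The following minimal sketch shows why Ctrl+Z produces code 26. It assumes the classic ASCII terminal convention, in which the Ctrl key keeps only the low five bits of a letter's code. `ctrl_code` is a hypothetical helper written for illustration. The caret notation ^Z is the same arithmetic run in reverse: 0x1A + 0x40 = 0x5A, which is the code for "Z".

```python
SUB = 0x1A  # the substitute character, decimal 26


def ctrl_code(key: str) -> int:
    """Return the control code a terminal conventionally sends for Ctrl+<key>.

    Hypothetical helper assuming the classic ASCII convention: Ctrl keeps only
    the low five bits, so 'Z' (0x5A) and 'z' (0x7A) both map to 0x1A.
    """
    if len(key) != 1 or not key.isascii() or not key.isalpha():
        raise ValueError("expected a single ASCII letter")
    return ord(key) & 0x1F


print(hex(ctrl_code("Z")))    # 0x1a
print(ctrl_code("z") == SUB)  # True
print(chr(SUB + 0x40))        # Z  (caret notation: ^Z)
```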

Uses

Under CP/M 1 and 2 (and derivatives such as MP/M), the end of a file (EOF) had to be marked explicitly. The CP/M filesystem could not record a file's size in bytes; it only counted the fixed-size 128-byte records allocated to the file, so the last record typically contained some allocated but unused space.[1][2] Under CP/M, this slack space was filled with 1A hex characters. The extended CP/M filesystems used by CP/M 3 and later (and derivatives such as Concurrent CP/M, Concurrent DOS and DOS Plus) do support byte-granular file sizes,[3][4], so the marker was no longer a physical requirement but merely a convention kept for backward compatibility.
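The padding scheme can be sketched as follows. This assumes the 128-byte record size used by CP/M 1 and 2 and treats the first 1A hex byte as the end of the text. The function names are hypothetical, and the sketch does not model the on-disk directory structure.

```python
RECORD_SIZE = 128  # CP/M 1/2 count file length in 128-byte records
SUB = b"\x1a"      # the substitute character used as the EOF marker


def pad_to_records(text: bytes) -> bytes:
    """Fill the unused tail of the last record with SUB bytes.

    If the text exactly fills its records, no padding is added in this sketch:
    the end of the last record is then the end of the file.
    """
    remainder = len(text) % RECORD_SIZE
    if remainder == 0:
        return text
    return text + SUB * (RECORD_SIZE - remainder)


def logical_text(stored: bytes) -> bytes:
    """Recover the text: everything before the first SUB byte."""
    end = stored.find(SUB)
    return stored if end == -1 else stored[:end]


stored = pad_to_records(b"HELLO, CP/M\r\n")
print(len(stored))           # 128
print(logical_text(stored))  # b'HELLO, CP/M\r\n'
```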

In CP/M, 86-DOS, MS-DOS, PC DOS, DR-DOS and their various derivatives, character 26 was also used to indicate the end of a character stream. It therefore served to terminate user input in an interactive command-line window, and was commonly used to finish console input that was being redirected to a file, as in COPY CON: TYPEDTXT.TXT.
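The following sketch shows stream termination by ^Z at the application level. It assumes an in-memory `io.StringIO` stands in for console input. Real DOS consoles handle ^Z in the operating system rather than in code like this. `read_until_sub` is a hypothetical helper. It consumes the terminator and leaves anything after it unread.

```python
import io

SUB = "\x1a"  # Ctrl+Z


def read_until_sub(stream: io.TextIOBase) -> str:
    """Read characters until SUB or the physical end of the stream.

    The SUB itself is consumed but not returned; anything after it stays
    unread in the stream.
    """
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == SUB:  # physical end, or the ^Z terminator
            break
        chars.append(ch)
    return "".join(chars)


console = io.StringIO("first line\nsecond line\n\x1aignored")
print(repr(read_until_sub(console)))  # 'first line\nsecond line\n'
print(repr(console.read()))           # 'ignored'
```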

Although it is no longer technically required to mark the end of a file, many text editors and programming languages still support this convention. They can be configured to insert this character at the end of a file when editing, or at least handle it correctly when it appears in files. In such cases it is often termed a "soft" EOF, because it does not necessarily mark the physical end of the file but rather signals that "there is no useful data beyond this point".

In reality, more data may follow this character, up to the actual end of the file as recorded by the file system. The character can therefore be used to hide file content when the file is TYPEd to the console or opened in some editors. Some file format standards (e.g. PNG or GIF) include character 26 in their headers for precisely this purpose. Some modern text file formats (e.g. CSV-1203[5]) still recommend inserting a trailing EOF character as the last character of the file.
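The following sketch illustrates the hiding effect. `type_like_display` is a simplified emulation of a display routine that honours the soft EOF, not the actual DOS TYPE implementation. The PNG signature bytes are real; byte 6 (the seventh byte) is 0x1A. As a result, such a routine shows only the first six bytes of a PNG file and none of the binary data that follows. The helper names are hypothetical.

```python
SUB = 0x1A

# The 8-byte PNG file signature; byte 6 is SUB (0x1A).
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def type_like_display(data: bytes) -> bytes:
    """Simplified TYPE emulation: output stops at the first SUB (soft EOF)."""
    end = data.find(SUB)
    return data if end == -1 else data[:end]


def has_png_signature(data: bytes) -> bool:
    return data[:8] == PNG_SIGNATURE


fake_png = PNG_SIGNATURE + b"...binary chunk data..."
print(PNG_SIGNATURE.index(SUB))     # 6
print(type_like_display(fake_png))  # b'\x89PNG\r\n'
print(has_png_signature(fake_png))  # True

# Content after a soft EOF is still in the file, just not displayed.
note = b"visible text\r\n\x1ahidden text"
print(type_like_display(note))      # b'visible text\r\n'
```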

Some programming languages (e.g. Visual Basic) will not read past a soft EOF when using their built-in text-file reading primitives (INPUT, LINE INPUT, etc.). To read beyond it, alternative methods must be used, such as opening the file in BINARY mode or using the FileSystemObject.

In Unix-like operating systems, Ctrl+Z is typically used to suspend the currently running interactive (foreground) process. The terminal driver does not pass the character to the program; instead it sends the process the SIGTSTP signal. The suspended process can then be resumed in the foreground (interactive) mode, resumed in the background, or terminated.
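The following sketch shows the suspend/resume/terminate cycle from a parent process's point of view. It sends SIGTSTP directly, which is the signal the terminal driver generates for Ctrl+Z. It then sends SIGCONT, which is roughly what a shell's fg or bg commands do. This assumes a Unix-like system with a `sleep` executable on PATH. `suspend_resume_terminate` is a hypothetical helper, and the sketch does not reproduce a real shell's job control.

```python
import os
import signal
import subprocess


def suspend_resume_terminate():
    """Stop a child with SIGTSTP, resume it with SIGCONT, then kill it.

    Assumes a Unix-like system with a 'sleep' executable on PATH.
    """
    # Give the child its own process group, as a job-control shell would, so
    # the group is not orphaned (SIGTSTP is discarded for orphaned groups).
    child = subprocess.Popen(["sleep", "30"], preexec_fn=os.setpgrp)
    try:
        # SIGTSTP is what the terminal driver sends the foreground job on Ctrl+Z.
        os.kill(child.pid, signal.SIGTSTP)
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status) and os.WSTOPSIG(status) == signal.SIGTSTP

        # Resuming the job (in foreground or background) means sending SIGCONT.
        os.kill(child.pid, signal.SIGCONT)
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
    finally:
        # The job can also be terminated; SIGKILL ends it even if still stopped.
        child.kill()
        child.wait()
    return stopped, continued, child.returncode


print(suspend_resume_terminate())  # expected on Linux: (True, True, -9)
```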

The Unicode Security Considerations report (Unicode Technical Report #36) recommends this character as a safe replacement for unmappable characters during character set conversion.
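In Python, this replacement policy can be expressed as a custom codec error handler. The following minimal sketch replaces each character or byte that cannot be converted with U+001A. The handler name "substitute" is chosen for this sketch and is not a built-in. By contrast, Python's built-in "replace" handler uses "?" when encoding and U+FFFD when decoding.

```python
import codecs

SUB = "\x1a"  # U+001A SUBSTITUTE


def substitute_handler(err):
    """Replace each character (or byte) that could not be converted with SUB."""
    if isinstance(err, (UnicodeEncodeError, UnicodeDecodeError)):
        return SUB * (err.end - err.start), err.end
    raise err


# "substitute" is a name chosen for this sketch, not a built-in handler.
codecs.register_error("substitute", substitute_handler)

print("naïve café".encode("ascii", errors="substitute"))  # b'na\x1ave caf\x1a'
print(repr(b"caf\xe9".decode("ascii", errors="substitute")))  # 'caf\x1a'
```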

References

  1. John Elliott (1998). CP/M 1.4 disc formats.
  2. John Elliott (1998). CP/M 2.2 disc formats.
  3. John Elliott (1998). CP/M 3.1 disc formats.
  4. John Elliott (1998). CP/M 4.1 disc formats.
  5. CSV-1203 format specification.


This article is issued from Wikipedia - version of Friday, May 23, 2014. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.