User:Dcoetzee/Wikicode/Specification

From Wikipedia, the free encyclopedia

This pseudocode is no longer a proposed standard. See User:Dcoetzee/Wikicode.

Welcome to the wikicode standard specification! If you want a gentle overview of Wikicode, please see User:Dcoetzee/Wikicode instead. If you want to discuss changes to Wikicode, please do so on the talk page for this page.

Contents

[edit] The Wikicode Standard

This is an informal standard for a pseudocode called wikicode. The purpose of wikicode is to easily facilitate both writing and reading of pseudocode for clear and concise descriptions of algorithms and data structures in a way that is consistent across articles. This standard doesn't affect any examples that use source code for actual programming languages, nor are source code examples discouraged. Moreover, any changes are permitted on a per-article basis as is necessary to be as clear as possible — this document does not restrict, only provides recommended guidelines that should be followed unless there is a reason to do otherwise.

Four main goals were pursued in the design of wikicode:

  • Clarity: the most important; it should be clear what the code does
  • Brevity: the code should not occupy much article space, and should consist mainly of algorithm content as opposed to unnecessary syntax
  • Ease of writing: the code should be easy to write and edit
  • Universality: the code should be clear to programmers familiar with nearly any common language, and not too similar to any one

[edit] General Guidelines

Authors using wikicode should keep in mind that the purpose of pseudocode is to convey a process to an audience that may include peers as well as beginners. Therefore, where possible, examples should be as short and simple as practical, with no unnecessary embellishments. Examples should be preceded by text explaining the overall process, followed by the pseudocode. Comments should appear on most, if not all pseudocode lines, and be used to summarize and relate back to the details in the text. Lengthy examples should be broken into segments, with each segment introduced by a description of the process, followed by the pseudocode with any necessary comments, and possibly followed by a wrap-up paragraph.

Assumptions should be clearly stated before they are used, and/or identified in comment statements. For example:

Linked lists can be used to ... (blah blah blah)
Assuming the following routines are available:
   function nodeInsert(node prev, node n)   // insert node 'n' before 'prev'
   function nodeFind(node list, node n)     // find node 'n' in list
We can do the following:
   node n, list, t   // node to add, linked list, and temp node
   t := nodeFind(list, n) // see if node already in list
   (additional code ... blah blah blah)
As can be seen from the above example, (blah blah blah)
However, if (blah blah blah) we can do the following:
   node n, list, t    // node to add, linked list, and temp node
   t := nodeFind(list, n) // see if node still in list
   (additional code ... blah blah blah)
The advantages of the first and second examples are (blah blah blah)

The following guidelines are intended to provide a consistency across various examples that use pseudocode. Conformance to these standards is encouraged. However, exceptions are permitted where the deviations increase the simplicity and/or clarity of the example. Any such deviation should be well documented in the article to prevent confusion for those already familiar with the standard.

[edit] Syntax

All wikicode will be typeset in a fixed-width font, for example by prefixing each line with spaces. Do not wrap the code in HTML <pre> tags; this disallows additional markup inside the code block. If wikicode occurs in the middle of encyclopedic text, this will be indicated by wrapping it in HTML <code> tags, which also use a fixed-width font.

[edit] Functions

A possibly recursive function is defined using the emboldened function keyword followed by the function name, then an argument list enclosed in parentheses. Function and argument types may be specified if it adds to the clarity of the example. Data types should be indicated by an abbreviated type preceding the name (see Types for a list of types). An empty argument list is indicated by an open and close parentheses. Unabbreviated names are preferred, and abbreviated names should be explained in the text or comments. The body of the function should be indented and enclosed in curly braces, with the closing brace but not the opening brace on its own line:

 // sort the list, using the comparisonPredicate to compare
 function sort(list, comparisonPredicate) {
     // indented body of function
 }
 function addModK(int left, int right, int modulus) {
     // indented body of function
 }
 function doSomething() {
     // indented body of function
 }

(If the keywords above do not appear in bold, make sure your Monospace or fixed-width font is set to a font which has a bold version, such as Courier.)

Results are returned from functions using the return keyword, as in return 5 or simply return if there is no result. Control can also return to the caller when the end of the function is encountered. Functions are called by simply specifying the name, followed by a parenthesized list of arguments:

 addModK(5, 3, 7) // always explain the nature/reason of the call
 sort(myList, func(x,y) x < y) // explain any complex formations
 compareEntry(ent1, ent2, cmpEnt) // function cmpEnt used to compare
 sine(5)  // avoid using abbreviated names, even if common (e.g., sin)

As shown above, anonymous functions may be created with func. The anonymous function feature should be used sparingly, and only if the function is very simple; otherwise simply include a function name with a description of what it does.

You can generally assume all sorts of useful math, string, etc. functions are available, as long as you explain them in comments or in the text (as also shown above). However, when manipulating the built-in lists, sets, and maps, use their standard interface unless you have a good reason to do otherwise (see their sections).

Finally, we also provide the ability to use a more natural, sentence-like calling syntax:

 add 5 to end of numberList
 add x to end of numberList

New functions can be declared which are called in this manner:

 function add (i : int) to end of (lst : list) {
     // body of function
 }

[edit] Comments

Comments should appear frequently within wikicode examples, and fall into three categories:

  • inline comments that follow a line of pseudocode
  • full-line comments that are on a line by themselves
  • block comments, which are several lines of comments.

Both inline and full-line comments are started with ''//, the italics mark followed by a double slash, followed by the comment. Block comments should be in the text of the article before or after the wikicode.

This is a very long and unnecessary comment that should appear as normal text before the actual wikicode example. Block comments should appear before each major step of the example so that the reader is introduced to concepts and then see the example.
 // Do something silly
 i := i + 0       // add nothing to i

As in any language, comments should enhance the description, not merely repeat the wikicode. For example, the following is a poor comment:

 i := i + 1   // add one to i

Where possible, summarize and explain why things are being done, not just what is being done.

[edit] Variables

Variables are mutable, are referenced by name, and are assigned new values using the := operator. See Names for naming guidelines. Local variables need not be declared, but if you wish to you can use the var form for this:

 var w               // number of free munchkins
 var x := 2
 var int y := 5
 var int z
 var int a := 2, b, c, d := 5

You can optionally precede a variable with a type name in parentheses for clarity. Variables need not be initialized when declared, but an uninitialized variable should not be referenced.

[edit] Conditional Expressions

Conditional tests can be done with the if-else statement, where else is optional and if and else are emboldened. If the body of the if statement is small, braces can be excluded, but need not be; closing braces but not opening braces shall be placed on their own lines, except that the else keyword may be placed on the same line as the preceding closing brace:

 if x = 5 {
     // x is 5 here
 } else {
     // x is not 5 here
 }
 if x ≠ y
     add new entries to old list
 else
     remove selected entries from list
     // can have more than one line without braces

The condition must have have bool type (see section Types).

Simple conditional expressions can be written on one line using the then and else keywords

 if x < y then a := b else a := c

[edit] Loops

There are five types of loops:

  • while loop
  • do-while loop
  • for loop
  • for each loop
  • loop loop

The body of each loop is indented and optionally enclosed in braces, with the closing brace but not the opening brace on its own line; larger bodies should usually include braces, for clarity. The until keyword can be used in place of the while keyword if desired, which reverses the condition.

The while loop and the do-while loops are identical except for the first iteration, since the while loop tests the conditional before the body of the loop, and the do-while loop tests the conditional after the body of the loop. For the while loop, if the conditional is initially false, then the body of the loop is not executed at all. The do-while loop always executes the body of the loop at least once, and then repeats the loop until the conditional is false. Loops only terminate when the conditional is tested. For example, if the conditional becomes false while in the middle of the loop, the remainder of the body is still executed. If the conditional is still false when the conditional is encountered, the loop will terminate; otherwise it will continue.

The while loop iterates while the condition holds; the until keyword may also be used to iterate until the condition does not hold.

 while x ≠ 0 // loop until x becomes zero
     // body of while loop
 (This is the same as the above loop)
 until x = 0 { 
     // body of the until loop
 }

The do-while loop, unlike the while loop, always executes at least once before testing its condition:

 do   // the body will be executed at least once
     (body)
 while done = false
 do   // this loop is the same as above
     // body
 until done

Since the do-while loop contains keywords both before and after the body of the loop, curly braces are not needed, but may optionally be used, as long as the opening brace is on the same line as do, and the closing brace on the same line as while.

The for loop steps an integer variables along a range of integer values in increasing order, including both endpoints. For decreasing order, the keyword down to can be used instead of to. An optional by keyword can be used if the increment is not one or negative one.

 for x from 1 to 10 { // will loop 10 times
     // body, executed 10 times
 }
 (x starts at 10 and goes down to 1)
 for x from 10 down to 1 { 
     // body, executed 10 times
 }
 (In this loop, the last time through the loop x will be 9)
 for x from 1 to 10 by 2 {
     // x will be 1, 3, 5, 7, and 9
 }

The for each loop is used with collections such as lists and sets (see their sections) and iterates over the elements of alist in order or a set in some arbitrary order:

 for each n in collection {
     // n will iterate through each element of the collection
 }

Finally, the simplest loop is the loop loop, which simply loops forever or until the loop is exited.

The keyword exit loop can be used to break out of any loop at any time and execute the instructions following the loop. The keyword continue can be used to skip the remainder of the loop body and begin another iteration of the loop. If the continue statement is used within a for loop, the loop variable will be incremented (or iterated).

[edit] Operators

From highest to lowest order of precedence, the standard operators are listed below. Although operator precedence is defined, you are encouraged to use parentheses to clarify any expressions that might be confusing to casual or inexperienced readers.

Op Typed Description
^ ^ Exponentiate numeric values
modulus modulus Infix modulus operator
×,÷ &times;,&divide; Multiply/divide numeric values
+,- +,- Add/subtract numeric values
=,<,>,≤,≥,≠ =,<,>,&le;,&ge;,&ne; Comparison
in '''in''' Collection membership
not in '''not in''' Negation of collection membership
not '''not''' Logical not
and,or '''and''','''or''' Logical and, or
additional operators as needed

All of the entities used above are rendered correctly by all modern browsers. All operators are left-associative except ^. There is no integer divide; instead use truncate(x ÷ y), or just floor(x ÷ y) for positive quantities. Although idioms such as * for multiply and != for not-equal are familiar to most programmers, symbols such as × (&times;) and ≠ (&ne;) are familiar to anyone who has had basic math, both in the article and the edit page.

New operators may be used at any time, if they are explained. If you do need to invent your own operators, consider using english terms or symbols such as:

[edit] Types

There are only a few built-in types available, as well as mechanisms for creating user-defined types. Types (like comments) are always typeset in italics; no ambiguity arises because types should not occur on a line by themselves or be preceded by //.

Types used within pseudocode should be as simple as possible. Avoid unnecessary detail that doesn't contribute to the topic being discussed. Include sufficient comments and discussions so that readers at all levels will understand the examples.

Although these types are unrealistic, they are useful for two reasons: some of these types can be simulated to a large extent in practice (using arbitrary-precision arithmetic libraries), and, more importantly for pseudocode, the use of such types simplifies many algorithms, eliminating some relatively unimportant implementation concerns.

If a type is not specified for an entity, such as an argument, local variable, or record field, it can assume any value. Operations such as + may "fail" (make the pseudocode invalid) if they may be applied to a value they do not apply to. Comments should indicate the nature of any such ambiguous operations.

[edit] boolean

The simplest built-in type boolean has one of two built-in values: true or false. Any boolean value is a valid condition for a conditional or loop construct.

[edit] int, real, and float

The built-in type int is an infinite-precision integer type including positive and negative values and zero. The type real represents infinite-precision, infinite-range real numbers, while float represents a floating point number with whatever characteristics you like; if they are important, explain them in the text. If you need additional integer types, such as unsigned 32-bit integers, in your example, name a type in the article text and then use it.

[edit] character and string

The type character includes all conceivable writable characters and is written by surrounding the character with single quotes ('), and string is a finite-length string of such characters, surrounded by double-quotes ("). Double-quotes may be used inside a string, as long as it's clear from context where the string ends; the outer double-quotes may be emphasized with bold ("hel"lo") if this is not clear.

[edit] record

A new record type (called a struct in C) may be defined using the record keyword, may be recursive, and must include field names but types can be eliminated if they are not significant to the discussion:

 record employee { // employee record
     string name         // employee name
     int age             // age of employee
     employee mentor     // reference to another employee
     stuff                 // no specific type given
 }

Records, as well as tagged unions below, are only referred to by reference, and references may contain the special value null to indicate they refer to nothing. Record fields are accessed using ., as in e.name, and such values may be assigned to. Multiple fields of the same type may be listed on a line, if commas are used:

 record employee {
     string name, address, phone
     int age, dependents
     employee mentor
     stuff, morestuff
 }

[edit] tagged union

Types may also be constructed using the tagged union keyword, which defines a tagged union, much like ML's datatype. Tagged unions may be recursive, and assign a tag name to each branch:

 tagged union tree {
     Leaf:
     Node:  tree left, tree right
 }

Fields are specified after each tag; only one set of these fields is available at a time, depending on the tag. The tag of a tagged union object may be determined using the cases keyword, and within the cases, fields of the correct branch of the tagged union may be accessed:

 cases t {
     Leaf: // it's a leaf
           // do stuff
     Node: // it's a node
           // do stuff with t.left and t.right
 }

[edit] Lists and Arrays

A list stores a sequenced list of values. The primitive list operations are defined below, although they should be explained when used within an article:

 list(1, 2, 3)                      (the list containing 1, 2, 3 in that order)
 emptyList                          (the list containing no elements)
 insert (value) at front of (list)  (inserts new value at front of list; push)
 insert (value) at end of (list)    (inserts new value at end of list; enqueue)
 removeFront(list)                  (removes value from front of list; pop/dequeue)
 removeEnd(list)                    (removes value from end of list; undo enqueue)
 isEmpty(list)                      (returns true if and only if list is empty)
 getFront(list)                     (gets the value at the front of the list)
 getEnd(list)                       (gets the value at the end of the list)
 list[i]                            (gets ith element of the list, zero for front)
 x in list                          (x is an element of the list)
 size(list)                         (gets number of elements in the list)
 listToSet(list)                    (get a set containing all elements of the list)

You can also use for each-in to iterate over a list from front to back. When a list variable with elements of one type is declared, the syntax type[lower bound..upper bound] should be used, as in:

 var int[0..k-1] numbers
 var (string[1..numNames] names ) := list("Joe", "Harry")
 var (string[] moreNames ) := list("Joe", "Harry")

If the upper bound is omitted, the size can vary; if the lower bound is omitted, it is assumed to be zero.

The type array is a special synonym for list which implies that the operations above have the following associated costs, where n is the number of elements in the array:

 insert (value) at front, removeFront(list)      O(n)
 insert (value) at end                           O(1) amortized
 removeEnd(list)                                 O(1)
 isEmpty(list)                                   O(1)
 getFront(list), getEnd(list), list[i]           O(1)
 list[i]                                         O(1)
 x in list                                       O(n)
 size(list)                                      O(1)
 listToSet(list)                                 O(n)

There is also an array constructor array(1, 2, 3) and an object emptyArray equivalent to the list versions but meant to suggest they produce arrays.

Finally, multidimensional arrays are permitted, declared and index as shown:

 var int[1..j,1..k] a
 var int[,] b
 a[2,3] := a[3,4] + 1
 for i from 1 to 10
     b[1,i] := i+1

Accesses to elements which do not exist expand the array. If you need to initialize the contents of an array, you are encouraged to do this in the text, as in: The array b is initialized to this value:

\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}

[edit] Set

A set stores an unordered sequence of values with no duplicates. The primitive set operations are:

 set(1, 1, "a", 3)         (the set containing 1, "a", and 3)
 emptySet                  (the empty set; don't use the math symbol)
 insert(set, value)        (insert a new value into the set)
 remove(set, value)        (remove a value from the set)
 x := extractElement(set)  (removes a value from the set and returns it)
 x in set                  (x is in the set)
 size(set)                 (gets number of elements in the set)
 setToList(set)            (get a list containing all elements of the set)
 

You can also use for-in to iterate over a set in some arbitrary order.

[edit] Map

Maps are associative arrays. They take keys and produce associated values. All keys which have not been assigned values map to the special value unassigned, but to actually determine if a key is in the map it is preferable to use x in keys(map). The primitive operations are:

 map(1 -> "a", 'k' -> 7)   (the map mapping 1 to "a" and 'k' to 7)
 emptyMap                  (the empty map)
 map[x]                    (get value associated with x)
 map[x] := y               (make x map to y from now on)
 map[x] := unassigned      (unassign x)
 size(map)                 (gets number of assigned keys in the map)
 keys(map)                 (gets a list of the keys in the map)
 values(map)               (gets a list of the values in the map)

You cannot iterate over a map; if you need to do this, iterate over the list of keys instead, looking up corresponding values as needed.

[edit] Names

No two named entities, including functions, variables, and types, may share the same name in a single code listing (although duplicate names are possible in a single article). Valid names may include letters, numbers, greek letters, and mathematical markup, such as subscripts, superscripts, and primes (typed as apostrophes), but names may not start with a number. Names are case-sensitive, but names that differ only by case are strongly discouraged.

Multiword names will use mixed case, always beginning with lowercase. Underscores (_) or other word separators will not be used.

[edit] Line length

If possible, no line should exceed 78 characters, but there may be up to 100 characters on a line. Also, if lines can be made shorter without adding additional lines or sacrificing clarity, this should be done. Lines can be continued across several lines by indenting the continuation lines.

 somevariable := some + long * expression + (that - takes)
   several / lines * because - of + the * length
   of - lines
 anothervariable := shorter - lines

[edit] Indention

The entire code listing will be indented an initial 2 spaces, including blank lines. A function body or the body of a conditional or loop construct must always be indented exactly 4 additional spaces. For example:

 function foo(x, y) {
     if x ≠ y {
         // Doesn't terminate for negative y
         while x > y
             x := x - y
         return x
     } else {
         foo(x + 1, y)
     }
 }

Any other indention performed is up to the writer, but should be done in a clear way that keeps line length under the maximum while retaining clarity and not falsely associating unrelated elements.

The following are some helpful suggestions about indentation that are not mandated. If a function call's arguments are large enough that they cannot fit on one line, a line break should either follow an operator, a comma, or an opening parenthesis. If it follows a comma, the next line is indented to the column following the initial opening parenthesis of the call. If it follows an operator or parenthesis, it is indented 2 additional spaces. For example (don't really use names like the following):

 function foo(x, y) {
     if x > y {
         otherFunctionWithGreatBigName(thisArgumentFillsUpTheRestOfTheLine,
                                       argument1ToThePlusOperator +
                                         argument2ToThePlusOperator,
                                       callYetAnotherFunctionWithBigName(
                                         argToYetAnotherFunction))
     }
 }

Naturally, since this is less clear, clarity of naming and line length should be traded off intelligently.

[edit] Links

Links may be used anywhere they are useful in pseudocode, particularly in comments and around calls to functions not specified in the current listing. However, if used on an emboldened or italic word, it should be used in addition to the existing markup. It may also be desirable to link tagged union or record the first time they are used conspicuously in an article. Also, use common link sense: link only strongly relevant articles.

[edit] {{wikicode}}

A page has been created describing wikicode for readers (User:Dcoetzee/Wikicode, still needs work), and each page which uses wikicode must immediately precede its first conspicuous use with {{wikicode}} (Template: Wikicode), a message which will provide the reader a link to the explanation for readers, as well as this page. Other uses may optionally include the tag, and are recommended to if they're in widely separated or independent sections.