Wikipedia:WikiProject Computer science/Manual of style (computer science)

From Wikipedia, the free encyclopedia

This is a draft in progress — feel free to add or change things, but also make sure to look at the talk page for discussion about the content of this draft


This manual contains some suggestions which are hoped to contribute towards writing clear, pleasant looking, and hopefully interesting computer science articles. This guide is meant not as a substitute, but rather as a complement, for the Wikipedia manual of style which has much useful information for a Wikipedia editor.

Contents

[edit] Suggested structure of a computer science article

Probably the hardest part of writing any technical article is the difficulty of addressing the level of technical knowledge on the part of the reader. A general approach is to start simple, then move toward more formal and technical statements as the article proceeds.

[edit] Article introduction

The article should start with an introductory paragraph (or two), which describes the subject in general terms. This opening paragraph should give the casual reader a quick understanding of the concept. Name the field(s) of computer science this concept belongs to and describe the context in which the term is used. Write the article title in bold. Include the historical motivation, provide some names and dates, etc. Here is an example.

In computer science, Communicating Sequential Processes (CSP) is a formal language for describing patterns of interaction in concurrent systems. It is a member of the family of mathematical theories of concurrency known as process algebras, or process calculi. CSP was first described in a 1978 paper by C. A. R. Hoare, but has since evolved substantially. CSP has been practically applied in industry as a tool for specifying and verifying the concurrent aspects of a variety of different systems. The theory of CSP is still the subject of active research, including work to increase its range of practical applicability. CSP was influential in the development of the occam programming language.

[edit] Overview and motivation

It is a good idea to begin the main part of the article with an informal introduction that gives the non-technical reader a basic understanding of the fundamental concepts of the topic being presented. It can be helpful to have a section on motivation or applications in the informal introduction, if the concept in question is somewhat theoretical. It's also useful to have some representative examples, which could serve to both expand on the concept in question, and also provide some context as to why one might want to use that concept.

A picture is a great way of bringing a point home, and often it could even precede the technical discussion of a concept. How to create graphs for Wikipedia articles has some hints on how to create graphs and other pictures, and how to include them in articles.

If not included in the introductory paragraph, a section about the history of the concept is often useful and can provide additional insight and motivation.

[edit] Structuring different kinds of articles

Computer science is a broad field which encompasses a number of different kinds of ideas. The structure of the main part of an article will vary with the type of article. Here are some general guidelines for structuring a few different classes of computer science articles. Where possible, these guidelines include an one or two examples of a "good" article, i.e. an article demonstrates the style we're aiming for in that particular class of article. Always keep in mind that these are guidelines, not hard-and-fast rules: some articles will need to deviate from this structure; some articles will be hard to classify; some won't fit into these classifications at all. Use common sense in applying these guidelines.

[edit] Algorithms and Data Structures

An article on an algorithm should generally consist of

  1. A description of the algorithm (including pseudocode)
  2. A formal discussion of the algorithm's time and space complexity
  3. A discussion of any implementation and performance issues

A good example is the Quicksort article.

An article on a data structure should consist of

  1. A description of the structure, and any operations that can be performed on the structure
  2. An overview of the applications for the structure in question
  3. A discussion of any implementation and performance issues

[edit] Classic problems

An article describing a classic problem should generally consist of

  1. A statement of the problem.
  2. A discussion of the relevance of the problem.
  3. A discussion of the history of the problem if notable.
  4. A description of solutions to the problem if any exist.

An example is the Dining philosophers problem

[edit] Design patterns

The classic GoF format is a good guideline for the structure of an article describing a design pattern.

[edit] Fields of study

An article describing a field of study within computer science should include

  1. A brief overview of the history of the field
  2. A basic introduction to the key concepts around which the field revolves
  3. A description of the important principles, theorems, or results produced by the field
  4. A discussion of relationships with other fields of study (inside and outside CS)
  5. Some pointers to articles that describe particular aspects of the field in greater detail

[edit] Formalisms

An article describing some kind of formalism should contain

  1. A brief outline of the history of the formalism (originator, major developments, etc.)
  2. An informal introduction to the basic concepts involved (preferably including some simple examples)
  3. A formal definition
  4. Discussion of important applications or implementations
  5. Description of major tools that support the formalism
  6. Brief overview of related or derived formalisms

A good example is Lambda-calculus.

[edit] Programming constructs

An article describing a programming construct should generally include

  1. A brief overview of what the construct is, and how it is used (perhaps including an informal operational semantics)
  2. A listing of alternative names for the construct
  3. Pseudocode or a small code sample demonstrating the contruct in use
  4. Description of any equivalences between the construct and other constructs
  5. A discussion of any variations in the semantics of the construct
  6. A discussion of any disadvantages to use of the construct

Examples include Continuation, Closure (computer science), and Exception handling.

[edit] Programming languages

An article describing a programming language should generally include at least

  1. A brief outline of the history of the language
  2. An overview of the language features
    • Programming paradigm(s) that the language supports, and how well it supports them
    • Style of type-checking, support for design by contract or other specification techniques
    • Memory management style
  3. A basic introduction to the language syntax (including some code samples)
  4. An overview of the formal semantics of the language (if one exists)
  5. A list of available implementations and supported platforms

Good examples include Python programming language, OCaml, and BASIC programming language.

[edit] Theorems and Conjectures

An article describing theorems or conjectures should generally contain at least

  1. A brief informal description of the theorem or conjecture.
  2. A formal statement of the theorem or conjecture.
  3. A discussion of the history of the theorem or conjecture.
  4. A discussion of impacts, consequences, or implications of the theorem or conjecture.

As many computer science theorems and conjectures are often stated informally in popular literature it may also be beneficial to provide some discussion of common misconceptions or misinterpretations of the theorem or conjecture.

Good examples include Halting problem and Church-Turing thesis

[edit] Tools

An article describing a class of tool should generally contain

  1. A brief overview of the purpose of the tool
  2. A brief history of the development of the class of tool
  3. Discussion of any major subclasses of the tool class
  4. Brief description of the the key underlying algorithms or implementation techniques

Examples include Compiler, Text editor, and Theorem prover.

[edit] Concluding matters

It is good to have a see also section, which connects to related subjects, or to pages which could provide more insight into the contents of the current article.

Lastly, a well-written and complete article should have a references section. This topic will be discussed in detail below.

[edit] Style guidelines

[edit] Theoretical topics

The Wikipedia:Manual of style (mathematics) has some good advice on how to discuss more theoretical topics, as well as when and how to include proofs of important theorems.

[edit] Mathematical and logical expressions

See the Wikipedia:Manual of style (mathematics) for guidance on how to typeset equations, logical expressions, and other mathematical notation.

[edit] Code samples

Samples of actual source get included in articles for a variety of reasons, although the most typical reasons are to demonstrate the "look" of a particular language, to provide examples of language-specific constructs or features, and to provide examples of algorithms not easily expressed in pseudocode. While there's nothing inherently wrong with including sample code, excessive amounts of it can detract from the content of the article itself; avoid writing sample code unless it contributes significantly to a fundamental understanding of the encyclopedic content.

Wikipedia is not a source code repository. Code that is not relevant to encyclopedic content should be moved out of Wikipedia. WikiBooks is the appropriate WikiMedia project for existing GFDL-compatible code. The external LiteratePrograms wiki is an appropriate place to put new sample implementations, along with descriptions of how those implementations work. Important note: existing implementations on Wikipedia cannot be transferred to the LiteratePrograms wiki because Wikipedia content is licensed under the GFDL, while LiteratePrograms uses the GFDL-incompatible MIT/X11 license.

Some general guidelines on code samples:

  • Sample implementations of algorithms are fine, but every algorithm article should include a pseudocode description of the core algorithm when possible, so that anyone can understand how the algorithm works. See below for guidelines on pseudocode.
  • The sample should use a language that clearly illustrates the algorithm to a reader who is relatively unfamiliar with the language — even if you believe that the language is well-known. To remain language-neutral, choose languages based on clarity, not popularity. Avoid esoteric or language-specific operations.
  • Source code implementations must be compatible with the GFDL (which is incompatible with the GPL).
  • Multiple source code implementations are not appropriate unless they contrast specific aspects of the code and that contrast is important to the encyclopedic content of the article. If possible, accentuate differences by providing the alternate implementation in the same language as the original.
  • All code samples should be typeset in a monospaced typeface to ensure that spacing is preserved. Monospace fonts can be obtained by indenting lines by one or more spaces, by surrounding a block of code in <pre> tags, or by enclosing inline text in <code> tags. Doing this is particulary important for languages where whitespace has syntactic significance — ideally we'd like people to be able to copy and paste sample code into a text editor or IDE. For example,
int main(void) {
    printf("hello, world\n");
    return 0;
}

  • All code samples should be enclosed in <code> tags, regardless of whether such a tag is required for proper formatting. This will allow greater flexibility in future MediaWiki revisions, as well as providing additional information to screen readers and users with customized CSS.

[edit] Algorithms

There are no universally accepted standards for presenting algorithms on Wikipedia. Algorithms on Wikipedia is a vague precedent. A past attempt at standardized pseudocode for is archived at User:Dcoetzee/Wikicode, though "[t]he author advises that such a proposal not be advanced again, as it is unlikely to gain consent". Within this WikiProject, the consensus has generally been that where possible algorithms should be presented in pseudocode. Use of pseudocode is completely language agnostic, and is more NPOV with respect to programming languages in general. Pseudocode also provides far more flexibility with regard to the level of implementation detail, allowing algorithms to be presented at however high a level is required to focus on the algorithm and its core ideas, rather than the details of how it is implemented. Finally, suitably high-level pseudocode provides the most accessible presentation of an algorithm, particularly for non-programmers.

[edit] Example

algorithm ford-fulkerson is
    input: Graph G with flow capacity c, 
           source node s, 
           sink node t
    output: Flow f such that f is maximal from s to t

    (Note that f(u,v) is the flow from node u to node v, and c(u,v) is the flow capacity from node u to node v)

    for each edge (u, v) in GE do
        f(u, v) ← 0
        f(v, u) ← 0

    while there exists a path p from s to t in the residual network Gf do
        let cf be the flow capacity of the residual network Gf
        cf(p) ← min{cf(u, v) | (u, v) in p}
        for each edge (u, v) in p do
            f(u, v)f(u, v) + cf(p)
            f(v, u) ← −f(u, v)

    return f

[edit] General guidelines for writing pseudocode

  • Make sure the algorithm is understandable without having to read this page first.
  • All assignment, comparison, and other mathematical operators should be rendered with proper mathematical symbols wherever possible:
Operator Result Entity Example
Assignment ← or := &larr;, := c ← 2πr, c := 2πr
Comparison =, ≠, <, >, ≤, ≥ =, &ne;, <, >, &le;, &ge;
Arithmetic +, −, ×, /, mod +, &minus;, &times;, /, mod
Floor/ceiling ⌊, ⌋, ⌈, ⌉ &lfloor;, &rfloor;, &lceil;, &rceil; a ← ⌊b⌋ + ⌈c
Logical and, or '''and''', '''or''' ab and ac
Sums ∑ ∏ &sum; &prod; h ← ∑aA 1/a
  • Try to focus on outlining the algorithm, and where possible keep discussion and explanation of the algorithm outside of the pseudocode.

[edit] Low Level Pseudocode
  • All control structure keywords should be bolded, comments should be in italics (in addition to whatever other manner for denoting comments is used)
  • Try to keep the algorithm description sufficiently high level so as to avoid most implementation specific details.
  • Try to avoid using structures or techniques that are idiomatic to a particular language or programming paradigm.
  • If idiomatic structures or techniques are unavoidable try to present them in a generic high level way in line with the style outlined below:
  • The preferred conditional structure is
if condition then
    code path
else if condition then
    code path
else
    code path
end if
  • The preferred looping constructs are
while condition do
    code block
repeat
and
for loop_control do
    code block
repeat
where loop_control is any suitable clause to control a for loop, such as item in list or 1 ≤ i ≤ n etc.
  • The preferred function definition structure is
function function_name is
    code block
    return variable
end function
where function_name is any reasonable format of function name and arguments. Alternatively inputs and outputs can be specified within the function block:
function function_name is
    input: description of inputs
    output: description of outputs
    code block
    return variable
end function
  • Code blocks should be indented. If sufficiently clear, block closing keywords (end, repeat etc.) may be omitted.

Example of pseudocode roughly hewing to these guidelines is provided as the example on the Algorithm page, and in Bucket sort and Topological sort.

[edit] High Level Pseudocode
Inputs description of input arguments
Output description of outputs
  1. (description of a step in the algorithm)
  2. (the next step in the algorithm)
    1. (substep)
  3. (etc.)
using markup:
 :'''Inputs''' ''description of input arguments''
 :'''Output''' ''description of outputs''
 :# ''(description of a step in the algorithm)''
 :# ''(the next step in the algorithm)''
 :## ''(substep)''
 :# ''(etc.)'' 
 
  • Descriptions of steps should be high level, and may simply be English sentences.
  • Substeps of the algorithm, due to branch conditions or loop structures should be indented and subnumbered.
  • Termination of the algorithm with a return value should be denoted using the keyword return

Examples of algorithms in this format include Buchberger's algorithm, Pohlig-Hellman algorithm, Itoh-Tsujii inversion algorithm, Baby-step giant-step, Pollard's p-1 algorithm, and Pollard's rho algorithm.

[edit] Including literature and references

It is quite important for an article to have a well-chosen list of references and pointers to the literature. Some reasons for this are the following:

  • Wikipedia articles cannot be a substitute for a textbook (that is what Wikibooks is for). Also, often one might want to find out more details.
  • Some notions are defined differently depending on context or author. Articles should contain some references that support the given usage.
  • Important algorithms, theorems, or definitions should cite historical papers as an additional information (not necessarily for looking them up).
  • Today many research papers or even books are freely available online and thus virtually just one click away from Wikipedia. Newcomers would greatly profit from having an immediate connection to further discussions of a topic.
  • Providing further reading enables other editors to verify and to extend the given information, as well as to discuss the quality of a particular source.

The Wikipedia:cite sources article has more information on this and also several examples for how the cited literature should look.

[edit] See also