Wikipedia talk:Algorithms on Wikipedia

From Wikipedia, the free encyclopedia

1 C++
2 Recursion vs. Iteration
3 Role of this page
4 Pseudocode vs. real code
5 Choice of language criteria
6 Standardized pseudocode
7 Samples of code for popularity
8 Haskell?
9 Proposal: construct our own "executable pseudocode"
10 LiteratePrograms
11 Major Changes
12 The programming language in algorithm articles
13 Pseudocode syntax
14 New formatting tag

[edit] C++

This is a simple issue. Use C++ without any object oriented constructs and without pointers except for binary trees and other dynamically allocated data structures. References should be used whenever possible. Also, no macros.

Secondly, clarity/readability should be emphasized over speed.

Advantages:

- basic syntax of C++ is essentially the same/easily understandable by programmers in other languages
- C is widely known
- Avoids Unix specific languages such as Ruby, Python, Perl, etc. (NB: Technically, these are portable but almost all users of those languages use linux/unix).

Why C++ and not C? Also although a lot of samples are in C, once you start having to deal with pointers it's probably getting too bogged down in the C-ness of it and you need a more "pseudo"-type language such as Python. —EatMyShortz 13:55, 14 February 2006 (UTC)

There's a very good reason not to use C/C++ exclusively. Their syntax has a lot of problems, and C++ in particular is extremely verbose, even for relatively simple things. A lot of the built-in operators are counterintuitive to people who just know high-school math, such as assignment/equality, not-equals, bitwise versus logical and/or, and so on. They use different bitwise operators from computer engineers. Damian Conway has written papers on how confusing C/C++'s declaration syntax is. Additionally, C is limited to fixed-size integers, which makes many algorithms in number theory and other areas much more complicated. C++ at least has string operators, That said, I do support procedural code samples for most things, and I have often used C samples where they are simple. Deco 20:07, 14 February 2006 (UTC)

I don't think using any concrete programming language is a good idea, as the clearest algorithms are written with appropriate section in words, e.g. union of X_i for i ∈ { 1 ... n } is infinitely clearer than (apply union xs) in Lisp/Scheme, foldl union [] xs in Haskell, and the corresponding C++ iterator loop. --Mgreenbe 20:35, 14 February 2006 (UTC)

I have to agree with Mgreenbe that the requirement of executable code in a concrete programming language is only going to obfuscate things. High level pseudocode allows you to condense something that would require considerable complexity in a concrete language into a simple and accessible description in pseudocode. For instance I think it would be hard to write an executable version of Buchberger's algorithm in C or C++ or Scheme or Python that is going to be as clear as the psuedocode presented - far too much code would be spent dealing with the complications of implementation rather than in explicating the important steps of the alorithm itself. Leland McInnes 22:03, 14 February 2006 (UTC)

I agree that it's not good to use a certain programming language indefinitely. In the case of Buchburger's algorithm, that's explained as it stands in a mixture of pure maths and english - a very pseudo pseudocode. Which is good - since it's broken down into these neat steps and it's a very mathematical procedure, it's good to have it expressed like that. I think problems like this though should be explained in "pseudo pseudocode" and then also more detailed in "pseudocode" - see Longest-common subsequence problem, where it is explained twice - thrice in fact. So I think we should only use computer code for either simpler things, or more-computational-less-mathematical things. Python would be my language of choice - even if you've never used it it is very easy to understand, some of my lecturers use it for pseudocode even though they haven't taught us the language. C should be used on things like 'Pointer' where it's necessary to explain it. —EatMyShortz 02:14, 15 February 2006 (UTC)

How about Topological sort then? That's largely computational and is fairly simple. The algorithm, as presented in pseudocode, is quite clear. Converting that into executable code is going to involve declarations of some sort of data structure to hold the graph, and then manipulations specific to that structure cluttering up the internals of the algorithm itself. The result will still be readable, but it will be more complex, less clear, and do a much poorer job of actually expressing the important points with regard to how the algorithm works.

Pseudocode is also more neutral, thus less inclined to induce language wars and redundancy via multiple different implementations in multiple different langauges. Pseudocode is also more flexible - you aren't bound to a the constructs offered by a particular implementation: for example pseudocode could support pointers, or pattern matched functions, or dependent types. As long as you can find a clear generic way of expressing it, pseudocode can be bent to any task required of it. I don't believe that's particularly true of any concrete languages - certainly not without extra hoop jumping (such as metaprogramming in Lisp) to set it up. I would ask, instead, what the benefit of expressing algorithms in a concrete language is. Anything presented as (pseudo-)code on Wikipedia ought to be referenced and verifiable by inspection, so correctness can't be the issue. If you want to provide people with a working example they can play with - you're making the assumption they have a compiler or interpreter for the language you've chosen. Instead working examples would much better be provided on a linked page or on Wikisource as a respository of implementations in as many different languages as people care to submit - why limit yourself to one implementation language, and why clutter the main page with multiple implementations in different languages?

It seems to me that pseudocode is the most neutral, the most expressive, the most flexible, the most consistent, and the simplest solution. I see few real benefits to using a concrete language that can't be obtained by simply linking to a page of concrete implementations. In turn there is much to be lost in straightjacketing yourself into concrete implementations, particularly if you want to have consistency. Surely pseudocode, standardised with some basic guidelines outlining suitable style, is the most natural answer? Leland McInnes 02:40, 15 February 2006 (UTC)

Well expressed - I agree. I think the only problem with pseudocode is that it *isn't* standardised so as somebody said, it isn't immediately clear what you mean by "=" for example, and some people start using <- arrowey things for assignation, and it all becomes inconsistent. I'd like to see some consistent pseudocode rules set up, with the intention of having writers need to know the rules and follow them, but readers can understand them just by looking. However this has already been proposed and failed (Wikicode), probably because nobody took the interest to learn how to write it. —EatMyShortz 13:52, 19 February 2006 (UTC)

Why not have a look at the draft Wikipedia:WikiProject Computer science/Manual of style (computer science)? I think most of the feeling is against strict pseudocode structure, but syntax we can agree on. --Mgreenbe 14:37, 19 February 2006 (UTC)

[edit] Recursion vs. Iteration

I can appreciate the emphasis on iteration versus recursion, as recursion is not taught to some people. I don't agree that iteration is always preferable. A recursive algorithm that searches a binary tree is simpler to understand, requires less book-keeping in variables, and is in general, preferable when explaining to a neophyte. I think there are an entire class of problems that are similar. --BenBaker

I'm aware that recursion vs. iteration is one of more controversial of hints listed. There is certainly class of problems are much easier to solve using recursion than iteration. That is a hint for problems that can be solved equally easily with both methods (like fibonacci).

There is also a problem that various people find either of methods "simpler to understand". There's some point in providing both implementations, so both groups can find easy-to-understand algorithm.

Probably some weaker wording should be used. --Taw

Iterative where immediately apparent would be nice. On the other hand, if its a problem that lends itself to recursion, no reason not to. programming is complicated, there's no two ways about it.

BTW, Even with iterative constructs available in LISP, I tend to go recursive just because the language lends itself to it, but in C I go iterative if I can help it. Something to think about... --alan d

[edit] Role of this page

I think this page should be renamed "List of Algorithms", should have a one-line explanation of every algorithm, and should be listed on Reference tables. I don't think it makes much sense to try to dictate in which way people are to write sample implementations. Personally, I'd prefer pseudo code. Also, why is it directed at Unix programmers? What about Hurd hackers? --AxelBoldt

Page name is almost irrelevant.
One-line ok
Reference Tables done
It's not discating, it's setting standards
It makes LOT of sense. Bad sample implementation is almost completely useless.
Pseudo code is really bad idea. It is in no aspect better than real languages like Ruby or Python. It's impossible to test, so may contain errors and one has to assume many things like array indexing about it. Pseudocode should be discouraged.
It should be directed at Unix programmers, because majority of people interesed in this topic knows Unix or technologies that come from it, like C. No Unix-specific quirks in code of course.
Objectively speaking, Hurd is kind of Unix or at least mostly Unix-compatible.
Hurd hackers are very small group of people and I'm sure that all of them know some other Unices.

--Taw

Page names involving "Wikipedia" are generally taken to be meta articles about Wikipedia, not genuine Wikipedia articles. If this is intended to be a policy article about Wikipedia coding standards, then there should be a separate list of algorithms.
While it is true that pseudo code cannot be tested, it is intended for human understanding and may therefore contain less errors. Programming languages often contain obscure constructs (= vs == for instance, or overloaded division /) which make understanding harder. Just because you can test a program doesn't mean that it is less likely to have errors.
The reference to Hurd was a joke. I was alluding to the fact that there are many programmers which don't use Unix.

The article should be directed at programmers, period. --AxelBoldt

[edit] Pseudocode vs. real code

So the only unresolved issue is pseudocode, right ?

Arguments for pseudocode and against simple real languages:

Rarely used languages like Ruby may confuse readers
Pseudo code forces the programmer to think through the algorithm before implementing it; the danger with real code is that they just cut and paste
Pseudo code can focus in on the heart of the algorithm, glossing over trivial parts (initializing of arrays, type casts)
Pseudo code is the most concise description of an algorithm designed for human consumption, programming languages are designed for consumption by computers.
Pseudo code can use generally established conventions of mathematical notation (xⁿ, √x, x mod n), while avoiding idiosyncracies of programming languages.
Any algorithm that can be expressed at all can be expressed in pseudo code; some algorithms can not be expressed in some programming languages.
...

Arguments for simple real languages and against pseudocode:

Pseudocode can't be easily tested, so may contain subtle errors like off-by-one
Pseudocode needs explanations about things like == vs. =, type of division, array indexing etc. before each example
Pseudocode isn't standarized, so it may confuse reader
Reader can't play with sample implementation, and playing can help to understand in case of some algorithms
Pseudocode needs some redundant information, for example it must have array size specified as separate variable, while in most languages constructs like array.size can be used to get it.
Pseudocode can't be easily used for many algorithms, for example those involving pointers
...

I think the argument that "Pseudocode can't be easily tested" is a spurious one. Unless we're engaged in "original research", the pseudocode in question should be coming from some external (referenced) source, so there shouldn't be any need to test it (although there might be a need to proof-read it against the reference). With regard to the argument about sample implementations

constructing their own sample implementation based on the pseudocode will probably help the reader gain an even better understanding of the algorithm
unless we're planning on providing sample implementations in every conceivable language, there's a chance that $RANDOMREADER won't be able to make use of the sample implementation anyway
there's nothing that says that there shouldn't be a few sample implementations in addition to a good language-independent pseudocode description of the algorithm

--Allan McInnes 00:44, 29 January 2006 (UTC)

I'm moving Ruby into the list of less-recommended languages. Smalltalk and Eiffel both have many more users than Ruby does, so if you're going to recommend against them (which I think is reasonable), then you have to take out Ruby as well. The only ones I think have a significant enough user base that any programmer should at least be familiar with them are C/C++, Java, BASIC, Pascal, Perl, and Python. BASIC and Pascal are bad because there are too many incompatible variants.

-- LDC

C programs can be a mess because to write them portably, taking into account char-size, endianness etc., can make them unreadable. While I like Perl, I think it is unreadable unless you have really used it quite a bit. Ideally, I would like the sample implementations to be relevant to beginning computer science students as well as to working programmers. --AxelBoldt

[edit] Choice of language criteria

There are two things that need to be considered in choosing languages - its readability and its popularity. That's why languages like Python and Ruby, which were designed (successfully) with readability in mind are more apropriate, while Perl and C are less apropriate, than they would be if only popularity mattered. I put Eiffel and Smalltalk on discouraged list not only because of their little popularity but also because of their non-standard syntax, and it case of Smalltalk, philosophy.

About popularity:

From http://www.eros-os.org/pipermail/e-lang/2001-October/005793.html (freshmeat projects count, only relevant entries quoted):
- Python: 403
- Ruby: 31
- Eiffel: 16
- Delphi: 15
- Pascal: 15
- Smalltalk: 3
- Visual Basic: 6
- Object Pascal: 4
- Basic: 1
- so if Unix hackers are considered, BASIC, Pascal, Smalltalk and Eiffel are less popular than you stated while Ruby is more. (in other groups, popularity ranking will be different)
BASIC and Pascal aren't bad only because there are too many variants. There are many bigger problems with these languages. (details are easy to find online)
Nowadays most new programmers aren't familiar with Pascal and BASIC. They might have seen some code, but haven't used it to do anything serious. Most schools don't use these languages for teaching and they are virtually unused in UNIX world.
Ruby's popularity is comparable to Python's in Japan (there was article on /. about it, but i don't have hard data handy) --Taw
Algol was specifically designed to be used for print media in a pseudocodish kind of way. Why not continue the tradition?

[edit] Standardized pseudocode

As a beginning computer science student, I request that all algorithms be written in a Wikipedia-standardized pseudocode. I'm not particularly familiar with any language, and IMO, the most generalized presentation would be of greatest benefit. There's no reason why there can't also be implementations in Python or Java, giving us the best of both worlds. --Stephen Gilbert

What's the point in "standard" pseudocode as compared to using simple real world language ? --Taw

I find that pseudocode helps me understand the general algorithm, as opposed to a specific implementation of it. Why can't both pseudocode and a simple real world language be used? --Stephen Gilbert

[edit] Samples of code for popularity

Freshmeat is hardly an unbiased sample, and the numbers quoted are too small to mean much anyway. Google returns twice about as many hits for "Smalltalk language" and "Eiffel language" as it does for "Ruby language". Check out any bookstore: you're likely to find 5-10 books on Eiffel and Smalltalk; I think only one Ruby book exists. Smalltalk is big in academia as a teaching language (like Pascal used to be, and Java is now), so a lot of people are familiar with it even if it doesn't get used for real projects much. I have no idea what you mean by its "philosophy" or why that matters. BASIC isn't used in academia and it's not used much in Unix, but it's still the #1 most popular programming language by many measures--let's not kid ourselves that the world is Unix--the world in Windows, and us Unix hackers are growing minority, but still a minority.

The Ruby code does have the advantage that it's pretty readable as pseudocode for simple things. All in all, I think if I were to add code samples here, I'd use Java (which you omitted from the table above--its count is 740, above all those you mentioned and below C/C++, Perl, and PHP), since it is quite common, well-defined and standardized, and quite readable. -LDC

By user count, world may be Windows, but by programmer count it's not.
I wouldn't call Java very "readable".
Google count for 'basic language' and 'basic programming language' contains mostly false positives on main page - basic is popular english word after all

Your first point may be true; it would be hard to measure. Perhaps the "bookstore" metric would be useful there as well (that is, count the sales of related books). I personally don't find Java any less readable than Python or Ruby; they all use the same operators, keywords are sensible, etc. At any rate, that's a matter of personal taste. Yes, searching for "BASIC language" on Google would be silly, which is why I didn't do or suggest it. Even the ones I did do are only the roughest possible estimate, just like your "Freshmeat" metric. Another metric that might be useful is to look at newspaper job advertisements. Java, C, and BASIC (usually MS VB) programmers are in greatest demand, followed by Perl, ASP (a BASIC derivative), and miscellaneous others.

I'd support using real programming code (my favourites would be C/C++ or Java, but take your pick), not psuedocode. The problem with psuedocode is that there are no standards, and sometimes it is difficult to interpret. At least with real programming code, there is always only one (in theory at least) interpretation for the code. -- SJK

Why don't we just let everybody use their favorite programming language? That way, not only will we get the biggest possible catalog of sample implementation, but we will also get a nice collection of sample code snippets that we can link to from the respective programming language pages. Whenever you come across an ugly sample implementation in language you don't understand, like Scheme, just add your own implementation written in Brainfuck. Can't hurt, can it? --AxelBoldt

Nobody's saying that one can't add another sample implementation, if there is some reason of doing it. Hints are made so people have easier time if they're unsure what's the good way. --Taw

I'd prefer real code to pseudocode in general, but sometimes pseudocode is far more readable. For example, I just added an algorithm to Complexity_classes_P_and_NP. It's 8 lines of pseudocode, and fairly readable (I think). In any real language, this would be far less readable, since you'd have to build an interpretor, and deal with all sorts of data structures that aren't really important to the central idea. --LC

But that's because it's hardly an "algorithm", it's rather abstract mathematical construct --Taw

That's what algorithms are! Actual code implements an algorithm, which is the abstract mathematical construct of a sequence of steps to produce a result. Taw uses pseudocode here to give a very high-level presentation of an idea. In a real language, you'd have to just put in a procedure call like "RunProgram(x)", and comment that this is left as an exercise for the reader. :-)

The list of algorithms seems to be rather slewed towards algorithms used in computer programming. Do algorithms like Euclid's algorithm (a numerical algorithm) and Grover's algorithm (a quantum algorithm) belong? Or are the lists meant to be geared towards computer programming? -- CYD

[edit] Haskell?

See /More talk for the page history of this section. Graham87 01:38, 8 May 2008 (UTC)

A couple of points:

OOP is about programming in the large. For programming in the small (to provide illustrative examples of algorithms, for instance), OOP is just a bunch of unnecessary cruft. Hence, Java and C++ don't add very much over C for the purpose of illustrating fundamental algorithms, and the cruft gets in the way of understanding.
Some of the HLL's (Perl, for instance), aren't exactly ideal for describing implementations of data structures (trees, hash tables etc).
As far as the untestability of pseudocode, most algorithms we present should be verifiable by inspection.
Perl looks ugly and encourages, er, idiosyncratic coding. It's great for getting jobs done. It's bad for presenting examples in, IMHO.
Of course, I think we should present most of our algorithms in something like Haskell, but I can't see us winning that argument :) --Robert Merkel

Yeah Robert I agree at most. Especially for C as I don't understand all the HLL's, PERL's, IMHO's, Haskell 98's, Haskell++, O'Haskell's and Mondrian's stuff you've written. C++ and Java are very active nowadays and perhaps according to C++, through C# and some other futher classification efforts C will see its own rebirth. I am also very much interested in RPN programming and in Maple V R4.00a codeing for the number theory applications. --XJamC 4 Wednesday (Thor's day) [2002.02.28) (0)

I suggest as a compromise the Python programming language one of the most readable progamming languages, often described as "executable pseudocode". -- Anon.

I would be very much in favour of real pseudo-code. This has benefits that

No programmer is "angry" because the implementation is in a - in his eyes - stupid, silly, inferior programming language (it's NPOV). Moreover, they will not feel compelled to add other versions of the same algorithm
No details of implementation have to be presented, and use of mathematical signs (also variables with subscripts) or plain text make the code very readable
Pseudocode makes it possible to focus on what's important about the algorithm.
An encyclopedia is about things, not the things themselves. Therefore, having executable code is not very useful (it's not in all languages anyway). If somebody is interested in implementing one of the algorithms, pseudocode should give him more than enough information to implement things in his favourite language, especially if the article also contains some pointers on implementation details (f.e., an algorithm on graphs can point to an article Graph representation in programming, or something like that.

I could probably go on here, but I think my point is clear. A problem of pseudo-code is of course that there's no standard, but it should not be a problem to create a Wikipedia pseudo code language, and linking to explanation when used. Jeronimo

If you fully define a pseudocode then it becomes a code. Next thing someone will write a complier for it, then people will start with the language pissing matches etc.... ;-)

Hehe, you might be right. But te definition should simply define some syntax (assignment, if-then-else, while-do, etc). The rest of the notation should be relatively free. That's why it's pseudo-code. Jeronimo

[edit] Proposal: construct our own "executable pseudocode"

I've been thinking about the real-language versus pseudocode debate, and I think an interesting compromise would be to construct an "executable pseudocode" language with a real interpreter specifically with the aims of readability, simplicity, and brevity in small code samples. To facilitate embedded English statements, we could either allow English statements to "summarize" (hide) sections of code that implement them, or we could have the interpreter pause and make the human interpret the English statement. This would allow us to test code samples without succumbing to the idiosyncrisies and verbosity of real languages, with the additional benefit of ensuring regularity of syntax for common constructs like loops. Real languages could still be used where one is particularly suitable or simple, or in a supplementary fashion.

This is all a little abstract, so here's a contrived example. Let's say you type this very C-like text into a text file:

function insert(array a, int length, value) {
    set i to index of last element
    while i >= 0 and a[i] > value {
        a[i + 1] := a[i];
        i--;
    }
    a[i + 1] := value;
}

You run it through a "beautifier" which converts it to this pretty wikitext, expanding or replacing counterintuitive constructs and operators and adding bold/italics:

function insert(array a, int length, value) {
    set i to index of last element
    while i ≥ 0 and a[i] > value
        a[i + 1] ← a[i]
        i ← i − 1
    a[i + 1] ← value
}

Next you load this up in the interpreter. It gives you an input-eval-print loop in which to enter test invocations. Let's say you type "insert({1,3,5,0},3,2)". It executes the above function, and when you get to "set i to index of last element", the interpreter stops and asks you to carry this action out. You type "i := 2" and then continue running. The result "{1,2,3,5}" comes back.

So that's a brief preview - any suggestions or criticism welcome. Deco 21:23, 14 February 2006 (UTC)

Who would be entering an algorithm that is unable to either (1) test and experiment with a working implementation or (2) prove correctness (perhaps following a proof in a book)? I feel like anyone who doesn't fall into one of these two categories shouldn't be entering algorithms. So would this be helpful to readers? I wouldn't find much use from it (biased sample). Perhaps it would be more helpful to leverage WPCS to add real-language implementations of algorithms to Wikisource. --Mgreenbe 22:08, 14 February 2006 (UTC)

Unless someone is doing original research, any algorithm that gets entered into Wikipedia should be traceable to a reference. The algorithm should not require any testing or proofs, because the reference is responsible for the correctness of the algorithm. At worst, the algorithm may need to be proof-read against the reference. This is true regardless of whether the algorithm is provided as pseudocode or in a particular language. IMHO the algorithm itself should be defined in pseudocode, for many of the reasons that Leland describes below. I think it's reasonable to have sample implementations of an algorithm that are not traceable to a specific reference (perhaps generated by WPCS members) where those samples are based on the pseudocode presentation, which is traceable to a reference. Where a sample implementation isn't directly based on the pseudocode (or where no pseudocode is provided), it should be traceable to a specific reference. --Allan McInnes (talk) 23:19, 14 February 2006 (UTC)

While I appreciate your efforts (and particularly the benefits of your experience in this issue) I would like to question how important it is to actually have "executable" code. A few people have cited the need to ensure "correctness" but executability doesn't guarantee correctness, only that the code can be tested - it may still fail for corner cases or cases that haven't been tested. Moreover I would suggest that any algorithm (pseudo)code posted should be checkable by inspection - if it isn't then the algorithm really needs to be rewritten at a higher level to make the important processes clear. I would also suggest that any algorithms really ought to provide a source: given a loose/general enough set of pseudocode guidelines transcription from, and verficiation against, the source pseudocode ought to be trivial.

On the other hand the requirement of executability is a heavy constraint leading code to become potentially mired in necessary implementation details that obscure the important points of the algorithm. Your suggestion of allowing english descriptions of steps that the executable simply prompts the user for is a good one, and does go some distance toward alleviating this issue, but might not always be sufficient. For example, consider the pseudocode for Topological sort: the statement for each node m with an edge e from n to m do is clear enough, but prompting the user to find all the nodes connected by and edge from the given node at each step of the for loop is going to be beyond most people for all but trivial examples, and will still be tedious even for those trivial cases. This could be alleviated by having a suitable data structure declared for the graph, but then the essential points of the algorithm become lost amidst the necessary manipulations of that data structure: it will still be readable but more complicated and less focussed on how the algorithm actually works, which is to say less clear.

In short I believe that "executability" is vastly overrated - the costs are high but the benefits are negligible in comparison. Leland McInnes 22:31, 14 February 2006 (UTC)

I also believe executability isn't really all that necessary for the code we deal with here, particularly when we can supply supplementary real-code implementations for that kind of experimentation, but there is a certain faction that is unwilling to adopt standard notation for any sort of code that isn't executable. In addition to allowing the user to specify the semantics of English steps, they could be specified by "hidden code" that actually implements the prose description, as in:

 '''function''' insert(''array'' a, ''int'' length, value) {
     ''set i to index of last element''
<!-- i &larr; length - 1
-->
     '''while''' i &ge; 0 '''and''' a[i] > value
         a[i + 1] &larr; a[i]
         i &larr; i − 1
     a[i + 1] &larr; value
 }

renders as:

function insert(array a, int length, value) {
    set i to index of last element
    while i ≥ 0 and a[i] > value
        a[i + 1] ← a[i]
        i ← i − 1
    a[i + 1] ← value
}

I'm not certain if this generalizes, though. If nothing else I do rather like the idea of a translator for beautification and normalization of pseudocode. Deco 22:54, 14 February 2006 (UTC)

I also rather like a code beautifier - it seems like a sensible thing to have around. As to "hidden code" - I'm curious as to how well that will scale, particularly for a lot of mathematical algorithms that are best presented at a very high level. Could we not standardise some method for providing executable implementations (whih can be any a wide variety of languages) on some subpage, or Wikisource, linked to from the article? That seems a moe natural way to provide executable code - Wikipedia isn't meant to be a source repository, while Wikisource is... Surely having a link to a wikisource page with working implementations in a dozen different languages is a far better way to go? Leland McInnes 23:07, 14 February 2006 (UTC)

You might want to check out what I did with quicksort and quicksort implementations. The concept there was to provide a few representative implementations in the article, and shove the rest out to another article, sharing the common ones in a template. I did this not because I believe implementations belong on Wikipedia, but because people will keep adding them if they're not there (I don't understand this compulsion of adding sample code). The odd thing is that even though there were a gazillion implementations, many of which almost exactly reflected the pseudocode, it didn't discourage overzealous anons from breaking the pseudocode from time to time. But the same problem exists with real code. I personally think there should be a separate code sample repository project under a non-copyleft license, but that's not going to happen (I've tried). And yeah, the idea of executable pseudocode seems increasingly infeasible. :-) Deco 01:02, 15 February 2006 (UTC)

In fact what you did with quicksort and quicksort implementations has been taken as the model of how to handle such a situation for use in the project style guidelines for WP:COMPSCI! I feel that it is the most natural way to deal with concrete implementations. Perhaps it should be suggested that all algorithm pages have a prominent link to an implementations page (though maybe link to Wikisource). A lot of thoe pages will, naturally, be blank, but if the link is prominent hopefully it will encourage people to put their implementations there and we can end up with a good code library on Wikisource. Leland McInnes 02:46, 15 February 2006 (UTC)

Update: Quicksort implementations has been deleted. If additional unencyclopedic implementations are added in the future, feel free to remove them — better to train the contributors about what's appropriate than to encourage them to add anything so long as it stays out of the way. —donhalcon^〒 21:12, 27 February 2006 (UTC)

[edit] LiteratePrograms

Hey all. This is really an advertisement, since this isn't a Wikimedia project, but I thought some of you might be interested in a new wiki I've started using a modified MediaWiki server. It's called LiteratePrograms, and is based on the idea of literate programming. The concept is that every article is simultaneously a document and a program; the document describes the program piece by piece, and there's a "download code" tab at the top to automatically reassemble the pieces and download it as ready-to-compile source code. I also added automatic syntax highlighting for the code segments. Here are some sample articles:

Any feedback is appreciated. Deco 08:51, 2 March 2006 (UTC)

Wow! Great idea! I really like the codeblock tags — it'd be nice if we could get something similar here on WP. I think LP would make a nice complement to WP articles on particular algorithms, and an excellent repository for implementations (and helps to resolve the licensing issues). --Allan McInnes (talk) 17:45, 2 March 2006 (UTC)

Thanks! I was a bit worried that people would see it as attempting to supplant some aspect of Wikipedia, but it really is distinct in purpose. For the most part I link to Wikipedia for descriptions of the concepts, and the articles just describe specific implementations. At the moment the codeblocks are invoking the external perl tool code2html, which probably wouldn't scale for Wikipedia, but it could easily be rewritten in PHP - it just does some simple regular expression search and replace.

As for licensing, that's a little tricky. Wikimedia wants everything to be under the GFDL, which I really don't think is an appropriate license for code (no disclaimer of warranty/liability, and big companies are very edgy about copyleft code). Using a less restrictive license permits wider use, but also prevents copying of content from Wikimedia projects without the authors' permission. I've got my wiki set up to add comment blocks with the (very short) MIT/X11 license at the top of each downloaded source file (at least the ones it knows how to write comments in). This is nice in that each file can be redistributed separately, whereas a GFDL'd or GPL'd archive would have to include a LICENSE file that has to be redistributed with each substantial part. The comment text is at MediaWiki:Copyrightcomment. Deco 19:12, 2 March 2006 (UTC)

That is a rather good idea but once again, we need to choose a programming language, universal enough, and that may be translated easily in other programming languages. The code currently used, that is Python actually, is not easy to translate. The lack of typed variable and return of functions is a real problem. Pascal is a better choice. Splang 14:13, 17 May 2006 (UTC)

Huh? The LiteratePrograms wiki currently contains code in about 30 or 40 different languages. Python is one of those languages, but the wiki also includes everything from Assembler to XSL. Pascal is also represented, although there are only a few Pascal programs at present. You are welcome to add more (so long as they are under released under the MIT/X11 license). --Allan McInnes (talk) 16:24, 17 May 2006 (UTC)

My comment is misplaced and this is confusing. I repeat it in its own section. Splang 17:27, 21 May 2006 (UTC)

[edit] Major Changes

This page has been extremely static for some time in a state that could be, at best, described as "provisional", and at worst as "dithering, non-committal, and largely meaningless". With that in mind I have decided to be bold and import the work from WikiProject Computer Science Manual of Style to try and get things moving with some assertive guidelines. I'm not suggesting this to be a necessarily finalised version - this topic should remain open for input and discussion. I do hope, however, that discussion can proceed working from this rather more complete set of guidelines. Discussion is most welcome! Leland McInnes 05:39, 20 May 2006 (UTC)

Shouldn't this page just redirect to the WPCS MoS? —Ruud 19:21, 21 May 2006 (UTC)

I viewed that as being a little too bold. WPCS MoS is still just a WikiProject style guide and contains things (like suggestions to link to Literate Programs) that probably aren't ideal for actual Wikipedia policy. Besides, I would rather spend a little more time getting the MoS in good shape and then try and get it promoted to similar footing as Wikipedia:Manual of Style (Mathematics) as a set of official Wikipedia style guidelines rather than doing things piecemeal. Leland McInnes 05:30, 22 May 2006 (UTC)

[edit] The programming language in algorithm articles

We need to choose a programming language, universal enough, and that may be translated easily in other programming languages. The code currently used (for example in the quicksort page), is Python actually, it is not easy to translate. The lack of typed variable and return of functions is a real problem. Pascal should be a better choice. An XML-like syntax (as that of Scriptol) should be perfect, for universality. Splang 14:13, 17 May 2006 (UTC)

Er... if you look closely, I think you'll see that the "code" in the quicksort is actually not Python at all (unless Python has recently started supporting operations like add pivot to pivotList). The examples are in pseudocode. If it looks like Python, that might be because Python has often been described as looking like "executable pseudocode"). I'm not sure why you say it's "hard to translate to other languages". The examples look extremely clear an easy to translate to me (note that I did not write the examples - I'm not sure who did). What exactly is the problem with them?

Pseudocode is generally a better choice for describing algorithms than any specific language. It permits the description to focus on the essentials of the algorithm, rather than the quirks of any one language. As for what the pseudocode should look like, you will probably never get precise agreement on that either (look up Derrick Coetzee's wikicode proposal to see what I mean). As it stands, the current Algorithms on Wikipedia page prescribes some basic guidelines, while leaving the specifics free enough that individual editors can do what makes them happy. If there is a specific piece of pseudocode that you feel looks "too Python-like", and would benefit from being made more "Pascal-like", then feel free to make those changes. --Allan McInnes (talk) 18:57, 21 May 2006 (UTC)

[edit] Pseudocode syntax

The content here overlaps with Wikipedia:WikiProject Computer science/Manual of style (computer science). Which should be the main source? I added the integer division operator to the table. Considering that we have floor and ceiling operators, it may be useful to distinguish. But now I don't know where I should be changing.

The algorithm keyword is a bit redundant. It has the same role as function. I think we should remove it from the Ford-Fulkerson example.

I think we should have a notation for arrays, lists and sets, together with proper operators (especially for sets; that would simplify some parsing algorithms when their correctness proof uses set theory).

And what about increment operators? For example, Counting sort currently (26 January 2007) has a C sample code. It's a very simple algorithm, but without ++ and += it can be very confusing. This

counts[i] += counts[ i - 1 ];
++counts[ A[i] - min ];

would be

counts[i] ← counts[i] + counts[ i - 1 ];
counts[ A[i] - min ] ← counts[ A[i] - min ] + 1;

Does anyone favor increment, a ← a + 1 or inc(a) over a++? What about +=? DanielKO 14:00, 27 January 2007 (UTC)

I think Wikipedia:WikiProject Computer science/Manual of style (computer science) and this page are expeted to diverse slowly. Each has their own demands that are slightly different. If you feel a change is needed in both, then make it to both. If not then make it on whichever seems appropriate.

As to the algorithm keyword - see it as being akin to the procedure keyword, and thus, in some sense, distinguishable from the function keyword. Some might argue that that's a subtle point (though in practice it really isn't), but I think the real point is that the pseudocode guidelines are just that: guidlines; the aim is to provide flexibility and let people do what they want, but provide enough general stylistic consistency that things don't get too wildly divergent.

As to notation for arrays lists and sets - I personally don't want to get too far into nailing down hard syntax. That was lies the problems of the Wikicode debate. I think most people, in general, are going to pick suitable nottion themselves. The real judging point is that whatever notation is chosen, it should be relatively clear, or makde clear, in the article in which it appears.

I'm not a fan of increment operators for pseudocode. With pseudocode the aim shoul be clarity and no requirement for knowledge of any particular language - and increment operators are a language specfic feature. I think a great many users, particularly non-programmers who simply want to understand an algorithm, will find the slightly longer version that mkes the operation explicit far more clear than increment operator notation. Leland McInnes 17:13, 27 January 2007 (UTC)

I'd add that increment operators and other side-effecting operators, while convenient, are notorious for introducing unexpected behavior into programs. Being explicit about where values are being assigned (i.e. the a ← a + 1 form) is probably a safer approach. --Allan McInnes (talk) 18:42, 27 January 2007 (UTC)

I've changed the page to recommend "a div b" over "a ÷ b", since — at least in the United States public school system — "÷" and "/" are both synonyms for "divided by". There's nothing inherently "integer" about the line-and-two-dots division symbol. On the other hand, "div" is used to mean "integer division" in several real-world programming languages.

Of course, there's still the question of whether -1 div 2 is -1 or 0, but that's a question better left unanswered; let the individual algorithm descriptions pick their own rounding modes if they find it necessary.

My two cents on increment operators: It's obvious to me that "a := a + 1" is clearer than anything you could possibly dream up as a pseudosyntax for "increment". Even in C, you should prefer "blah[baz]->foo += 1;" to "++blah[baz]->foo", for clarity, and also to prevent confusion on the reader's part about whether the prefix or postfix operator binds tighter. (Yes, C programmers know which; but do Wikipedia readers?)

The left-arrow "←" should never be used as a shorthand for assignment; it looks too much like a minus sign in Courier New, the most common font with which <code> tags will be displayed in a browser. Use the universally recognized (thanks to Wirth) assignment operator ":=" instead. --Quuxplusone 03:01, 7 March 2007 (UTC)

[edit] New formatting tag

There is now a new tag available: <source>. It can cope with many languages right now, and can be extended if necessary to cover new ones. HTH HAND —Phil | Talk 09:16, 16 May 2007 (UTC)

Very useful for Wikibooks, but I don't see any use for it on Wikipedia. (Unfortunatly some people will now start using it beacuse they can, not because it increases the quality of the articles.) —Ruud 11:14, 16 May 2007 (UTC)

It would be useful for scrapers, because it automatically tags the programming language; it could even be extended to automatically put articles in the proper subcat of Category:Articles with example code. So it would be cool if it worked. However, see Wikipedia talk:WikiProject Programming languages#New_formatting_tag for my short-list of problems with GeSHi. Hopefully they can be resolved, and <source> can replace <code> (or, better, <code> could be implemented according to improved <source> semantics, so we wouldn't have to change the wikitext of a billion articles).

The message I'd like people to take home: Don't use <source> on Wikipedia. (For experts only: Don't use <source> on Wikipedia yet.) --Quuxplusone 05:55, 18 May 2007 (UTC)

I don't see why we shouldn't continue to use it while those problems are being fixed. — Omegatron 00:27, 10 August 2007 (UTC)