Wikipedia:Parser bug reports

This Wikipedia page is currently inactive and is kept primarily for historical interest. If you want to revive discussion regarding the subject, you should ask for broader input, for instance at the village pump.

This is an archive only of bug reports from Phase II of the Wikipedia software (used before June 20, 2002). Please see Wikipedia:Bug reports for instructions on adding bug reports for the current system.

A bug, I imagine a minor one: I have noticed a small oddity. If you go directly to a special page such as [[special:NewPages]], the menus at the edge of the page are light blue, as they should be, but if you get there through a redirect, for example [[los nuevos asuntos]], the menus at the edge stay white, as if it were a normal page. [[user:Bryan Derksen|Bryan Derksen]]

Case-insensitive wikis

(2002/5/30) I note that the [[ syntax for internal links requires proper capitalization. I like my capitalization to be "proper" and thus use the | syntax all the time to allow the link to work. This is all the more frustrating because many of the entries have incorrect capitalization.

This could all be fixed if the wikis were case-insensitive in the same way the search is. Or am I missing something obvious?

MSM

I copied this to bug reports, because I'm pretty sure it's new behaviour. -- Marj Tiefert, Thursday, May 30, 2002
No, it's always worked like that so far as I know. Some of the non-English wikis have been on a really ugly system whereby Every Word In A Title Had Its First Letter Capitalized Like This Which Is Really Fricking Annoying And I Am Really Glad That We're Getting Rid Of It There Because Oh-Man-Oh-Man It's Ugly Isn't It? But we haven't had that here on the English wiki, thank goodness; it's case-sensitive except for the very first letter of a title, which is always capitalized in the title and can be either way in a link (ie, Asteroid and asteroid are the same article). Certainly it's been that way at least since mid-January when I got here. If you want your capitalization proper, please fix the articles that have improper capitalization. Brion VIBBER
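
The rule described above (case-sensitive except for the first letter) can be sketched in a few lines of Python. This is only an illustration of the stated behaviour, not the wiki's actual code; the function names normalize_title and same_article are made up for the example.

```python
def normalize_title(title: str) -> str:
    """Upper-case only the first character; every other character is kept as typed."""
    title = title.strip()
    return title[:1].upper() + title[1:]


def same_article(link_a: str, link_b: str) -> bool:
    """Two link texts name the same article if they match after normalization."""
    return normalize_title(link_a) == normalize_title(link_b)
```

Under this sketch, "asteroid" and "Asteroid" resolve to the same title, while "Main Page" and "Main page" stay distinct because only the first letter is folded.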

Backslashes don't display, Wednesday, May 29, 2002

See Blackboard_bold for two examples. This is a new bug, since they used to work. --Zundark, Wednesday, May 29, 2002

This is an instance of the cache bug: the displayed page was not the current page, but an old cached version. Re-saving the page fixed the problem. AxelBoldt, Wednesday, May 29, 2002
ASCII, on the other hand, is an actual instance of backslashes not displaying. I've fixed it twice now. Each time, the backslash displays immediately after saving the page, but if I then reload the page the backslash goes away again. Bryan Derksen, Wednesday, May 29, 2002
I can't find a description of the cache bug at the moment, but your (Axel's) description appears to be wrong. Re-saving the page had no effect, it only made it look as if the problem was fixed because you were no longer seeing a cached version. Reload/refresh the page and the backslashes will disappear again. See Eigenvector and Matrix for some particularly obvious examples. --Zundark, Sunday, June 9, 2002
Correct, saving a page in cache stripped backslashes. It's now fixed in CVS. AxelBoldt, Sunday, June 9, 2002

Several asterisks in a row will prevent linewraps (or increase the linewrap length considerably?) Koyaanis Qatsi, Monday, April 8, 2002

See the history of Talk:Terrists for an example. I doubt this is a common issue, though, since most of us use four dashes.  :-)

This is not a bug, so I'm going to move this to the "fixed" site. Asterisks at the start of a line are used by the wiki software to make bullet lists, which can be nested. A row of, say, 20 asterisks is asking the software to make 20 nested bullet lists, and it does so correctly. It fails to wrap lines because bullet lists are indented, and when you ask for 20 indents, that line becomes very long to accommodate them, and it takes the rest of the page with it. In short, DON'T DO THAT, because it's not ever going to change. -- Lee Daniel Crocker
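
As a rough illustration of why a row of asterisks behaves this way, the nesting depth can be read straight off the number of leading asterisks. This is a sketch in Python, not the wiki's own parser; bullet_depth is a hypothetical helper name.

```python
def bullet_depth(line: str) -> int:
    """Count the leading asterisks, i.e. how many nested bullet lists the line asks for."""
    return len(line) - len(line.lstrip("*"))
```

A line of 20 asterisks gives a depth of 20, so the renderer is being asked for 20 levels of indentation on that line.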

Table positioned between two paragraphs displays at bottom of page
(possibly related to above bug??) Wednesday, April 10, 2002

If you look at the table in Talk:High_German, you'll see that instead of appearing between the two paragraphs of my note, it leaves a "close table" tag where the table belongs and puts the table at the end of the page. I've double-checked my table code for errors, and can't find any. I've also tried just making one big table, with the first and last paragraphs in their own table rows, but the problem persists. Is this a bug, or am I having a Stupid Attack™? pgdudda

You're missing a </center> tag; it looks okay after I added that in. But that did trigger a bug in the parser that caused it to eat the table instead of the center tag... I'll try to fix that, but in the meantime, uh, don't do that. :) Brion VIBBER, Wednesday, April 10, 2002
Oh, so I *was* having a Stupid Attack™, but at least my Stupid Attack helped uncover another bug. Thanks!  :-) pgdudda Thursday, April 11, 2002



Linking error 2/25/02

Oregon constitution had several articles with multiple spaces in them - so the link was Article II (two spaces before this) title here instead of Article II title here and the link resolves to different locations. Rob Salzman

Hmm, I think this is semi-fixed. Anyone still seeing these kinds of errors? Brion VIBBER, Friday, April 19, 2002
STATUS: UNKNOWN

Parser

Last line link in list

(2002/1/29) If the last lines of an article look like this:

* [http://www.yahoo.com/
Yahoo]

then the bottom part of the page ("Main Page | Recent Changes...") will be indented to the left and screwed up. See SandBox for an example. This only happens if all of the following are true:

  1. we are in a list
  2. we have a URL link
  3. The last letter of the URL is /
  4. The name of the link occurs on the next line
  5. You are using IE 5.5 on Windows. Netscape 4.76 on Linux does not show the effect.

AxelBoldt

(2002/3/2) Right now, I see the bug also in Netscape. An example is at the bottom of Duverger's Law. AxelBoldt

That page renders correctly for me on Mozilla 0.9.8 & Netscape 4.78 (Linux). The example in wikipedia:Sandbox still leaves an indent on the following page contents (which is due to a bug in the wiki-to-html rendering code), but not in the link bar at the bottom (which is now separated by a div tag, so there shouldn't be any interference). Brion VIBBER 2002/03/02

Another linking error - My user page had some external links that were just the usual raw http://yaddayadda.com/etcetera and they used to work, but today they didn't. It wasn't because there was an asterisk or a parenthesis immediately before or after the URL. It doesn't seem to be because the URL ends in a / - I looked at other user pages to compare, and I cannot figure out why it wasn't working - gremlins? (Look back one or two levels in the history of my user page to see the formats - I've since forced it to work, by hiding the URL from the displayed page.) -- Marj Tiefert, Wednesday, May 15, 2002

I believe that the current software is actively unlinking URLs like that. I edited a page that had an external link in it of the form you describe above, just a bare URL, and even though I didn't touch the external URL in question it came up unlinked after the edit. Try it out; find a page with an existing external link, edit it, and the link will be gone in the new version. This doesn't apply to links using the syntax [http://yaddayadda.com/etcetera name of link here], however. Those remain intact through an edit. Bryan Derksen, Monday, May 20, 2002
This only seems to happen under some circumstances; I've seen it in some articles, but I can't for the life of me reproduce it in Wikipedia:Sandbox. I think it may be already fixed under the current development version (where I can't reproduce it even by exactly copying Marj's user page). Brion VIBBER, Monday, May 20, 2002
If you only have a naked URL on a page, it won't turn into a link. If there are other URLs on the page, it sometimes turns into a link. This is fixed in CVS. AxelBoldt
2002-6-10 Seems to still be a problem; see the links at the end of daylight saving time for an example. Rootbeer
2002-6-16 I fixed daylight saving time by opening it for editing and saving it again with no changes. I don't know enough about the internals (caching?) to say why it worked. But if this problem has been fixed in the software now, I suppose that any other pages with non-working links will eventually be edited, and thereby start working again. Rootbeer
STATUS: FIXED IN CVS?

Parser generates extra whitespace

The Bipolar disorder page is full of extra whitespace - looking at the article reveals lots of <p> </p> and <pre> </pre> spans generated.

Similarly, if an otherwise empty line contains some white space, the previous parser took that as a paragraph break, while the new parser treats it as a block of indented nothing, resulting in too much space between the paragraphs.

If whitespace precedes a #, then it is taken to be a numbered list, while before it was taken as a literal # (which is the correct behavior, especially useful for programs). AxelBoldt

STATUS : Solved in CVS
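
The line handling that this report asks for can be sketched as a small classifier in Python. It is an illustration of the behaviour described above (whitespace-only lines are paragraph breaks, a leading space protects a literal #), not the parser that was actually committed; classify_line and its return labels are invented for the example.

```python
def classify_line(line: str) -> str:
    """Classify one line of wiki text according to the behaviour requested above."""
    if line.strip() == "":
        # Empty or whitespace-only: a paragraph break, not a block of indented nothing.
        return "paragraph-break"
    if line[0] == " ":
        # Leading space: preformatted text, so a following '#' stays literal.
        return "preformatted"
    if line[0] == "#":
        return "numbered-item"
    if line[0] == "*":
        return "bullet-item"
    return "text"
```

The order of the checks is the point: the whitespace-only test and the leading-space test both run before the list markers are examined, so an indented "# comment" in a program listing is never turned into a numbered list.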

Bad table code can screw up layout

(2002/1/28) In the Quaternions article, the first part of the article appears at the bottom of the page, as do all the QuickBar links. --Zundark

This was caused by Bad Table Code in the article. There was no closing TR tag for the last row in the table, and an extra open TR tag after the end of the table. I've fixed the article... The parser could probably be made to be able to normalize these things, though (ie, remove table-ish tags not inside <TABLE>...</TABLE>) --Brion Vibber

Parser issues with header lines

The display of Eight queens puzzle is... less than optimal. The problem is that the leading space on a line used to disable the processing of '#': now the Python program example is damaged.


Definition lists produce invalid HTML, could use some improvement as well

(2002/4/16) Lee Daniel Crocker The line

; term : definition

is rendered as

<DL><dt> term </DL><DL><dt><dd> definition</DL>

Note that neither the "dt" nor "dd" elements are properly closed. Further,

(2002/1/25) Definition lists like:

Term 1
Definition 1.
Term 2
Definition 2.

each get put in separate <dl> tags, resulting in too much spacing between them. Carey Evans

While we're at it, it would be nice if the DD/DL elements were only closed off on a full blank line (or end of article), and not just a single newline. That would make them more consistent with regular paragraph text, and make articles with long definitions easier to write and edit.

Specifically,

; term
  : long definition blah blah blah blah blah blah blah blah blah
   blah blah blah blah blah blah blah blah blah blah blah blah blah
   blah blah blah blah blah blah
  

should be rendered identically to

; term
  : long definition blah blah blah blah blah blah blah blah blah blah blah
    blah blah blah blah blah blah blah blah blah blah blah blah blah blah
    blah blah blah
  

This should also be the case for the ULs and OLs created by * and #. Of course, if the first character of a new line within a DD is ";", then close the DD and open a DT; if it is ":", insert an empty DT and open a new DD. When a full blank line is encountered, close both the open DD and the DL. I'll take a look at the parser code to see if that's possible.

STATUS : Solved in CVS
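
For comparison with the invalid output quoted above, here is a minimal Python sketch of a renderer that puts consecutive definition lines into one DL and closes every dt and dd. It is a simplified illustration, not the fix that went into CVS: render_definition_list is a hypothetical name, it does not implement the multi-line continuation or the empty-DT rule proposed above, and a term that itself contains ":" would be split in the wrong place.

```python
import html


def render_definition_list(lines):
    """Render consecutive '; term : definition' lines as a single, well-formed <dl>."""
    out = ["<dl>"]
    for line in lines:
        line = line.strip()
        if line.startswith(";"):
            # Split the term from an optional definition on the same line.
            term, sep, definition = line[1:].partition(":")
            out.append(f"<dt>{html.escape(term.strip())}</dt>")
            if sep:
                out.append(f"<dd>{html.escape(definition.strip())}</dd>")
        elif line.startswith(":"):
            # A definition on its own line.
            out.append(f"<dd>{html.escape(line[1:].strip())}</dd>")
    out.append("</dl>")
    return "".join(out)
```

With this sketch, the single line "; term : definition" comes out as one dl holding one closed dt and one closed dd, and two such lines in a row share the same dl instead of getting one each.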

Character entities in links

Sat Feb 2 00:23:40 UTC 2002: On list of food additives, I have additives like &beta;-cyclodextrine. When I click on the question mark to create an article about it, I get the Main Page displayed for edit instead. Note that since & is a safe character in URI path segments, escaping it as %26 has no effect.

This is due to a bug in the code putting too many HTML escapes into the title; if it were working correctly, the %26 escape would indeed have an effect. My recommendation until this is resolved: use β-cyclodextrine ([[beta-cyclodextrine|&beta;-cyclodextrine]]). --Brion Vibber
There's probably good arguments for actually writing "beta-cyclodextrine" in the article. However, my point about the % escape is that according to RFC 2396, there is no difference between %26 and just & in the path of the URL. --Carey
Well, there's the RFC and then there's the actual behavior of the software... PHP does not seem to consider %26 to be an ampersand for the purposes of extracting variables from the URL's *query* bit. At least my reading of the RFC agrees with it: “3.4 Query Component ... Within a query component, the characters ..."@", "&", "="... are reserved.” It's not a problem in the path, only in the query when you're e.g. editing the page. --BV
The URL rewriting to give nice URLs like http://www.wikipedia.com/wiki/MainPage rather than .../wiki.phtml?MainPage makes this a bit more complicated. There's no question mark in the URL for this edit page, so Apache is probably justified in converting %26 to & internally, before processing the Alias or RewriteRule directive, or http://www.wikipedia.com/%77%69%6B%69/ wouldn't work. --Carey
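
The path-versus-query distinction being argued here can be illustrated with Python's standard urllib.parse module. This is only a sketch of the general RFC behaviour, not of how PHP or Apache handled it on the site; the query parameter name "title" and the helper title_from_query are assumptions made for the example.

```python
from urllib.parse import parse_qs, unquote


def title_from_query(query: str) -> str:
    """Extract a hypothetical 'title' parameter from a query string."""
    return parse_qs(query).get("title", [""])[0]


# In a path, decoding %26 simply yields '&'; nothing is split on it.
decoded_path = unquote("/wiki/A%26B")

# In a query, a raw '&' separates parameters, so the escape matters.
escaped = title_from_query("title=A%26B")
unescaped = title_from_query("title=A&B")
```

Here decoded_path is "/wiki/A&B", escaped is "A&B", and unescaped is only "A", because the raw ampersand ended the parameter.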

(Ideally the URL would be encoded as %CE%B2-cyclodextrine, the UTF-8 encoding of GREEK SMALL LETTER BETA.)

Impossible until the database is converted from ISO-8859-1 to UTF-8. --BV
I would just write <? echo urlencode(recode("h..utf8", $title)) ?>. --Carey
Yeah, that could probably work as long as titles are normalized internally. I'll try banging the code into place... --BV
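
The same idea as Carey's one-liner (decode the HTML entity, then percent-encode the UTF-8 bytes) can be sketched in Python with the standard library. This is an illustration of the encoding step only, not the code that was put into the wiki; title_to_url_segment is a hypothetical name.

```python
import html
from urllib.parse import quote


def title_to_url_segment(wikitext_title: str) -> str:
    """Decode HTML character entities, then percent-encode the title as UTF-8."""
    decoded = html.unescape(wikitext_title)  # "&beta;" becomes the character itself
    return quote(decoded, safe="")           # quote() encodes as UTF-8 by default
```

For the title in this report, title_to_url_segment("&beta;-cyclodextrine") returns "%CE%B2-cyclodextrine", which matches the ideal encoding given above.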

References: RFC 2396, W3C on i18n of URIs


--Carey Evans


Problem in "printable version" page?

Please go to Category theory and try the "printable version" link; you will probably see that the word functors remains as a blue link instead of becoming simple italics text. I was unable to spot any sort of difference from other links that would cause this strange behaviour, and I suppose it can be considered a bug, since the printable version should not contain any link in the text part. Daniel M

Yup, it's a bug, caused by the fact that the link looks like [[functor|<b>functors</b>]]. It's fixed in the development version of the code. AxelBoldt
STATUS : Solved in CVS



It's always a positive sign that somebody is working hard to improve a system when brand goofy new bugs start to appear. I just put together a new preliminary subject outline for philately with a certain amount of bulleted text. Now, when I enter a new first level bullet after a line with a second level bullet, that first level bullet doesn't appear in the article. Eclecticology, Thursday, May 9, 2002

I've noted this problem in numbered lists too, or something like it. Also, it used to be possible to create a mixed list with a structure something like:
  • One header
    1. A sub-item
    2. Another sub-item
    3. Yet another sub-item
  • A new header
    1. A sub-item for the second header
    2. And so forth

(see Montgomery County, Maryland for an example!) All of a sudden, this has stopped giving reasonable results. -- BRG (Friday, May 10, 2002, but first noticed earlier)


Again with philately: When I linked to this from the "recent changes" page, I was aghast that the work that I had done was gone without explanation. I checked the history and the (diff)'s, but that showed that my input was all still there, and that apparently no human had touched it. From this I made a couple of minor edits (just to have something different) and saved it again. That was last night. Now again if I try to link from "recent changes" I still get an older version of the page. At this point I am beginning to wonder to what extent a link reliably gives the most recent version of a page. Eclecticology, Friday, May 10, 2002

This morning I'd noticed a similar problem with the article on Maryland, which I've worked on a lot; only it had apparently gone back many versions. While trying to restore my edits, however, I found the correct version suddenly appeared! -- BRG

#REDIRECT appears to be broken

When I try to access a redirected article, it appears with the "Redirect from foo_bar" header, but the article itself appears as:

  1. REDIRECT foo_bar

Any idea what's happening? Refreshing the page has no effect. I'm using IE6.0 on WinXP. (2002.05.15 pgdudda)

Got an article you can give as an example? This is expected behavior in the case of redirect chains: if article A redirects to B, and B redirects to C, going to A gets you to B but no further (thus preventing endless loops of redirects where A redirects to B to C to A to B to C to A to B to C to...) If you've got one of these, please edit the articles in question and fix it so that all the redirect pages redirect to the actual page, rather than to other redirects. Brion VIBBER, Wednesday, May 15, 2002
I was running into this problem with the article "prefix__morpheme", attempting to redirect it to the article prefix by typing in "#REDIRECT prefix". That generated an error, but changing it so that there were double square brackets around the word "prefix" makes it work fine. I had thought the square brackets were unnecessary, but apparently not. (FYI, Prefix morpheme now points to Prefix morpheme, though I may consolidate that with the Prefix article, since the former is a one-line stub.) Thanks for helping me get that sorted out! (2002.05.15 pgdudda)
Ahh, I see what you mean. Yes, the brackets are required, at least at present; if they're not supposed to be required, somebody better let me know so I can fix it. (Though that would break some other things, pending some discussed changes to the database structure.) Brion VIBBER, Wednesday, May 15, 2002
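
The two behaviours discussed in this thread (brackets are required, and only one redirect hop is followed) can be sketched as follows in Python. This is an illustration under assumptions, not the wiki's code: pages is a hypothetical dict mapping exact titles to page text, and the pattern is a guess at the strict syntax.

```python
import re

# Strict form: upper-case keyword, one space, and a double-bracketed target.
REDIRECT = re.compile(r"^#REDIRECT \[\[([^\]|]+)\]\]")


def redirect_target(text: str):
    """Return the redirect target named in the page text, or None if it is not a redirect."""
    match = REDIRECT.match(text)
    return match.group(1) if match else None


def resolve(title: str, pages: dict) -> str:
    """Follow at most one redirect, so chains and loops can never spin forever."""
    target = redirect_target(pages.get(title, ""))
    if target is not None and target in pages:
        return target
    return title
```

With pages A, B and C where A redirects to B and B redirects to C, resolve("A", pages) stops at B, which is why chained redirects show the raw redirect line instead of the final article.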

Another redirect oddity

I came across a page (Giant Impact theory) which had a messed-up redirect. The redirect was coded thusly:

#redirect[[giant impact theory]]

And redirected to an empty page whose name was, character-for-character, #redirect[[giant impact theory]]. I put in a space and upper-cased the word redirect, and that fixed it. Bryan Derksen, Thursday, May 16, 2002

The missing space was triggering the problem; I've fixed it to handle that case a bit more gracefully... Brion VIBBER, Thursday, May 16, 2002
STATUS: FIXED IN CVS.
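
A more forgiving match for the redirect line might look like the Python sketch below. The thread only says that the missing space is now handled; accepting a lower-case keyword as well is an assumption added here, and the pattern and function name are invented for the example.

```python
import re

# Optional whitespace after the keyword; case-insensitive keyword (an assumption).
LENIENT_REDIRECT = re.compile(r"^#redirect\s*\[\[([^\]|]+)\]\]", re.IGNORECASE)


def lenient_redirect_target(text: str):
    """Return the redirect target, tolerating a missing space and keyword case."""
    match = LENIENT_REDIRECT.match(text)
    return match.group(1).strip() if match else None
```

Under this sketch, the line quoted above, #redirect[[giant impact theory]], yields the target "giant impact theory" rather than being treated as a page name.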


Foreign ISBN

The ISBN link on Tintin is bad, but the ISBN 2203017104 is correct, tested on www.fnac.com. Could this be because the book isn't in English? -- Tarquin

That book doesn't appear to be in Pricescan's database (yeah, probably because it's not in English and thus not in the mainly English-language online shops they search). Try linking to a specific site, such as ISBN no. 2203017104 (amazon.fr) or ISBN no. 2203017104 (fnac.com). Or, if there's a generic price-comparison site en français that can search by ISBN, use that. (If you find one, let me know about it!)
By the way; there's a bug in the parser that will mess with links that are given the name of ISBN X, that's why I added in the "no."s. --Brion VIBBER
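
Whether an ISBN is well-formed can also be checked without any bookshop, using the standard ISBN-10 check digit (weights 10 down to 1, sum divisible by 11). The Python sketch below is offered only as a way to rule out a mistyped number; it says nothing about whether a given database lists the book, and is_valid_isbn10 is a name chosen for the example.

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Check the ISBN-10 checksum: weighted digit sum must be divisible by 11."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch == "X" and position == 9:
            value = 10  # 'X' stands for 10 and is only allowed as the check digit
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0
```

The number 2203017104 passes this check (its weighted sum is 99), which supports the view that the link fails because the book is missing from the database, not because the ISBN is wrong.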

I don't know if this is a bug, but this is annoying. When we have an external link to something, anything placed right next to it will have a space in between. It seems to only affect Wikibooks.