User:Pfafrich/Blahtex en.wikipedia fixup

This page and its subpages document incompatibilities between the LaTeX used in the English Wikipedia and standard LaTeX (as far as such a standard exists). It is part of the meta:Blahtex project, which aims to produce MathML output from Wikipedia.

You can help

The specific incompatibilities, and the pages where they occur, are listed at

There are still some other incompatibilities which I've yet to search for. See http://blahtex.org/errors-20060220.html for a complete list.

Code

This information was extracted from the XML database dumps (enwiki-20060125-pages-meta-current.xml.bz2).

A simple Perl script and a bit of grep and sed were used to extract the data.

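# Put every <math> and </math> tag on its own line, then hand the stream to math.pl.
# Note: \n in the sed replacement text relies on GNU sed.
bunzip2 -c enwiki-20060125-pages-meta-current.xml.bz2 \
  | sed 's/&lt;math&gt;/\n<math>\n/g' \
  | sed 's/&lt;\/math&gt;/\n<\/math>\n/g' \
  | perl math.pl > eqnsJan06.txt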

This finds all the equations inside <math> tags and lists them by page.
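The math.pl script itself is not reproduced on this page. As a rough illustration of the kind of processing the last pipeline stage performs, here is a minimal sketch in shell and awk. It is not the original script. The function name list_equations, the "== title ==" heading format, and the choice to join multi-line equations with a single space are all assumptions made for this example. It also assumes that each page title sits on a line containing <title>...</title>, as in the XML dumps.

```shell
#!/bin/sh
# Hypothetical stand-in for math.pl (the original script is not shown).
# Reads the sed-processed dump on stdin and prints each page title once,
# followed by the equations found on that page, one per line.
list_equations() {
  awk '
    # Remember the title of the page currently being read.
    /<title>/ {
      title = $0
      sub(/.*<title>/, "", title)
      sub(/<\/title>.*/, "", title)
      printed = 0
    }
    # A line holding only <math> opens an equation.
    $0 == "<math>" { inmath = 1; eqn = ""; next }
    # A line holding only </math> closes it: emit the collected equation.
    $0 == "</math>" {
      if (inmath) {
        if (!printed) { print "== " title " =="; printed = 1 }
        print eqn
      }
      inmath = 0
      next
    }
    # Inside an equation, join continuation lines with a space.
    inmath { eqn = (eqn == "" ? $0 : eqn " " $0) }
  '
}
```

Under these assumptions, the function could take the place of "perl math.pl" in the pipeline above, with its output redirected to eqnsJan06.txt in the same way. Keeping each equation on one line is what makes the later line-oriented grep searches practical.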

Grepping for patterns. The problematic patterns can be found using grep:

  • grep '[^\\]\$' eqnsJan06.txt - finds occurrences of an unescaped $
  • grep '[^\\]\%' eqnsJan06.txt - finds occurrences of an unescaped %
  • grep '\\underline\s*\\' eqnsJan06.txt - finds occurrences of \underline\mathrm etc. - clear
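One detail of the $ and % patterns is worth illustrating: [^\\] must match a character, so a $ or % in the very first column of a line is not reported. The sketch below runs the patterns on a small made-up sample and compares them with a variant that also accepts the start of the line. The sample lines, the variable names, and the -E variant are assumptions for illustration only, not part of the original procedure.

```shell
#!/bin/sh
# Build a tiny made-up sample, one equation per line.
sample=$(mktemp)
printf '%s\n' \
  'a \$ b' \
  'cost is 5$' \
  '$x' \
  '50\% off' \
  '50% off' > "$sample"

# Pattern from the list: a $ preceded by any character other than a backslash.
unescaped_dollar=$(grep -c '[^\\]\$' "$sample")

# Variant (an assumption, not in the original): also accept a $ at line start.
with_line_start=$(grep -c -E '(^|[^\\])\$' "$sample")

# Same idea for %; the percent sign needs no backslash in a grep pattern.
unescaped_percent=$(grep -c '[^\\]%' "$sample")

echo "$unescaped_dollar $with_line_start $unescaped_percent"
rm -f "$sample"
```

The first count misses the line that begins with $, while the -E variant includes it. Whether that matters depends on how often equations in eqnsJan06.txt start with one of these characters.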