User:Pfafrich/Blahtex en.wikipedia fixup
From Wikipedia, the free encyclopedia
This page and its subpages document incompatibilities between the LaTeX used on the English Wikipedia and standard LaTeX (as far as such a thing exists). It is part of the meta:Blahtex project, which aims to produce MathML output from Wikipedia.
You can help
The specific incompatibilities, and the pages where they occur, are listed on these subpages:
- User:Pfafrich/Blahtex % bugs - % is legal in texvc but illegal in LaTeX and Blahtex; replace it with \%. Done for the main namespace; a few user and talk pages remain.
- User:Pfafrich/Blahtex $ bugs - $ is legal in texvc but illegal in LaTeX and Blahtex; replace it with \$. Mainly done.
- User:Pfafrich/Blahtex \dot bugs - \dot\vec and \dot\hat are legal in texvc but illegal in LaTeX and Blahtex. Done for the main namespace; a few user and talk pages remain.
- User:Pfafrich/Blahtex \mathbf bugs - \mathbf\vec, \mathrm\hat, etc. are legal in texvc but illegal in LaTeX and Blahtex. Done.
- User:Pfafrich/Blahtex ^\sqrt bugs - x^\sqrt, x^\acute and similar constructs; 18 articles remaining.
- User:Pfafrich/Blahtex all commands - every LaTeX command in use, mostly symbols (occurrences in articles still need to be found).
- User:Pfafrich/Blahtex \mathcal bugs - lowercase letters used with the \mathcal and \mathbb fonts. Done for \mathcal; \mathbb not yet checked.
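The % and $ fixes above amount to escaping every occurrence that is not already escaped. A minimal Python sketch of that rewrite, assuming the equation body is available as a plain string; escape_specials is a hypothetical name and this is not the procedure actually used for the fixes listed here:

```python
def escape_specials(tex: str, specials: str = "%$") -> str:
    r"""Put a backslash in front of every unescaped character in specials.

    A backslash is copied together with the character after it, so an
    existing \% or \$ is left alone, and in \\% the \\ (a line break) is
    consumed first, leaving the % to be escaped.
    """
    out = []
    i = 0
    while i < len(tex):
        ch = tex[i]
        if ch == "\\" and i + 1 < len(tex):
            # Copy the backslash and the character it escapes unchanged.
            out.append(tex[i:i + 2])
            i += 2
        elif ch in specials:
            # Unescaped special character: escape it.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(escape_specials(r"50% of \$5"))  # prints 50\% of \$5
```

Walking the string instead of using a single regular expression keeps \\% (a line break followed by a percent sign) from being mistaken for an already escaped \%.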
There are still some other incompatibilities which I've yet to search for. See http://blahtex.org/errors-20060220.html for a complete list.
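The \dot, \mathbf and ^\sqrt entries in the list above share one shape: a command, or ^, that expects an argument is handed another argument-taking command without braces. Bracing the inner argument, as in \dot{\vec{x}} or x^{\sqrt{2}}, gives input that LaTeX accepts. A minimal Python sketch for flagging the unbraced form, assuming each equation is available as a string; ARG_COMMANDS is a hypothetical, non-exhaustive list and find_unbraced_arguments is a hypothetical helper, not part of Blahtex or texvc:

```python
import re
from typing import List

# Hypothetical, non-exhaustive list of commands that take one argument.
ARG_COMMANDS = [
    "sqrt", "vec", "hat", "dot", "ddot", "bar", "tilde", "acute", "grave",
    "check", "breve", "underline", "overline",
    "mathbf", "mathrm", "mathit", "mathcal", "mathbb",
]
_NAMES = "|".join(ARG_COMMANDS)

# Outer part: one of the commands above, or an unescaped ^ or _.
# Inner part: optional whitespace, then another command from the list.
# (?![A-Za-z]) stops \dot from matching the start of a longer name such
# as \dotsc, while still allowing a digit to follow, as in x^\sqrt2.
NESTED_RE = re.compile(
    r"(\\(?:%s)(?![A-Za-z])|(?<!\\)[\^_])\s*\\(?:%s)(?![A-Za-z])"
    % (_NAMES, _NAMES)
)


def find_unbraced_arguments(tex: str) -> List[str]:
    """Return each non-overlapping 'outer inner' pair found in tex."""
    return [m.group(0) for m in NESTED_RE.finditer(tex)]


print(find_unbraced_arguments(r"\dot\vec x + y^\sqrt2"))
# prints ['\\dot\\vec', '^\\sqrt']
```

Commands that take no argument, such as \alpha, are deliberately absent from the list, so \mathbf\alpha is not flagged.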
Code
This information was extracted from the XML database dump enwiki-20060125-pages-meta-current.xml.bz2.
A simple Perl script, plus a little grep and sed, was used to extract the data.
bunzip2 -c enwiki-20060125-pages-meta-current.xml.bz2 | sed 's/<math>/\n<math>\n/g' | sed 's/<\/math>/\n<\/math>\n/g' | perl math.pl > eqnsJan06.txt
This finds all the equations inside <math> tags and lists them by page.
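math.pl itself is not shown on this page. As a rough, hypothetical stand-in, the Python sketch below reads the stream produced by the two sed commands, where <math> and </math> each sit on a line of their own, remembers the most recent <title>, and yields one (title, equation) pair per formula. The tab-separated output of main() and the joining of multi-line equations with spaces are assumptions; the real format of eqnsJan06.txt is not documented here.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

TITLE_RE = re.compile(r"<title>(.*?)</title>")


def equations_by_page(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (page title, equation) pairs from the sed-split dump stream."""
    title = ""
    inside = False
    buf = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not inside:
            # Track the page currently being read.
            m = TITLE_RE.search(line)
            if m:
                title = m.group(1)
        if line == "<math>":
            inside = True
            buf = []
        elif line == "</math>":
            if inside:
                # Join equations that span several lines with spaces.
                yield title, " ".join(buf)
            inside = False
        elif inside:
            buf.append(line)


def main() -> None:
    # Read the sed output on stdin; write "title<TAB>equation" lines.
    for title, eqn in equations_by_page(sys.stdin):
        sys.stdout.write(f"{title}\t{eqn}\n")
```

Calling main() from a small script placed where perl math.pl sits in the pipeline would read the same sed output from standard input.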
Grepping for patterns. The problematic patterns can be found using grep:
- grep '[^\\]\$' eqnsJan06.txt - finds unescaped occurrences of $
- grep '[^\\]%' eqnsJan06.txt - finds unescaped occurrences of %
- grep '\\underline\s*\\' eqnsJan06.txt - finds occurrences of \underline\mathrm etc.
- find occurrences of