User:Haus/Hanzo
Hanzo is an experimental plug-in for jEdit built to partially automate the process of converting from bad, old-style {{Infobox Ship}} templates to shiny, new {{Infobox Ship Begin}} templates. This is a task undertaken by the Ships Wikiproject and is (briefly) described at Category:Ship articles needing infobox conversion.
It's so named because The Bride was wreaking havoc with a Hattori Hanzo sword on TV while I was searching for a name for a Java class.
The program has absolutely no other use in the universe, and the only way someone else could use it would be to basically do a bit-by-bit copy of my hard drive. It depends on about a zillion other packages. That said, if you want to write a lexer for infoboxes or automate some editing processes in BeanShell, I have some notes below.
Hanzo is 100% finished with the job I created it for, having helped me convert 3,282 of 3,282 infoboxes in 5 days. There are 0 left to go, which represents about 0 hours. The last 50 or so infoboxes had to be cleaned up by hand, which cost extra time. Hanzo's status during the run could best be described as "humming along smoothly for a few hundred edits, then bursting into flames."
Feedback
If you're here, you probably saw an edit summary. I have a watch on the discussion page here. Feedback away.
Project history
3,282 infoboxes were converted in a period of 5 days, 8 hours and 27 minutes, from:
- 10:35, 27 March 2008: USS Patrick Henry (SSBN-599) (replaced infobox using Hanzo)
to
- 19:02, 1 April 2008: HMS Dragon (D35) (Migrating infobox with Hanzo)
This represents about 25.55 conversions per calendar hour over the period of 128.45 hours.
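The rate can be rechecked from the two edit timestamps. The sketch below is only an illustration of that arithmetic; the class and method names are made up for this example and are not part of Hanzo.

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class ConversionRate {
    /** Average conversions per calendar hour between two timestamps. */
    public static double perHour(int conversions, LocalDateTime start, LocalDateTime end) {
        // Work in minutes so the 27-minute remainder is not lost.
        double hours = Duration.between(start, end).toMinutes() / 60.0;
        return conversions / hours;
    }

    public static void main(String[] args) {
        LocalDateTime first = LocalDateTime.of(2008, 3, 27, 10, 35); // first Hanzo edit
        LocalDateTime last = LocalDateTime.of(2008, 4, 1, 19, 2);    // last Hanzo edit
        double hours = Duration.between(first, last).toMinutes() / 60.0;
        System.out.println("hours elapsed: " + hours);
        System.out.println("conversions per hour: " + perHour(3282, first, last));
    }
}
```

The elapsed time is 7,707 minutes, or 128.45 hours, and 3,282 conversions over that span works out to roughly 25.5 per hour.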
Related infobox issues
As of 30 March 2008, about 3,750 pages use {{Infobox Ship Begin}}. Ship infoboxes requiring conversion include approximately:
1. 2,500 {{Infobox Ship}} remaining (of the original 3,282)
2. 50 table header 01 conversions
3. 1,000 table header 02 conversions
4. 2,161 hand-tagged articles, including subst'ed infoboxes
I haven't formally analyzed (2) and (3), but they should be mostly amenable to automation. (4) might not be as easy; it may be something of a head-scratcher.
Technical
Hanzo's main functionality comes from a lexical analyzer written in Java with JFlex. To a large extent, the program lives inside a jEdit environment. A single-purpose program, it just barely functions. It was written in four rather arduous days: about a day to write the lexer (twice, per Raymond's law), half a day to uninstall/reinstall/fix jEdit to work with mwjed, and two and a half days to do stuff like:
- get communication from WP to the lexer and back
- preserve UTF-8 characters
- do automatic page loading
- do local diffs
- automate to a 1-click process
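Of the chores above, preserving UTF-8 is the one that most often goes wrong silently. A minimal sketch of the idea, assuming page text is moved around as raw bytes at some point: name the charset explicitly on both the encode and the decode side rather than relying on the platform default. The class and method names here are hypothetical and are not taken from Hanzo or mwjed.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    /** Encode wikitext as UTF-8 bytes, e.g. before sending or saving it. */
    public static byte[] encode(String wikitext) {
        return wikitext.getBytes(StandardCharsets.UTF_8);
    }

    /** Decode UTF-8 bytes back into wikitext. */
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** The failure mode: the same bytes read with the wrong charset. */
    public static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "K\u00f6nigsberg"; // contains one non-ASCII letter
        byte[] bytes = encode(name);
        System.out.println(decode(bytes).equals(name));         // round trip is lossless
        System.out.println(decodeAsLatin1(bytes).equals(name)); // wrong charset garbles it
    }
}
```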
It has one goal in life: to translate Ship-specific infoboxes.
Translating these infoboxes with regular expression search-and-replace seemed nuts to me. I couldn't bring myself to hack out code to do it. On the other hand, a small lexer with a dozen rules and 4 parse states seems to do it pretty nicely.
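To give a feel for why a handful of states beats search-and-replace, here is a hand-written sketch of a four-state scanner in plain Java. It is not the JFlex specification Hanzo uses, and the state names, the class name and the field names in the sample are assumptions made for illustration. The point it shows is that a pipe inside a nested [[link|label]] or {{template|arg}} must not end the current field, which a state machine with a depth counter handles naturally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InfoboxLexerSketch {
    // Hypothetical states; the real lexer's four states are not documented here.
    enum State { TEXT, NAME, KEY, VALUE }

    /** Returns the key=value fields of the first template found in the text. */
    public static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        State state = State.TEXT;
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        int depth = 0; // nesting of {{ }} and [[ ]] inside the current value
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String pair = i + 1 < text.length() ? text.substring(i, i + 2) : "";
            switch (state) {
                case TEXT: // outside any template
                    if (pair.equals("{{")) { state = State.NAME; i++; }
                    break;
                case NAME: // skipping the template name
                    if (c == '|') state = State.KEY;
                    else if (pair.equals("}}")) return fields;
                    break;
                case KEY: // collecting a field name up to '='
                    if (c == '=') state = State.VALUE;
                    else if (c == '|') key.setLength(0); // unnamed parameter, ignored
                    else if (pair.equals("}}")) return fields;
                    else key.append(c);
                    break;
                case VALUE: // collecting a field value, honoring nesting
                    if (pair.equals("{{") || pair.equals("[[")) {
                        depth++; value.append(pair); i++;
                    } else if (depth > 0 && (pair.equals("}}") || pair.equals("]]"))) {
                        depth--; value.append(pair); i++;
                    } else if (depth == 0 && (c == '|' || pair.equals("}}"))) {
                        fields.put(key.toString().trim(), value.toString().trim());
                        key.setLength(0);
                        value.setLength(0);
                        if (c == '|') state = State.KEY; else return fields;
                    } else {
                        value.append(c);
                    }
                    break;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "{{Infobox Ship\n|Ship name=USS Example\n"
                + "|Ship builder=[[Some Yard|Yard]]\n"
                + "|Ship displacement={{convert|1000|LT|t}}\n}}";
        parse(sample).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Once the fields are in a map, renaming them for the new template is a lookup rather than a pattern match. A real JFlex lexer would express the same states as lexical states with a rule per token.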
Requirements
The BeanShell scripts below need an environment something like this:
- jEdit version 4.3pre13 or later from http://www.jedit.org
- mwjed, the wikimedia jEdit plugin
  - mwjed has some requirements of its own; read the mwjed page carefully
- the JDiff plugin, available from inside the jEdit Plugin manager (Plugins menu, Plugin manager item, Install tab)