User:Haus/Hanzo

From Wikipedia, the free encyclopedia

0 remain
3282 completed
The program's namesake, Hattori Hanzō.

Hanzo is an experimental plug-in for jEdit built to partially automate the process of converting from bad, old-style {{Infobox Ship}} templates to shiny, new {{Infobox Ship Begin}} templates. This is a task undertaken by the Ships WikiProject and is (briefly) described at Category:Ship articles needing infobox conversion.

It's so named because The Bride was wreaking havoc with a Hattori Hanzo sword on TV while I was searching for a name for a Java class.

The program has absolutely no other use in the universe, and the only way someone else could use it would be to basically do a bit-by-bit copy of my hard drive. It depends on about a zillion other packages. That said, if you want to write a lexer for infoboxes or automate some editing processes in BeanShell, I have some notes below.

Hanzo is 100% finished with the job I created it for, having helped me convert 3,282 of 3,282 infoboxes in 5 days. There are 0 left to go. The last 50 or so infoboxes had to be cleaned up by hand, which cost extra time. While it ran, Hanzo's status could best be described as "humming along smoothly for a few hundred edits, then bursting into flames."

Feedback

If you're here, you probably saw an edit summary. I have a watch on the discussion page here. Feedback away.

Project history

3,282 infoboxes were converted in a period of 5 days, 8 hours and 27 minutes, from:

  • 10:35, 27 March 2008 (hist) (diff) USS Patrick Henry (SSBN-599)‎ (replaced infobox using Hanzo) (top)

to

  • 19:02, 1 April 2008 (hist) (diff) HMS Dragon (D35)‎ (Migrating infobox with Hanzo)

This represents about 25.55 conversions per calendar hour over the period of 128.45 hours.
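The arithmetic behind that figure can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration and are not part of Hanzo, and it assumes both timestamps are in the same time zone. For the two edits quoted above it works out to 128.45 hours and roughly 25.5 conversions per hour.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Illustrative helper only; nothing here comes from Hanzo itself.
final class ConversionRate {
    /** Length of the interval in hours, as a decimal number. */
    static double hoursBetween(LocalDateTime start, LocalDateTime end) {
        return Duration.between(start, end).toMinutes() / 60.0;
    }

    /** Average number of conversions per calendar hour over the interval. */
    static double perHour(int conversions, LocalDateTime start, LocalDateTime end) {
        return conversions / hoursBetween(start, end);
    }

    public static void main(String[] args) {
        // Timestamps of the first and last Hanzo edits quoted above.
        LocalDateTime first = LocalDateTime.of(2008, 3, 27, 10, 35);
        LocalDateTime last = LocalDateTime.of(2008, 4, 1, 19, 2);
        System.out.printf("%.2f hours, %.2f conversions per hour%n",
                hoursBetween(first, last), perHour(3282, first, last));
    }
}
```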

Related infobox issues

As of 30 March 2008, about 3,750 pages use {{Infobox Ship Begin}}, listed here. Ship infoboxes requiring conversion include approximately

  1. 2,500 {{Infobox Ship}} remaining, listed here (of the original 3,282),
  2. 50 table header 01 conversions,
  3. 1,000 table header 02 conversions,
  4. 2,161 hand-tagged articles, including subst'ed infoboxes

I haven't formally analyzed (2) and (3), but they should be mostly amenable to automation. (4) might not be as easy; it may be something of a head-scratcher.

Technical

Hanzo's main functionality comes from a lexical analyzer written in Java with JFlex. To a large extent, the program lives inside a jEdit environment. A single-purpose program, it just barely functions. It was written in four rather arduous days: about a day to write the lexer (twice, per Raymond's law), half a day to uninstall, reinstall, and fix jEdit to work with mwjed, and two and a half days to do stuff like:

  • get communication from WP to the lexer and back
  • preserve UTF-8 characters
  • do automatic page loading
  • do local diffs
  • automate to a 1-click process
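
Of the items in that list, preserving UTF-8 is the one most easily broken by accident in Java, because converting between strings and bytes falls back to the platform default charset unless one is named. The snippet below is a minimal sketch of the usual precaution, not Hanzo's actual code; the class and method names are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: always name the charset when crossing the
// String/byte boundary, so non-ASCII article text survives the trip.
final class Utf8RoundTrip {
    /** Encodes wikitext to bytes as UTF-8, regardless of platform default. */
    static byte[] encode(String wikitext) {
        return wikitext.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes bytes received from the wiki as UTF-8. */
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "Hattori Hanz\u014d"; // ends in o with macron
        System.out.println(name.equals(decode(encode(name))));
    }
}
```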

It has one goal in life: to translate Ship-specific infoboxes.

Translating these infoboxes with regular expression search-and-replace seemed nuts to me. I couldn't bring myself to hack out code to do it. On the other hand, a small lexer with a dozen rules and 4 parse states seems to do it pretty nicely.
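
To give a feel for why a state machine fits this job better than search-and-replace, here is a hand-written sketch in plain Java. It is not Hanzo's lexer and not JFlex output: the class name, the four states, and the parameter names in the test input are all made up for illustration, and the real rules may look quite different. It collects the name=value parameters of the first template in a page, keeping track of nesting so that a pipe inside [[...]] or {{...}} does not end a value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a tiny template scanner; not Hanzo's code.
final class InfoboxScanner {
    private enum State { BEFORE_TEMPLATE, TEMPLATE_NAME, PARAM_NAME, PARAM_VALUE }

    /** Returns the named parameters of the first {{...}} template, in order. */
    static Map<String, String> scan(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        State state = State.BEFORE_TEMPLATE;
        StringBuilder name = new StringBuilder();
        StringBuilder value = new StringBuilder();
        int depth = 0; // nesting of [[...]] and {{...}} inside the current value

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // True when the next character repeats this one, e.g. "{{" or "]]".
            boolean pair = i + 1 < text.length() && text.charAt(i + 1) == c;
            switch (state) {
                case BEFORE_TEMPLATE:
                    if (c == '{' && pair) { state = State.TEMPLATE_NAME; i++; }
                    break;
                case TEMPLATE_NAME:
                    if (c == '|') state = State.PARAM_NAME;
                    else if (c == '}' && pair) return params;
                    break;
                case PARAM_NAME:
                    if (c == '=') state = State.PARAM_VALUE;
                    else if (c == '|') name.setLength(0); // unnamed parameter: skipped
                    else if (c == '}' && pair) return params;
                    else name.append(c);
                    break;
                case PARAM_VALUE:
                    if ((c == '[' || c == '{') && pair) {
                        depth++; value.append(c).append(c); i++;
                    } else if ((c == ']' || c == '}') && pair && depth > 0) {
                        depth--; value.append(c).append(c); i++;
                    } else if (depth == 0 && (c == '|' || (c == '}' && pair))) {
                        // A top-level pipe or closing braces ends the value.
                        params.put(name.toString().trim(), value.toString().trim());
                        name.setLength(0);
                        value.setLength(0);
                        if (c == '}') return params;
                        state = State.PARAM_NAME;
                    } else {
                        value.append(c);
                    }
                    break;
            }
        }
        return params;
    }
}
```

A value such as [[Image:Example.jpg|300px]] comes back in one piece, which is exactly the case where splitting on every pipe goes wrong.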

Requirements

The BeanShell scripts below need an environment something like this:

  • jEdit version 4.3pre13 or later from http://www.jedit.org
  • the mwjed Wikimedia jEdit plugin
    • mwjed has some requirements of its own; read the mwjed page carefully
  • the JDiff plugin, available from inside the jEdit Plugin Manager (Plugins menu, Plugin Manager item, Install tab)

Possibly reusable bits