User:Tim Starling/Weekly reports/2008-W06


  • Completed: Preprocessor_Native, Vary options
  • In progress: Preprocessor_Hash, DumpHTML

Parser work

This week, I started work on a native implementation of the new preprocessor, that is, one written in C. It is called Preprocessor_Native. Such an implementation should be much faster than the PHP one. I began by exploring the interface between PHP and C, which I have worked with before but am still very much a beginner at.

When it came time to put some text processing into it, I decided that the PHP version needed more work to make this easier. So I wrote another implementation of the new preprocessor, one that does not use XML and does not depend on the DOM module. It is called Preprocessor_Hash. It is useful well beyond its original purpose as a model for Preprocessor_Native, and we might end up making it the default.

DumpHTML

Mark killed my static HTML dump because it was apparently using the disk on storage1 too heavily. He has told me in no uncertain terms to forget about the project for two weeks while more hardware is ordered and installed. I did a bit of investigation into the dump's disk space usage and various disk speed issues, and then took Mark's advice and moved on to other things.

Vary options

A security vulnerability report on a blog, together with my work in C earlier in the week, motivated me to do some work on Squid. Progress was quick, and I was able to post my new feature to the squid-dev mailing list after about a day and a half.

The vulnerability is not particularly severe, and it is nothing we didn't already know about. The problem is that the blogger in question has a vendetta against Wikipedia and appeared prepared to use it to attack the site.

Fixing it means sending cookies to anonymous users. Because our cached pages vary on the Cookie header, that has traditionally degraded our cache hit ratio, and with it site performance. So the cure seemed worse than the disease.

My Squid patch avoids these problems by introducing a new header called X-Vary-Options. This header allows the application server to control which cookies affect the cache. This improves the general performance of our site and also allows us to fix the security problem.