User:ST47/perlwikipedia/Bugs

From Wikipedia, the free encyclopedia

This page is a makeshift list of bugs in perlwikipedia for those who don't have Google Code accounts or don't want to create one.

New

  1. Description: get_text does not work on a non-English wiki (example bug)
    • Summary: When get_text is used on a non-English wiki, the function fails with a 404 error (I hope to God this isn't true).
    • List any relevant steps to reproduce the bug to help the developers, or a nice *nix-style patch if you've got one. Shadow1 (talk) 19:14, 30 May 2007 (UTC)
  2. Description: linksearch fails when more than 500 results are found
    • Summary: When searching for a link that has more than 500 results, the linksearch method fails with the following error:
      Can't call method "decoded_content" without a package or object reference at /usr/lib/perl5/site_perl/5.8/Perlwikipedia.pm line 531.
    • Example code: my @results = $bot->linksearch("en.wikipedia.org/wiki/H");
    • Thanks. -- JLaTondre 18:20, 16 January 2008 (UTC)
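
According to perldiag, this error means the value that decoded_content was called on was defined but was neither an object nor a package name, so whatever produced the response at that point did not hand back a response object. Perlwikipedia's linksearch code is not reproduced on this page, so the following is only a sketch of one defensive pattern: fetch results in batches using an offset, check each batch before using it, and stop when a batch comes back short. `fetch_all` and the `$fetch_batch` callback are hypothetical names standing in for the real HTTP request, and the batch size of 500 comes from the report above, not from the module's code.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Sketch of offset-based paging (hypothetical helper, not Perlwikipedia's API).
# $fetch_batch is an assumed callback: given ($offset, $limit) it returns an
# array ref of results for that page.
sub fetch_all {
    my ($fetch_batch, $limit) = @_;
    $limit ||= 500;
    my @results;
    my $offset = 0;
    while (1) {
        my $batch = $fetch_batch->($offset, $limit);
        # Validate the value before using it, rather than calling a method
        # on something that may not be an object at all.
        croak "fetch failed at offset $offset"
            unless ref($batch) eq 'ARRAY';
        push @results, @$batch;
        last if @$batch < $limit;    # a short batch means there are no more pages
        $offset += $limit;
    }
    return @results;
}

# Usage with a stub that pretends 1234 links match (no network involved).
my @all = fetch_all(sub {
    my ($offset, $limit) = @_;
    my $end = $offset + $limit - 1;
    $end = 1233 if $end > 1233;
    return [ $offset .. $end ];
}, 500);
print scalar(@all), " results\n";
```

With the stub above, the loop makes three requests (offsets 0, 500 and 1000) and stops after the third, shorter batch.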

Open

Closed

  1. Description: When running on ActivePerl on a Windows machine, the get_text method hangs in an infinite loop.
    • Summary: The loop appears to occur because the condition on line 295 is never met: $res->content contains garbled text. (This looks like an encoding problem.)
    • This occurs on my computer running ActivePerl on Windows, with the latest versions of all modules. – Quadell (talk) (random) 16:36, 6 June 2007 (UTC)
    • A work-around has been found! Shadow1 suggested I go through Perlwikipedia.pm and change all instances of ->content to ->decoded_content. This fixes it. I'm not sure whether a more seamless solution should be developed before closing this bug, though... – Quadell (talk) (random) 19:55, 6 June 2007 (UTC)
    Fixed in SVN. Shadow1 (talk) 15:56, 7 June 2007 (UTC)
  2. The code at http://perlwikipedia.googlecode.com/svn/trunk/Perlwikipedia.pm has a bug: in the _put subroutine, the variable $res is declared twice. That is easily fixed, and I can do it since I have access to the repository, but I am not sure whether the Google Code version of the code is the most recent one. Oleg Alexandrov (talk) 15:53, 20 June 2007 (UTC)
    I fixed it myself. Oleg Alexandrov (talk) 02:07, 23 June 2007 (UTC)
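
The ActivePerl work-around in the first closed bug (switching from ->content to ->decoded_content) makes sense because, in HTTP::Message, content returns the raw body bytes, while decoded_content also removes any Content-Encoding and, for text responses, decodes the bytes into Perl characters using the charset. The character-decoding part can be seen with only the core Encode module; the sample bytes below are an assumed stand-in for a UTF-8 response body, not output captured from a real request.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed sample: the UTF-8 bytes a server would send for the title "Š" (U+0160).
my $raw_body = "\xC5\xA0";

# Roughly what ->content hands back: undecoded bytes (two of them here).
my $byte_length = length $raw_body;

# Roughly what ->decoded_content adds for a UTF-8 charset: bytes -> characters.
my $text        = decode('UTF-8', $raw_body);
my $char_length = length $text;

printf "bytes: %d, characters: %d, first char: U+%04X\n",
    $byte_length, $char_length, ord $text;
```

Running it should print `bytes: 2, characters: 1, first char: U+0160`.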

3. Description: get_text fails on certain UTF-8 characters

  • Summary: If you attempt to retrieve the text of a page such as Š, the following error is produced:
 Can't escape \x{0160}, try uri_escape_utf8() instead at {path}/perlwikipedia/Perlwikipedia.pm line 64
  • Test Case: The following code segment demonstrates the problem.
 my @results = $bot->what_links_here("Caron");
 for my $result (@results) {
   my $page = $result->{title};
   print "Getting $page\n";
   my $text = $bot->get_text($page);
 }
  • Resolution: I patched my copy of Perlwikipedia.pm by doing exactly what the error message states. I don't know if this is the best approach, but it works.
 $ svn diff
 Index: Perlwikipedia.pm
 ===================================================================
 --- Perlwikipedia.pm    (revision 88)
 +++ Perlwikipedia.pm    (working copy)
 @@ -7,6 +7,7 @@
  use XML::Simple;
  use Carp;
  use Encode;
 +use URI::Escape qw(uri_escape_utf8);
 
  our $VERSION = '0.90';
 
 @@ -61,7 +62,7 @@
      my $extra     = shift;
      my $no_escape = shift || 0;
 
 -    $page = uri_escape($page) unless $no_escape;
 +    $page = uri_escape_utf8($page) unless $no_escape;
      $page =~ s/\&/%26/g; # escape the ampersand
 
      my $url =
  • Thanks. -- JLaTondre 12:00, 18 July 2007 (UTC)
I applied and tested the patch. Thanks! The new revision is available in the Google Code repository for Perlwikipedia. Oleg Alexandrov (talk) 03:28, 19 July 2007 (UTC)
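
URI::Escape's uri_escape works on bytes, so it refuses characters above \x{FF} such as Š (U+0160); uri_escape_utf8 first encodes the string as UTF-8 and then percent-escapes each byte. The sketch below reproduces that idea using only the core Encode module. `escape_utf8` is a hypothetical helper for illustration, not part of Perlwikipedia or URI::Escape, and leaving only the RFC 3986 unreserved characters unescaped is an assumption of this sketch.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-escape a (possibly non-ASCII) page title.
# Assumption: only RFC 3986 unreserved characters are left as-is.
sub escape_utf8 {
    my ($text) = @_;
    # Turn the character string into UTF-8 bytes first; plain uri_escape
    # skips this step, which is why it rejects \x{0160}.
    my $bytes = encode('UTF-8', $text);
    # Replace every byte outside the unreserved set with %XX (uppercase hex).
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

print escape_utf8("\x{0160}"), "\n";    # the title "Š"
```

This prints `%C5%A0`, the two UTF-8 bytes of U+0160, each percent-escaped.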

4. Description: get_pages_in_category() does not return images in the category

  • Summary: Can this be changed to include images as well? – Quadell (talk) (random) 13:38, 7 June 2007 (UTC)
Patch written, tested, and committed. Shadow1 (talk) 13:10, 25 August 2007 (UTC)

5. Description: get_history fails on articles with UTF-8 characters in the name

  • Summary: For articles with UTF-8 characters in the name, such as Kashō, get_history fails. The query does not return the results because the UTF-8 characters need to be escaped. I added $pagename = uri_escape_utf8($pagename); to the start of get_history, and it fixed the problem. The same problem will occur with any other function that uses _get_api. It cannot be fixed by simply escaping $query within _get_api, because that would also escape characters that shouldn't be escaped (e.g. the & in &action).
  • The following is the diff for the change I made. Thanks. -- JLaTondre 00:14, 27 August 2007 (UTC)
236a237,238
>     $pagename = uri_escape_utf8($pagename);
>
Committed to SVN, along with some other functions with the same bug. Should be rolled out in version 1.01 soon. Shadow1 (talk) 01:05, 27 August 2007 (UTC)
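
The key point in the last bug is ordering: each parameter value has to be escaped before the values are joined with = and &, not after the full query string has been assembled. A minimal sketch of that ordering follows; `escape_component` and `build_query` are hypothetical helpers (not the actual _get_api code), the parameter names are only illustrative, and the UTF-8 step uses the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-escape one key or value as UTF-8.
sub escape_component {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

# Hypothetical helper: escape each key and value separately, then join.
# The '=' and '&' delimiters are added afterwards, so they stay literal,
# while an '&' inside a title becomes %26 instead of splitting the query.
sub build_query {
    my (@pairs) = @_;
    my @parts;
    while (my ($key, $value) = splice @pairs, 0, 2) {
        push @parts, escape_component($key) . '=' . escape_component($value);
    }
    return join '&', @parts;
}

print build_query(action => 'query', titles => "Kash\x{014D}"), "\n";
```

This prints `action=query&titles=Kash%C5%8D`: the delimiters survive, and only the ō inside the title is escaped.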