User talk:ST47/perlwikipedia
Unix
I'm running ActivePerl on a Windows box, and perlwikipedia doesn't seem to work. Am I correct that pw assumes a Unix environment? For instance, perlwikipedia.pm says:
system("test -s \".perlwikipedia-$editor-cookies\"");
Is "test" a Unix command? – Quadell (talk) (random) 17:44, 24 May 2007 (UTC)
- I have a similar issue, and others have reported it as well. Shadow doesn't get too happy when we talk about Windows around him, though, so I suppose it won't be fixed. I think it's an encoding error, and I've said several times that I was going to look at it, but I never got around to it. --ST47Talk 18:04, 25 May 2007 (UTC)
- Oops! Yeah, "test" is a Unix command to check if a file exists, I'll write a better handler for this code ASAP and commit it to SVN. Shadow1 (talk) 18:58, 25 May 2007 (UTC)
- Ok, the code should now work on Windows/ActiveState if you update your working copy. Thanks for reminding me that not everyone uses Linux! Shadow1 (talk) 19:08, 25 May 2007 (UTC)
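The portable check discussed above can be sketched with Perl's built-in -s file test, which does the same job as the Unix "test -s" command without shelling out. This is a minimal sketch, not the code that was committed to SVN; the helper name cookie_file_nonempty is hypothetical, and a temporary file stands in for the real cookie file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: true if the file exists and has non-zero size.
# Perl's built-in -s file test needs no external "test" program,
# so it behaves the same on Windows and Unix.
sub cookie_file_nonempty {
    my ($path) = @_;
    return (-s $path) ? 1 : 0;
}

# Demonstration with a temporary file standing in for the cookie file.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "cookie data\n";
close $fh;

my $has_data = cookie_file_nonempty($filename);
my $missing  = cookie_file_nonempty("$filename.does-not-exist");
print "non-empty: $has_data, missing: $missing\n";
```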
Thanks for the quick turnaround! Now it still errors out, but at a different location. When I try to log in, I get:
Error requesting Special%3AUserlogin: 403 Forbidden
When I turn on debug, just before it dies it tells me
Retrieving http://en.wikipedia.org/w/index.php?title=Special%3AUserlogin&action=edit
Of course I can enter this URL in my browser and not get a 403 error. Is this an incompatibility with Windows, or something else? Any ideas? – Quadell (talk) (random) 19:55, 25 May 2007 (UTC)
- That's because the user agent is blocked, you need to change it to something specific to your bot if you want to do anything. --ST47Talk 23:57, 25 May 2007 (UTC)
List?
Another question: Is there a list (or category) of bots using perlwikipedia? – Quadell (talk) (random) 17:54, 24 May 2007 (UTC)
- I just created Category:Perlwikipedia bots. Shadow1 (talk) 18:58, 25 May 2007 (UTC)
More tech support
Hi. Perlwikipedia looks like a great tool, and I'd love to use it, but I can't get it to work. The supposed test script, login.pl, does not seem to work as-is. (I get "Error requesting Special%3AUserlogin: 403 Forbidden".) ST47, above, suggested I add the line "$editor->{mech}->agent('w/e');" to specify the user agent. When I do that, I get this error: "There is no form named "userlogin" at C:/Perl/lib/Perlwikipedia.pm line 102. Died at C:/Perl/lib/WWW/Mechanize.pm line 1684."
If I can't get this to work, I'll have to find some other way to interface with Wikipedia. Any help anyone could provide would be greatly appreciated. (I'm using ActivePerl on a Windows box, by the way.) Thanks, – Quadell (talk) (random) 14:21, 28 May 2007 (UTC)
- First, 'w/e' means 'whatever', so replace that with something descriptive. I usually use Bot/WP/EN/ST47/BotName. I don't know what that error means, but make sure you have the latest version and such. --ST47Talk 14:31, 28 May 2007 (UTC)
- Unless you are using the passwordless login method that I described on the Google Code wiki, there is no reason you should need to use Login.pl. It's a script that is designed to fetch the login data for your bot's account and place it into a file so that your bot can log into Wikipedia without using a password in cleartext. From what I've seen, the source code you're using should work perfectly fine if you insert the bot's password into the right place in the login() call. Shadow1 19:06, 30 May 2007 (UTC)
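The user-agent advice above can be sketched as follows. To keep the example runnable with core modules only, it uses HTTP::Tiny rather than WWW::Mechanize; a WWW::Mechanize object takes the same kind of string through its inherited agent() method, as in the $editor->{mech}->agent(...) line quoted earlier. The operator and bot names are placeholders, not real accounts.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Placeholder name following the Bot/WP/EN/<operator>/<bot> pattern
# suggested above; substitute your own operator and bot names.
my $agent_string = 'Bot/WP/EN/ExampleUser/ExampleBot';

# Set the descriptive User-Agent when the client is created.
my $http = HTTP::Tiny->new(agent => $agent_string);
print "User-Agent: ", $http->agent, "\n";
```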
New problem. It logs in fine, but when attempting to get_text, on a Windows system, it puts itself in an endless loop. (It works fine on a *nix system.) My code looks like this:
use Perlwikipedia;
use strict;
my $pw=Perlwikipedia->new();
$pw->{debug} = 1;
$pw->{mech}->agent('Bot/WP/EN/Quadell/polbot');
my $login_status=$pw->login('Polbot','(my password)');
die "I can't log in." unless ($login_status eq 'Success');
my $html = $pw->get_text('User:Polbot');
The output on a Windows system (with debug on) is as follows:
Retrieving http://en.wikipedia.org/w/index.php?title=Special%3AUserlogin&action=edit
Login as "Polbot" succeeded.
Retrieving http://en.wikipedia.org/w/index.php?title=User%3APolbot&action=edit&oldid=&section=
Retrieving http://en.wikipedia.org/w/index.php?title=&action=edit
Retrieving http://en.wikipedia.org/w/index.php?title=&action=edit
Retrieving http://en.wikipedia.org/w/index.php?title=&action=edit
Retrieving http://en.wikipedia.org/w/index.php?title=&action=edit
. . .
It continues trying to load a page with no title specified until I cancel the program. This seems to be because m/var wgAction = "edit"/ doesn't match, so the until condition is never met. Debugging, I tried to print $res->content from within the get_text definition, and it seems to be complete gobbledygook. Is there an encoding problem, maybe? – Quadell (talk) (random) 15:58, 31 May 2007 (UTC)
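The endless loop described above happens when an until condition can never become true. One common guard is to cap the number of attempts. The following is only an illustrative sketch, not Perlwikipedia's actual get_text code: fetch_page is a hypothetical stand-in that always returns unreadable bytes, and the cap of 3 is an assumed value.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a page fetch; it never returns the expected
# marker, mimicking the unreadable responses described above.
sub fetch_page { return "\x1f\x8b garbled bytes" }

my $max_attempts = 3;   # assumed cap, not a value taken from Perlwikipedia
my $attempts     = 0;
my $content;

until (defined $content && $content =~ /var wgAction = "edit"/) {
    if ($attempts >= $max_attempts) {
        undef $content;   # give up instead of looping forever
        last;
    }
    $attempts++;
    $content = fetch_page();
}

print defined($content)
    ? "got edit page\n"
    : "gave up after $attempts attempts\n";
```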
- Install the module Compress::Zlib. For some reason, the servers like to return gzip-compressed content, so installing this module should fix the last of your problems. Shadow1 (talk) 16:22, 31 May 2007 (UTC)
- I installed Compress::Zlib, but it does the same thing. – Quadell (talk) (random) 17:35, 31 May 2007 (UTC)
- The only other problem I can think of is that there's something wrong with your installation of ActiveState/WWW::Mechanize that's causing it to not properly decode the content. In the actual Perlwikipedia.pm file, change
use WWW::Mechanize;
to
use WWW::Mechanize::Gzip;
and
WWW::Mechanize->new( cookie_jar => {}, onerror => \&Carp::carp );
to
WWW::Mechanize::Gzip->new( cookie_jar => {}, onerror => \&Carp::carp );
.
Other than that, I really can't help you much more. Shadow1 (talk) 19:23, 1 June 2007 (UTC)
- Actually, no, never mind that. The author of WWW::Mechanize recently removed support for decoding Gzipped content via content(), so make sure you're using the latest version of the module. It should be version 1.30. Update the module and you should be fine. Shadow1 (talk) 13:14, 2 June 2007 (UTC)
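To check whether a response body is still gzip-compressed, as suspected in this thread, the first two bytes can be compared against the gzip magic number before decoding. This is a minimal sketch using the core IO::Compress modules; the helper name maybe_gunzip is hypothetical, and the "response" is simulated in memory rather than fetched from Wikipedia.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: return readable text whether or not the body is gzipped.
sub maybe_gunzip {
    my ($body) = @_;
    # Gzip streams start with the two magic bytes 0x1f 0x8b.
    return $body unless substr($body, 0, 2) eq "\x1f\x8b";
    my $plain;
    gunzip(\$body => \$plain) or die "gunzip failed: $GunzipError";
    return $plain;
}

# Simulate a compressed server response in memory.
my $page = 'var wgAction = "edit";';
my $compressed;
gzip(\$page => \$compressed) or die "gzip failed: $GzipError";

my $decoded = maybe_gunzip($compressed);
print "marker found\n" if $decoded =~ /var wgAction = "edit"/;
```

If the marker only appears after a step like this, the content is arriving compressed and is not being decoded further up the stack.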
- I have the latest WWW::Mechanize, v1.30. It's not a problem with Mechanize. The following code works as expected:
my $agent = WWW::Mechanize->new('polbot');
$agent->get("http://en.wikipedia.org/w/index.php?title=Main_page&action=view");
print ($agent->{content});
- But this code hangs forever:
my $pw = Perlwikipedia->new();
$pw->{mech}->agent('Bot/WP/EN/Quadell/polbot');
print ($pw->get_text('Main page'));
New sub I created
Hey. I created a new sub that I use in my Perlwikipedia.pm. You might want to consider adding it to the official release. You pass in an image name, it returns an array of all articles that include the image (from the "File links" list).
=item get_file_links($pagename)

Returns array containing the pages that link to an image or other media.

=cut

sub get_file_links {
    my $self = shift;
    my $pagename = shift;
    my $res = $self->_get( $pagename, 'view');
    unless ($res) { return; }
    unless ($res->decoded_content =~ m/\(pages on other projects are not listed\):<\/div><\/p>\n<ul>(.*?)\n<\/ul>/s) { return; }
    my $linklist = $1;
    my @articles = split(/\n/, $linklist);
    my @return;
    foreach my $article (@articles) {
        if ($article =~ m/<li><a href=\"[^"]*\" title=\"([^"]*)\">/) {
            push(@return, $1);
        }
    }
    return @return;
}
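The list-parsing step of the sub above can be tried in isolation. The sketch below runs the same kind of title-extracting pattern against a made-up HTML fragment; the fragment and page names are invented for illustration and are not real output from an image page, and the pattern that locates the list is simplified.

```perl
use strict;
use warnings;

# Made-up fragment shaped like the "File links" list the sub expects.
my $html = join "\n",
    '<ul>',
    '<li><a href="/wiki/Example_article" title="Example article">Example article</a></li>',
    '<li><a href="/wiki/Another_page" title="Another page">Another page</a></li>',
    '</ul>';

my @titles;
if ($html =~ m{<ul>(.*?)\n</ul>}s) {
    # One list item per line; pull the title attribute out of each link.
    for my $item (split /\n/, $1) {
        push @titles, $1 if $item =~ m{<li><a href="[^"]*" title="([^"]*)">};
    }
}
print "$_\n" for @titles;
```

Because this relies on the exact HTML layout, any change to the page markup would silently produce an empty list.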
what_links_here
The behavior of what_links_here seems problematic to me. It is currently returning not only pages that link to the specified page, but also pages that link to redirects to the specified page. However, it doesn't return the first page that links to a redirect.
For example, look at Jill Gascoine and Jill Gascoigne. Jill Gascoigne is a redirect to Jill Gascoine. If I compare the what_links_here results for both pages, the results for Jill Gascoine include all of those for Jill Gascoigne except for Morecambe and Wise, which is missing.
It seems to me that what_links_here should only return pages that actually link to the requested page. Returning links to redirects doesn't seem that useful as I would rather specifically request what_links_here on the redirect if that's what I want, but perhaps I'm overlooking something.
So, I recommend either that:
- what_links_here be fixed to return the first page linking to a redirect; or
- what_links_here's Special:Whatlinkshere screen-scrape be replaced with a call to api.php, which only returns direct links.
The benefit of the second is that api.php also supports filtering by namespace which would be convenient in some applications.
If there is interest in the api.php approach, I am willing to write the patch for that. -- JLaTondre 19:41, 4 August 2007 (UTC)
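The api.php approach proposed above could look roughly like the following. This sketch only builds the query URL and makes no network request; it is not the patch that was later applied. It uses the MediaWiki API's list=backlinks module with its bltitle and blnamespace parameters, while the helper names uri_encode and backlinks_url are hypothetical.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved URL characters.
# Assumes ASCII input, which is enough for this example.
sub uri_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Hypothetical helper: build a backlinks query URL for one title.
sub backlinks_url {
    my ($title, $namespace) = @_;
    my @params = (
        action      => 'query',
        list        => 'backlinks',
        bltitle     => $title,
        blnamespace => $namespace,   # 0 = article namespace
        format      => 'json',
    );
    my @pairs;
    while (my ($key, $value) = splice(@params, 0, 2)) {
        push @pairs, $key . '=' . uri_encode($value);
    }
    return 'http://en.wikipedia.org/w/api.php?' . join('&', @pairs);
}

my $url = backlinks_url('Jill Gascoine', 0);
print "$url\n";
```

The blnamespace parameter is what provides the namespace filtering mentioned above.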
- It should work now. Shadow1 (talk) 13:44, 25 August 2007 (UTC)
- Thanks. -- JLaTondre 23:56, 26 August 2007 (UTC)
CPAN
I started using this module and it looks fine. Of all the Perl bot frameworks I tried, this is the first that I could install and make do the right thing pretty quickly. Thanks for your work!
A question: Is there a reason you don't host this module on CPAN? CPAN is the natural place to look for Perl code, but anyone who searches CPAN for "MediaWiki" today finds the module of that name, which has impressive documentation, but appears to be unmaintained. Finding your framework on Wikipedia wasn't so trivial. --Amir E. Aharoni (talk) 15:47, 2 June 2008 (UTC)