User talk:Beland/Notable
From Wikipedia, the free encyclopedia
New categories list
Hi there! You said on WP:VP that you could create a list of new cats from the monthly database dump. If it's not too much trouble I'd really appreciate that. Yours, Radiant_>|< 11:56, Jun 17, 2005 (UTC)
- I'd like to see that one for reference, but I doubt I'd do anything with it until the next version. The idea is to set up a 'new category patrol', basically checking whether newly created categories fit the naming scheme. Ultimately I'd like to set up a WikiProject on Categories that would examine the entire present list of categories. That sounds daunting, but many of them can instantly be approved, e.g. <nation> <profession> combos. The reason is that categorization is a mess. I'd also like to get in touch with the devs and see what upgrades might be done in the future, but they are somewhat less than reachable. Yours, Radiant_>|< 10:28, Jun 23, 2005 (UTC)
- Posted at User:Radiant!/new-categories-2005-06-23 (303k). Sorry it took so long. Good luck getting the category system into shape! -- Beland 7 July 2005 01:31 (UTC)
Dahl Book
So how did you like How Democratic is the American Constitution? I agree with Dahl that the Senate is egregiously undemocratic. I also think it's a joke that small states supposedly need the Senate so that the big states don't gang up on them.
A great (and the only, AFAIK) survey of how small states abuse their disproportionate power in the Senate is "Sizing Up the Senate: The Unequal Consequences of Equal Representation." It's thanks to the Senate that we have Wyoming getting $37 per person for Homeland Security and California getting ~ $5. Dinopup 12:18, 19 May 2005 (UTC)
- It's funny that you mentioned that Congressmen are more into pork barrel work than Senators. One of the most interesting findings of Sizing Up the Senate is that small-state Senators tend to operate in the Senate exactly the way Congressmen operate in the House. Small-state Senators tend to sit on committees that do constituent service (like Appropriations) whereas large-state Senators tend to sit on committees that conduct affairs of national importance. Since 1947, the Senators on Appropriations have come from states with an average of 5.29 Congressmen, the Senators on Energy and Public Works from states with an average of 3.29 Congressmen, the Senators on Veterans Affairs from states with an average of 4.61 Congressmen, and the Senators on Commerce from states with an average of 6.18 Congressmen. By contrast, the Senators on Foreign Relations, Small Business, Labor, and Banking come from states with an average of 7.63 to 8.89 Congressmen.
- If a vote in the Senate is going to be close, small-state Senators are much more likely than big-state Senators to hold out in hope of getting something for their state. Lee and Oppenheimer analyze over thirty votes that were delayed because of holdouts and find that small-state Senators were the ones holding out over half the time. If a big-state and a small-state Senator are both holding out, the small-state Senator is more likely to be the one rewarded, since a reward to his state is less expensive than a reward to a large-state Senator's. One egregious example of this happening recently was James Jeffords holding out on the prescription drug bill. He voted for it after HHS agreed to subsidize Burlington, VT hospitals the same amount they paid Boston, MA hospitals for labor costs.
- Many people say that the Senate protects rural interests, but that is false; it merely protects people who live in small states. Essex County, New York is as rural as Essex County, Vermont, yet it is only Essex County, Vermont that is privileged. Also, what is rational about giving half the West Coast representation equal to that of the metropolitan area of Providence?
- I agree with you that the Senate apportionment scheme is unchangeable, but I hope we can change the terms of debate. Perhaps the Senate could be weakened? Perhaps the power to confirm judges could be given to the House? Or perhaps we could restrict the Senate's power over appropriations?
- It is possible that this issue will rise in prominence in the next few decades. Every census since 1790 has revealed that more and more of our population is concentrated in a few states. In 1790, half of the Senate was elected by ~33% of the population. In 2000 half of the Senate was elected by 17% of the population.
- It won't be long before we can speak of Wyoming, Vermont, and North Dakota as rotten boroughs. Dinopup 14:48, 20 May 2005 (UTC)
Image:American Civil War Battles by Theater, Year.png
Thanks for the comments. The next thing I would like to do with the map is turn it into an imagemap, so that when you click on a county or state, it lists all the battles in that county or state, but that'll have to wait till Wikipedia supports such features. Another future possibility would be to use a map that accurately depicts counties, states, and territories back then, although those boundaries changed all the time. --brian0918™ 03:06, 28 Jan 2005 (UTC)
Trinidad towns
I suppose my question wasn't quite clear - I was just wondering whether I should rename the category 'Towns of (in) Trinidad and Tobago' to 'Cities and towns' - as it stands there are only two "cities" in Trinidad, and I included them in the "towns" list (the difference in that case is somewhat trivial, because the "cities" are the second and third largest 'towns', while the largest 'town' wasn't even a municipality until 1990). The thought was really - if I choose to rename the category, should I wait until after the change has been run (and then empty and CFD the old category)? I assume that would be less confusing than to empty and CFD the category so that it appears in two places. Of course, no one has complained about the naming as it stands - maybe "towns" is fine for something as small as Trinidad. Sorry about the rambling... Guettarda 04:49, 17 Dec 2004 (UTC)
- Thanks very much. Legally municipalities covers two entities in Trinidad - "cities" (the City of Port of Spain and the City of San Fernando) and "boroughs" (the Borough of Chaguanas, the Borough of Point Fortin and the Royal Borough of Arima). The term "town" is generally used for these and any other settlement of some size and history, while "village" might be used for smaller entities. The usage of "town" for things like Siparia, Couva, St. Joseph and Tunapuna (to pick a few examples) is historical, covers a valid entity and (in several cases) an administrative center, but not an "incorporated municipality" with any legal standing. Nonetheless, leaving the examples I stated out of a list wouldn't make sense (if the articles existed) - St. Joseph, for example, is the oldest European town in Trinidad (founded in the 1500s). Similarly, Scarborough in Tobago is the administrative centre, and it's the largest town on the island, but it does not have a legal status distinct from the rest of the island. On the other hand, laws that apply to "built up areas" would cover these examples (for example, there are two speed limits in Trinidad, one for built up areas and one for open areas).
- My point (if I have one) is that there are legal distinctions between cities and towns, and there are also legal distinctions between municipalities and other towns. It doesn't fit well into the American model(s) of towns and cities (my favourite is the "City of Atqasuk" in Alaska - which has 200 people and is only accessible by air - or snowmobile/dog sled in winter). Legally it's a city, but when I was there in 1996 there was a plaque on the wall with a greeting from President Clinton to the "Village of Atqasuk". It seems weird to me that the two extremes can be used interchangeably. Guettarda 13:54, 20 Dec 2004 (UTC)
Popups tool
Congratulations on being made an admin! I thought you might like to know of a JavaScript tool that may help in your editing by giving easy access to many admin features. It's described at Wikipedia:Tools#Navigation_popups. The quick version of the installation procedure for admins is to paste the following into User:Beland/monobook.js:
// [[User:Lupin/popups.js]] - please include this line
document.write('<script type="text/javascript" src="'
  + 'http://en.wikipedia.org/w/index.php?title=User:Lupin/popups.js'
  + '&action=raw&ctype=text/javascript&dontcountme=s"></script>');
popupAdminLinks = true;
Give it a try and let me know if you find any glitches or have suggestions for improvements! Lupin 01:36, 8 September 2005 (UTC)
Pearle update note
Hey, thanks for the note on the Pearle update. I still don't have my cable back on since the storm, so I haven't had much chance to get on; I'm using my laptop at my sister's house right now. Thanks again. ∞Who?¿? 02:20, 18 September 2005 (UTC)
Bots
Hey Beland. I'm beginning to feel that it is necessary to have a group of people who are willing to commit themselves to reviewing bot proposals on the English Wikipedia. I really dislike that I seem to be the only one handing down the judgments on Wikipedia talk:Bots. Such a group or committee should be formed of technical and non-technical Wikipedians who can determine whether a bot is harmless and useful. While bots are useful on Wikipedia, I do feel that it is the responsibility of the bot owner to maintain the bot and to check whether it is doing the job correctly. Failure to do so, to me, violates the "harmless and useful" part. Non-technical members of such a group are necessary for judging the character of a bot owner and whether the bot owner is responsible enough to keep their bot well maintained. What do you think? --AllyUnion (talk) 05:51, 11 September 2005 (UTC)
- Well, I'll add Wikipedia talk:Bots to my "actively monitoring" list and try to check there more often and opine on others' proposals. Actually, I've been meaning to go through the proposals there and archive resolved ones. I added a link from "Things to watch" on Wikipedia:Maintenance, and a link from the "Get involved" section of the Community Portal. I also added the page itself to Category:Wikipedia proposals. Something could perhaps be added to WP:RFC. If you'd like to make a list of active participants on Wikipedia:Bots or Wikipedia:Cleaning department or wherever, fine by me, and feel free to add my name. But I've noticed that simply making a list of participants doesn't seem to keep people involved. People (including myself) tend to forget they've signed up as soon as something else catches their attention. And certainly adding a layer of bureaucracy that takes time to manage would only make the problem worse.
- Personally, my watchlist is too big to monitor on anything but the timescale of months, but other people might be more active if they were encouraged to watchlist the page. Increased visibility may also help, by reminding participants that the page exists, and by attracting new participants. Occasionally posting on the Village Pump or filing RFCs for interesting bots may help do that. And certainly it's good of you to remind us bot operators on our personal talk pages of our civic duty to participate. 8) -- Beland 06:10, 11 September 2005 (UTC)
User:Mr_beland
Watch this guy, just created and may be out to get ya. - Trevor MacInnis(Talk | Contribs) 23:02, 19 September 2005 (UTC)
Parse::MediaWikiDump
Hello Beland,
I noticed you created the duplicated sections cleanup project script - nice work! That's a really good idea. I read the script and noticed you use a two-step process - one pass to convert the dump file to an intermediate format and a second to process that intermediate format to get the results. I created Parse::MediaWikiDump specifically to avoid the two-step process. It's available on CPAN if you would like to use it. If the module doesn't quite meet your needs I would appreciate some feedback so I can make it better - it's my hope that people using Perl to work with the MediaWiki dumps never have to process the dump file on their own again. Triddle 18:11, 15 September 2005 (UTC)
- Oh, thanks for pointing that out. I prefer to use a two-stage process for efficiency reasons, actually. I have several scripts that look at the raw input, and I don't want each of them to have to parse the full XML. By storing a simplified version, I can also run grep directly on the file, which is much faster than a Perl script that does the same thing. But I have added a link to Wikipedia:Database download so others who might want to use it will be able to find your library. Thanks for sharing it with us! Speaking of database dumps, a new one was published a few days ago. Were you planning on updating Wikipedia:Most wanted stubs? -- Beland 06:02, 18 September 2005 (UTC)
- At this time, no. Hopefully in the near future I'll be able to scrape the pieces of wpfsck back together, update them for the new dump file format, and be able to generate the full suite of cleanup reports again. The dump file changes have left me with some figuring out to do with regard to getting wpfsck going again, so I've been knocked out of commission for a while. On a side note, you sound like you have some experience with parsing XML; Parse::MediaWikiDump was the first time I've done anything with XML and I think the speed is suffering because of it. Would you mind if I picked your brain regarding XML parsing tricks? Triddle 18:30, 18 September 2005 (UTC)
XML parsing
Well, I don't really use any fancy XML parsing tricks. I just assume that there's one <title> and one <text> tag (possibly with attributes) inside each <page> tag. I just capture the contents with regular expressions. If the XML is any more irregular than that, I don't detect any inconsistency. Which may lead to improper operation, but enh. -- Beland 10:17, 19 September 2005 (UTC)
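The regex-only extraction described above can be sketched as follows (Python here for illustration, since the original scripts were Perl; the tag names follow the MediaWiki export format, and the shortcuts and failure modes are exactly as admitted above - irregular XML is silently mishandled):

```python
import re

# Capture the contents of the first <title> and first <text> tag inside
# each <page> element, tolerating attributes on <text> but not nested
# markup or anything else unusual.
PAGE_RE = re.compile(r"<page>(.*?)</page>", re.DOTALL)
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)
TEXT_RE = re.compile(r"<text[^>]*>(.*?)</text>", re.DOTALL)

def pages(dump_text):
    """Yield (title, wikitext) pairs; misbehaves quietly on irregular XML."""
    for m in PAGE_RE.finditer(dump_text):
        body = m.group(1)
        t = TITLE_RE.search(body)
        x = TEXT_RE.search(body)
        if t and x:
            yield t.group(1), x.group(1)

sample = '<page><title>Foo</title><text xml:space="preserve">Bar</text></page>'
print(list(pages(sample)))  # [('Foo', 'Bar')]
```

This is far faster than a full XML parse, which is presumably why it was the method of choice; the trade-off is that entity decoding and any irregular structure are simply ignored.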
Empty image description pages versus Commons images
I suspect it might require a script to discover all instances of the issue we've picked up with Image:William-Adolphe Bouguereau (1825-1905) - A Young Girl Defending Herself Against Eros (1880).jpg ... I know that all of the *(aka).jpg images have the same issue, and presumably many others do. Untagged Image troops like me will click on the image, see it's from the Commons, assume that the listing was a timing snafu, and change nothing ... the same image will be offered in the next run of untagged images. A script to recognise all images on the Commons with an identically named and blank image description page on EN would assist. Even then, I'm not sure what's going on: are we saying that we have image description pages which are not linked to an image? How odd. If so, is another approach to recognise that condition within a script? --Tagishsimon (talk)
- Getting people to delete the description page feels somehow counterintuitive to me, at least insofar as it seems difficult to describe the appearance of such pages: presumably they are those which do not have a File history section. And do they get placed on AFD or IFD? If the latter, there is not in fact an image to delete, only a description page. By contrast, the computational approach produces a fairly unambiguous list of images. Clearly the situation is not acceptable; the only question is one of remedy. We can start by posting a note about the problem on the UI page; I feel suitably guilty about suggesting you add this to your ever-lengthening list of tasks ;) --Tagishsimon (talk)
Dump-based conversion
So I have a list of 11,610 articles that contain HTML entities (&foo; and &#XXX;) and URL-encoded characters (%NN) in links to other articles. Converting these to native Unicode characters would be convenient for me in the way that I analyze offline database dumps. I also think links are the most confusing place for weird characters to be, from an editor's point of view. Would you be able and willing to feed this list to Curpsbot-unicodify? I also have a list of 3,000 or so articles with links that have double, leading, or trailing spaces. Does the bot fix these cases as well?
In the long run, it seems like it would be useful to feed the bot a list of articles that need to be fixed, rather than trawling various categories for candidates. If you would like me to provide such lists (or some scripts to produce them from database dumps), let me know.
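A rough sketch of the conversion being requested (Python for illustration; the helper names are mine, and Curpsbot's real logic is surely more careful, e.g. about characters that are invalid in page titles):

```python
import html
import re
from urllib.parse import unquote

# Match [[target]] or [[target|label]]; only the target is rewritten.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(\|[^\[\]]*)?\]\]")

def normalize_target(target):
    # &eacute; / &#233; / &#xE9; -> é, and %C3%A9 (UTF-8) -> é
    target = html.unescape(target)
    target = unquote(target)
    return target

def unicodify_links(wikitext):
    """Rewrite link targets to native Unicode; displayed text is untouched."""
    def repl(m):
        return "[[" + normalize_target(m.group(1)) + (m.group(2) or "") + "]]"
    return LINK_RE.sub(repl, wikitext)

print(unicodify_links("[[Caf%C3%A9]] and [[&Eacute;mile Zola|Zola]]"))
# [[Café]] and [[Émile Zola|Zola]]
```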
Thanks!
Beland 01:52, 18 September 2005 (UTC)
- Sure, it would be useful to have that list, and I could run the bot over it. One problem, though: I don't use e-mail, so perhaps the easiest thing to do would be to dump it into the Wikipedia:Sandbox and then give me a link to the revision in question.
- The bot doesn't currently fix leading, trailing, or double spaces in links (or double underscores), but could easily be modified to do so.
- You are right that it would make more sense to use database dumps as a way of generating targets for the bot rather than trawling categories. I'm pretty sure I looked at a page once that had links to database dumps, perhaps you can point me to it. What are your scripts written in? -- Curps 05:13, 18 September 2005 (UTC)
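The whitespace cleanup discussed above could be sketched like this (Python, purely illustrative; note that in an unpiped link the target is also the displayed text, so trimming blanks there is a user-visible change and a real bot would need to handle that carefully):

```python
import re

# Match a link, allowing stray blanks around the target; the target
# itself cannot contain brackets or a pipe.
LINK_RE = re.compile(r"\[\[\s*([^\[\]|]*?)\s*(\|[^\[\]]*)?\]\]")

def tidy_link_spaces(wikitext):
    """Trim leading/trailing blanks and collapse runs of spaces or
    underscores inside link targets (underscores and spaces are
    equivalent in MediaWiki titles)."""
    def repl(m):
        target = re.sub(r"[ _]+", " ", m.group(1))
        return "[[" + target + (m.group(2) or "") + "]]"
    return LINK_RE.sub(repl, wikitext)

print(tidy_link_spaces("[[ George  Washington ]] and [[New__York|NY]]"))
# [[George Washington]] and [[New York|NY]]
```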
- Well, to avoid non-ASCII character conversion problems, I had Pearle upload the files directly. I'll delete them when they are no longer needed. You can edit User:Pearle/for-curps to get the weird-character list, and User:Pearle/for-curps2 to get the extra-spaces list. Database dumps are found at: http://download.wikimedia.org/wikipedia/en/
- Further information on dumps is at Wikipedia:Database download. My scripts are written in Perl, though it's pretty trivial to write something that will search for a certain string in raw wikitext. As long as you don't mind downloading and storing a gigabyte or two. -- Beland 05:52, 18 September 2005 (UTC)
- The bot has now completed the list in for-curps2, but I avoided removing leading and trailing blanks in many cases because of the possibility of unexpected user-visible changes (for example:
- text[[ link]]
- which actually occurred in one article). Some of the Template:Infobox* templates had to be reverted because underscores can occur as the names of parameters, and inside templates these can occur within [[ ]]... I'll have to give some thought to that, either avoid processing templates entirely, or perhaps just avoid doing underscores within [[ {{{ }}} ]] -- Curps 01:41, 24 September 2005 (UTC)
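A guard of the kind Curps describes could be as simple as skipping any link target that contains a template parameter (Python sketch; the function name and examples are hypothetical):

```python
import re

# A template parameter looks like {{{name}}} or {{{1|default}}};
# underscores inside it may belong to the parameter, not a page title,
# so such targets should be left alone.
PARAM_RE = re.compile(r"\{\{\{[^{}]*\}\}\}")

def safe_to_fix(link_target):
    """Return True only if the target contains no template parameter."""
    return PARAM_RE.search(link_target) is None

print(safe_to_fix("George_Washington"))     # True
print(safe_to_fix("{{{portrait_image}}}"))  # False
```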
Transwiki backlog
Hey Beland! I know what you mean. The transwiki backlog pains me every time I see it. Especially because I know from experience that I could spend all my time on Wikipedia transwikiing, but without a bot, it would grow faster than I could do the work (and it's tedious work either way). Unfortunately, the bot, my McBot, was broken by the last software upgrade. So I don't have that. However, just yesterday I got Cryptic's transwiki script working and will be using that in the future (though currently I'm swamped with moving to college. My computer is in the mail as we speak). Though it's not as automated as the old bot, it is ultimately a better solution in that everyone could get it and do a little bit of the whole backlog. In case you didn't notice, I'm trying to sell it to you, too... :) So the short answer is I'm still working on them, though at a slower rate and with less time available at the moment, so we should spread the word and get others on it too if we can. Although now that I think about it, the transwiki log backlog is just as horrendous. Humph. Guess I'm not much help to you and I'm starting to ramble. But before I go, since I don't remember having the pleasure of talking to you much before, I just wanted to say that this Beland character shows up on my watchlist like every five minutes cleaning up the Wikipedia and keeping everything in order and tidy. You're a true WikiGnome (that's a good thing of course), just like I consider myself. So thanks on behalf of all of us. Dmcdevit·t 05:56, August 7, 2005 (UTC)
- Posted plea for help on Wikipedia:Community Portal today. -- Beland 03:33, 3 April 2006 (UTC)
Proposal to help Pearle with WP:CU
Hello, I am Eagle 101, and I am the operator of User:Gnome (Bot). My bot was originally tasked with removing the cleanup tag and replacing it with any number of other tags, such as stub, wikify, expert, etc. User:Alba and I have found that the community would rather have Gnome bot add these tags alongside the cleanup tag.
User:Alba suggested a plan of action, and suggested that I come and ask whether you would be willing to cooperate in dealing with cleanup issues. Alba asked me to build this bot over a month ago, and its tasks are not yet set in stone. The bot is not yet approved to run, but I have received support from WP:BOT people on the IRC channel (that discussion has led to the current one). I believe Alba mentioned me on the cleanup sorting proposal.
User talk:Eagle 101# Gnome (Bot) topic sorting: fantastic! --- Link to Alba's suggestion and a possible way of coordinating the two bots. Tell me what you think.
User:Gnome (Bot)/Help/CleanupCriteria --- Gnome (Bot)'s current criteria. Later, in the next month or so, I would like to begin simple auto-categorization, on a strict testing-only basis (i.e. NO edits). I have the framework programmed into the bot, but I don't have all of the specific criteria programmed in; that is more or less a time-consuming task. The auto-categorization (for WP:PNA and for the assistance of your bot) is not mentioned on the criteria list yet, as it is not functional and will need extensive testing and input.
User talk:Alba# Bunch of stuff from Eagle 101 --- This link is to a more specific discussion of how the auto-categorization of the bot will work. (I will of course have to get public approval... but first I need to get it working well.)
I think I left you enough links, sorry for the long message... Hope you are willing to get the bots working together! Eagle (talk) (desk) 02:41, 11 April 2006 (UTC)
P.S. Auto-categorization will be very basic to start with, using only very specific keywords (i.e. ones that are easy to identify correctly and tag as such). The bot will probably only do the categories your bot is interested in, i.e. only those that are on WP:PNA.
I started to research categories
I have begun research on good, very specific keywords for each category; look in here to see which categories are done and which I am working on. I would really appreciate it if you would list the categories that your bot looks for (for WP:PNA), as those will be my focus; this way the leftover list can eventually be eliminated.
- Please tell me which categories your bot looks for; those categories will be the ones that my bot looks for (in article text).
Re: Proposal to help Pearle with WP:CU
Pearle does not have a pre-programmed list of categories that she uses to find articles for WP:PNA. For each topic, she reads the categories listed in the "Categories covered" section and those linked from any portals listed, plus their immediate subcategories.
The problem of dealing with "leftover" articles from the PNA page is a little different than the general problem of auto-categorization. For PNA, the problem is not to assign articles to categories, but to assign categories to topics, and hopefully WikiProjects. In some cases, a topic page may be too full, and a new one will need to be created. In other cases, there may be no appropriate WikiProject, or only a very general one.
Theoretically, looking at category relationships should be more reliable than using keywords. If category Y is a subcategory of category Z, and category Z is on-topic, then category Y might be on-topic, too. In practice, I have found that sometimes such a subcategory is on-topic, and sometimes it is off-topic. Human intervention is really needed to resolve the question.
The easiest way to do this would be to sort the "leftover" articles by category membership, and indicate if any of the categories listed are descendants of any categories already on PNA. It's easy for Pearle to do this, since she will already be reading in category lists from PNA topics, and constructing the leftover list.
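The descendant check described above amounts to a breadth-first walk of the category graph (Python sketch; the data structures and names are illustrative, not Pearle's actual ones):

```python
from collections import deque

def descendants(roots, children_of):
    """Return every category reachable downward from the given roots.

    children_of maps a category name to a list of its subcategories,
    as would be built while reading category lists from PNA topics.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        cat = queue.popleft()
        for child in children_of.get(cat, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# Tiny stand-in category tree for illustration.
children_of = {"Science": ["Physics"], "Physics": ["Optics"]}
pna = descendants(["Science"], children_of)
print("Optics" in pna)  # True
```

A leftover article could then be flagged whenever any of its categories falls inside the set returned for the categories already on PNA.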
I expect that most of the "leftover" articles will be in categories, just not any that are already listed on WP:PNA. For articles that are not in any categories at all (i.e. those on Special:Uncategorizedpages) the universe of targets should of course be all Wikipedia categories, not just those on WP:PNA. I would be wary of privileging those listed on WP:PNA in any way, as this is likely to worsen the sorting algorithm.
Back when there were far more uncategorized articles than there are today, I made some attempts to suggest categories for articles. You can see the results on Wikipedia:Auto-categorization.
A huge number were actually bot-created articles on United States municipalities, and so were easily and reliably classified automatically. After that, things became more difficult. The approach I was using was to extract links from the "See also" sections of articles, and see which categories those articles are in. This works well if there are a lot of links there, which was more often true of older, established articles than of today's situation, where mostly only newer, shorter articles are uncategorized. (That may not be entirely true yet; I'm not sure.)
If you wanted to look at article contents, it doesn't seem very scalable to have humans pick keywords for each category, and it's unclear that it would be faster for them to do that than to simply categorize all the articles manually. (There are tens of thousands of categories, after all.) Given that you already have articles assigned to categories, it would be easy enough to take a statistical approach instead. What I would do is look at word frequency, constructing a "signature" for each category in Wikipedia. You should be able to numerically identify and suppress words that have no sorting value (that would be common in many categories) like "the" or "see" and "also". You would then determine a "signature" for each uncategorized article, and find the "closest" category match. (You could get fancy and look at N-grams. I'm not sure whether that would help or hurt reliability, but it would certainly take a lot longer to run.)
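A toy version of the signature idea (Python; the hand-picked stopword list here merely stands in for the numeric suppression described above, and the tiny category signatures are stand-ins for ones built from every member article in a dump):

```python
import math
import re
from collections import Counter

# Illustrative stopword list; in practice these would be detected
# statistically as words common to many categories.
STOPWORDS = {"the", "a", "of", "and", "in", "see", "also", "is", "to", "about"}

def signature(text):
    """Word-frequency signature of a text, with stopwords suppressed."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two frequency signatures."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

cats = {
    "Category:Physics": signature("light optics energy quantum mass"),
    "Category:Music": signature("album guitar melody song rhythm"),
}

article = signature("a short stub about quantum energy and light")
best = max(cats, key=lambda c: cosine(cats[c], article))
print(best)  # Category:Physics
```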
The primary drawback to this kind of statistical analysis would be that it would probably take a large amount of CPU time and a fair amount of storage space. It would certainly have to be done offline, using a database dump. Whether you're using keywords or word frequency, there's the problem that new, unfinished articles tend to be semantically and statistically different from older, well-written articles. That will reduce the reliability of many matching algorithms, but perhaps not enough to make them useless.
Regardless of the method used for auto-categorization, it's exceedingly unlikely to be reliable enough for a bot to add articles to categories by itself. (I would expect there would be a lot of complaints if anything more than, say, one in a hundred articles were misclassified.) A good way to deal with this would be to have the bot make a list of suggested categorizations, and let human editors actually put articles into the right categories. If there's a way to distinguish "strong" matches from "weak" ones, sorting along that axis and putting the best matches first would help increase productivity. Once an article was classified by a human (whether by looking at the list or independently), it would no longer show up on subsequent reports.
-- Beland 14:37, 14 April 2006 (UTC)
Re: OK, categorization will be a low priority for me then.
My bot adds the wikify, expert, cleanup-list, disambig-cleanup, and several other tags. Is this still needed, and if so, where should I start? I was originally operating in WP:CU, working on the monthly cleanup articles in the backlog. My bot only ADDS tags; it does not remove any tags, with the possible exception of disambig-cleanup and cleanup-list. (That will depend on what others think... it does not matter to me.) Eagle (talk) (desk) 22:16, 14 April 2006 (UTC)
I probably should have stated: the bot I am referring to is User:Gnome (Bot), and these functions are already programmed and ready to go.
- I replied on User talk:Gnome (Bot)/Help/CleanupCriteria. My watchlist is too big to check regularly, so feel free to drop me a personal note if I should look at a reply there. -- Beland 01:14, 15 April 2006 (UTC)