User talk:The Anomebot2

From Wikipedia, the free encyclopedia

I am a computer program, so there's not much point in talking to me. However, you might want to talk to my owner, User:The Anome.

Latest improvements

  • I have now improved the graph traversal code to deal better with the special case of territories of other countries which have their own ISO 3166 two-letter code. For example, Aruba, which is a constituent country of the federacy of the Netherlands but is autonomous enough to have its own code, AW, will now be listed as within AW, not NL. This is only of theoretical interest at present, since these entries are eliminated by other heuristics, but when I get round to sorting them out, everything should work correctly. -- The Anome 18:11, 20 August 2006 (UTC)
  • I have added extra code to stop the addition of lat/long tags to articles which already contain OSGB tags: UK articles added by the bot from Aberaeron to Biggleswade did not use this check, and may therefore contain duplicate geotags. -- The Anome 18:36, 24 August 2006 (UTC)
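
The Aruba case above can be illustrated with a minimal sketch. This is not the bot's actual code, which is not shown on this page: the `PARENT` and `ISO_CODE` tables and the `country_code` function are hypothetical names, and the containment data is a toy example. The idea is to walk up the containment graph and stop at the first ancestor that has its own ISO 3166 two-letter code, so a place in Aruba resolves to AW before the walk reaches NL.

```python
# Toy containment graph (child -> parent). Illustrative data only.
PARENT = {
    "Oranjestad": "Aruba",
    "Aruba": "Netherlands",
    "Utrecht": "Netherlands",
}

# Entities that have their own ISO 3166 two-letter code.
ISO_CODE = {
    "Aruba": "AW",
    "Netherlands": "NL",
}


def country_code(place, parent=PARENT, iso=ISO_CODE):
    """Return the code of the nearest enclosing entity that has one, or None."""
    seen = set()
    node = place
    # Walk upwards; the 'seen' set guards against cycles in bad data.
    while node is not None and node not in seen:
        seen.add(node)
        if node in iso:
            return iso[node]
        node = parent.get(node)
    return None
```

With this toy data, `country_code("Oranjestad")` stops at Aruba, while `country_code("Utrecht")` continues up to the Netherlands.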

Australian towns

  • Noticing your excellent work on adding coords to Australian towns. You are doing a great job :-) Thanks--Arktos talk 20:20, 19 August 2006 (UTC)

Progress so far...

Progress so far:

GNS/category fusion dataset:

  • inspected 17424 articles that were uniquely taggable and not recorded as being geotagged in a recent dump
  • added geodata tags to 13290 of them: the others were already geotagged in some other way, or otherwise untaggable, and were not updated

de:/en: collation dataset:

  • Pending for inspection, 8240 articles recorded as having geotags in de: but not in en: -- a subset of these articles now dumped
  • Double-check: early versions did not spot {{geodis}}. Only a few pages got marked by mistake: fixed by hand
  • dumping second run of ~1600 articles: towns, cities, lakes and islands only, degrees-and-minutes resolution only

-- The Anome 10:50, 8 September 2006 (UTC)

  • Now doing a new pass over recently-added and recently-categorized articles, based on the 20061104 dump set. -- The Anome 01:50, 20 November 2006 (UTC)
  • Adding various new feature types: bays, glaciers, fjords, volcanoes, etc. -- The Anome 02:01, 21 November 2006 (UTC)

Interwiki link sorting

I beg you to change the sorting order or disable this ability. It is frustrating not to find Suomi before Svenska where it logically belongs because it is moved before Français (see Jyväskylä for example). Not too many readers are familiar with the language ISO codes; they most likely read the list alphabetically. Yeah, I know, there's no consensus on how the links should be listed, but still. Anyway, thanks for the coordinates thingy.--JyriL talk 12:40, 22 August 2006 (UTC)

The problem is that no-one can agree on the correct order. Some like them in English-language name order (in which case suomi should be sorted under 'F'), others in native-language name order (in which case the Albanian language should be sorted under 'S'). And in that case, where do you sort 中文 relative to Latin alphabets: under H or Z, or before or after all Latin characters? If so, why -- perhaps before, because the script is older? Or do you use raw Unicode collating order? And what about Indian or African languages, some of which have sounds not expressible in English: where would you put the !Kung language in the sorting order? Rather than trying to reorder the tags in every article, which would result in a mess of different conventions, partial combinations of both, and no order at all, and massive database churn as competing attempts are made to impose the "correct" order on millions of interwikis within articles, it would be better to sort this out in the page-rendering code.
So, given that I have to put them in some sort of order when I move them to the bottom of the article, I've chosen to sort them in ISO code order, which is neatest in the source text, and this has the advantage of annoying both the big-endians and the little-endians equally, by not taking sides in this controversy. Thus the cosmic balance is preserved.
Seriously, since this should be a page-rendering, rather than an article-formatting issue, why not file a bug in the MediaWiki Bugzilla about this? The sorting code would be trivial, and could be made configurable for those people who really, really care about the ordering of the interwiki tags one way or another. -- The Anome 23:37, 22 August 2006 (UTC)
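
To make the disagreement concrete, here is a small sketch comparing two of the orderings discussed above. It is illustrative only and is neither the bot's code nor MediaWiki's: the `links` list and the two helper functions are hypothetical. Note that Python's default string comparison is by raw code point, which is just one of the possible answers to the 中文 question and is not a locale-aware collation.

```python
# (language code, native-language name) pairs for a handful of interwikis.
links = [
    ("sv", "Svenska"),
    ("fi", "Suomi"),
    ("fr", "Français"),
    ("zh", "中文"),
    ("sq", "Shqip"),
]


def by_code(items):
    """Order by language code, as described for the bot."""
    return sorted(items, key=lambda item: item[0])


def by_native_name(items):
    """Order by native name, compared by raw code point after case-folding."""
    return sorted(items, key=lambda item: item[1].casefold())


print([code for code, _ in by_code(links)])
print([code for code, _ in by_native_name(links)])
```

In code order Suomi (fi) comes before Français (fr); in native-name order it comes after Shqip and just before Svenska, which is the placement the original complaint asks for.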

Geodata

What is the purpose of the geodata you have added to some Irish towns like Enniskerry and Blessington? I don't see any result on the page. Cheers ww2censor 14:37, 23 August 2006 (UTC)

It's displayed at the top of the article, to the right of the title. -- The Anome 18:32, 24 August 2006 (UTC)

Wonderful Bot

This bot has done great so far! How often do crazy people go and press the emergency shutoff? Can you add locations for Anjouan please? Keep it up! Felixboy 13:49, 25 August 2006 (UTC)

Thanks! I've fixed Anjouan by hand, for now, but there are many, many more candidate geodata tags of various classes yet to be added, so watch this space. -- The Anome 22:37, 24 August 2006 (UTC)

So do you do, like, whole regions at once and not move on until that area is a lot better, or do you wake up every day and say "what the heck, let's edit here"? Felixboy 14:59, 28 August 2006 (UTC)

Not so much regions as classes of feature, but, pretty much, yes. I'm progressively improving my filtering code, and each time I add another set of filter criteria, I manually QA a sample before proceeding with the entire dump. So far I've mostly steered clear of U.S. features because they are not in the GNS -- they are in GNIS, instead -- so I may do the U.S. after I've done the rest of the world. -- The Anome 10:54, 8 September 2006 (UTC)

Stevenston / Ardrossan

Your bot has given Stevenston a latitude of 55 38 N and Ardrossan 55 37 N although Ardrossan actually lies to the NORTH of Stevenston, with Saltcoats in between.

The difference is small, however. Just seems v odd to me. --NSH001 00:04, 25 August 2006 (UTC)

The problem is with the source data: these annotations can only be as accurate as the original NIMA GNS data. Ideally, the coarse lat/long data will be replaced in the long term by higher-resolution data referenced to each country's national geodetic system. -- The Anome 20:40, 25 August 2006 (UTC)

Some things

Hi Anome, nice work. You can also use my CSV data. Many articles in the German Wikipedia have a geotag, but the same article in the English Wikipedia has none. The biggest problem in the English Wikipedia is that many geocoordinates have no "region" and "type". Is it possible for your bot to fix these geotags? It would be a great help. -- Stefan Kühn 19:58, 26 August 2006 (UTC)

I'm doing this now. -- The Anome 10:56, 8 September 2006 (UTC)

Interwiki Bug

Hi Mr Bot.

I am considering pressing the emergency shutdown, since you mess up the interwiki links. The interwiki links are not ordered alphabetically by language code; they are ordered alphabetically by the language name you see in the side bar. Here is an example of how you messed up Vättern [1].

Fred-Chess 22:23, 26 August 2006 (UTC)

I did some checking: as far as I can see, as of the last discussion, which I believe was Wikipedia:Language_order_poll, there was no consensus, but the leading choice was, by a tiny margin, alphabetical order by language code, which is the ordering used by this bot.
I think the lack of a clear consensus points clearly to this being a rendering issue, not an article-formatting issue: otherwise, every time the consensus changed, hundreds of thousands of articles would need to be re-formatted. -- The Anome 23:10, 26 August 2006 (UTC)
You are arbitrarily changing the formatting of hundreds of thousands of pages. No matter the poll, suomi is put as "s" by several bots, and is so in all pages I have checked, including Venezuela, Bill Gates, Sweden, etc. I suggest you stop changing an accepted practice, since it will now take some time to clean up after you.
Fred-Chess 08:30, 27 August 2006 (UTC)
I've now changed the bot code to preserve the ordering of existing interwikis -- I'm tempted to have a long argument with you on this, but life's too short to waste on arguing the point. (However, please see my comments above, to see why this issue is almost impossible to resolve: for example, what letter would you sort 中文 under -- C, Z, H, before A, or after Z?) -- The Anome 11:55, 27 August 2006 (UTC)
OK, I see your point -- sorry if I came across too strongly. But I do think that it is imperative to have interwikis in a uniform manner, and that preference should be given to the most common practice.
I am not familiar with how ZH is usually sorted.
Fred-Chess 12:08, 27 August 2006 (UTC)

Coordinates of provinces

Hi, I have just realized that "you" have inserted coordinates for the Province of Maputo in Mozambique. Coordinates are for specific points, so I can understand using them for cities and towns, but provinces... can you give a coordinate to an area? Is it the coordinate for the capital, for the geographic center, for some corner, or for a randomly selected point? Could "you" please elaborate? Thank you Teixant 18:52, 10 September 2006 (UTC)

It's purely a representative point for centering maps on. -- The Anome 23:28, 15 September 2006 (UTC)

Bonete

Hi, I saw your bot added the coordinates to Bonete. Where did you get them from? They seem to be wrong. Check the talk page. Good wiking, Mariano(t/c) 08:10, 10 October 2006 (UTC)

Coordinates for Brahmagiri very wrong

The coordinates you added to Brahmagiri are very wrong. Please look into this problem with your system. --BostonMA talk 04:33, 14 October 2006 (UTC)

The bot-generated coordinates are only given to a precision of a few minutes of arc. This map [2] suggests that there is at least something called Brahmagiri around this location. -- The Anomebot2 22:21, 3 November 2006 (UTC)
Um, there might be a village or something there named Brahmagiri, however, the article is about a mountain on the border between Kerala and Karnataka, and is probably over 1500 km from whatever it is your map points to. --BostonMA talk 22:45, 3 November 2006 (UTC)
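
For scale, a rough piece of arithmetic, assuming the usual approximation that one minute of arc of latitude is about one nautical mile (1.852 km). The constant and function names below are made up for illustration. A precision of a few minutes corresponds to an error of a few kilometres, which is far too small to account for a discrepancy on the order of 1500 km.

```python
# One nautical mile in kilometres; approximately one arcminute of latitude.
KM_PER_ARCMINUTE = 1.852


def arcminutes_to_km(minutes):
    """Approximate north-south distance covered by the given arcminutes."""
    return minutes * KM_PER_ARCMINUTE


# Even a generous five-minute error is under ten kilometres.
print(round(arcminutes_to_km(5), 2))
```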

Multiple place names

I noticed that the bot inserted coordinates for Furnace, Llanelli which were actually those for Furnace, Scotland another of the 4 places called Furnace in the UK. Is there a problem with duplicate place names?--JBellis 19:16, 14 October 2006 (UTC)

The bot's code goes to some effort to prevent this sort of error from happening by detecting and filtering duplicates, and it has been tested extensively, but it's not infallible. If you've found any similar errors, I'd appreciate hearing about them. -- The Anomebot2 22:15, 3 November 2006 (UTC)
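
One simple form of duplicate filtering can be sketched as follows. This is an assumption about the general approach, not the bot's real logic, which is not described in detail here: the record layout, the `uniquely_taggable` function, and the placeholder coordinates are all hypothetical. The sketch keeps only names that occur exactly once within a country in the gazetteer, so an ambiguous name such as Furnace would be left untagged rather than given the wrong location.

```python
from collections import Counter

# Toy gazetteer records. Coordinates are placeholders, not real data.
records = [
    {"name": "Furnace", "country": "GB", "lat": 1.0, "lon": 1.0},
    {"name": "Furnace", "country": "GB", "lat": 2.0, "lon": 2.0},
    {"name": "Aberaeron", "country": "GB", "lat": 3.0, "lon": 3.0},
]


def uniquely_taggable(rows):
    """Map (name, country) to coordinates, skipping ambiguous names."""
    counts = Counter((r["name"], r["country"]) for r in rows)
    return {
        (r["name"], r["country"]): (r["lat"], r["lon"])
        for r in rows
        if counts[(r["name"], r["country"])] == 1
    }


safe = uniquely_taggable(records)
```

A filter like this only catches duplicates that are present in the source dataset; a namesake missing from the gazetteer would still slip through, which is one way an error like the one reported above could arise.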