Rambot FAQ
Below is a FAQ for general questions related to rambot. For information on the bot including IP address and known problems, see User:rambot. Please direct all talk about the bot to User talk:Ram-Man. See Wikipedia:FAQ for more FAQs.
What is rambot?
- The rambot is really a custom front-end program written in the Java programming language. It has performed a number of tasks, from user-interactive spellchecking to automatically modifying many thousands of U.S. city/county articles, as well as creating them from pre-generated articles based on SQL queries. What most people call rambot articles are often just the result of copying a local article from one's own computer to Wikipedia, without any knowledge of the article itself. The rambot is not so much an author as a copier or copyeditor. Loosely speaking, "rambot" could also cover the background tasks used to harvest and process data into articles, but this is technically done with human-assisted computing (and has nothing to do with the Java bot code).
What is the name of the bot?
- Ortolan88 coined the name rambot (named after User:Ram-Man of course). The name is correctly spelled in all lowercase.
There is a problem with the bot. How do I block it?
- The most recent IP address information will be posted on the bot's user page here.
Is your bot slowing down Wikipedia?
- Probably not, but it is possible. Most bot owners, including this one, try to use bots during off-peak hours and implement features in the bots to back off when problems occur. The effect of bots on Wikipedia has been discussed at Wikipedia talk:Bots at great length and will continue to change as hardware and software changes.
Can I have the source code to your bot?
- While I normally like to be open (otherwise I wouldn't be here), I don't want script kiddies ruining our experience here. Besides, an intelligent well-meaning programmer can easily duplicate the work with little effort.
I hate bots, what do I do?
- You're not the only one! See Wikipedia:Bots for a discussion on the benefits/disadvantages of bots.
How do we know the geographic data is accurate or correct?
- Due to the similar nature of all the articles, the data can be verified periodically by having the rambot check the article data against the source data. This can also be used to automatically update data as it becomes available. The source of the information can be found at Geographic references. The articles can only be as good as the sources, but they are as good as we can get on this scale. In the articles themselves the sources are referenced by numerical superscripts such as [[Geographic references|<sup>1</sup>]].
What good are these articles? They are just gazetteer entries and not encyclopedia articles. There is also too high of a percentage of these articles. We want variety!
- It is true that these articles contain information that is found in a gazetteer, but they contain more than that. Each one collects information from a variety of sources as well as individual edits from people who know the cities or counties the articles are about. Some people are uncomfortable with the idea of a bot generating articles, but these articles are often more complete than human-written stubs on other topics. A lot has been said here about the worth of the articles, but generally the best thing to do is not to complain about the articles but to add to them and make them better.
- Over Thanksgiving I was with my wife's family and we had a discussion about the demographics of my hometown and my wife's family's town. They wanted to know the very information that I had added to the respective city articles a month earlier. So I went to the computer and had their answer in less than a minute. Needless to say they were impressed, and once again I got to see how this information can be quite useful.
- This view has been stated many times here, but few actually feel that the articles should not be added. (Un?)officially, the "Random Page" feature was designed to help people find stubs to add to. If not for that feature, no one would know or care about the percentage of city entries, so in essence it is not the articles themselves that are a problem, but a single feature which is biased towards them. It is true that there is a lack of balance, but that only implies that we need more people to add articles on a variety of topics, and we will always need that. The best thing to do is work harder. See Deaf Smith County, Texas and its talk page for an example of this in action.
My favorite city XXX is missing, where is it?
- Believe it or not, even though 30,000 cities were entered, many are still missing. Over 1,000 entries could not be immediately automated and are still on the TODO list. These will be done as I find time to work on the list. If your city still does not show up, it is possible that the Census Bureau does not consider it an independent census location. Don't wait around for me to add an article, add it yourself!
If we're interested in adding further information about a city or town to its article, will the updates by the bot delete what we've added?
- No. The bot is just like anyone else and will only modify existing pages. It is even affected by edit conflicts.
Some of the cities are just neighborhoods and not really cities
- This is a known "problem". The solution is simply to update the article to replace the wrong name with the correct one (e.g. replacing "city" with "neighborhood" and rewording accordingly). One real example is that of Wheeler AFB, Hawaii which is really a U.S. Air Force base and not a town. These will get fixed as people notice them and correct the inaccuracies.
Where did all the bot entries go from Recent Changes?
- Access the recent changes page with bot entries here.
The rambot screwed up an accented character or some other character. What's up with that?
- The rambot originally could only handle 7-bit ASCII characters, so all of the extended characters were messed up. This was then fixed to support full 8-bit characters. More recently, partial support for Unicode has been added: characters that do not fit in 8 bits are converted into their HTML equivalents. If any errors still exist they should be reported to User talk:Ram-Man so they can be fixed.
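As an illustration only, the conversion described above can be sketched in a few lines. This is not the rambot's Java code; the function name `to_html_entities` is hypothetical, and the sketch assumes that "HTML equivalents" means decimal numeric character references.

```python
def to_html_entities(text: str) -> str:
    """Replace every character above code point 255 with a numeric HTML reference.

    Characters that fit in 8 bits (code points 0-255) are passed through unchanged.
    """
    return "".join(
        ch if ord(ch) <= 0xFF else f"&#{ord(ch)};"
        for ch in text
    )


# U+00F3 fits in 8 bits and is kept; U+1EC5 does not and is converted.
print(to_html_entities("Krak\u00f3w"))
print(to_html_entities("Nguy\u1ec5n"))
```

Python's built-in `"xmlcharrefreplace"` error handler gives the same result via `text.encode("latin-1", "xmlcharrefreplace").decode("latin-1")`.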
There is a duplicate article on X
Why are there two articles for "Fooville (city), Some County, State" and "Fooville (town), Some County, State?"
- The US Census Bureau sometimes lists multiple entries for the same general place, but the entries do not always cover exactly the same area. A town may be smaller than a city (similar to a town and township relationship). Sometimes one is a subentity of the other and the "city" is the governing agent. Sometimes the two entries contain the same data. Most of these duplicates are either intentional or accidents that must be corrected; telling which requires your own knowledge of the place in question. Feel free to discuss it on the talk page or try to fix it yourself. (Large) PDF maps displaying the street-by-street boundaries of the census areas are available for download, and can be useful for understanding how things are divided up.
Are the rambot entries available under alternate licenses?
- All English Wikipedia main and main talk namespace articles or edits produced by the rambot are multi-licensed as described on the rambot user page. I encourage you to multi-license your contributions as well. Most changes are not in the public domain; however, the original data for most of the city/county articles is. (See: Geographic references)
How many county and city entries are in the rambot's database?
- Approximately 3,141 counties and 33,832 cities (10,024 cities, 8,039 towns, 5,655 CDPs, 4,853 townships, 3,768 villages, 1,232 boroughs, 99 unorganized territories, 49 locations, 34 U.S. Air Force bases, 33 plantations, 16 Indian reservations, 14 balances, 9 counties, 3 gores, and 1 each of grant, municipality, purchase, and district). This results in a maximum of 36,973 articles created and maintained by the rambot.
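The totals above can be checked with a short script. The figures are copied from the answer; the variable names are only illustrative.

```python
# Counts of census places by type, as listed in the FAQ answer.
place_counts = {
    "cities": 10_024,
    "towns": 8_039,
    "CDPs": 5_655,
    "townships": 4_853,
    "villages": 3_768,
    "boroughs": 1_232,
    "unorganized territories": 99,
    "locations": 49,
    "U.S. Air Force bases": 34,
    "plantations": 33,
    "Indian reservations": 16,
    "balances": 14,
    "counties": 9,
    "gores": 3,
    "grant": 1,
    "municipality": 1,
    "purchase": 1,
    "district": 1,
}
counties = 3_141

total_places = sum(place_counts.values())
print(total_places)             # 33832, the "cities" figure
print(total_places + counties)  # 36973, the maximum article count
```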
Undoing damage (false zeros)
I posted this some time ago on Wikipedia:Bot requests and someone has pointed out what should have been obvious—that I should bring it up with you.
In all the demographics for U.S. census places, the information is cluttered up by useless false precision of the numbers. All of the marked zeros in this example below are useless, misleading information.
- There are 1,974,181 households out of which 30.90% have children under the age of 18 living with them, 44.00% are married couples living together, 15.60% have a female householder with no husband present, and 35.70% are non-families. 29.40% of all households are made up of individuals and 9.30% have someone living alone who is 65 years of age or older. The average household size is 2.68 and the average family size is 3.38.
- In the county the population is spread out with 26.00% under the age of 18, 9.90% from 18 to 24, 31.70% from 25 to 44, 20.70% from 45 to 64, and 11.70% who are 65 years of age or older. The median age is 34 years. For every 100 females there are 93.90 males. For every 100 females age 18 and over, there are 90.50 males.
- The median income for a household in the county is $45,922, and the median income for a family is $53,784. Males have a median income of $40,690 versus $31,298 for females. The per capita income for the county is $23,227. 13.50% of the population and 10.60% of families are below the poverty line. Out of the total population, 18.90% of those under the age of 18 and 10.30% of those 65 and older are living below the poverty line.
Gene Nygaard 05:32, 24 Feb 2005 (UTC)
- Maybe this has been done for some of them--or it wasn't as widespread as I originally thought--but I didn't see anything about it either on this page or on the Bot requests page. I see now that at least some of them do not have the extra zeros, though some such as Menominee County, Michigan and Tate County, Mississippi still do. Maybe it was just counties which had the problem in the first place. Gene Nygaard 05:47, 24 Feb 2005 (UTC)
Ram-Man should be able to correct it. Hopefully. -- AllyUnion (talk) 19:13, 25 Feb 2005 (UTC)
- I can check on this. I thought I had used correct precision, but maybe not. I'll re-verify everything at some point. -- RM 05:04, Feb 26, 2005 (UTC)
- If it's correct precision, isn't it very unlikely to be all zeros? --Chinasaur 09:11, 16 Apr 2005 (UTC)
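One possible cleanup for the false zeros discussed in this thread is sketched below. It is not the rambot's code: the function name `strip_false_zeros` is hypothetical, and the sketch assumes that only two-decimal percentages ending in 0 are affected, so genuine two-decimal figures such as the average household size are left alone.

```python
import re

# Matches a two-decimal percentage whose last digit is 0, e.g. "30.90%".
_FALSE_ZERO = re.compile(r"(\d+\.\d)0%")


def strip_false_zeros(text: str) -> str:
    """Drop the trailing zero from percentages written with two decimals."""
    return _FALSE_ZERO.sub(r"\1%", text)


sample = "30.90% have children, 44.00% are married; average size is 2.68"
print(strip_false_zeros(sample))
```

Ratios without a percent sign, such as "93.90 males", would need a separate pattern.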
geolinks
Hey, thanks for getting these external links templates onto US city pages. I have a small suggestion: use different scale templates for different size cities. We have a series of templates in addition to geolinks-US-cityscale including geolinks-US-hoodscale which might be appropriate for small towns, and geolinks-US-countyscale which might be appropriate for metropolises. It would probably be easy for you to deal with this using the city area info you are already deriving from the census? --Chinasaur 09:16, 16 Apr 2005 (UTC)
City taxoboxes
I was wondering if your bot can add the taxoboxes on all US City articles... like the example at the bottom of the page of Wikipedia:WikiProject Cities... maybe fill in some of the information while it's at it? -- AllyUnion (talk) 05:58, 21 Apr 2005 (UTC)
One of the statistics paragraphs in this article is truncated:
- There are 234 households out of which 20.1% have children under the age of 18 living with them, 49.6% are married couples living together, 5.6% have a female householder with no husband present, and 40.6% are non-families. 32.9% of all households are made up of individuals and 6.8% have someone living alone who is 65 years of age or older. The average household size is 2.07 and the average family size is 2.60.(474.9/mi²). There are 880 housing units at an average density of 333.1/km² (863.5/mi²). The racial makeup of the town is 92.56%
- RickK 05:06, Jun 13, 2005 (UTC)
Rambot-cruft
all of the Rambot articles should have this at the top:
Template:Rambot-cruft
Supersaiyanplough|(talk) 10:05, 11 July 2005 (UTC)
Problems with Vermont
There exist separate pages at Rutland, Vermont, Rutland (town), Vermont and Rutland City, Vermont. Rutland (city), Vermont also existed until it was redirected. If anything the city should be at Rutland, Vermont. The numbers are also slightly different - any ideas why? --SPUI (talk) 01:36, 13 July 2005 (UTC)
Stupid 0.01% statistics
Rambot has problems not only with Vermont but with ALL villages of fewer than about 1,000 inhabitants. Decimal places in the statistics make no sense for populations under 100. The term "city" should not be used for small places, or should at least be checked against political data.
If this is not possible, the rambot program should be changed to drop one or two digits after the decimal point for small cities: e.g. round to 0.1% when the population is less than 5,000 (or the area is smaller than 20.0 km²), and to just 1% when there are fewer than 500 persons (or the area is smaller than 1 km²; population data change much more quickly than the areas of communities).
As an example, I have changed the false data of Beardsley, Minnesota (population 262):
"There are 125 households out of which 26% (instead of 26.4%) have children under the age of 18 living with them, 46% are married couples living together, 7% (not 7.2% - that's 9 persons!) have a female householder with no husband present, and 42% are non-families."
Additionally, such small "cities" should not be included in Category:Cities in Minnesota, or in the corresponding category of the relevant state.
Other incorrect data like "For every 100 females there are 88.5 males" should be changed likewise. If just one boy is born (I think this will be the case within 2-3 years) or one girl marries into the neighbouring village, the ratio will change to 100 : 89.2, and in both cases to 90%. --Geof 06:01, 18 July 2005 (UTC) that is nice
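The population thresholds proposed in this comment can be sketched as follows. This is only an illustration of the suggestion, not the rambot's code: the function name `format_percent` is hypothetical, and keeping two decimals for places of 5,000 or more is an assumption, since the comment does not say what should happen above that threshold.

```python
def format_percent(value: float, population: int) -> str:
    """Format a percentage with a precision that depends on population size."""
    if population < 500:
        decimals = 0   # round to 1%
    elif population < 5000:
        decimals = 1   # round to 0.1%
    else:
        decimals = 2   # assumption: leave larger places at two decimals
    return f"{value:.{decimals}f}%"


# Beardsley, Minnesota (population 262), as in the example above.
print(format_percent(26.4, 262))
print(format_percent(7.2, 262))
```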
Interwiki Rambottage
Are versions of Rambot at work in non-English Wikipedias? Almafeta 17:28, 22 July 2005 (UTC)
- No. Any bots using the same name are imposters. There are plans to add the rambot articles to the non-English Wikipedias, but it will be some time before this happens. — Ram-Man (comment) (talk) 01:56, July 23, 2005 (UTC)
I would like to have Rambot"tage" in Nepal Bhasa wikipedia. What do I do?--Eukesh 17:28, 22 December 2006 (UTC)
Possible New Article for West New York'o philes
A prominent newspaper editor named Philip A. Payne, who worked for papers like the New York Daily Mirror, lived in West New York, according to an article in the Jan. 29, 2006 Union City Reporter (which I assume was also printed in the Hudson Reporter's other area papers). There is no article on him as of now, and I have other articles I wish to work on, so as a suggestion, if anyone wants to make one on him and wants the article as a reference, you can see it in full at: http://img368.imageshack.us/img368/9012/philippaynearticle7hw.jpg. I'm sending this message to more than one Wikipedian that I saw on the History page for the West New York article, so you don't have to respond to me about this. You can also begin a section on Noteworthy Residents for that city's page (as I did for Union City), and place him in there. Nightscream 00:10, 1 February 2006 (UTC)
Township naming conventions
A couple quick points regarding township articles:
- I see there's a Category:Townships in Michigan, but no Category:Townships in Indiana. Is the creation of articles for Indiana townships pending, and if not, could it be added to the to-do list? Thanks!
- The entries in the Category:Townships in Michigan appear to follow two different naming conventions: Township, State and Township, County, State. I presume the county is added only to provide disambiguation among multiple townships of the same name; however, since so many of the entries have to be disambiguated in this way, why not include the county for all? There are also some problems I see with the mixed naming convention:
- Including counties for some entries but not others makes the list look very inconsistent/messy.
- It makes it impossible to use the list (reliably) to easily search for townships by their parent county.
- This is a small point, but the civil association between township and county government is very close (I can speak only for Indiana where I live, but I presume we're not unique); omitting it doesn't seem called for.
Huw 13:21, 26 February 2006 (UTC)
Problems with U.S. county articles
Most of the county articles created by Rambot include the line "Census-recognised communities" instead of "Census-recognized communities" (which is the typical U.S. spelling). As the articles should all be in American English... could you add this to the bot's long, long, long to-do list, please? I'm taking care of the Washington counties, but I think the rest may be a bit too much. Matt Yeager ♫ (Talk?) 06:06, 15 March 2006 (UTC)
Magic:The Gathering card
Howdy! In honour of the prodigious quantity of articles submitted and modified by this bot over its lifetime, I award Rambot with its own Magic: the Gathering-style trading card. Feel free to do whatever you want with it, and keep up the good work! :) GeeJo (t)⁄(c) • 20:00, 30 May 2006 (UTC)
Canada?
Can Rambot do for Canada what it has done for the United States? I'm sure articles such as Orono, Ontario would benefit greatly from whatever facts and statistics Rambot could fetch and add. NeonMerlin 03:01, 16 August 2006 (UTC)
- I was wondering about this too. It seems that Statistics Canada's website has been improved somewhat recently, because either pages like this didn't exist, or they were a lot harder to find. Basically, every city/town/township/village/rural municipality/etc. can be enumerated from those pages. Then CSVs can actually be downloaded from the particular page. As a sample, I created Piney, Manitoba from the information here and formatted based on Excel Township, Minnesota. Most of the information in the Excel Twp article, entirely Rambot-created, had direct analogs on the Statcan page for Piney. Some differences and difficulties I noticed:
- Statcan uses "visible minority population" and "all others", where "all others" is similar to what the US Census calls "white"
- There are more age divisions in the Statcan page -- I added some of these together in the Piney article
- There doesn't seem to be any comparable metric to the poverty line number from the US census
- There are a lot of types of communities -- Statcan calls this "Piney (Rural Municipality)". Locally it's known as the RM of Piney. Other community types include Cities, Villages, Towns, Local Government Districts, Indian Reserves, Indian Settlements, and many more I'm sure. In the Piney article, I replaced subsequent references ("township" in the Excel article) with just "municipality" instead of "rural municipality". There are, of course, name collisions too. Gimli, MB (Rural Municipality) and Gimli, MB (Town) is one example. Afiler 23:25, 25 August 2006 (UTC)
There may be copyright problems with Canadian statistics. Statistics Canada's Copyright / permission to reproduce page says, "Reproduction of free information on this site, in whole or in part, for the purposes of commercial redistribution is prohibited except with written permission from the Statistics Canada Copyright Administrator". In other words, the data can be used freely for non-commercial reproduction only, which is unacceptable for Wikipedia. 86.143.50.172 14:35, 12 September 2006 (UTC)
- Raw data cannot be copyrighted in the United States, as that is just data. I'm not sure what that means for other Wikipedias that are not in the U.S. -- RM 15:51, 12 September 2006 (UTC)
- All the Wikipedias are in the US (now that the Asian cluster has been folded into the Florida cluster), but other Wikipedias prefer to honor the laws of other countries as well, just in case. – Minh Nguyễn (talk, contribs) 07:13, 4 December 2006 (UTC)
- It appears to me that Statistics Canada is claiming more rights than they actually have. Tele-Direct (Publications) v. American Business Information applies a "creativity" test, like in the US (see Feist v. Rural), and since Statcan didn't apply any selectivity, they'd fail that test. Additionally, the data would not be a direct reproduction of the tables from statcan.ca, just a statement of facts (population counts) as shown on statcan.ca. http://www.robic.com/publications/Pdf/284-PEM.pdf has more info on Canadian database rights. As far as I'm concerned, there's no reason not to go ahead with publication of these facts, since they're not copyrighted in Canada or the US. Afiler 21:52, 15 March 2007 (UTC)