User talk:Rambot/translation

Maps

Could your bot upload all the location dotmaps to Wikimedia Commons? Otherwise it would be hard to use them in other-language Wikipedias. Ausir 18:07, 20 Jun 2005 (UTC)

Square miles

As for square miles, they don't have to appear in every article: in countries where miles aren't in use, this information isn't very useful and probably won't be included. Ausir 18:11, 20 Jun 2005 (UTC)

I'll omit miles from the articles. At the very least, that will clean up the article's presentation a little bit. — Ram-Man (comment) (talk) 18:25, Jun 20, 2005 (UTC)
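A minimal sketch of how an article generator could make the square-mile figure optional. The function name `format_area` and its `include_miles` flag are hypothetical and not taken from rambot; only the conversion factor (1 square mile = 2.589988 km²) is a fixed fact.

```python
SQ_KM_PER_SQ_MILE = 2.589988  # 1 square mile expressed in square kilometres


def format_area(area_sq_km: float, include_miles: bool = False) -> str:
    """Format an area in km², optionally followed by square miles."""
    text = f"{area_sq_km:.1f} km²"
    if include_miles:
        text += f" ({area_sq_km / SQ_KM_PER_SQ_MILE:.1f} mi²)"
    return text
```

A translated edition would simply leave `include_miles` at its default, so the mile figure never reaches the article text.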

Disambiguation

Here's the Polish disambiguation system:

  • If there is another article with the same name which is not about a location, we disambiguate it as [[Springfield (city)]], [[Springfield (village)]], etc.
  • If there is another article with the same name which is not about a location, and the locations we have in the database are of different kinds, we also disambiguate it as [[Springfield (city)]], [[Springfield (village)]], etc.
  • If there is no other place with the same name in a given state, and there are in other states, the article should be disambiguated through the name of the state, e.g. [[Springfield (Kentucky)]].
  • If there are two or more places with the same name in one state, the article should be disambiguated through the name of the county, e.g. [[Springfield (Washington County)]]. If there are two or more counties with the same name with places with the same name, the article should be disambiguated through both the county and the state name, e.g. [[Springfield (Washington County, Kentucky)]].
  • If there is more than one place with the same name in one county, it should be disambiguated also through the type of location, e.g. [[Springfield (town in Washington County)]] and [[Springfield (city in Washington County)]].
The disambiguation scheme described above should be fairly easy to work with. I should note that any articles that already exist will have to be manually added, but that number should only be a fraction of the total number, I would hope. — Ram-Man (comment) (talk) 18:47, Jun 20, 2005 (UTC)
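The rules listed above can be sketched as a single function. This is only one reading of the scheme, not the code either wiki used: the `Place` record, the `article_title` function and the `other_titles` set of existing non-location articles are hypothetical names. The second rule (locations of different kinds) is not handled separately, and the case where several places share a name in one county while the county name is reused in another state is not specified by the list, so the sketch does not cover it either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    kind: str    # e.g. "city", "town", "village"
    county: str
    state: str


def article_title(place: Place, places: list[Place], other_titles: set[str]) -> str:
    """Return a title for `place` (which must be in `places`) under one reading of the rules."""
    same_name = [p for p in places if p.name == place.name]
    if len(same_name) == 1:
        # Only a non-location article can clash: qualify by type.
        if place.name in other_titles:
            return f"{place.name} ({place.kind})"
        return place.name

    in_state = [p for p in same_name if p.state == place.state]
    if len(in_state) == 1:
        # Unique within its state, duplicated in other states.
        return f"{place.name} ({place.state})"

    in_county = [p for p in in_state if p.county == place.county]
    if len(in_county) > 1:
        # Several places share the name within one county.
        return f"{place.name} ({place.kind} in {place.county})"

    # Unique within its county; check for a same-named county in another state.
    county_reused = any(
        p.county == place.county and p.state != place.state for p in same_name
    )
    if county_reused:
        return f"{place.name} ({place.county}, {place.state})"
    return f"{place.name} ({place.county})"
```

The sketch compares every place against the full list, which is quadratic; for tens of thousands of names a real generator would group the records by name first.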

Don't worry, we don't even have all articles about state capitals at pl:. Ausir 18:49, 20 Jun 2005 (UTC)

There's a problem with our disambiguation scheme, however. What if there is a town with the same name as the American one e.g. in England or Australia? With the above scheme, it won't be disambiguated... unless we somehow compare it to some list of towns in other English-speaking countries. Ausir 19:36, 20 Jun 2005 (UTC)

I had not thought of this, but I am sure we would have caught it before we started. Have you considered using the English Wikipedia's disambiguation scheme? As the Polish Wikipedia grows, you're going to have many problems like this. — Ram-Man (comment) (talk) 19:41, Jun 20, 2005 (UTC)
Could you please generate a list of all names unique in the US (which we would not normally disambiguate in the above scheme) or just all place names (just names) and e-mail it to me? We'll then compare it to a list of all towns and cities in the world, and generate a list of those which have an equivalent with the same name in another country. Then we'll know which places will need additional disambiguation like (US city). Ausir 15:52, 21 Jun 2005 (UTC)
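The comparison described here amounts to looking each U.S. name up in a worldwide gazetteer. A minimal sketch, assuming the world list is available as a mapping from place name to the set of country codes where it occurs; that format, the "US" code and the function name `names_needing_country_tag` are assumptions for illustration.

```python
def names_needing_country_tag(
    us_names: set[str], world_places: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Map each U.S. name to the non-U.S. country codes where it also occurs."""
    clashes = {}
    for name in us_names:
        foreign = world_places.get(name, set()) - {"US"}
        if foreign:
            clashes[name] = foreign
    return clashes
```

Names that come back from this function are the ones that would need an extra qualifier such as (US city) even though they are unique within the United States.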

I've spent some time poring over databases that I haven't looked at in quite a while, and I'm working on getting the data ready for regenerating the articles. It's taking me quite some time, so you should know that this whole process could take a lot of coordination time on my part before we can go on. That said, I have some numbers for you. I could get you the lists, but numbers should do for now. They should give you a basic idea of how many disambiguated names there will be.

  • There are 21,112 distinct names of U.S. cities in the Census Bureau's database. This does not include cities found in other databases (such as FIPS) or the names of cities mapped to zip codes.
  • City names in the U.S.: 16,534 unique, 4,578 non-unique
  • City names in a particular state: 30,098 unique, 1,598 non-unique
  • City names in a particular county: 32,093 unique, 868 non-unique

Hopefully this is good enough for now. — Ram-Man (comment) (talk) 01:55, Jun 23, 2005 (UTC)
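Figures of this kind can be produced by counting how often each key occurs at a given scope. The sketch below is an assumption about the method, not the query that was actually run; the record layout `(name, county, state)` and the function name `unique_counts` are hypothetical. The nationwide figures are at least consistent with it, since 16,534 + 4,578 = 21,112 distinct names.

```python
from collections import Counter


def unique_counts(places, key):
    """Return (unique, non_unique) counts of distinct keys among `places`."""
    occurrences = Counter(key(p) for p in places)
    unique = sum(1 for n in occurrences.values() if n == 1)
    return unique, len(occurrences) - unique


# Hypothetical sample records: (name, county, state)
sample = [
    ("Springfield", "Washington County", "Kentucky"),
    ("Springfield", "Clark County", "Ohio"),
    ("Springfield", "Hamilton County", "Ohio"),
    ("Bardstown", "Nelson County", "Kentucky"),
]

nationwide = unique_counts(sample, key=lambda p: p[0])
per_state = unique_counts(sample, key=lambda p: (p[0], p[2]))
per_county = unique_counts(sample, key=lambda p: p)
```

Narrowing the scope adds more fields to the key, which is why the number of non-unique names falls from the national level to the county level.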

I've uploaded the file to distinct.txt. It contains a list of all distinct names of U.S. cities. Regardless of whether they are duplicates or unique, they are listed only once (hence distinct). — Ram-Man (comment) (talk) 02:06, Jun 23, 2005 (UTC)

Thanks, we're going to compare it to the list of cities in other English-speaking countries. By the way, in the above statistics, by "cities" you mean everything smaller than counties, right (including townships, towns where "town" is an equivalent of township, etc.)? Ausir 04:35, 23 Jun 2005 (UTC)
Yes, it has become somewhat of a convention to call place names that are not counties and not states as cities, no matter their actual legal designation. It simplifies the discussion, but of course is not entirely correct. (See Wikipedia:WikiProject Cities) — Ram-Man (comment) (talk) 12:14, Jun 23, 2005 (UTC)
Here's a list of 5711 place names which have equivalents in other countries US_rest_of_the_world_duplicates.txt and here's the list of country codes used in the file. Ausir 22:58, 24 Jun 2005 (UTC)

Town and city

Another problem is that in some languages (e.g. Polish) there is no distinction between city and town - they are both translated as "miasto". Ausir 18:30, 20 Jun 2005 (UTC)

In order to work with our data, you'll have to come up with a different name, for instance if "city" and "town" are identical, you'll need to come up with something like "city" and "small city" or "minor city", something to show that they are not the same. — Ram-Man (comment) (talk) 18:45, Jun 20, 2005 (UTC)
There is a word for "small city", but it's rather informal and sounds a bit silly when used in an encyclopedic context. Furthermore, some places described as towns are bigger than some described as cities... Could you find out how many towns and cities with the same name in the same county there are? It doesn't sound like a very frequent occurrence to me... Ausir 22:46, 20 Jun 2005 (UTC)
Ausir, "miasteczko" is a good translation of "Town" meaning "small city". It's not very informal or silly. We could use it. Fjl 20:42, 22 Jun 2005 (UTC)
Miasteczko is informal - I don't think you'll ever find it used in a formal context. But let's wait for Ram-Man to check if there are any duplicate town and city names outside NY and WI, where town is gmina, not miasto. Ausir 20:58, 22 Jun 2005 (UTC)

Ouch, my head hurts... Looks like I can't even give one word as a translation for town, since, according to the town article, it can be the equivalent of Polish city in some states, and of Polish gmina in others... Therefore I'll have to give different translations for different states. Ausir 00:08, 21 Jun 2005 (UTC)

It would be nice if the words meant the same thing here, but at least the word is the same. Here a town means something different from state to state, but I can use the same word in the articles. The biggest problem is distinguishing between places within a state that have the same name. I don't think we've even worked out all of the issues on the English Wikipedia yet, but we've done a pretty good job so far. — Ram-Man (comment) (talk) 12:56, Jun 21, 2005 (UTC)
But it won't be a problem to use different words for town, borough etc. for different states, right? Ausir 15:38, 21 Jun 2005 (UTC)
Actually, is there a need to disambiguate town from city at all? Is there any town with the same name as a city in the same county outside the states of New York and Wisconsin, where it is the equivalent of a township in other states? If not, there's no need to worry, since we'll be translating town in those states with the township word, not the city word. Ausir 18:52, 21 Jun 2005 (UTC)
Let's see. A quick search over the database for only those states that have duplicate names ("city" and "town" only) in the same county yields Wisconsin, Vermont, New York, and Connecticut. There are other combinations (such as "village" and "town", etc.) that I didn't look up. See: Barre (town), Washington County, Vermont and Barre (city), Washington County, Vermont and Groton (city), New London County, Connecticut and Groton (town), New London County, Connecticut. I don't see any collision with the word "township" in those cases. — Ram-Man (comment) (talk) 22:47, Jun 22, 2005 (UTC)
It looks like the Vermont one is a "township town" rather than "city town", and the Connecticut one, as said in the article, is the only town in Connecticut not coterminous with the city of the same name, so we can treat it individually. This means that there shouldn't be any disambiguation problems, since we'll use gmina for the township-like towns, and not miasto, like for the city-like ones. Ausir 23:14, 22 Jun 2005 (UTC)
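The state-dependent translation discussed in this thread could be expressed as a small lookup. This is a sketch only: the set of states below is collected from the posts above and is not a verified list, the names `TOWNSHIP_LIKE_TOWN_STATES` and `translate_kind` are hypothetical, and the one Connecticut exception mentioned above would still need individual handling.

```python
# States where, per the discussion above, "town" is township-like.
# This set is an assumption drawn from the thread, not a verified list.
TOWNSHIP_LIKE_TOWN_STATES = {"New York", "Wisconsin", "Vermont", "Connecticut"}


def translate_kind(kind: str, state: str) -> str:
    """Pick a Polish term for a U.S. place type, depending on the state."""
    if kind == "city":
        return "miasto"
    if kind == "town":
        return "gmina" if state in TOWNSHIP_LIKE_TOWN_STATES else "miasto"
    raise ValueError(f"no translation defined for {kind!r}")
```

Because a city and a township-like town then receive different Polish words, the pairs found so far in one county (such as Barre in Vermont) no longer collide.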

Borough

Borough is a hard one to translate - I think we'll need to use a different word in the Alaska context, New York context, and in the Pennsylvania and New Jersey context. Is it possible? And are there any Pennsylvania or New Jersey boroughs with the same name as a city in the same state? Otherwise we'll just use the same name as for the city there, the same name as for county in Alaska and the same name as for city district in New York City. Ausir 20:37, 20 Jun 2005 (UTC)

The only state in which borough conflicts with something else is Connecticut: Newtown (borough), Fairfield County, Connecticut, Litchfield (borough), Litchfield County, Connecticut, and Stonington (borough), New London County, Connecticut. Granted I am only doing a quick overview, but I'm pretty sure that these are all of them. — Ram-Man (comment) (talk) 23:52, Jun 22, 2005 (UTC)
Seems like borough is a city and town is a region (township) in Connecticut, so there's no real conflict :). Ausir 00:16, 23 Jun 2005 (UTC)

Balance, grant and purchase

What is a "balance" in the town context? Ausir 19:21, 20 Jun 2005 (UTC)

I have no idea ;-) I'd have to look it up. The census bureau uses terms that I've just copied, but I don't necessarily have any clue what they actually mean. Most of the terms at the bottom of the list apply to only a handful of place names out of thousands. — Ram-Man (comment) (talk) 19:38, Jun 20, 2005 (UTC)
The same applies to grant and purchase. What the hell are those? Ausir 22:48, 20 Jun 2005 (UTC)
Those are easy enough for me to guess. A purchase and grant are merely places that were acquired via some legal transaction... buying the land from someone in the case of "purchase", so it literally was land that was purchased. From whom, no idea. I think that there is only 1 place of each type in the database, so we may be able to just ignore the articles altogether, or just handle them manually after we are done with everything else. — Ram-Man (comment) (talk) 13:02, Jun 21, 2005 (UTC)

Declension

We'll also have to enter genitive case, accusative case and locative case for every type of town (town, city, borough etc.) in Polish, since they have to be used for the translation to be correct. Ausir 23:34, 20 Jun 2005 (UTC)

Once I finish the template, you'll have to translate it as best as possible and then we'll figure out what to do with all of those individual cases. — Ram-Man (comment) (talk) 12:16, Jun 23, 2005 (UTC)
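One way to handle the individual cases is to store every required form per place type and let the template name the case it needs. A minimal sketch, assuming placeholders called `nom`, `gen`, `acc` and `loc`; those placeholder names, the `CaseForms` record and the `render` function are hypothetical, and only miasto and gmina are filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseForms:
    nominative: str
    genitive: str
    accusative: str
    locative: str


PLACE_TYPE_FORMS = {
    "miasto": CaseForms("miasto", "miasta", "miasto", "mieście"),
    "gmina": CaseForms("gmina", "gminy", "gminę", "gminie"),
}


def render(template: str, place_type: str) -> str:
    """Fill a template whose placeholders name the grammatical case needed."""
    forms = PLACE_TYPE_FORMS[place_type]
    return template.format(
        nom=forms.nominative,
        gen=forms.genitive,
        acc=forms.accusative,
        loc=forms.locative,
    )
```

For example, `render("w {loc}", "miasto")` gives "w mieście" and `render("ludność {gen}", "gmina")` gives "ludność gminy", so the translator writes each sentence once and the table supplies the correct ending.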

Moving to Meta

Wouldn't Meta-wiki be a better place for a translation project like this? Especially since en: does not support Unicode... Ausir 00:50, 21 Jun 2005 (UTC)

It's just easier for me to manage it as a subpage. I have no qualms with a translation project being started in Meta, but so long as I am going to be going through the data that I have, it's just that much simpler to coordinate the efforts here, for me anyway. By calling it a project, I by no means want to suggest that we need to get large quantities of people to "help out". It is more of a means to an end, and it avoids having to use email. Most of the actual work of generating the articles and such will be done offline anyway. Speaking of Unicode, the rambot does not have very good support for Unicode, but that is on the list of things to work on. I don't know how much that will matter though. I consider translating the rambot articles a rambot project, so I'll coordinate my efforts here for now. — Ram-Man (comment) (talk) 13:10, Jun 21, 2005 (UTC)
If rambot has problems with Unicode, perhaps you could just send me the rambot data and we'll generate the articles ourselves, because Unicode is essential for the Polish version. Ausir 15:20, 21 Jun 2005 (UTC)
Oh, I guess you misunderstand. The generation of the articles has NOTHING to do with rambot. It's a common misperception that I don't always spend the time to correct. I can generate the articles in whatever format I need. Actually I was thinking about asking a developer to upload the articles directly to the databases and not go through the web interface, to save on the system overhead caused by adding such a large number of articles. So the rambot may not even be required. But my data is stored in multiple, rather large SQL tables on my hard drive at home, and I can perform all sorts of processing on the data there. So don't let the Unicode issue with the rambot cause you any concern, as the best method may be direct database access. But we'll have to get to the point where we have the articles before we can add them! — Ram-Man (comment) (talk) 15:28, Jun 21, 2005 (UTC)

New template

What do you think of my new rambot article template here: Springfield, Kentucky? It's definitely easier to translate and I think it generally makes more sense to show such data as tables rather than text. Ausir 08:43, 21 Jun 2005 (UTC)

Let's put it this way: You can format the Polish Wikipedia articles however you see fit as long as users of the Polish Wikipedia feel that it is acceptable. Unfortunately for the work you did on the Springfield article, it will have to be reverted because of the long-standing decision NOT to use tables, unless of course you'd like to start the long discussion (again) about possibly changing it, taking votes on it and so on. A historical note: I originally started putting rambot-style information in tabular form, and my methods were eventually and informally voted down. Later the feeling became more solidified as more people got involved. But the Polish Wikipedia may not have such a policy, so if you want to use the Springfield article as a template for Polish articles, I *personally* have no objection. Am I being clear? — Ram-Man (comment) (talk) 13:13, Jun 21, 2005 (UTC)

Russian

Just to give you a heads-up on what a Russian translation would involve: I've translated the template. Basically, there are almost the same issues as with Polish. For example, we should probably get rid of miles, as almost no country other than the US uses them. There is no distinction between town and city, and the disambiguation scheme used in the Russian Wikipedia looks similar to the Polish one.

Generally, I would be happy even if we only did places that require no disambiguation (unique names).

Also, there would be the same issues stemming from multiple cases (probably exactly the same as in Polish).

Here are some unique issues:

  • Naming. The names should be in Cyrillic. This is mostly not an issue, since I can easily provide you with a Java function to do the transliteration. There could be some issues with additional name collisions, since for example v and w would end up as the same letter. Also, some names (of major cities) might have a different (traditional) spelling. Oh, and the English name for the town and county would also need to be present.
  • Maps & Cyrillic. This is somewhat more complex. Ideally we'd like maps to show names in Cyrillic, although we would probably settle for Latin script.

Ornil 23:57, 6 August 2005 (UTC)
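The collision risk mentioned under Naming can be shown with a toy transliteration. This is not the Java function offered above, which is not shown on this page. The letter table below is a deliberately naive assumption (real English-to-Russian transliteration is phonetic and far more involved), and the place names in the usage note are hypothetical.

```python
from collections import defaultdict

# Toy letter-by-letter table; an assumption for illustration only.
LATIN_TO_CYRILLIC = {
    "a": "а", "b": "б", "c": "к", "d": "д", "e": "е", "f": "ф", "g": "г",
    "h": "х", "i": "и", "j": "дж", "k": "к", "l": "л", "m": "м", "n": "н",
    "o": "о", "p": "п", "q": "к", "r": "р", "s": "с", "t": "т", "u": "у",
    "v": "в", "w": "в", "x": "кс", "y": "й", "z": "з",
}


def transliterate(name: str) -> str:
    """Transliterate letter by letter, preserving capitalisation."""
    out = []
    for ch in name:
        mapped = LATIN_TO_CYRILLIC.get(ch.lower(), ch)
        out.append(mapped.capitalize() if ch.isupper() else mapped)
    return "".join(out)


def collisions(names):
    """Group names whose transliterations coincide."""
    groups = defaultdict(set)
    for name in names:
        groups[transliterate(name)].add(name)
    return {cyr: grp for cyr, grp in groups.items() if len(grp) > 1}
```

With this table, two made-up names such as "West" and "Vest" map to the same Cyrillic string, so `collisions(["West", "Vest", "Troy"])` reports them as one group. Names flagged this way would need disambiguation even if their Latin spellings are unique.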

It sadly seems that Ram-Man has lost interest in the translation. I e-mailed him asking for the SQL database, so we could do it ourselves. Ausir 09:22, 9 August 2005 (UTC)

If you get it from him, could you leave me a message on my talk page? Thanks. Ornil 14:32, 9 August 2005 (UTC)

Catalan translation

I've started working on the Catalan template, I'll let you know about the problems it can involve as soon as possible. --Micru 19:42, 18 September 2005 (UTC)

After a few changes, we're now in the second stage, so it's almost finished. Please let me know what else is needed for you to start the article creation. Thanks. --Micru 21:05, 20 September 2005 (UTC)

Unfortunately, it seems that this project is no longer active. Ram-Man, who has the database of names and statistics for these articles, hasn't been responding to our requests for some time now. – Minh Nguyễn (talk, contribs, blog) 00:53, 13 October 2005 (UTC)