Wikipedia talk:Overcategorization

Categorising by national descent, origin, nationality, etc.

I have been struggling for a while now to come up with some clearer guidelines for how the following categories ought to be applied to people so as not to create inconsistencies and/or overcategorization:

Then there are subcats like Category:People by occupation and nationality and "Foo-ian Goo-ians" (see, e.g., Category:Canadian Americans) that further confuse the issue. I don't have a specific proposal, but am hoping to generate a discussion that might lead to a new paradigm. I have some ideas to throw into the ring:

  1. Let's spell out the difference between "national descent," "national origin," and "nationality."
  2. "Nationality" often seems to be used interchangeably with "citizenship"; maybe we should call it that.
  3. Apply the "Foo-ian Goo-ian" category consistently. Does it mean "people by ethnic or national descent" (as Ted Kennedy is categorised as an Irish-American politician), or people by nationality (as Alanis Morissette is categorized as "Canadian American" because she has become a naturalised U.S. citizen while retaining her Canadian citizenship)? With respect to the latter example, I might prefer to categorise such people as "Foo-ian immigrants to Goo," as Michael J. Fox is categorised in Category:Canadian immigrants to the United States. There is no logic to the discrepancy between Morissette and Fox. I'm not sure what the answer is; I would just like to argue for consistency and logic. Nor is there a clear distinction between the use of "Irish-American" as a descriptive term in a subcat and "Canadian American" as a subcat.
  • To clarify this, Category:Irish-American politicians is a subcat of Category:Irish-Americans. I picked Kennedy because he is an obvious example of someone thought of as Irish American; other Kennedys are categorised in the main cat. In the case of the Kennedys, the use of the term fits within "national descent"; the family has been in the U.S. for several generations. Despite the common usage of the term "Irish American," would it be clearer, from a categorisation standpoint, to create Category:Americans of Irish descent? See, for example, Pamela Anderson, who was born in Canada and is of (partial) Finnish descent; she is categorised as "Canadians of Finnish descent" and "Finnish-Americans." I know the former category was recently created by an industrious editor; is it more useful than the latter?

Also, as we have touched on before, is the intersection between occupation and nationality really useful, especially if nationality is used to represent citizenship? To go back to an earlier discussion about Michael J. Fox, is it meaningful to double up on categorising him in all of the following: "Canadian film actors | Canadian television actors | Canadian voice actors | Canadian child actors | American film actors | American television actors | American voice actors | American child actors," etc?

Thoughts?--Vbd (talk) 02:46, 23 March 2007 (UTC)

In professional astronomy, I see no reason to categorize astronomers by nationality. Modern professional astronomers may work in multiple countries during their lifetimes, frequently outside of the countries where they were born. The locations where astronomers worked or where they were educated are more important than their nationalities. Dr. Submillimeter 14:58, 23 March 2007 (UTC)
While I agree with Dr. Sub above that knowing someone's nationality or place of birth isn't as important as other aspects of the person's biography, I think there are a substantial number of readers who are specifically interested in reading about people from their own country or state or region. There is a sort of cultural pride in knowing who the "important hometown people" are in your area. So even though it's true that people in the astronomy field move around and work in multiple countries, I think a fair number of readers probably find it interesting to be able to find an index of famous astronomers and scientists who hail from their own area.
Of course, that doesn't directly answer the main questions asked in this thread about whether and when these categories should focus on place of birth versus nationality, but I think it does illustrate at least one reason why these profession-by-geographic-area categories are worth keeping in some form or another. Dugwiki 16:18, 23 March 2007 (UTC)
Thank you, Vbd, for alerting me to this. My hobby horse in this issue is that people get categorised according to an accident of birth/technical citizenship, rather than culture and identification. For example, Nicole Kidman was born to Australian parents, grew up in Australia and identifies strongly as Australian, but because she was born in Hawaii, is consequently a US citizen and lives part of the time in the US, she has been given the categories "American adoptive parents | American Australians | American film actors | People from Honolulu"! Another example is an obscure biography subject, Patrick Stanley Vaughan Heenan, a spy for Japan/traitor to the UK in WW2. He was born in New Zealand, but as an infant moved to the British crown colony of Burma, where his mother remarried to a Briton when Heenan was two. He was consequently brought up in Burma and England. It appears that he never returned to NZ, and yet some helpful soul has added him to "New Zealanders of World War II". Now, even if we could establish that his mother was a born and bred New Zealander, I think it's stretching that category to refer to him as such. Grant | Talk 16:15, 27 March 2007 (UTC)
I know Nicole Kidman has been the subject of heated debate relating to these issues. I am curious, Grant . . . would it matter to you if she had become a U.S. citizen by choice, rather than by an accident of birth? Your focus is on her sense of self-identification or cultural identity as Australian, which in many instances may be difficult to ascertain. As Kevlar points out below, citizenship is much more easily determined. Kidman's case is unusual because she is not an immigrant to the United States, as MJF is.--Vbd (talk) 06:30, 28 March 2007 (UTC) (BTW, I am shaking my head in response to your tale of Heenan being categorized as a New Zealander.)
I think it's realistic and reasonable for people to have multiple national/cultural/citizenship categories in Wikipedia, as is the case with Michael J. Fox. That is, it's more than reasonable to categorise someone as American if they have voluntarily taken out U.S. citizenship as an adult, although to me that doesn't preclude them retaining a pre-existing national identity, or possibly even several national identities. I just question the basis for the assignment of many categories. I guess I don't have a problem with Nicole Kidman among "Australian Americans", since she lives in the US most of the time. But to classify Kidman with "American Australians" is, I think, bizarre, as is "People from Honolulu". I wonder how much time she's spent in Hawaii since she was a toddler! Likewise, Pamela Anderson's Finnish ancestors emigrated to Canada, not the USA, and I'm not aware of any Finnish cultural dimension that she has taken to the USA (!), so while "Finnish Canadian" may be warranted, I don't see how "Finnish American" is. Grant | Talk 08:29, 28 March 2007 (UTC)
Once again, thanks for raising this and alerting me, Vbd. It seems we have several editors working at cross purposes here. This is what I would like to see: a clear distinction between ethnicity and citizenship, with the word nationality avoided whenever possible as ambiguous. This is especially pertinent when it comes to multi-national or multi-cultural entities like India, the Soviet Union, or Austria-Hungary. One could be a Tamil from India, or a Tamil from Sri Lanka, or a Tamil from Canada; or a Ukrainian from the Soviet Union, a Ukrainian from Poland, or a Ukrainian from Ukraine, and so on. People’s citizenship is a legal fact that can be easily documented. People should be categorized by citizenship first. Secondarily, it may be necessary to clarify a person’s ethnic origin. For your example of actors, the relevant thing would be "where do they work?" So it becomes Category:Actors in the United States, which is quite different from Category:American actors. Actors are people who may happen to work in any number of places, but the place doesn’t “own” them. I will have more for a filled-out proposal on this later. Kevlar67 02:09, 28 March 2007 (UTC)
Further to the above: people who migrate from one country to another need to be categorized by the place they left and the place they went to. But this doesn't mean that other categories won't show up! E.g. an ethnically Ukrainian person born in Poland who moves to Canada would be in both Category:Polish immigrants to Canada and Category:Ukrainian Canadians.
Also, categorization by birthplace or place of residence needs a whole other discussion (e.g. Category:People from Tokyo). There are many, many examples of this, so it needs to be treated separately.
Lastly, for artists who work in a language-related art (vocal music, literature, film), categorization by language is at least as important as nationality, if not more so. Tamil-language cinema is a recognizable body of work but occurs in India, Sri Lanka, Canada, etc. French-language cinema occurs in France, Canada, Belgium, Switzerland, Morocco, Congo, etc. (See Category:Occupations by language and Category:Languages by occupation.) Kevlar67 04:31, 30 March 2007 (UTC)
Hi Kevlar, you say an ethnically Ukrainian person born in Poland who moves to Canada would be in "Polish immigrants to Canada". I think this is fine, as long as such a person spent a significant amount of time in Poland. If he/she was simply born in Poland to parents who were passing through, I don't think that's really meaningful, unless Poland automatically grants citizenship to those born there, which is not something that all countries do. Grant | Talk 08:27, 30 March 2007 (UTC)
Well, that's why I recommend categorization by citizenship rather than nationality. If a person is a citizen of Poland, that is a verifiable fact. If we have proof they self-identify as something else, that's great too, but citizenship should be the first concern. Kevlar67 03:22, 4 April 2007 (UTC)
One of the problems with this discussion is that people are trying to describe how they themselves would interpret the categories and not how the categories would be interpreted by everybody. For example, someone could say that "Polish immigrants to Canada" should be used for people who spent "a significant amount of time in Poland", but the average editor is not going to know that. Additionally, someone could add that criterion to the category, but the criterion could be removed later by someone else. If these categories are going to be used, then they will be used for all immigrants. "Restricting" the categories' use is unrealistic. The categories should either be deleted or used in the broad contexts that their titles suggest. Dr. Submillimeter 09:19, 30 March 2007 (UTC)
But "the average editor" shouldn't be adding categories on the basis of what they don't know. Categories should be added on the basis of significant facts, such as citizenship and the national culture with which the subject of a biography identifies. If someone is born in X to parents from Y, leaves as an infant, grows up in Z and has no significant contact with X thereafter, then I don't think they should be put in categories relating to X, unless we have a category such as "People born in X". Grant | Talk 16:34, 30 March 2007 (UTC)
The immigrant case is straightforward - if that particular country grants citizenship for anyone born there then... and if they don't then they are not immigrants of that country - this info should be perfectly ascertainable. As for someone born in X who never returns there, as long as they have citizenship they are from X and if they don't, they're not. We do have a problem, though, as things stand, in categorizing someone who grew up in a national culture but never gained that place's citizenship. Mayumashu 02:08, 3 April 2007 (UTC)

I'm with Vbd (kudos for getting this started) and Kevlar that people should be categorized based on parentage and lineage, with self-identification as an accessory, as it is only occasionally verifiable. I agree with Vbd too that Category:Americans of Irish descent would be clearer. I think that Category:Irish-Americans could be kept to hold verified self-identification cases and possibly those whose entire parentage/lineage is of that particular descent, and as such would be a sub-category page, although I wouldn't mind seeing this go as it's likely to be "abused". Kevlar's point that "nationality" is ambiguous and "citizenship" should perhaps be a replacement is interesting and a change I would support. Mayumashu 02:08, 3 April 2007 (UTC)

I'm in favour of a "double stream" categorizing system. The first stream would categorize all people according to occupation regardless of citizenship, down to the level, for instance, of the kind of actor one is (Category:Canadian television actors describing Canadian TV and not Canadians on TV, Category:Child actors, Category:Bollywood actors, etc.). The second would categorize by nationality, but only to the level of occupation in general (Category:Indian actors, Category:Indian scientists, etc.), permitting massive listed cat pages and reducing cat clutter. Mayumashu 02:19, 3 April 2007 (UTC)

Similarly, Category:Expatriates and its sub-cat pages should probably be done away with. Having these combined with occupation seems an unnecessary duplication, for one. I started them up as there were a few pages in place for non-Asians in the Far East (probably the result of pseudo-racism, actually based as much on cultural difference as racial - I've lived here 10 years) but not for other expat groups in other places. The fundamental problem with these is that setting a period of time after which a mere stay becomes residence is wholly arbitrary. Again, citizenship and parentage/lineage are the only solid basis for categorizing people and origin, and occupation should be largely independent of these. Mayumashu 02:57, 3 April 2007 (UTC)

If we keep the expat pages, then it should be, for instance, Category:People in Zimbabwe and not Category:Zimbabwean people. The problem then, though, is that Zimbabwean citizens not resident in Zimbabwe do not fit. I think again the expat cat pages should be removed. Mayumashu 03:37, 3 April 2007 (UTC)

Grant65's contribution hits on a fundamental flaw with things as they stand: someone who spends significant time in a country but does not gain citizenship. It's admittedly a rare case (Robert Goulet is one), but how should they be catted? Should they slip through? (As it stands they are expats.) Mayumashu 03:41, 3 April 2007 (UTC)

Here's the rub. Someone can be in a place without being of a place. You can be an actor in the United States without being an American actor. You can be a sportsperson in the UK without being a British sportsperson. That's something we have yet to address. Kevlar67 03:29, 4 April 2007 (UTC)

This has been a long-festering problem with categorization that seems to be getting worse because there are so many different ways of relating people and place. I have long been bothered by the tendency to divide people by nationality, citizenship, etc. when it is not relevant to the subject, which is usually occupation. If someone is a politician, it makes sense to have politicians by the country of citizenship. If we are talking about film actors, the most important distinction is probably actors by language. When discussing "British theatre actors", we mean "British theatre" and not "British actors". Many occupations are centered around a location (law, politics, live theatre, etc.) but many are international. Why do we divide scientists by nationality? If I am looking for biologists, I'd only be interested in British biologists if they specialize in the flora and fauna of Britain; otherwise nationality seems irrelevant, and dividing categories by nationality separates people arbitrarily. Deciding whether chemists should be divided by the country in which they were born, the country they identify with, the country in which they now reside, or the country stamped on their passport seems like the wrong question. Why are we dividing them at all if they go together? So I'd like to propose that we get rid of categorization by nationality, citizenship, country of origin, and country of residence unless it is a defining characteristic.

This would mean that there would be some very large categories. There has been resistance to having very large people categories, but I don't understand why. Navigating through a category of people alphabetically is easier than navigating through a category divided arbitrarily by nationality. If professions are international, it makes sense to combine all people together.

There seems to be an overwhelming desire to categorize by nationality. We can still do this. I'd make these huge as well, like "Citizens of the United States".

So for someone like Roman Polanski, I'm advocating the following categories: Citizens of France, Polish-language film directors and English-language film directors instead of French people, French actors, French film directors and English-language film directors. He is known for being a director of Polish- and English-language films. He's not known for being a French actor or French film director, and both of those terms are very ambiguous. His article doesn't mention him being an actor, so he shouldn't be categorized as one. He may be a citizen of France, and live in France, but it seems wrong to call him "French".

We need to be careful about how we hyphenate, especially when the hyphens are just implied. There is a big difference between "French-film actor" and "French film-actor". I don't think we should have a category for "French film-actors". Considering the international nature of the film industry, I think "French-film actor" is not needed, but I'll concede that the need for this one is debatable.

If category intersection ever comes to pass, we'll likely be making these changes anyhow. I think there is value in starting the transition now.

--Samuel Wantman 09:39, 5 April 2007 (UTC)

Arbitrary inclusion criterion or Significant Thresholds?

The examples given at the arbitrary inclusion definition are non-round numbers and unexceptional thresholds. Is it also the intention of this guideline to prohibit categories that collate those who share the distinction of having crossed a threshold figure (typically a power of ten or 5 times such a number) that marks a significant achievement? This would include Category:300 hits club and Category:Footballers with 100 or more caps, both of which are currently listed for deletion. Personally, I am undecided: if all numbers are deemed to be arbitrary, then categorisation of million/billionaires, and even centuries and decades, would have to go; on the other hand, is the 100th/1000th/100000th iteration different in kind from the one that preceded it? Kevin McE 11:21, 24 March 2007 (UTC)

  • In general such thresholds are as arbitrary as "295 hits club" and so forth. It may be different if you get some kind of special prize for reaching 300. The decades/centuries categories aren't really arbitrary, they're meta-categorization to make it easier to find specific years, and themselves don't contain articles. As for millionaires, we indeed don't have a Category:Millionaires. >Radiant< 09:43, 26 March 2007 (UTC)
  • I don't know about just deletion, but definitely listify. The criteria seem to be spread across several different monetary denominations. Though if that were fixed, I might vote "Keep". - jc37 14:29, 3 April 2007 (UTC)

Links and categories

There seems to be a dichotomy between those who are looking to hone categories into encyclopedic taxonomies and those who are looking for a tagging system in which they can do keyword searches. The more we push at removing overcategorization, the more there is a need for a simpler tagging system. If we can answer that need, it might make everyone happier. Towards that end, I've written up a proposal to do keyword searching based on wikilinks. Please take a look. I'm calling it Wikipedia:Link intersection. Comments would be appreciated. Thanks, -- Samuel Wantman 06:47, 6 April 2007 (UTC)
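
To make the idea of searching by wikilinks concrete, here is a minimal sketch of what an intersection query could look like. It is only an illustration under stated assumptions: the proposal's actual mechanism is not described in this thread, the function name link_intersection and the sample page titles are hypothetical, and the link data is made up rather than taken from any real article.

```python
from typing import Dict, Iterable, Set


def link_intersection(outgoing_links: Dict[str, Set[str]],
                      targets: Iterable[str]) -> Set[str]:
    """Return the pages whose wikilinks include every requested target.

    Hypothetical helper: `outgoing_links` maps a page title to the set of
    titles it links to. An empty `targets` matches every page.
    """
    wanted = set(targets)
    return {page for page, links in outgoing_links.items() if wanted <= links}


# Made-up sample data, not a statement about any real article.
sample_links = {
    "Person A": {"Canada", "United States", "Actor"},
    "Person B": {"Australia", "Actor"},
    "Person C": {"France", "Film director"},
}

print(sorted(link_intersection(sample_links, ["Actor"])))
print(sorted(link_intersection(sample_links, ["Actor", "Canada"])))
```

The point of the sketch is that a reader could combine any two or more links at query time, so no editor would need to create a category for each combination in advance.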