Wikipedia talk:Category math feature

Update

There is a new proposal, called "Category intersection", that is very much like this "Category math" proposal. It addresses some of the concerns noted on this talk page, and has mockups and descriptions of how it might be implemented. Please take a look. Thanks. -- Samuel Wantman 23:45, 30 August 2006 (UTC)



Discuss, don't vote. WP:VIE. >Radiant< 15:34, 5 February 2006 (UTC)

Okay, but straw polls are just to gauge opinions (Wikipedia:Straw polls). -- Zondor 15:41, 5 February 2006 (UTC)
  • Yes, but you need to get some people to express opinions first, otherwise the situation will escalate into a "for-against" segregation on one particular point, instead of working on a compromise (for instance, the "requests for rollback" poll has backfired severely). Note that this feature already exists (it works in WikiNews) but it is disabled on Wikipedia. You may want to ask the devs why; there may be server load issues etc. >Radiant< 15:44, 5 February 2006 (UTC)
    • Moved to talk... -- Zondor 15:46, 5 February 2006 (UTC)

I'd love this feature. However, I also see a need to include the "category and all subcategories" operator: Category:Norwegians/with_sub & Category:Computer scientists. Perhaps even the default should be "with_sub" - the category of Norwegians who haven't been subclassified further is a pretty boring/useless category. A number of subcategory trees would be unnecessary with this feature, but I'm pretty sure some would remain. --Alvestrand 13:32, 10 February 2006 (UTC)

I see the appeal (of "category and all its subcategories"), and would like to see that too, but those kinds of joins in SQL (an unknown number of joins, as you don't know how many subcategories there are) are not fun. Creating a new structure is a possibility, where a page is linked to its category, and then records referencing higher-level categories are also created, but this creates a whole other set of problems... I think a fixed-depth list is the most pragmatic solution (pages in this category and all its subcategories). I'd very much love to hear any other ideas...--Aerik 05:10, 30 March 2006 (UTC)
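
The fixed-depth idea above can be sketched outside SQL as a bounded breadth-first walk. This is only an illustration, not MediaWiki code: the `SUBCATS` and `PAGES` tables, the page names and both function names are hypothetical, and a real implementation would read the category links from the database.

```python
from collections import deque

# Hypothetical toy data (not real Wikipedia content).
SUBCATS = {  # category -> direct subcategories
    "Norwegians": ["Norwegian scientists"],
    "Norwegian scientists": ["Norwegian computer scientists"],
    "Norwegian computer scientists": [],
    "Computer scientists": [],
}
PAGES = {  # category -> pages placed directly in it
    "Norwegians": {"Page A"},
    "Norwegian scientists": {"Page B"},
    "Norwegian computer scientists": {"Page C"},
    "Computer scientists": {"Page B", "Page C", "Page D"},
}


def pages_with_subcats(category, max_depth):
    """Collect pages in `category` and in its subcategories, at most `max_depth` levels down."""
    seen = {category}
    queue = deque([(category, 0)])
    pages = set()
    while queue:
        current, depth = queue.popleft()
        pages |= PAGES.get(current, set())
        if depth == max_depth:
            continue  # fixed depth: stop descending here
        for child in SUBCATS.get(current, []):
            if child not in seen:  # guard against cycles in the category graph
                seen.add(child)
                queue.append((child, depth + 1))
    return pages


def intersect_with_subcats(cat_a, cat_b, max_depth=2):
    """Intersect two categories after expanding each to a fixed depth."""
    return pages_with_subcats(cat_a, max_depth) & pages_with_subcats(cat_b, max_depth)
```

With a depth of 0 this reduces to the plain intersection of direct members; raising the depth trades extra lookups for fuller results, which is the cost the comment above is worried about.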

Why not announce this proposal on the community portal so that more people will know about it?--CarabinieriTTaallkk 21:51, 17 June 2006 (UTC)

Straw poll

This is a straw poll only to gauge opinions, not consensus decision making. Sign your vote below.
  • Support. Categories then need not be so specific, awkwardly long and esoteric, making the organisation so much better. -- Zondor 15:04, 5 February 2006 (UTC)
  • Support. Many of the discussions on the categorisation project could be eliminated using this feature. Not so keen on the name, which is confusing with mathematical topics. --Salix alba (talk) 19:13, 5 February 2006 (UTC)
  • Conditional Support. I would like to see an assessment of the resources required by this proposal by someone involved in either the coding of Mediawiki or the Wikimedia server management. BigBlueFish 13:21, 6 February 2006 (UTC)
    • Perhaps the use of lightweight tags rather than categories can avoid stress on the servers. -- Zondor 03:55, 7 February 2006 (UTC) Categories are rather useless anyway because of their wiki description. -- Zondor 03:59, 7 February 2006 (UTC)
  • Support, also with reservations about the feature's name. Perhaps something like "Category intersection" or "Category overlap" might be more intuitive, if overly specific or technically inaccurate. Feature seems a good idea though!  David Kernow 18:33, 11 February 2006 (UTC)
  • Weak support If this idea actually worked as proposed, it would be a wonderful idea. However, as with other voters, I would need to know how it would affect performance etc. before I could put full support behind it. Chairman S. 01:54, 12 February 2006 (UTC)
  • Questions. How would we transition into it, what with the resulting broken links?--Urthogie 16:25, 12 February 2006 (UTC)
  • Conditional Support. I believe two things are necessary. First, expert attention must be given to a user-friendly interface. Second, BigBlueFish is absolutely right about a cost/benefit assessment. PhatJew 09:01, 14 February 2006 (UTC)
  • Conditional Support. Same perf issues as people have asked about above. My hunch is that for relatively simple operations like "in cat A and cat B", it wouldn't be that big of a performance crunch. Assuming that categories are stored in a sane way, it's certainly no more complex than search. Has anyone asked Brion about this? --Dantheox 21:22, 18 February 2006 (UTC)
  • Neutral the idea is a very solid one, and would fix a lot of the issues with the current categorization system. However the interface and implementation is really the crucial factor; any change would have to be to a categorization system that is at least as intuitive as the current one, and making this feature the same would be very tricky. As it stands there are only a very few intersections that are truly useful (by nationality or location, for example) and if it takes more than one or two clicks to get to said category intersection from an article then this feature would be unnecessary clutter. Ziggurat 00:49, 20 February 2006 (UTC)
  • Yes Please even pretty please. Look at how much this would simplify John Lennon, for only one example of where it would be enormously useful. Septentrionalis 01:16, 25 February 2006 (UTC)
  • Neutral per Ziggurat. If I intersect "Films" and "History" will I get films about history, or the history of films? If we don't want to lose quality of classification we will have to keep more subcategories than we imagine: and we'll have to keep defending them on CfD from people who think that category math has obsoleted them. Plus, if all those cats that John Lennon is in are worth having as cats, then I want them all to be one-click accessible from John Lennon. So what was the advantage, again? —Blotwell 09:05, 28 February 2006 (UTC)
    • History is ambiguous so it can mean about history or historical. It needs to be defined or use more specific ones like 1970s or even more specific. -- Zondor 04:26, 1 March 2006 (UTC)
  • Support, per David Kernow. jareha 05:56, 10 March 2006 (UTC)
  • Support Q0 12:06, 27 March 2006 (UTC)
  • Support And I'd be willing to write (re-write, actually) the code, but I'm biased. --Aerik 03:49, 29 March 2006 (UTC)
  • Conditional support - I like the idea a lot and am glad to see it has been proposed... however, I was worried about the suggestion that Category:Actors by nationality would not exist. I see it as a useful tool for browsing, since when I see that some categories exist I become interested and decide to look. I didn't read the proposal reports on MediaZilla, but this is my basic thought. Would it cause a performance hit? gren グレン 00:48, 16 April 2006 (UTC)
  • Support This could be used to clean up the many lists and categories in the music area. Right now, we have Category:Canadian musical groups and Category:Hardcore punk groups. But you can't query for "Canadian hardcore punk groups". There are attempts to do this manually; there's a Category:Australian punk rock groups. But the category tags for things like that aren't well-maintained. At the user level, this can be presented to casual users in the form of, say, "List of Canadian hardcore punk groups". But instead of a manually maintained list, the database engine does the work. Whatever solution is chosen should allow editors to construct pages whose content is generated from a query result, as with the current category pages. Ideally, filling in the "genre" and "country" slots in a band infobox would also create the category information, and far less manual work would be required to keep the info current. --John Nagle 19:55, 27 April 2006 (UTC)
  • Support It would be immensely helpful for footballers, for example. --Runcorn 16:55, 17 June 2006 (UTC)
  • Very strong oppose This proposal seems to completely misunderstand what categories are for. It is not the case that people only use Wikipedia to search for things they know they want. The present category has immense value as a means of facilitating unplanned browsing, and it would be an unmitigated disaster to wipe out all the detailed categories as proposed. Chicheley 21:45, 13 July 2006 (UTC)
  • Conditional support, should it be implemented 'correctly', that is, as a transparent function. Categories should be simple and flowing, and it should be easy to jump from wide-spanning (e.g. Category:Australians) to extremely narrow (e.g. Category:Australian Punk Band Lead Vocalists). AKismet 02:27, 7 August 2006 (UTC)
  • Support per Zondor, Salix alba, David Kernow. However, per Chicheley, it would be necessary to back up current categorizations before implementing a new system, in case something goes horribly wrong.--Atlantima 17:26, 25 August 2006 (UTC)

Handling sub-categories

While in principle this is a good idea, we'd have to think carefully about exactly how it would work....

One issue that occurs to me is how we handle subcategories. Suppose we take two categories X and Y, then we define category Z = X intersect Y. What you'd expect this to do is that Z would hold all pages in both X and Y. Since subcategories are special types of pages, one would then expect that it would hold all categories that are subcategories of both X and Y.

However, that is often not what you'd expect. Suppose we have the following (concocted) categories:

  • Scientists, with subcategories: Physicists, Chemists, etc.
  • European people, with subcategories: French, German, etc.

Now suppose I want to make a European Scientists = European people * Scientists. You'd expect it to have subcategories "European Physicists", "European Chemists", "French Scientists", "German Scientists", etc., etc. However, in actual fact it has none of those; indeed, if Scientists and European people directly have no pages, chances are the intersection would be empty. It would only have pages directly in both categories or subcategories with both categories as immediate parents.
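
The empty-intersection problem described above can be reproduced with a toy model in which every page or subcategory records only its immediate parents. The data and the `direct_intersection` helper below are hypothetical and exist purely for illustration.

```python
# Hypothetical memberships: each member (page or subcategory) lists
# only the categories it is placed in directly.
PARENTS = {
    "Example physicist": {"Physicists", "French"},  # an article page
    "Physicists": {"Scientists"},                   # a subcategory
    "French": {"European people"},                  # a subcategory
}


def direct_intersection(cat_x, cat_y):
    """Members whose immediate parents include both cat_x and cat_y."""
    return {
        member
        for member, parents in PARENTS.items()
        if cat_x in parents and cat_y in parents
    }
```

Here `direct_intersection("Scientists", "European people")` comes back empty even though the example page is, informally, a European scientist, because the page is only reachable through the subcategories.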

Someone has already suggested some sort of "subtree" operator. Firstly, as any programmer will tell you, dealing with hierarchical relationships in SQL is a big PITA. Secondly, it has the chance to turn a small, reasonably sized category into something absolutely massive. Thirdly, it loses the useful internal structure of the categories.

Let me propose a solution:

  • We use for intersection the syntax Category:A/Category:B. Thus, you can intersect any set of categories you choose just by typing in the address bar "/wiki/Category:A/Category:B". You can also link to any category intersection with [[:Category:A/Category:B]]. My reason for choosing the / is so it works fine in URLs.
  • When intersecting categories, we pick up all pages in both categories, plus all subcategories of either category. However, when we pick up the subcategories of A, we intersect those categories with B, and vice versa, i.e. if Category:C is a subcategory of Category:A, then Category:C/Category:B is a subcategory of Category:A/Category:B.
  • Say you want to give the name (X) to Category:A/Category:B. You can edit Category:X and add some markup (similar to #REDIRECT) to redirect the X category to the A/B category.
    • The system should ignore as invalid any links from a page to an intersection category or a category that redirects to an intersection category.
    • You can mark the Category:A/Category:B page with a title (say, using syntax like [[Category:A/Category:B|X]]; this does not conflict with anything else, since we have already forbidden intersection categories from getting members), which will be used in any lists. So, if Category:C/Category:B gives itself the name P this way, the Category:A/Category:B page will output a link named P to the Category:C/Category:B page.
  • I think intersection is the main requirement, and supporting anything else might make it more complex than needed. But, if you really want other operators, you can extend the syntax like this: Category:A/union/Category:B, Category:A/minus/Category:B, etc.
  • If we abandon the idea of other operators, we could simplify Category:A/Category:B to Category:A/B, assuming no "/" are already in category names.... (OS/2?)
  • Another alternative to the / could be :, e.g. Category:A:B. This is less likely to conflict with OS/2 and friends.
  • Also, this could work for as many terms in the intersection as you like, e.g. Category:A/Category:B/Category:C, etc., although for performance reasons we'd want some limit (2? 3?)
  • Implementation detail: we need to define a canonical form, e.g. split on the / and sort in ASCIIbetical order, before we store in the database. This is so that when we want to find that Category:C/Category:B has name P, we just need to search on one ordering (B/C) and not on two (B/C, C/B). Also, any category intersection not in canonical form should be treated as if it were in canonical form. Thus if you put text in Category:A/Category:B, and then go to Category:B/Category:A, you'll get the exact same page.
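
Two pieces of the proposal above lend themselves to a short sketch: the canonical form, and the rule that subcategories of an intersection are themselves intersections. The code is a rough illustration under stated assumptions, not MediaWiki code: `canonical`, `intersection_subcats` and the `SUBCATS` table are hypothetical names, and splitting on "/" assumes that no category name contains a slash (the OS/2 caveat).

```python
# Hypothetical subcategory table for the A, B, C example in the proposal.
SUBCATS = {
    "Category:A": ["Category:C"],
    "Category:B": [],
}


def canonical(title):
    """Split an intersection title on '/' and sort the parts.

    Python orders str values by code point, which matches ASCII order
    for ASCII titles. Assumes no category name contains '/'.
    """
    parts = [part for part in title.split("/") if part]
    return "/".join(sorted(parts))


def intersection_subcats(cat_a, cat_b):
    """Subcategories of the A/B intersection under the proposed rule:
    every subcategory of A intersected with B, and vice versa."""
    derived = {canonical(f"{child}/{cat_b}") for child in SUBCATS.get(cat_a, [])}
    derived |= {canonical(f"{cat_a}/{child}") for child in SUBCATS.get(cat_b, [])}
    return sorted(derived)
```

Because both orderings collapse to a single key, a display name such as P would only need to be stored once, under the canonical title.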

Anyway, that's my proposal.

Although in a perfect world something like this is a good idea, I fear that:

  1. it would be easy to do in a way which wasn't really that useful
  2. it would be a fair bit of work to do it properly (in a way that was really useful), and it's not clear to me that the extra programmer effort and complexity in the code (which then has to be maintained) would be worth the benefits

--SJK 10:58, 25 March 2006 (UTC) (I do PHP programming for a living, although I haven't really touched MediaWiki....)

this vs DPL

Isn't this similar to DPLs? Bawolff 01:18, 27 March 2006 (UTC)

Yes, in fact, it is. Thanks. -- Zondor 04:36, 27 March 2006 (UTC)
It's worth noting that while the end result of DPLs is similar, the implementation and use are different. This is an alternative, imho more powerful, way of viewing all the categorized or tagged information in a wiki - DPLs do something very similar to dynamically create a list of fixed parameters. I think this approach implies more general (as in "not as specific", not as in quantity) categories, and finding information where they intersect. The subcategories are the most significant hurdle of this endeavor, again imho. BTW, I wrote the implementation mentioned on meta, and would be happy to re-write it for MediaWiki 1.5/1.6. I'm sure it could stand some scrubbing by a more knowledgeable PHP guy than I, but the SQL I'm using is pretty solid and more efficient than what is (or was when I checked) being used for DPLs. --Aerik 03:44, 29 March 2006 (UTC)
DPL could do much of what I'd like to do. It would help if it was extended to the point that one could express (A or B or C) AND (D). Some way to deal with subcategories would help, too. But DPL is a good first step. --John Nagle 21:44, 27 April 2006 (UTC)
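
A query of the shape (A or B or C) AND (D) amounts to a union followed by an intersection. The sketch below is not DPL syntax and says nothing about how DPL works internally; the `MEMBERS` table and the `any_of_and_all_of` function are hypothetical and only show the set logic.

```python
# Hypothetical category memberships.
MEMBERS = {
    "A": {"p1", "p2"},
    "B": {"p2", "p3"},
    "C": {"p4"},
    "D": {"p2", "p4", "p5"},
}


def any_of_and_all_of(any_of, all_of):
    """Pages in at least one `any_of` category and in every `all_of` category."""
    result = set().union(*(MEMBERS.get(cat, set()) for cat in any_of))
    for cat in all_of:
        result &= MEMBERS.get(cat, set())
    return result
```

For the toy data, `any_of_and_all_of(["A", "B", "C"], ["D"])` keeps only the pages that are in D and in at least one of A, B or C.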

The main value of categories is being completely ignored here

This proposal seems to completely misunderstand what categories are for. It is not the case that people only use Wikipedia to search for things they know they want. The present category has immense value as a means of facilitating unplanned browsing, and it would be an unmitigated disaster to wipe out all the detailed categories as proposed. Chicheley 21:46, 13 July 2006 (UTC)