Talk:Statistical classification
Link to peer reviewed paper
Hi, I recently added some new information regarding the comparison of various classification techniques, with a reference to a peer-reviewed article. There seems to be some controversy on this subject, as the link has been removed several times. I am currently doing my PhD on this topic and I know the information is very relevant.
Is the reference and the external link http://www.pattenrecognition.co.za suitable for this site? If not, what can I do so that this information is not repeatedly removed?
cvdwalt —The preceding unsigned comment was added by 155.232.128.10 (talk) 07:03, 9 March 2007 (UTC).
Merging/reorganizing Pattern Recognition and Statistical classification
Regarding Statistical classification/temp, I'm puzzled how to integrate it into existing articles. Check out Classification: there are two types of classification, Taxonomic classification and Statistical classification. I think you may be talking about taxonomic classification, but I'm not sure. We've made a distinction: taxonomic classification is based on human decision-making, while statistical classification is based on algorithmic decision-making.
-- hike395 July 1, 2005 17:55 (UTC)
I checked out the existing pages again. I'm talking about algorithmic decision making in the temp article--the classification of items into groups based on numerical/statistical analysis using some algorithm. My issue is that under the category of algorithmic decision making, the topic is discussed only in terms of pattern recognition/machine learning and there is no general explanation of what statistical classification is and does. Algorithmic (or computational, or numerical, or statistical--I use them synonymously) classification can be, and is, applied to all kinds of things. So I think an overall summary of what stat. classif. is and does--how it works, its underlying ideas, types of approaches, specific algorithms etc.--is needed. Particular applications can then be discussed after that. To jump to a pattern recognition application immediately is just getting too specific too fast on just one of many, many applications.
Also, the applications listed under taxonomic classification are not necessarily based on human decision making. Some of them can be algorithmic as well, such as phenetics-based classifications of organisms.
I don't disagree with your idea to break the topic into human vs algorithmic-based procedures, but I think some work needs to be done to make everything clearer. What I wrote needs to be expanded on for sure, but it is a basic intro which can be built on, I hope.
Jeeb 2 July 2005 00:18 (UTC)
- What a conundrum. I've thought about it, and I agree with you: Statistical classification should be the main article about classification in statistics. The problem is: what to do about Pattern recognition? Here are three issues: 1) lots of pages link to pattern recognition. If it turns into a redirect, it would surprise a lot of people; 2) if it is too similar to statistical classification, they will slowly evolve to have different/conflicting information (given that Wikipedians are not thorough about checking for redundant articles); and 3) ...
- Issue 3 is a doozy, and it goes back to the sociology of AI research. AI research goes through boom/bust cycles that seem to last 10-20 years. Each cycle generates a new name. In the 1950s and 1960s, the statistical AI approach was called pattern recognition (especially applied to computer vision tasks). In the 1980s, it was called neural networks (and it was vaguely neuromorphic). In the 1990s, it was called machine learning. In each cycle (except for machine learning?), the researchers overpromised and their area fell into disrepute. The name fell out of favor, except for those die-hard people who stayed with the same techniques. Thus, we still have pattern recognition conferences (ICPR), neural network conferences (IJCNN), and machine learning conferences (ICML) that all co-exist.
- So, I think that we should rewrite pattern recognition to be a more historical/sociological article about statistical AI, rather than a listing of techniques.
- The problem is, it's an enormous undertaking, and people may not fully agree. I can take a stab at making a stubby start of the article. The problem is that, without a lot of meat in the article, it may drift into replicating statistical classification. Also, we would need to find sources for the history of pattern recognition, which is somewhat tricky.
- -- hike395 July 7, 2005 06:05 (UTC)
- More data! Check out the FOLDOC definition of Pattern Recognition. They distinguish PR from statistical classification by 1) claiming that PR is a subfield, 2) PR systems solve the whole problem (including pre-processing), and 3) there are non-statistical classification approaches to PR (including syntactic classification, which I had forgotten about). -- hike395 July 7, 2005 15:38 (UTC)
-
- ...and I realize, on re-reading pattern recognition (PR), that I had been thinking of it as synonymous with image analysis when I made my initial comments and wrote the temp article, but the article makes it clear that PR is broader than just image analysis, which I agree with. Nevertheless, I think PR and statistical classification (SC) are different because of (1) your comment that PR can involve non-statistical (e.g. syntactical) approaches, and (2) SC (and PR) can be unsupervised (the PR article as written focuses on training sets and mapping a set of items onto an appropriate classification label using such sets--which means it is talking only of supervised classification procedures. But classification can also be unsupervised, with the labeling of classification groups coming later via some independent, non-mathematical procedure). So in some respects PR seems to me broader than SC, and in other ways narrower, so I'm not so sure that PR is a subfield of SC; I'm prone now to think it's actually broader, but at any rate, I think they're certainly different enough to warrant separate articles.
-
- Including the historical evolution of PR sounds like a good idea, but I think some info on methods and techniques should be included as well, because PR seems to me to have important and distinguishing elements (like the incorporation of syntactic or contextual information that you mention). (It is in that respect especially that I think PR is broader than stat. classif., which never, to my knowledge, deals with syntactical information or the whole concept of topological relationships among items or groups).
-
- How about two separate articles without any redirects, justified by clear distinctions between the two in the articles--simply remove the redirect from SC to PR that now exists, put the existing SC-temp article where SC now is, and then continue to edit the two articles using this (and future) discussion as a basis for it? No links to PR would be affected that way, and any existing links to SC would not redirect a reader to PR. As for the enormous undertaking, I think this minimizes it because we can just slowly continue to revise the two existing articles as we discuss the relationship between the two topics...
- Jeeb
-
- Re-reading the temp page material, I realize that it uses terminology that is not standard in either machine learning or pattern recognition. The temp page is fundamentally about clustering, not statistical classification. It assumes that there are no fixed, pre-defined classes. Instead, the data guides the creation of the classes. The temp page describes both partitional clustering and agglomerative clustering.
-
- Remember, the essence of classification algorithms is that they take a training set that has both input data and labels. A clustering algorithm only takes input data (no labels). What we have here is a good introduction to clustering. I'll look over there and see if it fits in. -- hike395 06:46, July 26, 2005 (UTC)
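- To make the distinction above concrete, here is a minimal sketch in Python. It is only an illustration, not anything taken from the articles under discussion: the nearest-centroid classifier and the k-means loop are stand-ins chosen for brevity, and the function names, the toy data, and the "first k points" initialisation are all assumptions made for the example. The point is just the difference in inputs: the classifier is fitted on data plus labels, the clustering routine on data alone.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(fmean(axis) for axis in zip(*points))


def fit_nearest_centroid(X, y):
    """Supervised: needs inputs X AND labels y. Returns {label: centroid}."""
    groups = {}
    for point, label in zip(X, y):
        groups.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in groups.items()}


def predict(model, point):
    """Assign the pre-defined label whose centroid is closest to the point."""
    return min(model, key=lambda label: dist(model[label], point))


def kmeans(X, k, iterations=10):
    """Unsupervised: needs inputs X only. Returns one group index per point."""
    centers = list(X[:k])  # naive initialisation (assumption): first k points
    for _ in range(iterations):
        # Assign every point to its nearest current center.
        assignment = [min(range(k), key=lambda j: dist(centers[j], p)) for p in X]
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(X, assignment) if a == j]
            if members:
                centers[j] = centroid(members)
    return assignment


# Hypothetical toy data: two well-separated groups in the plane.
X = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9)]
y = ["small", "large", "small", "large"]  # labels exist only for classification

model = fit_nearest_centroid(X, y)   # classification: data + labels
label = predict(model, (0.9, 1.1))   # one of the a priori labels

groups = kmeans(X, k=2)              # clustering: data only, groups are unnamed
```

- Note that the clustering result is just a list of anonymous group indices; giving those groups meaningful names would have to happen afterwards, by some separate procedure.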
-
- Ah, good work! I agree that what I wrote in "temp" falls into what many would call "cluster analysis". However, I don't think it's a clear-cut distinction, because, for example, I believe it is common for satellite imagery analysts to use the terms "supervised" and "unsupervised" classification in their work, the former corresponding to your definition of classification, the latter to your definition of clustering. The statsoft online textbook, which I use as a main reference, follows your line of argument in that they have cluster analysis as a chapter separate from classification, and the description therein supports your idea. On the other hand, their definition of classification in their glossary (http://www.statsoft.com/textbook/glosfra.html) mentions nothing about putting things into a priori labeled classes, and their def of cluster analysis says it's a "classification algorithm". I would like to see this definitional fuzziness cleared up, but I'm willing to go with your ideas in the meantime.
-
- I think the statistical classification article still needs a bit of elaboration on the intro before the math and details that follow, although I see you worked on it some. Also, I think "numerical classification" is used as a synonym, particularly in taxonomic applications, and I think that term is actually more accurate in some ways.
-
- As for the temp article, I don't know if some of that can be useful in the clustering article or not, but I think so. Jeeb 04:39, 3 August 2005 (UTC)