Talk:Factor analysis
Shouldn't this entry describe how to do factor analysis? It is a mathematical (statistical) procedure, after all, not just a loose idea, and there is only a hint that there is math involved at all! Why not split up the contents of the current page into a brief overview, and a (much shorter) example, with the bulk of the contents (yet to be written) in the middle? 63.145.36.242 (talk) 22:34, 3 January 2008 (UTC)
There is nothing resembling an account of what factor analysis is! It's as if the article is trying to protect the reader from reading about actual math. If I weren't very rusty on this, I'd jump right in. 131.183.81.100 21:36 26 Jun 2003 (UTC)
- I didn't realize that this encyclopedia was limited to a mathematical perspective. I admit that I am not a mathematician, but as a marketer I have had occasion to use factor analysis techniques, and that is why my article is practical and application-oriented. If you wish to add or correct some of the mathematical theory involved, then go ahead. But don't trash the article because it is not purely mathematical enough for you. user:mydogategodshat
It's not because it's not purely mathematical enough, but rather because of two things: It is written as if factor analysis is used ONLY in marketing (as if "car" were defined as "a technology for travelling from Grafton to Clarksburg", those two locations being the ONLY ones cars could travel to or from) and, more seriously, it does not even hint at what factor analysis is, or even attempt to say anything about that. Factor analysis originated in psychology and is used in biology and other fields; to say that the data are usually collected by market researchers or the like is absurd. Michael Hardy 22:20 26 Jun 2003 (UTC)
- Here is a solution. I will move the article to factor analysis (in marketing) to reflect the applied nature of the article. That way you can write a separate, purely theoretical article that won't be tainted with practical considerations.
I never contemplated any purely theoretical article not tainted with practical considerations, nor do I think that would be a good idea. I do think that some attempt should be made to say what factor analysis is. I will probably attend to that within a few days. Michael Hardy 22:29 26 Jun 2003 (UTC)
"Have a computer run the factor analysis procedure" is the black box in this article: what that procedure is should be a central point, because if that procedure is replaced by something quite different, then what is being done is NOT factor analysis, even if everything now in the article remains true. Michael Hardy 22:32 26 Jun 2003 (UTC)
- If you wish to add or correct some of the mathematical theory involved, then go ahead. Any help at compensating for my mathematical deficiencies would be appreciated. - user:mydogategodshat
- Over the next week or two I will also be writing articles on:
- Your mathematical expertise would be appreciated there as well.
- user:mydogategodshat
From Wikipedia:Redirects for deletion
- Factor analysis -> Factor analysis (in marketing)
- Was moved because it was not about factor analysis in general, but just about factor analysis in marketing. But such a move has no use if one lets the redirect stand. Andre Engels 18:25, 4 Jan 2004 (UTC)
- Don't delete, write at least a stub or disambiguation page. Onebyone 18:34, 4 Jan 2004 (UTC)
- Move it back and have the entire article under the heading ==Marketing==. The term doesn't need to be split. --Jiang 03:46, 11 Jan 2004 (UTC)
From User talk:Angela
- I see you moved factor analysis (in marketing) to factor analysis. Let us recall some history. That was this article's original name. I complained that the article makes not the least attempt to say what factor analysis is (and please see the discussion page before disputing this). The page's first author rather belligerently and falsely accused me of wanting to write some sort of ivory-towerish account devoid of any mention of practical applications. What incited that attack I do not know. What happened was a sort of compromise. But my objection stands: the article still simply does not care what factor analysis is; it makes no attempt to even hint at that topic. Michael Hardy 02:05, 13 Jan 2004 (UTC)
- I don't dispute that the article is lacking information on what FA is, but that isn't a reason to remove the information on applications of FA. I added the subheading to make it clear that most of the article is on its use in market research. This doesn't stop someone coming and writing something more about FA that is not related to market research. I'm not sure I understand what the problem is. Angela. 02:15, Jan 13, 2004 (UTC)
Suggestion: fuse advantages/disadvantages for both
There is no reason for listing different advantages/disadvantages in psychology and marketing. Indeed, the advantages/disadvantages of factor analysis are the same, regardless of the domain of its application.
At the very least, the "Applications in psychology" advantages/disadvantages should be changed. As it is, that section focuses only on intelligence tests, while factor analysis can be used in all areas of psychology, especially in trait theory.
--Guillaume777 03:49, 6 November 2006 (UTC)
What factor analysis is
I'm looking at this again a long time after the discussion above. I'm amazed that someone would suggest that I'm trying to make this "purely mathematical" or not hint at applications. Obviously a technique about which all of the things said in the article are true could be something altogether different from factor analysis, until my edit today. The article didn't even hint at what factor analysis is, and the person who made those irrational accusations against me didn't appear to know or care what factor analysis is. But now I am guilty of what I was told was a heinous crime: inserting into this article something about what factor analysis is. That that would be considered criminal is one of the more remarkable instances of irrationality I've seen. Michael Hardy 23:02, 19 Nov 2004 (UTC)
- Thanks for filling in the mathematical theory that I was not familiar with. In my opinion that is a more satisfactory approach than inserting "The factual accuracy of this article is disputed" at the top of the article because you felt it lacked "actual math". mydogategodshat 18:46, 20 Nov 2004 (UTC)
Matrix notation
The matrix notation doesn't seem to be correct, since μ cannot simply be a 10x1 vector. From my experience with principal components analysis, I'd guess that it was supposed to be a 10x1000 matrix, with each column containing the average vector, but I'm not certain. --Dfalcantara 07:24, 6 August 2006 (UTC)
Actually I do not think it should be an arbitrary 10x1000 matrix, for two reasons: 1) in that case it is indistinguishable from the error matrix and serves no purpose, 2) in that case there is also no sense in which it is an average. Rather (relying on mathematical intuition but not actual knowledge of FA - hmm, a page on FA written entirely by people who don't know exactly what it is??), I think it should be the outer product of a 10x1 vector mu and a 1x1000 vector of 1's. This is conformable and has the property that each row is constant and equal to the mean for a particular test. -J.Lewis
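To make the outer-product suggestion concrete, here is a minimal NumPy sketch. The sizes (10 tests, 1000 students) come from the discussion above; the mean values and the names `mu`, `ones`, and `M` are made up for illustration and are not taken from the article.

```python
import numpy as np

n_tests, n_students = 10, 1000   # sizes taken from the discussion above

# Hypothetical per-test means; the values are invented for illustration.
mu = np.arange(1.0, n_tests + 1).reshape(n_tests, 1)   # 10 x 1 column vector
ones = np.ones((1, n_students))                        # 1 x 1000 row of ones

# Outer product: every column of M is a copy of mu,
# so row i is constant and equal to the mean of test i.
M = mu @ ones                                          # 10 x 1000

# Broadcasting produces the same matrix without building `ones` explicitly.
M_broadcast = np.broadcast_to(mu, (n_tests, n_students))
```

Under these assumptions `M` is conformable with a 10x1000 data matrix, which is the property the comment above asks for.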
Common Factor Analyses versus Principal Component Analyses
Perhaps include a subsection on the differences between exploratory factor analysis and principal component analysis. In my experience working in Stat Lab, students and clients frequently confuse the two. Perhaps this can be added to the section on common factor analysis. Below is my understanding of the differences. What do you think?
Exploratory factor analysis (EFA) and principal component analysis (PCA) may differ in their utility. The goals of EFA are interpretation of the factor structure and data reduction (reducing a large set of variables to a smaller set of new variables), whereas the goal of PCA is usually only data reduction.
EFA is used to determine the number and the nature of latent factors which may account for a large part of the correlations among a large number of measured variables. On the other hand, PCA is used to reduce scores on a large set of observed (or measured) variables to a smaller set of linear composites of the original (or observed) variables that retain as much information as possible from the original (or observed) variables. That is, the components (linear combinations of the observed items) serve as a reduced set of the observed variables.
Moreover, the core theoretical assumptions of the two methods are different. EFA is based on the common factor model (FA), whereas PCA is not.
1. Common and unique variances
- Common Factor Model (FA): Factors are latent variables that explain the covariances (or correlations) among the observed variables (items). That is, each observed item is a linear function of the common factors (i.e., a single or multiple latent factors) and one unique factor (a latent construct affiliated with that observed variable). The latent factors are viewed as the causes of the observed variables.
- Note: Total variance of variable = common variance + unique variance (in which unique variance = specific variance + error variance).
- Principal Components (PCA): In contrast, PCA does not distinguish between common and unique variances. The components are estimated to represent the variances of the observed variables in as economical a fashion as possible (i.e., in as small a number of dimensions as possible), and no latent (or common) variables underlying the observed variables need to be invoked. Instead, the principal components are optimally weighted sums of the observed variables (i.e., components are linear combinations of the observed items). So, in a sense, the observed variables are the causes of the composite variables.
2. Reproduction of observed variables
- FA: Underlying factor structure tries to reproduce the correlations among the items
- PCA: Composites reproduce the variances of observed variables
3. Assumption concerning communalities & the matrix type.
- FA: Assumes that a variable's variance is composed of common variance and unique variance. For this reason, we analyze the matrix of correlations among measured variables with communality estimates (i.e., the proportion of variance accounted for in each variable by the rest of the variables) on the main diagonal. This matrix is called the reduced correlation matrix, R_reduced.
- Note: Principal Axis factoring (PAF) = principal component analysis on R_reduced.
- PCA: There is no place for unique variance; all variance is treated as common. Hence, we analyze the matrix of correlations (Rxx) among measured variables with 1.0s (representing all of the variance of the observed variables) on the main diagonal. The variance of each measured variable is entirely accounted for by the linear combination of the full set of principal components.
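The three points above can be sketched numerically. This is only an illustration under stated assumptions: the loadings are invented, the factors are assumed orthogonal and the variables standardized, squared multiple correlations are used as one common choice of initial communality estimate, and only a single, non-iterated principal axis step is shown. All variable names (`loadings`, `R_xx`, `R_reduced`, and so on) are mine.

```python
import numpy as np

# Hypothetical loadings of 4 standardized variables on 2 orthogonal common factors.
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.6],
    [0.2, 0.9],
])

# Point 1: total variance (1.0) = common variance + unique variance.
communality = (loadings ** 2).sum(axis=1)   # common variance per variable
uniqueness = 1.0 - communality              # specific + error variance

# Correlation matrix implied by the common factor model.
# The off-diagonal correlations come entirely from the common factors.
R_xx = loadings @ loadings.T + np.diag(uniqueness)

# Point 3, PCA: analyze R_xx as is, with 1.0s on the main diagonal.
pca_eigenvalues = np.linalg.eigvalsh(R_xx)[::-1]   # sorted largest first

# Point 3, FA: replace the diagonal with communality estimates.
# Squared multiple correlation of each variable with the rest:
# smc_i = 1 - 1 / (R^-1)_ii
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R_xx))
R_reduced = R_xx.copy()
np.fill_diagonal(R_reduced, smc)

# One principal axis step: eigendecomposition of the reduced matrix.
paf_eigenvalues = np.linalg.eigvalsh(R_reduced)[::-1]
```

The PCA eigenvalues sum to the number of variables (all variance is analyzed), while the eigenvalues of the reduced matrix sum only to the total estimated common variance.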
Also see Principal Component Analysis
(Please bear with me; I am new to contributing to Wikipedia.)
RicoStatGuy 16:15, Sept 30, 2006(UTC)
- Hi, I have not read your comments, but I agree with you. I just read a journal article whose authors seem not to understand the difference between the two methods. This book explains the difference:
- Jackson, J. E. (1991). A User’s Guide to Principal Components. New York: John Wiley and Sons.
- (i.e. the difference in application and in the criteria to be maximized) 136.159.65.84 (talk) 00:45, 15 March 2008 (UTC)sstein
Advantage?
"Allows for a satisfactory comparison between the results of intelligence test": I am not sure this is true without some more assumptions (such as that all intelligence tests are conducted on random samples from the same population). At the very least, I'm unclear what is supposed to be meant here. Cydmab 06:21, 6 October 2006 (UTC)
Can someone add to the example section? Dheerajakula 04:40, 12 January 2007 (UTC)
Introduction Not Suitable for a Layman
Factor analysis is a very simple concept. You have a number of different variables which may be products of a single variable we can't measure - for instance, number of smiles, laughs and cheery whistles per day might all be products of the variable "happiness" - and factor analysis is a way of identifying whether this is the case or not. Anybody can understand that without prior knowledge of statistics, yet the introduction to this article has the potential to confuse a layman because it never outlines the basics in easy-to-understand language. This isn't unique to this article; it seems to plague all of the statistics articles on Wikipedia, and in my opinion they should be simplified greatly so that those with no knowledge of the subject can gain a foothold. This doesn't preclude us from adding detailed sections later on in the article, but please keep in mind that this is supposed to be intelligible to your average housewife/manual labourer, not college professors. Blankfrackis (talk) 17:03, 5 May 2008 (UTC)
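For anyone who wants to see the happiness example with numbers, here is a small simulated sketch. Everything in it is hypothetical: the loadings (0.8, 0.7, 0.6), the sample size, the random seed, and the helper `indicator` are my own choices. Looking at the eigenvalues of the correlation matrix is only a rough heuristic for "one underlying variable", not a full factor analysis.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
n_people = 5000

# The variable we cannot measure directly.
happiness = rng.standard_normal(n_people)

def indicator(loading):
    """Observed variable = loading * happiness + independent noise (unit variance)."""
    noise = rng.standard_normal(n_people)
    return loading * happiness + np.sqrt(1.0 - loading ** 2) * noise

smiles = indicator(0.8)
laughs = indicator(0.7)
whistles = indicator(0.6)

# Correlations among the three observed variables.
R = np.corrcoef(np.vstack([smiles, laughs, whistles]))

# Eigenvalues of R, largest first. A single eigenvalue well above 1,
# with the rest below 1, is consistent with one common factor.
eigenvalues = np.linalg.eigvalsh(R)[::-1]
```

In this simulation the three measures are all positively correlated because they share the same hidden cause, which is the pattern factor analysis is designed to detect.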