Talk:Biclustering
Either this page should be rewritten to exclusively address biclustering algorithms or it should be merged with bicluster. IMHO, it doesn't fit WP's style to keep "bicluster" and "biclustering" separate. --Beefyt 01:14, 8 October 2006 (UTC)
A note on edit
There was a statement "The term was first introduced by Cheng and Church in gene expression analysis, although the technique was originally introduced much earlier by J.A. Hartigan."
Cheng and Church stated in their article that the term biclustering has been used by Mirkin (1996) to describe "simultaneous clustering of both row and column sets in a data matrix", so I changed the sentence to "the term was first introduced by Mirkin (...)". I do not know whether Mirkin was actually the first to introduce the term biclustering, but since he gave the definition without a citation (p. 144), I assume he was. A different question is whether every term should have a separate page, rather than just being mentioned on a page describing the general concept or technique. To the best of my knowledge, no such general page exists on Wikipedia where biclustering would fit, and creating one would not be easy.
I also added a reference to the statement "was originally introduced much earlier": isomorphism of problems is seldom straightforward, so I thought it better to have a reference for further reading. As for the modification of the Hartigan part: Mirkin stated that Hartigan "was the first who considered the problem explicitly" under the "Single Cluster Biclustering" section. Since this statement concerned only single cluster biclustering, and Mirkin distinguished several biclustering counterparts (single cluster biclustering, partitional biclustering and hierarchical biclustering), the generalization "Hartigan introduced the technique of biclustering" is not correct without a proper reference (all material that is challenged or likely to be challenged needs a source; see Wikipedia:Citing_sources). Innar 21:23, 10 March 2007 (UTC)
- The word "biclustering" appears in a paper from 1974, "Graphical interpretation of water quality data" by Mahloch. Unfortunately I don't seem to have full text access to it so I can't tell whether it is used in the same way in that paper, but it's much older than your Mirkin reference. So I'm not at all convinced that Mirkin invented the word, as your edit seems to imply. —David Eppstein 21:42, 10 March 2007 (UTC)
- By adding this discussion alongside my edit, I meant to convey that I do not have older references for the term (regardless of intuition). Mahloch's paper does show up when searching for "biclustering" in Scholar, but I do not seem to have full access either (or the PDF is corrupt, as it appears to be only 4 KB in SpringerLink). The abstract looks promising (graphical representation plus interrelationships within the data), so it would indeed be interesting to read. If anyone has access, please let us know (I will keep trying to access it as well).
- However, my personal view is that getting the facts right about the background of a specific term (synonym) is valuable and needed, but it would be even more important to describe the interrelations at a more general and abstract level. Your edit in October 2006 was definitely a step towards that. Innar 00:57, 11 March 2007 (UTC)
- From Jerome L. Mahloch, Graphical interpretation of water quality data, J. of Water, Air, & Soil Pollution, Springer, 1974:
- "Biplotting is a technique which yields a two dimensional approximation to the data matrix through canonical decomposition."
- "The application and development of the biplot and its associated decomposition utilized in this paper follows from the work of Gabriel (1971, 1972a, b)."
- "The biplot shows that river systems within a basin form distinct clusters which are related to their water quality. The sampling stations for the Tallahala Creek form one cluster and the Leaf and Chickasawhay Rivers each form two separate clusters. Biclustering of these sampling stations indicates changing water quality conditions along the stream."
- Ref:
- Gabriel, K.R.: 1971, Biometrika 58, 453.
- Gabriel, K.R.: 1972a, Candec Computer Program, Dep.of Statistics, The Hebrew University, Jerusalem.
- Gabriel, K.R.: 1972b, J. App. Meteor. 11, 1071.
- Innar 10:24, 29 March 2007 (UTC)
The figure
Shouldn't figure (e) have "multiplicative" instead of "model" in the parentheses of the annotation text? Took 14:24, 6 August 2007 (UTC)