Optimal classification
From Wikipedia, the free encyclopedia
Optimal classification is an arrangement of attributes for a set of elements in an Attribute-value system which minimizes the number of attribute queries necessary to identify a particular element within an element set.
The algorithm used for this purpose first sorts the elements in ascending order according to the value of each attribute as determined by its current position in the sequence of attributes. An empirical separatory value is then calculated for each characteristic in the target set and compared to the highest separatory value calculated for all preceding sets. This process repeats until an optimal sequence is found, all permutations have been evaluated or a time limit expires.
The number of elements, which are classified as well as the rapidity with which they can be identified, is dependent upon both the radix and the exponent of the data. A larger radix will allow faster identification by means of excluding a greater percentage of elements per characteristic. A binary radix for instance excludes only fifty percent of the elements per characteristic whereas a five-valued radix excludes eighty percent of the elements per characteristic.
What follows is a rigorous explanation of the algorithm.
Contents |
[edit] Truth Table Related Equations
- G = VC, where:
-
-
- G is the group size or total number of elements in the group,
- V is the highest value of logic in the group,
- C is the highest number of characteristics in the group.
-
- R = VK, where:
[edit] Element Related Equations
[edit] Maximum Number of Pairs of Elements to Separate
Maximum number of pairs of elements to separate refers to a matrix in which each element is compared with every other element to determine the number of pairs that are separable or disjoint. Pairs are separable or disjoint whenever the logic values of the elements that make up a pair are different. In theory, therefore the maximum possible numbers of pairs that can be separated is determined by the following equation:
- , where:
-
-
- pmax is the maximum number of pairs to separate, and
- G is the total number of elements in the group.
-
[edit] Order of Elements
The elements are arranged in descending order according to their truth table value, i.e., the value calculated as the sum of each characteristic's logic state value times the highest value of logic raised to the power of the order of the characteristic.
- , where:
-
-
- ei is the element truth table value in the group,
- C is the highest number of characteristics in the group,
- V is the highest value of logic in the group,
- v is the value of logic of each characteristic in the group,
- j is the jth characteristic index, where:
-
-
-
- j = 0..K and where:
-
- K is the number of characteristics in the set,
-
- j = 0..K and where:
-
-
-
- i is the ith element index, where:
-
-
-
- i = 0..G and where:
-
- G is the number of elements in the group.
-
- i = 0..G and where:
-
[edit] Characteristic Related Equations
[edit] Theoretical Separation
[edit] The General Identification Equation
- , where:
-
-
- Sj is the theoretical separatory value per jth characteristic,
- C is the highest number of characteristics in the group,
- V is the highest value of logic in the group and
- j is the jth characteristic index in the set, where:
-
-
-
- j = 0..K and where:
-
- K is the number of characteristics in the target set.
-
- j = 0..K and where:
-
[edit] Minimal Number of Characteristics to Result in Theoretical Separation
- , where:
-
-
- tmin is the minimal number of characteristics to result in theoretical separation,
- G is the number of elements in the group and
- V is the highest value of logic in the group.
-
[edit] Empirical Separation
(Please note that application of these equations can not be expressed entirely using conventional mathematical notation or normal MathCAD expressions without including a MathCAD function or MathCAD user program. The equations have been fully implemented using the Zbasic and the Visual Basic programming languages.)
- , where:
-
-
- Sj is the empirical separatory value,
- j is the jth characteristic index,
- G is the number of elements in the group,
- R is the number of elements in the set, where:
-
-
-
- R = V^K, where:
-
-
-
-
-
- V is the highest value of logic in the group and
- K is the number of characteristics in the set.
-
-
-
-
-
- i is the truth table index value of the set, where:
-
-
-
-
- i = 0..R and
-
-
-
-
- , where:
-
-
-
-
-
- ni is the truth table value of the ith element in the set
- v is the logic value of the jth characteristic.
-
-
-
[edit] Flag Recognition - Application Example
(A flag identification example from Neural Network Identification example.)
Although this example lacks an intuitive sense of optimization it still serves as a good example of how optimization can reduce the number of queries which must be made.
[edit] Flag Overlay Grid
Designated areas (characteristics) for sampling background colors (states) of all flags (elements).
[edit] Original Order
[edit] Systematic Query
Starting with area "A" the query begins by asking for the color in this area of the flag. Suppose we have in our possession the flag of the Netherlands. The answer to the first query in regard to area "A" is RED which would remove 2/3 of the flags from further consideration. The next query for the color in area "B" would be RED which would serve to eliminate none of the remaining flags. In fact, since the colors in columns "D", "E" and "F" are the same for each remaining flag, we would not be able to eliminate any remaining flags until column "G" where the color BLUE would provide a unique answer to the final necessary query. Here all remaining flags except the flag of the Netherlands would be eliminated from further consideration. It would therefore take a minimum of seven queries using the systematic query method to establish the identity of the flag in our possession as belonging to the Netherlands.
[edit] Optimized Order
[edit] Minimized Query
The results of optimization are shown above and include a listing of the theoretical and empirical percentages. The original characteristic sequence is indexed in the bottom row. Starting with area "G" the query begins by asking for the color in this area of the flag. Suppose we have in our possession the flag of Ireland. The answer to this first query would be GREEN. The next query is for the color in area "F" to which we would answer ORANGE. Since no other flags have this combination of GREEN and ORANGE in these areas our query can end here. The minimized query algorithm has optimized the order of characteristics and minimized the number of queries that are required to identify the flag. (Please note that there may be more than one optimal solution.)
[edit] References
[edit] Primary Reference
Biological Identification with Computers edited by R.J. Pankhurst, British museum (natural history) London, England proceedings of a meeting held at Kings College, Cambridge 27 and 28 September 1973 of the Systematics Association Special Volume Number 7 and published by the Academic Press 1975 noting the work of Eugene W. Rypka, Dept. of Microbiology, Lovelace Center for Health Sciences, Albuquerque, New Mexico, "Pattern Recognition and Microbial Identification."
-
- Eugene Weston Rypka passed away on April 27, 2006. Gene was born on May 6, 1925 in Owatonna, MN to Charles Frederick and Ethel Marie Rypka. He served in World War II as a paramedic in Iwo Jima and received several medals and commendations. In 1958, Gene received a Ph.D. in Medical Microbiology from Stanford University. He had a long and distinguished career, including work with Russian scientists at Lovelace Medical Center and the University of New Mexico. Bicycle racing was a lifetime love and occupation, and in later years, he also studied martial arts.
[edit] See also
[edit] Potential applications
Medical, legal and other discipline troubleshooting charts.
[edit] External links
- CLASSIFICATION OF LIVING THINGS
- LIBRARY OF CONGRESS CLASSIFICATION OUTLINE
- North American Industry Classification System (NAICS)
- 2000 Mathematics Subject Classification
- Standard Industrial Classification (SIC) System Search
- U.S. Patent Classification (USPC) System
- The Classification Society of North America (CSNA)