Structure Mapping Engine

From Wikipedia, the free encyclopedia

The Structure Mapping Engine is an implementation in software of an algorithm for analogical matching based on the psychological theories of Dedre Gentner [1983]. As of 1990, over 40 projects had used it [Falkenhainer, 2005]. RM French said that Structure Mapping Theory is "unquestionably the most influential work to date of the modeling of analogy-making" [2002].

SME is useful because it ignores surface features and finds matches between potentially very different things if they have the same representational structure. For example, SME could determine that a pen is like a sponge because both are involved in dispensing liquid, even though they do this very differently.

Contents

[edit] Structure Mapping Theory

Structure Mapping Theory is based on the systematicity principle, which states that more connected knowledge is preferred over independent facts. Therefore, SME should ignore isolated source-target mappings unless they are part of a bigger structure. SME should map objects that are related to knowledge already mapped.

Structure Mapping Theory also requires that mappings be one-to-one, which means that no part of the source description can map to more than one item in the target and no part of the target description can map to more than one part of the source. In addition, structure mapping theory requires that if a match maps S to T then the arguments of S and T must also be mapped. If both these conditions are met, the mapping is said to be structurally consistent.

[edit] Concepts in SME

SME takes two descriptions called the source and target, and maps knowledge from the source into the target. SME calls each description a dgroup. Dgroups contain a list of entities and predicates. Entities represent the objects or concepts in a description such as an inputgear or a switch. Predicates are one of three types and are a general way to express knowledge for SME. Relation predicates contain multiple arguments which can be other predicates or entities. An example relation is: (transmit (what from to)). This relation has a functor “transmit” and takes three arguments: “what,” “from,” and “to.” Attribute predicates are the properties of an entity. An example of an attribute is (red gear) which means that gear has the attribute red. Finally, function predicates map an entity into another entity or constant. An example of a function is (joules powersource) which maps the entity powersource onto the numerical quantity joules. Functions and attributes have different meanings and consequently SME processes them differently. For example in SME’s true analogy rule set, attributes differ from functions because they cannot match unless there is a higher order match between them. The difference between attributes and functions will be explained further in this section’s examples.

All predicates have four parameters. They have a functor, which identifies it and a type, which is either relation, attribute, or function. The other two parameters are for determining how to process the arguments in the SME algorithm. If the arguments have to be matched in order, commutative is false. If the predicate can take any number of arguments, N-ary is false. An example of a predicate definition is: (sme:defPredicate behavior-set (predicate) relation :n-ary? t :commutative? t) The predicate’s functor is “behavior-set,” its type is “relation,” and its n-ary and commutative parameters are both set to true. The “(predicate)” part of the definition specifies that there will be one or more predicates inside an instantiation of behavior-set.

[edit] Algorithm Details

The algorithm has several steps, the last sections describe the algorithm as originally written and described in this algorithm paper. The sections use examples to demonstrate how SME runs. The reader should read algorithm paper to learn all the details of the algorithm.

The first step of the algorithm is to create a set of match hypotheses between source and target dgroups. A match hypothesis represents a possible mapping between any part of the source and the target. This is controlled by a set of match rules. By changing the match rules, one can change the type of reasoning SME does. For example, one set of match rules may perform a kind of analogy called “literal similarity” and another performs a kind analogy called “true-analogy.” These rules are not the place where domain dependent information is added, but rather where the analogy process is tweaked depending on the type of cognitive function the user is trying to emulate.

There are two types of match rules: filter rules and intern rules. Intern rules only use the arguments of the expressions in the match hypotheses that the filter rules identify. This makes the processing more efficient by constraining the number of match hypotheses that are generated. At the same time, it also helps to build up the structural consistencies that are needed later on in the algorithm. An example of a filter rule from the true-analogy rule set creates match hypotheses between predicates that have the same functor. The true-analogy rule set has an intern rule that iterates over the arguments of any match hypotheses, creating more match hypotheses if the arguments are entities or functions, or if the arguments are attributes and have the same functor. In order to illustrate how the match rules produce match hypotheses consider these two predicates:

transmit torque inputgear secondgear (p1)

transmit signal switch div10 (p2)

The filter match rule generates a match between p1 and p2 because they share the same functor, “transmit.” The intern rules then produce three more match hypotheses: torque to signal, inputgear to switch, and secondgear to div10. The intern rules created these match hypotheses because all the arguments were entities.

If the arguments were functions or attributes instead of entities, the predicates would be expressed as:

transmit torque (inputgear gear) (secondgear gear) (p3)

transmit signal (switch circuit) (div10 circuit) (p4)

These additional predicates make inputgear, secondgear, switch, and div10 functions or attributes depending on the value defined in the language input file. The representation also contains additional entities for gear and circuit.

Depending on what type inputgear, secondgear, switch, and div10 are, their meanings change. As attributes, each one is a property of the gear or circuit. For example, the gear has two attributes, inputgear and secondgear. The circuit has two attributes, switch and circuit. As functions inputgear, secondgear, switch, and div10 become quantities of the gear and circuit. In this example, the functions inputgear and secondgear now map to the numerical quantities “torque from inputgear” and “torque from secondgear,” For the circuit the quantities map to logical quantity “switch engaged” and the numerical quantity “current count on the divide by 10 counter.”

SME processes these differently. It does not allow attributes to match unless they part of a higher order relation, but it does allow functions to match, even if they are not part of a higher order relation. It allows functions to match because they indirectly refer to entities and thus should be treated like relations that involve to entities. However, as next section shows, the intern rules assign lower weights to matches between functions than matches between relations. The reason why SME does not match attributes is because it is trying to create connected knowledge based on relationships and thus satisfy the systematicity principle. For example, if both a clock and a car have inputgear attributes SME will not mark them as similar. If it did, it would be making a match between the clock and car based on their appearance not on the relationships between them. When the additional predicates in p3 and p4 are functions, the results from matching p3 and p4 are similar to the results from p1 and p2 except there is an additional match between gear and circuit and the values for the match hypotheses between (inputgear gear) and (switch circuit), and (secondgear gear) and (div10 circuit), are lower. The next section describes the reason for this in more detail.

If the inputgear, secondgear, switch, and div10 are attributes instead of entities, SME does not find matches between any of the attributes. It only finds matches between the transmit predicates and between torque and signal. Additionally, the structural evaluation scores for the remaining two matches decreases. In order to get the two predicates to match, p3 would need to be replaced by p5. P5 is shown below.

transmit torque (inputgear gear) (div10 gear) (p5)

Since the true-analogy rule set identifies that the div10 attributes are the same between p5 and p4 and because the div10 attributes are both part of the higher relation match between torque and signal SME makes a match between (div10 gear) and (div10 circuit) which leads to a match between gear and circuit.

Being part of a higher order match is a requirement only for attributes. For example, if (div10 gear) and (div10 circuit) are not part of a higher order match, SME does not create a match hypothesis between match them. However, if div10 is a function or relation SME does create a match.

[edit] Structural Evaluation Score

Once the match hypotheses are generated, SME needs to compute an evaluation score for each match hypothesis. SME does this by using a set of intern match rules to calculate positive and negative evidence for each match. Multiple amounts of evidence are correlated using Dempster’s rule [Shafer, 1978] resulting in positive and negative belief values between 0 and 1. The match rules assign different values for matches involving functions and relations. These values are programmable, however some default values that can be used to enforce systematicity principle are described in [Falkenhainer et. al., 1989]. These rules are:

  1. If the source and target are not functions and have the same order the match gets +0.3 evidence. If the orders are within 1 of each other, the match gets +0.2 evidence and -0.05 evidence.
  2. If the source and target have the same functor, the match gets 0.2 evidence if the source is a function, and 0.5 if the source is a relation.
  3. If the arguments might match, the match gets +0.4 evidence. The arguments might match if all the pairs of arguments between the source and target are entities, if the arguments have the same functors, or it is never the case that the target is an entity but the source is not.
  4. If the predicate type matches, but the elements in the predicate do not match, then the match gets -0.8 evidence.
  5. If the source and target expressions are part of the a matching higher order match, add 0.8 of the evidence for the higher order match.

In the example match between p1 and p2, SME gives the match between the transmit relations a positive evidence value of 0.7900 and the others get values of 0.6320. The transmit relation receives the evidence value of 0.7900 because it gains evidence from rules 1, 3, and 2. The other matches get a value of 0.6320 because 0.8 of the evidence from the transmit is propagated to these matches because of rule 5.

For predicates p3 and p4, SME assigns less evidence because the arguments of the transmit relations are functions. The transmit relation gets positive evidence of 0.65 because rule 3 no longer adds evidence. The match between (input gear) and (switch circuit) becomes 0.7120. This match gets 0.4 evidence because of rule 3, and 0.52 evidence propagated from the transmit relation because of rule 5.

When the predicates in p3 and p4 are attributes, rule 4 adds -0.8 evidence to the transmit match because though the functors of the transmit relation match, the arguments do not have the potential to match and the arguments are not functions.

To summarize, the intern match rules compute a structural evaluation score for each match hypothesis. These rules enforce the systematicity principle. Rule 5 provides trickle-down evidence in order to strengthen matches that are involved in higher order relations. Rules 1, 3 and 4 add or subtract support for relations that could have matching arguments. Rule 2 adds support for when the functors match thereby adding support for matches that emphasize relationships.

The rules also enforce the difference between attributes, functions, and relations. For example, they have checks which give less evidence for functions than relations. Attributes are not specifically dealt with by the intern match rules, but SME’s filter rules ensure that they will only be considered for these rules if they are part of a higher order relation and rule 2 ensures that attributes will only match if they have identical functors.

[edit] Gmap Creation

The rest of the SME algorithm is involved in creating maximally consistent sets of match hypotheses. These sets of match hypotheses are called gmaps. SME must ensure that any gmaps that it creates are structurally consistent. This means that they are one-to-one, such that no source maps to multiple targets and no target maps to multiple sources. It also means that they must have support, which means that if a match hypothesis is in the gmap, then so are the match hypothesis that involve the source and target items.

The gmap creation process follows two steps. First, SME computes some information about each match hypothesis. This includes entity mappings, what other match hypotheses it conflicts with, and what other match hypotheses it is structurally inconsistent with.

SME then uses this information to merge match hypotheses using a greedy algorithm and the structural evaluation score. It merges the match hypotheses into maximally structurally consistent connected graphs of match hypotheses. Then it combines gmaps that have overlapping structure if they are structurally consistent. Finally, it combines independent gmaps together while maintaining structural consistency.

Comparing a source to a target dgroup may produce one or more gmaps. The weight for each gmap is the sum of all the positive evidence values for all the match hypotheses involved in the gmap. For example, if a source containing p1 and p6 below, is compared to a target containing p2, SME will generate two gmaps. Both gmaps have a weight of 2.9186.

Source:

transmit torque inputgear secondgear (p1)

transmit torque secondgear thirdgear (p6)

Target: transmit signal switch div10 (p2)

These are the gmaps which result from comparing a source containing a p1 and p6 and a target containing p2.

Gmap #1:

(TORQUE SIGNAL)
(INPUTGEAR SWITCH)
(SECONDGEAR DIV10)
(*TRANSMIT-TORQUE-INPUTGEAR-SECONDGEAR *TRANSMIT-SIGNAL-SWITCH-DIV10)

Gmap #2:

(TORQUE SIGNAL)
(SECONDGEAR SWITCH)
(THIRDGEAR DIV10)
(*TRANSMIT-TORQUE-SECONDGEAR-THIRDGEAR *TRANSMIT-SIGNAL-SWITCH-DIV10)

The gmaps show pairs of predicates or entities that match. For example in gmap #1, the entities torque and signal match and the behaviors transmit torque inputgear secondgear and transmit signal switch div10 match. Gmap #1 represents combining p1 and p2. Gmap #2 represents combining p1 and p6. Although p2 is compatible with both p1 and p6, the one-to-one mapping constraint enforces that both mappings cannot be in the same gmap. Therefore, SME produces two independent gmaps. In addition, combining the two gmaps together would make the entity mappings between thirdgear and div10 conflict with the entity mapping between secondgear and div10.

[edit] References