Scaling Problems Associated with Rule-Based Decision Algorithms in Multiple-Objective Situations -- Value-Driven Methods as an Alternative

Dr. Robert M. Kerchner (kerchner@rand.org)

RAND

I. Preface

The commonly-used methodologies for simulating decision processes in Distributed Interactive Simulation (DIS) computer-generated forces (CGFs) involve the use of rule-based systems as a means for determining the choice made by a decision maker. Rule-based methods typically do well for prototype problems, but realistic decisions are often made in environments where the decision maker is simultaneously concerned with multiple objectives. The number of rules required by a rule-based system typically grows very rapidly with the number of objectives, which may in turn grow as the number of entities that the decision-maker must consider increases. This paper suggests that the reasons for rapid growth in the number of rules can be understood by comparing common practices in formulating rules to a utility function formulation of the decision problem. A related computational experiment indicates that rapid rule growth may not be intrinsic to rule-based approaches, but may rather be associated with the way in which rule sets are developed in practice.

The main objective of this paper is to present the value-driven approach to simulating decision processes, an optimization-based technique that shares several features with chess-playing algorithms. Value-driven methods are an attractive alternative to rule-based systems in situations where the number of objectives considered by the decision-maker leads to an excessively large rule base. The complexity of a value-driven algorithm typically scales only linearly with the number of factors considered, and execution time may scale similarly.

Value-driven decision simulations are not new (footnote 1), and while value-driven methods are occasionally used in the DIS community, a clear understanding of the methodology is not widespread. Value-driven methods have their difficulties too, and rule-based methods will be more appropriate in many cases. However, experience in some regimes, most notably engagement-level air combat, confirms that in other cases value-driven systems produce more realistic and robust behavior. I hope that this paper will stimulate thought about value-driven methods, and about their advantages when scaling to large problems where practical rule-based systems are difficult to develop.

A second objective of this paper is to encourage research into the correspondences between rule-based methods and value-driven methods that address the same decision problem. It appears that the correspondences are a potentially fruitful way of uncovering implicit assumptions in rules, assumptions that lead to problems should the scope of the decision problem be extended. The suggested research would hopefully improve practices and procedures for rule set development so as to help mitigate scaling problems.

I would like to thank RAND's National Defense Research Institute for supporting the writing of this paper, and Dr. Robert Anderson and Dr. Paul Davis for their helpful reviews. However, any errors and inaccuracies in this paper are entirely the responsibility of the author.

II. Summary

The first part of this paper examines the scaling of rule-based systems as problem complexity grows. The focus is on applications where rule-based systems are used to simulate human decisions, as in the case of computer-generated forces (CGFs). The following points are argued:

  1. In most cases the decisions handled by these applications can be naturally formulated in the following way: Select an action that maximizes a utility function. The utility function generally reflects a number of distinct, often incompatible, objectives. Its domain is the space of futures implied by the actions being considered.
  2. Rule-based approaches to such problems typically utilize knowledge about which objectives are dominant and which alternative actions tend to satisfy each objective. This allows shortcuts that permit determination of the best action without requiring all the computation needed to evaluate each candidate action's utility function score.
  3. Realistic problems include the following features: The number of underlying objectives is substantial; the importance of each objective is not static, but depends on the situation when the decision is made; the correlation of alternative actions with satisfied objectives also depends on the situation. In this context, implementing the "typical" approach to rule formulation requires additional rules that explicitly treat exceptions to the "normal" priorities among objectives, and exceptions to the "normal" associations between alternatives and the objectives they favor. The number of rules grows very rapidly, leading to difficulties in rule formulation, understandability, maintenance, and robustness.

There is no requirement that rule-based systems need to be constructed along the lines of the "typical" approach above. In fact, computational experiments described later in this paper indicate that for some problems a lower bound on the number of rules required for equivalent performance does not grow much at all with the number of objectives. This SUGGESTS that the scaling problems that occur in practice may have more to do with common approaches to rule set construction than with underlying deficiencies. One objective of this paper is to illuminate why certain approaches don't work well, and to stimulate additional research based on the idea of mapping rule-based decision algorithms into equivalent utility optimization forms.

The second part of the paper returns to the alternative decision-making approach of searching an action set for a best alternative. This methodology, coupled with a utility function defined on futures, constitutes what we call the VALUE-DRIVEN decision methodology. Its scaling properties are examined, and the argument is made that for complex (multiple objective) problems the scaling of value-driven implementations is linear in the number of objectives. Additional advantages for value-driven methods, especially maintainability when new objectives are incorporated, are presented.

Finally, the results of a small computational experiment on the scalability of rule-based systems are presented. When a decision problem can be represented as the optimization of a utility function, one can equate a lower bound on the number of rules required by an equivalent rule-based system to the size of a partitioning of a "situation space." The term SITUATION SPACE refers to the domain of the variables that describe the context in which the decision is made. For the class of decision problems studied, the size of the partitioning did not grow quickly as the number of objectives increased. This result was unexpected, and may in fact be due to artificialities in formulating the "decision problems" experimented with. However, it does indicate that scaling problems for rule-based systems are not fundamentally linked to the number of underlying objectives considered, and suggests that observed problems may instead be associated with approaches to building rule sets.

III. The Scaling of Rule-Based Systems

The commonly-used methodologies for simulating decision processes in Distributed Interactive Simulation (DIS) computer-generated forces (CGFs) involve the use of rule-based systems as a means for determining the choice made by a decision maker. Realistic decisions are often made in environments where the decision maker is concerned with multiple objectives. Furthermore, these objectives may not be mutually achievable. For example, a military commander is concerned with the survival of his forces, the destruction of enemy forces, and achieving goals such as movement to a specified location and protection of a friendly force's flank. In general, the "best" choice is a compromise between these goals.

The purpose of this chapter is to show how correspondences between features in rules and features in a utility-based formulation of decision-making illuminate assumptions in rules, assumptions that may not be robust. The presence of such assumptions is a major reason why rule sets can be difficult to maintain when the representations of problems are made more realistic, or when the rules need to be applied to situations differing from those they were originally designed to handle. We begin by defining what is meant by a utility function formulation.

III.A. A Utility Function Formulation of Decision Problems

From a decision-theoretic perspective (footnote 2) there exists a utility function that puts the decision-maker's goals onto a common scale. One can define the "best" choice as that which maximizes this utility function. The decision is made in the context of a "situation space" that describes both the decision maker's current objectives and the physical situation confronting the decision maker. In most cases one can construct the functional form of the utility function as a sum over terms, each of which reflects the utility of a different goal (footnote 3). Goals are realized over time, so, in general, looking at the future to which a choice will lead provides the cleanest way of evaluating the degree to which a goal has been achieved. The domain of the utility function is thus the "outcome space" of predicted futures that alternative choices can lead to (footnote 4).

Fig. 1 Relationship of Situation Space Components to the Utility Function

The figure above shows the information flow from the situation space, into the utility function, in the context of a particular choice being considered. The choice can be thought of as mapping the current physical situation to the outcome predicted should the choice be implemented. The term U(i) represents the term in the utility function sensitive to goal or objective i. It is organized as the product of a term W(i) that represents the current importance of objective i, and a term N(i) representing the degree to which the goal is realized if the predicted outcome occurs. W(i) is referred to as a "weight" or "importance multiplier." N(i) is referred to as the "normalized score" and is usually confined to a range of 0-1.

Note that W(i) is sensitive to the objectives portion of the current situation, and perhaps the physical situation itself (this depends somewhat on definitions), but doesn't depend on the choice being considered. N(i), which is a function of the predicted outcome, is strongly dependent on the choice being considered. In many cases the normalized score N(i) can also be thought of as the probability that objective i will be achieved, given the choice being considered; this conceptualization is useful when determining the functional form of N(i).

The process of making the best decision, starting from a point in situation space, is that of searching the set of alternatives for the one that maximizes the utility function. An ability to assess, for each alternative, the future state associated with that alternative, is implied.
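
To make the formulation concrete, the following short sketch (Python, purely illustrative and not drawn from any particular CGF implementation) evaluates the additive form of Figure 1. The names weight_fn and score_fn are placeholders for the modeler's W(i) and N(i) functions.

def utility(situation, predicted_future, objectives):
    # objectives is a list of (weight_fn, score_fn) pairs, one pair per goal i.
    # weight_fn(situation) plays the role of W(i); score_fn(predicted_future)
    # plays the role of N(i).  The utility is the sum of the products.
    return sum(w(situation) * n(predicted_future) for w, n in objectives)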

III.B. Common Rule-Based Formulations of Decision Problems

The developers of rule-based systems for simulating decision-makers take a different approach to solving these problems. It seems obvious that the expert providing the knowledge basis for the system will have an underlying set of objectives in mind. However, these objectives may not always be explicitly articulated. The expression of the expert's knowledge instead takes the form of rules about what to do in various situations. In many cases these rules implicitly incorporate assertions that certain objectives are dominant and that certain actions tend to satisfy certain objectives.

A small example taken from air-to-air combat illustrates this process. Imagine that a lone WW I fighter pilot on an air superiority mission encounters a lone enemy aircraft. He has two objectives, survival and destruction of the enemy aircraft. He has three alternative actions, attack, delay, and flee. A plausible rule set might look like this (footnote 5):

IF survival_prospects == poor THEN
   flee
IF survival_prospects == good AND offensive_prospects != poor THEN
   attack
IF survival_prospects == medium AND offensive_prospects == good THEN
   attack
OTHERWISE
   delay

An associated utility function would look like this:

     V(a) = Wo*No(a) + Wd*Nd(a)

where the subscript o refers to "offensive" and the subscript d refers to "defensive". No(a) can be viewed as the probability that alternative a will result in killing the hostile, and Nd(a) as one's own survival probability if alternative a is chosen. Wo is the payoff for killing the enemy aircraft, and Wd the pilot's perception of his own value. Strictly speaking, No and Nd are not explicit functions of alternative a, but rather of the predicted future stemming from a. The example rule set is superficially plausible, aside perhaps from omissions made in the interest of keeping the example small. However, closer examination indicates that it makes assumptions about the associated utility function and the results stemming from each course of action. The assumptions include:

  1. Certain alternatives are associated with maximizing certain objectives. The attack action raises No more than the others, and the flee action is the most effective in increasing Nd. The delay action is intermediate.
  2. Wd is greater than Wo, but not excessively so. This follows from fleeing when survival prospects are poor, regardless of offensive potential, but being willing to accept some risk (survival_prospects == medium) when the offensive prospects are good.

Other inferences are also possible, but these two items should be sufficient to make the point that judgments about quantitative features of the underlying utility function and the futures associated with alternatives are in fact deeply embedded in the above rules. The reader should also note that thinking about a rule set in terms of an associated utility function and the projected outcomes of alternatives provides an extremely useful framework for examining assumptions that underlie the rules.
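
To make the correspondence explicit, the following fragment (Python, with purely hypothetical numbers not taken from the rule set) renders the example in value-driven form for a single situation in which both survival and offensive prospects are good. The projected scores encode assumption 1, and the weights encode assumption 2.

# Hypothetical numbers only; they merely illustrate assumptions 1 and 2 above.
Wo, Wd = 1.0, 1.5        # survival weighted above a kill, but not excessively

projected = {            # (No, Nd): kill probability, own survival probability
    "attack": (0.6, 0.70),
    "delay":  (0.2, 0.80),
    "flee":   (0.0, 0.95),
}

def V(action):
    No, Nd = projected[action]
    return Wo * No + Wd * Nd

print(max(projected, key=V))      # prints "attack" for these numbers

Raising Wd far enough, or lowering the projected survival score for attacking, shifts the choice to flee without adding or modifying any rule; this foreshadows the discussion of dynamic weights in Section III.C.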

III.C. Extending to Realistic Problems

Consider what happens to our simple example rule set when an attempt is made to extend it to a more realistic context. First, suppose that additional objectives are added. For example, the enemy aircraft might be observed engaging a friendly aircraft, or attacking a friendly force on the ground. This implies an objective of disrupting the enemy's action, not necessarily by shooting him down. Depending on what it actually entails, the "delay" action might be quite effective in this role. Certainly one would want to add assessments of one's ability to disrupt the hostile. In a worst case, where the balance between the objectives is close, one might need a rule for each possible combination of values taken on by survival_prospects, offensive_prospects, and (a newly introduced) disruptive_prospects. This would imply a combinatorial explosion in the number of rules as the number of such situation assessment variables grows! To the extent that the number of these variables matches the number of underlying objectives, a similar rate of growth is associated with the number of objectives.

Even more complexity is implied when the weights associated with objectives are dynamic. As shown in Figure 1, dynamic means that the weights depend on, and thus change with, the situation. However, they will still be fixed in the context of a single instance of making a decision. In our example problem dynamic weights could arise because in different cases a different kind of enemy aircraft is encountered (some are more lucrative targets), because the enemy is attacking different targets (the value of disrupting his mission changes), and because the decision-maker's risk aversion changes (perhaps one wishes to use this algorithm to simulate the decision processes of several pilots). Weights also change when the decision is embedded in a decision hierarchy--orders and priorities coming down from above act to alter the importance of different objectives.

Changing the weights can alter individual rules. For example, consider the two rules:

IF survival_prospects == good AND offensive_prospects != poor THEN
   attack
IF survival_prospects == medium AND offensive_prospects == good THEN
   attack

If the value of killing the enemy increases these might change to:

IF survival_prospects == good THEN
   attack
IF survival_prospects == medium AND offensive_prospects != poor THEN
   attack

It seems clear that attempting to augment the rules to handle dynamically changing weights would enormously add to the complexity of the rule set.

It is worth reiterating that the purpose of this section is to explore why the number of rules required to simulate a decision can grow rapidly when the problem domain is extended in certain ways (more objectives, objective importance varying with situation). There is no intention, here or elsewhere in this paper, of undertaking a general discussion of the merits of rule-based systems, or of making a BROAD comparison of their advantages and disadvantages with respect to value-driven methods. In fact, techniques that assist in the management of large rule sets certainly exist. These include state-based logic approaches that simplify the effort involved in organizing rules and in expressing and understanding the conditions under which a given rule may fire. Such approaches make it far easier to construct, verify, and validate large numbers of rules, which makes much larger rule sets practical, and they also do a great deal to reduce runtime by easing the job of inference engines. Without claiming expertise in these techniques, I would venture, however, that they do little to actually reduce the number of rules involved.

It is also important to be very cautious when drawing conclusions from the above air combat example. In particular, one should avoid concluding that the "style" of writing rules shown above is intrinsic to all rule-based systems. However, the implicit assumptions of the type discussed in the example are found in many rule-based implementations, and it is valuable to understand WHY they don't accommodate well to larger problems. Looking at the relationship between rule sets, objective functions and sets of alternatives provides valuable insight. Hopefully, further research into these relationships will lead to improved practices that avoid injecting such assumptions into rule sets.

I do speculate that rule sets that remain relatively simple as the complicating factors mentioned above are introduced would look very much like an explicit optimization of an underlying utility function over a set of alternatives. In this case, a more natural approach can be to utilize a value-driven algorithm instead.

IV. Value-Driven Decision Systems

A value-driven decision explicitly considers decision alternatives and assigns a score to each. The alternative scoring best is selected for implementation. The EXPLICIT CONSIDERATION OF ALTERNATIVES and the MAXIMIZATION OF A VALUE FUNCTION over these alternatives are the essence of the value-driven approach.

In many cases the scoring is performed on a predicted state. That is, a projection algorithm is used to visualize the futures that would occur if each alternative were implemented. It should be clear that value-driven systems with this projection feature implement the optimization of utility functions of the type shown in Figure 1. Figure 2 shows a simple value-driven algorithm appropriate to handling a small finite set of alternatives, a very common case.

Figure 2. Basic Value-Driven Algorithm
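
Since the figure is schematic, a minimal sketch of the loop it describes follows (Python, illustrative only; the names are placeholders rather than those of any fielded system). It assumes a small finite set of alternatives, a projection routine, and a utility function of the form discussed in Section III.A.

def decide(situation, alternatives, project, utility):
    # project(situation, alt) returns the predicted future for an alternative;
    # utility(situation, future) is the additive W(i)*N(i) sum of Figure 1.
    best_alt, best_score = None, float("-inf")
    for alt in alternatives:
        future = project(situation, alt)
        score = utility(situation, future)
        if score > best_score:
            best_alt, best_score = alt, score
    return best_alt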

IV.A. Form of the Utility Function

The formulation of the utility function as a sum was mentioned earlier as something that appears to work well in practice. One can often identify cases where several objectives are not independent, and a simple sum formulation would seem inappropriate. Most commonly, several objectives need to be satisfied simultaneously in order to provide real value. In such cases, however, these objectives are not themselves what is desired; what is desired is a single objective associated with whatever provides the "real" value. The objectives that must be simultaneously satisfied would then be reflected in the functional form of the single normalized score (N(i)) associated with that real objective.

Admittedly, there can be a residual problem because assessing whether the real underlying objectives have been achieved can imply a time horizon much longer than one can practically project. For example, our underlying objectives might be related to the outcome of a campaign, but a tactical decision needs to work in terms of a much shorter time horizon. If the projection time is the duration of the current engagement (or shorter) then one must make use of heuristic objectives that experience indicates are linked to the "real" objectives. When playing chess, for example, one often doesn't think in terms of checkmates, but rather in terms of capturing high value pieces, controlling the center, gaining mobility, etc. The reason is that one can't afford to project ahead more than a few moves. These heuristic objectives may in fact not be independent, but they are the best thing available, consistent with practical limits on the projection time.

Fortunately, our experience indicates that excellent decision quality can be achieved in the presence of considerable inaccuracy in the functional form of the utility (objective) function. One needs to keep in mind that the purpose of the decision is to find an alternative that achieves maximum or nearly maximum utility, not to measure the utility. Suppose there exists a "true" objective function, while an "approximate" objective function is used in the actual implementation. When one alternative is clearly best, there can be considerable error in the approximate objective function before a different alternative scores best. On the other hand, if several alternatives have similar scores in terms of the true utility function, one will do well by picking any one of them. Again, error in the objective function is tolerable so long as one of several close "best" alternatives is selected. Admittedly, errors preclude finding a strict optimum, but that's not the objective; military decision-makers are generally delighted to be close to optimum.

As a result, the practice of ignoring mutual dependence does not often cause a problem. Additionally, experience indicates that getting the weights right to within a factor of 2 is all that is required. The most important thing is to ensure that all relevant objectives are reflected in the utility function. When all relevant objectives are present in the function one can be reasonably confident that good choices will be made.

IV.B. Hierarchical Value-Driven Systems

An important feature of value-driven systems is their ability to provide a natural coupling between levels in a decision hierarchy. The general idea is to represent objectives set by commanders as terms in the objective function. The effect of higher level decisions, orders, is to adjust the parameters of these terms, and in particular the weights associated with them.

A very attractive feature of this scheme is that it doesn't preclude having the subordinate decision-maker consider factors in addition to those set by his commander. The commander's priorities are represented in some, but not all, of the terms in the subordinate's utility function. It is suggested that this corresponds to "common sense" behavior on the part of the subordinate. The relative size of the weights for the commander's versus "other" terms in the objective function determines the rigidity of the doctrine by varying the ability of secondary terms to influence the decision. Note that rule-based approaches must explicitly provide for such common-sense behavior ("Follow orders except if A or B or C ... occurs").
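
One way to picture this coupling is sketched below (Python, an assumed structure rather than a description of any particular system): an order from above is represented by scaling up the weights of the commander-related terms, while the subordinate's own terms remain in the objective function and can still influence the choice.

def apply_order(terms, emphasis):
    # terms: {name: {"weight": W, "from_commander": bool}}.  emphasis > 1
    # makes doctrine more rigid; values near 1 leave room for the
    # subordinate's "common sense" terms to matter.
    for term in terms.values():
        if term["from_commander"]:
            term["weight"] *= emphasis
    return terms

terms = {
    "reach_objective_area": {"weight": 1.0, "from_commander": True},
    "destroy_enemy":        {"weight": 0.8, "from_commander": True},
    "own_survival":         {"weight": 1.0, "from_commander": False},
}
apply_order(terms, emphasis=2.0)   # a firm order doubles the commander's weights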

The ability of value-driven systems to automatically trade off competing objectives, without the need for explicit exception-handling rules, is quite general, and is not restricted to command hierarchy issues. It follows from the existence of a unified utility function that can place all these objectives on a common scale. The von Neumann-Morgenstern axioms (Reference 1) are very plausible general requirements for rational behavior. If they are satisfied, such a utility function will in fact exist.

IV.C. Search Completeness

Not all decisions are characterized by a choice among a finite set of alternatives. However, practical human decision-making rarely examines alternatives exhaustively, but more typically considers a rather small set of alternatives. A few additional alternatives that are variations on the more promising alternatives in the original set may be examined as well.

In my experience value-driven systems that consider only a limited set of alternative actions seem to work very well. A common strategy is to include in the set of alternatives ones that are designed to maximize single objectives in the utility function. In many cases one of these will also do a good job of maximizing the entire objective function. When the alternatives are characterized by continuous variables it can be useful to consider additional alternatives that are interpolations between the top scoring original possibilities.

One way of compensating for incomplete searches is to have the decision-maker reconsider its course of action at time intervals short compared to the projection time used when predicting the futures associated with alternatives. The Brawler air combat simulation, for instance, will typically project maneuver alternatives 5 to 10 seconds into the future but the simulated pilot reconsiders them once each second. Thus, if the situation isn't developing as anticipated the maneuver can be changed in a timely manner.

IV.D. Scaling Properties of Value-Driven Systems

Value-driven systems that use utility functions additive in the objectives considered scale linearly in complexity as the number of objectives increases. When a new objective is added the previous utility function is modified only by the addition of a new term for the objective. For example, when Brawler first incorporated semi-active air-to-air missiles (these require continuous radar illumination of the target by the attacking fighter), attacking pilots would turn away from their targets after firing, breaking radar lock and causing the missile to "go ballistic." A term that rewarded keeping the target within the radar field of regard was added to the objective function for maneuvers. The weight associated with this term was clearly on the order of the target's value (used in other terms that reflect offensive considerations) and could thus be set with minimal adjustment required. Adding this term, plus a maneuver alternative that kept the target within radar coverage, was quite sufficient to eliminate the problem. It is worth emphasizing that no special provision to explicitly mediate the trade-off between this consideration and others was required.

Not only are value-driven systems easy to augment as we incorporate new features or learn more about the decision problem we are modeling (such as discovering the need to consider additional objectives), but it is also comparatively easy to verify that the objectives are in fact reflected in the decision process. In general, the loose coupling between different objectives makes it far easier for a new analyst to learn how the system works, and far less likely that he or she will inadvertently introduce an error through ignorance of a subtle interaction.

The run-time scaling properties of value-driven systems are at worst quadratic in the number of objectives. The computation load scales as the product of the number of alternatives considered and the cost of evaluating each alternative. The evaluation cost will scale with the number of terms in the objective function, and thus is linear in the number of objectives. The number of alternatives tends to grow as objectives are added, assuming that the technique of including alternatives designed to maximize individual terms in the objective function is used. However, having an alternative associated with each objective is not required for all decision systems, and after a certain point there is a decided diminishing-returns effect in the payoff achieved by looking at more alternatives (footnote 6). Thus, after an initial increase, the number of alternatives stays constant and run time becomes linear in the number of objectives.

Additionally, practical efficiency enhancing techniques exist. These commonly make use of knowledge about the current size of weights in the objective function. In general, the importance of objectives will change with time, requiring that importance multipliers be reevaluated. The weights for some terms will often be set to zero in this process (a particular objective is not currently important). In this case, evaluating the normalized portion of the term (the N(i)) is clearly a waste of time. Similarly, any alternatives tailored to optimize that objective can be skipped.

For terms where the importance multiplier is not zero, but is small in comparison with those of other terms, evaluation can be deferred. Alternatives are evaluated using a reduced objective function that includes only terms with large weights, and the small terms are only evaluated for those alternatives that score well with respect to the reduced function (footnote 7). It is true that this kind of culling can produce suboptimal solutions, but it is very practical when simulating human decision processes, where finding a good choice is all that is usually required.
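
A sketch of this two-pass culling follows (Python, illustrative only, assuming the additive utility form of Section III.A): zero-weight terms drop out entirely, large-weight terms are evaluated for every alternative, and small-weight terms are evaluated only for the leading alternatives.

def cull_and_score(situation, alternatives, project, terms,
                   small_fraction=0.1, keep=3):
    # terms is a list of (weight_fn, score_fn) pairs, as in Section III.A.
    weights = [w(situation) for w, _ in terms]
    cutoff = small_fraction * max(weights)
    big   = [(W, n) for W, (_, n) in zip(weights, terms) if W >= cutoff]
    small = [(W, n) for W, (_, n) in zip(weights, terms) if 0.0 < W < cutoff]

    # First pass: reduced objective function, large-weight terms only.
    scored = []
    for alt in alternatives:
        future = project(situation, alt)            # save the projection
        scored.append((sum(W * n(future) for W, n in big), alt, future))
    scored.sort(key=lambda entry: entry[0], reverse=True)

    # Second pass: finish scoring only the top "keep" alternatives.
    finished = [(partial + sum(W * n(future) for W, n in small), alt)
                for partial, alt, future in scored[:keep]]
    return max(finished, key=lambda entry: entry[0])[1]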

V. A Computational Experiment to Examine the Lower Bounds on the Number of Rules Required to Implement Multiple-Objective Decision Problems

V.A. Basis of the Experiment

When a decision problem has a value-driven formulation with a finite set of alternatives one can in principle search the set of actions and determine, for each point in the "current situation space", which action leads to the highest utility future. A finite number of possible actions and some reasonable continuity assumptions imply that the situation space can be divided into a finite number of contiguous partitions. Within each partition a "best" action is specified.

Since any decision algorithm must provide a mapping from the situation space to alternatives, no rule-based system can have fewer rules than the number of partitioned regions (minus 1) defined by the above mapping procedure. This is easy to see in the case of a one-dimensional situation space. For example, suppose that the situation space runs from 0 to 1, and that there are 5 partitions with boundaries at 0.12, 0.37, 0.56, and 0.94. A set of rules to identify which partition corresponds to a value x for the situation might be:

IF x <= 0.12 THEN action_1
ELSE IF x <= 0.37 THEN action_2
ELSE IF x <= 0.56 THEN action_3
ELSE IF x <= 0.94 THEN action_4
ELSE action_5

Obviously one can express such a rule set in a more compact way using a table lookup, also getting an algorithm that is independent of the number of partitions, but this point is really semantic. The real work in constructing the rules is involved in finding where the boundary points are, so as measured by effort required to construct the rule set there is no important difference.

Keep in mind that we are discussing the complexity of the rule-based system as measured by the number of rules. There is no implication that runtime also scales as the number of rules. A hierarchical structure (or a table lookup) can clearly reduce the number of rules that must be actually examined to the order of log(n), where n is the number of partitions. For example:

IF x <= 0.37 THEN
   IF x <= 0.12 THEN action_1
   ELSE action_2
ELSE IF x <= 0.94 THEN
   IF x <= 0.56 THEN action_3
   ELSE action_4
ELSE
   action_5

In the above there are still 4 IF-statements, but no more than 2 will ever be executed for a given decision.
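
The table-lookup form mentioned above can be written just as compactly; the sketch below (Python, illustrative only) uses a binary search over the boundary list, giving the same log(n) behavior as the nested IF structure.

import bisect

boundaries = [0.12, 0.37, 0.56, 0.94]
actions = ["action_1", "action_2", "action_3", "action_4", "action_5"]

def choose(x):
    # bisect_left counts the boundaries strictly below x, which reproduces
    # the "<=" tests in the rule set above.
    return actions[bisect.bisect_left(boundaries, x)]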

The number of partitions provides a lower bound on the number of rules required, but actual rule sets may require many more rules for multi-dimensional situation spaces where the partition boundaries cannot be concisely described. The lower bound is really given by the number of partitions multiplied by B(N), where B is a lower bound on the number of rules per partition needed to determine which partition a point in situation space falls in, in a space of dimension N. For this paper, I assert that B(N) >= 1 for all N. However, I suspect that it is possible to provide larger lower bounds on B(N) for N > 1.

V.B. Description of the Experiment

In order to study the scaling properties of rule-based systems as the number of objectives in a corresponding value-driven formulation grows, I designed a computational experiment. The essence of the experiment was this:

  1. Construct decision problems with quasi-random objective functions and mappings of alternatives into future space. One "run" corresponds to one decision problem.
  2. Evaluate the partitioning implied by each decision problem.
  3. Examine how the number of partitions varies, in the statistical sense, with the number of terms in the objective function.

A more precise description of the experiment requires the following information: (1) what do the situation and future spaces look like, (2) how are alternative courses of action defined, and (3) how are value functions defined? These definitions follow.

SITUATION SPACE
The situation space was defined to be a one-dimensional line segment, ranging from 0 to 1. This choice was made to simplify the process of evaluating the resulting partitioning. It is certainly uncharacteristic of realistic problems that require many variables to describe the situation, and may in fact be responsible for the unexpectedly slow growth in the observed number of partitions (see the end of Section V.A).

FUTURE SPACE
The future space was permitted to have multiple dimensions. In effect, alternatives mapped a point x in the situation space S into a vector y in the future space F. Dimensionalities between 1 and 4 were examined.

ALTERNATIVES
Alternatives were conceived of as attempting to drive towards a particular point in the future space. For each random decision problem (run), each alternative was constructed as follows:
  1. A target point Y0 was selected at random in the future space F. Each component of Y0 was limited to the range [0,1].
  2. A best starting point X0 was selected at random in the situation space S, also in the range [0,1].
  3. A random gain k(i) was chosen for each dimension i of F. The k(i) were chosen uniformly in the logarithm, typically from the range 0.05 to 0.5. That is, if ran(a,b) generates uniform random numbers between a and b then

     ln(k(i)) = ran(ln(0.05), ln(0.5))

The mapping from S to F was given by:

     Y(i) - Y0(i) = k(i)*(X - X0)

It should be clear that the smaller gains represent better efficiencies of the alternatives; they drive outcomes to narrower regions around the target point Y0. Different gains are chosen for each dimension of F because an alternative oriented towards localizing one feature of the result is not necessarily efficient at localizing another.

OBJECTIVE FUNCTIONS
Each objective function term rewards futures that are clustered about a point in F that represents the ideal for that objective. This ideal future is denoted by Y0. A distance measure z(i) in each dimension i of the future space F is given by:

     z(i) = (Y(i) - Y0(i))/L(i)

and the normalized score for the term by

     N = 1/(1+z^2)

where

     z^2 = sum( z(i)^2 )

The L(i) were also chosen uniformly in the logarithm, typically from the range 0.01 to 0.5.

Weights for the objective functions should be dynamic with the situation, so they were made to depend on the point x in S used as the basis of each decision. For each objective term, its weight was large in the vicinity of a (random) point X0 in S, with a randomly chosen maximum M usually in the range [0.1,1.0]. The important vicinity of X0 was defined by a width R from a range usually given by [0.2, 0.4]. For a point x in S, the importance multiplier W(j) of a term j in the value function is given by:

     W(j) = M(j)/(1 + (x-X0(j))^2/R(j)^2)

V.C. Experimental Procedure

One run of the experiment consisted of generating a random set of alternatives and objective function terms (including the parameters that determine the weights). A value-driven decision process was then executed on a fine mesh of points in the situation space S by evaluating each alternative at each point. The winning alternative was marked for that point, and a partition boundary was declared whenever the winning alternative changed on moving from one point on the mesh to an adjacent point. The mesh was fine enough that missing a boundary crossing was unlikely. This procedure provided a count of the number of partitions for each decision problem (run).

Each run was repeated 10 times, using the same number of alternatives, number of terms in the objective function, dimensionality of the future space F, and underlying distribution of random parameters. This provided an average number of partitions per set of runs.

Finally, a set of runs was produced for 10, 20, 40, 80, and 160 objective function terms. A number of variations on the underlying parameters, the number of dimensions of F, and the number of alternatives considered were tried. However, the results appear to be qualitatively similar in all the cases looked at. No attempt was made to vary the dimensionality of the situation space S; keeping S one-dimensional simplified the problem of counting the partitions and avoided the anticipated very long run times associated with evaluating a large number of mesh points in a multidimensional situation space.
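
For readers who wish to reproduce the procedure, the following sketch (Python) follows the definitions of Section V.B. It is a reconstruction, not the original code: the number of alternatives, the mesh size, and the assumption that each objective term's ideal point and weight-peak location are drawn independently of the alternatives' parameters are choices made here, not specifications from the text.

import math, random

def log_uniform(lo, hi):
    return math.exp(random.uniform(math.log(lo), math.log(hi)))

def make_problem(n_alts, n_terms, dim_f):
    # Alternatives: target point Y0, best starting point X0, gains k(i).
    alts = [{"Y0": [random.random() for _ in range(dim_f)],
             "X0": random.random(),
             "k":  [log_uniform(0.05, 0.5) for _ in range(dim_f)]}
            for _ in range(n_alts)]
    # Objective terms: ideal point Y0, scale lengths L(i), weight parameters.
    terms = [{"Y0": [random.random() for _ in range(dim_f)],
              "L":  [log_uniform(0.01, 0.5) for _ in range(dim_f)],
              "M":  random.uniform(0.1, 1.0),
              "X0": random.random(),
              "R":  random.uniform(0.2, 0.4)}
             for _ in range(n_terms)]
    return alts, terms

def outcome(alt, x):
    # Y(i) = Y0(i) + k(i)*(x - X0)
    return [y0 + k * (x - alt["X0"]) for y0, k in zip(alt["Y0"], alt["k"])]

def value(alt, terms, x):
    y = outcome(alt, x)
    total = 0.0
    for t in terms:
        w = t["M"] / (1.0 + (x - t["X0"]) ** 2 / t["R"] ** 2)      # W(j)
        z2 = sum(((yi - y0) / li) ** 2
                 for yi, y0, li in zip(y, t["Y0"], t["L"]))
        total += w / (1.0 + z2)                                    # W(j) * N
    return total

def count_partitions(alts, terms, mesh=1000):
    winners = [max(range(len(alts)), key=lambda i: value(alts[i], terms, j / mesh))
               for j in range(mesh + 1)]
    return 1 + sum(1 for a, b in zip(winners, winners[1:]) if a != b)

random.seed(1)
for n_terms in (10, 20, 40, 80, 160):
    counts = [count_partitions(*make_problem(6, n_terms, 2)) for _ in range(10)]
    print(n_terms, sum(counts) / len(counts))

The exact counts depend on the assumed parameters and random seed; the point of the sketch is only to make the procedure precise.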

V.D. Results

The results of these experiments surprised me. It was expected that the number of partitions would grow rapidly as terms were added to the objective function. However, the number of partitions appeared to grow only very slowly with the number of objectives.

Significant spread was present in the number of partitions across the 10 runs in each set, so no attempt was made to quantify an empirically observed growth law. Figure 3 shows the growth for two characteristic cases. It is certainly clear that very little growth in the number of partitions is present as the number of objective terms increases. Unfortunately, budgetary and time limitations have precluded additional investigation along the lines indicated at the end of the preceding section.

Figure 3. Typical Simulation Results

The bars in the figure indicate the standard deviations of the observed distribution of the number of partitions, not the standard deviations of the estimates of the mean number of partitions. Since each point is based on 10 runs, the latter will be smaller by a factor of sqrt(10), about 3.2.

VI. Conclusions

One possible interpretation of the experimental result is that the experimental problem doesn't correspond well to real multiple objective decision problems. For instance, additional dimensionality in the situation space might be needed to see a fast growth rate, especially if the conciseness with which partition boundaries can be expressed as rules is considered. Experiments to examine these questions are certainly feasible, and should be performed.

However, the above hypothesis is probably "grasping at straws." It seems more likely that the scaling problems observed for some rule sets are instead associated with the fact that rule-based systems do not generally use a representation that is related to value-driven formulations. In fact, the lower bound argument also provides a prescriptive approach for developing rule sets on 1-dimensional situation spaces. The experimental results indicate that there is reason to expect that the number of resulting rules will scale very well indeed.

For higher-dimensional situation spaces this may not be the case even if the number of partitions scales well. The reason is that it may become difficult to describe the partition boundaries. This issue was discussed at the end of Section V.A. A reader may know how the complexity of algorithms that describe a partitioning of an N-dimensional space scales with N. This would be very important to understand, since it may lead to a much larger lower bound for higher-dimensional situation spaces than the number of partitions alone provides.

I recommend further research intended to better understand how rule conditions can imply hidden assumptions about a corresponding value- function formulation, assumptions that may not be valid over the entire situation space. A minimal practical fallout of this research would be a better awareness of how to avoid "robustness" pitfalls when developing rule-based systems.

I also hope that value-driven systems will get more attention as a result of this paper. If the paper appears somewhat biased towards value-driven systems, that is deliberate. Section IV was intended to highlight the advantages of value-driven methods, without necessarily giving the problems an equal treatment. People are familiar and comfortable with the methods they currently use, and a bit of proselytizing is perhaps needed to induce them to look at alternative approaches.

My own experience with simulating human decision processes indicates that a good, balanced approach is to start with rule-based decision algorithms. When the algorithms grow to a point where they start to become difficult to manage, strong consideration should be given to a rewrite in a value-driven form. Why not start with a value-driven approach? It takes a bit longer to develop initially, but this isn't the principal reason for starting with a rule-based version of a decision problem. I have found that the domain insight gained by trying to make the rule-based version work is probably necessary to formulate the problem in a value-driven form. This shortcoming is probably not intrinsic, however, and might change if knowledge engineers become better at eliciting knowledge in a form that explicitly emphasizes underlying objectives. Also, if the domain experts come to understand the value-driven way of looking at a problem, they may articulate their knowledge differently.

VII. Footnotes

  1. See, for example, Reference 3.
  2. See, for example, References 1 or 2.
  3. The ability to construct the formulation as a sum is not rigorous, but appears to work very well in most cases. The fact that the evaluation is on future states is important here. This issue will be discussed in Section IV.A, Form of the Utility Function.
  4. In many cases a choice leads to probabilities of futures, not a single outcome. If U(i) (see Figure 1) is a true utility then an ensemble of futures is handled by computing a (probability-weighted) average of U(i) over the possible futures. We can ignore this complication in the discussion of rule-based systems. However, the uncertainty associated with future outcomes does add to the complexity of the implementations of value-driven systems.
  5. For simplicity, the formulation looks more like a decision tree than a set of rules. This allows us to avoid complexities associated with evaluation order indeterminacy that, while important in that they add complexity to rule-based systems, are not germane in the current context. In reading the example the reader can assume that rules are evaluated in order, and that the first condition satisfied terminates the decision.
  6. No such effect is observed when adding additional objectives. If an objective is relevant, experience indicates that situations will arise where its inclusion is required to produce good decision quality.
  7. This strategy requires saving information, including projected futures, for the better-scoring alternatives, but with the memory resources available to modern computers this is not a major concern.

References

  1. Von Neumann and Morgenstern, "Theory of Games and Economic Behavior" (Second Edition), Princeton University Press, Princeton, New Jersey, 1947.
  2. Fishburn, "Decision and Value Theory" John Wiley & Sons, 1964. This provides a much more complete exposition of the value theory that underpins value-driven decision systems.
  3. Pugh and Kerchner, "Representation of C3I Effects in Combat Simulations," Proceedings of the 49th MORS, 1982. This includes a more complete exposition of value-driven systems, including discussions of the approximations used in practical systems.