C-MAXPLAN: Contingent Planning in the MAXPLAN Framework

From: AAAI Technical Report SS-99-07. Compilation copyright © 1999, AAAI (www.aaai.org). All rights reserved.

Stephen M. Majercik
Department of Computer Science, Duke University, Durham, NC 27708-0129
majercik@cs.duke.edu

Abstract

C-MAXPLAN extends MAXPLAN to include contingent planning in probabilistic propositional domains. Like MAXPLAN, C-MAXPLAN converts the planning problem into an E-MAJSAT instance, a type of probabilistic satisfiability problem in which a set of Boolean variables encodes the plan and a second set of Boolean variables encodes the probabilistic outcome of the plan; the satisfiability problem is to find the setting of the plan variables that maximizes the probability of satisfaction with respect to the outcome variables. We describe two different encodings for contingent plans in this framework: the first explicitly encodes contingent actions and is similar to the approach taken by C-BURIDAN; the second encodes a policy rather than a plan. Although the efficiency with which the resulting satisfiability problem is solved depends critically on the contingent-plan representation, C-MAXPLAN is competitive with state-of-the-art contingent planners on some problems.

Introduction

A contingent plan is a plan in which action choices are made contingent on information about the state of the world. In our work, we are concerned with probabilistic propositional planning problems: states are represented as an assignment to a set of Boolean state variables (fluents) and actions map states to states probabilistically. Problems are expressed using a dynamic-belief-network representation. A subset of the state variables are declared observable, meaning that a plan can be made contingent on any of these variables. This scheme is sufficiently expressive to allow a domain designer to make a domain fully observable or unobservable, or to have observations depend on actions and states in probabilistic ways.

In this paper, we present C-MAXPLAN, a MAXPLAN-based approach to contingent planning. In C-MAXPLAN, a set of Boolean variables (the choice variables) encodes the contingent plan and a second set (the chance variables) encodes the probabilistic outcome of the plan; the satisfiability problem is to find the setting of the choice variables that maximizes the probability of satisfaction with respect to the chance variables. This is equivalent to the E-MAJSAT problem (Littman, Goldsmith, & Mundhenk 1998).

In the remainder of this section, we describe our domain representation and some related work in the area of contingent planning. In the next section, we describe the operation of C-MAXPLAN. We describe two ways of mapping a contingent planning problem to an E-MAJSAT instance, and present some results. Finally, we conclude with some comments on further work, including a brief description of a promising, completely different approach to contingent planning in the probabilistic satisfiability framework.

Probabilistic Planning Representation

C-MAXPLAN works on partially observable probabilistic propositional planning domains. Such a domain consists of a finite set P of n distinct propositions, any of which may be True or False at any (discrete) time. A state is an assignment of truth values to P. A probabilistic initial state is specified by a set of decision trees, one for each proposition. A proposition p whose initial assignment is independent of all other propositions has a tree consisting of a single node labeled by the probability with which p will be True at time 0. A proposition q whose initial assignment is not independent has a decision tree whose internal nodes are labeled by the propositions on which q depends and whose leaves specify the probability with which q will be True at time 0. Goal states are specified by a partial assignment G to the set of propositions; any state that extends G is considered to be a goal state.
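To make this representation concrete, the following sketch shows one way to store per-proposition initial-state decision trees and to sample a time-0 state from them. It is our own illustration rather than code from the paper: the Leaf and Node classes, the sample_initial_state function, and the assumption that trees are evaluated in a dependency-respecting order are ours; the probabilities come from the TIGER example introduced later in this section.

    import random

    # A decision tree for a proposition's initial value: either a bare
    # probability (leaf) or a node that branches on another proposition.
    # This data layout is illustrative; the paper does not fix a format.
    class Leaf:
        def __init__(self, prob_true):
            self.prob_true = prob_true      # P(proposition is True at time 0)

    class Node:
        def __init__(self, test_prop, if_true, if_false):
            self.test_prop = test_prop      # proposition this node branches on
            self.if_true = if_true          # subtree when test_prop is True
            self.if_false = if_false        # subtree when test_prop is False

    def sample_initial_state(trees, order):
        """Sample a time-0 state from per-proposition decision trees.
        'order' lists the propositions so that each tree only tests
        propositions that have already been sampled (an assumption)."""
        state = {}
        for prop in order:
            node = trees[prop]
            while isinstance(node, Node):
                node = node.if_true if state[node.test_prop] else node.if_false
            state[prop] = random.random() < node.prob_true
        return state

    # Initial state of the TIGER domain: the tiger is behind the left door
    # with probability 0.5; everything else starts False.
    trees = {
        "tiger-behind-left-door": Leaf(0.5),
        "dead": Leaf(0.0),
        "rewarded": Leaf(0.0),
        "hear-tiger-behind-left-door": Leaf(0.0),
    }
    print(sample_initial_state(trees, list(trees)))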
Each of a set A of actions probabilistically transforms a state at time t into a state at time t + 1 and so induces a probability distribution over the set of all states. In this work, the effect of each action on each proposition is represented as a separate decision tree (Boutilier & Poole 1996). For a given action a, the decision trees for the different propositions are ordered, so the decision tree for one proposition can refer to both the new and old values of previous propositions. The leaves of a decision tree describe how the associated proposition changes as a function of the state and action.

A subset of the set of propositions is the set of observable propositions. Each observable proposition has, as its basis, a proposition which represents the actual status of the thing being observed. (Note that although values are assigned to observable propositions in the initial state, no action at time 1 makes use of these propositions in its decision trees, since there are no valid observations at time 0.)

The planning task is to find a plan that selects an action for each step t as a function of the value of observable propositions for steps before t. We want to find a plan that maximizes (or exceeds a user-specified threshold for) the probability of reaching a goal state.

For example, consider a simple domain based on the TIGER problem of Kaelbling, Littman, & Cassandra (1998). The domain consists of four propositions: tiger-behind-left-door, dead, rewarded, and hear-tiger-behind-left-door, the last of which is observable. In the initial state, tiger-behind-left-door is True with probability 0.5, dead is False, rewarded is False, and hear-tiger-behind-left-door is False (although irrelevant). The goal states are specified by the partial assignment (rewarded, (not dead)). The three actions are listen-for-tiger, open-left-door, and open-right-door (Figure 1). Actions open-left-door and open-right-door make rewarded True, as long as the tiger is not behind that door (we assume the tiger is behind the right door if tiger-behind-left-door is False). Since tiger-behind-left-door is not observable, the listen action becomes important; it causes the observable hear-tiger-behind-left-door proposition to become equal to tiger-behind-left-door with probability 0.85 (and its negation otherwise). By listening multiple times, it becomes possible to reliably determine the location of the tiger.

[Figure 1: The effects of the actions in the TIGER problem are represented by a set of decision trees.]
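As a concrete illustration of the action representation, here is a minimal sketch of the decision tree by which listen-for-tiger sets hear-tiger-behind-left-door, using the 0.85 accuracy given above. It is our own example, not the paper's encoding: the nested-dictionary layout and the apply_effect_tree function are assumptions.

    import random

    # Effect tree for listen-for-tiger acting on hear-tiger-behind-left-door:
    # the observation matches tiger-behind-left-door with probability 0.85.
    # Each leaf gives P(proposition is True at time t+1).
    LISTEN_EFFECT_ON_HEAR = {
        "test": "tiger-behind-left-door",
        "if_true": {"prob_true": 0.85},    # tiger is there: hear it w.p. 0.85
        "if_false": {"prob_true": 0.15},   # tiger absent: false alarm w.p. 0.15
    }

    def apply_effect_tree(tree, state):
        """Walk an effect decision tree against the current state and sample
        the proposition's new truth value."""
        while "test" in tree:
            tree = tree["if_true"] if state[tree["test"]] else tree["if_false"]
        return random.random() < tree["prob_true"]

    state = {"tiger-behind-left-door": True, "dead": False,
             "rewarded": False, "hear-tiger-behind-left-door": False}
    state["hear-tiger-behind-left-door"] = apply_effect_tree(LISTEN_EFFECT_ON_HEAR, state)
    print(state)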
Related Work

The type of partially observable planning problem we address, featuring actions with probabilistic effects and noisy observations, is a form of partially observable Markov decision process (POMDP). Algorithms for POMDPs have been around for many years, and use a flat representation of domains; that is, every possible combination of state variables is considered independently. In this section, we focus on more recent algorithms that exploit propositional state representations. Of course, any algorithm that can solve a planning problem in a flat representation can also be used to solve a problem in the propositional representation by enumerating states; in fact, this approach is often the fastest for domains with up to five or six fluents.

One of the most well-known contingent planners for probabilistic domains is C-BURIDAN (Draper, Hanks, & Weld 1994), which uses tree-based, probabilistic STRIPS operators to extend partial-order planning to stochastic domains. C-BURIDAN searches for a type of contingent plan whose probability of success meets or exceeds some prespecified threshold. Onder & Pollack (1997) point out that there are some problems with C-BURIDAN, and these could prevent it from solving arbitrary partially observable planning problems. Specifically, C-BURIDAN:

- automatically constructs branches to resolve otherwise unresolvable conflicts without reasoning about whether the branch is worth constructing;

- creates a branch to resolve the conflict by inserting an observation step before the two conflicting actions, even if inserting the observation step between the two actions would be better;

- produces observation labels not related to actual propositions, so it needs to try all possible combinations of observation labels and action contexts in order to determine which actions could be triggered.

MAHINUR (Onder & Pollack 1997) is a probabilistic partial-order planner that combines C-BURIDAN's probabilistic action representation and system for managing these actions with a CNLP-style approach to handling contingencies. The novel feature of MAHINUR is that it identifies those contingencies whose failure would have the greatest negative impact on the plan's success and focuses its planning efforts on generating plan branches to deal with those contingencies. Onder & Pollack (1997) identify several domain assumptions (including a type of subgoal decomposability) that underlie the design of MAHINUR. There are no guarantees on the correctness of MAHINUR for domains in which these assumptions are violated.

Several researchers have extended GRAPHPLAN to handle uncertainty. CONFORMANT GRAPHPLAN (Smith & Weld 1998) deals with uncertainty in the initial conditions and in the outcome of actions by attempting to construct a nonsensing, noncontingent plan that will succeed in all cases. PGRAPHPLAN (Blum & Langford 1998) employs forward search through the planning graph to find the contingent plan with the highest expected utility in a completely observable stochastic environment. SENSORY GRAPHPLAN (SGP) (Weld, Anderson, & Smith 1998), unlike CONFORMANT GRAPHPLAN, constructs plans with sensing actions that gather information to be used later in distinguishing between different plan branches. Thus, SGP is an approach to constructing contingent plans. However, SGP has not been extended to handle actions with uncertain effects (except in the conformant case) and imperfect observations, so it is applicable to a subset of partially observable planning problems.

Boutilier & Poole (1996) describe an algorithm for solving partially observable planning problems based on an earlier algorithm for completely observable problems. While promising, little computational experience with this algorithm is available.

C-MAXPLAN: E-MAJSAT

MAXPLAN (Majercik & Littman 1998a; 1998b) was initially developed to solve probabilistic planning problems in completely unobservable domains. The MAXPLAN approach solves a planning problem by first converting the planning problem to an E-MAJSAT problem (Littman, Goldsmith, & Mundhenk 1998). MAXPLAN solves the E-MAJSAT problem using a modified version of the Davis-Putnam-Logemann-Loveland (DPLL) procedure for determining satisfiability. Essentially, it uses DPLL to determine all possible satisfying assignments, sums the probabilities of the satisfying assignments for each possible choice-variable assignment, and then returns the choice-variable assignment (plan) with the highest probability of producing a satisfying assignment (goal satisfaction). The algorithm uses an efficient splitting heuristic (time-ordered splitting), memoization, and caching (Majercik & Littman 1998a; 1998b) in order to accelerate this procedure.
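For readers unfamiliar with E-MAJSAT, the following brute-force sketch spells out the problem being solved. It is our own illustration and not the MAXPLAN solver itself, which uses a modified DPLL procedure with time-ordered splitting, memoization, and caching; the clause format, the function names, and the toy instance are assumptions.

    import itertools

    # A formula is a list of clauses; each clause is a list of signed
    # variable names, e.g. ("x", False) means "not x".

    def satisfied(clauses, assignment):
        """True if every clause has at least one satisfied literal."""
        return all(any(assignment[var] == sign for var, sign in clause)
                   for clause in clauses)

    def success_probability(clauses, choice_assignment, chance_probs):
        """Probability that the clauses are satisfied, given fixed choice
        variables and independent chance variables (var -> P(True))."""
        chance_vars = list(chance_probs)
        total = 0.0
        for values in itertools.product([True, False], repeat=len(chance_vars)):
            assignment = dict(choice_assignment)
            weight = 1.0
            for var, val in zip(chance_vars, values):
                assignment[var] = val
                weight *= chance_probs[var] if val else 1.0 - chance_probs[var]
            if satisfied(clauses, assignment):
                total += weight
        return total

    def e_majsat(clauses, choice_vars, chance_probs):
        """Return the choice-variable setting with the highest success probability."""
        best = (None, -1.0)
        for values in itertools.product([True, False], repeat=len(choice_vars)):
            choice = dict(zip(choice_vars, values))
            p = success_probability(clauses, choice, chance_probs)
            if p > best[1]:
                best = (choice, p)
        return best

    # Toy instance: the clauses say we succeed exactly when we open the door
    # the tiger is not behind (the tiger is behind the right door when
    # tiger-left is False), and the tiger is on the left with probability 0.3.
    clauses = [[("open-left", True), ("tiger-left", True)],
               [("open-left", False), ("tiger-left", False)]]
    print(e_majsat(clauses, ["open-left"], {"tiger-left": 0.3}))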
Although MAXPLAN was initially developed to solve probabilistic planning problems in completely unobservable domains, it can solve contingent planning problems given an appropriate problem encoding. This highlights the benefit of separating the planning problem into two modules, problem conversion and problem solution: supporting contingent planning requires only that the E-MAJSAT conversion unit allow the encoding of contingent actions.

More specifically, plans in C-MAXPLAN are encoded in a manner similar to that of C-BURIDAN (Draper, Hanks, & Weld 1994). A contingent action is expressed as a sequence of actions, each one of which executes only if its set of conditions matches the current set of observation labels generated by previous actions. Since the condition sets of the actions are mutually exclusive, the net result is that at most one contingent action will execute, depending on current conditions.

Encoding contingent plans in C-MAXPLAN thus requires a change of perspective. C-MAXPLAN still considers actions sequentially but, since a series of actions can now be interpreted as a single contingent action for a single time step, there is no longer a one-to-one correspondence between actions and time steps. It is now more appropriate to speak of an n-action plan rather than an n-time-step plan. In order to encode contingent plans, we need additional variables and clauses, and we need to alter the decision trees of the actions. In the following two sections, we describe the new variables and clauses required for a single action step of the plan, with the understanding that the variables and clauses required for the always-noncontingent initial action are somewhat different. Following that, we describe the necessary alterations to the action decision trees.

Variables

A subset of the propositions in our planning domain is designated as the set of observable propositions; these are the propositions on which we wish to condition actions. Each one of these propositions requires:

- two variables indicating the status of our knowledge of the observable proposition: a knowledge variable indicating whether we have observed the status of the observable proposition, and a perception variable indicating the agent's perception of the observable proposition;

- an action (possibly an existing one) that sets the knowledge and perception variables;

- three variables indicating the status of our conditioning on the perception variable: either we want the perception to be True, we want it to be False, or we Don't-Care; and

- a single variable indicating whether the condition attached to that perception variable has been satisfied.

Finally, for each action step there is a single variable that indicates whether all the conditions specified for that action step were fulfilled.

Clauses

There are two necessary groups of clauses (a sketch of both appears after this list):

- We need a group of clauses that enforces mutual exclusivity of the possible desired conditions for a perception variable; for each perception variable, we want exactly one of the following to be true: the condition requires that the agent perceive the proposition to be True, the condition requires that the agent perceive the proposition to be False, or the condition specifies that the agent's perception of the proposition is irrelevant (Don't-Care).

- For each possible status (True/False/Don't-Care) of each perception upon which we are conditioning, we need a clause that describes what constitutes satisfaction of that condition; e.g., if condition-1 requires that proposition-2 be False, the knowledge variable indicating whether the agent has observed proposition-2 is True, and the perception variable for proposition-2 is False, then condition-1 is satisfied.
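The sketch below shows how these variables and clauses might be generated for a single action step. It is our own illustration, not the C-MAXPLAN conversion unit: the variable-naming scheme and the exact clause forms are assumptions, and a complete encoding would need additional clauses (for example, to define the per-step all-conditions-satisfied variable).

    # Literals are (name, sign) pairs; sign False means the literal is negated.
    def step_variables_and_clauses(observables, step):
        clauses = []
        for p in observables:
            know = f"know-{p}-{step}"          # have we observed p by this step?
            perc = f"perceive-{p}-{step}"      # the agent's perception of p
            want_t = f"cond-{p}-true-{step}"   # condition: perception must be True
            want_f = f"cond-{p}-false-{step}"  # condition: perception must be False
            dont = f"cond-{p}-dont-care-{step}"
            sat = f"cond-{p}-satisfied-{step}"

            # Group 1: exactly one of the three condition settings holds.
            clauses.append([(want_t, True), (want_f, True), (dont, True)])
            clauses.append([(want_t, False), (want_f, False)])
            clauses.append([(want_t, False), (dont, False)])
            clauses.append([(want_f, False), (dont, False)])

            # Group 2: what it means for the condition on p to be satisfied.
            # (want_f and know and not perc) -> sat
            clauses.append([(want_f, False), (know, False), (perc, True), (sat, True)])
            # (want_t and know and perc) -> sat
            clauses.append([(want_t, False), (know, False), (perc, False), (sat, True)])
            # Don't-Care -> sat
            clauses.append([(dont, False), (sat, True)])
        return clauses

    for clause in step_variables_and_clauses(["hear-tiger-behind-left-door"], 2):
        print(clause)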
tion to be True, the condition requires that the agent More specifically, plans in C-MAXPLAN are encoded in perceive the proposition to be False, or the condition specifies that the agent’s perception of the proposi- a manner similar to that of C-BURIDAN (Draper, Hanks, & Weld 1994). A contingent action is expressed as a se- tion is irrelevant (Don’ t-Care). quence of actions, each one of which executes only if its ¯ For each possible status (True/False/Don’t-Care) set of conditions matches the current set of observation of each perception upon which we are conditioning, labels generated by previous actions. Since the condi- we need a clause that describes what constitutes sat- tion sets of the actions are mutually exclusive, the net isfaction of that condition; e.g. if condition-1 re- result is that at most one contingent action will execute, quires that proposition-2 be False and the knowledge depending on current conditions. variable indicating whether the agent has observed Encoding contingent plans in C-MAXPLAN thus re- proposition-2 is True and the perception variable for quires a change of perspective. C-MAXPLAN still con- proposition-2 is False, then condition-1 is satisfied. siders actions sequentially but, since a series of actions The required alterations of the decision trees are can nowbe interpreted as a single contingent action for straightforward. Of course, we need new decision trees a single time step, there is no longer a one-to-one cor- in each action describing that action’s impact on the respondence between actions and time steps. It is now new variables. More importantly, we need to alter the more appropriate to speak of an n-action plan rather existing decision trees such that the action has no im- than an n-time-step plan. In order to encode contingent pact at all if the all-conditions-satisfied variable is not plans, we need additional variables and clauses, and we True for that action step. 85 Although C-MAXPLAN’S encoding technique is similar based on the probability calculated so far and the ini- to that of C-BUR1DAN, the C-MAXPLAN solution tech- tial threshold. For choice variables, only upper bound nique has some advantages. thresholding is possible. If the probability of success ¯ C-MAXPLAN does not automatically insert a con- given an assignment of True to this variable is higher ditional branch under certain circumstances as C- than our threshold, we can prune all plans in which this BURIDAN does; solving the E-MAJSATformula re- variable would be assigned False. For chance variables, quires C-MAXPLAN to reason about where an obser- we can perform both upper bound pruning and lower vation and conditional branch would be worthwhile. bound pruning. The upper bound pruning is similar to that used for choice variables, although the thresh- ¯ C-MAXPLAN can insert a conditional branch any- olds passed downthe calculation tree must be appropri- where in the plan; it is not restricted to inserting ately adjusted. Lower bound pruning makes use of the observations and conditional branches immediately fact that a sufficiently low probability of success on the before conflicting actions, as C-BURIDAN is. first truth assignment checked ensures failure. 
Initial tests of this contingent planning technique were promising. Even the most basic version of C-MAXPLAN (without memoization) could solve the 2-step TIGER problem.

Larger problems motivated the use of thresholding, in which partial assignments are pruned based on the probability calculated so far and the initial threshold. For choice variables, only upper bound thresholding is possible. If the probability of success given an assignment of True to this variable is higher than our threshold, we can prune all plans in which this variable would be assigned False. For chance variables, we can perform both upper bound pruning and lower bound pruning. The upper bound pruning is similar to that used for choice variables, although the thresholds passed down the calculation tree must be appropriately adjusted. Lower bound pruning makes use of the fact that a sufficiently low probability of success on the first truth assignment checked ensures failure. If the probability of success given the first assignment to the chance variable is low enough, we can determine that the probability-weighted average of both truth assignments could not possibly meet our adjusted threshold, and we can return failure without calculating the probability of success of the other truth assignment.

Thresholding failed to reduce C-MAXPLAN's running time on the MEDICAL-4ILL problem to reasonable levels. A different approach to plan encoding, however, did yield a significant improvement in performance. Instead of searching for the optimal plan of a given length, we search for an optimal small policy to be applied for a given number of steps. In this approach, the decision trees from all actions for each proposition p are merged into a single decision tree that describes the impact of all the actions on p via a cascade of condition-fulfillment variables. Essentially, the decision tree says: if the conditions specified by the policy for action a are satisfied, then decide the status of p according to a's decision tree; otherwise, if the conditions for action b are satisfied, then decide the status of p according to b's decision tree; ...; otherwise, the status of p remains the same.

In this encoding, we have a single action, follow-the-policy, that is always taken, and the choice variables in the encoding describe the conditions under which an action should be executed. The setting of these condition-specification variables describes a policy to follow, and the algorithm searches for the setting of these variables that maximizes the probability of reaching a goal state.
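To make the cascade concrete, here is a minimal sketch of how the merged decision tree for a single proposition decides that proposition's next value under the follow-the-policy encoding. It is our own illustration: the data layout, the function names, and the open-left-door entry (included only to fill out the cascade) are assumptions.

    import random

    def eval_effect_tree(tree, state):
        """Effect tree: nested dicts with 'test'/'if_true'/'if_false' nodes and
        {'prob_true': q} leaves; returns a sampled new truth value."""
        while "test" in tree:
            tree = tree["if_true"] if state[tree["test"]] else tree["if_false"]
        return random.random() < tree["prob_true"]

    def cascade_update(prop, state, actions, conditions_satisfied):
        """Walk the actions in a fixed order; the first action whose policy
        conditions are satisfied decides prop's new value via its effect tree.
        If no action applies, the proposition keeps its old value."""
        for name, effect_tree in actions:
            if conditions_satisfied[name]:
                return eval_effect_tree(effect_tree, state)
        return state[prop]

    # Example: only listen-for-tiger's conditions are satisfied, so it decides
    # the new value of hear-tiger-behind-left-door.
    state = {"tiger-behind-left-door": True, "hear-tiger-behind-left-door": False}
    listen_tree = {"test": "tiger-behind-left-door",
                   "if_true": {"prob_true": 0.85}, "if_false": {"prob_true": 0.15}}
    actions = [("listen-for-tiger", listen_tree),
               ("open-left-door", {"prob_true": 0.0})]  # hypothetical entry
    new_val = cascade_update("hear-tiger-behind-left-door", state, actions,
                             {"listen-for-tiger": True, "open-left-door": False})
    print(new_val)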