Planning AI: search in situation graph
Why planning?
Suppose that you play with a computer agent.
It is in a given situation, and has some goal.
This agent executes a behavior in order to achieve the goal.
A behavior is a set of actions.
It may choose the behavior at least 2 different manners.
1. Reflex agent
Reflex agent selects the behavior using its decision tree (DT) or finite state machine (FSM) processor.
The agent has a function that maps the state of the world to actions:
actions = hypothesis( wordState )
This is a composit function of atomic functions and other composite functions (implementing the composite pattern).
The hypothesis is function must be the best possible mapping of world to actions.
If this composition is reconfigurable in runtime, and there is function that does so, it is a learning agent.
Neural networks are good example of reconfigurable function compositions.
An option is to have a set of rules that calculates priorities of the actions from the input using weighting.
The priorities may be very close to each other; the difference can be smaller than the error limit.
These weights can be even wrong. The learning method may adjust the weights if the selected action did not help.
2. Planner agent
A situation is a state of the world.
An action is a transition between 2 situations.
The situations and transitions represent a situation graph.
The planning agent finds the path in this graph from the start situation to the goal situation.
This path should be the shortest (has the lowest cost) possible one.
This path is a list of actions, this is the selected behavior.
The behavior processor then executes the actions one after another.
If the plan is completed, the goal is achieved.
If it is clear that it can't be completed, or isn't worth to complete, the planner must create a new plan.
We will examine this second agent.
It is almost impossible to build for example a chess partner reflex agent.
How planner works
Situations (current and goal also) is represented as a set of variables that describes the state of the world
Rules consist of results, predicate and actions. Predicates and results are variables with values.
The rule is applicable in a situation if all the predicates are fulfilled,
which means that each referenced variable has the required value.
Then the actions referred by this rule will be executed.
Then the state of the world will contain the results
which means that the each referenced variable will have specified value.
Situations also store the rule that led here, and the current total cost from the start.
Breath first search
There are a lot of algorithms to search in a graph like depth first, Dijkstra, A*, breath first, etc.
Depth first search is not guaranteed to find the path to the goal situation at all, and may fall in an endless loop.
Breath first search needs a lot of memory to store all the situations, but can find the least cost path.
However we choose breath first for its perfect solution and try to keep the number of situations low.
We will achieve this by keeping the number of situation variables, and number of rules low.
This is only a question of modeling. A good model hides a lot of unnecessary details.
The example
In this example a role-playing character wants to kill a huge enemy.
It is placed in the world of our game: Wing: Released Spirits.
There are some diamonds in its world, and an inn, a shop, and of course the huge enemy.
In the start situation the agent is tired, and has no weapons.
The goal situation is that it kills the huge enemy.
A rule states that with sword and shield a fresh agent can kill the enemy.
Other rules state that it must have money to buy sword and shield from the merchant.
Yet another rules: if the bot finds a diamond, the bot can sell it to to the merchant to have money.
Optimizations
The planner keeps track of costs of actions.
Applying a rule at a situation, the agent can get to an already investigated situation.
However the cost this way to this situation can be less then an older one.
If it is so, the planner stores the current preceding situation and the rule to the visited situation.
It is also worth to have separate planners for different tasks.
This way the exponential explosion of number of situations does not happen.
This example finds the goal within 47 iterations.
The rule selection may use heuristics instead of linear search. It may find the first rules to try just as a reflex agent.
The planner can learn also by taking the fruitless rule selections later into account.
It may recalculate the weights in rule selection, just as a learning reflex agent would do (see above).
Overall structure of this small planner
plan
From the current situation it applies all the rules whose predicates are fulfilled.
This step is called extending the situation graph point. This way it gets to new or already existing situations.
Then it continues with the extension of new situations, as far as it gets to the goal situation.
setup
Registers the state variables and the actions and the rules
Each variable has one letter name, such as location is 'H'
Each value has also a one-letter name, such as 'm' means at merchant.
test
Sets up a the example situation with the tired weaponless agent,
and the goal of killing the huge enemy.
The state variables
'K' means 'has sword'
'G' means 'has diamond'
'P' means 'has money'
'R' means 'has shield'
'H' is the location. value m=at merchant, c=at enemy, a=at diamond, h=at inn
'F' is fitness.
'B' is '1' if the enemy is killed
The actions
'k' is 'buy sword'.
'p' is 'buy shield'.
'g' is 'sell diamond'.
'c' is 'go to enemy'.
'm' is 'go to merchant'.
's' is 'find diamond'.
'h' is 'go to inn'.
'f' is 'fight enemy'.
'a' is 'sleep'.
't' is 'take diamond'.
The rules
The results and predicates are variable-value pairs,
for example addRule("K1HmP1k0",1,2):
1 and 2 means it has 1 result and then 2 predicate, the remainings are actions.
- K1 result in the string means 'has sword' is true,
- Hm predicate means location is at merchant,
- P1 predicate means 'has money' is true,
- k0 action means 'buy sword', the value 0 is ignored in this action.
The result will look like:
Beware of the reversed step order, because it logs actions from the goal backward to start situation.
goal found
act: fight enemy costs: 1.0
act: go to enemy costs: 1.0
act: sleep costs: 1.0
act: go to inn costs: 1.0
act: buy shield costs: 1.0
act: buy sword costs: 1.0
act: sell diamond costs: 1.0
act: go to merchant costs: 1.0
act: take diamond costs: 1.0
act: find diamond costs: 1.0
Bibliography
[1] Tricks of the Windows Game Programming Gurus, Sams, 1999, ISBN 0-672-31361-8
[2] Artificial Intelligence. A Modern Approach, Prentice Hall, 1995, ISBN 963 454 241 1
Source code
/*------------------------------------------------------------------------------
Planner.cpp
AI planning based on breath first search in situation graph
(C) Attila Szalkai, 2005. February 22. White Stone Studio
--------------------------------------------------------------------------------*/
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#define RULEOP 20
#define MAXRULE 200
#define MAXOPDEF 255
#define OPDEFLEN 40
#define MAXOP 200
#define MAXSTATE 30
#define MAXSITU 1000
#define DONT_CARE 2
int logm = 1;
//--------------------- RULE ------------------
struct Rule {
int resn,predn,actn;
int ops[RULEOP][2];
int *pred(int n) { return ops[resn+n]; }
int *act (int n) { return ops[resn+predn+n]; }
};
//--------------------- SITU ------------------
struct Situ {
int from;
char rule;
float cost;
char done;
char s[MAXSTATE];
Situ();
void copy( Situ *from );
bool equals( Situ *to );
void print();
void fill(char v);
void set(char n, char v);
};
//--------------------- PLAN ------------------
struct Planner
{
// STATES
char opDefs[MAXOPDEF][OPDEFLEN];
void addOp( int code, char *name);
void addState( int code, char *name );
// RULES
Rule rules[MAXRULE];
int rulen;
char rulest[MAXRULE][30]; // debug
void addRule( char *st, int resn, int predn);
void addRule( int *it, int resn, int predn);
bool isRuleUtil(Rule *r, Situ *w );
bool isRuleApplicable(Rule *r, Situ *w);
void applyRule(Rule *r, Situ *w);
// PLAN
int steps[MAXOP][2];
int stepn;
void collectSteps( int j);
// PLANNING
Situ goal;
Situ situs[ MAXSITU ];
int situn;
bool plan();
void setup();
void init();
};
//====================================================
//-----------------------------------------------
Situ::Situ() {
from=-1;
cost=-1;
for(int i=0; i<MAXSTATE; i++) s[i]=0;
}
//-----------------------------------------------
void Situ::copy( Situ *from ) {
for(int i=0; i<MAXSTATE; i++) s[i]= from->s[i];
}
//-----------------------------------------------
void Situ::fill(char v){
for(int i=0; i<MAXSTATE; i++) s[i]= v;
}
//-----------------------------------------------
void Situ::set(char n,char v){
s[n]=v;
}
//-----------------------------------------------
bool Situ::equals( Situ *to ) {
for(int i=0; i<MAXSTATE; i++)
if (s[i]!=to->s[i] && s[i]!=DONT_CARE)
return false;
return true;
}
//-----------------------------------------------
void Situ::print() {
int i;
for(i=0; i<MAXSTATE; i++)
if (s[i])
printf("%c ",i+'A');
printf("\n");
}
//--------------------------------------------------------
// At least one result needed
//--------------------------------------------------------
bool Planner::isRuleUtil(Rule *r, Situ *w )
{
for(int j=0; j<r->resn; j++)
{
int var = r->ops[j][0]-'A';
int val = r->ops[j][1];
if (w->s[ var ] != val ) {
if (logm) printf("has feature: s[%c] to %c\n",var+'A', val);
return true;
}
}
return false;
}
//--------------------------------------------------------
// Predicates are fulfilled
//--------------------------------------------------------
bool Planner::isRuleApplicable(Rule *r, Situ *w)
{
for(int j=0; j<r->predn; j++)
{
int var = r->pred(j)[0] - 'A';
int val = r->pred(j)[1];
if (w->s[var] != val ) {
if (logm) printf("condition fails: s[%c] isnt %c\n",var+'A', val);
return false;
}
}
return true;
}
//--------------------------------------------------------
// Apply rule. Does not execute the operation,
// changes the predicates with the results instead.
//--------------------------------------------------------
void Planner::applyRule(Rule *r, Situ *w)
{
for(int j=0; j<r->resn; j++) {
int var = r->ops[j][0]-'A';
w->set( var, r->ops[j][1] );
//if (logm) printf(" res %c %s\n", k, opDefs[k]);
}
}
//--------------------------------------------------------
// Collect the acts from goal point back to start
//--------------------------------------------------------
void Planner::collectSteps( int j)
{
do {
// LIST STEP
Rule *r = &rules[ situs[j].rule ];
// ALL ACTS OF RULE
for(int i=0; i<r->actn; i++)
{
int *act = r->act(i);
steps[stepn][0] = act[0];
steps[stepn][1] = act[1];
stepn++;
if (logm) printf("act: %c%d %s costs: %f\n", act[0], act[1], opDefs[act[0]], situs[j].cost );
}
j = situs[j].from;
} while (j);
}
//-----------------------------------------------
// Breath firs minimal cost situ graph search
// Try all operation from the situation point
//-----------------------------------------------
bool Planner::plan()
{
int situ=0;
int rule=0;
Situ situw;
stepn=0;
if (logm) situs[0].print();
while(1)
{
// STEP SITU
if (rule==rulen)
{
situs[situ].done=1;
for(; situ<situn && situs[situ].done ; situ++);
rule=0;
// NO MORE SITU, NO SOLUTION
if (situ==situn)
return false;
}
// TRY RULE
Rule *r = &rules[rule];
if (logm) printf("\nLoop situ: %d rule: %d %s\n", situ, rule, rulest[rule] );
if(isRuleUtil (r, &situs[situ])
&& isRuleApplicable(r, &situs[situ]))
{
// APPLY RULE RESULTS TO CURRENT SITU
if (logm) printf("rule fits\n");
situw.copy( &situs[situ] );
applyRule( r, &situw );
if (logm) printf("situ after rule %d: ", rule); situw.print();
// CALCULATE COST OF ACTION, DEPENDANT ON CURRENT SITUATION AND LOCATION
float cost = 1;
float costSum = situs[situ].cost + cost;
// FIND SIMILAR SITU
for(int j=0; j<situn && !situw.equals( &situs[j] ); j++);
if (j==situn)
{
// OUT OF MEMORY
if (j==MAXSITU)
return false;
// ADD NEW SITU
situw.from = situ;
situw.cost = cost;
situw.done = 0;
situw.rule = rule;
situs[j] = situw;
situn++;
if (logm) printf("new situ\n", j);
} else {
// COME FROM LESS COST
if (logm) printf("existing situ %d -> %d costs: %f\n",situ, j, cost );
if (cost < situs[j].cost) {
if (logm) printf("better cost than %f\n", situs[j].cost );
situs[j].from = situ;
situs[j].cost = cost;
situs[j].rule = rule;
situs[j].done = 0; // To update costs
}
}
// GOAL FOUND
if (goal.equals(&situw)) {
if (logm) printf("goal found\n", j);
collectSteps(j);
return true;
}
}
rule++;
}
}
//---------------------------------------------
// Add an op definition
//---------------------------------------------
void Planner::addOp( int code, char *name)
{
strcpy( opDefs[code],name );
}
//---------------------------------------------
// Add an op definition
//---------------------------------------------
void Planner::addState( int code, char *name )
{
strcpy( opDefs[code],name );
}
//---------------------------------------------
// Builds a rule from string
// And adds to the rule list
//
// st: rule coding string
// resn : number of results in rule
// predn: number of predicates in rule
//---------------------------------------------
void Planner::addRule( char *st, int resn, int predn)
{
strcpy( rulest[rulen], st);
Rule *r = &rules[rulen++];
int len = strlen(st)/2;
for(int i=0; *st; i++) {
r->ops[i][0]=*st++;
r->ops[i][1]=*st++;
}
r->resn =resn;
r->predn=predn;
r->actn =len-resn-predn;
}
//---------------------------------------------
// Builds a rule int array
// And adds to the rule list
//
// it : rule coding string
// resn : number of results in rule
// predn: number of predicates in rule
//---------------------------------------------
void Planner::addRule( int *it, int resn, int predn)
{
strcpy( rulest[rulen], "*");
Rule *r = &rules[rulen++];
for(int i=0; *it>=0; i++) {
r->ops[i][0]=*it++;
r->ops[i][1]=*it++;
}
r->resn = resn;
r->predn= predn;
r->actn = i-resn-predn;
}
void Planner::init()
{
for(int i=0; i<MAXOPDEF; i++) {
opDefs[i][0]=i; opDefs[i][1]=0;
}
rulen=0;
situn=1;
situs[0].fill(0);
goal.fill(DONT_CARE);
}
void Planner::setup()
{
//--------------------------State variables
// Has someting
addState( 'K',"has sword");
addState( 'G',"has diamon");
addState( 'P',"has money");
addState( 'R',"has shield");
// Location
addState( 'H',"location"); //m merchant, c enemy, a diamond, h inn
// Fittness
addState( 'F',"fitt"); // 0 1
// Goal
addState( 'B',"enemy is killed");
//--------------------------Actions
// Trade
addOp( 'k',"buy sword");
addOp( 'p',"buy shield");
addOp( 'g',"sell diamond");
// Travel
addOp( 'c',"go to enemy");
addOp( 'm',"go to merchant");
addOp( 's',"find diamond");
addOp( 'h',"go to inn");
// Act
addOp( 'f',"fight enemy");
addOp( 'a',"sleep");
addOp( 't',"take diamond");
//--------------------------Rules
//Kills enemy
addRule("B1HcR1K1F1f0",1,4);
//Trade
addRule("K1HmP1k0",1,2);
addRule("R1HmP1p0",1,2);
addRule("P1HmG1g0",1,2);
//Travel
addRule("Hmm0",1,0);
addRule("Hcc0",1,0);
addRule("Has0",1,0);
addRule("Hhh0",1,0);
//Sleep
addRule("F1HhP1a0",1,2);
//Get diamond
addRule("G1Hat0",1,1);
}
int test()
{
Planner pl;
pl.init();
pl.setup();
// START AND GOAL
pl.situs[0].set('F'-'A','0');
pl.goal.set('B'-'A','1');
pl.plan();
return 0;
}
//-----------Lazy couple interface ---------------------------
int plannerNew()
{
Planner *pl = new Planner;
pl->init();
pl->setup();
return (int) pl;
}
void plannerDel(int id)
{
Planner *pl = (Planner*)id;
delete pl;
}
void plannerRule(int id, char *rule, int resn, int predn )
{
Planner *pl = (Planner*)id;
pl->addRule( rule, resn, predn);
logm = 0; //!!
}
void plannerPlan(int id, char *situ, char *goal, char *plan)
{
Planner *pl = (Planner*)id;
char *r;
// REGISTER START
for(r = situ; *r; r+=2)
pl->situs[0].set(r[0]-'A',r[1]);
// REGISTER GOAL
for(r = goal; *r; r+=2)
pl->goal.set(r[0]-'A',r[1]);
// PLAN
r = plan;
if (pl->plan() && plan) {
// STORE STEPS
for(int i=0; i<pl->stepn; i++) {
*r++=pl->steps[i][0];
*r++=pl->steps[i][1];
}
}
*r=0;
}
/***********************************************************
An example of inserting planner to a game:
Wing: Released Spirits
***********************************************************/
#ifdef EMBEDDED
//--------- Example usage in game engine ----------------------
#include <stdlib.h>
#include "ani3d.h"
#include "types.h"
#include "game.h"
#include "actor.h"
//-------------------------------------------------------------
// Helper for merchants
//-------------------------------------------------------------
int findGood( Actor *m, int func )
{
int i;
for(i=0; i<ACTOR_TRADE; i++) {
Objdef *goodDef = (Objdef*)m->goods[i];
if (goodDef->func is func)
return i;
}
return -1;
}
//-------------------------------------------------------------
//
//-------------------------------------------------------------
int Actor::PlanStep()
{
planCnt++;
planTck=0;
Planner *pl = (Planner *)planner;
if (pl->steps[planCnt]<0) {
planner=0;
return 0;
}
return 1;
}
//-------------------------------------------------------------
// Execute the plan
//-------------------------------------------------------------
int Actor::PlanProc()
{
if (!planner) return 0;
Planner *pl = (Planner *)planner;
static Actor *merchant; if (!merchant) merchant = (Actor*)scene.EntFind("traderCity");
static Actor *boss; if (!boss) boss = scene.player;
static Object*diamond; if (!diamond) diamond = (Object*)scene.EntFind("diamond:2");
if (pl->steps[planCnt]<0) {
planner=0;
// Sign ready !
return 0;
}
int act = pl->steps[ planCnt ][0];
int arg = pl->steps[ planCnt ][1];
int i;
switch(act)
{
//--------------------------------
case 'k': //buy sword
if (planTck==1) {
i = findGood( merchant, ofSWORD );
if (i>=0)
Buy( merchant, i, 100);
}
if (planTck==50)
PlanStep();
break;
//--------------------------------
case 'p': //buy shield
if (planTck==1) {
i = findGood( merchant, ofSHIELD );
if (i>=0)
Buy( merchant, i, 100);
}
if (planTck==50)
PlanStep();
break;
//--------------------------------
case 'g': //sell diamond
if (planTck==1) {
int n = InvFind( (Ent *)diamond );
if (n>=0)
Sell( merchant, n, 500);
}
if (planTck==50)
PlanStep();
break;
//--------------------------------
case 'c': //go to enemy
if (planTck==1)
Act( evGOTO, boss );
if (Dist2(boss->wmtx.t)<900)
PlanStep();
break;
//--------------------------------
case 'm': //go to merchant
if (planTck==1)
Act( evGOTO, merchant );
if (Dist2(merchant->wmtx.t)<900)
PlanStep();
break;
//--------------------------------
case 's': //find diamond
if (planTck==1)
Act( evGOTO, diamond );
if (Dist2(diamond->wmtx.t)<25)
PlanStep();
break;
//--------------------------------
case 'f': //fight enemy
if (planTck==1)
Act( evCHASE, boss );
if (boss->health<=0)
PlanStep();
break;
//--------------------------------
case 'a': //sleep
if (planTck==1)
Act( evSLEEP, boss );
if (planTck==200)
PlanStep();
break;
//--------------------------------
case 't': //take diamond
if (planTck==1)
SetState( evTAKE, diamond);
if (planTck==200)
PlanStep();
break;
}
planCnt++;
return 1;
}
#endif