Planning AI: search in situation graph


Why planning?
Suppose that you play with a computer agent.
It is in a given situation, and has some goal.
This agent executes a behavior in order to achieve the goal.
A behavior is a set of actions.
It may choose the behavior at least 2 different manners.
1. Reflex agent
Reflex agent selects the behavior using its decision tree (DT) or finite state machine (FSM) processor.
The agent has a function that maps the state of the world to actions:
actions = hypothesis( wordState )
This is a composit function of atomic functions and other composite functions (implementing the composite pattern).
The hypothesis is function must be the best possible mapping of world to actions.

If this composition is reconfigurable in runtime, and there is function that does so, it is a learning agent.
Neural networks are good example of reconfigurable function compositions.
An option is to have a set of rules that calculates priorities of the actions from the input using weighting.
The priorities may be very close to each other; the difference can be smaller than the error limit.
These weights can be even wrong. The learning method may adjust the weights if the selected action did not help.
2. Planner agent
A situation is a state of the world.
An action is a transition between 2 situations.
The situations and transitions represent a situation graph.
The planning agent finds the path in this graph from the start situation to the goal situation.
This path should be the shortest (has the lowest cost) possible one.

This path is a list of actions, this is the selected behavior.
The behavior processor then executes the actions one after another.
If the plan is completed, the goal is achieved.
If it is clear that it can't be completed, or isn't worth to complete, the planner must create a new plan.

We will examine this second agent.
It is almost impossible to build for example a chess partner reflex agent.

How planner works
Situations (current and goal also) is represented as a set of variables that describes the state of the world
Rules consist of results, predicate and actions. Predicates and results are variables with values.
The rule is applicable in a situation if all the predicates are fulfilled,
which means that each referenced variable has the required value.
Then the actions referred by this rule will be executed. Then the state of the world will contain the results
which means that the each referenced variable will have specified value.
Situations also store the rule that led here, and the current total cost from the start.
Breath first search
There are a lot of algorithms to search in a graph like depth first, Dijkstra, A*, breath first, etc.
Depth first search is not guaranteed to find the path to the goal situation at all, and may fall in an endless loop.
Breath first search needs a lot of memory to store all the situations, but can find the least cost path.
However we choose breath first for its perfect solution and try to keep the number of situations low.
We will achieve this by keeping the number of situation variables, and number of rules low.
This is only a question of modeling. A good model hides a lot of unnecessary details.
The example
In this example a role-playing character wants to kill a huge enemy.
It is placed in the world of our game: Wing: Released Spirits.
There are some diamonds in its world, and an inn, a shop, and of course the huge enemy.
In the start situation the agent is tired, and has no weapons.
The goal situation is that it kills the huge enemy.
A rule states that with sword and shield a fresh agent can kill the enemy.
Other rules state that it must have money to buy sword and shield from the merchant.
Yet another rules: if the bot finds a diamond, the bot can sell it to to the merchant to have money.
Optimizations
The planner keeps track of costs of actions.
Applying a rule at a situation, the agent can get to an already investigated situation.
However the cost this way to this situation can be less then an older one.
If it is so, the planner stores the current preceding situation and the rule to the visited situation.
It is also worth to have separate planners for different tasks.
This way the exponential explosion of number of situations does not happen.
This example finds the goal within 47 iterations.

The rule selection may use heuristics instead of linear search. It may find the first rules to try just as a reflex agent.
The planner can learn also by taking the fruitless rule selections later into account.
It may recalculate the weights in rule selection, just as a learning reflex agent would do (see above).
Overall structure of this small planner
plan
From the current situation it applies all the rules whose predicates are fulfilled.
This step is called extending the situation graph point. This way it gets to new or already existing situations.
Then it continues with the extension of new situations, as far as it gets to the goal situation.

setup
Registers the state variables and the actions and the rules
Each variable has one letter name, such as location is 'H'
Each value has also a one-letter name, such as 'm' means at merchant.

test
Sets up a the example situation with the tired weaponless agent,
and the goal of killing the huge enemy.
The state variables
'K' means 'has sword'
'G' means 'has diamond'
'P' means 'has money'
'R' means 'has shield'
'H' is the location. value m=at merchant, c=at enemy, a=at diamond, h=at inn
'F' is fitness.
'B' is '1' if the enemy is killed
The actions
'k' is 'buy sword'.
'p' is 'buy shield'.
'g' is 'sell diamond'.
'c' is 'go to enemy'.
'm' is 'go to merchant'.
's' is 'find diamond'.
'h' is 'go to inn'.
'f' is 'fight enemy'.
'a' is 'sleep'.
't' is 'take diamond'.
The rules
The results and predicates are variable-value pairs,
for example addRule("K1HmP1k0",1,2):
1 and 2 means it has 1 result and then 2 predicate, the remainings are actions.
- K1 result in the string means 'has sword' is true,
- Hm predicate means location is at merchant,
- P1 predicate means 'has money' is true,
- k0 action means 'buy sword', the value 0 is ignored in this action.
The result will look like:
Beware of the reversed step order, because it logs actions from the goal backward to start situation.
goal found
act: fight enemy costs: 1.0
act: go to enemy costs: 1.0
act: sleep costs: 1.0
act: go to inn costs: 1.0
act: buy shield costs: 1.0
act: buy sword costs: 1.0
act: sell diamond costs: 1.0
act: go to merchant costs: 1.0
act: take diamond costs: 1.0
act: find diamond costs: 1.0
Bibliography
[1] Tricks of the Windows Game Programming Gurus, Sams, 1999, ISBN 0-672-31361-8
[2] Artificial Intelligence. A Modern Approach, Prentice Hall, 1995, ISBN 963 454 241 1
Source code
/*------------------------------------------------------------------------------
Planner.cpp 
    AI planning based on breath first search in situation graph
    (C) Attila Szalkai, 2005. February 22.    White Stone Studio

--------------------------------------------------------------------------------*/
 
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
 
#define RULEOP  20
#define MAXRULE 200
 
#define MAXOPDEF 255
#define OPDEFLEN 40 
#define MAXOP 200

#define MAXSTATE 30
#define MAXSITU  1000
#define DONT_CARE 2

int logm = 1;
 
//--------------------- RULE ------------------
struct Rule {
    int resn,predn,actn;
    int ops[RULEOP][2];    
    int *pred(int n) { return ops[resn+n]; }
    int *act (int n) { return ops[resn+predn+n]; }
};


//--------------------- SITU ------------------

struct Situ {
    int   from;
    char  rule;
    float cost;
    char  done;
    char  s[MAXSTATE];

    Situ();
    void copy( Situ *from );
    bool equals( Situ *to );
    void print();
    void fill(char v);
    void set(char n, char v);
}; 

 
//--------------------- PLAN ------------------
struct Planner
{
    // STATES
    char opDefs[MAXOPDEF][OPDEFLEN];    
    void addOp( int code, char *name);            
    void addState( int code, char *name );

    // RULES     
    Rule rules[MAXRULE];
    int  rulen;
    char rulest[MAXRULE][30]; // debug    
    void addRule( char *st, int resn, int predn);    
    void addRule( int  *it, int resn, int predn);
    bool isRuleUtil(Rule *r, Situ *w );
    bool isRuleApplicable(Rule *r, Situ *w);
    void applyRule(Rule *r, Situ *w);

    // PLAN
    int  steps[MAXOP][2];
    int  stepn;         
    void collectSteps( int j);

    // PLANNING
    Situ goal;
    Situ situs[ MAXSITU ];    
    int  situn;
    bool plan();
    void setup();
    void init();
};


//====================================================


//-----------------------------------------------
Situ::Situ() {    
    from=-1;
    cost=-1;
    for(int i=0; i<MAXSTATE; i++) s[i]=0;
}

//-----------------------------------------------
void Situ::copy( Situ *from ) {
    for(int i=0; i<MAXSTATE; i++) s[i]= from->s[i];
}

//-----------------------------------------------
void Situ::fill(char v){
    for(int i=0; i<MAXSTATE; i++) s[i]= v;    
}

//-----------------------------------------------
void Situ::set(char n,char v){
    s[n]=v;
}

//-----------------------------------------------
bool Situ::equals( Situ *to ) {
    for(int i=0; i<MAXSTATE; i++)
        if (s[i]!=to->s[i] && s[i]!=DONT_CARE)
            return false;
    return true;
}

//-----------------------------------------------
void Situ::print() {
    int i;
    for(i=0; i<MAXSTATE; i++) 
        if (s[i])
            printf("%c ",i+'A');
    printf("\n");
}


//--------------------------------------------------------
// At least one result needed        
//--------------------------------------------------------
bool Planner::isRuleUtil(Rule *r, Situ *w )
{    
    for(int j=0; j<r->resn; j++)
    {
        int var = r->ops[j][0]-'A';
        int val = r->ops[j][1];
        if (w->s[ var ] != val ) {
            if (logm) printf("has feature: s[%c] to %c\n",var+'A', val);
            return true;                
        }
    }
    return false;
}

//--------------------------------------------------------
// Predicates are fulfilled            
//--------------------------------------------------------
bool Planner::isRuleApplicable(Rule *r, Situ *w)
{    
    for(int j=0; j<r->predn; j++)
    {                            
        int var = r->pred(j)[0] - 'A';
        int val = r->pred(j)[1];
        if (w->s[var] != val ) {
            if (logm) printf("condition fails: s[%c] isnt %c\n",var+'A', val);
            return false;
        }                             
    }
    return true;
}


//--------------------------------------------------------
// Apply rule. Does not execute the operation, 
// changes the predicates with the results instead.
//--------------------------------------------------------
void Planner::applyRule(Rule *r, Situ *w)
{
    for(int j=0; j<r->resn; j++) {
        int var = r->ops[j][0]-'A';        
        w->set( var, r->ops[j][1] );
        //if (logm) printf("  res %c %s\n", k, opDefs[k]);
    }    
}


//--------------------------------------------------------
// Collect the acts from goal point back to start
//--------------------------------------------------------
void Planner::collectSteps( int j)
{
    do {
        // LIST STEP
        Rule *r = &rules[ situs[j].rule ];
        
        // ALL ACTS OF RULE
        for(int i=0; i<r->actn; i++)
        {   
            int *act = r->act(i);
            steps[stepn][0] = act[0];
            steps[stepn][1] = act[1];
            stepn++;
            if (logm) printf("act: %c%d %s costs: %f\n", act[0], act[1], opDefs[act[0]], situs[j].cost );            
        }
        j = situs[j].from;
    } while (j);
}

//-----------------------------------------------
// Breath firs minimal cost situ graph search 
// Try all operation from the situation point    
//-----------------------------------------------
bool Planner::plan()
{    
    int  situ=0;
    int  rule=0;        
    Situ situw;
    
    stepn=0;
    if (logm) situs[0].print();                    

    while(1)
    {                
        // STEP SITU
        if (rule==rulen)
        {
            situs[situ].done=1;            
            for(; situ<situn && situs[situ].done ; situ++);            
            rule=0;

            // NO MORE SITU, NO SOLUTION
            if (situ==situn)                
                return false;            
        }

        // TRY RULE
        Rule *r = &rules[rule];
        if (logm) printf("\nLoop situ: %d rule: %d %s\n", situ, rule, rulest[rule] );
        if(isRuleUtil      (r, &situs[situ])
        && isRuleApplicable(r, &situs[situ]))
        {                        
            // APPLY RULE RESULTS TO CURRENT SITU
            if (logm) printf("rule fits\n");
            situw.copy( &situs[situ] );
            applyRule( r, &situw );
            if (logm) printf("situ after rule %d: ", rule); situw.print();                    

            // CALCULATE COST OF ACTION, DEPENDANT ON CURRENT SITUATION AND LOCATION
            float cost = 1;
            float costSum = situs[situ].cost + cost;
            
            // FIND SIMILAR SITU
            for(int j=0; j<situn && !situw.equals( &situs[j] ); j++);                                    
            if (j==situn)
            {    
                // OUT OF MEMORY                            
                if (j==MAXSITU)                    
                    return false;

                // ADD NEW SITU
                situw.from = situ;
                situw.cost = cost;                
                situw.done = 0;
                situw.rule = rule;                
                situs[j]   = situw;                
                situn++;
                if (logm) printf("new situ\n", j);            
            } else {
                // COME FROM LESS COST
                if (logm) printf("existing situ %d -> %d costs: %f\n",situ, j, cost );
                if (cost < situs[j].cost) {
                    if (logm) printf("better cost than %f\n", situs[j].cost );
                    situs[j].from = situ;
                    situs[j].cost = cost; 
                    situs[j].rule = rule;
                    situs[j].done = 0; // To update costs
                }                    
            }

            // GOAL FOUND
            if (goal.equals(&situw)) {
                if (logm) printf("goal found\n", j);
                collectSteps(j);
                return true;                   
            }
        }
        rule++;
    }
}
 

//---------------------------------------------
// Add an op definition
//---------------------------------------------
void Planner::addOp( int code, char *name)
{
  strcpy( opDefs[code],name );
}

//---------------------------------------------
// Add an op definition
//---------------------------------------------
void Planner::addState( int code, char *name )
{
  strcpy( opDefs[code],name );  
}
 

//---------------------------------------------
// Builds a rule from string
// And adds to the rule list 
//
// st: rule coding string
// resn : number of results in rule
// predn: number of predicates in rule
//---------------------------------------------
void Planner::addRule( char *st, int resn, int predn)
{    
    strcpy( rulest[rulen], st);

    Rule *r = &rules[rulen++];    
    int len = strlen(st)/2;    
    for(int i=0; *st; i++) {
        r->ops[i][0]=*st++; 
        r->ops[i][1]=*st++; 
    }
    r->resn =resn;
    r->predn=predn;
    r->actn =len-resn-predn; 
}

//---------------------------------------------
// Builds a rule int array
// And adds to the rule list 
//
// it   : rule coding string
// resn : number of results in rule
// predn: number of predicates in rule
//---------------------------------------------
 void Planner::addRule( int *it, int resn, int predn)
{    
    strcpy( rulest[rulen], "*");

    Rule *r = &rules[rulen++];             
    for(int i=0; *it>=0; i++) {
        r->ops[i][0]=*it++;
        r->ops[i][1]=*it++;
    } 
    r->resn = resn;
    r->predn= predn;
    r->actn = i-resn-predn; 
}


void Planner::init()
{    
    for(int i=0; i<MAXOPDEF; i++) {
        opDefs[i][0]=i; opDefs[i][1]=0;
    }
    rulen=0;
    situn=1;    
    situs[0].fill(0);
    goal.fill(DONT_CARE);
}

void Planner::setup()
{
    //--------------------------State variables
 
    // Has someting
    addState( 'K',"has sword");
    addState( 'G',"has diamon");
    addState( 'P',"has money");
    addState( 'R',"has shield");
 
    // Location
    addState( 'H',"location"); //m merchant, c enemy, a diamond, h inn
    
    // Fittness    
    addState( 'F',"fitt"); // 0 1
    
    // Goal    
    addState( 'B',"enemy is killed");
    
 
    //--------------------------Actions
    
    // Trade
    addOp( 'k',"buy sword");
    addOp( 'p',"buy shield");
    addOp( 'g',"sell diamond");
 
    // Travel
    addOp( 'c',"go to enemy");    
    addOp( 'm',"go to merchant");
    addOp( 's',"find diamond");
    addOp( 'h',"go to inn");
    
    // Act
    addOp( 'f',"fight enemy");    
    addOp( 'a',"sleep");
    addOp( 't',"take diamond");
    
 
    //--------------------------Rules
 
    //Kills enemy
    addRule("B1HcR1K1F1f0",1,4);
 
    //Trade
    addRule("K1HmP1k0",1,2);
    addRule("R1HmP1p0",1,2);
    addRule("P1HmG1g0",1,2);
 
    //Travel
    addRule("Hmm0",1,0);
    addRule("Hcc0",1,0);
    addRule("Has0",1,0);     
    addRule("Hhh0",1,0);
  
    //Sleep
    addRule("F1HhP1a0",1,2);                  

    //Get diamond
    addRule("G1Hat0",1,1);
}


int test()
{
    Planner pl;
    pl.init();
    pl.setup();

    // START AND GOAL
    pl.situs[0].set('F'-'A','0');        
    pl.goal.set('B'-'A','1');
               
    pl.plan();
    return 0;
}

//-----------Lazy couple interface ---------------------------
int plannerNew()
{
    Planner *pl = new Planner;
    pl->init();
    pl->setup();
    return (int) pl;
}

void plannerDel(int id)
{
    Planner *pl = (Planner*)id;
    delete pl;
}

void plannerRule(int id, char *rule, int resn, int predn )
{
    Planner *pl = (Planner*)id;
    pl->addRule( rule, resn, predn);
    logm = 0; //!!
}   

void plannerPlan(int id, char *situ, char *goal, char *plan)
{        
    Planner *pl = (Planner*)id;
    char *r;

    // REGISTER START
    for(r = situ; *r; r+=2)
        pl->situs[0].set(r[0]-'A',r[1]);                    

    // REGISTER GOAL
    for(r = goal; *r; r+=2)
        pl->goal.set(r[0]-'A',r[1]);        

    // PLAN
    r = plan;        
    if (pl->plan() && plan) {
        // STORE STEPS
        for(int i=0; i<pl->stepn; i++) {
            *r++=pl->steps[i][0];
            *r++=pl->steps[i][1];
        }
    }        
    *r=0;        
}









/***********************************************************

         An example of inserting planner to a game:
                  Wing: Released Spirits

***********************************************************/

#ifdef EMBEDDED 

//--------- Example usage in game engine ----------------------
#include <stdlib.h>
#include "ani3d.h"
#include "types.h"
#include "game.h"
#include "actor.h"

//-------------------------------------------------------------
// Helper for merchants
//-------------------------------------------------------------
int findGood( Actor *m, int func )
{
    int i;
    for(i=0; i<ACTOR_TRADE; i++) {
        Objdef *goodDef = (Objdef*)m->goods[i];
        if (goodDef->func is func)
            return i;
    }
    return -1;
}

//-------------------------------------------------------------
//
//-------------------------------------------------------------
int Actor::PlanStep()
{
    planCnt++;
    planTck=0;
    Planner *pl = (Planner *)planner;
    if (pl->steps[planCnt]<0) {
        planner=0;    
        return 0;
    }
    return 1;
}

//-------------------------------------------------------------
// Execute the plan
//-------------------------------------------------------------
int Actor::PlanProc()
{
    if (!planner) return 0;
    Planner *pl = (Planner *)planner;

    static Actor *merchant; if (!merchant) merchant = (Actor*)scene.EntFind("traderCity");
    static Actor *boss;     if (!boss)     boss = scene.player;
    static Object*diamond;  if (!diamond)  diamond = (Object*)scene.EntFind("diamond:2");

    if (pl->steps[planCnt]<0) {
        planner=0;
        // Sign ready !
        return 0;
    }
    
    int act = pl->steps[ planCnt ][0];
    int arg = pl->steps[ planCnt ][1];
    int i;
    switch(act)         
    {
    //--------------------------------
    case 'k': //buy sword
        if (planTck==1) {
            i = findGood( merchant, ofSWORD );
            if (i>=0) 
                Buy( merchant, i, 100);
        }
        if (planTck==50)
            PlanStep();
        break;

    //--------------------------------
    case 'p': //buy shield
        if (planTck==1) {
            i = findGood( merchant, ofSHIELD );
            if (i>=0) 
                Buy( merchant, i, 100);
        }
        if (planTck==50)
            PlanStep();
        break;

    //--------------------------------
    case 'g': //sell diamond
        if (planTck==1) {
            int n = InvFind( (Ent *)diamond );
            if (n>=0)
                Sell( merchant, n, 500);
        }
        if (planTck==50)
            PlanStep();        
        break;

    //--------------------------------
    case 'c': //go to enemy
        if (planTck==1)
            Act( evGOTO, boss );
        if (Dist2(boss->wmtx.t)<900)
            PlanStep();
        break;

    //--------------------------------
    case 'm': //go to merchant
        if (planTck==1)
            Act( evGOTO, merchant );
        if (Dist2(merchant->wmtx.t)<900)
            PlanStep();
        break;

    //--------------------------------
    case 's': //find diamond
        if (planTck==1)
            Act( evGOTO, diamond );
        if (Dist2(diamond->wmtx.t)<25)
            PlanStep();
        break;

    //--------------------------------
    case 'f': //fight enemy
        if (planTck==1)
            Act( evCHASE, boss );
        if (boss->health<=0)
            PlanStep();
        break;

    //--------------------------------
    case 'a': //sleep
        if (planTck==1)
            Act( evSLEEP, boss );
        if (planTck==200)
            PlanStep();
        break;

    //--------------------------------
    case 't': //take diamond
        if (planTck==1)
            SetState( evTAKE, diamond);
        if (planTck==200)
            PlanStep();
        break;
    }

    planCnt++;
    return 1;
}
#endif