Tuesday, February 24, 2009

Implementation Diary - Feb 25, 2009


I have been struggling to make Crackpot work with the Logistics Domain. One recent observation: the plot here shows the number of costs over one run of the planner in the Logistics Domain. As we can see, there is a sudden jump in the number of costs from 10 to 200 between iterations 175 and 300. The number of costs, however, always stays within 300 over 2000 iterations.

What causes this sudden jump in the number of costs? After investigation I found that several of the costs fixed at that time were goal costs. For goal costs we have only one heuristic that adds an action to the plan: addAction. There was one goal, "Package 6 should be in depot 1b", and there was one action that pushed Package 6 to depot 1a near time point 278. It seems actions were then added one after another, but all of them before 278 rather than after it, which did nothing to improve the cost of the goal.

Monday, February 23, 2009

Thoughts on an Adaptive Local Search for Planning


Non-stationary Decision Making


The basic question discussed in this article is
Out of a set of possible choices (of heuristics, for example), which choice should be made?
Markov Decision Processes (MDPs) are an often-used formalism for this problem. The MDP model requires a set of states {s} and a set of actions {a} that take the system from one state to another with a certain probability P(s_(t+1) | s_t, a). An immediate reward R_a(s_(t+1), s_t) is received on this transition. The solution of an MDP is a policy that specifies which action a to take in each state s.
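For reference (standard textbook material, not from the original post; the discount factor gamma is an addition of mine), the policy pi is chosen to maximize the expected discounted reward, and the optimal value function satisfies the Bellman equation:

V^{\pi}(s) = \mathbb{E}\Big[ \sum_{t \ge 0} \gamma^{t} \, R_{a_t}(s_{t+1}, s_t) \;\Big|\; s_0 = s,\ a_t = \pi(s_t) \Big]

V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \big[ R_a(s', s) + \gamma V^{*}(s') \big]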

Our problem of making choices is complicated by the following factors:

1. Uncertainty in the real world introduces non-determinism into the MDP. This is commonly studied as a Partially Observable MDP (POMDP): the underlying states are not observable to the agent, which must infer them from its observations. Only approximate solutions of a POMDP can usually be determined.

2. There are reasonably efficient algorithms for solving POMDPs approximately; [2], for example, presents a point-based anytime algorithm. However, for the very quick responses needed when selecting small actions within local search, they are still too slow. Solving POMDPs would be suitable for long-term learning; for short-term, non-stationary learning, low-cost approaches such as the one in [1] are suitable.

3. The MDP model does not incorporate changes in the state structure over time. For example, in local search, many more local minima are usually encountered as the search progresses than near the start, so the policy must evolve as time goes on. Squeaky wheel optimization is one example of such an evolving scheme in local search.

However, the simple approach to non-stationary learning described in [1] can be augmented with a model to make it more effective. The idea is that more time can probably be allocated to making a larger decision.

For this purpose, each heuristic carries the following information (a rough sketch follows the list):
- goodness of the heuristic (goodness = the difficulty level that can be improved by applying the heuristic)
- confidence in the goodness of the heuristic
- time needed to execute the heuristic
- resources needed to execute the heuristic
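To make this concrete, here is a minimal sketch of such a record and of a selection step that trades expected improvement against the time and resources a heuristic needs. None of this is actual Crackpot code; the field names and the scoring rule are my own illustration.

#include <string>
#include <vector>

// Hypothetical record; the fields mirror the list above.
struct HeuristicProfile {
    std::string name;
    double goodness;          // expected difficulty reduction per application
    double confidence;        // confidence in the goodness estimate, in [0, 1]
    double expectedTime;      // time one application is expected to take
    double expectedResources; // abstract resource units needed
};

// One possible scoring rule: discount goodness by confidence and divide by cost,
// so that expensive heuristics are only preferred when the expected gain is large.
double score(const HeuristicProfile& h, double timeWeight, double resourceWeight) {
    double cost = 1.0 + timeWeight * h.expectedTime + resourceWeight * h.expectedResources;
    return (h.confidence * h.goodness) / cost;
}

const HeuristicProfile* selectHeuristic(const std::vector<HeuristicProfile>& heuristics,
                                        double timeWeight, double resourceWeight) {
    const HeuristicProfile* best = nullptr;
    double bestScore = -1.0;
    for (const auto& h : heuristics) {
        double s = score(h, timeWeight, resourceWeight);
        if (s > bestScore) {
            bestScore = s;
            best = &h;
        }
    }
    return best;  // nullptr if the list is empty
}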

Incorporating Short-Term Goal Aspects

We will have two systems: one that handles long-term aspects (like the difficulty of a goal or the priority of a goal), and another that handles short-term aspects (like remaining time and available resources).

Every choice considers both short-term and long-term aspects of the choice.

Making a Choice

Decisions taken when there is plenty of time available are usually different from those taken when little time remains before execution must start. For example, exploration of other/untried options takes a back seat when little time is available.

One question is: if confidence in goodness is low (which means there is a huge standard deviation of the goodness estimate), should the choice maker assume an optimistic stance or a pessimistic stance? This may be a research question.
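One way to make the two stances concrete (purely my own illustration, not an answer to the question above): treat goodness as a mean with a standard deviation and let a stance parameter slide between pessimistic and optimistic scoring.

// Hypothetical scoring of a heuristic whose goodness estimate has a mean and a
// standard deviation. stance = +1 is fully optimistic (mean + k * stddev),
// stance = -1 is fully pessimistic (mean - k * stddev). Low remaining time could
// push stance towards -1, de-emphasizing uncertain, exploratory options.
double stanceScore(double meanGoodness, double stddevGoodness,
                   double stance /* in [-1, +1] */, double k = 1.0) {
    return meanGoodness + stance * k * stddevGoodness;
}

Under the optimistic stance, high uncertainty makes an option more attractive (exploration); under the pessimistic stance it makes it less attractive, which matches the time-pressure behaviour described above.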

General Observation

In general, we note that more sophisticated and resource-intensive approaches are more applicable when a large decision is being made, or when the granularity of the decision is coarse, that is, when every choice will take a long time and substantial resources to implement. Therefore, effective profiling of time and resource usage is important.

[1] Alexander Nareyek. Choosing Search Heuristics by Non-Stationary Reinforcement Learning. 2003.
[2] Joelle Pineau, Geoff Gordon, and Sebastian Thrun. Point-Based Value Iteration: An Anytime Algorithm for POMDPs. IJCAI 2003.

Saturday, February 14, 2009

Implementation Diary - 14 Feb 2009 - 2

1. There was a bug introduced recently in the PlannerImpl code because of which heuristics were not being selected based on their currentPreference. This led to non-convergence of plans in the Apple Domain. Fixed.

2. I also wrote a small Python script to plot the evolution of costs over iterations. For the Apple Domain there was no particularly interesting pattern in the evolution of costs. The red curve is the total cost, and the blue one is the number of costs:




In the Logistics Domain the graphs were interesting, but in the wrong sense. I hope the graphs will soon be the upside-down of what they are now.

Friday, February 13, 2009

Consistency of Input: Domain Definition and Initial and Goal States

Consistency of Domain Definition

Rules of thumb:
1. Every action type should use an actuator

Consistency of Initial State
1. Every attribute and actuator should have an initial state specified

Implementation Diary - 14 Feb 2009

One problem that often occurs when planning in Crackpot is two identical actions (needing the same preconditions) occurring concurrently. For example, two LoadPackageIntoTruckAtDepot(package_3, into: Truck1) actions occurring at times 6 and 12.

At 6: of type: LoadPackageIntoTruckAtDepot
: Duration: 10
ConditionInstances: [
@[6,7): Package_3 of type: package.packageLocation == Truck_1 of type: truck.truckInDepot
]
ContributionInstances: [
@[16,2147483647): Package_3 of type: package.packageLocation <- Truck_1 of type: truck (type OBJECT_REFERENCE)
]
ActionTaskInstances: [
]

At 12: of type: LoadPackageIntoTruckAtDepot
: Duration: 10
ConditionInstances: [
@[12,13): Package_3 of type: package.packageLocation == Truck_1 of type: truck.truckInDepot
]
ContributionInstances: [
@[22,2147483647): Package_3 of type: package.packageLocation <- Truck_1 of type: truck (type OBJECT_REFERENCE)
]
ActionTaskInstances: [
]

The way to eliminate this error is to introduce an Actuator "TruckBusy" within each truck that gets used while the truck is being loaded. The same Actuator is used while the truck is driving and while it is being unloaded. This is one thing I did today.

Note that here we introduced the Actuator as an afterthought, after observing a plan. This happened because the Actuators an action needs are somewhat non-obvious.

It usually does not make sense for the same action (instantiated with the same parameters) to be carried out concurrently with itself. For example, "opening the same door twice simultaneously" is nonsense. All such actions should contain Actuators.
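What an Actuator such as TruckBusy enforces is, in effect, mutual exclusion over time: two usages of the same actuator must not overlap. A minimal illustration of that check (my own simplification, not Crackpot's actual Actuator code) over half-open usage intervals:

#include <vector>

// Hypothetical half-open usage interval [start, end) of an actuator such as "TruckBusy".
struct ActuatorUsage {
    int start;
    int end;
};

// Returns true if the proposed usage overlaps an existing one, i.e. the actuator is
// already busy. In the plan above, the load at time 6 with duration 10 occupies
// [6, 16) and the load at time 12 occupies [12, 22), so the second one conflicts.
bool conflicts(const std::vector<ActuatorUsage>& existing, const ActuatorUsage& proposed) {
    for (const auto& u : existing) {
        if (proposed.start < u.end && u.start < proposed.end) {
            return true;
        }
    }
    return false;
}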

Tuesday, February 10, 2009

Implementation Diary - 10 Feb 2009 - Inheritance in Crackpot

One of the requirements in the Logistics Domain is inheritance of object types. For example, airports are also depots (trucks travel from depot to depot). Therefore airport is a derived class of depot.

Updated on 15 April 2009:

Inheritance (Is-a) and Containment (Has-a) relations between ObjectTypes are implemented in Crackpot.

Inheritance

An example of inheritance in the Logistics Domain: trucks travel from depot to depot, and airports are also depots since trucks travel to airports too. Therefore airport is a derived class of depot.

Changes made for implementation of inheritance:

1. Added a class called ObjectTypeRelationsContainer for containing the "directly" derived types. This class contains a two-column table in which each row holds a base class and a derived class. It has a wrapper interface to find all directly derived object types, getDirectlyDerivedObjectTypeRange, and a function to find all derived types of a type, getAllDerivedObjectTypes.

2. Added to the interface of DomainPackage: a function to find all "directly" derived types of a type, getDirectlyDerivedObjectTypeRange, and a function for the inherited attributes of a type, getInheritedAttributeTypesOfObjectType (another function, getAttributeTypesOfObjectType, returns a type's own attributes but not inherited ones). After the domain definition is read, DomainPackage::postInitialize is called; inside this function, all attributes of base classes are inserted as attributes of the derived classes too. In the DomainPackageUtils class: added the function getGroundObjectInstancesOfAllDerivedObjectTypes, which returns all ground object instances of the derived object types of a base type.

During planning, when an object parameter is instantiated, the planner uses getGroundObjectInstancesOfAllDerivedObjectTypes as the starting point to find an appropriate object instance.

Inheritance from multiple ObjectTypes is allowed (but not tested so far). No error checks (such as for identical attribute names in multiple base classes) are implemented yet.
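A simplified sketch of the idea behind ObjectTypeRelationsContainer and getAllDerivedObjectTypes (the real class works on ObjectType objects and richer interfaces; the string-based types and exact signatures below are my own simplification): a two-column base/derived table, with all derived types obtained as the transitive closure of the direct derivations.

#include <set>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for ObjectTypeRelationsContainer: each row of the table is
// a (base, derived) pair of object type names.
class ObjectTypeRelations {
public:
    void addDerivation(const std::string& base, const std::string& derived) {
        rows_.emplace_back(base, derived);
    }

    // Directly derived types of `base` (one step in the table).
    std::vector<std::string> getDirectlyDerived(const std::string& base) const {
        std::vector<std::string> result;
        for (const auto& row : rows_)
            if (row.first == base) result.push_back(row.second);
        return result;
    }

    // All derived types of `base`: transitive closure over the direct derivations.
    std::set<std::string> getAllDerived(const std::string& base) const {
        std::set<std::string> result;
        std::vector<std::string> frontier{base};
        while (!frontier.empty()) {
            std::string current = frontier.back();
            frontier.pop_back();
            for (const auto& d : getDirectlyDerived(current))
                if (result.insert(d).second) frontier.push_back(d);
        }
        return result;
    }

private:
    std::vector<std::pair<std::string, std::string>> rows_;
};

// Example from the post: airport is derived from depot, so instantiating a "depot"
// parameter may also pick airport instances:
//   relations.addDerivation("depot", "airport");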

Containment

Containment is implemented through object references. The value of an attribute which refers to another object in the domain is called an ObjectReferenceAttributeValueInstance.


Idea: Unlike our other interfaces, getAllDerivedObjectTypes returns a set, and not a range of iterators. A range class should be defined that also holds a pointer to its container. Then the range can be returned, and there is no need to return the set explicitly.
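A sketch of such a range class (my own sketch, not existing Crackpot code; I use shared ownership rather than a raw pointer so that a container built inside the called function outlives the returned range):

#include <memory>
#include <set>
#include <string>

// A range that shares ownership of the container its iterators point into, so the
// range stays valid even when the container was built locally by the called function.
template <typename Container>
class OwningRange {
public:
    explicit OwningRange(std::shared_ptr<const Container> container)
        : container_(container) {}
    typename Container::const_iterator begin() const { return container_->begin(); }
    typename Container::const_iterator end() const { return container_->end(); }
private:
    std::shared_ptr<const Container> container_;
};

// getAllDerivedObjectTypes could then return OwningRange<std::set<std::string>>
// instead of the set itself, e.g.:
//   auto result = std::make_shared<std::set<std::string>>();
//   /* fill *result */
//   return OwningRange<std::set<std::string>>(result);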

Observation: the DomainPackage class is growing bigger every day. This is not scalable; it needs to be broken down. It should delegate tasks to smaller classes: one for objects and attributes, another for actions, and so on.

Tuesday, February 3, 2009

Distance between Attribute Values

We use a heuristic distance computation between different values of an attribute. One often-used heuristic (also used by us) is to define the distance between two values A and B as the number of actions it takes to go from A to B, ignoring any other preconditions and contributions of those actions.
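This distance can be computed as a shortest path in a graph whose nodes are the attribute's values and whose edges are the actions that change one value into another, with all other preconditions and contributions ignored. A minimal sketch, with illustrative types of my own:

#include <climits>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Edges: for each value, the values reachable by applying a single action,
// ignoring all other preconditions and contributions of that action.
using ValueGraph = std::map<std::string, std::vector<std::string>>;

// Heuristic distance between attribute values a and b: the minimum number of
// actions needed to go from a to b, or INT_MAX if no action sequence exists.
int heuristicDistance(const ValueGraph& graph, const std::string& a, const std::string& b) {
    if (a == b) return 0;
    std::map<std::string, int> dist;
    std::queue<std::string> frontier;
    dist[a] = 0;
    frontier.push(a);
    while (!frontier.empty()) {
        std::string v = frontier.front();
        frontier.pop();
        auto it = graph.find(v);
        if (it == graph.end()) continue;
        for (const auto& next : it->second) {
            if (dist.count(next)) continue;
            dist[next] = dist[v] + 1;
            if (next == b) return dist[next];
            frontier.push(next);
        }
    }
    return INT_MAX;  // unreachable: "infinite" distance
}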

Distance between Object References

In some cases the distance between object references is important. This is considered in detail in the post http://crackpotplanner.blogspot.com/2009/03/implementation-diary-march-5-2009.html

Modelling Uncertainty in Heuristic Distance Computation of AttributeInstances

We can extend this model of heuristic distance to include uncertainty in the distance. If there is a sequence of actions that really takes us from A to B and has no other preconditions or contributions, then there is no uncertainty for this path from A to B. If the sequence of actions from A to B has many other preconditions, then the uncertainty is large.

The original context that led to this (small) idea was justifying a statement in Alex's paper "Local Search Heuristics for Generative Planning": an action is inserted with probability proportional to 1/d^3, where d is the distance between the desired value and the resulting value after inserting the action. Why 1/d^3?
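For reference, such a rule amounts to weighted random selection among candidate actions with weight 1/d^3. A minimal sketch (my own code; in particular, the handling of d = 0 and the assumption that unreachable candidates are filtered out beforehand are my own choices):

#include <random>
#include <vector>

// Pick an index among candidate actions with probability proportional to 1/d^3,
// where distances[i] is the heuristic distance of candidate i. Assumes distances
// is non-empty; actions at distance 0 are given a large finite weight here, and
// unreachable candidates (infinite distance) should be filtered out beforehand.
int pickActionIndex(const std::vector<double>& distances, std::mt19937& rng) {
    std::vector<double> weights;
    weights.reserve(distances.size());
    for (double d : distances) {
        weights.push_back(d <= 0.0 ? 1e9 : 1.0 / (d * d * d));
    }
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    return pick(rng);
}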

If there is no uncertainty, the better action should be greedily selected (with probability 1). Suppose there are two actions A1 and A2 with distances d1 and d2 and distance uncertainties u1 and u2, with u1 ≠ u2. This is a tricky case.

If there is no sequence of actions that can convert one value into the other, then the distance is infinite.