Planners

POMDP planners compute (approximately) optimal actions given belief states. POMDPPlanners provides state-of-the-art algorithms, ranging from Monte Carlo Tree Search to sparse sampling approaches.

Planning Algorithm Categories

  • Monte Carlo Tree Search (MCTS)

  • Sparse Sampling

  • Open Loop Planning

Planner Interface

All planners inherit from the base Policy class:

class POMDPPlanners.core.policy.Policy(environment, discount_factor, name, log_path=None, debug=False, use_queue_logger=False)[source]

Bases: ABC

Abstract base class for POMDP policies.

This class defines the interface for POMDP policies that select actions based on belief states. All concrete policy implementations must inherit from this class and implement the action selection and space information methods.

Note

This is an abstract base class and cannot be instantiated directly. Subclasses must implement the action() and get_space_info() methods.

Parameters:
  • environment – The POMDP environment this policy operates in

  • discount_factor – Discount factor for future rewards

  • name – Unique identifier for the policy

  • log_path – Optional directory for logging output

  • debug – Flag to enable debug logging

  • use_queue_logger – Flag to enable queue-based logging

abstractmethod action(belief)[source]

Select action(s) based on the current belief state.

This is the core method that implements the policy’s decision-making logic. It takes a belief state and returns the selected action(s) along with execution information and performance metrics.

Parameters:

belief (Belief) – Current belief state representing uncertainty over states

Returns:

  • List of selected actions (typically a single action, but multiple are supported)

  • PolicyRunData with execution metrics and performance information

Return type:

Tuple[List[Any], PolicyRunData]

Note

Subclasses must implement this method with their specific planning or decision-making algorithm.
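
Because action() returns a list of actions together with a PolicyRunData object, callers unpack the result rather than treating it as a single action. A minimal sketch, assuming planner and belief are set up as in the usage example further below:

actions, run_data = planner.action(belief)
best_action = actions[0]  # typically a single action is returned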

property config_id: str

Generate a deterministic identifier based on policy configuration.
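
Because the identifier is deterministic for a given configuration, it can be used to key caches or output directories. A minimal sketch (results_by_config is a hypothetical dict):

results_by_config = {}
results_by_config[planner.config_id] = run_data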

abstractmethod classmethod get_info_variable_names()[source]

Get names of policy info variables that this policy produces.

This class method returns the names of metrics and performance data that the policy tracks during execution via PolicyInfoVariable objects. It enables users to discover what metrics are available for hyperparameter optimization before running simulations.

Return type:

List[str]

Returns:

List of info variable names that this policy produces during action selection

Note

Subclasses must implement this method to declare what metrics they track. Use an Enum to ensure consistency between the names returned here and the names used when creating PolicyInfoVariable objects in the action() method.
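
A minimal sketch of the recommended Enum pattern; the enum and the tree_depth metric are hypothetical (only planning_time appears in the examples below):

from enum import Enum
from typing import List

class MyPlannerInfo(Enum):
    # Single source of truth for the metric names used in action()
    PLANNING_TIME = "planning_time"
    TREE_DEPTH = "tree_depth"

# Inside your Policy subclass:
@classmethod
def get_info_variable_names(cls) -> List[str]:
    return [member.value for member in MyPlannerInfo]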

abstractmethod classmethod get_space_info()[source]

Get space type requirements for this policy class.

This class method specifies what types of action and observation spaces this policy implementation can handle, enabling compatibility checking with environments.

Return type:

PolicySpaceInfo

Returns:

PolicySpaceInfo specifying required action and observation space types

Note

Subclasses must implement this method to declare their space compatibility. This is used for validation when pairing policies with environments.
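
A minimal sketch of a concrete implementation; the PolicySpaceInfo constructor arguments shown here are assumptions, so consult the API reference for the actual field names:

# Inside your Policy subclass
# (PolicySpaceInfo is documented alongside Policy; its import path may vary):
@classmethod
def get_space_info(cls) -> PolicySpaceInfo:
    # Hypothetical constructor arguments
    return PolicySpaceInfo(
        action_space_types=("discrete",),
        observation_space_types=("discrete",),
    )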

classmethod load(filepath)[source]

Load policy configuration from JSON file.

Reconstructs policy instance from saved constructor parameters. Creates both the environment and policy from the saved configuration.

Parameters:

filepath (Union[str, Path]) – Path to the saved policy configuration file

Return type:

Policy

Returns:

Reconstructed policy instance

Example

>>> import tempfile
>>> from pathlib import Path
>>> from POMDPPlanners.planners import POMCP
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> # Create and save a policy
>>> env = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(environment=env, discount_factor=0.95, depth=10, exploration_constant=1.0, name="test", n_simulations=100)
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     filepath = Path(tmpdir) / "test_policy.json"
...     _ = planner.save(filepath)
...     # Load the policy back
...     loaded_planner = POMCP.load(filepath)
...     print(loaded_planner.depth)
10
property logger: Logger

Get logger instance for this policy.

The logger is implemented as a property to maintain pickle compatibility, as logger objects cannot be pickled directly.

Returns:

Configured logger instance with hierarchical naming

save(filepath=None)[source]

Save policy configuration to JSON file.

Saves only constructor parameters needed to reconstruct the policy, not the full internal state. This enables human-readable policy configurations that can be versioned, inspected, and modified.

Parameters:

filepath (Union[str, Path, None]) – Path at which to save the policy configuration. If None, the default location is used: saved_policies/{env_name}/{policy_class}/{policy_name}_{timestamp}.json

Return type:

Path

Returns:

Path where policy was saved

Raises:
  • ValueError – If policy parameters cannot be serialized

  • IOError – If file cannot be written

Example

>>> from POMDPPlanners.environments import TigerPOMDP
>>> from POMDPPlanners.planners import POMCP
>>> env = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(environment=env, discount_factor=0.95,
...                 depth=10, exploration_constant=1.0,
...                 name="test", n_simulations=100)
>>> # Save with default path
>>> filepath = planner.save()
>>> # Or save to custom path
>>> filepath = planner.save("my_policy.json")

Algorithm Details

POMCP (Partially Observable Monte Carlo Planning)
  • Uses Upper Confidence Bounds (UCB) for action selection

  • Builds belief trees through simulation

  • Handles continuous observation spaces with particle filters

  • Excellent for problems with large observation spaces

PFT-DPW (Particle Filter Trees with Double Progressive Widening)
  • Extends POMCP with progressive widening

  • Gradually expands action and observation nodes

  • Better for continuous action spaces

  • Balances exploration and exploitation

Sparse PFT
  • Sparse sampling within particle filter trees

  • Efficient for large state/action spaces

  • Reduced computational requirements

  • Good performance with limited simulations

Sparse Sampling
  • Classical forward-search algorithm

  • Builds sparse lookahead trees

  • Provable performance guarantees

  • Simple and effective baseline

Choosing the Right Planner

For Discrete Problems (Tiger, Sanity):
  • POMCP: Excellent default choice

  • Sparse Sampling: Simple baseline

  • PFT-DPW: When you need progressive widening

For Continuous Problems (CartPole, Light-Dark):
  • PFT-DPW: Handles continuous actions well

  • POMCP: Good for continuous observations

  • Sparse PFT: When computational budget is limited

For Large-Scale Problems:
  • Sparse PFT: Efficient scaling

  • POMCP with limited simulations: Balance speed/quality

Basic Usage Example

from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
from POMDPPlanners.core.belief import get_initial_belief

# Create environment and planner
env = TigerPOMDP(discount_factor=0.95)
planner = POMCP(
    environment=env,
    discount_factor=0.95,   # required by the Policy base class
    n_simulations=1000,
    exploration_constant=50.0,
    depth=30,
    name="tiger_pomcp",     # required unique identifier
)

# Get initial belief and plan
belief = get_initial_belief(env, n_particles=1000)
actions, run_data = planner.action(belief)  # returns (list of actions, run data)

print(f"Recommended action: {actions[0]}")
print(f"Planning time: {run_data.info_variables['planning_time']:.3f}s")

Configuration Parameters

Common Parameters:
  • n_simulations: Number of MCTS simulations

  • depth: Maximum planning horizon

  • discount_factor: Future reward discount

  • exploration_constant: UCB exploration parameter

POMCP-Specific:
  • threshold: Particle reinvigoration threshold

  • particle_filter_threshold: Belief update threshold

PFT-DPW-Specific:
  • k_action, alpha_action: Action progressive widening

  • k_observation, alpha_observation: Observation progressive widening
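
As an illustration of how these parameters fit together, here is a sketch of constructing a PFT-DPW planner. The class name, import path, and parameter values are assumptions (only the parameter names are documented above), so check the API reference for the actual signature:

from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
from POMDPPlanners.planners import PFTDPW  # assumed import path and class name

env = TigerPOMDP(discount_factor=0.95)
planner = PFTDPW(
    environment=env,
    discount_factor=0.95,
    name="pft_dpw_example",
    n_simulations=1000,
    depth=30,
    exploration_constant=50.0,
    k_action=10.0, alpha_action=0.5,            # action progressive widening
    k_observation=5.0, alpha_observation=0.25,  # observation progressive widening
)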

Creating Custom Planners

To implement a custom planner, inherit from the Policy base class and implement its abstract methods action(), get_space_info(), and get_info_variable_names():

from POMDPPlanners.core.policy import Policy

class MyCustomPlanner(Policy):
    def __init__(self, environment, **kwargs):
        super().__init__(environment, **kwargs)
        # Initialize your planner's data structures here

    def action(self, belief):
        # Implement your planning algorithm. Per the Policy
        # interface, return a list of actions together with a
        # PolicyRunData carrying execution metrics.
        selected = self.select_action(belief)
        run_data = ...  # build a PolicyRunData with your metrics
        return [selected], run_data

    @classmethod
    def get_space_info(cls):
        # Declare supported action/observation space types
        ...

    @classmethod
    def get_info_variable_names(cls):
        # Declare the metric names tracked in action()
        ...

    def select_action(self, belief):
        # Your action selection logic
        ...
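
Once the abstract methods are implemented, a custom planner is constructed and used like any built-in planner (discount_factor and name come from the Policy base class):

from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
from POMDPPlanners.core.belief import get_initial_belief

env = TigerPOMDP(discount_factor=0.95)
planner = MyCustomPlanner(environment=env, discount_factor=0.95, name="my_custom")
belief = get_initial_belief(env, n_particles=1000)
actions, run_data = planner.action(belief)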

See Also

  • ../examples/planners - Planner usage examples

  • Belief States - Belief state representations

  • ../api/planners - Complete API reference