Planners

POMDP planners compute (approximately) optimal actions given belief states. POMDPPlanners provides state-of-the-art algorithms, ranging from Monte Carlo Tree Search to sparse sampling approaches.

Planning Algorithm Categories

  • Monte Carlo Tree Search (MCTS)

  • Sparse Sampling

  • Open Loop Planning

Planner Interface

All planners inherit from the base Policy class:

class POMDPPlanners.core.policy.Policy(environment, discount_factor, name, log_path=None, debug=False, use_queue_logger=False)[source]

Bases: ABC

Abstract base class for POMDP policies.

This class defines the interface for POMDP policies that select actions based on belief states. All concrete policy implementations must inherit from this class and implement the action selection and space information methods.

Note

This is an abstract base class and cannot be instantiated directly. Subclasses must implement the action() and get_space_info() methods.

Parameters:
  • environment – The POMDP environment this policy operates in

  • discount_factor – Discount factor for future rewards

  • name – Unique identifier for the policy

  • log_path – Optional directory for logging output

  • debug – Flag to enable debug logging

  • use_queue_logger – Flag to enable queue-based logging

abstractmethod action(belief)[source]

Select action(s) based on the current belief state.

This is the core method that implements the policy’s decision-making logic. It takes a belief state and returns the selected action(s) along with execution information and performance metrics.

Parameters:

belief (Belief) – Current belief state representing uncertainty over states

Returns:

  • List of selected actions (typically a single action, but multiple are supported)

  • PolicyRunData with execution metrics and performance information

Return type:

Tuple[List[Any], PolicyRunData]

Note

Subclasses must implement this method with their specific planning or decision-making algorithm.
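
Because action() returns a list of actions together with a PolicyRunData object, callers unpack the result rather than treating it as a single action. A minimal sketch, assuming planner and belief are set up as in the usage example further below:

actions, run_data = planner.action(belief)
best_action = actions[0]  # typically a single action is returned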

property config_id: str

Generate a deterministic identifier based on policy configuration.
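
Because the identifier is deterministic for a given configuration, it can be used to key caches or output directories. A minimal sketch (results_by_config is a hypothetical dict):

results_by_config = {}
results_by_config[planner.config_id] = run_data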

abstractmethod classmethod get_info_variable_names()[source]

Get names of policy info variables that this policy produces.

This class method returns the names of metrics and performance data that the policy tracks during execution via PolicyInfoVariable objects. It enables users to discover what metrics are available for hyperparameter optimization before running simulations.

Return type:

List[str]

Returns:

List of info variable names that this policy produces during action selection

Note

Subclasses must implement this method to declare what metrics they track. Use an Enum to ensure consistency between the names returned here and the names used when creating PolicyInfoVariable objects in the action() method.
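
A minimal sketch of the recommended Enum pattern; the enum and the tree_depth metric are hypothetical (only planning_time appears in the examples below):

from enum import Enum
from typing import List

class MyPlannerInfo(Enum):
    # Single source of truth for the metric names used in action()
    PLANNING_TIME = "planning_time"
    TREE_DEPTH = "tree_depth"

# Inside your Policy subclass:
@classmethod
def get_info_variable_names(cls) -> List[str]:
    return [member.value for member in MyPlannerInfo]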

abstractmethod classmethod get_space_info()[source]

Get space type requirements for this policy class.

This class method specifies what types of action and observation spaces this policy implementation can handle, enabling compatibility checking with environments.

Return type:

PolicySpaceInfo

Returns:

PolicySpaceInfo specifying required action and observation space types

Note

Subclasses must implement this method to declare their space compatibility. This is used for validation when pairing policies with environments.
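
A minimal sketch of a concrete implementation; the PolicySpaceInfo constructor arguments shown here are assumptions, so consult the API reference for the actual field names:

# Inside your Policy subclass
# (PolicySpaceInfo is documented alongside Policy; its import path may vary):
@classmethod
def get_space_info(cls) -> PolicySpaceInfo:
    # Hypothetical constructor arguments
    return PolicySpaceInfo(
        action_space_types=("discrete",),
        observation_space_types=("discrete",),
    )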

classmethod load(filepath)[source]

Load policy configuration from JSON file.

Reconstructs policy instance from saved constructor parameters. Creates both the environment and policy from the saved configuration.

Parameters:

filepath (Union[str, Path]) – Path to the saved policy configuration file

Return type:

Policy

Returns:

Reconstructed policy instance

Example

>>> import tempfile
>>> from pathlib import Path
>>> from POMDPPlanners.planners import POMCP
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> # Create and save a policy
>>> env = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(environment=env, discount_factor=0.95, depth=10, exploration_constant=1.0, name="test", n_simulations=100)
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     filepath = Path(tmpdir) / "test_policy.json"
...     _ = planner.save(filepath)
...     # Load the policy back
...     loaded_planner = POMCP.load(filepath)
...     print(loaded_planner.depth)
10
property logger: Logger

Get logger instance for this policy.

The logger is implemented as a property to maintain pickle compatibility, as logger objects cannot be pickled directly.

Returns:

Configured logger instance with hierarchical naming

save(filepath=None)[source]

Save policy configuration to JSON file.

Saves only constructor parameters needed to reconstruct the policy, not the full internal state. This enables human-readable policy configurations that can be versioned, inspected, and modified.

Parameters:

filepath (Union[str, Path, None]) – Path at which to save the policy configuration. If None, the default location is used: saved_policies/{env_name}/{policy_class}/{policy_name}_{timestamp}.json

Return type:

Path

Returns:

Path where policy was saved

Raises:
  • ValueError – If policy parameters cannot be serialized

  • IOError – If file cannot be written

Example

>>> from POMDPPlanners.environments import TigerPOMDP
>>> from POMDPPlanners.planners import POMCP
>>> env = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(environment=env, discount_factor=0.95,
...                 depth=10, exploration_constant=1.0,
...                 name="test", n_simulations=100)
>>> # Save with default path
>>> filepath = planner.save()
>>> # Or save to custom path
>>> filepath = planner.save("my_policy.json")

Algorithm Details

POMCP (Partially Observable Monte Carlo Planning)
  • Uses Upper Confidence Bounds (UCB) for action selection

  • Builds belief trees through simulation

  • Handles continuous observation spaces with particle filters

  • Excellent for problems with large observation spaces

PFT-DPW (Particle Filter Trees with Double Progressive Widening)
  • Extends POMCP with progressive widening

  • Gradually expands action and observation nodes

  • Better for continuous action spaces

  • Balances exploration and exploitation

Sparse PFT
  • Sparse sampling within particle filter trees

  • Efficient for large state/action spaces

  • Reduced computational requirements

  • Good performance with limited simulations

Sparse Sampling
  • Classical forward-search algorithm

  • Builds sparse lookahead trees

  • Provable performance guarantees

  • Simple and effective baseline

Choosing the Right Planner

For Discrete Problems (Tiger, Sanity):
  • POMCP: Excellent default choice

  • Sparse Sampling: Simple baseline

  • PFT-DPW: When you need progressive widening

For Continuous Problems (CartPole, Light-Dark):
  • PFT-DPW: Handles continuous actions well

  • POMCP: Good for continuous observations

  • Sparse PFT: When computational budget is limited

For Large-Scale Problems:
  • Sparse PFT: Efficient scaling

  • POMCP with limited simulations: Balance speed/quality

Basic Usage Example

from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
from POMDPPlanners.core.belief import get_initial_belief

# Create environment and planner
env = TigerPOMDP(discount_factor=0.95)
planner = POMCP(
    environment=env,
    discount_factor=0.95,   # required by the Policy base class
    n_simulations=1000,
    exploration_constant=50.0,
    depth=30,
    name="tiger_pomcp",     # required unique identifier
)

# Get initial belief and plan
belief = get_initial_belief(env, n_particles=1000)
actions, run_data = planner.action(belief)  # returns (list of actions, run data)

print(f"Recommended action: {actions[0]}")
print(f"Planning time: {run_data.info_variables['planning_time']:.3f}s")

Configuration Parameters

Common Parameters:
  • n_simulations: Number of MCTS simulations

  • depth: Maximum planning horizon

  • discount_factor: Future reward discount

  • exploration_constant: UCB exploration parameter

POMCP-Specific:
  • threshold: Particle reinvigoration threshold

  • particle_filter_threshold: Belief update threshold

PFT-DPW-Specific:
  • k_action, alpha_action: Action progressive widening

  • k_observation, alpha_observation: Observation progressive widening
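
As an illustration of how these parameters fit together, here is a sketch of constructing a PFT-DPW planner. The class name, import path, and parameter values are assumptions (only the parameter names are documented above), so check the API reference for the actual signature:

from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
from POMDPPlanners.planners import PFTDPW  # assumed import path and class name

env = TigerPOMDP(discount_factor=0.95)
planner = PFTDPW(
    environment=env,
    discount_factor=0.95,
    name="pft_dpw_example",
    n_simulations=1000,
    depth=30,
    exploration_constant=50.0,
    k_action=10.0, alpha_action=0.5,            # action progressive widening
    k_observation=5.0, alpha_observation=0.25,  # observation progressive widening
)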

Creating Custom Planners

To implement a custom planner, inherit from the Policy base class and implement its abstract methods action(), get_space_info(), and get_info_variable_names():

from POMDPPlanners.core.policy import Policy

class MyCustomPlanner(Policy):
    def __init__(self, environment, **kwargs):
        super().__init__(environment, **kwargs)
        # Initialize your planner's data structures here

    def action(self, belief):
        # Implement your planning algorithm. Per the Policy
        # interface, return a list of actions together with a
        # PolicyRunData carrying execution metrics.
        selected = self.select_action(belief)
        run_data = ...  # build a PolicyRunData with your metrics
        return [selected], run_data

    @classmethod
    def get_space_info(cls):
        # Declare supported action/observation space types
        ...

    @classmethod
    def get_info_variable_names(cls):
        # Declare the metric names tracked in action()
        ...

    def select_action(self, belief):
        # Your action selection logic
        ...
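
Once the abstract methods are implemented, a custom planner is constructed and used like any built-in planner (discount_factor and name come from the Policy base class):

from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
from POMDPPlanners.core.belief import get_initial_belief

env = TigerPOMDP(discount_factor=0.95)
planner = MyCustomPlanner(environment=env, discount_factor=0.95, name="my_custom")
belief = get_initial_belief(env, n_particles=1000)
actions, run_data = planner.action(belief)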

See Also

  • ../examples/planners - Planner usage examples

  • Belief States - Belief state representations

  • ../api/planners - Complete API reference