Planners
POMDP planners compute (approximately) optimal actions given belief states. POMDPPlanners provides state-of-the-art algorithms, ranging from Monte Carlo Tree Search to sparse sampling approaches.
Planning Algorithm Categories
Monte Carlo Tree Search (MCTS)
Sparse Sampling
Open Loop Planning
Planner Interface
All planners inherit from the base Policy class:
- class POMDPPlanners.core.policy.Policy(environment, discount_factor, name, log_path=None, debug=False, use_queue_logger=False)[source]
Bases: ABC
Abstract base class for POMDP policies.
This class defines the interface for POMDP policies that select actions based on belief states. All concrete policy implementations must inherit from this class and implement the action selection and space information methods.
Note
This is an abstract base class and cannot be instantiated directly. Subclasses must implement the action(), get_info_variable_names(), and get_space_info() methods.
- Parameters:
- environment
The POMDP environment this policy operates in
- discount_factor
Discount factor for future rewards
- name
Unique identifier for the policy
- log_path
Optional directory for logging output
- debug
Flag to enable debug logging
- use_queue_logger
Flag to enable queue-based logging
- abstractmethod action(belief)[source]
Select action(s) based on the current belief state.
This is the core method that implements the policy’s decision-making logic. It takes a belief state and returns the selected action(s) along with execution information and performance metrics.
- Parameters:
belief (Belief) – Current belief state representing uncertainty over states
- Returns:
List of selected actions (typically a single action, but multiple are supported)
PolicyRunData with execution metrics and performance information
Note
Subclasses must implement this method with their specific planning or decision-making algorithm.
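For illustration, a minimal calling sketch (here planner stands for any concrete Policy subclass instance and belief for a Belief object):
actions, run_data = planner.action(belief)
print(actions[0])               # the selected action
print(run_data.info_variables)  # execution metrics gathered during planning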
- abstractmethod classmethod get_info_variable_names()[source]
Get names of policy info variables that this policy produces.
This class method returns the names of metrics and performance data that the policy tracks during execution via PolicyInfoVariable objects. It enables users to discover what metrics are available for hyperparameter optimization before running simulations.
- Returns:
List of info variable names that this policy produces during action selection
Note
Subclasses must implement this method to declare what metrics they track. Use an Enum to ensure consistency between the names returned here and the names used when creating PolicyInfoVariable objects in the action() method.
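A minimal sketch of the Enum pattern this note describes; the class and metric names here are illustrative, not part of the library:
from enum import Enum

class MyPlannerInfo(str, Enum):
    # Single source of truth for metric names, reused when creating
    # PolicyInfoVariable objects inside action()
    PLANNING_TIME = "planning_time"
    N_SIMULATIONS = "n_simulations"

class MyPlanner(Policy):
    @classmethod
    def get_info_variable_names(cls):
        return [member.value for member in MyPlannerInfo]
    # ... other abstract methods omitted for brevity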
- abstractmethod classmethod get_space_info()[source]
Get space type requirements for this policy class.
This class method specifies what types of action and observation spaces this policy implementation can handle, enabling compatibility checking with environments.
- Returns:
PolicySpaceInfo specifying required action and observation space types
Note
Subclasses must implement this method to declare their space compatibility. This is used for validation when pairing policies with environments.
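A hedged sketch of the declaration pattern; the PolicySpaceInfo fields below are placeholders, since the actual constructor signature is not shown here:
@classmethod
def get_space_info(cls):
    # Placeholder fields -- consult PolicySpaceInfo for the real signature
    return PolicySpaceInfo(
        action_space_types=...,       # e.g. discrete actions
        observation_space_types=...,  # e.g. discrete observations
    )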
- classmethod load(filepath)[source]
Load policy configuration from JSON file.
Reconstructs policy instance from saved constructor parameters. Creates both the environment and policy from the saved configuration.
- Parameters:
filepath (Union[str, Path]) – Path to the saved policy configuration file
- Returns:
Reconstructed policy instance
- Raises:
FileNotFoundError – If filepath does not exist
ValueError – If JSON format is invalid or unsupported
ImportError – If policy/environment classes cannot be imported
Example
>>> import tempfile
>>> from pathlib import Path
>>> from POMDPPlanners.planners import POMCP
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> # Create and save a policy
>>> env = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(environment=env, discount_factor=0.95, depth=10,
...                 exploration_constant=1.0, name="test", n_simulations=100)
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     filepath = Path(tmpdir) / "test_policy.json"
...     _ = planner.save(filepath)
...     # Load the policy back
...     loaded_planner = POMCP.load(filepath)
...     print(loaded_planner.depth)
10
- property logger: Logger
Get logger instance for this policy.
The logger is implemented as a property to maintain pickle compatibility, as logger objects cannot be pickled directly.
- Returns:
Configured logger instance with hierarchical naming
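Because the property returns a standard logging.Logger, it can be used directly, e.g.:
planner.logger.info("Starting planning step")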
- save(filepath=None)[source]
Save policy configuration to JSON file.
Saves only constructor parameters needed to reconstruct the policy, not the full internal state. This enables human-readable policy configurations that can be versioned, inspected, and modified.
- Parameters:
filepath (Union[str, Path, None]) – Path where to save the policy configuration. If None, uses the default location: saved_policies/{env_name}/{policy_class}/{policy_name}_{timestamp}.json
- Returns:
Path where policy was saved
- Raises:
ValueError – If policy parameters cannot be serialized
IOError – If file cannot be written
Example
>>> from POMDPPlanners.environments import TigerPOMDP
>>> from POMDPPlanners.planners import POMCP
>>> env = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(environment=env, discount_factor=0.95,
...                 depth=10, exploration_constant=1.0,
...                 name="test", n_simulations=100)
>>> # Save with default path
>>> filepath = planner.save()
>>> # Or save to custom path
>>> filepath = planner.save("my_policy.json")
Algorithm Details
- POMCP (Partially Observable Monte Carlo Planning)
Uses Upper Confidence Bounds (UCB) for action selection (a formula sketch follows this list)
Builds belief trees through simulation
Handles continuous observation spaces with particle filters
Excellent for problems with large observation spaces
- PFT-DPW (Particle Filter Trees with Double Progressive Widening)
Extends POMCP with progressive widening (criterion sketched after this list)
Gradually expands action and observation nodes
Better for continuous action spaces
Balances exploration and exploitation
- Sparse PFT
Sparse sampling within particle filter trees
Efficient for large state/action spaces
Reduced computational requirements
Good performance with limited simulations
- Sparse Sampling
Classical forward-search algorithm
Builds sparse lookahead trees
Provable performance guarantees
Simple and effective baseline
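For reference, minimal standalone sketches of the two core mechanics named above (UCB action scoring and progressive widening); these illustrate the standard formulas and are not the library's internal API:
import math

def ucb_score(q_value, n_parent, n_action, exploration_constant):
    # UCB1: value estimate plus an exploration bonus that shrinks
    # as the action is visited more often
    if n_action == 0:
        return float("inf")  # always try unvisited actions first
    return q_value + exploration_constant * math.sqrt(math.log(n_parent) / n_action)

def should_widen(n_children, n_visits, k, alpha):
    # Progressive widening: add a new child only while the child count
    # stays below k * N**alpha; double progressive widening applies
    # this to both action and observation nodes
    return n_children < k * n_visits ** alpha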
Choosing the Right Planner
- For Discrete Problems (Tiger, Sanity):
POMCP: Excellent default choice
Sparse Sampling: Simple baseline
PFT-DPW: When you need progressive widening
- For Continuous Problems (CartPole, Light-Dark):
PFT-DPW: Handles continuous actions well
POMCP: Good for continuous observations
Sparse PFT: When computational budget is limited
- For Large-Scale Problems:
Sparse PFT: Efficient scaling
POMCP with limited simulations: Balance speed/quality
Basic Usage Example
from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
from POMDPPlanners.core.belief import get_initial_belief
# Create environment and planner
env = TigerPOMDP(discount_factor=0.95)
planner = POMCP(
    environment=env,
    discount_factor=0.95,
    name="tiger_example",
    n_simulations=1000,
    exploration_constant=50.0,
    depth=30
)
# Get initial belief and plan
belief = get_initial_belief(env, n_particles=1000)
actions, run_data = planner.action(belief)
print(f"Recommended action: {actions[0]}")
print(f"Planning time: {run_data.info_variables['planning_time']:.3f}s")
Configuration Parameters
- Common Parameters:
n_simulations: Number of MCTS simulations
depth: Maximum planning horizon
discount_factor: Future reward discount
exploration_constant: UCB exploration parameter
- POMCP-Specific:
threshold: Particle reinvigoration threshold
particle_filter_threshold: Belief update threshold
- PFT-DPW-Specific:
k_action, alpha_action: Action progressive widening
k_observation, alpha_observation: Observation progressive widening
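As an illustration of these parameters together, a hedged configuration sketch; the PFTDPW class name, import path, and parameter values below are assumptions, so check the package for the actual ones:
from POMDPPlanners.planners import PFTDPW  # import path assumed

planner = PFTDPW(
    environment=env,
    discount_factor=0.95,
    name="pftdpw_example",
    n_simulations=500,
    depth=20,
    k_action=10.0, alpha_action=0.5,            # action widening
    k_observation=5.0, alpha_observation=0.25,  # observation widening
)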
Creating Custom Planners
To implement a custom planner, inherit from the Policy base class and implement its abstract methods:
from POMDPPlanners.core.policy import Policy
from POMDPPlanners.core.simulation import SimulationRunData

class MyCustomPlanner(Policy):
    def __init__(self, environment, discount_factor, name, **kwargs):
        super().__init__(environment, discount_factor, name, **kwargs)
        # Initialize your planner's internal state here

    def action(self, belief):
        # Implement your planning algorithm; return the selected
        # action(s) together with run data holding execution metrics
        action = self.select_action(belief)
        run_data = SimulationRunData()
        return [action], run_data

    @classmethod
    def get_info_variable_names(cls):
        # Declare the metric names this policy reports
        return []

    @classmethod
    def get_space_info(cls):
        # Declare supported space types, e.g. return PolicySpaceInfo(...)
        pass

    def select_action(self, belief):
        # Your action selection logic
        pass
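A hedged usage sketch for this custom planner (environment and belief setup mirror the basic usage example above; the name value is arbitrary):
env = TigerPOMDP(discount_factor=0.95)
planner = MyCustomPlanner(environment=env, discount_factor=0.95, name="custom")
belief = get_initial_belief(env, n_particles=1000)
actions, run_data = planner.action(belief)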
See Also
../examples/planners - Planner usage examples
Belief States - Belief state representations
../api/planners - Complete API reference