Planners
========

POMDP planners compute optimal actions given belief states. POMDPPlanners
provides state-of-the-art algorithms, from Monte Carlo Tree Search to sparse
sampling approaches.

Planning Algorithm Categories
-----------------------------

**Monte Carlo Tree Search (MCTS)**

.. autosummary::
   :toctree: ../api/

   POMDPPlanners.planners.mcts_planners.pomcp.POMCP
   POMDPPlanners.planners.mcts_planners.pft_dpw.PFT_DPW
   POMDPPlanners.planners.mcts_planners.sparse_pft.SparsePFT

**Sparse Sampling**

.. autosummary::
   :toctree: ../api/

   POMDPPlanners.planners.sparse_sampling_planners.sparse_sampling.SparseSamplingDiscreteActionsPlanner

**Open-Loop Planning**

.. autosummary::
   :toctree: ../api/

   POMDPPlanners.planners.open_loop_planners.discrete_action_sequences_planner.DiscreteActionSequencesPlanner

Planner Interface
-----------------

All planners inherit from the base ``Policy`` class:

.. autoclass:: POMDPPlanners.core.policy.Policy
   :members:
   :undoc-members:
   :show-inheritance:

Algorithm Details
-----------------

**POMCP (Partially Observable Monte Carlo Planning)**

- Uses Upper Confidence Bounds (UCB) for action selection
- Builds belief trees through simulation
- Handles continuous observation spaces with particle filters
- Excellent for problems with large observation spaces

**PFT-DPW (Particle Filter Trees with Double Progressive Widening)**

- Extends POMCP with progressive widening
- Gradually expands action and observation nodes
- Better suited to continuous action spaces
- Balances exploration and exploitation

**Sparse PFT**

- Sparse sampling within particle filter trees
- Efficient for large state/action spaces
- Reduced computational requirements
- Good performance with limited simulations

**Sparse Sampling**

- Classical forward-search algorithm
- Builds sparse lookahead trees
- Provable performance guarantees
- Simple and effective baseline

Choosing the Right Planner
--------------------------

**For discrete problems (Tiger, Sanity):**

- POMCP: excellent default choice
- Sparse Sampling: simple baseline
- PFT-DPW: when you need progressive widening

**For continuous problems (CartPole, Light-Dark):**

- PFT-DPW: handles continuous actions well
- POMCP: good for continuous observations
- Sparse PFT: when the computational budget is limited

**For large-scale problems:**

- Sparse PFT: efficient scaling
- POMCP with limited simulations: balances speed and quality

Basic Usage Example
-------------------

.. code-block:: python

   from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
   from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
   from POMDPPlanners.core.belief import get_initial_belief

   # Create environment and planner
   env = TigerPOMDP()
   planner = POMCP(
       environment=env,
       num_simulations=1000,
       exploration_constant=50.0,
       depth=30,
   )

   # Get initial belief and plan
   belief = get_initial_belief(env, n_particles=1000)
   action, run_data = planner.action(belief)

   print(f"Recommended action: {action}")
   print(f"Planning time: {run_data.info_variables['planning_time']:.3f}s")

Configuration Parameters
------------------------

**Common parameters:**

- ``num_simulations``: number of MCTS simulations per planning step
- ``depth``: maximum planning horizon
- ``discount_factor``: discount applied to future rewards
- ``exploration_constant``: UCB exploration parameter

**POMCP-specific:**

- ``threshold``: particle reinvigoration threshold
- ``particle_filter_threshold``: belief update threshold

**PFT-DPW-specific:**

- ``k_action``, ``alpha_action``: action progressive widening
- ``k_observation``, ``alpha_observation``: observation progressive widening

Creating Custom Planners
------------------------

To implement a custom planner, inherit from the ``Policy`` base class:
.. code-block:: python

   from POMDPPlanners.core.policy import Policy
   from POMDPPlanners.core.simulation import SimulationRunData


   class MyCustomPlanner(Policy):
       def __init__(self, environment, **kwargs):
           super().__init__(environment, **kwargs)
           # Initialize your planner's data structures here

       def action(self, belief_state):
           # Implement your planning algorithm and
           # return an (action, run_data) pair
           action = self.select_action(belief_state)
           run_data = SimulationRunData()
           return action, run_data

       def select_action(self, belief_state):
           # Your action-selection logic goes here
           raise NotImplementedError

See Also
--------

- :doc:`../examples/planners` - Planner usage examples
- :doc:`beliefs` - Belief state representations
- :doc:`../api/planners` - Complete API reference
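As a closing illustration, the UCB action-selection rule that POMCP relies on (see Algorithm Details) can be sketched in plain Python. This is a self-contained sketch, not part of the POMDPPlanners API: the function name, the statistics dictionary, and the Tiger-style action labels are all illustrative, with the exploration constant mirroring the value used in the Basic Usage Example.

```python
import math


def ucb_score(mean_value, action_visits, parent_visits, exploration_constant):
    """UCB1 score: estimated value plus an exploration bonus that
    shrinks as the action node accumulates visits."""
    if action_visits == 0:
        return float("inf")  # always try unvisited actions first
    bonus = exploration_constant * math.sqrt(
        math.log(parent_visits) / action_visits
    )
    return mean_value + bonus


# Hypothetical per-action statistics at one tree node:
# action -> (mean value estimate, visit count)
stats = {
    "listen": (1.2, 40),
    "open-left": (0.8, 5),
    "open-right": (-2.0, 5),
}
parent_visits = sum(visits for _, visits in stats.values())

# POMCP-style selection: pick the action maximizing the UCB score.
best_action = max(
    stats,
    key=lambda a: ucb_score(stats[a][0], stats[a][1], parent_visits, 50.0),
)
```

Note how a large ``exploration_constant`` favors rarely tried actions: here the under-visited ``open-left`` is selected over the higher-visited ``listen``, even though ``listen`` has a higher value estimate.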