POMDPPlanners.planners.mcts_planners package
Subpackages
- POMDPPlanners.planners.mcts_planners.beta_zero package
AbstractBetaZeroNetwork, BeliefRepresentation, BetaZero (attributes: network, belief_representation, z_q, z_n, temperature; methods: action(), begin_collecting(), buffer_size(), collect_episodes_batched(), end_collecting(), finalize_episode(), get_metric_keys(), get_network(), get_space_info(), load_normalization_stats(), prepare_episode(), save(), train_step())
BetaZeroActionSampler, BetaZeroNetwork, ParticleMeanStdRepresentation, TrainingBuffer, TrainingExample
- Submodules
- POMDPPlanners.planners.mcts_planners.beta_zero.belief_representation module
- POMDPPlanners.planners.mcts_planners.beta_zero.beta_zero module
BetaZero (attributes: network, belief_representation, z_q, z_n, temperature; methods: action(), begin_collecting(), buffer_size(), collect_episodes_batched(), end_collecting(), finalize_episode(), get_metric_keys(), get_network(), get_space_info(), load_normalization_stats(), prepare_episode(), save(), train_step())
- POMDPPlanners.planners.mcts_planners.beta_zero.beta_zero_action_sampler module
- POMDPPlanners.planners.mcts_planners.beta_zero.beta_zero_network module
- POMDPPlanners.planners.mcts_planners.beta_zero.puct module
- POMDPPlanners.planners.mcts_planners.beta_zero.training module
- POMDPPlanners.planners.mcts_planners.beta_zero.training_buffer module
- POMDPPlanners.planners.mcts_planners.constrained_zero package
ConstrainedTrainingBuffer, ConstrainedTrainingExample (attributes: belief_features, policy_target, value_target, failure_target)
ConstrainedZero, ConstrainedZeroNetwork
- Submodules
- POMDPPlanners.planners.mcts_planners.constrained_zero.constrained_puct module
- POMDPPlanners.planners.mcts_planners.constrained_zero.constrained_training module
- POMDPPlanners.planners.mcts_planners.constrained_zero.constrained_training_buffer module
ConstrainedTrainingBuffer, ConstrainedTrainingExample (attributes: belief_features, policy_target, value_target, failure_target)
- POMDPPlanners.planners.mcts_planners.constrained_zero.constrained_zero module
- POMDPPlanners.planners.mcts_planners.constrained_zero.constrained_zero_network module
Submodules
POMDPPlanners.planners.mcts_planners.icvar_pft_dpw module
ICVaR PFT-DPW (Iterated CVaR Particle Filter Tree with Double Progressive Widening) Algorithm.
This module implements a risk-sensitive variant of PFT-DPW that uses the Iterated Conditional Value at Risk (ICVaR) for value backups instead of the expected value. This focuses the planner on the worst α-fraction of outcomes, enabling risk-averse planning in POMDPs.
- Reference:
Pariente, Y., & Indelman, V. (2026). Online Risk-Averse Planning in POMDPs Using Iterated CVaR Value Function. arXiv preprint arXiv:2601.20554. https://arxiv.org/abs/2601.20554
- Classes:
ICVaR_PFT_DPW: Risk-sensitive PFT-DPW planner with CVaR-based value updates
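To make the backup concrete, the following minimal sketch computes an empirical CVaR_α over a batch of sampled returns. It is an illustrative stand-in, not this module's actual backup code; empirical_cvar is a hypothetical helper, and for cost-based formulations the relevant tail is the highest costs rather than the lowest returns.

import numpy as np

def empirical_cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (lowest values).

    Illustrative helper only, not part of this module's API. For cost-based
    formulations the tail of interest is the highest costs instead.
    """
    sorted_returns = np.sort(np.asarray(returns))  # ascending: worst outcomes first
    k = max(1, int(np.ceil(alpha * sorted_returns.size)))  # size of the alpha-tail
    return float(sorted_returns[:k].mean())

rng = np.random.default_rng(42)
samples = rng.normal(loc=1.0, scale=2.0, size=1000)
print(empirical_cvar(samples, alpha=0.1))  # well below the mean of roughly 1.0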
- class POMDPPlanners.planners.mcts_planners.icvar_pft_dpw.ICVaR_PFT_DPW(environment, name, depth, action_sampler, discount_factor=0.95, time_out_in_seconds=None, n_simulations=None, alpha=0.1, delta=0.1, belief_child_num=5, min_immediate_cost=0.0, max_immediate_cost=1.0, min_visit_count_per_action=1, exploration_constant=1.0, k_a=1.0, alpha_a=0.5, k_o=1.0, alpha_o=0.5, entropy_weight=0.0, visit_count_penalty=0.0)[source]
Bases: PathSimulationPolicy, CostSetting
- Parameters:
environment (Environment)
name (str)
depth (int)
action_sampler (ActionSampler)
discount_factor (float)
time_out_in_seconds (int | None)
n_simulations (int | None)
alpha (float)
delta (float)
belief_child_num (int)
min_immediate_cost (float)
max_immediate_cost (float)
min_visit_count_per_action (int)
exploration_constant (float)
k_a (float)
alpha_a (float)
k_o (float)
alpha_o (float)
entropy_weight (float)
visit_count_penalty (float)
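ICVaR_PFT_DPW has no doctest example in this reference, so here is an untested sketch mirroring the examples given for the other planners in this module. It relies only on the constructor signature above; TigerPOMDP, DiscreteActionSampler, and get_initial_belief are borrowed from those examples.

>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> from POMDPPlanners.utils.action_samplers import DiscreteActionSampler
>>> np.random.seed(42)
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> planner = ICVaR_PFT_DPW(
...     environment=tiger,
...     name="ExampleICVaRPlanner",
...     depth=5,
...     action_sampler=DiscreteActionSampler(tiger.get_actions()),
...     alpha=0.1,  # back up the worst 10% of outcomes
...     n_simulations=10
... )
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)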
- update_nodes(belief_node, action_node)[source]
- Parameters:
belief_node (BeliefNode)
action_node (ActionNode)
POMDPPlanners.planners.mcts_planners.icvar_pomcpow module
ICVaR POMCPOW (Iterated CVaR POMCPOW) Algorithm.
This module implements a risk-sensitive variant of POMCPOW that uses the Iterated Conditional Value at Risk (ICVaR) for value backups instead of the expected value. This focuses the planner on the worst α-fraction of outcomes, enabling risk-averse planning in POMDPs with continuous state, action, and observation spaces.
- Reference:
Pariente, Y., & Indelman, V. (2026). Online Risk-Averse Planning in POMDPs Using Iterated CVaR Value Function. arXiv preprint arXiv:2601.20554. https://arxiv.org/abs/2601.20554
- Classes:
ICVaR_POMCPOW: Risk-sensitive POMCPOW planner with CVaR-based value updates
- class POMDPPlanners.planners.mcts_planners.icvar_pomcpow.ICVaR_POMCPOW(environment, discount_factor, depth, exploration_constant, k_o, k_a, alpha_o, alpha_a, min_immediate_cost, max_immediate_cost, min_visit_count_per_action, delta, name, action_sampler, time_out_in_seconds=None, n_simulations=None, alpha=0.05, min_samples_per_node=10, log_path=None, debug=False, visit_count_penalty=0.0)[source]
Bases: PathSimulationPolicy, CostSetting
- Parameters:
environment (Environment)
discount_factor (float)
depth (int)
exploration_constant (float)
k_o (float)
k_a (float)
alpha_o (float)
alpha_a (float)
min_immediate_cost (float)
max_immediate_cost (float)
min_visit_count_per_action (int)
delta (float)
name (str)
action_sampler (ActionSampler)
time_out_in_seconds (int | None)
n_simulations (int | None)
alpha (float)
min_samples_per_node (int)
log_path (Path | None)
debug (bool)
visit_count_penalty (float)
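As with ICVaR_PFT_DPW above, an untested instantiation sketch based solely on the signature shown, assuming the same tiger environment and imports as in that sketch:

>>> planner = ICVaR_POMCPOW(
...     environment=tiger,
...     discount_factor=0.95,
...     depth=5,
...     exploration_constant=1.0,
...     k_o=3.0,
...     k_a=3.0,
...     alpha_o=0.5,
...     alpha_a=0.5,
...     min_immediate_cost=0.0,
...     max_immediate_cost=1.0,
...     min_visit_count_per_action=1,
...     delta=0.1,
...     name="ExampleICVaRPOMCPOW",
...     action_sampler=DiscreteActionSampler(tiger.get_actions()),
...     alpha=0.05,
...     n_simulations=10
... )
>>> actions, run_data = planner.action(get_initial_belief(tiger, n_particles=10))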
- classmethod get_space_info()[source]
Get information about action and observation spaces.
POMCPOW supports mixed-type spaces through its action sampler interface, allowing it to handle both discrete and continuous action spaces.
- Return type: PolicySpaceInfo
- Returns:
PolicySpaceInfo with MIXED space types for both actions and observations
POMDPPlanners.planners.mcts_planners.pft_dpw module
PFT-DPW (Particle Filter Trees with Double Progressive Widening) Algorithm.
This module implements PFT-DPW, a Monte Carlo Tree Search algorithm for continuous action spaces in POMDPs. The algorithm uses progressive widening to gradually expand the action and observation spaces during tree search, enabling effective planning in problems with continuous or large discrete action spaces.
Key features:
- Progressive widening for both actions and observations
- Handles continuous action spaces through adaptive sampling
- Uses UCB1-style exploration with progressive expansion
- Supports custom action samplers for domain-specific action generation
The algorithm progressively expands the tree by:
1. Using action progressive widening to add new actions based on visit counts
2. Using observation progressive widening to add new observation branches
3. Balancing exploration of new actions with exploitation of promising ones
4. Performing random rollouts from leaf nodes for value estimation
- Reference:
Sunberg, Z. N., & Kochenderfer, M. J. (2018). Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces. Proceedings of the International Conference on Automated Planning and Scheduling, 28(1), 259-263. https://ojs.aaai.org/index.php/ICAPS/article/view/13882
- Classes:
ActionSampler: Abstract base class for action sampling strategies
PFT_DPW: Main PFT-DPW planner with progressive widening for continuous actions
- class POMDPPlanners.planners.mcts_planners.pft_dpw.PFT_DPW(environment, discount_factor, depth, name, action_sampler, k_a=1.0, alpha_a=0.5, k_o=1.0, alpha_o=0.5, exploration_constant=1.0, time_out_in_seconds=None, n_simulations=None, min_visit_count_per_action=1, log_path=None, debug=False, use_queue_logger=False)[source]
Bases: DoubleProgressiveWideningMCTSPolicy
PFT-DPW (Particle Filter Trees with Double Progressive Widening) Algorithm.
PFT-DPW is a Monte Carlo Tree Search algorithm designed for continuous action spaces in POMDPs. It uses progressive widening to gradually expand both the action and observation spaces during tree search, enabling effective planning in problems with continuous or very large discrete action spaces.
Algorithm Overview: The algorithm operates through progressive expansion:
1. Action Progressive Widening: Gradually adds new actions based on visit counts
2. Observation Progressive Widening: Gradually adds new observation branches
3. UCB1 Exploration: Balances exploration of new actions with exploitation
4. Random Rollouts: Estimates values from leaf nodes using random simulations
Key Features:
- Handles continuous action spaces through adaptive sampling
- Uses UCB1-style exploration with progressive expansion
- Supports custom action samplers for domain-specific action generation
- Balances exploration of new actions with exploitation of promising ones
- Performs random rollouts from leaf nodes for value estimation
Progressive Widening Parameters:
- k_a, alpha_a: Control action space expansion (more actions added as visit_count^alpha_a)
- k_o, alpha_o: Control observation space expansion
- exploration_constant: UCB1 exploration parameter (higher = more exploration)
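As a rough illustration of the widening test these parameters imply, a new child is typically admitted only while the child count is at most k * N^alpha. The helper below is a sketch under that assumption, not this module's API:

def may_widen(num_children: int, visit_count: int, k: float, alpha: float) -> bool:
    """Progressive widening test: admit a new child while |children| <= k * N^alpha."""
    return num_children <= k * max(visit_count, 1) ** alpha

# With k=2.0, alpha=0.5 a node visited 100 times supports at most ~20 children.
print(may_widen(num_children=19, visit_count=100, k=2.0, alpha=0.5))  # True
print(may_widen(num_children=21, visit_count=100, k=2.0, alpha=0.5))  # False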
- Parameters:
environment (Environment)
discount_factor (float)
depth (int)
name (str)
action_sampler (ActionSampler)
k_a (float)
alpha_a (float)
k_o (float)
alpha_o (float)
exploration_constant (float)
time_out_in_seconds (int | None)
n_simulations (int | None)
min_visit_count_per_action (int)
log_path (Path | None)
debug (bool)
use_queue_logger (bool)
- environment
The POMDP environment to plan for
- discount_factor
Discount factor for future rewards (0 < γ ≤ 1)
- depth
Maximum search depth for tree expansion
- action_sampler
Strategy for sampling new actions during progressive widening
- k_a, alpha_a
Action progressive widening parameters
- k_o, alpha_o
Observation progressive widening parameters
- exploration_constant
UCB1 exploration parameter
- n_simulations
Number of simulations to run (mutually exclusive with timeout)
- time_out_in_seconds
Time limit for planning (mutually exclusive with n_simulations)
Example
>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> from POMDPPlanners.utils.action_samplers import DiscreteActionSampler
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> action_sampler = DiscreteActionSampler(tiger.get_actions())
>>> planner = PFT_DPW(
...     environment=tiger,
...     discount_factor=0.95,
...     depth=5,
...     name="ExamplePlanner",
...     action_sampler=action_sampler,
...     k_a=2.0,
...     alpha_a=0.5,
...     n_simulations=10
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = PFT_DPW.get_space_info()
>>> space_info.action_space.name
'MIXED'
- sample_existing_belief_node(belief_node, action_node)[source]
- Return type:
- Parameters:
belief_node (BeliefNode)
action_node (ActionNode)
POMDPPlanners.planners.mcts_planners.pomcp module
POMCP (Partially Observable Monte Carlo Planning) Algorithm Implementation.
This module implements POMCP, a Monte Carlo Tree Search algorithm for POMDP planning. POMCP builds a search tree by iteratively sampling trajectories and using UCB1 for action selection, providing an efficient approximation to optimal POMDP planning.
The algorithm works by:
1. Building a tree of belief-action nodes through Monte Carlo simulations
2. Using UCB1 (Upper Confidence Bounds) for action selection during tree traversal
3. Performing random rollouts from leaf nodes to estimate values
4. Updating node statistics (visit counts, Q-values) based on simulation returns
Key features:
- Handles large or continuous observation spaces through particle filtering
- Uses UCB1 for principled exploration-exploitation balance
- Can be configured with time limits or simulation count limits
- Provides theoretical convergence guarantees to the optimal policy
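For reference, the UCB1 rule mentioned above has the standard form Q(b,a) + c * sqrt(ln N(b) / N(b,a)). A minimal self-contained sketch of that standard form follows; the function and field names are illustrative, not this module's internals.

import math

def ucb1(q_value: float, parent_visits: int, child_visits: int, c: float) -> float:
    """Standard UCB1 score: value estimate plus a visit-count exploration bonus."""
    if child_visits == 0:
        return math.inf  # untried actions are always selected first
    return q_value + c * math.sqrt(math.log(parent_visits) / child_visits)

# The bonus shrinks as an action accumulates visits relative to its parent node.
print(ucb1(q_value=0.5, parent_visits=100, child_visits=10, c=1.0))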
- Reference:
Silver, D., & Veness, J. (2010). Monte-Carlo Planning in Large POMDPs. Advances in Neural Information Processing Systems, 23. https://papers.nips.cc/paper_files/paper/2010/hash/edfbe1afcf9246bb0d40eb4d8027d90f-Abstract.html
- Classes:
POMCP: Monte Carlo Tree Search planner for POMDPs with UCB1 action selection
- class POMDPPlanners.planners.mcts_planners.pomcp.POMCP(environment, discount_factor, depth, exploration_constant, name, time_out_in_seconds=None, n_simulations=None, log_path=None, debug=False, use_queue_logger=False)[source]
Bases: PathSimulationPolicy
POMCP (Partially Observable Monte Carlo Planning) algorithm.
POMCP is a Monte Carlo Tree Search algorithm for POMDP planning that combines UCB1 action selection with particle filtering to handle continuous observation spaces. It builds a search tree through repeated simulations and provides theoretical convergence guarantees.
The algorithm uses UCB1 (Upper Confidence Bounds) to balance exploration and exploitation when selecting actions during tree search. It maintains belief states using particle filters and performs random rollouts to estimate values at leaf nodes.
- Parameters:
- environment
The POMDP environment to plan for
- discount_factor
Discount factor for future rewards (0 < γ ≤ 1)
- depth
Maximum search depth for tree expansion
- exploration_constant
UCB1 exploration parameter (higher = more exploration)
- timeout_in_seconds
Time limit for planning (mutually exclusive with n_simulations)
- n_simulations
Number of simulations to run (mutually exclusive with timeout)
Note
In the original POMCP paper, the belief structure used was an unweighted particle belief, which can be found in POMDPPlanners.core.belief.UnweightedParticleBelief. However, in this implementation we keep the belief structure abstract to allow users to choose their preferred belief representation. In the usage example below, a weighted particle belief is used via the POMDPPlanners.core.belief.get_initial_belief() function.
Example
>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(
...     environment=tiger,
...     discount_factor=0.95,
...     depth=5,
...     exploration_constant=1.0,
...     name="ExamplePlanner",
...     n_simulations=10
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = POMCP.get_space_info()
>>> space_info.action_space.name
'DISCRETE'
- get_explored_action_node(belief_node)[source]
- Return type:
- Parameters:
belief_node (BeliefNode)
- classmethod get_space_info()[source]
Get space type requirements for this policy class.
This class method specifies what types of action and observation spaces this policy implementation can handle, enabling compatibility checking with environments.
- Return type: PolicySpaceInfo
- Returns:
PolicySpaceInfo specifying required action and observation space types
Note
Subclasses must implement this method to declare their space compatibility. This is used for validation when pairing policies with environments.
- update_nodes(belief_node, action_node, return_sample, state)[source]
- Parameters:
belief_node (BeliefNode)
action_node (ActionNode)
return_sample (float)
state (Any)
POMDPPlanners.planners.mcts_planners.pomcp_dpw module
POMCP_DPW (Partially Observable Monte Carlo Planning with Double Progressive Widening) Algorithm.
This module implements POMCP_DPW, an advanced Monte Carlo Tree Search algorithm for POMDP planning that extends POMCP with double progressive widening capabilities. POMCP_DPW combines UCB1 action selection with progressive widening for both actions and observations, making it particularly effective for problems with large or continuous action spaces.
Key features:
- Double progressive widening for actions and observations
- Unweighted particle-based belief representation (following POMCP tradition)
- UCB1-based exploration-exploitation balance
- Handles continuous and discrete action spaces
- Adaptive observation node expansion
The algorithm progressively expands the tree by:
1. Using action progressive widening to add new actions based on visit counts and the α_a parameter
2. Using observation progressive widening to add new observation branches based on k_o and α_o
3. Maintaining unweighted particle beliefs in observation nodes (as per POMCP)
4. Balancing exploration of new actions with exploitation of promising ones
5. Performing random rollouts from leaf nodes for value estimation
- Reference:
Sunberg, Z. N., & Kochenderfer, M. J. (2018). Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces. Proceedings of the International Conference on Automated Planning and Scheduling, 28(1), 259-263. https://ojs.aaai.org/index.php/ICAPS/article/view/13882
- Classes:
POMCP_DPW: Monte Carlo Tree Search planner with double progressive widening extending POMCP
- class POMDPPlanners.planners.mcts_planners.pomcp_dpw.POMCP_DPW(environment, discount_factor, depth, exploration_constant, k_o, k_a, alpha_o, alpha_a, name, action_sampler, time_out_in_seconds=None, n_simulations=None, min_visit_count_per_action=1, log_path=None, debug=False, use_queue_logger=False)[source]
Bases: DoubleProgressiveWideningMCTSPolicy
POMCP_DPW (Partially Observable Monte Carlo Planning with Double Progressive Widening) Algorithm.
POMCP_DPW is an advanced Monte Carlo Tree Search algorithm for POMDP planning that extends POMCP with double progressive widening. It combines UCB1 action selection with progressive widening for both actions and observations, making it particularly effective for problems with large or continuous action spaces.
Algorithm Overview: The algorithm operates through double progressive expansion:
1. Action Progressive Widening: Gradually adds new actions based on visit counts and α_a
2. Observation Progressive Widening: Gradually adds new observation branches based on k_o and α_o
3. Unweighted Particle Beliefs: Maintains unweighted particle representations in observation nodes (POMCP tradition)
4. UCB1 Exploration: Balances exploration of new actions with exploitation using UCB1
5. Random Rollouts: Estimates values from leaf nodes using random simulations
Key Features:
- Handles continuous and discrete action spaces through the ActionSampler interface
- Uses double progressive widening to manage tree growth
- Maintains unweighted particle beliefs for efficient belief approximation (following POMCP tradition)
- Balances exploration of new actions with exploitation of promising ones
- Supports configurable progressive widening parameters
Progressive Widening Parameters:
- k_a, α_a: Control action progressive widening (new actions added when ⌊n^α_a⌋ > ⌊(n-1)^α_a⌋)
- k_o, α_o: Control observation progressive widening (max observations ≤ k_o * n^α_o)
- Parameters:
environment (Environment)
discount_factor (float)
depth (int)
exploration_constant (float)
k_o (float)
k_a (float)
alpha_o (float)
alpha_a (float)
name (str)
action_sampler (ActionSampler)
time_out_in_seconds (int | None)
n_simulations (int | None)
min_visit_count_per_action (int)
log_path (Path | None)
debug (bool)
use_queue_logger (bool)
- environment
The POMDP environment to plan for
- discount_factor
Discount factor for future rewards (0 < γ ≤ 1)
- depth
Maximum search depth for tree expansion
- exploration_constant
UCB1 exploration parameter (higher = more exploration)
- k_o
Observation progressive widening coefficient
- k_a
Action progressive widening coefficient
- alpha_o
Observation progressive widening exponent
- alpha_a
Action progressive widening exponent
- action_sampler
Action sampling strategy for progressive widening
- time_out_in_seconds
Time limit for planning (mutually exclusive with n_simulations)
- n_simulations
Number of simulations to run (mutually exclusive with timeout)
- log_path
Optional path for logging policy execution
- debug
Enable debug logging if True
Example
>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> from POMDPPlanners.utils.action_samplers import DiscreteActionSampler
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> action_sampler = DiscreteActionSampler(tiger.get_actions())
>>> planner = POMCP_DPW(
...     environment=tiger,
...     discount_factor=0.95,
...     depth=5,
...     exploration_constant=1.0,
...     k_o=3.0,
...     k_a=3.0,
...     alpha_o=0.5,
...     alpha_a=0.5,
...     action_sampler=action_sampler,
...     n_simulations=10,
...     name="ExamplePlanner"
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = POMCP_DPW.get_space_info()
>>> space_info.action_space.name
'MIXED'
POMDPPlanners.planners.mcts_planners.pomcpow module
POMCPOW (Partially Observable Monte Carlo Planning with Observation Widening) Algorithm.
This module implements POMCPOW, an advanced Monte Carlo Tree Search algorithm for POMDP planning that extends POMCP with double progressive widening capabilities. POMCPOW combines UCB1 action selection with progressive widening for both actions and observations, making it particularly effective for problems with large or continuous action spaces.
Key features:
- Double progressive widening for actions and observations
- Weighted particle-based belief representation
- UCB1-based exploration-exploitation balance
- Handles continuous and discrete action spaces
- Adaptive observation node expansion
The algorithm progressively expands the tree by:
1. Using action progressive widening to add new actions based on visit counts and the α_a parameter
2. Using observation progressive widening to add new observation branches based on k_o and α_o
3. Maintaining weighted particle beliefs in observation nodes
4. Balancing exploration of new actions with exploitation of promising ones
5. Performing random rollouts from leaf nodes for value estimation
- Reference:
Sunberg, Z. N., & Kochenderfer, M. J. (2018). Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces. Proceedings of the International Conference on Automated Planning and Scheduling, 28(1), 259-263. https://ojs.aaai.org/index.php/ICAPS/article/view/13882
- Classes:
POMCPOW: Monte Carlo Tree Search planner with double progressive widening
- class POMDPPlanners.planners.mcts_planners.pomcpow.POMCPOW(environment, discount_factor, depth, exploration_constant, k_o, k_a, alpha_o, alpha_a, name, action_sampler, time_out_in_seconds=None, n_simulations=None, min_visit_count_per_action=1, log_path=None, debug=False, use_queue_logger=False)[source]
Bases: DoubleProgressiveWideningMCTSPolicy
POMCPOW (Partially Observable Monte Carlo Planning with Observation Widening) Algorithm.
POMCPOW is an advanced Monte Carlo Tree Search algorithm for POMDP planning that extends POMCP with double progressive widening. It combines UCB1 action selection with progressive widening for both actions and observations, making it particularly effective for problems with large or continuous action spaces.
Algorithm Overview: The algorithm operates through double progressive expansion:
1. Action Progressive Widening: Gradually adds new actions based on visit counts and α_a
2. Observation Progressive Widening: Gradually adds new observation branches based on k_o and α_o
3. Weighted Particle Beliefs: Maintains weighted particle representations in observation nodes
4. UCB1 Exploration: Balances exploration of new actions with exploitation using UCB1
5. Random Rollouts: Estimates values from leaf nodes using random simulations
Key Features:
- Handles continuous and discrete action spaces through the ActionSampler interface
- Uses double progressive widening to manage tree growth
- Maintains weighted particle beliefs for efficient belief approximation
- Balances exploration of new actions with exploitation of promising ones
- Supports configurable progressive widening parameters
Progressive Widening Parameters:
- k_a, α_a: Control action progressive widening (new actions added when ⌊n^α_a⌋ > ⌊(n-1)^α_a⌋)
- k_o, α_o: Control observation progressive widening (max observations ≤ k_o * n^α_o)
- Parameters:
environment (Environment)
discount_factor (float)
depth (int)
exploration_constant (float)
k_o (float)
k_a (float)
alpha_o (float)
alpha_a (float)
name (str)
action_sampler (ActionSampler)
time_out_in_seconds (int | None)
n_simulations (int | None)
min_visit_count_per_action (int)
log_path (Path | None)
debug (bool)
use_queue_logger (bool)
- environment
The POMDP environment to plan for
- discount_factor
Discount factor for future rewards (0 < γ ≤ 1)
- depth
Maximum search depth for tree expansion
- exploration_constant
UCB1 exploration parameter (higher = more exploration)
- k_o
Observation progressive widening coefficient
- k_a
Action progressive widening coefficient
- alpha_o
Observation progressive widening exponent
- alpha_a
Action progressive widening exponent
- action_sampler
Action sampling strategy for progressive widening
- time_out_in_seconds
Time limit for planning (mutually exclusive with n_simulations)
- n_simulations
Number of simulations to run (mutually exclusive with timeout)
- log_path
Optional path for logging policy execution
- debug
Enable debug logging if True
Example
>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> from POMDPPlanners.utils.action_samplers import DiscreteActionSampler
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> action_sampler = DiscreteActionSampler(tiger.get_actions())
>>> planner = POMCPOW(
...     environment=tiger,
...     discount_factor=0.95,
...     depth=5,
...     exploration_constant=1.0,
...     k_o=3.0,
...     k_a=3.0,
...     alpha_o=0.5,
...     alpha_a=0.5,
...     action_sampler=action_sampler,
...     n_simulations=10,
...     name="ExamplePlanner"
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = POMCPOW.get_space_info()
>>> space_info.action_space.name
'MIXED'
POMDPPlanners.planners.mcts_planners.sparse_pft module
- class POMDPPlanners.planners.mcts_planners.sparse_pft.SparsePFT(environment, discount_factor, gamma, depth, c_ucb, beta_ucb, belief_child_num, time_out_in_seconds=None, n_simulations=None, name='SparsePFT', log_path=None, debug=False, use_queue_logger=False)[source]
Bases: PathSimulationPolicy
Sparse-PFT (Sparse Particle Filter Tree) Algorithm for POMDP Planning.
Sparse-PFT combines the efficiency of sparse sampling with particle filter trees and Monte Carlo Tree Search for POMDP planning. It addresses the curse of dimensionality by limiting the number of belief children per action node while using a modified UCB exploration strategy to guide tree construction.
Algorithm Overview: The algorithm operates by:
1. Sparse Branching: Limits each action node to a fixed number of belief children
2. Progressive Selection: Uses a modified UCB rule to balance exploration and exploitation
3. Adaptive Sampling: Samples existing children or generates new ones based on capacity
4. Random Rollouts: Estimates values from leaf nodes using random simulations
Key Features:
- Sparse Tree Structure: Controls memory usage by limiting belief children per action
- Enhanced UCB: Uses a modified UCB formula with a beta parameter for better exploration
- Efficient Sampling: Balances between exploring existing branches and generating new ones
- Discrete Actions: Optimized for discrete action spaces with discrete or mixed observations
- Terminal State Handling: Properly detects when all particles reach terminal states
Mathematical Foundation: The algorithm uses a modified UCB selection criterion:
UCB(s,a) = Q(s,a) + c_ucb * beta_ucb * √(log N(s) / N(s,a))
Where:
- Q(s,a): Action-value estimate
- c_ucb: Base exploration constant
- beta_ucb: Additional exploration parameter
- N(s): Visit count of belief node
- N(s,a): Visit count of action node
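The sketch below transcribes the criterion above into code. Note that the log-ratio form is a reconstruction of a garbled formula based on the standard UCB bonus, so treat it as an approximate reading rather than the module's exact rule:

import math

def sparse_pft_ucb(q: float, n_belief: int, n_action: int,
                   c_ucb: float, beta_ucb: float) -> float:
    """Modified UCB with an extra beta_ucb exploration factor (reconstructed form)."""
    if n_action == 0:
        return math.inf
    return q + c_ucb * beta_ucb * math.sqrt(math.log(n_belief) / n_action)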
- Parameters:
- environment
The discrete-action POMDP environment for planning
- discount_factor
Discount factor for future rewards (0 < γ ≤ 1)
- gamma
Alternative discount parameter for value computation
- depth
Maximum search depth for tree expansion
- c_ucb
Base exploration constant for UCB formula
- beta_ucb
Additional exploration parameter for enhanced UCB
- belief_child_num
Maximum number of belief children per action node
- n_simulations
Number of MCTS simulations to perform
Example
>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> planner = SparsePFT(
...     environment=tiger,
...     discount_factor=0.95,
...     gamma=0.95,
...     depth=5,
...     c_ucb=1.0,
...     beta_ucb=2.0,
...     belief_child_num=3,
...     n_simulations=10,
...     name="ExamplePlanner"
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = SparsePFT.get_space_info()
>>> space_info.action_space.name
'DISCRETE'
- get_explored_action_node(belief_node)[source]
- Return type:
- Parameters:
belief_node (BeliefNode)
- classmethod get_space_info()[source]
Get space type requirements for this policy class.
This class method specifies what types of action and observation spaces this policy implementation can handle, enabling compatibility checking with environments.
- Return type: PolicySpaceInfo
- Returns:
PolicySpaceInfo specifying required action and observation space types
Note
Subclasses must implement this method to declare their space compatibility. This is used for validation when pairing policies with environments.
- update_nodes(belief_node, action_node, return_sample)[source]
- Parameters:
belief_node (BeliefNode)
action_node (ActionNode)
return_sample (float)