POMDPPlanners.planners.mcts_planners package

Subpackages

Submodules

POMDPPlanners.planners.mcts_planners.icvar_pft_dpw module

ICVaR PFT-DPW (Iterated CVaR Particle Filter Tree with Double Progressive Widening) Algorithm.

This module implements a risk-sensitive variant of PFT-DPW that uses the Iterated Conditional Value at Risk (ICVaR) for value backups instead of the expected value. This makes the planner focus on the worst-alpha fraction of outcomes, enabling risk-averse planning in POMDPs.

Reference:: Pariente, Y., & Indelman, V. (2026). Online Risk-Averse Planning in POMDPs Using Iterated CVaR Value Function. arXiv preprint arXiv:2601.20554. https://arxiv.org/abs/2601.20554
Classes:: ICVaR_PFT_DPW: Risk-sensitive PFT-DPW planner with CVaR-based value updates

class POMDPPlanners.planners.mcts_planners.icvar_pft_dpw.ICVaR_PFT_DPW(environment, name, depth, action_sampler, discount_factor=0.95, time_out_in_seconds=None, n_simulations=None, alpha=0.1, delta=0.1, belief_child_num=5, min_immediate_cost=0.0, max_immediate_cost=1.0, min_visit_count_per_action=1, exploration_constant=1.0, k_a=1.0, alpha_a=0.5, k_o=1.0, alpha_o=0.5, entropy_weight=0.0, visit_count_penalty=0.0)[source]

Bases: PathSimulationPolicyCostSetting

Parameters:

environment (Environment)
name (str)
depth (int)
action_sampler (ActionSampler)
discount_factor (float)
time_out_in_seconds (int | None)
n_simulations (int | None)
alpha (float)
delta (float)
belief_child_num (int)
min_immediate_cost (float)
max_immediate_cost (float)
min_visit_count_per_action (int)
exploration_constant (float)
k_a (float)
alpha_a (float)
k_o (float)
alpha_o (float)
entropy_weight (float)
visit_count_penalty (float)

classmethod get_space_info()[source]

Get information about the policy’s space.

Return type:: PolicySpaceInfo

is_terminal_belief(belief)[source]

Checks if all paricles are terminal states.

Return type:: bool
Parameters:: belief (Belief)

update_nodes(belief_node, action_node)[source]

Parameters:

belief_node (BeliefNode)
action_node (ActionNode)

POMDPPlanners.planners.mcts_planners.icvar_pomcpow module

ICVaR POMCPOW (Iterated CVaR POMCPOW) Algorithm.

This module implements a risk-sensitive variant of POMCPOW that uses the Iterated Conditional Value at Risk (ICVaR) for value backups instead of the expected value. This makes the planner focus on the worst-alpha fraction of outcomes, enabling risk-averse planning in POMDPs with continuous state, action, and observation spaces.

Reference:: Pariente, Y., & Indelman, V. (2026). Online Risk-Averse Planning in POMDPs Using Iterated CVaR Value Function. arXiv preprint arXiv:2601.20554. https://arxiv.org/abs/2601.20554
Classes:: ICVaR_POMCPOW: Risk-sensitive POMCPOW planner with CVaR-based value updates

class POMDPPlanners.planners.mcts_planners.icvar_pomcpow.ICVaR_POMCPOW(environment, discount_factor, depth, exploration_constant, k_o, k_a, alpha_o, alpha_a, min_immediate_cost, max_immediate_cost, min_visit_count_per_action, delta, name, action_sampler, time_out_in_seconds=None, n_simulations=None, alpha=0.05, min_samples_per_node=10, log_path=None, debug=False, visit_count_penalty=0.0)[source]

Bases: PathSimulationPolicyCostSetting

Parameters:

environment (Environment)
discount_factor (float)
depth (int)
exploration_constant (float)
k_o (float)
k_a (float)
alpha_o (float)
alpha_a (float)
min_immediate_cost (float)
max_immediate_cost (float)
min_visit_count_per_action (int)
delta (float)
name (str)
action_sampler (ActionSampler)
time_out_in_seconds (int | None)
n_simulations (int | None)
alpha (float)
min_samples_per_node (int)
log_path (Path | None)
debug (bool)
visit_count_penalty (float)

classmethod get_space_info()[source]

Get information about action and observation spaces.

POMCPOW supports mixed-type spaces through its action sampler interface, allowing it to handle both discrete and continuous action spaces.

Return type:: PolicySpaceInfo
Returns:: PolicySpaceInfo with MIXED space types for both actions and observations

POMDPPlanners.planners.mcts_planners.pft_dpw module

PFT-DPW (Progressive Function Transfer with Double Progressive Widening) Algorithm.

This module implements PFT-DPW, a Monte Carlo Tree Search algorithm for continuous action spaces in POMDPs. The algorithm uses progressive widening to gradually expand the action and observation spaces during tree search, enabling effective planning in problems with continuous or large discrete action spaces.

Key features: - Progressive widening for both actions and observations - Handles continuous action spaces through adaptive sampling - Uses UCB1-style exploration with progressive expansion - Supports custom action samplers for domain-specific action generation

The algorithm progressively expands the tree by: 1. Using action progressive widening to add new actions based on visit counts 2. Using observation progressive widening to add new observation branches 3. Balancing exploration of new actions with exploitation of promising ones 4. Performing random rollouts from leaf nodes for value estimation

Reference:: Sunberg, Z. N., & Kochenderfer, M. J. (2018). Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces. Proceedings of the International Conference on Automated Planning and Scheduling, 28(1), 259-263. https://ojs.aaai.org/index.php/ICAPS/article/view/13882
Classes:: ActionSampler: Abstract base class for action sampling strategies PFT_DPW: Main PFT-DPW planner with progressive widening for continuous actions

class POMDPPlanners.planners.mcts_planners.pft_dpw.PFT_DPW(environment, discount_factor, depth, name, action_sampler, k_a=1.0, alpha_a=0.5, k_o=1.0, alpha_o=0.5, exploration_constant=1.0, time_out_in_seconds=None, n_simulations=None, min_visit_count_per_action=1, log_path=None, debug=False, use_queue_logger=False)[source]

Bases: DoubleProgressiveWideningMCTSPolicy

PFT-DPW (Progressive Function Transfer with Double Progressive Widening) Algorithm.

PFT-DPW is a Monte Carlo Tree Search algorithm designed for continuous action spaces in POMDPs. It uses progressive widening to gradually expand both the action and observation spaces during tree search, enabling effective planning in problems with continuous or very large discrete action spaces.

Algorithm Overview: The algorithm operates through progressive expansion: 1. Action Progressive Widening: Gradually adds new actions based on visit counts 2. Observation Progressive Widening: Gradually adds new observation branches 3. UCB1 Exploration: Balances exploration of new actions with exploitation 4. Random Rollouts: Estimates values from leaf nodes using random simulations

Key Features: - Handles continuous action spaces through adaptive sampling - Uses UCB1-style exploration with progressive expansion - Supports custom action samplers for domain-specific action generation - Balances exploration of new actions with exploitation of promising ones - Performs random rollouts from leaf nodes for value estimation

Progressive Widening Parameters: - k_a, alpha_a: Control action space expansion (more actions added as visit_count^alpha_a) - k_o, alpha_o: Control observation space expansion - exploration_constant: UCB1 exploration parameter (higher = more exploration)

Parameters:

environment (Environment)
discount_factor (float)
depth (int)
name (str)
action_sampler (ActionSampler)
k_a (float)
alpha_a (float)
k_o (float)
alpha_o (float)
exploration_constant (float)
time_out_in_seconds (int | None)
n_simulations (int | None)
min_visit_count_per_action (int)
log_path (Path | None)
debug (bool)
use_queue_logger (bool)

environment: The POMDP environment to plan for

discount_factor: Discount factor for future rewards (0 < γ ≤ 1)

depth: Maximum search depth for tree expansion

action_sampler: Strategy for sampling new actions during progressive widening

k_a, alpha_a: Action progressive widening parameters

k_o, alpha_o: Observation progressive widening parameters

exploration_constant: UCB1 exploration parameter

n_simulations: Number of simulations to run (mutually exclusive with timeout)

time_out_in_seconds: Time limit for planning (mutually exclusive with n_simulations)

Example

>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> from POMDPPlanners.utils.action_samplers import DiscreteActionSampler
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> action_sampler = DiscreteActionSampler(tiger.get_actions())
>>> planner = PFT_DPW(
...     environment=tiger,
...     discount_factor=0.95,
...     depth=5,
...     name="ExamplePlanner",
...     action_sampler=action_sampler,
...     k_a=2.0,
...     alpha_a=0.5,
...     n_simulations=10
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = PFT_DPW.get_space_info()
>>> space_info.action_space.name
'MIXED'

sample_existing_belief_node(belief_node, action_node)[source]

Return type:

Tuple[BeliefNode, float]

Parameters:

belief_node (BeliefNode)
action_node (ActionNode)

POMDPPlanners.planners.mcts_planners.pomcp module

POMCP (Partially Observable Monte Carlo Planning) Algorithm Implementation.

This module implements POMCP, a Monte Carlo Tree Search algorithm for POMDP planning. POMCP builds a search tree by iteratively sampling trajectories and using UCB1 for action selection, providing an efficient approximation to optimal POMDP planning.

The algorithm works by: 1. Building a tree of belief-action nodes through Monte Carlo simulations 2. Using UCB1 (Upper Confidence Bounds) for action selection during tree traversal 3. Performing random rollouts from leaf nodes to estimate values 4. Updating node statistics (visit counts, Q-values) based on simulation returns

Key features: - Handles large or continuous observation spaces through particle filtering - Uses UCB1 for principled exploration-exploitation balance - Can be configured with time limits or simulation count limits - Provides theoretical convergence guarantees to optimal policy

Reference:: Silver, D., & Veness, J. (2010). Monte-Carlo Planning in Large POMDPs. Advances in Neural Information Processing Systems, 23. https://papers.nips.cc/paper_files/paper/2010/hash/edfbe1afcf9246bb0d40eb4d8027d90f-Abstract.html
Classes:: POMCP: Monte Carlo Tree Search planner for POMDPs with UCB1 action selection

class POMDPPlanners.planners.mcts_planners.pomcp.POMCP(environment, discount_factor, depth, exploration_constant, name, time_out_in_seconds=None, n_simulations=None, log_path=None, debug=False, use_queue_logger=False)[source]

Bases: PathSimulationPolicy

POMCP (Partially Observable Monte Carlo Planning) algorithm.

POMCP is a Monte Carlo Tree Search algorithm for POMDP planning that combines UCB1 action selection with particle filtering to handle continuous observation spaces. It builds a search tree through repeated simulations and provides theoretical convergence guarantees.

The algorithm uses UCB1 (Upper Confidence Bounds) to balance exploration and exploitation when selecting actions during tree search. It maintains belief states using particle filters and performs random rollouts to estimate values at leaf nodes.

Parameters:

environment (DiscreteActionsEnvironment)
discount_factor (float)
depth (int)
exploration_constant (float)
name (str)
time_out_in_seconds (int | None)
n_simulations (int | None)
log_path (Path | None)
debug (bool)
use_queue_logger (bool)

environment: The POMDP environment to plan for

discount_factor: Discount factor for future rewards (0 < γ ≤ 1)

depth: Maximum search depth for tree expansion

exploration_constant: UCB1 exploration parameter (higher = more exploration)

timeout_in_seconds: Time limit for planning (mutually exclusive with n_simulations)

n_simulations: Number of simulations to run (mutually exclusive with timeout)

Note

In the original POMCP paper, the belief structure used was an unweighted particle belief that can be found in POMDPPlanners.core.belief.UnweightedParticleBelief. However, in this implementation, we keep the belief structure abstract to allow users to choose their preferred belief representation. In the usage example below, a weighted particle belief is used via the POMDPPlanners.core.belief.get_initial_belief() function.

Example

>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(
...     environment=tiger,
...     discount_factor=0.95,
...     depth=5,
...     exploration_constant=1.0,
...     name="ExamplePlanner",
...     n_simulations=10
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = POMCP.get_space_info()
>>> space_info.action_space.name
'DISCRETE'

get_explored_action_node(belief_node)[source]

Return type:: ActionNode
Parameters:: belief_node (BeliefNode)

classmethod get_space_info()[source]

Get space type requirements for this policy class.

This class method specifies what types of action and observation spaces this policy implementation can handle, enabling compatibility checking with environments.

Return type:: PolicySpaceInfo
Returns:: PolicySpaceInfo specifying required action and observation space types

Note

Subclasses must implement this method to declare their space compatibility. This is used for validation when pairing policies with environments.

random_rollout(state, depth)[source]

Return type:

float

Parameters:

state (Any)
depth (int)

update_nodes(belief_node, action_node, return_sample, state)[source]

Parameters:

belief_node (BeliefNode)
action_node (ActionNode)
return_sample (float)
state (Any)

POMDPPlanners.planners.mcts_planners.pomcp_dpw module

POMCP_DPW (Partially Observable Monte Carlo Planning with Double Progressive Widening) Algorithm.

This module implements POMCP_DPW, an advanced Monte Carlo Tree Search algorithm for POMDP planning that extends POMCP with double progressive widening capabilities. POMCP_DPW combines UCB1 action selection with progressive widening for both actions and observations, making it particularly effective for problems with large or continuous action spaces.

Key features: - Double progressive widening for actions and observations - Unweighted particle-based belief representation (following POMCP tradition) - UCB1-based exploration-exploitation balance - Handles continuous and discrete action spaces - Adaptive observation node expansion

The algorithm progressively expands the tree by: 1. Using action progressive widening to add new actions based on visit counts and α_a parameter 2. Using observation progressive widening to add new observation branches based on k_o and α_o 3. Maintaining unweighted particle beliefs in observation nodes (as per POMCP) 4. Balancing exploration of new actions with exploitation of promising ones 5. Performing random rollouts from leaf nodes for value estimation

Reference:: Sunberg, Z. N., & Kochenderfer, M. J. (2018). Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces. Proceedings of the International Conference on Automated Planning and Scheduling, 28(1), 259-263. https://ojs.aaai.org/index.php/ICAPS/article/view/13882
Classes:: POMCP_DPW: Monte Carlo Tree Search planner with double progressive widening extending POMCP

class POMDPPlanners.planners.mcts_planners.pomcp_dpw.POMCP_DPW(environment, discount_factor, depth, exploration_constant, k_o, k_a, alpha_o, alpha_a, name, action_sampler, time_out_in_seconds=None, n_simulations=None, min_visit_count_per_action=1, log_path=None, debug=False, use_queue_logger=False)[source]

Bases: DoubleProgressiveWideningMCTSPolicy

POMCP_DPW (Partially Observable Monte Carlo Planning with Double Progressive Widening) Algorithm.

POMCP_DPW is an advanced Monte Carlo Tree Search algorithm for POMDP planning that extends POMCP with double progressive widening. It combines UCB1 action selection with progressive widening for both actions and observations, making it particularly effective for problems with large or continuous action spaces.

Algorithm Overview: The algorithm operates through double progressive expansion: 1. Action Progressive Widening: Gradually adds new actions based on visit counts and α_a 2. Observation Progressive Widening: Gradually adds new observation branches based on k_o and α_o 3. Unweighted Particle Beliefs: Maintains unweighted particle representations in observation nodes (POMCP tradition) 4. UCB1 Exploration: Balances exploration of new actions with exploitation using UCB1 5. Random Rollouts: Estimates values from leaf nodes using random simulations

Key Features: - Handles continuous and discrete action spaces through ActionSampler interface - Uses double progressive widening to manage tree growth - Maintains unweighted particle beliefs for efficient belief approximation (following POMCP tradition) - Balances exploration of new actions with exploitation of promising ones - Supports configurable progressive widening parameters

Progressive Widening Parameters: - k_a, α_a: Control action progressive widening (new actions added when ⌊n^α_a⌋ > ⌊(n-1)^α_a⌋) - k_o, α_o: Control observation progressive widening (max observations ≤ k_o * n^α_o)

Parameters:

environment (Environment)
discount_factor (float)
depth (int)
exploration_constant (float)
k_o (float)
k_a (float)
alpha_o (float)
alpha_a (float)
name (str)
action_sampler (ActionSampler)
time_out_in_seconds (int | None)
n_simulations (int | None)
min_visit_count_per_action (int)
log_path (Path | None)
debug (bool)
use_queue_logger (bool)

environment: The POMDP environment to plan for

discount_factor: Discount factor for future rewards (0 < γ ≤ 1)

depth: Maximum search depth for tree expansion

exploration_constant: UCB1 exploration parameter (higher = more exploration)

k_o: Observation progressive widening coefficient

k_a: Action progressive widening coefficient

alpha_o: Observation progressive widening exponent

alpha_a: Action progressive widening exponent

action_sampler: Action sampling strategy for progressive widening

time_out_in_seconds: Time limit for planning (mutually exclusive with n_simulations)

n_simulations: Number of simulations to run (mutually exclusive with timeout)

log_path: Optional path for logging policy execution

debug: Enable debug logging if True

Example

>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> from POMDPPlanners.utils.action_samplers import DiscreteActionSampler
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> action_sampler = DiscreteActionSampler(tiger.get_actions())
>>> planner = POMCP_DPW(
...     environment=tiger,
...     discount_factor=0.95,
...     depth=5,
...     exploration_constant=1.0,
...     k_o=3.0,
...     k_a=3.0,
...     alpha_o=0.5,
...     alpha_a=0.5,
...     action_sampler=action_sampler,
...     n_simulations=10,
...     name="ExamplePlanner"
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = POMCP_DPW.get_space_info()
>>> space_info.action_space.name
'MIXED'

POMDPPlanners.planners.mcts_planners.pomcpow module

POMCPOW (Partially Observable Monte Carlo Planning with Optimistic Weights) Algorithm.

This module implements POMCPOW, an advanced Monte Carlo Tree Search algorithm for POMDP planning that extends POMCP with double progressive widening capabilities. POMCPOW combines UCB1 action selection with progressive widening for both actions and observations, making it particularly effective for problems with large or continuous action spaces.

Key features: - Double progressive widening for actions and observations - Weighted particle-based belief representation - UCB1-based exploration-exploitation balance - Handles continuous and discrete action spaces - Adaptive observation node expansion

The algorithm progressively expands the tree by: 1. Using action progressive widening to add new actions based on visit counts and α_a parameter 2. Using observation progressive widening to add new observation branches based on k_o and α_o 3. Maintaining weighted particle beliefs in observation nodes 4. Balancing exploration of new actions with exploitation of promising ones 5. Performing random rollouts from leaf nodes for value estimation

Reference:: Sunberg, Z. N., & Kochenderfer, M. J. (2018). Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces. Proceedings of the International Conference on Automated Planning and Scheduling, 28(1), 259-263. https://ojs.aaai.org/index.php/ICAPS/article/view/13882
Classes:: POMCPOW: Monte Carlo Tree Search planner with double progressive widening

class POMDPPlanners.planners.mcts_planners.pomcpow.POMCPOW(environment, discount_factor, depth, exploration_constant, k_o, k_a, alpha_o, alpha_a, name, action_sampler, time_out_in_seconds=None, n_simulations=None, min_visit_count_per_action=1, log_path=None, debug=False, use_queue_logger=False)[source]

Bases: DoubleProgressiveWideningMCTSPolicy

POMCPOW (Partially Observable Monte Carlo Planning with Optimistic Weights) Algorithm.

POMCPOW is an advanced Monte Carlo Tree Search algorithm for POMDP planning that extends POMCP with double progressive widening. It combines UCB1 action selection with progressive widening for both actions and observations, making it particularly effective for problems with large or continuous action spaces.

Algorithm Overview: The algorithm operates through double progressive expansion: 1. Action Progressive Widening: Gradually adds new actions based on visit counts and α_a 2. Observation Progressive Widening: Gradually adds new observation branches based on k_o and α_o 3. Weighted Particle Beliefs: Maintains weighted particle representations in observation nodes 4. UCB1 Exploration: Balances exploration of new actions with exploitation using UCB1 5. Random Rollouts: Estimates values from leaf nodes using random simulations

Key Features: - Handles continuous and discrete action spaces through ActionSampler interface - Uses double progressive widening to manage tree growth - Maintains weighted particle beliefs for efficient belief approximation - Balances exploration of new actions with exploitation of promising ones - Supports configurable progressive widening parameters

Progressive Widening Parameters: - k_a, α_a: Control action progressive widening (new actions added when ⌊n^α_a⌋ > ⌊(n-1)^α_a⌋) - k_o, α_o: Control observation progressive widening (max observations ≤ k_o * n^α_o)

Parameters:

environment (Environment)
discount_factor (float)
depth (int)
exploration_constant (float)
k_o (float)
k_a (float)
alpha_o (float)
alpha_a (float)
name (str)
action_sampler (ActionSampler)
time_out_in_seconds (int | None)
n_simulations (int | None)
min_visit_count_per_action (int)
log_path (Path | None)
debug (bool)
use_queue_logger (bool)

environment: The POMDP environment to plan for

discount_factor: Discount factor for future rewards (0 < γ ≤ 1)

depth: Maximum search depth for tree expansion

exploration_constant: UCB1 exploration parameter (higher = more exploration)

k_o: Observation progressive widening coefficient

k_a: Action progressive widening coefficient

alpha_o: Observation progressive widening exponent

alpha_a: Action progressive widening exponent

action_sampler: Action sampling strategy for progressive widening

time_out_in_seconds: Time limit for planning (mutually exclusive with n_simulations)

n_simulations: Number of simulations to run (mutually exclusive with timeout)

log_path: Optional path for logging policy execution

debug: Enable debug logging if True

Example

>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> from POMDPPlanners.utils.action_samplers import DiscreteActionSampler
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> action_sampler = DiscreteActionSampler(tiger.get_actions())
>>> planner = POMCPOW(
...     environment=tiger,
...     discount_factor=0.95,
...     depth=5,
...     exploration_constant=1.0,
...     k_o=3.0,
...     k_a=3.0,
...     alpha_o=0.5,
...     alpha_a=0.5,
...     action_sampler=action_sampler,
...     n_simulations=10,
...     name="ExamplePlanner"
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = POMCPOW.get_space_info()
>>> space_info.action_space.name
'MIXED'

POMDPPlanners.planners.mcts_planners.sparse_pft module

class POMDPPlanners.planners.mcts_planners.sparse_pft.SparsePFT(environment, discount_factor, gamma, depth, c_ucb, beta_ucb, belief_child_num, time_out_in_seconds=None, n_simulations=None, name='SparsePFT', log_path=None, debug=False, use_queue_logger=False)[source]

Bases: PathSimulationPolicy

Sparse-PFT (Sparse Progressive Function Transfer) Algorithm for POMDP Planning.

Sparse-PFT combines the efficiency of sparse sampling with progressive function transfer and Monte Carlo Tree Search for POMDP planning. It addresses the curse of dimensionality by limiting the number of children per belief-action node while using sophisticated exploration strategies to guide tree construction.

Algorithm Overview: The algorithm operates by: 1. Sparse Branching: Limits each action node to a fixed number of belief children 2. Progressive Selection: Uses modified UCB to balance exploration and exploitation 3. Adaptive Sampling: Samples existing children or generates new ones based on capacity 4. Random Rollouts: Estimates values from leaf nodes using random simulations

Key Features: - Sparse Tree Structure: Controls memory usage by limiting belief children per action - Enhanced UCB: Uses modified UCB formula with beta parameter for better exploration - Efficient Sampling: Balances between exploring existing branches and generating new ones - Discrete Actions: Optimized for discrete action spaces with discrete or mixed observations - Terminal State Handling: Properly detects when all particles reach terminal states

Mathematical Foundation: The algorithm uses a modified UCB selection criterion:

UCB(s,a) = Q(s,a) + c_ucb * beta_ucb * N(s) * (1/√N(s,a))

Where: - Q(s,a): Action-value estimate - c_ucb: Base exploration constant - beta_ucb: Additional exploration parameter - N(s): Visit count of belief node - N(s,a): Visit count of action node

Parameters:

environment (DiscreteActionsEnvironment)
discount_factor (float)
gamma (float)
depth (int)
c_ucb (float)
beta_ucb (float)
belief_child_num (int)
time_out_in_seconds (int | None)
n_simulations (int | None)
name (str)
log_path (Path | None)
debug (bool)
use_queue_logger (bool)

environment: The discrete-action POMDP environment for planning

discount_factor: Discount factor for future rewards (0 < γ ≤ 1)

gamma: Alternative discount parameter for value computation

depth: Maximum search depth for tree expansion

c_ucb: Base exploration constant for UCB formula

beta_ucb: Additional exploration parameter for enhanced UCB

belief_child_num: Maximum number of belief children per action node

n_simulations: Number of MCTS simulations to perform

Example

>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> planner = SparsePFT(
...     environment=tiger,
...     discount_factor=0.95,
...     gamma=0.95,
...     depth=5,
...     c_ucb=1.0,
...     beta_ucb=2.0,
...     belief_child_num=3,
...     n_simulations=10,
...     name="ExamplePlanner"
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = SparsePFT.get_space_info()
>>> space_info.action_space.name
'DISCRETE'

get_explored_action_node(belief_node)[source]

Return type:: ActionNode
Parameters:: belief_node (BeliefNode)

classmethod get_space_info()[source]

Get space type requirements for this policy class.

This class method specifies what types of action and observation spaces this policy implementation can handle, enabling compatibility checking with environments.

Return type:: PolicySpaceInfo
Returns:: PolicySpaceInfo specifying required action and observation space types

Note

Subclasses must implement this method to declare their space compatibility. This is used for validation when pairing policies with environments.

random_rollout(state, depth)[source]

Return type:

float

Parameters:

state (Any)
depth (int)

update_nodes(belief_node, action_node, return_sample)[source]

Parameters:

belief_node (BeliefNode)
action_node (ActionNode)
return_sample (float)