POMDPPlanners.planners.sparse_sampling_planners package

Submodules

POMDPPlanners.planners.sparse_sampling_planners.icvar_sparse_sampling module

ICVaR Sparse Sampling POMDP Planning Algorithm Implementation.

This module implements a risk-sensitive variant of the sparse sampling algorithm for POMDP planning. Instead of using the expected value (mean) for Bellman backups, it uses the Conditional Value at Risk (CVaR) to focus on the worst alpha-fraction of outcomes.

Reference:

Pariente, Y., & Indelman, V. (2026). Online Risk-Averse Planning in POMDPs Using Iterated CVaR Value Function. arXiv preprint arXiv:2601.20554. https://arxiv.org/abs/2601.20554

Classes:

ICVaRSparseSampling: Risk-sensitive sparse sampling with CVaR-based value updates

class POMDPPlanners.planners.sparse_sampling_planners.icvar_sparse_sampling.ICVaRSparseSampling(environment, branching_factor, depth, alpha, name='ICVaRSparseSampling')[source]

Bases: SparseSamplingDiscreteActionsPlanner

Risk-sensitive sparse sampling planner using CVaR for value backups.

This planner extends the standard sparse sampling algorithm by replacing the expected value (mean) in the Q-value computation with the Conditional Value at Risk (CVaR). CVaR focuses on the worst alpha-fraction of outcomes, making the planner risk-sensitive.

The standard Q-value update uses:

Q = immediate_cost + gamma * mean(child_v_values)

The ICVaR variant replaces this with:

Q = immediate_cost + gamma * CVaR_alpha(child_v_values)

Parameters:
alpha

CVaR confidence level (0 < alpha <= 1). Lower alpha makes the planner more risk-sensitive (it focuses on worse outcomes); alpha=1.0 recovers the standard expected value.
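The CVaR backup can be illustrated with a small, self-contained sketch. The `cvar` helper below is illustrative only, written under the cost formulation used by this planner (the "worst" outcomes are the highest costs); it is not necessarily the module's internal implementation:

```python
import numpy as np

def cvar(values, alpha):
    """CVaR of a cost sample: the mean of the worst (highest-cost)
    alpha-fraction of outcomes."""
    sorted_desc = np.sort(np.asarray(values))[::-1]  # highest costs first
    k = max(1, int(np.ceil(alpha * len(sorted_desc))))  # number of samples kept
    return sorted_desc[:k].mean()

samples = np.array([1.0, 2.0, 3.0, 10.0])
cvar(samples, 1.0)   # alpha = 1.0 recovers the plain mean
cvar(samples, 0.25)  # alpha = 0.25 keeps only the single worst outcome
```

With alpha=1.0 the full sample is averaged, recovering the standard expected-value backup; as alpha shrinks, the backup is driven by an ever smaller set of worst-case children.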

Example

>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and risk-sensitive planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> planner = ICVaRSparseSampling(
...     environment=tiger,
...     branching_factor=2,
...     depth=2,
...     alpha=0.3,
...     name="ICVaRPlanner"
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ICVaRPlanner'
>>> planner.alpha
0.3
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = ICVaRSparseSampling.get_space_info()
>>> space_info.action_space.name
'DISCRETE'
classmethod get_info_variable_names()[source]

Get names of policy info variables.

Return type:

List[str]

Returns:

Empty list as this planner produces no info variables.

classmethod get_space_info()[source]

Get space type requirements for this policy class.

This class method specifies what types of action and observation spaces this policy implementation can handle, enabling compatibility checking with environments.

Return type:

PolicySpaceInfo

Returns:

PolicySpaceInfo specifying required action and observation space types

Note

Subclasses must implement this method to declare their space compatibility. This is used for validation when pairing policies with environments.

POMDPPlanners.planners.sparse_sampling_planners.sparse_sampling module

Sparse Sampling POMDP Planning Algorithm Implementation.

This module implements the sparse sampling algorithm for POMDP planning, which builds a finite-depth lookahead tree by sampling a limited number of outcomes at each node. The algorithm provides theoretical guarantees on the quality of the computed policy.

The sparse sampling approach works by:

1. Building a finite-depth tree from the current belief
2. Sampling a fixed number of next states and observations at each node
3. Computing value estimates using dynamic programming
4. Selecting the action with the best estimated value
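The four steps above can be sketched as a short recursion. For brevity this sketch plans over states with an assumed generative model `step(state, action) -> (next_state, cost)` rather than over beliefs, and the function name is illustrative, not part of this module:

```python
def sparse_sampling_q(step, actions, state, depth, C, gamma):
    """Estimate Q-values at `state` by sampling C successors per action.

    `step(state, action)` is an assumed generative model returning
    (next_state, cost). Cost formulation: V is the minimum Q over actions.
    """
    if depth == 0:
        return {a: 0.0 for a in actions}  # leaf: no lookahead value
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(C):  # branching factor controls tree width
            s2, cost = step(state, a)
            child_q = sparse_sampling_q(step, actions, s2, depth - 1, C, gamma)
            total += cost + gamma * min(child_q.values())
        q[a] = total / C  # standard (mean) Bellman backup
    return q

# Toy deterministic model: action 0 costs 1, action 1 costs 2, state unchanged
step = lambda s, a: (s, 1.0 if a == 0 else 2.0)
q = sparse_sampling_q(step, [0, 1], "s0", depth=2, C=2, gamma=0.95)
```

The action with the minimum estimated Q-value (here action 0) would be selected, matching step 4 above.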

Reference:

Kearns, M., Mansour, Y., & Ng, A. Y. (2002). A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes. Machine Learning, 49, 193-208. https://link.springer.com/article/10.1023/A:1017932429737

Classes:

BaseSparseSamplingDiscreteActionsPlanner: Abstract base class for sparse sampling algorithms
SparseSamplingDiscreteActionsPlanner: Concrete implementation with standard value updates

class POMDPPlanners.planners.sparse_sampling_planners.sparse_sampling.BaseSparseSamplingDiscreteActionsPlanner(environment, branching_factor, depth, resampling=False, name='BaseSparseSamplingDiscreteActionsPlanner', log_path=None, debug=False)[source]

Bases: Policy, ABC

Abstract base class for sparse sampling POMDP planners.

This class implements the core sparse sampling algorithm for POMDP planning. It builds a finite-depth lookahead tree by sampling a limited number of outcomes at each node, providing theoretical guarantees on policy quality.

The algorithm works by building a tree where:

- Each belief node represents a belief state
- Each action node represents taking an action from a belief
- The tree depth is limited to control computational complexity
- Value estimates are computed using dynamic programming

Parameters:
environment

The POMDP environment to plan for

branching_factor

Number of samples at each node (controls tree width)

depth

Maximum planning depth (controls tree height)

resampling

Whether to resample particles during belief updates

Note

This is an abstract base class and cannot be instantiated directly. Subclasses must implement the value update methods for leaf and non-leaf nodes.
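A subclass might look like the following sketch. Since the actual hook names are not shown in this reference, `leaf_value` and `backup_value` are hypothetical placeholders for the leaf and non-leaf value-update methods, and the stand-in base class is defined locally just to make the sketch runnable:

```python
from abc import ABC, abstractmethod
import numpy as np

class _BaseSparseSamplingSketch(ABC):
    """Local stand-in for BaseSparseSamplingDiscreteActionsPlanner.

    Method names are hypothetical; consult the source for the real hooks.
    """

    @abstractmethod
    def leaf_value(self, immediate_cost):
        """Value estimate at a leaf (maximum-depth) node."""

    @abstractmethod
    def backup_value(self, immediate_cost, gamma, child_v_values):
        """Q-value backup at a non-leaf action node."""

class _MeanBackupSketch(_BaseSparseSamplingSketch):
    """Standard value updates: leaf = immediate cost, backup = mean."""

    def leaf_value(self, immediate_cost):
        return immediate_cost

    def backup_value(self, immediate_cost, gamma, child_v_values):
        return immediate_cost + gamma * float(np.mean(child_v_values))

planner = _MeanBackupSketch()
```

A risk-sensitive subclass such as ICVaRSparseSampling would override only the non-leaf backup, swapping the mean for a CVaR over the child values.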

action(belief)[source]

Select action(s) based on the current belief state.

This is the core method that implements the policy’s decision-making logic. It takes a belief state and returns the selected action(s) along with execution information and performance metrics.

Parameters:

belief (Belief) – Current belief state representing uncertainty over states

Returns:

  • List of selected actions (typically single action, but supports multiple)

  • PolicyRunData with execution metrics and performance information

Return type:

Tuple[List[Any], PolicyRunData]

Note

Subclasses must implement this method with their specific planning or decision-making algorithm.

classmethod get_info_variable_names()[source]

Get names of policy info variables.

Sparse sampling planner does not produce any info variables.

Return type:

List[str]

Returns:

Empty list as this planner produces no info variables

classmethod get_space_info()[source]

Get space type requirements for this policy class.

This class method specifies what types of action and observation spaces this policy implementation can handle, enabling compatibility checking with environments.

Return type:

PolicySpaceInfo

Returns:

PolicySpaceInfo specifying required action and observation space types

Note

Subclasses must implement this method to declare their space compatibility. This is used for validation when pairing policies with environments.

class POMDPPlanners.planners.sparse_sampling_planners.sparse_sampling.SparseSamplingDiscreteActionsPlanner(environment, branching_factor, depth, name='SparseSamplingDiscreteActionsPlanner')[source]

Bases: BaseSparseSamplingDiscreteActionsPlanner

Standard implementation of sparse sampling for POMDP planning.

This concrete implementation of sparse sampling uses standard value updates:

- Q-values for actions are computed as immediate cost plus discounted future value
- V-values for beliefs are computed as the minimum Q-value over actions (cost formulation)
- Leaf nodes use only immediate cost estimates

The algorithm provides theoretical guarantees: with probability 1-δ, the computed policy is ε-optimal, where ε decreases with increasing depth and branching factor.
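The flip side of these guarantees is cost: the tree contains on the order of (|A| * C)^d action expansions for |A| actions, branching factor C, and depth d, so depth and branching factor trade accuracy against runtime. A back-of-the-envelope sketch (not part of the module):

```python
def tree_size(num_actions, branching_factor, depth):
    """Total action-node expansions in a sparse sampling tree:
    level k contributes (num_actions * branching_factor) ** k expansions."""
    width = num_actions * branching_factor
    return sum(width ** k for k in range(1, depth + 1))

# Tiger POMDP-like setting: 3 actions, branching_factor=2, depth=2
tree_size(3, 2, 2)  # levels contribute 6 + 36 = 42 expansions
```

The exponential growth in depth is why sparse sampling is typically run with small depths, and why the per-node sampling (rather than full enumeration of successors) is essential in large state spaces.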

Example

>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> planner = SparseSamplingDiscreteActionsPlanner(
...     environment=tiger,
...     branching_factor=2,
...     depth=2,
...     name="ExamplePlanner"
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = SparseSamplingDiscreteActionsPlanner.get_space_info()
>>> space_info.action_space.name
'DISCRETE'
Parameters: