POMDPPlanners.planners.open_loop_planners package

Submodules

POMDPPlanners.planners.open_loop_planners.discrete_action_sequences_planner module

class POMDPPlanners.planners.open_loop_planners.discrete_action_sequences_planner.DiscreteActionSequencesPlanner(environment, discount_factor, name, depth, n_return_samples, log_path=None, debug=False, use_queue_logger=False)[source]

Bases: Policy

Open-loop planner for discrete action spaces using exhaustive sequence search.

This planner uses an open-loop strategy to find optimal action sequences by enumerating all possible action sequences up to a specified depth and selecting the sequence with the highest expected return. It’s particularly useful for problems with small action spaces and short planning horizons.

The algorithm works by:

  1. Generating all possible action sequences of the specified depth

  2. For each sequence, estimating the expected return through Monte Carlo sampling

  3. Selecting the sequence with the maximum expected return

  4. Returning the first action in the optimal sequence

Open-Loop vs Closed-Loop Planning:

  • Open-loop: Plans a complete action sequence without considering future observations

  • Closed-loop: Re-plans at each step based on new observations (like MCTS algorithms)

This approach is computationally intensive (O(|A|^depth)) but provides optimal solutions for the open-loop setting when the action space is manageable.
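The enumeration step and its O(|A|^depth) cost can be sketched with `itertools.product` (a minimal illustration with hypothetical helper names; the planner's internal helpers may differ):

```python
import itertools

def enumerate_sequences(actions, depth):
    """All action sequences of the given depth: |A|**depth tuples, i.e. O(|A|^depth)."""
    return list(itertools.product(actions, repeat=depth))

# A Tiger-style action set: 3 actions at depth 2 gives 3**2 = 9 candidate sequences.
sequences = enumerate_sequences(["listen", "open-left", "open-right"], depth=2)
print(len(sequences))   # 9
print(sequences[0])     # ('listen', 'listen')
```

Even a modest action space grows quickly: depth 5 with 3 actions already yields 243 sequences, which is why this planner targets small action spaces and short horizons.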

Parameters:
  • environment (DiscreteActionsEnvironment) – The discrete actions POMDP environment

  • discount_factor (float) – Discount factor for future rewards (0 < γ ≤ 1)

  • name (str) – Identifier for the planner instance

  • depth (int) – Planning horizon (number of actions in sequence)

  • n_return_samples (int) – Number of Monte Carlo samples for return estimation

  • log_path (Optional[Path]) – Optional path for logging planner execution details

  • debug (bool) – Enable debug mode for detailed execution traces

  • use_queue_logger (bool)

Example

>>> import numpy as np
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Create environment and planner
>>> tiger = TigerPOMDP(discount_factor=0.95)
>>> planner = DiscreteActionSequencesPlanner(
...     environment=tiger,
...     discount_factor=0.95,
...     name="ExamplePlanner",
...     depth=2,
...     n_return_samples=10
... )
>>>
>>> # Basic planner interface usage
>>> planner.name
'ExamplePlanner'
>>>
>>> # Action selection from belief
>>> initial_belief = get_initial_belief(tiger, n_particles=10)
>>> actions, run_data = planner.action(initial_belief)
>>>
>>> # Planner space information
>>> space_info = DiscreteActionSequencesPlanner.get_space_info()
>>> space_info.action_space.name
'DISCRETE'

action(belief)[source]

Select action(s) based on the current belief state.

This is the core method that implements the policy’s decision-making logic. It takes a belief state and returns the selected action(s) along with execution information and performance metrics.

Parameters:

belief (Belief) – Current belief state representing uncertainty over states

Returns:

  • List of selected actions (typically single action, but supports multiple)

  • PolicyRunData with execution metrics and performance information

Return type:

Tuple[List[Any], PolicyRunData]

Note

Subclasses must implement this method with their specific planning or decision-making algorithm.

estimate_return(action_sequence, belief)[source]

Estimate the expected discounted return of an action sequence via Monte Carlo sampling from the given belief.

Parameters:

  • action_sequence – Sequence of actions to evaluate

  • belief (Belief) – Current belief state to sample starting states from

Return type:

float

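Conceptually, the return estimate is an average of discounted rollout returns over n_return_samples simulations. A minimal sketch (hypothetical helper names, not the planner's actual internals):

```python
def discounted_return(rewards, discount_factor):
    """Discounted sum of a reward trajectory: sum over t of gamma**t * r_t."""
    return sum(discount_factor ** t * r for t, r in enumerate(rewards))

def mc_estimate(sample_rollout, n_return_samples, discount_factor):
    """Average the discounted return over n_return_samples simulated rollouts."""
    returns = [discounted_return(sample_rollout(), discount_factor)
               for _ in range(n_return_samples)]
    return sum(returns) / n_return_samples

# Deterministic toy rollout with rewards [1, 1] and gamma = 0.5:
# return = 1 + 0.5 * 1 = 1.5, so the average over samples is also 1.5.
est = mc_estimate(lambda: [1.0, 1.0], n_return_samples=4, discount_factor=0.5)
print(est)  # 1.5
```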
classmethod get_info_variable_names()[source]

Get names of policy info variables.

Discrete action sequences planner does not produce any info variables.

Return type:

List[str]

Returns:

Empty list as this planner produces no info variables

classmethod get_space_info()[source]

Get space type requirements for this policy class.

This class method specifies what types of action and observation spaces this policy implementation can handle, enabling compatibility checking with environments.

Return type:

PolicySpaceInfo

Returns:

PolicySpaceInfo specifying required action and observation space types

Note

Subclasses must implement this method to declare their space compatibility. This is used for validation when pairing policies with environments.

search(belief)[source]

Search over the enumerated action sequences and return the one with the highest estimated return from the given belief.

Parameters:

belief (Belief)

Return type:

Any
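In essence, the search reduces to an argmax over the enumerated candidate sequences. A hypothetical sketch (names and signatures are illustrative, not the class's actual internals):

```python
def search_best_sequence(sequences, estimate):
    """Return the candidate sequence with the highest estimated return (argmax)."""
    return max(sequences, key=estimate)

# Toy value table standing in for Monte Carlo estimates.
values = {("a",): 0.1, ("b",): 0.9, ("c",): 0.5}
best = search_best_sequence(list(values), estimate=values.get)
print(best)  # ('b',)
```

In the planner, `action()` would then return the first action of the best sequence, discarding the rest of the plan at the next replanning step.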