Environments

POMDP environments define the problem structure: states, actions, observations, transitions, and rewards. POMDPPlanners provides both classic benchmark problems and more challenging modern environments.

Core Environment Types

POMDPPlanners groups its built-in environments into four categories:

  • Classic Benchmark Problems

  • Control & Navigation

  • Manipulation

  • Information Gathering & Pursuit

Environment Interface

All environments inherit from the base Environment class:

class POMDPPlanners.core.environment.Environment(discount_factor, name, space_info, reward_range=None, output_dir=None, debug=False, use_queue_logger=False)[source]

Bases: ABC

Abstract base class for POMDP environments.

This is the core abstract class that all POMDP environments must inherit from. It defines the essential interface for POMDP environments including state transitions, observations, rewards, and terminal conditions.

Note

This is an abstract base class and cannot be instantiated directly. Subclasses must implement all abstract methods.

Parameters:
  • discount_factor – Discount factor for future rewards

  • name – Environment identifier string

  • space_info – Information about action and observation space types

  • reward_range – Optional tuple containing (min_reward, max_reward)

  • output_dir – Optional directory for logging output

  • debug – Flag to enable debug logging

cache_visualization(history, cache_path)[source]

Cache visualization data for an episode history.

This method can be overridden by subclasses to provide environment-specific visualization caching capabilities.

Parameters:
  • history (List[StepData]) – List of step data from an episode

  • cache_path (Path) – Path where visualization data should be cached

Return type:

None
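
For example, a subclass could persist the raw step data so a separate tool can render the episode later. A minimal sketch, assuming StepData objects are picklable (abstract methods omitted for brevity):

import pickle

from POMDPPlanners.core.environment import Environment

class RenderablePOMDP(Environment):
    # ...abstract methods omitted for brevity...

    def cache_visualization(self, history, cache_path):
        # Persist the episode's step data; a viewer can load and render it later.
        with open(cache_path, "wb") as f:
            pickle.dump(history, f)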

compute_metrics(histories)[source]

Compute environment-specific metrics from episode histories.

This method can be overridden by subclasses to provide custom metric calculations beyond standard return and episode length.

Parameters:

histories (List[History]) – List of episode histories to analyze

Return type:

List[MetricValue]

Returns:

List of computed metrics with confidence intervals

property config_id: str

Generate a deterministic identifier based on environment configuration.

Note

Uses custom serialization logic (not the centralized serialize_value) to ensure:

  • Deterministic dict key ordering for consistent hashing

  • Compact format without __type__ markers

  • Recursive handling of nested objects

Changing this serialization format would invalidate all cached results.
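
Since the identifier depends only on configuration, two environments constructed with identical parameters share it (using the TigerPOMDP example from elsewhere on this page):

>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> TigerPOMDP(discount_factor=0.95).config_id == TigerPOMDP(discount_factor=0.95).config_id
True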

classmethod from_dict(data)[source]

Reconstruct environment from dictionary.

Dynamically imports the environment class and instantiates it with the saved parameters.

Parameters:

data (Dict[str, Any]) – Dictionary containing environment serialization data with keys: class, module, params, config_id

Return type:

Environment

Returns:

Reconstructed environment instance

Raises:
  • ImportError – If environment class cannot be imported

  • ValueError – If required data fields are missing

  • TypeError – If parameters are invalid for environment constructor

Example

>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> env = TigerPOMDP(discount_factor=0.95)
>>> env_dict = env.to_dict()
>>> reconstructed_env = Environment.from_dict(env_dict)
>>> reconstructed_env.discount_factor
0.95

get_metric_names()[source]

Get names of environment-specific metrics.

This method returns the names of custom metrics that this environment computes in the compute_metrics() method. It enables users to discover what metrics are available for hyperparameter optimization.

Return type:

List[str]

Returns:

List of metric names that this environment produces. Default implementation returns empty list for environments without custom metrics.

Note

Subclasses that override compute_metrics() should also override this method to return the names of metrics they produce. Use an Enum to ensure consistency between the names returned here and the names used in compute_metrics().
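
A sketch of that pattern is below; the MetricValue constructor call and the _reached_goal() helper are illustrative assumptions, and the abstract methods are omitted for brevity:

from enum import Enum

from POMDPPlanners.core.environment import Environment

class GoalMetrics(Enum):
    SUCCESS_RATE = "success_rate"

class GoalPOMDP(Environment):
    # ...abstract methods omitted for brevity...

    def get_metric_names(self):
        # Drawing names from the Enum keeps them in sync with compute_metrics()
        return [metric.value for metric in GoalMetrics]

    def compute_metrics(self, histories):
        # _reached_goal() is a hypothetical environment-specific success test.
        successes = sum(1 for h in histories if self._reached_goal(h))
        rate = successes / max(len(histories), 1)
        # The MetricValue construction below assumes its signature; adapt as needed.
        return [MetricValue(name=GoalMetrics.SUCCESS_RATE.value, value=rate)]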

abstractmethod initial_observation_dist()[source]

Get the initial observation distribution.

Return type:

Distribution

Returns:

Distribution over initial observations

Note

Subclasses must implement this method to define initial observations.

abstractmethod initial_state_dist()[source]

Get the initial state distribution.

Return type:

Distribution

Returns:

Distribution over initial states

Note

Subclasses must implement this method to define the starting distribution.

abstractmethod is_equal_observation(observation1, observation2)[source]

Check if two observations are equal.

Parameters:
  • observation1 (Any) – First observation to compare

  • observation2 (Any) – Second observation to compare

Return type:

bool

Returns:

True if observations are considered equal, False otherwise

Note

Subclasses must implement this method to define observation equality. This is particularly important for discrete observation spaces.

abstractmethod is_terminal(state)[source]

Check if a state is terminal.

Parameters:

state (Any) – State to check for terminal condition

Return type:

bool

Returns:

True if the state is terminal, False otherwise

Note

Subclasses must implement this method to define terminal conditions.

property logger: Logger

Get logger instance for this environment.

The logger is implemented as a property to maintain pickle compatibility, as logger objects cannot be pickled directly.

Returns:

Configured logger instance with hierarchical naming

abstractmethod observation_model(next_state, action)[source]

Get the observation model for a given next state and action.

Parameters:
  • next_state (Any) – The resulting state after taking an action

  • action (Any) – The action that was executed

Return type:

ObservationModel

Returns:

Observation model that can sample observations

Note

Subclasses must implement this method to define observation generation.

abstractmethod reward(state, action)[source]

Calculate the immediate reward for a state-action pair.

Parameters:
  • state (Any) – Current state

  • action (Any) – Action executed from the state

Return type:

float

Returns:

Immediate reward value

Note

Subclasses must implement this method to define reward structure.

reward_batch(states, action)[source]

Calculate rewards for a batch of states given a single action.

Provides a loop-based default that subclasses can override with vectorized numpy implementations for better performance.

Parameters:
  • states (Union[ndarray, Sequence[Any]]) – Sequence of states of length N.

  • action (Any) – Action executed from each state.

Return type:

ndarray

Returns:

1-D array of reward values with shape (N,).
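
For instance, an environment whose reward depends only on a scalar numeric state could compute the whole batch in a single array expression. A sketch with an illustrative quadratic state cost and fixed action cost (abstract methods omitted for brevity):

import numpy as np

from POMDPPlanners.core.environment import Environment

class QuadraticCostPOMDP(Environment):
    # ...abstract methods omitted for brevity...

    def reward_batch(self, states, action):
        # One vectorized operation instead of N separate reward() calls.
        states = np.asarray(states, dtype=float)
        return -(states ** 2) - 0.1  # illustrative state cost plus action cost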

sample_next_step(state, action)[source]

Sample a complete state transition step.

This convenience method combines state transition, observation generation, and reward calculation in a single operation.

Parameters:
  • state (Any) – Current state

  • action (Any) – Action to execute

Returns:

  • next_state: Sampled next state

  • next_observation: Sampled observation

  • reward: Immediate reward

Return type:

Tuple[Any, Any, float]
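
Combined with initial_state_dist() and is_terminal(), this supports a compact rollout loop. A sketch, assuming the returned Distribution exposes a sample() method, where env is any concrete Environment and policy is a placeholder mapping states to actions:

state = env.initial_state_dist().sample()  # sample() is an assumed Distribution method
episode_return, discount = 0.0, 1.0
while not env.is_terminal(state):
    action = policy(state)  # placeholder policy
    state, observation, reward = env.sample_next_step(state, action)
    episode_return += discount * reward
    discount *= env.discount_factor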

abstractmethod state_transition_model(state, action)[source]

Get the state transition model for a given state-action pair.

Parameters:
  • state (Any) – Current state

  • action (Any) – Action to be executed

Return type:

StateTransitionModel

Returns:

State transition model that can sample next states

Note

Subclasses must implement this method to define state dynamics.

to_dict()[source]

Serialize environment to dictionary format.

Extracts environment class information and constructor parameters to enable JSON serialization and reconstruction.

Returns:

  • class: Full class path (module.ClassName)

  • module: Module name

  • params: Constructor parameters

  • config_id: Deterministic configuration identifier

Return type:

Dict[str, Any]

Example

>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> env = TigerPOMDP(discount_factor=0.95)
>>> env_dict = env.to_dict()
>>> 'class' in env_dict and 'params' in env_dict
True

Note

Uses centralized serialization system with registered SpaceInfo handler.

Space Types

Environments can have different action and observation space types:

class POMDPPlanners.core.environment.SpaceType(*values)[source]

Enumeration for categorizing action and observation spaces.

This enum is used to classify the mathematical structure of action and observation spaces in POMDP environments.

DISCRETE

Finite, countable spaces (e.g., {0, 1, 2, …})

CONTINUOUS

Real-valued continuous spaces (e.g., R^n)

MIXED

Combination of discrete and continuous elements

CONTINUOUS = 'continuous'
DISCRETE = 'discrete'
MIXED = 'mixed'
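
Because the members are backed by plain strings, they compare and convert predictably:

>>> from POMDPPlanners.core.environment import SpaceType
>>> SpaceType.DISCRETE.value
'discrete'
>>> SpaceType('continuous') is SpaceType.CONTINUOUS
True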

Key Environment Features

State Spaces
  • Discrete states (Tiger, Sanity)

  • Continuous states (CartPole, Mountain Car, Light-Dark)

  • Mixed representations

Action Spaces
  • Discrete actions (Tiger: listen/open doors)

  • Continuous actions (Light-Dark: movement vectors)

  • Hybrid approaches

Observation Models
  • Perfect observability (Sanity POMDP)

  • Noisy observations (Tiger: 85% accuracy)

  • Position-dependent noise (Light-Dark)

  • Sensor noise models (CartPole, Mountain Car)

Reward Structures
  • Goal-reaching rewards

  • Action costs

  • Safety penalties

  • Shaped rewards for learning

Creating Custom Environments

To create a custom environment, inherit from the base Environment class and implement all of its abstract methods:

from POMDPPlanners.core.environment import Environment

class MyCustomPOMDP(Environment):
    def __init__(self, discount_factor=0.95):
        # The base class also requires a name and space_info; the SpaceInfo
        # value below is a placeholder to replace with your space description.
        super().__init__(
            discount_factor=discount_factor,
            name="my_custom_pomdp",
            space_info=...,  # e.g. discrete action and observation spaces
        )

    def get_states(self):
        # Convenience enumeration of discrete states
        # (not part of the abstract interface)
        return ["state1", "state2", "state3"]

    def get_actions(self):
        # Convenience enumeration of discrete actions
        # (not part of the abstract interface)
        return ["action1", "action2"]

    def initial_state_dist(self):
        # Return a Distribution over initial states
        ...

    def initial_observation_dist(self):
        # Return a Distribution over initial observations
        ...

    def state_transition_model(self, state, action):
        # Return a StateTransitionModel that samples next states
        ...

    def observation_model(self, next_state, action):
        # Return an ObservationModel that samples observations
        ...

    def reward(self, state, action):
        # Return the immediate reward for this state-action pair
        return 0.0

    def is_equal_observation(self, observation1, observation2):
        # Plain equality suffices for discrete observation spaces
        return observation1 == observation2

    def is_terminal(self, state):
        # Return whether the state ends the episode
        return False
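
Once the class lives in an importable module (and space_info is a real SpaceInfo rather than the placeholder above), it takes part in the same serialization round-trip documented for to_dict() and from_dict():

env = MyCustomPOMDP(discount_factor=0.9)
restored = Environment.from_dict(env.to_dict())  # re-imports MyCustomPOMDP dynamically
assert restored.discount_factor == 0.9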

See Also

  • ../examples/environments - Environment usage examples

  • Planners - Planning algorithms for these environments

  • ../api/environments - Complete API reference