POMDPPlanners.environments.light_dark_pomdp package
Subpackages
- POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_beliefs package
- ContinuousLightDarkDistanceBasedVectorizedUpdater
- ContinuousLightDarkNoObsInDarkVectorizedUpdater
- ContinuousLightDarkVectorizedUpdater
  - ContinuousLightDarkVectorizedUpdater.state_transition_dist
  - ContinuousLightDarkVectorizedUpdater.obs_dist_near_beacon
  - ContinuousLightDarkVectorizedUpdater.obs_dist_far_from_beacon
  - ContinuousLightDarkVectorizedUpdater.beacons
  - ContinuousLightDarkVectorizedUpdater.beacon_radius
  - ContinuousLightDarkVectorizedUpdater.grid_size
  - ContinuousLightDarkVectorizedUpdater.batch_observation_log_likelihood()
  - ContinuousLightDarkVectorizedUpdater.batch_transition()
  - ContinuousLightDarkVectorizedUpdater.config_id
  - ContinuousLightDarkVectorizedUpdater.from_environment()
- DiscreteLightDarkDistanceBasedVectorizedUpdater
- DiscreteLightDarkNoObsInDarkVectorizedUpdater
- DiscreteLightDarkVectorizedUpdater
  - DiscreteLightDarkVectorizedUpdater.transition_error_prob
  - DiscreteLightDarkVectorizedUpdater.observation_error_prob
  - DiscreteLightDarkVectorizedUpdater.beacons
  - DiscreteLightDarkVectorizedUpdater.beacon_radius
  - DiscreteLightDarkVectorizedUpdater.grid_size
  - DiscreteLightDarkVectorizedUpdater.actions
  - DiscreteLightDarkVectorizedUpdater.action_to_vector
  - DiscreteLightDarkVectorizedUpdater.batch_observation_log_likelihood()
  - DiscreteLightDarkVectorizedUpdater.batch_transition()
  - DiscreteLightDarkVectorizedUpdater.config_id
  - DiscreteLightDarkVectorizedUpdater.from_environment()
- GaussianBeliefUpdaterType
- create_continuous_light_dark_belief()
- create_continuous_light_dark_gaussian_belief()
- create_discrete_light_dark_belief()
- Submodules
- POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_beliefs.continuous_light_dark_belief_factory module
- POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_beliefs.continuous_light_dark_gaussian_beliefs module
- POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_beliefs.continuous_light_dark_vectorized_updater module
- ContinuousLightDarkDistanceBasedVectorizedUpdater
- ContinuousLightDarkNoObsInDarkVectorizedUpdater
- ContinuousLightDarkVectorizedUpdater
  - ContinuousLightDarkVectorizedUpdater.state_transition_dist
  - ContinuousLightDarkVectorizedUpdater.obs_dist_near_beacon
  - ContinuousLightDarkVectorizedUpdater.obs_dist_far_from_beacon
  - ContinuousLightDarkVectorizedUpdater.beacons
  - ContinuousLightDarkVectorizedUpdater.beacon_radius
  - ContinuousLightDarkVectorizedUpdater.grid_size
  - ContinuousLightDarkVectorizedUpdater.batch_observation_log_likelihood()
  - ContinuousLightDarkVectorizedUpdater.batch_transition()
  - ContinuousLightDarkVectorizedUpdater.config_id
  - ContinuousLightDarkVectorizedUpdater.from_environment()
- POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_beliefs.discrete_light_dark_belief_factory module
- POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_beliefs.discrete_light_dark_vectorized_updater module
- DiscreteLightDarkDistanceBasedVectorizedUpdater
- DiscreteLightDarkNoObsInDarkVectorizedUpdater
- DiscreteLightDarkVectorizedUpdater
  - DiscreteLightDarkVectorizedUpdater.transition_error_prob
  - DiscreteLightDarkVectorizedUpdater.observation_error_prob
  - DiscreteLightDarkVectorizedUpdater.beacons
  - DiscreteLightDarkVectorizedUpdater.beacon_radius
  - DiscreteLightDarkVectorizedUpdater.grid_size
  - DiscreteLightDarkVectorizedUpdater.actions
  - DiscreteLightDarkVectorizedUpdater.action_to_vector
  - DiscreteLightDarkVectorizedUpdater.batch_observation_log_likelihood()
  - DiscreteLightDarkVectorizedUpdater.batch_transition()
  - DiscreteLightDarkVectorizedUpdater.config_id
  - DiscreteLightDarkVectorizedUpdater.from_environment()
- POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils package
- Submodules
- POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.base_light_dark_pomdp module
- BaseLightDarkPOMDP
  - BaseLightDarkPOMDP.cache_visualization()
  - BaseLightDarkPOMDP.compute_metrics()
  - BaseLightDarkPOMDP.config_id
  - BaseLightDarkPOMDP.initial_observation_dist()
  - BaseLightDarkPOMDP.initial_state_dist()
  - BaseLightDarkPOMDP.is_equal_observation()
  - BaseLightDarkPOMDP.is_terminal()
  - BaseLightDarkPOMDP.observation_model()
  - BaseLightDarkPOMDP.reward()
  - BaseLightDarkPOMDP.state_transition_model()
  - BaseLightDarkPOMDP.visualize_path()
- BaseLightDarkPOMDPDiscreteActions
- POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models module
- BaseContinuousLightDarkObservationModel
- BaseDiscreteLightDarkObservationModel
  - BaseDiscreteLightDarkObservationModel.beacons
  - BaseDiscreteLightDarkObservationModel.obstacles
  - BaseDiscreteLightDarkObservationModel.beacon_radius
  - BaseDiscreteLightDarkObservationModel.observation_error_prob
  - BaseDiscreteLightDarkObservationModel.actions
  - BaseDiscreteLightDarkObservationModel.action_to_vector
  - BaseDiscreteLightDarkObservationModel.near_beacon
  - BaseDiscreteLightDarkObservationModel.probability()
  - BaseDiscreteLightDarkObservationModel.sample()
- ContinuousLightDarkDistanceBasedObservationModel
- ContinuousLightDarkNormalNoiseNoObsInDarkObservationModel
- ContinuousLightDarkNormalNoiseObservationModel
- DiscreteLDDistanceBasedObservationModel
- DiscreteLDObservationModel
- DiscreteLDObservationModelNoObsInDark
- POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models module
- POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_visualizer module
Submodules
POMDPPlanners.environments.light_dark_pomdp.continuous_light_dark_pomdp module
Continuous Light-Dark POMDP Environment Implementation.
This module implements the continuous Light-Dark domain, a classic POMDP benchmark where an agent must navigate to a goal position in a continuous 2D space while dealing with position-dependent observation noise.
The Continuous Light-Dark POMDP features:
- Continuous 2D state space representing agent position
- Discrete or continuous action space for movement
- Light source at a specific location that affects observation quality
- Observation noise that decreases closer to the light source
- Goal region that agent must reach to maximize reward
- Optional obstacles that cause negative rewards when hit
Key characteristics:
- State: [x, y] position in continuous 2D space
- Actions: Movement vectors or discrete directions
- Observations: Noisy position estimates (noise depends on distance from light)
- Rewards: Goal reaching bonus, movement costs, obstacle penalties
- Multiple reward model variants available
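The distance-dependent observation noise described above can be illustrated with a small standalone sketch. Note that the noise function, its constants, and the linear scaling are illustrative assumptions for exposition; they are not the environment's actual model (which is configured through `observation_cov_matrix` and the beacon geometry):

```python
import numpy as np

def observation_noise_std(state, light_source, base_std=0.05, scale=0.5):
    """Illustrative noise model: the observation standard deviation
    grows linearly with distance from the light source."""
    distance = np.linalg.norm(np.asarray(state, dtype=float) - np.asarray(light_source, dtype=float))
    return base_std + scale * distance

def noisy_observation(state, light_source, rng):
    """Sample a noisy position estimate of the true state."""
    std = observation_noise_std(state, light_source)
    return np.asarray(state, dtype=float) + rng.normal(0.0, std, size=2)

rng = np.random.default_rng(0)
light = (10.0, 5.0)
print(observation_noise_std([9.5, 5.0], light))  # small std near the light
print(observation_noise_std([0.0, 5.0], light))  # large std far from it
obs = noisy_observation([0.0, 5.0], light, rng)  # very uncertain estimate
```

The qualitative behavior is the defining feature of the Light-Dark domain: estimates are sharp in the "light" region and diffuse in the "dark" region.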
- Classes:
  - RewardModelType: Enumeration of available reward model types
  - StateTransitionModel: Continuous movement with Gaussian noise
  - ContinuousLightDarkPOMDP: Main environment class
  - ContinuousLightDarkPOMDPDiscreteActions: Discrete action variant
- class POMDPPlanners.environments.light_dark_pomdp.continuous_light_dark_pomdp.ContinuousLightDarkPOMDP(discount_factor, name='ContinuousLightDarkPOMDP', state_transition_cov_matrix=array([[0.05, 0.], [0., 0.05]]), observation_cov_matrix=array([[0.05, 0.], [0., 0.05]]), beacons=[(0, 0), (0, 5), (0, 10), (5, 0), (5, 5), (5, 10), (10, 0), (10, 5), (10, 10)], goal_state=array([10, 5]), start_state=array([0, 5]), obstacles=[(3, 7), (5, 5)], obstacle_hit_probability=0.2, obstacle_reward=-10.0, goal_reward=10.0, fuel_cost=2.0, grid_size=11, goal_state_radius=1.5, beacon_radius=1.0, obstacle_radius=1.5, reward_model_type=RewardModelType.STANDARD, observation_model_type=ObservationModelType.NORMAL_NOISE, penalty_decay=1.0, is_obstacle_hit_terminal=True)[source]
Bases: BaseLightDarkPOMDP

Continuous Light-Dark POMDP environment with continuous actions.
This environment extends the base Light-Dark problem to continuous 2D space with continuous action vectors. The agent navigates toward a goal while dealing with position-dependent observation noise and optional obstacles.
Key features:
- Continuous 2D state and action spaces
- Light beacons reduce observation noise when nearby
- Multiple observation models available (normal noise, normal noise with no observation in dark)
- Multiple reward models available (standard, decaying hit probability, dangerous states)
- Optional obstacles with configurable hit penalties
- Terminal conditions for goal reaching, obstacle hits, and boundary violations
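The three terminal conditions listed above can be sketched as a standalone check. The function name, the boundary semantics (position must stay inside the `[0, grid_size)` square), and the default radii are illustrative assumptions mirroring the constructor defaults, not the class's actual implementation:

```python
import numpy as np

def is_terminal_sketch(state, goal_state, obstacles, grid_size=11,
                       goal_radius=1.5, obstacle_radius=1.5,
                       is_obstacle_hit_terminal=True):
    """Illustrative terminal check mirroring the documented conditions."""
    state = np.asarray(state, dtype=float)
    # Boundary violation: position left the grid (semantics assumed here)
    if np.any(state < 0) or np.any(state >= grid_size):
        return True
    # Goal reached: within goal_state_radius of the goal
    if np.linalg.norm(state - np.asarray(goal_state, dtype=float)) <= goal_radius:
        return True
    # Obstacle hit: terminal only when configured as such
    if is_obstacle_hit_terminal:
        for obstacle in obstacles:
            if np.linalg.norm(state - np.asarray(obstacle, dtype=float)) <= obstacle_radius:
                return True
    return False

goal = np.array([10.0, 5.0])
obstacles = [(3, 7), (5, 5)]
print(is_terminal_sketch([0.0, 5.0], goal, obstacles))   # False: the start state
print(is_terminal_sketch([9.5, 5.0], goal, obstacles))   # True: inside the goal radius
```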
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> env = ContinuousLightDarkPOMDP(
...     discount_factor=0.95,
...     goal_state=np.array([10, 5]),
...     start_state=np.array([0, 5])
... )
>>>
>>> # Get initial state
>>> initial_state = env.initial_state_dist().sample()[0]
>>>
>>> # Sample complete step (action must be provided based on environment type)
>>> action = np.array([1.0, 0.0])  # Move right
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
- Parameters:
discount_factor (float)
name (str)
state_transition_cov_matrix (ndarray)
observation_cov_matrix (ndarray)
goal_state (ndarray)
start_state (ndarray)
obstacle_hit_probability (float)
obstacle_reward (float)
goal_reward (float)
fuel_cost (float)
grid_size (int)
goal_state_radius (float)
beacon_radius (float)
obstacle_radius (float)
reward_model_type (RewardModelType)
observation_model_type (ObservationModelType)
penalty_decay (float)
is_obstacle_hit_terminal (bool)
- compute_metrics(histories)[source]
Compute environment-specific metrics from episode histories.
This method can be overridden by subclasses to provide custom metric calculations beyond standard return and episode length.
- is_terminal(state)[source]
Check if a state is terminal.
- Parameters:
state (ndarray) – State to check for terminal condition
- Return type:
- Returns:
True if the state is terminal, False otherwise
Note
Subclasses must implement this method to define terminal conditions.
- observation_model(next_state, action)[source]
Get the observation model for a given next state and action.
- Parameters:
- Return type:
- Returns:
Observation model that can sample observations
Note
Subclasses must implement this method to define observation generation.
- reward(state, action)[source]
Calculate the immediate reward for a state-action pair.
- Parameters:
- Return type:
- Returns:
Immediate reward value
Note
Subclasses must implement this method to define reward structure.
- reward_batch(states, action)[source]
Calculate rewards for a batch of states given a single action.
Provides a loop-based default that subclasses can override with vectorized numpy implementations for better performance.
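As a sketch of the kind of vectorized override this default invites, the snippet below computes rewards for a whole batch with numpy broadcasting instead of a Python loop. The function name, the reward terms, and the radii are illustrative, mirroring the constructor defaults rather than the actual implementation:

```python
import numpy as np

def reward_batch_vectorized(states, goal_state, obstacles,
                            goal_reward=10.0, obstacle_reward=-10.0,
                            fuel_cost=2.0, goal_radius=1.5, obstacle_radius=1.5):
    """Vectorized reward over an (N, 2) batch of states for one action."""
    states = np.asarray(states, dtype=float)
    rewards = np.full(len(states), -fuel_cost)  # movement cost for every state
    # Goal bonus wherever the state lies within the goal radius
    in_goal = np.linalg.norm(states - goal_state, axis=1) <= goal_radius
    rewards[in_goal] += goal_reward
    # Obstacle penalty wherever the state lies within any obstacle radius
    obstacles = np.asarray(obstacles, dtype=float)
    dists = np.linalg.norm(states[:, None, :] - obstacles[None, :, :], axis=2)
    rewards[(dists <= obstacle_radius).any(axis=1)] += obstacle_reward
    return rewards

batch = np.array([[10.0, 5.0], [3.0, 7.0], [0.0, 0.0]])
print(reward_batch_vectorized(batch, np.array([10, 5]), [(3, 7), (5, 5)]))
# [  8. -12.  -2.]
```

The `(N, 1, 2) - (1, M, 2)` broadcast computes all state-to-obstacle distances at once, which is the usual way to eliminate the per-state loop.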
- class POMDPPlanners.environments.light_dark_pomdp.continuous_light_dark_pomdp.ContinuousLightDarkPOMDPDiscreteActions(discount_factor, state_transition_cov_matrix=array([[1., 0.], [0., 1.]]), observation_cov_matrix=array([[1., 0.], [0., 1.]]), obstacle_hit_probability=0.2, obstacle_reward=-10.0, goal_reward=10.0, fuel_cost=2.0, grid_size=11, goal_state_radius=1.5, beacon_radius=1.0, obstacle_radius=1.5, name='ContinuousLightDarkPOMDPDiscreteActions', beacons=[(0, 0), (0, 5), (0, 10), (5, 0), (5, 5), (5, 10), (10, 0), (10, 5), (10, 10)], goal_state=array([10, 5]), start_state=array([0, 5]), obstacles=[(3, 7), (5, 5)], reward_model_type=RewardModelType.STANDARD, observation_model_type=ObservationModelType.NORMAL_NOISE, penalty_decay=1.0, is_obstacle_hit_terminal=True)[source]
Bases: ContinuousLightDarkPOMDP, DiscreteActionsEnvironment

Continuous Light-Dark POMDP environment with discrete actions.
This variant of the Continuous Light-Dark POMDP uses discrete directional actions (up, down, left, right) instead of continuous action vectors. The continuous state space and observation model are preserved.
Actions are mapped to unit vectors:
- "up": [0, 1]
- "down": [0, -1]
- "right": [1, 0]
- "left": [-1, 0]
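The mapping above is small enough to write down directly. A minimal sketch of applying a discrete action to a continuous state follows; the dictionary and function names are illustrative, not the class's actual attributes:

```python
import numpy as np

# Illustrative version of the documented action-to-vector mapping
ACTION_TO_VECTOR = {
    "up": np.array([0.0, 1.0]),
    "down": np.array([0.0, -1.0]),
    "right": np.array([1.0, 0.0]),
    "left": np.array([-1.0, 0.0]),
}

def apply_action(state, action):
    """Deterministic part of a discrete step: add the action's unit vector.
    (The environment additionally adds Gaussian transition noise.)"""
    return np.asarray(state, dtype=float) + ACTION_TO_VECTOR[action]

print(apply_action([0.0, 5.0], "right"))  # [1. 5.]
```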
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> env = ContinuousLightDarkPOMDPDiscreteActions(
...     discount_factor=0.95,
...     goal_state=np.array([10, 5]),
...     start_state=np.array([0, 5])
... )
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
- Parameters:
discount_factor (float)
state_transition_cov_matrix (ndarray)
observation_cov_matrix (ndarray)
obstacle_hit_probability (float)
obstacle_reward (float)
goal_reward (float)
fuel_cost (float)
grid_size (int)
goal_state_radius (float)
beacon_radius (float)
obstacle_radius (float)
name (str)
goal_state (ndarray)
start_state (ndarray)
reward_model_type (RewardModelType)
observation_model_type (ObservationModelType)
penalty_decay (float)
is_obstacle_hit_terminal (bool)
- get_actions()[source]
Get all possible actions in the discrete action space.
Note
Subclasses must implement this method to enumerate all possible actions. This is used by planning algorithms that need to iterate over actions.
- observation_model(next_state, action)[source]
Get the observation model for a given next state and action.
- Parameters:
- Return type:
- Returns:
Observation model that can sample observations
Note
Subclasses must implement this method to define observation generation.
- reward(state, action)[source]
Calculate the immediate reward for a state-action pair.
- Parameters:
- Return type:
- Returns:
Immediate reward value
Note
Subclasses must implement this method to define reward structure.
- reward_batch(states, action)[source]
Calculate rewards for a batch of states given a single action.
Provides a loop-based default that subclasses can override with vectorized numpy implementations for better performance.
- class POMDPPlanners.environments.light_dark_pomdp.continuous_light_dark_pomdp.ContinuousLightDarkPOMDPMetrics(*values)[source]
Bases: Enum

Metric names for Continuous Light-Dark POMDP environment.
- AVG_DANGEROUS_STATES_COUNTER = 'avg_dangerous_states_counter'
- AVG_OBSTACLE_HIT_COUNTER = 'avg_obstacle_hit_counter'
- GOAL_REACHING_RATE = 'goal_reaching_rate'
- OBSTACLE_HIT_RATE = 'obstacle_hit_rate'
- OUT_OF_GRID_RATE = 'out_of_grid_rate'
- class POMDPPlanners.environments.light_dark_pomdp.continuous_light_dark_pomdp.ContinuousLightDarkStateTransitionModel(state, action, state_dist)[source]
Bases: StateTransitionModel

State transition model for Continuous Light-Dark POMDP.
This model implements continuous movement in 2D space with Gaussian noise. The agent’s next position is determined by adding the action vector to the current position, with additional Gaussian noise to model uncertainty.
- Parameters:
state (ndarray)
action (ndarray)
state_dist (CovarianceParameterizedMultivariateNormal)
- state
Current 2D position [x, y]
- action
Movement vector [dx, dy]
- mean
Expected next position (state + action)
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>> # Define current position and movement action
>>> state = np.array([3.0, 4.0])  # Current position
>>> action = np.array([1.0, 0.5])  # Move right and slightly up
>>>
>>> # Define movement noise
>>> cov_matrix = np.eye(2) * 0.1  # Small movement noise
>>> state_dist = CovarianceParameterizedMultivariateNormal(cov_matrix)
>>>
>>> # Create transition model
>>> transition = ContinuousLightDarkStateTransitionModel(
...     state=state,
...     action=action,
...     state_dist=state_dist
... )
>>>
>>> # Sample next position with noise
>>> next_position = transition.sample()[0]
>>> # Returns position around [4.0, 4.5] ± noise
>>>
>>> # Calculate probability of specific next position
>>> prob = transition.probability([next_position])
- probability(values)[source]
Calculate transition probabilities for given next states.
- Parameters:
values (
List[ndarray]) – List of next state values to calculate probabilities for- Return type:
- Returns:
Array of transition probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample next states from the transition model.
- Parameters:
n_samples (int) – Number of next state samples to generate. Defaults to 1.
- Return type:
- Returns:
List of sampled next states of length n_samples.
Note
Subclasses must implement this method according to their specific state transition dynamics.
POMDPPlanners.environments.light_dark_pomdp.discrete_light_dark_pomdp module
- class POMDPPlanners.environments.light_dark_pomdp.discrete_light_dark_pomdp.DiscreteLightDarkPOMDP(discount_factor, name='DiscreteLightDarkPOMDP', transition_error_prob=0.05, observation_error_prob=0.05, beacons=[(0, 0), (0, 5), (0, 10), (5, 0), (5, 5), (5, 10), (10, 0), (10, 5), (10, 10)], goal_state=array([10, 5]), start_state=array([0, 5]), obstacles=[(3, 7), (5, 5)], obstacle_hit_probability=0.2, obstacle_reward=-10.0, goal_reward=10.0, beacon_radius=1.0, fuel_cost=2.0, grid_size=11, is_stochastic_reward=True, observation_model_type=ObservationModelType.NORMAL)[source]
Bases: BaseLightDarkPOMDPDiscreteActions, DiscreteActionsEnvironment

Discrete Light-Dark POMDP Environment for Robot Navigation with Observation Uncertainty.
This environment implements a discretized version of the classic Light-Dark POMDP problem, where a robot must navigate from a start position to a goal position in a grid world with beacons and obstacles. The key challenge is that the robot’s observation quality depends on its distance from beacons - closer to beacons means more accurate observations.
Problem Description: The robot operates in a discrete grid world where it can move in four cardinal directions. The environment includes:
- Beacons: Fixed positions that provide location reference with varying accuracy
- Obstacles: Grid cells that incur penalties when hit
- Goal: Target position that provides high reward when reached
- Observation uncertainty: Decreases with proximity to beacons (light areas)
Key Features:
- Discrete state space: Robot positions are restricted to grid cells
- Discrete action space: North, South, East, West movements
- Multiple observation models available (normal, no observation in dark)
- Distance-dependent observation accuracy: Closer to beacons = better observations
- Stochastic transitions: Actions may fail with configurable probability
- Obstacle avoidance: Penalties for hitting obstacles during navigation
- Configurable environment parameters: Grid size, beacon positions, obstacles
State Space:
- 2D grid coordinates (x, y) representing robot position
- Bounded by grid_size parameter (default: 11x11 grid)
Action Space:
- Discrete actions: ['North', 'South', 'East', 'West']
- Each action moves robot one grid cell in the corresponding direction
- Boundary conditions: Actions that would move outside grid are blocked
Observation Space:
- Discrete observations based on beacon proximity and noise
- Observation accuracy improves with proximity to beacons
- Stochastic observation errors controlled by observation_error_prob
Reward Structure:
- Goal reward: Large positive reward for reaching the goal state
- Obstacle penalty: Negative reward for hitting obstacles
- Fuel cost: Small negative reward for each movement action
- Distance-based penalties: Encourage efficient navigation
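The grid dynamics described above (one-cell cardinal moves, blocked at boundaries, actions failing with probability `transition_error_prob`) can be sketched as a standalone step function. The failure model shown here (the robot stays in place when the action fails) is an illustrative assumption; the actual environment may redistribute failed moves differently:

```python
import numpy as np

MOVES = {"North": (0, 1), "South": (0, -1), "East": (1, 0), "West": (-1, 0)}

def step(state, action, grid_size=11, transition_error_prob=0.05, rng=None):
    """One grid move; blocked at boundaries; may fail stochastically.

    On failure the robot stays in place -- an illustrative choice,
    not necessarily the environment's actual failure semantics."""
    rng = rng if rng is not None else np.random.default_rng()
    x, y = state
    if rng.random() < transition_error_prob:
        return (x, y)  # action failed
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < grid_size and 0 <= ny < grid_size:
        return (nx, ny)
    return (x, y)  # move outside the grid is blocked

rng = np.random.default_rng(0)
print(step((0, 5), "West", rng=rng))  # (0, 5): blocked at the grid boundary
```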
- Parameters:
discount_factor (float)
name (str)
transition_error_prob (float)
observation_error_prob (float)
goal_state (ndarray)
start_state (ndarray)
obstacle_hit_probability (float)
obstacle_reward (float)
goal_reward (float)
beacon_radius (float)
fuel_cost (float)
grid_size (int)
is_stochastic_reward (bool)
observation_model_type (ObservationModelType)
- transition_error_prob
Probability that an action fails (results in different movement)
- observation_error_prob
Probability of observation noise/error
- is_stochastic_reward
Whether rewards include stochastic components
- beacons
List of (x, y) beacon positions that provide navigation references
- goal_state
Target position (x, y) that robot should reach
- start_state
Initial robot position (x, y)
- obstacles
List of (x, y) obstacle positions to avoid
- grid_size
Dimension of the square grid world
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> env = DiscreteLightDarkPOMDP(
...     discount_factor=0.95,
...     transition_error_prob=0.1,
...     observation_error_prob=0.15,
...     beacons=[(1, 1), (2, 2)],
...     grid_size=11
... )
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
References:
- Platt, R., et al. "Belief space planning assuming maximum likelihood observations." (2010)
- Kurniawati, H., et al. "SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces." (2008)
- Light-Dark domain: Classic POMDP benchmark for testing observation uncertainty
- compute_metrics(histories)[source]
Compute environment-specific metrics from episode histories.
This method can be overridden by subclasses to provide custom metric calculations beyond standard return and episode length.
- is_terminal(state)[source]
Check if a state is terminal.
- Parameters:
state (ndarray) – State to check for terminal condition
- Return type:
- Returns:
True if the state is terminal, False otherwise
Note
Subclasses must implement this method to define terminal conditions.
- observation_model(next_state, action)[source]
Get the observation model for a given next state and action.
- Parameters:
- Return type:
- Returns:
Observation model that can sample observations
Note
Subclasses must implement this method to define observation generation.
- reward(state, action)[source]
Calculate the immediate reward for a state-action pair.
- Parameters:
- Return type:
- Returns:
Immediate reward value
Note
Subclasses must implement this method to define reward structure.
- reward_batch(states, action)[source]
Calculate rewards for a batch of states given a single action.
Provides a loop-based default that subclasses can override with vectorized numpy implementations for better performance.
- sample_next_step(state, action)[source]
Sample a complete state transition step.
This convenience method combines state transition, observation generation, and reward calculation in a single operation.
- class POMDPPlanners.environments.light_dark_pomdp.discrete_light_dark_pomdp.DiscreteLightDarkPOMDPMetrics(*values)[source]
Bases: Enum

Metric names for Discrete Light-Dark POMDP environment.
- AVG_DANGEROUS_STATES_COUNTER = 'avg_dangerous_states_counter'
- AVG_OBSTACLE_HIT_COUNTER = 'avg_obstacle_hit_counter'
- GOAL_REACHING_RATE = 'goal_reaching_rate'
- OBSTACLE_HIT_RATE = 'obstacle_hit_rate'
- OUT_OF_GRID_RATE = 'out_of_grid_rate'