POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils package

Submodules

POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.base_light_dark_pomdp module

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.base_light_dark_pomdp.BaseLightDarkPOMDP(discount_factor, name, space_info, reward_range=None, beacons=[(0, 0), (0, 5), (0, 10), (5, 0), (5, 5), (5, 10), (10, 0), (10, 5), (10, 10)], goal_state=array([10, 5]), start_state=array([0, 5]), obstacles=[(3, 7), (5, 5)], obstacle_hit_probability=0.2, obstacle_reward=-10.0, obstacle_radius=1.0, goal_reward=10.0, beacon_radius=1.0, fuel_cost=2.0, grid_size=11)[source]

Bases: Environment, ABC

cache_visualization(history, cache_path)[source]

Cache a visualization of the agent’s path and belief.

Parameters:
  • history (List[StepData]) – List of step data from an episode.

  • cache_path (Path) – Path where the visualization will be saved.

Raises:
  • TypeError – If history is not a List or contains non-StepData objects, or if cache_path is not a Path object.

  • ValueError – If history is empty or contains invalid data.

Return type:

None

abstractmethod compute_metrics(histories)[source]

Compute environment-specific metrics from episode histories.

Subclasses implement this method to provide custom metric calculations beyond the standard return and episode length.

Parameters:

histories (List[History]) – List of episode histories to analyze

Return type:

List[MetricValue]

Returns:

List of computed metrics with confidence intervals

property config_id: str

Generate a deterministic identifier based on environment configuration. This implementation ensures that the config_id is invariant to the order of beacons and obstacles.
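Order-invariance of this kind is typically achieved by sorting collection-valued fields before hashing. A minimal sketch of the idea (the helper name, field set, and hashing scheme are assumptions for illustration, not the package’s actual implementation):

```python
import hashlib
import json

def make_config_id(beacons, obstacles, goal_state, start_state, grid_size):
    # Sort collection-valued fields so that permutations of beacons and
    # obstacles map to the same identifier.
    config = {
        "beacons": sorted(map(tuple, beacons)),
        "obstacles": sorted(map(tuple, obstacles)),
        "goal_state": list(goal_state),
        "start_state": list(start_state),
        "grid_size": grid_size,
    }
    # json.dumps with sort_keys=True yields a canonical string to hash.
    payload = json.dumps(config, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

Reordering the beacon list then leaves the identifier unchanged, while any change to a configuration value produces a different one.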

initial_observation_dist()[source]

Get the initial observation distribution.

Return type:

Distribution

Returns:

Distribution over initial observations

Note

Subclasses must implement this method to define initial observations.

initial_state_dist()[source]

Get the initial state distribution.

Return type:

Distribution

Returns:

Distribution over initial states

Note

Subclasses must implement this method to define the starting distribution.

is_equal_observation(observation1, observation2)[source]

Check if two observations are equal.

Parameters:
  • observation1 (Any) – First observation to compare

  • observation2 (Any) – Second observation to compare

Return type:

bool

Returns:

True if observations are considered equal, False otherwise

Note

Subclasses must implement this method to define observation equality. This is particularly important for discrete observation spaces.
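For array-valued observations mixed with the “None” string sentinel used elsewhere in this package, a plausible implementation might look like the following (a sketch under those assumptions, not the package’s actual code):

```python
import numpy as np

def is_equal_observation(observation1, observation2):
    # The "None" string sentinel (no observation available) only equals itself.
    if isinstance(observation1, str) or isinstance(observation2, str):
        return observation1 == observation2
    # Array-valued observations compare element-wise.
    return np.array_equal(observation1, observation2)
```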

abstractmethod is_terminal(state)[source]

Check if a state is terminal.

Parameters:

state (ndarray) – State to check for terminal condition

Return type:

bool

Returns:

True if the state is terminal, False otherwise

Note

Subclasses must implement this method to define terminal conditions.

abstractmethod observation_model(next_state, action)[source]

Get the observation model for a given next state and action.

Parameters:
  • next_state (ndarray) – The resulting state after taking an action

  • action (Any) – The action that was executed

Return type:

ObservationModel

Returns:

Observation model that can sample observations

Note

Subclasses must implement this method to define observation generation.

abstractmethod reward(state, action)[source]

Calculate the immediate reward for a state-action pair.

Parameters:
  • state (ndarray) – Current state

  • action (Any) – Action executed from the state

Return type:

float

Returns:

Immediate reward value

Note

Subclasses must implement this method to define reward structure.

abstractmethod state_transition_model(state, action)[source]

Get the state transition model for a given state-action pair.

Parameters:
  • state (ndarray) – Current state

  • action (Any) – Action to be executed

Return type:

StateTransitionModel

Returns:

State transition model that can sample next states

Note

Subclasses must implement this method to define state dynamics.

visualize_path(path, agent_belief_path, actions, cache_path)[source]

Create and save an animated visualization of the agent’s path.

Parameters:
  • path (List[ndarray]) – List of state positions (2D numpy arrays) along the agent’s trajectory.

  • agent_belief_path (List[DiscreteDistribution]) – List of belief distributions at each step.

  • actions (List[str]) – List of actions taken at each step.

  • cache_path (Path) – Path where the visualization will be saved (must end with .gif).

Raises:
  • TypeError – If cache_path is not a Path object.

  • ValueError – If cache_path doesn’t end with .gif.

Return type:

None

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.base_light_dark_pomdp.BaseLightDarkPOMDPDiscreteActions(discount_factor, name, is_discrete_observations, reward_range=None, beacons=[(0, 0), (0, 5), (0, 10), (5, 0), (5, 5), (5, 10), (10, 0), (10, 5), (10, 10)], goal_state=array([10, 5]), start_state=array([0, 5]), obstacles=[(3, 7), (5, 5)], obstacle_hit_probability=0.2, obstacle_reward=-10.0, goal_reward=10.0, beacon_radius=1.0, fuel_cost=2.0, grid_size=11)[source]

Bases: BaseLightDarkPOMDP

get_actions()[source]
Return type:

List[Any]

POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models module

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.BaseContinuousLightDarkObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]

Bases: ObservationModel

abstractmethod sample(n_samples=1)[source]

Sample observations from the observation model.

Parameters:

n_samples (int) – Number of observation samples to generate. Defaults to 1.

Return type:

List[Any]

Returns:

List of sampled observations of length n_samples.

Note

Subclasses must implement this method according to their specific observation generation logic.

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.BaseDiscreteLightDarkObservationModel(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]

Bases: ObservationModel

Base class for discrete Light-Dark observation models.

This base class provides common functionality for discrete observation models, including beacon proximity detection, action-to-vector mapping, and distribution creation logic.

Variables:
beacons

Array of beacon positions

obstacles

Array of obstacle positions

beacon_radius

Radius within which a beacon is considered “near”

observation_error_prob

Base probability of observation error

actions

List of possible actions

action_to_vector

Mapping from action names to direction vectors

near_beacon

Boolean indicating if next_state is near a beacon
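The beacon proximity test and action-to-vector mapping described above could be sketched as follows (the 4-connected action set and names are illustrative assumptions, not the package’s actual definitions):

```python
import numpy as np

# Hypothetical mapping from action names to grid direction vectors.
ACTION_TO_VECTOR = {
    "up": np.array([0, 1]),
    "down": np.array([0, -1]),
    "left": np.array([-1, 0]),
    "right": np.array([1, 0]),
}

def is_near_beacon(next_state, beacons, beacon_radius):
    # Near iff the Euclidean distance to the closest beacon is within
    # beacon_radius.
    beacons = np.asarray(beacons, dtype=float)
    distances = np.linalg.norm(beacons - np.asarray(next_state, dtype=float), axis=1)
    return float(distances.min()) <= beacon_radius
```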

abstractmethod probability(values)[source]

Calculate probability of given observation values.

Parameters:

values (List[Any]) – List of observation values to calculate probabilities for

Return type:

ndarray

Returns:

Array of probabilities for each value

abstractmethod sample(n_samples=1)[source]

Sample observations from the distribution.

Parameters:

n_samples (int) – Number of samples to generate

Return type:

List[Any]

Returns:

List of sampled observations

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.ContinuousLightDarkDistanceBasedObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]

Bases: BaseContinuousLightDarkObservationModel

Continuous Light-Dark observation model with binary near/far beacon noise levels.

This observation model uses a binary near/far approach based on the distance to the nearest beacon. When within beacon_radius, observations are sampled from the near-beacon distribution. When the distance exceeds beacon_radius, observations are “None” (no observation available).

Variables:
min_distance_to_beacon

Distance to the nearest beacon

probability(values)[source]

Calculate observation probabilities for given values.

Parameters:

values (List[Union[ndarray, str]]) – List of observation values to calculate probabilities for

Return type:

ndarray

Returns:

Array of probabilities corresponding to the input values

Raises:

NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.

sample(n_samples=1)[source]

Sample observations from the observation model.

Parameters:

n_samples (int) – Number of observation samples to generate. Defaults to 1.

Return type:

List[Union[ndarray, str]]

Returns:

List of sampled observations of length n_samples.

Note

Subclasses must implement this method according to their specific observation generation logic.

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.ContinuousLightDarkNormalNoiseNoObsInDarkObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]

Bases: BaseContinuousLightDarkObservationModel

probability(values)[source]

Calculate observation probabilities for given values.

Parameters:

values (List[Union[ndarray, str]]) – List of observation values to calculate probabilities for

Return type:

ndarray

Returns:

Array of probabilities corresponding to the input values

Raises:

NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.

sample(n_samples=1)[source]

Sample observations from the observation model.

Parameters:

n_samples (int) – Number of observation samples to generate. Defaults to 1.

Return type:

List[Union[ndarray, str]]

Returns:

List of sampled observations of length n_samples.

Note

Subclasses must implement this method according to their specific observation generation logic.

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.ContinuousLightDarkNormalNoiseObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]

Bases: BaseContinuousLightDarkObservationModel

probability(values)[source]

Calculate observation probabilities for given values.

Parameters:

values (List[ndarray]) – List of observation values to calculate probabilities for

Return type:

ndarray

Returns:

Array of probabilities corresponding to the input values

Raises:

NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.

sample(n_samples=1)[source]

Sample observations from the observation model.

Parameters:

n_samples (int) – Number of observation samples to generate. Defaults to 1.

Return type:

List[ndarray]

Returns:

List of sampled observations of length n_samples.

Note

Subclasses must implement this method according to their specific observation generation logic.

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.DiscreteLDDistanceBasedObservationModel(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]

Bases: BaseDiscreteLightDarkObservationModel

Discrete Light-Dark observation model with continuous distance-based error probability.

This observation model scales the observation error probability continuously with the distance to the nearest beacon, rather than using a binary threshold. The error probability scales linearly from a minimum value (at the beacon) to the base value (at a distance of beacon_radius). When the distance exceeds beacon_radius, observations are “None” (no observation available).

The scaling formula is:

error_factor = min_factor + (1 - min_factor) * (distance / beacon_radius)
error_prob(d) = base_error_prob * error_factor   (only when distance <= beacon_radius)

Where:
  • min_factor = 0.2 (the error probability is reduced to 20% of its base value at the beacon)

  • distance = distance to nearest beacon

  • At distance 0: error_prob = 0.2 * base_error_prob

  • At distance beacon_radius: error_prob = 1.0 * base_error_prob

  • Beyond beacon_radius: observation = “None”
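A direct transcription of this scaling rule (a sketch; the function name is illustrative, and it returns None beyond beacon_radius to mirror the “None” observation case):

```python
def scaled_error_prob(distance, base_error_prob, beacon_radius, min_factor=0.2):
    # Beyond beacon_radius there is no observation, hence no error probability.
    if distance > beacon_radius:
        return None
    # Linear ramp: min_factor at the beacon, 1.0 at beacon_radius.
    error_factor = min_factor + (1.0 - min_factor) * (distance / beacon_radius)
    return base_error_prob * error_factor
```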

Variables:
distribution

DiscreteDistribution for sampling observations (only used when near beacon), None when far from beacon

min_distance_to_beacon

Distance to the nearest beacon

probability(values)[source]

Calculate probability of given observation values.

Parameters:

values (List[Union[Any, str]]) – List of observation values to calculate probabilities for. Can include “None” values.

Returns:

  • If value is “None” and near beacon: probability is 0

  • If value is “None” and far from beacon: probability is 1

  • If value is actual observation: probability from distribution (if near beacon) or 0 (if far)

Return type:

ndarray
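The three cases in the return description above can be written out directly. A sketch (the function name and the `distribution_prob` callable, standing in for the model’s DiscreteDistribution, are illustrative assumptions):

```python
import numpy as np

def observation_probabilities(values, near_beacon, distribution_prob=None):
    # distribution_prob: callable mapping an observation to its probability
    # under the discrete distribution (only consulted when near a beacon).
    probs = []
    for value in values:
        if isinstance(value, str) and value == "None":
            # "None" is impossible near a beacon, certain far from one.
            probs.append(0.0 if near_beacon else 1.0)
        else:
            # A concrete observation only has mass when near a beacon.
            probs.append(distribution_prob(value) if near_beacon else 0.0)
    return np.array(probs)
```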

sample(n_samples=1)[source]

Sample observations from the discrete distribution or return “None”.

Parameters:

n_samples (int) – Number of samples to generate

Return type:

List[Union[ndarray, str]]

Returns:

List of sampled observation states when near beacon, or list of “None” when far from beacon

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.DiscreteLDObservationModel(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]

Bases: BaseDiscreteLightDarkObservationModel

Discrete Light-Dark observation model with distance-dependent error probability.

This observation model provides discrete observations based on the robot’s position relative to beacons. When near beacons, the observation error probability is reduced, making observations more accurate.

Variables:
distribution

DiscreteDistribution for sampling observations

probability(values)[source]

Calculate probability of given observation values.

Parameters:

values (List[Any]) – List of observation values to calculate probabilities for

Return type:

ndarray

Returns:

Array of probabilities for each value

sample(n_samples=1)[source]

Sample observations from the discrete distribution.

Parameters:

n_samples (int) – Number of samples to generate

Return type:

List[ndarray]

Returns:

List of sampled observation states

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.DiscreteLDObservationModelNoObsInDark(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]

Bases: BaseDiscreteLightDarkObservationModel

Discrete Light-Dark observation model that returns “None” when not near beacons.

This observation model provides discrete observations based on the robot’s position relative to beacons. When near beacons, observations are sampled from a discrete distribution. When far from beacons, observations are “None” (no observation available).

Similar to ContinuousLightDarkNormalNoiseNoObsInDarkObservationModel but for discrete observations using DiscreteDistribution instead of continuous multivariate normal.

Variables:
distribution

DiscreteDistribution for sampling observations (only used when near beacon), None when far from beacon

probability(values)[source]

Calculate probability of given observation values.

Parameters:

values (List[Union[Any, str]]) – List of observation values to calculate probabilities for. Can include “None” values.

Returns:

  • If value is “None” and near beacon: probability is 0

  • If value is “None” and far from beacon: probability is 1

  • If value is actual observation: probability from distribution (if near beacon) or 0 (if far)

Return type:

ndarray

sample(n_samples=1)[source]

Sample observations from the discrete distribution or return “None”.

Parameters:

n_samples (int) – Number of samples to generate

Return type:

List[Union[ndarray, str]]

Returns:

List of sampled observation states when near beacon, or list of “None” when far from beacon

POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models module

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.BaseLightDarkRewardModel[source]

Bases: ABC

compute_reward(state, action)[source]
Return type:

float

compute_reward_batch(states, action)[source]
Return type:

ndarray

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.ContinuousLDDangerousStatesRewardModel(goal_state, obstacles, goal_state_radius, obstacle_radius, grid_size, obstacle_hit_probability, obstacle_reward, goal_reward, fuel_cost)[source]

Bases: ContinuousLightDarkRewardModel

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.ContinuousLightDarkDecayingHitProbabilityRewardModel(goal_state, obstacles, goal_state_radius, obstacle_radius, grid_size, obstacle_hit_probability, obstacle_reward, goal_reward, fuel_cost, penalty_decay)[source]

Bases: BaseLightDarkRewardModel

compute_reward_batch(states, action)[source]
Return type:

ndarray

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.ContinuousLightDarkRewardModel(goal_state, obstacles, goal_state_radius, obstacle_radius, grid_size, obstacle_hit_probability, obstacle_reward, goal_reward, fuel_cost)[source]

Bases: BaseLightDarkRewardModel

compute_reward_batch(states, action)[source]
Return type:

ndarray

POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_visualizer module

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_visualizer.LightDarkPOMDPVisualizer(environment)[source]

Bases: object

Visualizer for Light-Dark POMDP environments.

Handles all visualization and animation logic for Light-Dark POMDP environments, including path visualization, belief particle rendering, and animation generation.

Parameters:

environment (Any)

environment

The Light-Dark POMDP environment instance to visualize.

cache_visualization(history, cache_path)[source]

Cache a visualization of the agent’s path and belief.

Parameters:
  • history (List[StepData]) – List of step data from an episode.

  • cache_path (Path) – Path where the visualization will be saved.

Raises:
  • TypeError – If history is not a List or contains non-StepData objects, or if cache_path is not a Path object.

  • ValueError – If history is empty or contains invalid data.

Return type:

None

visualize_path(path, agent_belief_path, actions, cache_path)[source]

Create and save an animated visualization of the agent’s path.

Parameters:
  • path (List[ndarray]) – List of state positions (2D numpy arrays) along the agent’s trajectory.

  • agent_belief_path (List[DiscreteDistribution]) – List of belief distributions at each step.

  • actions (List[str]) – List of actions taken at each step.

  • cache_path (Path) – Path where the visualization will be saved (must end with .gif).

Raises:
  • TypeError – If cache_path is not a Path object.

  • ValueError – If cache_path doesn’t end with .gif.