POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils package

Submodules

POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.base_light_dark_pomdp module

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.base_light_dark_pomdp.BaseLightDarkPOMDP(discount_factor, name, space_info, reward_range=None, beacons=[(0, 0), (0, 5), (0, 10), (5, 0), (5, 5), (5, 10), (10, 0), (10, 5), (10, 10)], goal_state=array([10, 5]), start_state=array([0, 5]), obstacles=[(3, 7), (5, 5)], obstacle_hit_probability=0.2, obstacle_reward=-10.0, obstacle_radius=1.0, goal_reward=10.0, beacon_radius=1.0, fuel_cost=2.0, grid_size=11)[source]

Bases: Environment, ABC

cache_visualization(history, cache_path)[source]

Cache a visualization of the agent’s path and belief.

Parameters:
  • history (List[StepData]) – List of step data from an episode.

  • cache_path (Path) – Path where the visualization will be saved.

Raises:
  • TypeError – If history is not a List or contains non-StepData objects, or if cache_path is not a Path object.

  • ValueError – If history is empty or contains invalid data.

Return type:

None

abstractmethod compute_metrics(histories)[source]

Compute environment-specific metrics from episode histories.

Subclasses implement this method to provide custom metric calculations beyond the standard return and episode length.

Parameters:

histories (List[History]) – List of episode histories to analyze

Return type:

List[MetricValue]

Returns:

List of computed metrics with confidence intervals

property config_id: str

Generate a deterministic identifier based on environment configuration. This implementation ensures that the config_id is invariant to the order of beacons and obstacles.
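Order-invariance of this kind is typically achieved by sorting collection-valued fields before hashing. A minimal sketch of the idea (the helper name, field set, and hashing scheme are assumptions for illustration, not the package’s actual implementation):

```python
import hashlib
import json

def make_config_id(beacons, obstacles, goal_state, start_state, grid_size):
    # Sort collection-valued fields so that permutations of beacons and
    # obstacles map to the same identifier.
    config = {
        "beacons": sorted(map(tuple, beacons)),
        "obstacles": sorted(map(tuple, obstacles)),
        "goal_state": list(goal_state),
        "start_state": list(start_state),
        "grid_size": grid_size,
    }
    # json.dumps with sort_keys=True yields a canonical string to hash.
    payload = json.dumps(config, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

Reordering the beacon list then leaves the identifier unchanged, while any change to a configuration value produces a different one.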

initial_observation_dist()[source]

Get the initial observation distribution.

Return type:

Distribution

Returns:

Distribution over initial observations

Note

Subclasses must implement this method to define initial observations.

initial_state_dist()[source]

Get the initial state distribution.

Return type:

Distribution

Returns:

Distribution over initial states

Note

Subclasses must implement this method to define the starting distribution.

is_equal_observation(observation1, observation2)[source]

Check if two observations are equal.

Parameters:
  • observation1 (Any) – First observation to compare

  • observation2 (Any) – Second observation to compare

Return type:

bool

Returns:

True if observations are considered equal, False otherwise

Note

Subclasses must implement this method to define observation equality. This is particularly important for discrete observation spaces.
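For array-valued observations mixed with the “None” string sentinel used elsewhere in this package, a plausible implementation might look like the following (a sketch under those assumptions, not the package’s actual code):

```python
import numpy as np

def is_equal_observation(observation1, observation2):
    # The "None" string sentinel (no observation available) only equals itself.
    if isinstance(observation1, str) or isinstance(observation2, str):
        return observation1 == observation2
    # Array-valued observations compare element-wise.
    return np.array_equal(observation1, observation2)
```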

abstractmethod is_terminal(state)[source]

Check if a state is terminal.

Parameters:

state (ndarray) – State to check for terminal condition

Return type:

bool

Returns:

True if the state is terminal, False otherwise

Note

Subclasses must implement this method to define terminal conditions.

abstractmethod observation_model(next_state, action)[source]

Get the observation model for a given next state and action.

Parameters:
  • next_state (ndarray) – The resulting state after taking an action

  • action (Any) – The action that was executed

Return type:

ObservationModel

Returns:

Observation model that can sample observations

Note

Subclasses must implement this method to define observation generation.

abstractmethod reward(state, action)[source]

Calculate the immediate reward for a state-action pair.

Parameters:
  • state (ndarray) – Current state

  • action (Any) – Action executed from the state

Return type:

float

Returns:

Immediate reward value

Note

Subclasses must implement this method to define reward structure.

abstractmethod state_transition_model(state, action)[source]

Get the state transition model for a given state-action pair.

Parameters:
  • state (ndarray) – Current state

  • action (Any) – Action to be executed

Return type:

StateTransitionModel

Returns:

State transition model that can sample next states

Note

Subclasses must implement this method to define state dynamics.

visualize_path(path, agent_belief_path, actions, cache_path)[source]

Create and save an animated visualization of the agent’s path.

Parameters:
  • path (List[ndarray]) – List of state positions (2D numpy arrays) along the agent’s trajectory.

  • agent_belief_path (List[DiscreteDistribution]) – List of belief distributions at each step.

  • actions (List[str]) – List of actions taken at each step.

  • cache_path (Path) – Path where the visualization will be saved (must end with .gif).

Raises:
  • TypeError – If cache_path is not a Path object.

  • ValueError – If cache_path doesn’t end with .gif.

Return type:

None

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.base_light_dark_pomdp.BaseLightDarkPOMDPDiscreteActions(discount_factor, name, is_discrete_observations, reward_range=None, beacons=[(0, 0), (0, 5), (0, 10), (5, 0), (5, 5), (5, 10), (10, 0), (10, 5), (10, 10)], goal_state=array([10, 5]), start_state=array([0, 5]), obstacles=[(3, 7), (5, 5)], obstacle_hit_probability=0.2, obstacle_reward=-10.0, goal_reward=10.0, beacon_radius=1.0, fuel_cost=2.0, grid_size=11)[source]

Bases: BaseLightDarkPOMDP

get_actions()[source]
Return type:

List[Any]

POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models module

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.BaseContinuousLightDarkObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]

Bases: ObservationModel

abstractmethod sample(n_samples=1)[source]

Sample observations from the observation model.

Parameters:

n_samples (int) – Number of observation samples to generate. Defaults to 1.

Return type:

List[Any]

Returns:

List of sampled observations of length n_samples.

Note

Subclasses must implement this method according to their specific observation generation logic.

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.BaseDiscreteLightDarkObservationModel(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]

Bases: ObservationModel

Base class for discrete Light-Dark observation models.

This base class provides common functionality for discrete observation models, including beacon proximity detection, action-to-vector mapping, and distribution creation logic.

Variables:
beacons

Array of beacon positions

obstacles

Array of obstacle positions

beacon_radius

Radius within which a beacon is considered “near”

observation_error_prob

Base probability of observation error

actions

List of possible actions

action_to_vector

Mapping from action names to direction vectors

near_beacon

Boolean indicating if next_state is near a beacon
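The beacon proximity test and action-to-vector mapping described above could be sketched as follows (the 4-connected action set and names are illustrative assumptions, not the package’s actual definitions):

```python
import numpy as np

# Hypothetical mapping from action names to grid direction vectors.
ACTION_TO_VECTOR = {
    "up": np.array([0, 1]),
    "down": np.array([0, -1]),
    "left": np.array([-1, 0]),
    "right": np.array([1, 0]),
}

def is_near_beacon(next_state, beacons, beacon_radius):
    # Near iff the Euclidean distance to the closest beacon is within
    # beacon_radius.
    beacons = np.asarray(beacons, dtype=float)
    distances = np.linalg.norm(beacons - np.asarray(next_state, dtype=float), axis=1)
    return float(distances.min()) <= beacon_radius
```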

abstractmethod probability(values)[source]

Calculate probability of given observation values.

Parameters:

values (List[Any]) – List of observation values to calculate probabilities for

Return type:

ndarray

Returns:

Array of probabilities for each value

abstractmethod sample(n_samples=1)[source]

Sample observations from the distribution.

Parameters:

n_samples (int) – Number of samples to generate

Return type:

List[Any]

Returns:

List of sampled observations

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.ContinuousLightDarkDistanceBasedObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]

Bases: BaseContinuousLightDarkObservationModel

Continuous Light-Dark observation model with binary near/far beacon noise levels.

This observation model uses a binary near/far approach based on the distance to the nearest beacon. When within beacon_radius, observations are sampled from the near-beacon distribution. When the distance exceeds beacon_radius, observations are “None” (no observation available).

Variables:
min_distance_to_beacon

Distance to the nearest beacon

probability(values)[source]

Calculate observation probabilities for given values.

Parameters:

values (List[Union[ndarray, str]]) – List of observation values to calculate probabilities for

Return type:

ndarray

Returns:

Array of probabilities corresponding to the input values

Raises:

NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.

sample(n_samples=1)[source]

Sample observations from the observation model.

Parameters:

n_samples (int) – Number of observation samples to generate. Defaults to 1.

Return type:

List[Union[ndarray, str]]

Returns:

List of sampled observations of length n_samples.

Note

Subclasses must implement this method according to their specific observation generation logic.

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.ContinuousLightDarkNormalNoiseNoObsInDarkObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]

Bases: BaseContinuousLightDarkObservationModel

probability(values)[source]

Calculate observation probabilities for given values.

Parameters:

values (List[Union[ndarray, str]]) – List of observation values to calculate probabilities for

Return type:

ndarray

Returns:

Array of probabilities corresponding to the input values

Raises:

NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.

sample(n_samples=1)[source]

Sample observations from the observation model.

Parameters:

n_samples (int) – Number of observation samples to generate. Defaults to 1.

Return type:

List[Union[ndarray, str]]

Returns:

List of sampled observations of length n_samples.

Note

Subclasses must implement this method according to their specific observation generation logic.

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.ContinuousLightDarkNormalNoiseObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]

Bases: BaseContinuousLightDarkObservationModel

probability(values)[source]

Calculate observation probabilities for given values.

Parameters:

values (List[ndarray]) – List of observation values to calculate probabilities for

Return type:

ndarray

Returns:

Array of probabilities corresponding to the input values

Raises:

NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.

sample(n_samples=1)[source]

Sample observations from the observation model.

Parameters:

n_samples (int) – Number of observation samples to generate. Defaults to 1.

Return type:

List[ndarray]

Returns:

List of sampled observations of length n_samples.

Note

Subclasses must implement this method according to their specific observation generation logic.

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.DiscreteLDDistanceBasedObservationModel(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]

Bases: BaseDiscreteLightDarkObservationModel

Discrete Light-Dark observation model with continuous distance-based error probability.

This observation model scales the observation error probability continuously with the distance to the nearest beacon, rather than using a binary threshold. The error probability scales linearly from a minimum value (at the beacon) to the base value (at a distance of beacon_radius). When the distance exceeds beacon_radius, observations are “None” (no observation available).

The scaling formula is:

error_factor = min_factor + (1 - min_factor) * (distance / beacon_radius)
error_prob(d) = base_error_prob * error_factor   (only when distance <= beacon_radius)

Where:
  • min_factor = 0.2 (the error probability is reduced to 20% of its base value at the beacon)

  • distance = distance to nearest beacon

  • At distance 0: error_prob = 0.2 * base_error_prob

  • At distance beacon_radius: error_prob = 1.0 * base_error_prob

  • Beyond beacon_radius: observation = “None”
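A direct transcription of this scaling rule (a sketch; the function name is illustrative, and it returns None beyond beacon_radius to mirror the “None” observation case):

```python
def scaled_error_prob(distance, base_error_prob, beacon_radius, min_factor=0.2):
    # Beyond beacon_radius there is no observation, hence no error probability.
    if distance > beacon_radius:
        return None
    # Linear ramp: min_factor at the beacon, 1.0 at beacon_radius.
    error_factor = min_factor + (1.0 - min_factor) * (distance / beacon_radius)
    return base_error_prob * error_factor
```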

Variables:
distribution

DiscreteDistribution for sampling observations (only used when near beacon), None when far from beacon

min_distance_to_beacon

Distance to the nearest beacon

probability(values)[source]

Calculate probability of given observation values.

Parameters:

values (List[Union[Any, str]]) – List of observation values to calculate probabilities for. Can include “None” values.

Returns:

  • If value is “None” and near beacon: probability is 0

  • If value is “None” and far from beacon: probability is 1

  • If value is actual observation: probability from distribution (if near beacon) or 0 (if far)

Return type:

ndarray
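The three cases in the return description above can be written out directly. A sketch (the function name and the `distribution_prob` callable, standing in for the model’s DiscreteDistribution, are illustrative assumptions):

```python
import numpy as np

def observation_probabilities(values, near_beacon, distribution_prob=None):
    # distribution_prob: callable mapping an observation to its probability
    # under the discrete distribution (only consulted when near a beacon).
    probs = []
    for value in values:
        if isinstance(value, str) and value == "None":
            # "None" is impossible near a beacon, certain far from one.
            probs.append(0.0 if near_beacon else 1.0)
        else:
            # A concrete observation only has mass when near a beacon.
            probs.append(distribution_prob(value) if near_beacon else 0.0)
    return np.array(probs)
```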

sample(n_samples=1)[source]

Sample observations from the discrete distribution or return “None”.

Parameters:

n_samples (int) – Number of samples to generate

Return type:

List[Union[ndarray, str]]

Returns:

List of sampled observation states when near beacon, or list of “None” when far from beacon

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.DiscreteLDObservationModel(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]

Bases: BaseDiscreteLightDarkObservationModel

Discrete Light-Dark observation model with distance-dependent error probability.

This observation model provides discrete observations based on the robot’s position relative to beacons. When near beacons, the observation error probability is reduced, making observations more accurate.

Variables:
distribution

DiscreteDistribution for sampling observations

probability(values)[source]

Calculate probability of given observation values.

Parameters:

values (List[Any]) – List of observation values to calculate probabilities for

Return type:

ndarray

Returns:

Array of probabilities for each value

sample(n_samples=1)[source]

Sample observations from the discrete distribution.

Parameters:

n_samples (int) – Number of samples to generate

Return type:

List[ndarray]

Returns:

List of sampled observation states

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.DiscreteLDObservationModelNoObsInDark(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]

Bases: BaseDiscreteLightDarkObservationModel

Discrete Light-Dark observation model that returns “None” when not near beacons.

This observation model provides discrete observations based on the robot’s position relative to beacons. When near beacons, observations are sampled from a discrete distribution. When far from beacons, observations are “None” (no observation available).

Similar to ContinuousLightDarkNormalNoiseNoObsInDarkObservationModel but for discrete observations using DiscreteDistribution instead of continuous multivariate normal.

Variables:
distribution

DiscreteDistribution for sampling observations (only used when near beacon), None when far from beacon

probability(values)[source]

Calculate probability of given observation values.

Parameters:

values (List[Union[Any, str]]) – List of observation values to calculate probabilities for. Can include “None” values.

Returns:

  • If value is “None” and near beacon: probability is 0

  • If value is “None” and far from beacon: probability is 1

  • If value is actual observation: probability from distribution (if near beacon) or 0 (if far)

Return type:

ndarray

sample(n_samples=1)[source]

Sample observations from the discrete distribution or return “None”.

Parameters:

n_samples (int) – Number of samples to generate

Return type:

List[Union[ndarray, str]]

Returns:

List of sampled observation states when near beacon, or list of “None” when far from beacon

POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models module

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.BaseLightDarkRewardModel[source]

Bases: ABC

compute_reward(state, action)[source]
Return type:

float

compute_reward_batch(states, action)[source]
Return type:

ndarray

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.ContinuousLDDangerousStatesRewardModel(goal_state, obstacles, goal_state_radius, obstacle_radius, grid_size, obstacle_hit_probability, obstacle_reward, goal_reward, fuel_cost)[source]

Bases: ContinuousLightDarkRewardModel

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.ContinuousLightDarkDecayingHitProbabilityRewardModel(goal_state, obstacles, goal_state_radius, obstacle_radius, grid_size, obstacle_hit_probability, obstacle_reward, goal_reward, fuel_cost, penalty_decay)[source]

Bases: BaseLightDarkRewardModel

compute_reward_batch(states, action)[source]
Return type:

ndarray

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.ContinuousLightDarkRewardModel(goal_state, obstacles, goal_state_radius, obstacle_radius, grid_size, obstacle_hit_probability, obstacle_reward, goal_reward, fuel_cost)[source]

Bases: BaseLightDarkRewardModel

compute_reward_batch(states, action)[source]
Return type:

ndarray

POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_visualizer module

class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_visualizer.LightDarkPOMDPVisualizer(environment)[source]

Bases: object

Visualizer for Light-Dark POMDP environments.

Handles all visualization and animation logic for Light-Dark POMDP environments, including path visualization, belief particle rendering, and animation generation.

Parameters:

environment (Any)

environment

The Light-Dark POMDP environment instance to visualize.

cache_visualization(history, cache_path)[source]

Cache a visualization of the agent’s path and belief.

Parameters:
  • history (List[StepData]) – List of step data from an episode.

  • cache_path (Path) – Path where the visualization will be saved.

Raises:
  • TypeError – If history is not a List or contains non-StepData objects, or if cache_path is not a Path object.

  • ValueError – If history is empty or contains invalid data.

Return type:

None

visualize_path(path, agent_belief_path, actions, cache_path)[source]

Create and save an animated visualization of the agent’s path.

Parameters:
  • path (List[ndarray]) – List of state positions (2D numpy arrays) along the agent’s trajectory.

  • agent_belief_path (List[DiscreteDistribution]) – List of belief distributions at each step.

  • actions (List[str]) – List of actions taken at each step.

  • cache_path (Path) – Path where the visualization will be saved (must end with .gif).

Raises:
  • TypeError – If cache_path is not a Path object.

  • ValueError – If cache_path doesn’t end with .gif.