POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils package
Submodules
POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.base_light_dark_pomdp module
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.base_light_dark_pomdp.BaseLightDarkPOMDP(discount_factor, name, space_info, reward_range=None, beacons=[(0, 0), (0, 5), (0, 10), (5, 0), (5, 5), (5, 10), (10, 0), (10, 5), (10, 10)], goal_state=array([10, 5]), start_state=array([0, 5]), obstacles=[(3, 7), (5, 5)], obstacle_hit_probability=0.2, obstacle_reward=-10.0, obstacle_radius=1.0, goal_reward=10.0, beacon_radius=1.0, fuel_cost=2.0, grid_size=11)[source]
Bases:
Environment, ABC
- Parameters:
- cache_visualization(history, cache_path)[source]
Cache visualization of agent’s path and belief.
- Parameters:
- Raises:
TypeError – If history is not a List or contains non-StepData objects, or if cache_path is not a Path object.
ValueError – If history is empty or contains invalid data.
- Return type:
- abstractmethod compute_metrics(histories)[source]
Compute environment-specific metrics from episode histories.
This method can be overridden by subclasses to provide custom metric calculations beyond standard return and episode length.
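An override might look like the following sketch; the dict-based step records and the "reward" key are illustrative assumptions, not the package's actual StepData layout:

```python
# Hypothetical override sketch: compute the standard aggregate metrics from
# episode histories. The dict-based steps and the "reward" key are
# illustrative; the real package passes lists of StepData objects.
def compute_metrics(histories):
    returns = [sum(step["reward"] for step in h) for h in histories]
    lengths = [len(h) for h in histories]
    return {
        "mean_return": sum(returns) / len(returns),
        "mean_episode_length": sum(lengths) / len(lengths),
    }

histories = [
    [{"reward": -2.0}, {"reward": 10.0}],
    [{"reward": -2.0}, {"reward": -2.0}, {"reward": 10.0}],
]
metrics = compute_metrics(histories)  # mean_return 7.0, mean_episode_length 2.5
```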
- property config_id: str
Generate a deterministic identifier based on environment configuration. This implementation ensures that the config_id is invariant to the order of beacons and obstacles.
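One way to obtain that order invariance is to sort the coordinate lists before hashing. This is a minimal sketch under stated assumptions: the helper name and the subset of fields hashed are illustrative, not the class's actual implementation.

```python
import hashlib

# Hypothetical sketch: sorting beacons/obstacles before hashing makes the
# digest identical for any permutation of the same configuration.
def order_invariant_config_id(beacons, obstacles, goal_state, start_state):
    canonical = (
        sorted(map(tuple, beacons)),
        sorted(map(tuple, obstacles)),
        tuple(goal_state),
        tuple(start_state),
    )
    return hashlib.sha256(repr(canonical).encode()).hexdigest()[:16]

id_a = order_invariant_config_id([(0, 0), (5, 5)], [(3, 7)], (10, 5), (0, 5))
id_b = order_invariant_config_id([(5, 5), (0, 0)], [(3, 7)], (10, 5), (0, 5))
assert id_a == id_b  # permuting the beacons does not change the ID
```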
- initial_observation_dist()[source]
Get the initial observation distribution.
- Return type:
- Returns:
Distribution over initial observations
Note
Subclasses must implement this method to define initial observations.
- initial_state_dist()[source]
Get the initial state distribution.
- Return type:
- Returns:
Distribution over initial states
Note
Subclasses must implement this method to define the starting distribution.
- is_equal_observation(observation1, observation2)[source]
Check if two observations are equal.
- Parameters:
- Return type:
- Returns:
True if observations are considered equal, False otherwise
Note
Subclasses must implement this method to define observation equality. This is particularly important for discrete observation spaces.
- abstractmethod is_terminal(state)[source]
Check if a state is terminal.
- Parameters:
state (ndarray) – State to check for terminal condition
- Return type:
- Returns:
True if the state is terminal, False otherwise
Note
Subclasses must implement this method to define terminal conditions.
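For a goal-directed environment like this one, a plausible terminal condition checks proximity to goal_state. The sketch below is an assumption, not the documented rule; goal_radius is a hypothetical parameter (the constructor documents goal_state but leaves the terminal rule to subclasses).

```python
import numpy as np

# Hedged sketch: terminate once the agent is within goal_radius of goal_state.
# goal_radius is a hypothetical name; the actual terminal condition is
# defined by subclasses.
def is_terminal(state, goal_state=np.array([10.0, 5.0]), goal_radius=1.0):
    dist = np.linalg.norm(np.asarray(state, dtype=float) - goal_state)
    return bool(dist <= goal_radius)

assert is_terminal(np.array([10.0, 5.5]))     # within 1.0 of the goal
assert not is_terminal(np.array([0.0, 5.0]))  # at the start state
```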
- abstractmethod observation_model(next_state, action)[source]
Get the observation model for a given next state and action.
- Parameters:
- Return type:
- Returns:
Observation model that can sample observations
Note
Subclasses must implement this method to define observation generation.
- abstractmethod reward(state, action)[source]
Calculate the immediate reward for a state-action pair.
- Parameters:
- Return type:
- Returns:
Immediate reward value
Note
Subclasses must implement this method to define reward structure.
- abstractmethod state_transition_model(state, action)[source]
Get the state transition model for a given state-action pair.
- Parameters:
- Return type:
- Returns:
State transition model that can sample next states
Note
Subclasses must implement this method to define state dynamics.
- visualize_path(path, agent_belief_path, actions, cache_path)[source]
Create and save an animated visualization of the agent’s path.
- Parameters:
path (List[ndarray]) – List of state positions (2D numpy arrays) along the agent’s trajectory.
agent_belief_path (List[DiscreteDistribution]) – List of belief distributions at each step.
cache_path (Path) – Path where to save the visualization (must end with .gif).
- Raises:
TypeError – If cache_path is not a Path object.
ValueError – If cache_path doesn’t end with .gif.
- Return type:
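The documented TypeError/ValueError behaviour amounts to two argument checks, sketched below; the helper name validate_cache_path is hypothetical:

```python
from pathlib import Path

# Hypothetical helper mirroring the documented checks: cache_path must be a
# pathlib.Path and must end with .gif.
def validate_cache_path(cache_path):
    if not isinstance(cache_path, Path):
        raise TypeError("cache_path must be a pathlib.Path object")
    if cache_path.suffix != ".gif":
        raise ValueError("cache_path must end with .gif")
    return cache_path

validate_cache_path(Path("runs/episode_3.gif"))  # passes both checks
```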
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.base_light_dark_pomdp.BaseLightDarkPOMDPDiscreteActions(discount_factor, name, is_discrete_observations, reward_range=None, beacons=[(0, 0), (0, 5), (0, 10), (5, 0), (5, 5), (5, 10), (10, 0), (10, 5), (10, 10)], goal_state=array([10, 5]), start_state=array([0, 5]), obstacles=[(3, 7), (5, 5)], obstacle_hit_probability=0.2, obstacle_reward=-10.0, goal_reward=10.0, beacon_radius=1.0, fuel_cost=2.0, grid_size=11)[source]
Bases:
BaseLightDarkPOMDP
POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models module
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.BaseContinuousLightDarkObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]
Bases:
ObservationModel
- Parameters:
next_state (ndarray)
action (ndarray)
obs_dist_near_beacon (CovarianceParameterizedMultivariateNormal)
obs_dist_far_from_beacon (CovarianceParameterizedMultivariateNormal)
grid_size (int)
beacons (ndarray)
beacon_radius (float)
- abstractmethod sample(n_samples=1)[source]
Sample observations from the observation model.
- Parameters:
n_samples (int) – Number of observation samples to generate. Defaults to 1.
- Return type:
- Returns:
List of sampled observations of length n_samples.
Note
Subclasses must implement this method according to their specific observation generation logic.
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.BaseDiscreteLightDarkObservationModel(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]
Bases:
ObservationModel
Base class for discrete Light-Dark observation models.
This base class provides common functionality for discrete observation models, including beacon proximity detection, action-to-vector mapping, and distribution creation logic.
- Parameters:
- beacons
Array of beacon positions
- obstacles
Array of obstacle positions
- beacon_radius
Radius within which a beacon is considered “near”
- observation_error_prob
Base probability of observation error
- actions
List of possible actions
- action_to_vector
Mapping from action names to direction vectors
- near_beacon
Boolean indicating if next_state is near a beacon
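The beacon-proximity detection described above reduces to a nearest-beacon distance test. A minimal sketch, with the free-function form an assumption (the class stores the result on the near_beacon attribute):

```python
import numpy as np

# Sketch of the proximity check: True iff the distance from next_state to the
# closest beacon is within beacon_radius.
def near_beacon(next_state, beacons, beacon_radius):
    dists = np.linalg.norm(
        np.asarray(beacons, dtype=float) - np.asarray(next_state, dtype=float),
        axis=1,
    )
    return bool(dists.min() <= beacon_radius)

beacons = np.array([(0, 0), (5, 5), (10, 5)])
assert near_beacon((5.5, 5.0), beacons, beacon_radius=1.0)      # 0.5 from (5, 5)
assert not near_beacon((2.5, 2.5), beacons, beacon_radius=1.0)  # ~3.54 from nearest
```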
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.ContinuousLightDarkDistanceBasedObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]
Bases:
BaseContinuousLightDarkObservationModel
Continuous Light-Dark observation model with binary near/far beacon noise levels.
This observation model uses a binary near/far approach based on the distance to the nearest beacon. When within beacon_radius, observations are sampled from the near-beacon distribution. When the distance exceeds beacon_radius, observations are “None” (no observation available).
- Parameters:
next_state (ndarray)
action (ndarray)
obs_dist_near_beacon (CovarianceParameterizedMultivariateNormal)
obs_dist_far_from_beacon (CovarianceParameterizedMultivariateNormal)
grid_size (int)
beacons (ndarray)
beacon_radius (float)
- min_distance_to_beacon
Distance to the nearest beacon
- probability(values)[source]
Calculate observation probabilities for given values.
- Parameters:
values (List[Union[ndarray, str]]) – List of observation values to calculate probabilities for
- Return type:
- Returns:
Array of probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample observations from the observation model.
- Parameters:
n_samples (int) – Number of observation samples to generate. Defaults to 1.
- Return type:
- Returns:
List of sampled observations of length n_samples.
Note
Subclasses must implement this method according to their specific observation generation logic.
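The binary near/far rule this model describes can be sketched as follows. The Gaussian noise scale and helper name are illustrative assumptions; the real model samples from obs_dist_near_beacon rather than an isotropic normal:

```python
import numpy as np

# Sketch of the binary near/far sampling rule: a noisy position observation
# when within beacon_radius of any beacon, the string "None" otherwise.
# noise_std stands in for the model's obs_dist_near_beacon covariance.
def sample_distance_based(next_state, beacons, beacon_radius, noise_std=0.1, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    state = np.asarray(next_state, dtype=float)
    min_dist = np.linalg.norm(np.asarray(beacons, dtype=float) - state, axis=1).min()
    if min_dist > beacon_radius:
        return "None"  # no observation available in the dark
    return state + rng.normal(scale=noise_std, size=state.shape)

beacons = [(0, 0), (5, 5), (10, 5)]
assert sample_distance_based((8.0, 8.0), beacons, beacon_radius=1.0) == "None"
obs = sample_distance_based((5.2, 5.0), beacons, beacon_radius=1.0)  # noisy position
```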
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.ContinuousLightDarkNormalNoiseNoObsInDarkObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]
Bases:
BaseContinuousLightDarkObservationModel
- Parameters:
next_state (ndarray)
action (ndarray)
obs_dist_near_beacon (CovarianceParameterizedMultivariateNormal)
obs_dist_far_from_beacon (CovarianceParameterizedMultivariateNormal)
grid_size (int)
beacons (ndarray)
beacon_radius (float)
- probability(values)[source]
Calculate observation probabilities for given values.
- Parameters:
values (List[Union[ndarray, str]]) – List of observation values to calculate probabilities for
- Return type:
- Returns:
Array of probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample observations from the observation model.
- Parameters:
n_samples (int) – Number of observation samples to generate. Defaults to 1.
- Return type:
- Returns:
List of sampled observations of length n_samples.
Note
Subclasses must implement this method according to their specific observation generation logic.
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.ContinuousLightDarkNormalNoiseObservationModel(next_state, action, obs_dist_near_beacon, obs_dist_far_from_beacon, grid_size, beacons, beacon_radius)[source]
Bases:
BaseContinuousLightDarkObservationModel
- Parameters:
next_state (ndarray)
action (ndarray)
obs_dist_near_beacon (CovarianceParameterizedMultivariateNormal)
obs_dist_far_from_beacon (CovarianceParameterizedMultivariateNormal)
grid_size (int)
beacons (ndarray)
beacon_radius (float)
- probability(values)[source]
Calculate observation probabilities for given values.
- Parameters:
values (List[ndarray]) – List of observation values to calculate probabilities for
- Return type:
- Returns:
Array of probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample observations from the observation model.
- Parameters:
n_samples (int) – Number of observation samples to generate. Defaults to 1.
- Return type:
- Returns:
List of sampled observations of length n_samples.
Note
Subclasses must implement this method according to their specific observation generation logic.
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.DiscreteLDDistanceBasedObservationModel(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]
Bases:
BaseDiscreteLightDarkObservationModel
Discrete Light-Dark observation model with continuous distance-based error probability.
This observation model scales the observation error probability continuously based on the distance to the nearest beacon, rather than using a binary threshold. The error probability scales linearly from a minimum value (when at beacon) to the base value (when at beacon_radius distance). When the distance exceeds beacon_radius, observations are “None” (no observation available).
- The scaling formula is:
error_factor = min_factor + (1 - min_factor) * (distance / beacon_radius)
error_prob(distance) = base_error_prob * error_factor (only when distance <= beacon_radius)
- Where:
min_factor = 0.2 (error probability is reduced to 20% when at a beacon)
distance = distance to the nearest beacon
At distance 0: error_prob = 0.2 * base_error_prob
At distance beacon_radius: error_prob = 1.0 * base_error_prob
Beyond beacon_radius: observation = “None”
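The scaling formula above can be transcribed directly into code; this sketch only restates it (the function name error_prob follows the notation above, and None stands for the “None” observation):

```python
# Direct transcription of the documented scaling formula: error probability
# grows linearly from min_factor * base at a beacon to the full base value at
# beacon_radius; beyond that, no observation ("None") is available.
def error_prob(distance, base_error_prob, beacon_radius, min_factor=0.2):
    if distance > beacon_radius:
        return None  # "None" observation beyond beacon_radius
    error_factor = min_factor + (1 - min_factor) * (distance / beacon_radius)
    return base_error_prob * error_factor

assert error_prob(0.0, 0.5, 1.0) == 0.1   # 0.2 * base at the beacon
assert error_prob(1.0, 0.5, 1.0) == 0.5   # full base error at beacon_radius
assert error_prob(2.0, 0.5, 1.0) is None  # no observation beyond the radius
```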
- Parameters:
- distribution
DiscreteDistribution for sampling observations (only used when near beacon), None when far from beacon
- min_distance_to_beacon
Distance to the nearest beacon
- probability(values)[source]
Calculate probability of given observation values.
- Parameters:
values (List[Union[Any, str]]) – List of observation values to calculate probabilities for. Can include “None” values.
- Returns:
If value is “None” and near beacon: probability is 0
If value is “None” and far from beacon: probability is 1
If value is actual observation: probability from distribution (if near beacon) or 0 (if far)
- Return type:
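The three return cases listed above can be written as one small function. Here dist_prob and the observation labels are hypothetical stand-ins for the model's DiscreteDistribution lookup:

```python
# Sketch of the documented "None"-handling rule. dist_prob is a hypothetical
# callable giving the discrete distribution's probability of an actual
# observation when near a beacon.
def probability_of(value, near_beacon, dist_prob):
    if value == "None":
        return 0.0 if near_beacon else 1.0  # "None" is certain in the dark
    return dist_prob(value) if near_beacon else 0.0

# Illustrative discrete observation labels, not the package's actual values.
probs = {"goal_visible": 0.7, "obstacle_visible": 0.3}
assert probability_of("None", near_beacon=False, dist_prob=probs.get) == 1.0
assert probability_of("None", near_beacon=True, dist_prob=probs.get) == 0.0
assert probability_of("goal_visible", near_beacon=True, dist_prob=probs.get) == 0.7
```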
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.DiscreteLDObservationModel(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]
Bases:
BaseDiscreteLightDarkObservationModel
Discrete Light-Dark observation model with distance-dependent error probability.
This observation model provides discrete observations based on the robot’s position relative to beacons. When near beacons, the observation error probability is reduced, making observations more accurate.
- Parameters:
- distribution
DiscreteDistribution for sampling observations
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_observation_models.DiscreteLDObservationModelNoObsInDark(next_state, action, beacons, obstacles, beacon_radius, observation_error_prob)[source]
Bases:
BaseDiscreteLightDarkObservationModel
Discrete Light-Dark observation model that returns “None” when not near beacons.
This observation model provides discrete observations based on the robot’s position relative to beacons. When near beacons, observations are sampled from a discrete distribution. When far from beacons, observations are “None” (no observation available).
Similar to ContinuousLightDarkNormalNoiseNoObsInDarkObservationModel but for discrete observations using DiscreteDistribution instead of continuous multivariate normal.
- Parameters:
- distribution
DiscreteDistribution for sampling observations (only used when near beacon), None when far from beacon
- probability(values)[source]
Calculate probability of given observation values.
- Parameters:
values (List[Union[Any, str]]) – List of observation values to calculate probabilities for. Can include “None” values.
- Returns:
If value is “None” and near beacon: probability is 0
If value is “None” and far from beacon: probability is 1
If value is actual observation: probability from distribution (if near beacon) or 0 (if far)
- Return type:
POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models module
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.BaseLightDarkRewardModel[source]
Bases:
ABC
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.ContinuousLDDangerousStatesRewardModel(goal_state, obstacles, goal_state_radius, obstacle_radius, grid_size, obstacle_hit_probability, obstacle_reward, goal_reward, fuel_cost)[source]
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.ContinuousLightDarkDecayingHitProbabilityRewardModel(goal_state, obstacles, goal_state_radius, obstacle_radius, grid_size, obstacle_hit_probability, obstacle_reward, goal_reward, fuel_cost, penalty_decay)[source]
Bases:
BaseLightDarkRewardModel
- Parameters:
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_reward_models.ContinuousLightDarkRewardModel(goal_state, obstacles, goal_state_radius, obstacle_radius, grid_size, obstacle_hit_probability, obstacle_reward, goal_reward, fuel_cost)[source]
Bases:
BaseLightDarkRewardModel
- Parameters:
POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_visualizer module
- class POMDPPlanners.environments.light_dark_pomdp.light_dark_pomdp_utils.light_dark_visualizer.LightDarkPOMDPVisualizer(environment)[source]
Bases:
object
Visualizer for Light-Dark POMDP environments.
Handles all visualization and animation logic for Light-Dark POMDP environments, including path visualization, belief particle rendering, and animation generation.
- Parameters:
environment (Any)
- environment
The Light-Dark POMDP environment instance to visualize.
- cache_visualization(history, cache_path)[source]
Cache visualization of agent’s path and belief.
- Parameters:
- Raises:
TypeError – If history is not a List or contains non-StepData objects, or if cache_path is not a Path object.
ValueError – If history is empty or contains invalid data.
- Return type:
- visualize_path(path, agent_belief_path, actions, cache_path)[source]
Create and save an animated visualization of the agent’s path.
- Parameters:
path (List[ndarray]) – List of state positions (2D numpy arrays) along the agent’s trajectory.
agent_belief_path (List[DiscreteDistribution]) – List of belief distributions at each step.
cache_path (Path) – Path where to save the visualization (must end with .gif).
- Raises:
TypeError – If cache_path is not a Path object.
ValueError – If cache_path doesn’t end with .gif.