POMDPPlanners.environments.rock_sample_pomdp package
RockSample POMDP Environment Module.
This module provides the RockSample POMDP environment implementation and related components for robot navigation and sampling tasks.
- Classes:
RockSamplePOMDP: Main POMDP environment for rock sampling tasks
RockSampleState: State representation with robot position and rock qualities
RockSampleStateTransitionModel: State transition model for deterministic movements
RockSampleObservationModel: Observation model with distance-dependent sensor noise
RockSampleVisualizer: Visualization utilities for RockSample POMDP episodes
- class POMDPPlanners.environments.rock_sample_pomdp.RockSampleObservationModel(next_state, action, pomdp)[source]
Bases: ObservationModel
Observation model for RockSample POMDP.
- Parameters:
next_state (ndarray)
action (int)
pomdp (RockSamplePOMDP)
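The class summary above describes this model as having distance-dependent sensor noise. The classic RockSample formulation (which this implementation may or may not follow exactly; the formula below is an assumption, not taken from this reference) has sensing accuracy decay exponentially from 1.0 at the rock toward 0.5 (pure noise) with distance, with `sensor_efficiency` as the half-distance constant:

```python
import numpy as np

def sensor_accuracy(robot_pos, rock_pos, sensor_efficiency=10.0):
    """Hypothetical sketch: probability of a correct SENSE reading.

    Classic RockSample form: accuracy = 0.5 + 0.5 * 2 ** (-d / sensor_efficiency),
    where d is the Euclidean distance to the sensed rock. Higher
    sensor_efficiency means slower decay, i.e. less noise.
    """
    d = np.linalg.norm(np.asarray(robot_pos, float) - np.asarray(rock_pos, float))
    return 0.5 + 0.5 * 2.0 ** (-d / sensor_efficiency)
```

This matches the documented behavior "higher = less noise": at the rock the reading is certain, and far away it degrades to a coin flip.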
- class POMDPPlanners.environments.rock_sample_pomdp.RockSamplePOMDP(map_size=(5, 5), rock_positions=None, init_pos=(0, 0), sensor_efficiency=10.0, bad_rock_penalty=-10.0, good_rock_reward=10.0, step_penalty=0.0, sensor_use_penalty=0.0, exit_reward=10.0, dangerous_areas=None, dangerous_area_radius=1.0, dangerous_area_penalty=5.0, discount_factor=0.95, name='RockSample', output_dir=None, debug=False, use_queue_logger=False)[source]
Bases: DiscreteActionsEnvironment
RockSample POMDP environment.
This environment implements the classic rock sampling problem where a robot must navigate a grid, use sensors to evaluate rocks, and decide which ones to sample while balancing exploration costs and sampling rewards.
- Parameters:
sensor_efficiency (float)
bad_rock_penalty (float)
good_rock_reward (float)
step_penalty (float)
sensor_use_penalty (float)
exit_reward (float)
dangerous_area_radius (float)
dangerous_area_penalty (float)
discount_factor (float)
name (str)
output_dir (Path | None)
debug (bool)
use_queue_logger (bool)
- map_size
Grid dimensions as (rows, cols)
- rock_positions
List of rock positions as (row, col) tuples
- init_pos
Initial robot position
- sensor_efficiency
Sensor noise parameter (higher = less noise)
- bad_rock_penalty
Penalty for sampling a bad rock
- good_rock_reward
Reward for sampling a good rock
- step_penalty
Cost for each action
- sensor_use_penalty
Additional cost for using sensor
- exit_reward
Reward for reaching the exit
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> env = RockSamplePOMDP(map_size=(5, 5), rock_positions=[(0, 0), (2, 2), (3, 3)])
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
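As an illustration of the `dangerous_areas` / `dangerous_area_radius` parameters, a hypothetical membership check (the environment's actual penalty logic is not shown in this reference and may differ, e.g. in its distance metric):

```python
import numpy as np

def in_dangerous_area(robot_pos, dangerous_areas, radius=1.0):
    """Hypothetical sketch: is the robot within `radius` (Euclidean distance)
    of any dangerous-area center? If so, dangerous_area_penalty would apply."""
    if not dangerous_areas:
        return False
    pos = np.asarray(robot_pos, dtype=float)
    centers = np.asarray(dangerous_areas, dtype=float)
    return bool(np.any(np.linalg.norm(centers - pos, axis=1) <= radius))
```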
- class POMDPPlanners.environments.rock_sample_pomdp.RockSampleStateTransitionModel(state, action, pomdp)[source]
Bases: StateTransitionModel
State transition model for RockSample POMDP.
- Parameters:
state (ndarray)
action (int)
pomdp (RockSamplePOMDP)
- class POMDPPlanners.environments.rock_sample_pomdp.RockSampleVectorizedUpdater(map_rows, map_cols, num_rocks, rock_positions, sensor_efficiency)[source]
Bases: VectorizedParticleBeliefUpdater
Vectorized particle belief updater for the RockSample POMDP.
Stores precomputed environment parameters and performs all-particle transitions and observation log-likelihood evaluations using NumPy operations. The state layout per particle is [robot_row, robot_col, rock_0_quality, ..., rock_{R-1}_quality].
- Parameters:
- map_rows
Number of grid rows.
- map_cols
Number of grid columns.
- num_rocks
Number of rocks in the environment.
- rock_positions
Array of shape (R, 2) with rock (row, col) positions.
- sensor_efficiency
Sensor noise parameter (higher = less noise).
- batch_observation_log_likelihood(next_particles, action, observation)[source]
Compute observation log-likelihoods for all particles.
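A hedged sketch of what a vectorized SENSE log-likelihood could look like under the documented particle layout. The function name, signature, and accuracy formula are illustrative assumptions, not the actual `batch_observation_log_likelihood` implementation:

```python
import numpy as np

def batch_sense_log_likelihood(particles, rock_idx, observation,
                               rock_positions, sensor_efficiency=10.0):
    """Hypothetical sketch of a vectorized SENSE observation likelihood.

    particles: (N, 2 + R) array, layout [row, col, rock_0, ..., rock_{R-1}].
    Returns log P(observation | particle) for each of the N particles.
    """
    robot = particles[:, :2].astype(float)                   # (N, 2) positions
    quality = particles[:, 2 + rock_idx]                     # (N,) rock quality
    d = np.linalg.norm(robot - np.asarray(rock_positions[rock_idx], float), axis=1)
    # Assumed classic accuracy model: 0.5 + 0.5 * 2^(-d / efficiency).
    p_correct = 0.5 + 0.5 * 2.0 ** (-d / sensor_efficiency)
    # Observation matches true quality with prob p_correct, else 1 - p_correct.
    match = (quality == observation)
    p = np.where(match, p_correct, 1.0 - p_correct)
    return np.log(p)
```

All N particles are scored with a handful of array operations and no Python-level loop, which is the point of the vectorized updater.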
- classmethod from_environment(env)[source]
Construct an updater from a RockSamplePOMDP instance.
- Parameters:
env (RockSamplePOMDP)
- Return type:
RockSampleVectorizedUpdater
- class POMDPPlanners.environments.rock_sample_pomdp.RockSampleVisualizer(env)[source]
Bases: object
Handles visualization and animation for RockSample POMDP environments.
This class encapsulates all visualization logic for RockSample POMDP episodes, creating animated GIFs showing robot movement, rock sampling, sensor checks, dangerous areas, and exit behavior.
- Parameters:
env (RockSamplePOMDP)
- env
Reference to the RockSamplePOMDP environment instance
- map_size
Grid dimensions as (rows, cols)
- rock_positions
List of rock positions
- action_names
Names of available actions
- action_to_vector
Mapping from action indices to direction vectors
- dangerous_areas
List of dangerous area center positions
- dangerous_area_radius
Radius around dangerous area centers
- create_visualization(history, cache_path)[source]
Create animated visualization of a RockSample POMDP episode.
Creates an animated GIF showing the robot navigating, sampling rocks, using sensors, and exiting the grid.
- Parameters:
- Raises:
ValueError – If history is empty or cache_path doesn’t end with .gif
TypeError – If cache_path is not a Path object or history is invalid
- Return type:
- POMDPPlanners.environments.rock_sample_pomdp.create_random_rock_sample(map_size=7, num_rocks=8, seed=None)[source]
Create a random RockSample instance.
- Parameters:
map_size (int)
num_rocks (int)
seed (int | None)
- Return type:
RockSamplePOMDP
- Returns:
Randomly configured RockSample POMDP
- POMDPPlanners.environments.rock_sample_pomdp.create_rock_sample_state(robot_pos, rocks)[source]
Create a RockSample state as a numpy array.
- Parameters:
- Returns:
[robot_row, robot_col, rock_0, rock_1, …, rock_n] where rock values are 1.0 for good (True) and 0.0 for bad (False)
- Return type:
ndarray
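Given the documented layout, a minimal sketch of such a constructor (an assumed implementation; the real one may differ in dtype handling or validation):

```python
import numpy as np

def create_rock_sample_state(robot_pos, rocks):
    """Sketch of the documented layout (assumed implementation):
    [robot_row, robot_col, rock_0, ..., rock_n], with rock values
    1.0 for good (True) and 0.0 for bad (False)."""
    return np.array(
        [robot_pos[0], robot_pos[1]] + [1.0 if r else 0.0 for r in rocks],
        dtype=float,
    )
```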
- POMDPPlanners.environments.rock_sample_pomdp.create_rocksample_belief(env, belief_type=BeliefType.VECTORIZED_PARTICLE, n_particles=200, **kwargs)[source]
Create a belief object for the RockSample POMDP.
- Parameters:
env (RockSamplePOMDP) – RockSample environment instance.
belief_type (BeliefType) – Desired belief representation. Supports PARTICLE and VECTORIZED_PARTICLE.
n_particles (int) – Number of particles. Defaults to 200.
**kwargs (Any) – Reserved for future use.
- Return type:
- Returns:
A configured belief object.
- Raises:
ValueError – If belief_type is not supported.
- POMDPPlanners.environments.rock_sample_pomdp.get_robot_pos(state)[source]
Extract robot position from state array.
- POMDPPlanners.environments.rock_sample_pomdp.get_rocks(state)[source]
Extract rock qualities from state array.
- POMDPPlanners.environments.rock_sample_pomdp.states_equal(state1, state2)[source]
Check if two states are equal.
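Under the same assumed [robot_row, robot_col, rock_0, ..., rock_n] layout, these three helpers could be sketched as (illustrative, not the actual implementations):

```python
import numpy as np

def get_robot_pos(state):
    """Robot (row, col) is assumed to occupy the first two entries."""
    return state[:2]

def get_rocks(state):
    """Rock qualities are assumed to occupy the remaining entries."""
    return state[2:]

def states_equal(state1, state2):
    """Element-wise equality of the two state arrays."""
    return np.array_equal(state1, state2)
```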
Subpackages
- POMDPPlanners.environments.rock_sample_pomdp.rock_sample_pomdp_beliefs package
RockSampleVectorizedUpdater
RockSampleVectorizedUpdater.map_rows
RockSampleVectorizedUpdater.map_cols
RockSampleVectorizedUpdater.num_rocks
RockSampleVectorizedUpdater.rock_positions
RockSampleVectorizedUpdater.sensor_efficiency
RockSampleVectorizedUpdater.batch_observation_log_likelihood()
RockSampleVectorizedUpdater.batch_transition()
RockSampleVectorizedUpdater.config_id
RockSampleVectorizedUpdater.from_environment()
create_rocksample_belief()
- Submodules
- POMDPPlanners.environments.rock_sample_pomdp.rock_sample_pomdp_beliefs.rocksample_belief_factory module
- POMDPPlanners.environments.rock_sample_pomdp.rock_sample_pomdp_beliefs.rocksample_vectorized_updater module
RockSampleVectorizedUpdater
RockSampleVectorizedUpdater.map_rows
RockSampleVectorizedUpdater.map_cols
RockSampleVectorizedUpdater.num_rocks
RockSampleVectorizedUpdater.rock_positions
RockSampleVectorizedUpdater.sensor_efficiency
RockSampleVectorizedUpdater.batch_observation_log_likelihood()
RockSampleVectorizedUpdater.batch_transition()
RockSampleVectorizedUpdater.config_id
RockSampleVectorizedUpdater.from_environment()
Submodules
POMDPPlanners.environments.rock_sample_pomdp.rock_sample_pomdp module
Module for RockSample POMDP environment.
This module provides the RockSample POMDP environment implementation based on the classic rock sampling problem.
The environment involves a robot navigating a grid world with rocks that are either good or bad. The robot must use a noisy sensor to determine rock quality and decide whether to sample them, balancing exploration and exploitation.
- Classes:
RockSampleState: Represents the state of the environment
RockSamplePOMDP: The main POMDP environment implementation
- class POMDPPlanners.environments.rock_sample_pomdp.rock_sample_pomdp.RockSampleObservationModel(next_state, action, pomdp)[source]
Bases: ObservationModel
Observation model for RockSample POMDP.
- Parameters:
next_state (ndarray)
action (int)
pomdp (RockSamplePOMDP)
- class POMDPPlanners.environments.rock_sample_pomdp.rock_sample_pomdp.RockSamplePOMDP(map_size=(5, 5), rock_positions=None, init_pos=(0, 0), sensor_efficiency=10.0, bad_rock_penalty=-10.0, good_rock_reward=10.0, step_penalty=0.0, sensor_use_penalty=0.0, exit_reward=10.0, dangerous_areas=None, dangerous_area_radius=1.0, dangerous_area_penalty=5.0, discount_factor=0.95, name='RockSample', output_dir=None, debug=False, use_queue_logger=False)[source]
Bases: DiscreteActionsEnvironment
RockSample POMDP environment.
This environment implements the classic rock sampling problem where a robot must navigate a grid, use sensors to evaluate rocks, and decide which ones to sample while balancing exploration costs and sampling rewards.
- Parameters:
sensor_efficiency (float)
bad_rock_penalty (float)
good_rock_reward (float)
step_penalty (float)
sensor_use_penalty (float)
exit_reward (float)
dangerous_area_radius (float)
dangerous_area_penalty (float)
discount_factor (float)
name (str)
output_dir (Path | None)
debug (bool)
use_queue_logger (bool)
- map_size
Grid dimensions as (rows, cols)
- rock_positions
List of rock positions as (row, col) tuples
- init_pos
Initial robot position
- sensor_efficiency
Sensor noise parameter (higher = less noise)
- bad_rock_penalty
Penalty for sampling a bad rock
- good_rock_reward
Reward for sampling a good rock
- step_penalty
Cost for each action
- sensor_use_penalty
Additional cost for using sensor
- exit_reward
Reward for reaching the exit
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> env = RockSamplePOMDP(map_size=(5, 5), rock_positions=[(0, 0), (2, 2), (3, 3)])
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
- class POMDPPlanners.environments.rock_sample_pomdp.rock_sample_pomdp.RockSamplePOMDPMetrics(*values)[source]
Bases: Enum
Metric names for RockSample POMDP environment.
- AVERAGE_DANGEROUS_AREA_STEPS = 'average_dangerous_area_steps'
- AVG_ROCKS_SAMPLED = 'avg_rocks_sampled'
- EXIT_SUCCESS_RATE = 'exit_success_rate'
- class POMDPPlanners.environments.rock_sample_pomdp.rock_sample_pomdp.RockSampleStateTransitionModel(state, action, pomdp)[source]
Bases: StateTransitionModel
State transition model for RockSample POMDP.
- Parameters:
state (ndarray)
action (int)
pomdp (RockSamplePOMDP)
- POMDPPlanners.environments.rock_sample_pomdp.rock_sample_pomdp.create_random_rock_sample(map_size=7, num_rocks=8, seed=None)[source]
Create a random RockSample instance.
- Parameters:
map_size (int)
num_rocks (int)
seed (int | None)
- Return type:
RockSamplePOMDP
- Returns:
Randomly configured RockSample POMDP
- POMDPPlanners.environments.rock_sample_pomdp.rock_sample_pomdp.create_rock_sample_state(robot_pos, rocks)[source]
Create a RockSample state as a numpy array.
- Parameters:
- Returns:
[robot_row, robot_col, rock_0, rock_1, …, rock_n] where rock values are 1.0 for good (True) and 0.0 for bad (False)
- Return type:
ndarray
- POMDPPlanners.environments.rock_sample_pomdp.rock_sample_pomdp.get_robot_pos(state)[source]
Extract robot position from state array.
POMDPPlanners.environments.rock_sample_pomdp.rock_sample_visualizer module
Visualization module for RockSample POMDP Environment.
This module provides visualization capabilities for RockSample POMDP episodes, creating animated GIFs showing robot movement, rock sampling, sensor usage, and exit behavior.
- Classes:
RockSampleVisualizer: Handles all visualization logic for RockSample POMDP
- class POMDPPlanners.environments.rock_sample_pomdp.rock_sample_visualizer.RockSampleVisualizer(env)[source]
Bases: object
Handles visualization and animation for RockSample POMDP environments.
This class encapsulates all visualization logic for RockSample POMDP episodes, creating animated GIFs showing robot movement, rock sampling, sensor checks, dangerous areas, and exit behavior.
- Parameters:
env (RockSamplePOMDP)
- env
Reference to the RockSamplePOMDP environment instance
- map_size
Grid dimensions as (rows, cols)
- rock_positions
List of rock positions
- action_names
Names of available actions
- action_to_vector
Mapping from action indices to direction vectors
- dangerous_areas
List of dangerous area center positions
- dangerous_area_radius
Radius around dangerous area centers
- create_visualization(history, cache_path)[source]
Create animated visualization of a RockSample POMDP episode.
Creates an animated GIF showing the robot navigating, sampling rocks, using sensors, and exiting the grid.
- Parameters:
- Raises:
ValueError – If history is empty or cache_path doesn’t end with .gif
TypeError – If cache_path is not a Path object or history is invalid
- Return type: