POMDPPlanners.environments.pacman_pomdp package
PacMan POMDP package with sprite-based visualization.
- class POMDPPlanners.environments.pacman_pomdp.PacManObservationModel(next_state, action, pomdp)[source]
Bases: ObservationModel
Observation model for PacMan POMDP.
- Parameters:
next_state (PacManState)
action (int)
pomdp (PacManPOMDP)
- class POMDPPlanners.environments.pacman_pomdp.PacManPOMDP(maze_size=(7, 7), walls=None, initial_pellets=None, initial_pacman_pos=(0, 0), num_ghosts=1, initial_ghost_positions=None, initial_ghost_pos=None, pellet_reward=10.0, ghost_collision_penalty=-100.0, step_penalty=-1.0, win_reward=100.0, ghost_aggressiveness=2.0, ghost_coordination='independent', ghost_strategies=None, observation_noise_factor=0.3, max_observation_noise=1.5, discount_factor=0.95, name='PacManPOMDP', output_dir=None, debug=False)[source]
Bases: DiscreteActionsEnvironment
PacMan POMDP environment inspired by the classic arcade game.
This environment implements a simplified PacMan game where PacMan must collect pellets while avoiding one or more ghosts. Ghost positions are only partially observable through noisy sensor readings.
- Parameters:
num_ghosts (int)
pellet_reward (float)
ghost_collision_penalty (float)
step_penalty (float)
win_reward (float)
ghost_aggressiveness (float)
ghost_coordination (str)
observation_noise_factor (float)
max_observation_noise (float)
discount_factor (float)
name (str)
output_dir (Path | None)
debug (bool)
- maze_size
Grid dimensions as (rows, cols)
- walls
Set of wall positions as (row, col) tuples
- initial_pellets
List of initial pellet positions
- pellet_reward
Reward for collecting a pellet
- ghost_collision_penalty
Penalty for collision with ghost
- step_penalty
Cost per action
- win_reward
Reward for collecting all pellets
- ghost_aggressiveness
Temperature parameter for ghost movement policy
- observation_noise_factor
Multiplier for observation noise based on distance
- max_observation_noise
Maximum noise standard deviation
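The two noise attributes above suggest a sensor model whose uncertainty grows with distance and saturates at a cap. A minimal sketch of that idea, assuming a hypothetical formula std = min(noise_factor * manhattan_distance, max_noise) (the helper `observation_noise_std` is illustrative, not part of the package):

```python
def observation_noise_std(pacman_pos, ghost_pos,
                          noise_factor=0.3, max_noise=1.5):
    """Noise std for one ghost sensor reading, growing with distance.

    Assumed model: std = min(noise_factor * manhattan_distance,
    max_noise), so nearby ghosts are sensed sharply while distant
    ghosts saturate at max_noise.
    """
    distance = (abs(pacman_pos[0] - ghost_pos[0])
                + abs(pacman_pos[1] - ghost_pos[1]))
    return min(noise_factor * distance, max_noise)

# A ghost two steps away: std = 0.3 * 2 = 0.6.
# A ghost twelve steps away: 0.3 * 12 = 3.6, clipped to 1.5.
near_std = observation_noise_std((0, 0), (1, 1))
far_std = observation_noise_std((0, 0), (6, 6))
```

With the default parameters this makes a ghost adjacent to PacMan almost exactly localizable, while a far corner ghost yields readings spread over roughly a 3-cell band.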
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> env = PacManPOMDP(maze_size=(7, 7))
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
- array_to_state(arr)[source]
Convert a numpy array back to a PacManState.
- Parameters:
arr (ndarray) – 1-D array of shape (self._state_dim,) produced by state_to_array().
- Return type:
- Returns:
Reconstructed PacManState.
- get_metric_names()[source]
Get names of PacMan POMDP specific metrics.
- Return type:
- Returns:
List containing metric names including standard metrics (win_rate, avg_pellets_collected, avg_episode_length, avg_pacman_closest_ghost_distance, avg_collision_encounters) and dynamically generated per-ghost distance metrics for multi-ghost scenarios (avg_pacman_ghost_0_distance, avg_pacman_ghost_1_distance, etc.)
- property initial_ghost_pos: Tuple[int, int]
Backward compatibility property; returns the first ghost's position.
- reward_batch(states, action)[source]
Calculate rewards for a batch of states.
Accepts either a 2-D numpy array of shape (N, state_dim) (vectorized path) or a sequence of PacManState objects (falls back to the loop-based default).
Computes deterministic reward components only: step penalty, pellet collection, and win bonus. The ghost collision penalty is excluded because it depends on stochastic ghost movement.
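The deterministic components can be computed for a whole batch with a few array operations. A sketch under stated assumptions: the pellet-mask columns before and after the move are taken as explicit inputs (the real method derives them from the (N, state_dim) batch), and `deterministic_rewards` is an illustrative helper, not the library API:

```python
import numpy as np

def deterministic_rewards(pellets_before, pellets_after,
                          step_penalty=-1.0, pellet_reward=10.0,
                          win_reward=100.0):
    """Vectorized deterministic reward components for N particles.

    pellets_before / pellets_after: (N, P) binary pellet masks.
    The ghost-collision penalty is intentionally absent, since it
    depends on stochastic ghost movement.
    """
    # Pellets collected this step, per particle.
    eaten = (pellets_before - pellets_after).sum(axis=1)
    # Win bonus when no pellets remain after the move.
    cleared = (pellets_after.sum(axis=1) == 0)
    return step_penalty + pellet_reward * eaten + win_reward * cleared
```

Note that the whole batch is scored without a Python-level loop, which is what makes the vectorized path worthwhile for large particle sets.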
- state_to_array(state)[source]
Convert a PacManState to a fixed-size numpy array.
The array layout is:
[pac_row, pac_col, g0_row, g0_col, ..., pellet_mask[0..P-1], score, terminal]
- Parameters:
state (PacManState) – A PacManState instance.
- Return type:
- Returns:
1-D float array of shape (self._state_dim,).
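The documented layout can be exercised with a small round-trip sketch. `pack_state` and `unpack_state` are illustrative helpers mirroring the layout above, not the package's `state_to_array`/`array_to_state` themselves:

```python
import numpy as np

# Layout: [pac_row, pac_col, g0_row, g0_col, ..., pellet_mask[0..P-1],
#          score, terminal]

def pack_state(pacman_pos, ghost_positions, pellet_mask, score, terminal):
    ghosts = [coord for g in ghost_positions for coord in g]
    return np.array([*pacman_pos, *ghosts, *pellet_mask,
                     float(score), float(terminal)], dtype=float)

def unpack_state(arr, num_ghosts, num_pellets):
    ghosts = tuple((int(arr[2 + 2 * i]), int(arr[3 + 2 * i]))
                   for i in range(num_ghosts))
    p_start = 2 + 2 * num_ghosts
    pellet_mask = tuple(arr[p_start:p_start + num_pellets].astype(int))
    return ((int(arr[0]), int(arr[1])), ghosts, pellet_mask,
            arr[-2], bool(arr[-1]))
```

The array length is 2 + 2 * num_ghosts + num_pellets + 2, which matches the documented (self._state_dim,) shape for a fixed maze configuration.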
- states_to_array(states)[source]
Batch-convert a list of PacManState to a 2-D numpy array.
- Parameters:
states (List[PacManState]) – List of PacManState instances.
- Return type:
- Returns:
Array of shape (len(states), self._state_dim).
- visualize_path(path, actions, cache_path)[source]
Visualize PacMan path through the maze using sprite-based rendering.
- Parameters:
path (List[PacManState]) – List of states representing the path through the maze
cache_path (Path) – Path where the GIF should be saved
- class POMDPPlanners.environments.pacman_pomdp.PacManState(pacman_pos, ghost_positions, pellets, score=0, terminal=False)[source]
Bases: object
State representation for PacMan POMDP.
- Parameters:
- pacman_pos
PacMan position as (row, col) tuple
- ghost_positions
Tuple of ghost positions as (row, col) tuples
- pellets
Tuple of remaining pellet positions as (row, col) tuples
- score
Current game score
- terminal
Whether the game has ended
- class POMDPPlanners.environments.pacman_pomdp.PacManStateTransitionModel(state, action, pomdp)[source]
Bases: StateTransitionModel
State transition model for PacMan POMDP.
- Parameters:
state (PacManState)
action (int)
pomdp (PacManPOMDP)
- probability(values)[source]
Calculate transition probabilities to next states.
- Parameters:
values (List[PacManState]) – List of potential next states
- Return type:
- Returns:
Array of probabilities for each state in values
- class POMDPPlanners.environments.pacman_pomdp.PacManVectorizedUpdater(maze_size, num_ghosts, num_pellets, state_dim, neighbor_table, neighbor_validity, pellet_positions, ghost_aggressiveness, ghost_coordination, ghost_strategies, observation_noise_factor, max_observation_noise, idx_pac_row, idx_pac_col, idx_ghosts_start, idx_pellets_start, idx_pellets_end, idx_score, idx_terminal)[source]
Bases: VectorizedParticleBeliefUpdater
Vectorized particle belief updater for PacMan POMDP.
Performs all-particle transitions and observation log-likelihood evaluations using vectorized NumPy operations. Ghost movement uses batched softmax sampling, and collision/pellet logic operates on the full particle array at once.
- Parameters:
num_ghosts (int)
num_pellets (int)
state_dim (int)
neighbor_table (np.ndarray)
neighbor_validity (np.ndarray)
pellet_positions (np.ndarray)
ghost_aggressiveness (float)
ghost_coordination (str)
ghost_strategies (List[str])
observation_noise_factor (float)
max_observation_noise (float)
idx_pac_row (int)
idx_pac_col (int)
idx_ghosts_start (int)
idx_pellets_start (int)
idx_pellets_end (int)
idx_score (int)
idx_terminal (int)
- maze_size
Grid dimensions (rows, cols).
- num_ghosts
Number of ghosts.
- num_pellets
Number of initial pellets.
- state_dim
Dimensionality of the array state.
- ghost_aggressiveness
Softmax temperature for ghost pursuit.
- ghost_coordination
Ghost coordination mode.
- ghost_strategies
Per-ghost strategy list.
- observation_noise_factor
Multiplier for observation noise.
- max_observation_noise
Maximum observation noise std.
- batch_observation_log_likelihood(next_particles, action, observation)[source]
Compute observation log-likelihoods for all particles at once.
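A minimal sketch of how such a batched log-likelihood might look, assuming an isotropic Gaussian sensor model centred on each particle's ghost position (`batch_ghost_loglik`, the per-particle `sigma` input, and the 2-D reading are illustrative assumptions, not the package's exact model):

```python
import numpy as np

def batch_ghost_loglik(ghost_pos, obs, sigma):
    """Gaussian observation log-likelihoods for all particles at once.

    ghost_pos: (N, 2) particle ghost positions; obs: (2,) noisy sensor
    reading; sigma: (N,) per-particle noise std (in the environment the
    std would depend on the PacMan-to-ghost distance). Returns (N,)
    log-likelihoods under a 2-D isotropic Gaussian per particle.
    """
    diff = ghost_pos - obs[None, :]              # (N, 2)
    sq = (diff ** 2).sum(axis=1)                 # squared distances
    # log of (1 / (2*pi*sigma^2)) * exp(-sq / (2*sigma^2))
    return -sq / (2 * sigma ** 2) - 2 * np.log(sigma) - np.log(2 * np.pi)
```

Working in log space keeps the subsequent particle reweighting numerically stable even when most particles are far from the observed reading.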
- batch_transition(particles, action)[source]
Transition all particles in a single batched operation.
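The "batched softmax sampling" of ghost moves mentioned above can be sketched as follows. This is an illustrative implementation under stated assumptions: scores are negated Manhattan distances to PacMan after each candidate move, and `sample_ghost_moves` is a hypothetical helper, not the package API:

```python
import numpy as np

def sample_ghost_moves(ghost_pos, pacman_pos, candidate_moves,
                       aggressiveness=2.0, rng=None):
    """Sample one ghost move per particle via a batched softmax.

    ghost_pos, pacman_pos: (N, 2) integer arrays; candidate_moves:
    (M, 2) offsets (e.g. stay/up/down/left/right). Higher
    `aggressiveness` makes ghosts pursue PacMan more greedily.
    """
    rng = np.random.default_rng() if rng is None else rng
    # (N, M, 2): every candidate next position for every particle.
    nxt = ghost_pos[:, None, :] + candidate_moves[None, :, :]
    dist = np.abs(nxt - pacman_pos[:, None, :]).sum(axis=2)   # (N, M)
    logits = -aggressiveness * dist
    logits -= logits.max(axis=1, keepdims=True)               # stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Vectorized categorical sampling via inverse CDF, one u per particle.
    cdf = np.cumsum(probs, axis=1)
    idx = (cdf < rng.random((len(ghost_pos), 1))).sum(axis=1)
    idx = np.minimum(idx, len(candidate_moves) - 1)           # guard rounding
    return nxt[np.arange(len(ghost_pos)), idx]
```

All N particles draw their ghost moves in one pass, which is the point of the vectorized updater: no per-particle Python loop in either the transition or the sampling step.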
- classmethod from_environment(env)[source]
Construct an updater from a PacManPOMDP instance.
- Parameters:
env (PacManPOMDP) – Environment to extract parameters from.
- Return type:
- Returns:
A new PacManVectorizedUpdater instance.
- POMDPPlanners.environments.pacman_pomdp.create_pacman_belief(env, belief_type=BeliefType.VECTORIZED_PARTICLE, n_particles=200, **kwargs)[source]
Create a ready-to-use belief for the PacMan POMDP.
- Parameters:
env (PacManPOMDP) – PacManPOMDP environment instance.
belief_type (BeliefType) – Desired belief representation. Defaults to BeliefType.VECTORIZED_PARTICLE.
n_particles (int) – Number of particles. Defaults to 200.
**kwargs (Any) – Extra arguments (reserved for future use).
- Return type:
- Returns:
A configured Belief object.
- Raises:
ValueError – If belief_type is not supported.
Example
>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.pacman_pomdp import PacManPOMDP
>>> env = PacManPOMDP(discount_factor=0.95)
>>> belief = create_pacman_belief(env, n_particles=50)
>>> belief.sample().shape[0] > 0
True
- POMDPPlanners.environments.pacman_pomdp.create_simple_maze_pacman(maze_size=7, num_walls=5, num_ghosts=1, seed=None)[source]
Create a simple PacMan instance with random walls and multiple ghosts.
- Parameters:
- Return type:
- Returns:
Randomly configured PacMan POMDP with multi-ghost support
Subpackages
- POMDPPlanners.environments.pacman_pomdp.pacman_pomdp_beliefs package
PacManVectorizedUpdater: maze_size, num_ghosts, num_pellets, state_dim, ghost_aggressiveness, ghost_coordination, ghost_strategies, observation_noise_factor, max_observation_noise, batch_observation_log_likelihood(), batch_transition(), config_id, from_environment()
create_pacman_belief()
- Submodules
- POMDPPlanners.environments.pacman_pomdp.pacman_pomdp_beliefs.pacman_belief_factory module
- POMDPPlanners.environments.pacman_pomdp.pacman_pomdp_beliefs.pacman_grid_utils module
- POMDPPlanners.environments.pacman_pomdp.pacman_pomdp_beliefs.pacman_vectorized_updater module
PacManVectorizedUpdater: maze_size, num_ghosts, num_pellets, state_dim, ghost_aggressiveness, ghost_coordination, ghost_strategies, observation_noise_factor, max_observation_noise, batch_observation_log_likelihood(), batch_transition(), config_id, from_environment()
Submodules
POMDPPlanners.environments.pacman_pomdp.pacman_pomdp module
Module for PacMan POMDP environment.
This module provides the PacMan POMDP environment implementation inspired by the classic arcade game. The environment features a grid world where PacMan must collect pellets while avoiding ghosts, with partial observability of ghost positions.
The environment involves PacMan navigating a maze with walls, collecting pellets, and avoiding ghosts that move according to stochastic policies. PacMan receives noisy observations about nearby ghost positions.
- Classes:
PacManState: Represents the state of the environment
PacManPOMDP: The main POMDP environment implementation
- class POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.PacManObservationModel(next_state, action, pomdp)[source]
Bases: ObservationModel
Observation model for PacMan POMDP.
- Parameters:
next_state (PacManState)
action (int)
pomdp (PacManPOMDP)
- class POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.PacManPOMDP(maze_size=(7, 7), walls=None, initial_pellets=None, initial_pacman_pos=(0, 0), num_ghosts=1, initial_ghost_positions=None, initial_ghost_pos=None, pellet_reward=10.0, ghost_collision_penalty=-100.0, step_penalty=-1.0, win_reward=100.0, ghost_aggressiveness=2.0, ghost_coordination='independent', ghost_strategies=None, observation_noise_factor=0.3, max_observation_noise=1.5, discount_factor=0.95, name='PacManPOMDP', output_dir=None, debug=False)[source]
Bases: DiscreteActionsEnvironment
PacMan POMDP environment inspired by the classic arcade game.
This environment implements a simplified PacMan game where PacMan must collect pellets while avoiding one or more ghosts. Ghost positions are only partially observable through noisy sensor readings.
- Parameters:
num_ghosts (int)
pellet_reward (float)
ghost_collision_penalty (float)
step_penalty (float)
win_reward (float)
ghost_aggressiveness (float)
ghost_coordination (str)
observation_noise_factor (float)
max_observation_noise (float)
discount_factor (float)
name (str)
output_dir (Path | None)
debug (bool)
- maze_size
Grid dimensions as (rows, cols)
- walls
Set of wall positions as (row, col) tuples
- initial_pellets
List of initial pellet positions
- pellet_reward
Reward for collecting a pellet
- ghost_collision_penalty
Penalty for collision with ghost
- step_penalty
Cost per action
- win_reward
Reward for collecting all pellets
- ghost_aggressiveness
Temperature parameter for ghost movement policy
- observation_noise_factor
Multiplier for observation noise based on distance
- max_observation_noise
Maximum noise standard deviation
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> env = PacManPOMDP(maze_size=(7, 7))
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
- array_to_state(arr)[source]
Convert a numpy array back to a PacManState.
- Parameters:
arr (ndarray) – 1-D array of shape (self._state_dim,) produced by state_to_array().
- Return type:
- Returns:
Reconstructed PacManState.
- get_metric_names()[source]
Get names of PacMan POMDP specific metrics.
- Return type:
- Returns:
List containing metric names including standard metrics (win_rate, avg_pellets_collected, avg_episode_length, avg_pacman_closest_ghost_distance, avg_collision_encounters) and dynamically generated per-ghost distance metrics for multi-ghost scenarios (avg_pacman_ghost_0_distance, avg_pacman_ghost_1_distance, etc.)
- property initial_ghost_pos: Tuple[int, int]
Backward compatibility property; returns the first ghost's position.
- reward_batch(states, action)[source]
Calculate rewards for a batch of states.
Accepts either a 2-D numpy array of shape (N, state_dim) (vectorized path) or a sequence of PacManState objects (falls back to the loop-based default).
Computes deterministic reward components only: step penalty, pellet collection, and win bonus. The ghost collision penalty is excluded because it depends on stochastic ghost movement.
- state_to_array(state)[source]
Convert a PacManState to a fixed-size numpy array.
The array layout is:
[pac_row, pac_col, g0_row, g0_col, ..., pellet_mask[0..P-1], score, terminal]
- Parameters:
state (PacManState) – A PacManState instance.
- Return type:
- Returns:
1-D float array of shape (self._state_dim,).
- states_to_array(states)[source]
Batch-convert a list of PacManState to a 2-D numpy array.
- Parameters:
states (List[PacManState]) – List of PacManState instances.
- Return type:
- Returns:
Array of shape (len(states), self._state_dim).
- visualize_path(path, actions, cache_path)[source]
Visualize PacMan path through the maze using sprite-based rendering.
- Parameters:
path (List[PacManState]) – List of states representing the path through the maze
cache_path (Path) – Path where the GIF should be saved
- class POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.PacManPOMDPMetrics(*values)[source]
Bases: Enum
Metric names for PacMan POMDP environment.
- AVG_COLLISION_ENCOUNTERS = 'avg_collision_encounters'
- AVG_EPISODE_LENGTH = 'avg_episode_length'
- AVG_PACMAN_CLOSEST_GHOST_DISTANCE = 'avg_pacman_closest_ghost_distance'
- AVG_PELLETS_COLLECTED = 'avg_pellets_collected'
- WIN_RATE = 'win_rate'
- class POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.PacManState(pacman_pos, ghost_positions, pellets, score=0, terminal=False)[source]
Bases: object
State representation for PacMan POMDP.
- Parameters:
- pacman_pos
PacMan position as (row, col) tuple
- ghost_positions
Tuple of ghost positions as (row, col) tuples
- pellets
Tuple of remaining pellet positions as (row, col) tuples
- score
Current game score
- terminal
Whether the game has ended
- class POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.PacManStateTransitionModel(state, action, pomdp)[source]
Bases: StateTransitionModel
State transition model for PacMan POMDP.
- Parameters:
state (PacManState)
action (int)
pomdp (PacManPOMDP)
- probability(values)[source]
Calculate transition probabilities to next states.
- Parameters:
values (List[PacManState]) – List of potential next states
- Return type:
- Returns:
Array of probabilities for each state in values
- POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.create_simple_maze_pacman(maze_size=7, num_walls=5, num_ghosts=1, seed=None)[source]
Create a simple PacMan instance with random walls and multiple ghosts.
- Parameters:
- Return type:
- Returns:
Randomly configured PacMan POMDP with multi-ghost support
POMDPPlanners.environments.pacman_pomdp.pacman_visualizer module
Visualization module for PacMan POMDP environment.
This module provides sprite-based visualization capabilities for PacMan POMDP episodes, rendering animated GIFs of agent behavior and game state.
- Classes:
PacManVisualizer: Handles sprite-based rendering and GIF generation
- class POMDPPlanners.environments.pacman_pomdp.pacman_visualizer.PacManVisualizer(environment, tile_size=32)[source]
Bases: object
Handles visualization for PacMan POMDP environments.
This class manages sprite loading, frame rendering, and GIF generation for visualizing PacMan POMDP episodes. It renders the maze, PacMan, ghosts, pellets, and game state information.
- Parameters:
environment (PacManPOMDP)
tile_size (int)
- env
Reference to the PacMan POMDP environment
- tile_size
Size of each tile in pixels
- sprites
Dictionary of loaded sprite images