POMDPPlanners.environments.pacman_pomdp package

PacMan POMDP package with sprite-based visualization.

class POMDPPlanners.environments.pacman_pomdp.PacManObservationModel(next_state, action, pomdp)[source]

Bases: ObservationModel

Observation model for PacMan POMDP.

Parameters:
  • next_state (Any)

  • action (int)

  • pomdp (PacManPOMDP)

probability(values)[source]

Calculate observation probabilities for multi-ghost observations.

Return type:

ndarray

Parameters:

values (List[Tuple[Tuple[int, int], ...]])

sample(n_samples=1)[source]

Sample observations of all ghost positions with noise.

Return type:

List[Tuple[Tuple[int, int], ...]]

Parameters:

n_samples (int)

sample_closest_ghosts(max_ghosts=2, n_samples=1)[source]

Sample observations of only the closest ghosts.

Return type:

List[Tuple[Tuple[int, int], ...]]

Parameters:
  • max_ghosts (int)

  • n_samples (int)

class POMDPPlanners.environments.pacman_pomdp.PacManPOMDP(maze_size=(7, 7), walls=None, initial_pellets=None, initial_pacman_pos=(0, 0), num_ghosts=1, initial_ghost_positions=None, initial_ghost_pos=None, pellet_reward=10.0, ghost_collision_penalty=-100.0, step_penalty=-1.0, win_reward=100.0, ghost_aggressiveness=2.0, ghost_coordination='independent', ghost_strategies=None, observation_noise_factor=0.3, max_observation_noise=1.5, discount_factor=0.95, name='PacManPOMDP', output_dir=None, debug=False)[source]

Bases: DiscreteActionsEnvironment

PacMan POMDP environment inspired by the classic arcade game.

This environment implements a simplified PacMan game where PacMan must collect pellets while avoiding one or more ghosts. Ghost positions are only partially observable through noisy sensor readings.

Parameters:
maze_size

Grid dimensions as (rows, cols)

walls

Set of wall positions as (row, col) tuples

initial_pellets

List of initial pellet positions

pellet_reward

Reward for collecting a pellet

ghost_collision_penalty

Penalty for collision with ghost

step_penalty

Cost per action

win_reward

Reward for collecting all pellets

ghost_aggressiveness

Temperature parameter for ghost movement policy

observation_noise_factor

Multiplier for observation noise based on distance

max_observation_noise

Maximum noise standard deviation
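
Together, observation_noise_factor and max_observation_noise define a distance-dependent sensor model: nearby ghosts are reported sharply while distant ones blur, up to a cap. A minimal sketch of that relationship, assuming Manhattan distance (the metric the environment actually uses is not documented here):

```python
def observation_noise_std(pacman_pos, ghost_pos,
                          noise_factor=0.3, max_noise=1.5):
    """Noise std for one ghost reading: grows with distance, capped.

    Manhattan distance is an assumption; the environment's actual
    metric is an implementation detail not stated in this reference.
    """
    dist = (abs(pacman_pos[0] - ghost_pos[0])
            + abs(pacman_pos[1] - ghost_pos[1]))
    return min(noise_factor * dist, max_noise)
```

With the defaults above, an adjacent ghost is observed with std 0.3 while any ghost five or more Manhattan steps away saturates at std 1.5.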

Example

>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> env = PacManPOMDP(maze_size=(7, 7))
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
array_to_observation(arr)[source]

Convert a flat numpy array back to a PacMan observation tuple.

Parameters:

arr (ndarray) – 1-D array of shape (2 * num_ghosts,).

Return type:

Tuple[Tuple[int, int], ...]

Returns:

Observation as tuple of (row, col) tuples.

array_to_state(arr)[source]

Convert a numpy array back to a PacManState.

Parameters:

arr (ndarray) – 1-D array of shape (self._state_dim,) produced by state_to_array().

Return type:

PacManState

Returns:

Reconstructed PacManState.

cache_visualization(history, cache_path)[source]

Cache visualization of episode history.

Parameters:
  • history (List[StepData]) – List of StepData objects representing the episode

  • cache_path (Path) – Path where the GIF should be saved

Return type:

None

compute_metrics(histories)[source]

Compute environment-specific metrics.

Return type:

List[MetricValue]

Parameters:

histories (List[History])

get_actions()[source]

Get all available actions.

Return type:

List[int]

get_metric_names()[source]

Get names of PacMan POMDP specific metrics.

Return type:

List[str]

Returns:

List containing metric names including standard metrics (win_rate, avg_pellets_collected, avg_episode_length, avg_pacman_closest_ghost_distance, avg_collision_encounters) and dynamically generated per-ghost distance metrics for multi-ghost scenarios (avg_pacman_ghost_0_distance, avg_pacman_ghost_1_distance, etc.)

property initial_ghost_pos: Tuple[int, int]

Backward compatibility: returns the first ghost position.

initial_observation_dist()[source]

Get initial observation distribution.

Return type:

DiscreteDistribution

initial_state_dist()[source]

Get initial state distribution.

Return type:

DiscreteDistribution

is_equal_observation(observation1, observation2)[source]

Check if two observations are equal.

Return type:

bool

Parameters:
  • observation1 (Any)

  • observation2 (Any)

is_terminal(state)[source]

Check if state is terminal.

Return type:

bool

Parameters:

state (Any)

observation_model(next_state, action)[source]

Get observation model.

Return type:

PacManObservationModel

Parameters:
  • next_state (Any)

  • action (int)

observation_to_array(obs)[source]

Convert a PacMan observation tuple to a flat numpy array.

Parameters:

obs (Tuple[Tuple[int, int], ...]) – Observation as tuple of ghost (row, col) positions.

Return type:

ndarray

Returns:

1-D array of shape (2 * num_ghosts,).

reward(state, action)[source]

Calculate immediate reward.

Return type:

float

Parameters:
  • state (Any)

  • action (int)

reward_batch(states, action)[source]

Calculate rewards for a batch of states.

Accepts either a 2-D numpy array of shape (N, state_dim) (vectorized path) or a sequence of PacManState objects (falls back to the loop-based default).

Computes deterministic reward components only: step penalty, pellet collection, and win bonus. Ghost collision penalty is excluded because it depends on stochastic ghost movement.

Parameters:
  • states (Union[ndarray, Sequence[Any]]) – Array of shape (N, state_dim) or sequence of states.

  • action (int) – Discrete action index (0-3).

Return type:

ndarray

Returns:

1-D array of reward values with shape (N,).
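
The vectorized path can be sketched as follows. This is a simplified illustration, not the environment's implementation: the action coding and wall handling (walls are ignored here) are assumptions, with the pellet-mask offset taken from the layout documented for state_to_array().

```python
import numpy as np

# Assumed action coding; the environment's real mapping may differ.
DELTAS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

def reward_batch_sketch(states, action, pellet_positions, num_ghosts=1,
                        maze_size=(3, 3), step_penalty=-1.0,
                        pellet_reward=10.0, win_reward=100.0):
    """Deterministic reward terms over an (N, state_dim) particle array."""
    dr, dc = DELTAS[action]
    rows = np.clip(states[:, 0] + dr, 0, maze_size[0] - 1)
    cols = np.clip(states[:, 1] + dc, 0, maze_size[1] - 1)
    start = 2 + 2 * num_ghosts                       # pellet-mask offset
    mask = states[:, start:start + len(pellet_positions)]
    collected = np.zeros(states.shape[0], dtype=bool)
    for j, (pr, pc) in enumerate(pellet_positions):
        collected |= (rows == pr) & (cols == pc) & (mask[:, j] > 0)
    remaining_after = mask.sum(axis=1) - collected   # one pellet per cell
    return (step_penalty
            + pellet_reward * collected
            + win_reward * (remaining_after == 0))
```

As documented above, the ghost-collision penalty is deliberately absent: it depends on stochastic ghost movement, which a deterministic batch pass cannot resolve.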

state_to_array(state)[source]

Convert a PacManState to a fixed-size numpy array.

The array layout is: [pac_row, pac_col, g0_row, g0_col, ..., pellet_mask[0..P-1], score, terminal]

Parameters:

state (PacManState) – A PacManState instance.

Return type:

ndarray

Returns:

1-D float array of shape (self._state_dim,).
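
The layout above can be reproduced by a small standalone helper. This is an illustrative sketch, not the method itself; pellet_positions stands for the environment's initial pellet list, which is assumed to fix the ordering of the mask:

```python
import numpy as np

def state_to_array_sketch(pacman_pos, ghost_positions, pellets_remaining,
                          pellet_positions, score=0.0, terminal=False):
    """Flatten a state into the documented layout:
    [pac_row, pac_col, g0_row, g0_col, ..., pellet_mask[0..P-1],
     score, terminal].
    """
    flat = [float(pacman_pos[0]), float(pacman_pos[1])]
    for g in ghost_positions:
        flat.extend([float(g[0]), float(g[1])])
    # The mask is indexed by the environment's *initial* pellet list,
    # so array positions stay stable as pellets are eaten.
    flat.extend(1.0 if p in pellets_remaining else 0.0
                for p in pellet_positions)
    flat.extend([float(score), float(terminal)])
    return np.asarray(flat)
```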

state_transition_model(state, action)[source]

Get state transition model.

Return type:

PacManStateTransitionModel

Parameters:
  • state (Any)

  • action (int)

states_to_array(states)[source]

Batch-convert a list of PacManState to a 2-D numpy array.

Parameters:

states (List[PacManState]) – List of PacManState instances.

Return type:

ndarray

Returns:

Array of shape (len(states), self._state_dim).

visualize_path(path, actions, cache_path)[source]

Visualize PacMan path through the maze using sprite-based rendering.

Parameters:
  • path (List[PacManState]) – List of states representing the path through the maze

  • actions (List[int]) – List of actions taken at each step

  • cache_path (Path) – Path where the GIF should be saved

class POMDPPlanners.environments.pacman_pomdp.PacManState(pacman_pos, ghost_positions, pellets, score=0, terminal=False)[source]

Bases: object

State representation for PacMan POMDP.

Parameters:
pacman_pos

PacMan position as (row, col) tuple

ghost_positions

Tuple of ghost positions as (row, col) tuples

pellets

Tuple of remaining pellet positions as (row, col) tuples

score

Current game score

terminal

Whether the game has ended

property ghost_pos: Tuple[int, int]

Backward compatibility: returns the first ghost position.

ghost_positions: Tuple[Tuple[int, int], ...]
property num_ghosts: int

Number of ghosts in the game.

pacman_pos: Tuple[int, int]
pellets: Tuple[Tuple[int, int], ...]
score: int | float = 0
terminal: bool = False
class POMDPPlanners.environments.pacman_pomdp.PacManStateTransitionModel(state, action, pomdp)[source]

Bases: StateTransitionModel

State transition model for PacMan POMDP.

Parameters:
  • state (Any)

  • action (int)

  • pomdp (PacManPOMDP)

probability(values)[source]

Calculate transition probabilities to next states.

Parameters:

values (List[PacManState]) – List of potential next states

Return type:

ndarray

Returns:

Array of probabilities for each state in values

sample(n_samples=1)[source]

Sample next states.

Return type:

List[PacManState]

Parameters:

n_samples (int)

class POMDPPlanners.environments.pacman_pomdp.PacManVectorizedUpdater(maze_size, num_ghosts, num_pellets, state_dim, neighbor_table, neighbor_validity, pellet_positions, ghost_aggressiveness, ghost_coordination, ghost_strategies, observation_noise_factor, max_observation_noise, idx_pac_row, idx_pac_col, idx_ghosts_start, idx_pellets_start, idx_pellets_end, idx_score, idx_terminal)[source]

Bases: VectorizedParticleBeliefUpdater

Vectorized particle belief updater for PacMan POMDP.

Performs all-particle transitions and observation log-likelihood evaluations using vectorized NumPy operations. Ghost movement uses batched softmax sampling, and collision/pellet logic operates on the full particle array at once.

Parameters:
  • maze_size (Tuple[int, int])

  • num_ghosts (int)

  • num_pellets (int)

  • state_dim (int)

  • neighbor_table (np.ndarray)

  • neighbor_validity (np.ndarray)

  • pellet_positions (np.ndarray)

  • ghost_aggressiveness (float)

  • ghost_coordination (str)

  • ghost_strategies (List[str])

  • observation_noise_factor (float)

  • max_observation_noise (float)

  • idx_pac_row (int)

  • idx_pac_col (int)

  • idx_ghosts_start (int)

  • idx_pellets_start (int)

  • idx_pellets_end (int)

  • idx_score (int)

  • idx_terminal (int)

maze_size

Grid dimensions (rows, cols).

num_ghosts

Number of ghosts.

num_pellets

Number of initial pellets.

state_dim

Dimensionality of the array state.

ghost_aggressiveness

Softmax temperature for ghost pursuit.

ghost_coordination

Ghost coordination mode.

ghost_strategies

Per-ghost strategy list.

observation_noise_factor

Multiplier for observation noise.

max_observation_noise

Maximum observation noise std.

batch_observation_log_likelihood(next_particles, action, observation)[source]

Compute observation log-likelihoods for all particles at once.

Parameters:
  • next_particles (ndarray) – Transitioned particle states of shape (N, d).

  • action (ndarray) – Action vector.

  • observation (ndarray) – Observed value.

Return type:

ndarray

Returns:

Log-likelihoods of shape (N,).
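
A hedged sketch of what such a batched evaluation looks like, using a single fixed noise std for all ghosts (a simplification: the real model scales noise with distance via observation_noise_factor):

```python
import numpy as np

def batch_obs_loglik(ghost_coords, observation, std=1.0):
    """Per-particle Gaussian log-likelihood of a ghost observation.

    ghost_coords: (N, 2*G) predicted ghost coordinates per particle;
    observation: (2*G,) noisy reading; std: shared noise std (the
    environment's distance-dependent noise is collapsed to a scalar).
    """
    diff = (ghost_coords - observation) / std
    # Sum independent per-coordinate Gaussian log-densities.
    return -0.5 * np.sum(diff ** 2 + np.log(2 * np.pi * std ** 2), axis=1)
```

Particles whose predicted ghost positions match the observation score highest, which is exactly what the particle filter needs for reweighting.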

batch_transition(particles, action)[source]

Transition all particles in a single batched operation.

Parameters:
  • particles (ndarray) – Current particle states of shape (N, d).

  • action (ndarray) – Action vector.

Return type:

ndarray

Returns:

Next-state particles of shape (N, d).
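
The batched softmax ghost sampling mentioned in the class description can be sketched as below; the helper name, Manhattan-distance scoring, and inverse-CDF draw are illustrative assumptions, not the class's actual internals:

```python
import numpy as np

def sample_ghost_moves(neighbors, valid, pacman_pos,
                       aggressiveness=2.0, rng=None):
    """Batched softmax sampling of one ghost move per particle.

    neighbors: (N, K, 2) candidate cells, valid: (N, K) mask of legal
    moves, pacman_pos: (N, 2). Candidates closer to PacMan get higher
    probability; aggressiveness is the softmax temperature.
    """
    rng = np.random.default_rng() if rng is None else rng
    dist = np.abs(neighbors - pacman_pos[:, None, :]).sum(axis=2)  # (N, K)
    logits = np.where(valid, -aggressiveness * dist, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Vectorized categorical draw via inverse CDF, one per particle.
    u = rng.random((probs.shape[0], 1))
    idx = (probs.cumsum(axis=1) < u).sum(axis=1)
    return neighbors[np.arange(len(idx)), idx]
```

Masking illegal moves with -inf before the softmax guarantees ghosts never step into walls, while keeping the whole operation a single pass over the particle array.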

property config_id: str

Return a deterministic identifier for this updater configuration.

classmethod from_environment(env)[source]

Construct an updater from a PacManPOMDP instance.

Parameters:

env (PacManPOMDP) – Environment to extract parameters from.

Return type:

PacManVectorizedUpdater

Returns:

A new PacManVectorizedUpdater instance.

POMDPPlanners.environments.pacman_pomdp.create_pacman_belief(env, belief_type=BeliefType.VECTORIZED_PARTICLE, n_particles=200, **kwargs)[source]

Create a ready-to-use belief for the PacMan POMDP.

Parameters:
  • env (PacManPOMDP) – PacManPOMDP environment instance.

  • belief_type (BeliefType) – Desired belief representation. Defaults to BeliefType.VECTORIZED_PARTICLE.

  • n_particles (int) – Number of particles. Defaults to 200.

  • **kwargs (Any) – Extra arguments (reserved for future use).

Return type:

Belief

Returns:

A configured Belief object.

Raises:

ValueError – If belief_type is not supported.

Example

>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.pacman_pomdp import PacManPOMDP
>>> env = PacManPOMDP(discount_factor=0.95)
>>> belief = create_pacman_belief(env, n_particles=50)
>>> belief.sample().shape[0] > 0
True
POMDPPlanners.environments.pacman_pomdp.create_simple_maze_pacman(maze_size=7, num_walls=5, num_ghosts=1, seed=None)[source]

Create a simple PacMan instance with random walls and multiple ghosts.

Parameters:
  • maze_size (int) – Size of square maze. Defaults to 7.

  • num_walls (int) – Number of walls to place randomly. Defaults to 5.

  • num_ghosts (int) – Number of ghosts in the game. Defaults to 1.

  • seed (Optional[int]) – Random seed. Defaults to None.

Return type:

PacManPOMDP

Returns:

Randomly configured PacMan POMDP with multi-ghost support
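
The random layout step can be sketched as sampling distinct wall cells while keeping PacMan's start free; this is an assumption about the placement logic, not the function's verified behavior:

```python
import numpy as np

def random_walls(maze_size=7, num_walls=5, seed=None,
                 reserved=((0, 0),)):
    """Sample distinct wall cells, keeping reserved cells (e.g. PacMan's
    default start at (0, 0)) free of walls."""
    rng = np.random.default_rng(seed)
    cells = [(r, c) for r in range(maze_size) for c in range(maze_size)
             if (r, c) not in reserved]
    idx = rng.choice(len(cells), size=num_walls, replace=False)
    return {cells[i] for i in idx}
```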

Subpackages

Submodules

POMDPPlanners.environments.pacman_pomdp.pacman_pomdp module

Module for PacMan POMDP environment.

This module provides the PacMan POMDP environment implementation inspired by the classic arcade game. The environment features a grid world where PacMan must collect pellets while avoiding ghosts, with partial observability of ghost positions.

The environment involves PacMan navigating a maze with walls, collecting pellets, and avoiding ghosts that move according to stochastic policies. PacMan receives noisy observations about nearby ghost positions.

Classes:

PacManState: Represents the state of the environment
PacManPOMDP: The main POMDP environment implementation

class POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.PacManObservationModel(next_state, action, pomdp)[source]

Bases: ObservationModel

Observation model for PacMan POMDP.

Parameters:
  • next_state (Any)

  • action (int)

  • pomdp (PacManPOMDP)

probability(values)[source]

Calculate observation probabilities for multi-ghost observations.

Return type:

ndarray

Parameters:

values (List[Tuple[Tuple[int, int], ...]])

sample(n_samples=1)[source]

Sample observations of all ghost positions with noise.

Return type:

List[Tuple[Tuple[int, int], ...]]

Parameters:

n_samples (int)

sample_closest_ghosts(max_ghosts=2, n_samples=1)[source]

Sample observations of only the closest ghosts.

Return type:

List[Tuple[Tuple[int, int], ...]]

Parameters:
  • max_ghosts (int)

  • n_samples (int)

class POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.PacManPOMDP(maze_size=(7, 7), walls=None, initial_pellets=None, initial_pacman_pos=(0, 0), num_ghosts=1, initial_ghost_positions=None, initial_ghost_pos=None, pellet_reward=10.0, ghost_collision_penalty=-100.0, step_penalty=-1.0, win_reward=100.0, ghost_aggressiveness=2.0, ghost_coordination='independent', ghost_strategies=None, observation_noise_factor=0.3, max_observation_noise=1.5, discount_factor=0.95, name='PacManPOMDP', output_dir=None, debug=False)[source]

Bases: DiscreteActionsEnvironment

PacMan POMDP environment inspired by the classic arcade game.

This environment implements a simplified PacMan game where PacMan must collect pellets while avoiding one or more ghosts. Ghost positions are only partially observable through noisy sensor readings.

Parameters:
maze_size

Grid dimensions as (rows, cols)

walls

Set of wall positions as (row, col) tuples

initial_pellets

List of initial pellet positions

pellet_reward

Reward for collecting a pellet

ghost_collision_penalty

Penalty for collision with ghost

step_penalty

Cost per action

win_reward

Reward for collecting all pellets

ghost_aggressiveness

Temperature parameter for ghost movement policy

observation_noise_factor

Multiplier for observation noise based on distance

max_observation_noise

Maximum noise standard deviation

Example

>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> env = PacManPOMDP(maze_size=(7, 7))
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
array_to_observation(arr)[source]

Convert a flat numpy array back to a PacMan observation tuple.

Parameters:

arr (ndarray) – 1-D array of shape (2 * num_ghosts,).

Return type:

Tuple[Tuple[int, int], ...]

Returns:

Observation as tuple of (row, col) tuples.

array_to_state(arr)[source]

Convert a numpy array back to a PacManState.

Parameters:

arr (ndarray) – 1-D array of shape (self._state_dim,) produced by state_to_array().

Return type:

PacManState

Returns:

Reconstructed PacManState.

cache_visualization(history, cache_path)[source]

Cache visualization of episode history.

Parameters:
  • history (List[StepData]) – List of StepData objects representing the episode

  • cache_path (Path) – Path where the GIF should be saved

Return type:

None

compute_metrics(histories)[source]

Compute environment-specific metrics.

Return type:

List[MetricValue]

Parameters:

histories (List[History])

get_actions()[source]

Get all available actions.

Return type:

List[int]

get_metric_names()[source]

Get names of PacMan POMDP specific metrics.

Return type:

List[str]

Returns:

List containing metric names including standard metrics (win_rate, avg_pellets_collected, avg_episode_length, avg_pacman_closest_ghost_distance, avg_collision_encounters) and dynamically generated per-ghost distance metrics for multi-ghost scenarios (avg_pacman_ghost_0_distance, avg_pacman_ghost_1_distance, etc.)

property initial_ghost_pos: Tuple[int, int]

Backward compatibility: returns the first ghost position.

initial_observation_dist()[source]

Get initial observation distribution.

Return type:

DiscreteDistribution

initial_pellets: List[Tuple[int, int]]
initial_state_dist()[source]

Get initial state distribution.

Return type:

DiscreteDistribution

is_equal_observation(observation1, observation2)[source]

Check if two observations are equal.

Return type:

bool

Parameters:
  • observation1 (Any)

  • observation2 (Any)

is_terminal(state)[source]

Check if state is terminal.

Return type:

bool

Parameters:

state (Any)

observation_model(next_state, action)[source]

Get observation model.

Return type:

PacManObservationModel

Parameters:
  • next_state (Any)

  • action (int)

observation_to_array(obs)[source]

Convert a PacMan observation tuple to a flat numpy array.

Parameters:

obs (Tuple[Tuple[int, int], ...]) – Observation as tuple of ghost (row, col) positions.

Return type:

ndarray

Returns:

1-D array of shape (2 * num_ghosts,).

reward(state, action)[source]

Calculate immediate reward.

Return type:

float

Parameters:
  • state (Any)

  • action (int)

reward_batch(states, action)[source]

Calculate rewards for a batch of states.

Accepts either a 2-D numpy array of shape (N, state_dim) (vectorized path) or a sequence of PacManState objects (falls back to the loop-based default).

Computes deterministic reward components only: step penalty, pellet collection, and win bonus. Ghost collision penalty is excluded because it depends on stochastic ghost movement.

Parameters:
  • states (Union[ndarray, Sequence[Any]]) – Array of shape (N, state_dim) or sequence of states.

  • action (int) – Discrete action index (0-3).

Return type:

ndarray

Returns:

1-D array of reward values with shape (N,).

state_to_array(state)[source]

Convert a PacManState to a fixed-size numpy array.

The array layout is: [pac_row, pac_col, g0_row, g0_col, ..., pellet_mask[0..P-1], score, terminal]

Parameters:

state (PacManState) – A PacManState instance.

Return type:

ndarray

Returns:

1-D float array of shape (self._state_dim,).

state_transition_model(state, action)[source]

Get state transition model.

Return type:

PacManStateTransitionModel

Parameters:
  • state (Any)

  • action (int)

states_to_array(states)[source]

Batch-convert a list of PacManState to a 2-D numpy array.

Parameters:

states (List[PacManState]) – List of PacManState instances.

Return type:

ndarray

Returns:

Array of shape (len(states), self._state_dim).

visualize_path(path, actions, cache_path)[source]

Visualize PacMan path through the maze using sprite-based rendering.

Parameters:
  • path (List[PacManState]) – List of states representing the path through the maze

  • actions (List[int]) – List of actions taken at each step

  • cache_path (Path) – Path where the GIF should be saved

class POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.PacManPOMDPMetrics(*values)[source]

Bases: Enum

Metric names for PacMan POMDP environment.

AVG_COLLISION_ENCOUNTERS = 'avg_collision_encounters'
AVG_EPISODE_LENGTH = 'avg_episode_length'
AVG_PACMAN_CLOSEST_GHOST_DISTANCE = 'avg_pacman_closest_ghost_distance'
AVG_PELLETS_COLLECTED = 'avg_pellets_collected'
WIN_RATE = 'win_rate'
class POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.PacManState(pacman_pos, ghost_positions, pellets, score=0, terminal=False)[source]

Bases: object

State representation for PacMan POMDP.

Parameters:
pacman_pos

PacMan position as (row, col) tuple

ghost_positions

Tuple of ghost positions as (row, col) tuples

pellets

Tuple of remaining pellet positions as (row, col) tuples

score

Current game score

terminal

Whether the game has ended

property ghost_pos: Tuple[int, int]

Backward compatibility: returns the first ghost position.

ghost_positions: Tuple[Tuple[int, int], ...]
property num_ghosts: int

Number of ghosts in the game.

pacman_pos: Tuple[int, int]
pellets: Tuple[Tuple[int, int], ...]
score: int | float = 0
terminal: bool = False
class POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.PacManStateTransitionModel(state, action, pomdp)[source]

Bases: StateTransitionModel

State transition model for PacMan POMDP.

Parameters:
  • state (Any)

  • action (int)

  • pomdp (PacManPOMDP)

probability(values)[source]

Calculate transition probabilities to next states.

Parameters:

values (List[PacManState]) – List of potential next states

Return type:

ndarray

Returns:

Array of probabilities for each state in values

sample(n_samples=1)[source]

Sample next states.

Return type:

List[PacManState]

Parameters:

n_samples (int)

POMDPPlanners.environments.pacman_pomdp.pacman_pomdp.create_simple_maze_pacman(maze_size=7, num_walls=5, num_ghosts=1, seed=None)[source]

Create a simple PacMan instance with random walls and multiple ghosts.

Parameters:
  • maze_size (int) – Size of square maze. Defaults to 7.

  • num_walls (int) – Number of walls to place randomly. Defaults to 5.

  • num_ghosts (int) – Number of ghosts in the game. Defaults to 1.

  • seed (Optional[int]) – Random seed. Defaults to None.

Return type:

PacManPOMDP

Returns:

Randomly configured PacMan POMDP with multi-ghost support

POMDPPlanners.environments.pacman_pomdp.pacman_visualizer module

Visualization module for PacMan POMDP environment.

This module provides sprite-based visualization capabilities for PacMan POMDP episodes, rendering animated GIFs of agent behavior and game state.

Classes:

PacManVisualizer: Handles sprite-based rendering and GIF generation

class POMDPPlanners.environments.pacman_pomdp.pacman_visualizer.PacManVisualizer(environment, tile_size=32)[source]

Bases: object

Handles visualization for PacMan POMDP environments.

This class manages sprite loading, frame rendering, and GIF generation for visualizing PacMan POMDP episodes. It renders the maze, PacMan, ghosts, pellets, and game state information.

Parameters:
env

Reference to the PacMan POMDP environment

tile_size

Size of each tile in pixels

sprites

Dictionary of loaded sprite images

cache_visualization(history, cache_path)[source]

Cache visualization of episode history.

Parameters:
  • history (List[StepData]) – List of StepData objects representing the episode

  • cache_path (Path) – Path where the GIF should be saved

Raises:
  • TypeError – If history or cache_path have wrong types

  • ValueError – If history is empty or cache_path doesn’t end with .gif

Return type:

None

visualize_path(path, actions, cache_path)[source]

Visualize PacMan path through the maze using sprite-based rendering.

Parameters:
  • path (List[PacManState]) – List of states representing the path through the maze

  • actions (List[int]) – List of actions taken at each step

  • cache_path (Path) – Path where the GIF should be saved

Raises:

TypeError – If cache_path is not a Path object

Return type:

None