POMDPPlanners.environments.cartpole_pomdp package

CartPole POMDP Environment Module.

This module provides the CartPole POMDP environment implementation and related components for pole-balancing tasks with noisy observations.

Classes:

CartPolePOMDP: Main CartPole environment with POMDP formulation
CartPoleStateTransition: Physics-based state transition model
CartPoleObservation: Gaussian noise observation model
CartPoleInitialStateDistribution: Initial state sampling distribution
CartPolePOMDPMetrics: Metric names for CartPole POMDP environment

class POMDPPlanners.environments.cartpole_pomdp.CartPoleInitialStateDistribution[source]

Bases: Distribution

Initial state distribution for CartPole POMDP.

This distribution generates random initial states for the cart-pole system by sampling uniformly from a small range around the equilibrium position. All state variables (position, velocity, angle, angular velocity) are initialized close to zero with small random perturbations.

Example

>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>> # Create initial state distribution
>>> initial_dist = CartPoleInitialStateDistribution()
>>> # Sample initial state
>>> initial_state = initial_dist.sample()[0]
>>> len(initial_state) == 4
True
>>> all(-0.05 <= x <= 0.05 for x in initial_state)  # Values in valid range
True
>>> # Sample multiple initial states
>>> states = initial_dist.sample(n_samples=3)
>>> len(states) == 3
True
>>> all(len(state) == 4 for state in states)
True
>>> # Each state has 4 components: [cart_pos, cart_vel, pole_angle, pole_ang_vel]
>>> position, velocity, angle, angular_velocity = initial_state
>>> isinstance(position, (int, float, np.floating))
True
sample(n_samples=1)[source]

Sample values from the distribution.

Parameters:

n_samples (int) – Number of samples to return. Defaults to 1.

Return type:

List[ndarray]

Returns:

List of n_samples independent samples from the distribution

Note

Subclasses must implement this method according to their specific distribution type and parameters.

class POMDPPlanners.environments.cartpole_pomdp.CartPoleObservation(next_state, action, obs_dist)[source]

Bases: ObservationModel

Noisy observation model for CartPole POMDP.

This model adds Gaussian noise to the true state to create partial observability. The agent receives a noisy version of the full state vector, making it challenging to determine the exact cart-pole configuration.

Parameters:
  • next_state (ndarray) – True state after action execution

  • action (int) – Action that was taken (not used in observation generation)

  • obs_dist – Pre-computed multivariate normal distribution for efficient sampling/PDF

Example

>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>> from POMDPPlanners.utils.multivariate_normal import CovarianceParameterizedMultivariateNormal
>>> # Define true state after action
>>> true_state = np.array([0.1, 0.05, 0.02, -0.1])
>>> action = 1
>>> # Define observation noise covariance and create distribution
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> obs_dist = CovarianceParameterizedMultivariateNormal(noise_cov)
>>> # Create observation model
>>> obs_model = CartPoleObservation(
...     next_state=true_state,
...     action=action,
...     obs_dist=obs_dist
... )
>>> # Sample noisy observation
>>> observation = obs_model.sample()[0]
>>> len(observation) == 4  # Same dimensionality as state
True
>>> isinstance(observation, np.ndarray)
True
>>> # Calculate probability of specific observation
>>> prob = obs_model.probability([observation])
>>> len(prob) == 1
True
probability(values)[source]

Calculate observation probabilities for given values.

Parameters:

values (List[ndarray]) – List of observation values to calculate probabilities for

Return type:

ndarray

Returns:

Array of probabilities corresponding to the input values

Raises:

NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.

sample(n_samples=1)[source]

Sample observations from the observation model.

Parameters:

n_samples (int) – Number of observation samples to generate. Defaults to 1.

Return type:

List[ndarray]

Returns:

List of sampled observations of length n_samples.

Note

Subclasses must implement this method according to their specific observation generation logic.

class POMDPPlanners.environments.cartpole_pomdp.CartPolePOMDP(discount_factor, noise_cov, state_transition_cov=None, name='CartPolePOMDP', output_dir=None, debug=False, use_queue_logger=False)[source]

Bases: DiscreteActionsEnvironment

CartPole balancing task formulated as a POMDP.

This environment simulates the classic cart-pole balancing problem where an agent must apply left or right forces to keep a pole balanced on a moving cart. The challenge comes from noisy observations of the cart-pole state.

Problem Structure:

  • State: [cart_position, cart_velocity, pole_angle, pole_velocity] (continuous)

  • Actions: [left_force, right_force] (discrete)

  • Observations: Noisy state measurements (continuous)

  • Rewards: +1.0 per time step alive, 0.0 when terminated

  • Termination: Pole falls beyond the angle threshold or the cart moves too far

Example

>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> env = CartPolePOMDP(discount_factor=0.99, noise_cov=noise_cov)
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
DEFAULT_STATE_TRANSITION_COV = array([[1.0e-04, 0.0e+00, 0.0e+00, 0.0e+00],
       [0.0e+00, 1.0e-04, 0.0e+00, 0.0e+00],
       [0.0e+00, 0.0e+00, 2.5e-05, 0.0e+00],
       [0.0e+00, 0.0e+00, 0.0e+00, 1.0e-04]])
compute_metrics(histories)[source]

Compute CartPole POMDP specific metrics from simulation histories.

Parameters:

histories (List[History]) – List of simulation histories

Return type:

List[MetricValue]

Returns:

List of MetricValue objects containing the computed metrics
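The precise definition of the goal-reaching metric lives in the implementation; as a rough, hypothetical sketch (the function name, the horizon value, and the use of episode lengths are all assumptions, not taken from the actual CartPolePOMDP code), the rate can be computed as the fraction of episodes that keep the pole balanced for a full horizon:

```python
import numpy as np

def goal_reaching_rate(episode_lengths, horizon=200):
    """Hypothetical metric: fraction of episodes that survived the full
    horizon. The horizon of 200 steps is an assumed value, not one taken
    from the CartPolePOMDP implementation."""
    lengths = np.asarray(episode_lengths)
    return float(np.mean(lengths >= horizon))

# Two of four hypothetical episodes reach the full horizon
print(goal_reaching_rate([200, 150, 200, 80]))  # 0.5
```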

get_actions()[source]

Get all possible actions in the discrete action space.

Return type:

List[int]

Returns:

List containing all valid actions that can be executed

Note

Subclasses must implement this method to enumerate all possible actions. This is used by planning algorithms that need to iterate over actions.

get_metric_names()[source]

Get names of CartPole POMDP specific metrics.

Returns:

List of metric names: ['goal_reaching_rate']

Return type:

List[str]

initial_observation_dist()[source]

Get the initial observation distribution.

Return type:

Distribution

Returns:

Distribution over initial observations

Note

Subclasses must implement this method to define initial observations.

initial_state_dist()[source]

Get the initial state distribution.

Return type:

Distribution

Returns:

Distribution over initial states

Note

Subclasses must implement this method to define the starting distribution.

is_equal_observation(observation1, observation2)[source]

Check if two observations are equal.

Parameters:
  • observation1 (ndarray) – First observation to compare

  • observation2 (ndarray) – Second observation to compare

Return type:

bool

Returns:

True if observations are considered equal, False otherwise

Note

Subclasses must implement this method to define observation equality. This is particularly important for discrete observation spaces.
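For continuous observations like CartPole's noisy state vectors, exact floating-point equality is rarely meaningful. A typical implementation (a sketch under that assumption, not necessarily what this class does) compares components within a numerical tolerance:

```python
import numpy as np

def is_equal_observation(obs1, obs2, atol=1e-8):
    # Element-wise comparison within an absolute tolerance; two 4-D
    # observations count as "equal" only if every component matches.
    return bool(np.allclose(obs1, obs2, atol=atol))

a = np.array([0.1, 0.05, 0.02, -0.1])
print(is_equal_observation(a, a.copy()))   # True
print(is_equal_observation(a, a + 0.01))   # False
```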

is_terminal(state)[source]

Check if a state is terminal.

Parameters:

state (ndarray) – State to check for terminal condition

Return type:

bool

Returns:

True if the state is terminal, False otherwise

Note

Subclasses must implement this method to define terminal conditions.
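The termination description above (pole falls beyond an angle threshold, or the cart moves too far) can be sketched concretely. The threshold values below follow the classic Gym CartPole convention (±2.4 m cart position, ±12° pole angle) and are assumptions; the actual CartPolePOMDP values may differ:

```python
import numpy as np

# Assumed thresholds in the style of Gym's CartPole-v1; not taken from
# the CartPolePOMDP source.
X_THRESHOLD = 2.4
THETA_THRESHOLD = 12 * np.pi / 180  # ~0.2095 rad

def is_terminal(state):
    cart_pos, _, pole_angle, _ = state
    return bool(abs(cart_pos) > X_THRESHOLD or abs(pole_angle) > THETA_THRESHOLD)

print(is_terminal(np.array([0.0, 0.0, 0.01, 0.0])))  # False: near upright
print(is_terminal(np.array([0.0, 0.0, 0.3, 0.0])))   # True: pole past the angle threshold
```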

observation_model(next_state, action)[source]

Get the observation model for a given next state and action.

Parameters:
  • next_state (ndarray) – The resulting state after taking an action

  • action (int) – The action that was executed

Return type:

ObservationModel

Returns:

Observation model that can sample observations

Note

Subclasses must implement this method to define observation generation.

reward(state, action)[source]

Calculate the immediate reward for a state-action pair.

Parameters:
  • state (ndarray) – Current state

  • action (int) – Action executed from the state

Return type:

float

Returns:

Immediate reward value

Note

Subclasses must implement this method to define reward structure.

reward_batch(states, action)[source]

Calculate rewards for a batch of states given a single action.

Provides a loop-based default that subclasses can override with vectorized numpy implementations for better performance.

Parameters:
  • states (Union[ndarray, Sequence[Any]]) – Sequence of states of length N.

  • action (int) – Action executed from each state.

Return type:

ndarray

Returns:

1-D array of reward values with shape (N,).
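Given the +1.0-per-step reward structure described for this environment, a vectorized override might look like the sketch below. The termination thresholds are assumed Gym-style values, so this is an illustration of the vectorization pattern rather than the actual implementation:

```python
import numpy as np

X_THRESHOLD = 2.4          # assumed cart-position limit
THETA_THRESHOLD = 0.2095   # assumed pole-angle limit (~12 degrees)

def reward_batch(states, action):
    """Vectorized +1.0-alive / 0.0-terminated reward over an (N, 4) state array."""
    states = np.asarray(states)
    terminated = (np.abs(states[:, 0]) > X_THRESHOLD) | (np.abs(states[:, 2]) > THETA_THRESHOLD)
    return np.where(terminated, 0.0, 1.0)

states = np.array([[0.0, 0.0, 0.01, 0.0],   # balanced
                   [3.0, 0.0, 0.01, 0.0]])  # cart out of bounds
print(reward_batch(states, action=1))  # [1. 0.]
```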

state_transition_model(state, action)[source]

Get the state transition model for a given state-action pair.

Parameters:
  • state (ndarray) – Current state

  • action (int) – Action to be executed

Return type:

StateTransitionModel

Returns:

State transition model that can sample next states

Note

Subclasses must implement this method to define state dynamics.

class POMDPPlanners.environments.cartpole_pomdp.CartPolePOMDPMetrics(*values)[source]

Bases: Enum

Metric names for CartPole POMDP environment.

GOAL_REACHING_RATE = 'goal_reaching_rate'

class POMDPPlanners.environments.cartpole_pomdp.CartPoleStateTransition(state, action, force_mag, total_mass, polemass_length, gravity, length, kinematics_integrator, tau, masspole, state_transition_dist)[source]

Bases: StateTransitionModel

Physics-based state transition model for CartPole POMDP.

This model implements the classical cart-pole dynamics with Gaussian process noise. The cart experiences forces that affect both cart acceleration and pole angular acceleration through coupled equations of motion, with additive Normal noise on the resulting next state.

Parameters:
  • state (ndarray) – Current state [cart_position, cart_velocity, pole_angle, pole_velocity]

  • action (int) – Force direction (0 for left, 1 for right)

  • force_mag (float) – Magnitude of the applied force

  • total_mass (float) – Combined mass of cart and pole

  • polemass_length (float) – Pole mass times pole half-length (used in the moment calculation)

  • gravity (float) – Gravitational acceleration constant

  • length (float) – Half the pole’s length

  • kinematics_integrator (str) – Integration method (“euler” or “semi-implicit euler”)

  • tau (float) – Time step for integration

  • masspole (float) – Mass of the pole

  • state_transition_dist – Gaussian process-noise distribution added to the deterministic next state

Example

>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>> from POMDPPlanners.utils.multivariate_normal import CovarianceParameterizedMultivariateNormal
>>> # Define initial state [position, velocity, angle, angular_velocity]
>>> state = np.array([0.0, 0.0, 0.1, 0.0])
>>> action = 1  # Apply right force
>>> # Create transition model with physics parameters and noise
>>> state_transition_cov = np.diag([1e-4, 1e-4, 2.5e-5, 1e-4])
>>> state_transition_dist = CovarianceParameterizedMultivariateNormal(state_transition_cov)
>>> transition = CartPoleStateTransition(
...     state=state,
...     action=action,
...     force_mag=10.0,
...     total_mass=1.1,
...     polemass_length=0.05,
...     gravity=9.8,
...     length=0.5,
...     kinematics_integrator="euler",
...     tau=0.02,
...     masspole=0.1,
...     state_transition_dist=state_transition_dist
... )
>>> # Simulate physics step
>>> next_state = transition.sample()[0]
>>> len(next_state) == 4  # [pos, vel, angle, ang_vel]
True
>>> isinstance(next_state, np.ndarray)
True
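The coupled equations of motion are not spelled out above. The classic cart-pole dynamics (in the style of Gym's CartPole, which this model appears to mirror; the deterministic step below is a sketch under that assumption, with parameter values taken from the doctest) can be written as:

```python
import numpy as np

def cartpole_step(state, action, force_mag=10.0, gravity=9.8, masspole=0.1,
                  total_mass=1.1, length=0.5, polemass_length=0.05, tau=0.02):
    """Deterministic "euler" step of the classic cart-pole dynamics.
    The POMDP transition model adds Gaussian process noise on top of
    this result."""
    x, x_dot, theta, theta_dot = state
    force = force_mag if action == 1 else -force_mag
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    # Intermediate term from the coupled equations of motion
    temp = (force + polemass_length * theta_dot**2 * sin_t) / total_mass
    theta_acc = (gravity * sin_t - cos_t * temp) / (
        length * (4.0 / 3.0 - masspole * cos_t**2 / total_mass))
    x_acc = temp - polemass_length * theta_acc * cos_t / total_mass
    # Euler integration: positions advance with current velocities,
    # velocities advance with the new accelerations
    return np.array([x + tau * x_dot,
                     x_dot + tau * x_acc,
                     theta + tau * theta_dot,
                     theta_dot + tau * theta_acc])

next_state = cartpole_step(np.array([0.0, 0.0, 0.1, 0.0]), action=1)
print(next_state.shape)  # (4,)
```

A rightward force from this state produces a positive cart velocity on the next step, as expected.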
probability(values)[source]

Calculate transition probabilities for given next states.

Parameters:

values (List[ndarray]) – List of next state values to calculate probabilities for

Return type:

ndarray

Returns:

Array of transition probabilities corresponding to the input values

Raises:

NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.

sample(n_samples=1)[source]

Sample next states from the transition model.

Parameters:

n_samples (int) – Number of next state samples to generate. Defaults to 1.

Return type:

List[ndarray]

Returns:

List of sampled next states of length n_samples.

Note

Subclasses must implement this method according to their specific state transition dynamics.

Submodules

POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp module

CartPole POMDP Environment Implementation.

This module implements a CartPole balancing task as a POMDP, where an agent must balance a pole on a cart using discrete left/right force actions, with noisy observations of the cart-pole state.

The CartPole POMDP features:

  • Continuous 4D state space: [cart_position, cart_velocity, pole_angle, pole_velocity]

  • Discrete binary action space: [left_force, right_force]

  • Noisy continuous observations of the state

  • Physics-based dynamics simulation

  • Episode termination when the pole falls beyond a threshold or the cart moves too far

Classes:

CartPoleStateTransition: Physics-based state transition model
CartPoleObservation: Gaussian noise observation model
CartPolePOMDP: Main CartPole environment with POMDP formulation

The classes CartPoleInitialStateDistribution, CartPoleObservation, CartPolePOMDP, CartPolePOMDPMetrics, and CartPoleStateTransition are defined in this module and re-exported at the package level; their full documentation appears above under the package entry.

POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_beliefs module

Vectorized particle belief updater for the CartPole POMDP.

This module implements a concrete VectorizedParticleBeliefUpdater that performs batched state transitions and observation log-likelihood evaluations for the CartPole environment, replacing per-particle Python loops with NumPy array operations.

Classes:

CartPoleVectorizedUpdater: Batched updater for the CartPole POMDP.

Functions:

create_cartpole_belief: Factory producing a configured belief for CartPolePOMDP.

class POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_beliefs.CartPoleVectorizedUpdater(state_transition_dist, obs_dist, force_mag, gravity, masscart, masspole, total_mass, length, polemass_length, tau, kinematics_integrator)[source]

Bases: VectorizedParticleBeliefUpdater

Vectorized particle belief updater for the CartPole POMDP.

Performs all-particle transitions and observation log-likelihood evaluations using vectorized NumPy operations, replacing per-particle Python loops with batched array operations.

batch_transition applies the deterministic cart-pole physics to all particles and then adds a per-particle Gaussian process-noise sample drawn from state_transition_dist (mirroring CartPoleStateTransition.sample()). Observations follow a single Gaussian centred on the true state.

Parameters:
  • state_transition_dist – Process-noise distribution added after the deterministic physics step.

  • obs_dist – Observation noise distribution.

  • force_mag (float) – Magnitude of force applied to the cart.

  • gravity (float) – Gravitational acceleration constant.

  • masscart (float) – Mass of the cart.

  • masspole (float) – Mass of the pole.

  • total_mass (float) – Combined mass of cart and pole.

  • length (float) – Half the pole’s length.

  • polemass_length (float) – Pole mass times pole half-length.

  • tau (float) – Integration time step.

  • kinematics_integrator (str) – Integration method (“euler” or “semi-implicit euler”).

Example

>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.cartpole_pomdp import CartPolePOMDP
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> env = CartPolePOMDP(discount_factor=0.99, noise_cov=noise_cov)
>>> updater = CartPoleVectorizedUpdater.from_environment(env)
>>> particles = np.random.uniform(-0.05, 0.05, (50, 4))
>>> action = 1
>>> next_p = updater.batch_transition(particles, action)
>>> next_p.shape
(50, 4)
>>> obs = np.array([0.0, 0.0, 0.0, 0.0])
>>> ll = updater.batch_observation_log_likelihood(next_p, action, obs)
>>> ll.shape
(50,)
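Independently of the real class, the batching idea behind batch_transition can be sketched with plain NumPy: apply the deterministic physics along the particle axis, then add per-particle noise in one draw. Parameter values and the isotropic noise scale below are illustrative assumptions:

```python
import numpy as np

def batch_transition(particles, action, force_mag=10.0, gravity=9.8, masspole=0.1,
                     total_mass=1.1, length=0.5, polemass_length=0.05, tau=0.02,
                     noise_std=0.01, rng=None):
    """Vectorized Euler cart-pole step over an (N, 4) particle array,
    plus per-particle Gaussian process noise (illustrative parameters)."""
    if rng is None:
        rng = np.random.default_rng(0)
    x, x_dot, theta, theta_dot = particles.T
    force = force_mag if action == 1 else -force_mag
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    # Same coupled equations as the scalar model, evaluated element-wise
    temp = (force + polemass_length * theta_dot**2 * sin_t) / total_mass
    theta_acc = (gravity * sin_t - cos_t * temp) / (
        length * (4.0 / 3.0 - masspole * cos_t**2 / total_mass))
    x_acc = temp - polemass_length * theta_acc * cos_t / total_mass
    nxt = np.stack([x + tau * x_dot, x_dot + tau * x_acc,
                    theta + tau * theta_dot, theta_dot + tau * theta_acc], axis=1)
    # One batched noise draw replaces N independent scalar samples
    return nxt + rng.normal(scale=noise_std, size=nxt.shape)

particles = np.zeros((50, 4))
print(batch_transition(particles, action=1).shape)  # (50, 4)
```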
batch_observation_log_likelihood(next_particles, action, observation)[source]

Compute observation log-likelihoods for all particles at once.

Parameters:
  • next_particles (ndarray) – Transitioned particle states of shape (N, d).

  • action (ndarray) – Action vector.

  • observation (ndarray) – Observed value.

Return type:

ndarray

Returns:

Log-likelihoods of shape (N,).

batch_transition(particles, action)[source]

Transition all particles in a single batched operation.

Parameters:
  • particles (ndarray) – Current particle states of shape (N, d).

  • action (ndarray) – Action vector.

Return type:

ndarray

Returns:

Next-state particles of shape (N, d).

property config_id: str

Return a deterministic identifier for this updater configuration.

classmethod from_environment(env)[source]

Construct an updater from a CartPolePOMDP instance.

Parameters:

env (CartPolePOMDP) – Environment to extract parameters from.

Return type:

CartPoleVectorizedUpdater

Returns:

A new CartPoleVectorizedUpdater instance.

POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_beliefs.create_cartpole_belief(env, belief_type=BeliefType.VECTORIZED_PARTICLE, n_particles=200, **kwargs)[source]

Create a ready-to-use belief for the CartPole POMDP.

For BeliefType.GAUSSIAN, the following keyword arguments are forwarded to create_cartpole_gaussian_belief():

  • updater_type (GaussianBeliefUpdaterType): defaults to GaussianBeliefUpdaterType.UKF.

  • initial_covariance (np.ndarray): defaults to np.eye(4) * (0.1**2 / 12).

  • process_noise_scale (float): defaults to 1e-4.

Parameters:
  • env (CartPolePOMDP) – CartPolePOMDP environment instance.

  • belief_type (BeliefType) – Desired belief representation. Defaults to BeliefType.VECTORIZED_PARTICLE.

  • n_particles (int) – Number of particles (ignored for GAUSSIAN). Defaults to 200.

  • **kwargs (Any) – Extra arguments forwarded to the Gaussian factory.

Return type:

Belief

Returns:

A configured Belief object.

Raises:

ValueError – If belief_type is not supported.

Example

>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.cartpole_pomdp import CartPolePOMDP
>>> env = CartPolePOMDP(discount_factor=0.99,
...                    noise_cov=np.diag([0.1, 0.1, 0.1, 0.1]))
>>> belief = create_cartpole_belief(env, n_particles=50)
>>> belief.sample().shape
(4,)

POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_gaussian_beliefs module

Factory for pre-configured Gaussian beliefs for the CartPole POMDP.

This module provides a single factory function that creates a GaussianBelief instance pre-configured for the CartPolePOMDP environment, with an enum-based selector for the updater type (EKF or UKF).

The CartPole POMDP has nonlinear dynamics (coupled cart-pole physics) with a linear-Gaussian observation model (identity plus additive noise). Because the dynamics are nonlinear, a standard linear Kalman filter is not applicable; only EKF (which requires analytical Jacobians) and UKF (Jacobian-free sigma-point propagation) are supported.

Classes:

GaussianBeliefUpdaterType: Enum selecting the Gaussian updater variant.

Functions:

create_cartpole_gaussian_belief: Factory producing a configured GaussianBelief.

class POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_gaussian_beliefs.GaussianBeliefUpdaterType(*values)[source]

Bases: Enum

Selector for the Gaussian belief updater variant.

EKF

Extended Kalman filter (linearised via analytical Jacobians).

UKF

Unscented Kalman filter (sigma-point propagation).

EKF = 'ekf'
UKF = 'ukf'

POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_gaussian_beliefs.create_cartpole_gaussian_belief(env, updater_type, initial_covariance=None, process_noise_scale=0.0001)[source]

Create a GaussianBelief configured for a CartPolePOMDP.

The CartPole POMDP has nonlinear dynamics:

x_{t+1} = f(x_t, u_t)      (deterministic cart-pole physics)
z_t     = x_{t+1} + v,      v ~ N(0, R)

where R is env.noise_cov. A small process noise Q is added for numerical stability of the Kalman covariance updates.
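Because the observation model is identity-plus-noise (H = I), the Kalman measurement update takes a particularly simple form. The sketch below shows only that update step, under the z = x + v model stated above; the prediction step (linearised Jacobians for EKF, sigma points for UKF) is omitted:

```python
import numpy as np

def measurement_update(mean_pred, cov_pred, z, R):
    """Kalman measurement update for z = x + v, v ~ N(0, R), i.e. H = I.
    mean_pred and cov_pred come from the (EKF or UKF) prediction step."""
    S = cov_pred + R                   # innovation covariance (H P Hᵀ + R with H = I)
    K = cov_pred @ np.linalg.inv(S)    # Kalman gain
    mean = mean_pred + K @ (z - mean_pred)
    cov = (np.eye(len(mean_pred)) - K) @ cov_pred
    return mean, cov

R = np.diag([0.1] * 4)                 # observation noise, as in the examples
P = np.eye(4) * (0.1**2 / 12)          # default initial covariance from this factory
m, P_new = measurement_update(np.zeros(4), P, z=np.array([0.1, 0.0, 0.0, 0.0]), R=R)
print(m.shape, P_new.shape)            # (4,) (4, 4)
```

The posterior covariance is strictly smaller than the prior on every diagonal entry, reflecting the information gained from the observation.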

Parameters:
  • env (CartPolePOMDP) – CartPolePOMDP instance.

  • updater_type (GaussianBeliefUpdaterType) – Which Gaussian updater to use (EKF or UKF).

  • initial_covariance (Optional[ndarray]) – Initial belief covariance of shape (4, 4). Defaults to np.eye(4) * (0.1**2 / 12) (variance of Uniform(-0.05, 0.05)).

  • process_noise_scale (float) – Diagonal scaling for the process noise covariance Q. Defaults to 1e-4.

Return type:

GaussianBelief

Returns:

A GaussianBelief with the selected updater.

Example

>>> import numpy as np
>>> from POMDPPlanners.environments.cartpole_pomdp import CartPolePOMDP
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> env = CartPolePOMDP(discount_factor=0.99, noise_cov=noise_cov)
>>> belief = create_cartpole_gaussian_belief(
...     env=env,
...     updater_type=GaussianBeliefUpdaterType.EKF,
... )
>>> belief.mean.shape
(4,)