POMDPPlanners.environments.cartpole_pomdp package
CartPole POMDP Environment Module.
This module provides the CartPole POMDP environment implementation and related components for pole-balancing tasks with noisy observations.
- Classes:
CartPolePOMDP: Main CartPole environment with POMDP formulation
CartPoleStateTransition: Physics-based state transition model
CartPoleObservation: Gaussian noise observation model
CartPoleInitialStateDistribution: Initial state sampling distribution
CartPolePOMDPMetrics: Metric names for CartPole POMDP environment
- class POMDPPlanners.environments.cartpole_pomdp.CartPoleInitialStateDistribution[source]
Bases: Distribution
Initial state distribution for CartPole POMDP.
This distribution generates random initial states for the cart-pole system by sampling uniformly from a small range around the equilibrium position. All state variables (position, velocity, angle, angular velocity) are initialized close to zero with small random perturbations.
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>> # Create initial state distribution
>>> initial_dist = CartPoleInitialStateDistribution()
>>> # Sample initial state
>>> initial_state = initial_dist.sample()[0]
>>> len(initial_state) == 4
True
>>> all(-0.05 <= x <= 0.05 for x in initial_state)  # Values in valid range
True
>>> # Sample multiple initial states
>>> states = initial_dist.sample(n_samples=3)
>>> len(states) == 3
True
>>> all(len(state) == 4 for state in states)
True
>>> # Each state has 4 components: [cart_pos, cart_vel, pole_angle, pole_ang_vel]
>>> position, velocity, angle, angular_velocity = initial_state
>>> isinstance(position, (int, float, np.floating))
True
- sample(n_samples=1)[source]
Sample values from the distribution.
- Parameters:
n_samples (int) – Number of samples to return. Defaults to 1.
- Return type:
- Returns:
List of n_samples independent samples from the distribution
Note
Subclasses must implement this method according to their specific distribution type and parameters.
- class POMDPPlanners.environments.cartpole_pomdp.CartPoleObservation(next_state, action, obs_dist)[source]
Bases: ObservationModel
Noisy observation model for CartPole POMDP.
This model adds Gaussian noise to the true state to create partial observability. The agent receives a noisy version of the full state vector, making it challenging to determine the exact cart-pole configuration.
- Parameters:
next_state (ndarray)
action (int)
obs_dist (CovarianceParameterizedMultivariateNormal)
- next_state
True state after action execution
- action
Action that was taken (not used in observation generation)
- obs_dist
Pre-computed multivariate normal distribution for efficient sampling/PDF
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>> from POMDPPlanners.utils.multivariate_normal import CovarianceParameterizedMultivariateNormal
>>> # Define true state after action
>>> true_state = np.array([0.1, 0.05, 0.02, -0.1])
>>> action = 1
>>> # Define observation noise covariance and create distribution
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> obs_dist = CovarianceParameterizedMultivariateNormal(noise_cov)
>>> # Create observation model
>>> obs_model = CartPoleObservation(
...     next_state=true_state,
...     action=action,
...     obs_dist=obs_dist
... )
>>> # Sample noisy observation
>>> observation = obs_model.sample()[0]
>>> len(observation) == 4  # Same dimensionality as state
True
>>> isinstance(observation, np.ndarray)
True
>>> # Calculate probability of specific observation
>>> prob = obs_model.probability([observation])
>>> len(prob) == 1
True
- probability(values)[source]
Calculate observation probabilities for given values.
- Parameters:
values (List[ndarray]) – List of observation values to calculate probabilities for
- Return type:
- Returns:
Array of probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample observations from the observation model.
- Parameters:
n_samples (int) – Number of observation samples to generate. Defaults to 1.
- Return type:
- Returns:
List of sampled observations of length n_samples.
Note
Subclasses must implement this method according to their specific observation generation logic.
- class POMDPPlanners.environments.cartpole_pomdp.CartPolePOMDP(discount_factor, noise_cov, state_transition_cov=None, name='CartPolePOMDP', output_dir=None, debug=False, use_queue_logger=False)[source]
Bases: DiscreteActionsEnvironment
CartPole balancing task formulated as a POMDP.
This environment simulates the classic cart-pole balancing problem where an agent must apply left or right forces to keep a pole balanced on a moving cart. The challenge comes from noisy observations of the cart-pole state.
Problem Structure:
- State: [cart_position, cart_velocity, pole_angle, pole_velocity] (continuous)
- Actions: [left_force, right_force] (discrete)
- Observations: Noisy state measurements (continuous)
- Rewards: +1.0 per time step alive, 0.0 when terminated
- Termination: Pole falls beyond angle threshold or cart moves too far
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> env = CartPolePOMDP(discount_factor=0.99, noise_cov=noise_cov)
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
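The termination rule can also be illustrated with a standalone sketch. The numeric thresholds below (|x| > 2.4 m, |θ| > 12°) are the classic Gym CartPole values and are assumptions here, not values taken from this module:

```python
import numpy as np

# Hypothetical thresholds (classic Gym CartPole values); the actual
# limits used by CartPolePOMDP.is_terminal may differ.
X_THRESHOLD = 2.4                      # metres from track centre
THETA_THRESHOLD = 12 * np.pi / 180.0   # radians from vertical

def is_terminal_sketch(state: np.ndarray) -> bool:
    """Terminal when the cart leaves the track or the pole falls over."""
    x, _, theta, _ = state
    return bool(abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD)

print(is_terminal_sketch(np.array([0.0, 0.0, 0.05, 0.0])))  # near upright -> False
print(is_terminal_sketch(np.array([3.0, 0.0, 0.0, 0.0])))   # cart off track -> True
```

Note that velocities do not trigger termination in this formulation; only position and angle are bounded.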
- Parameters:
- DEFAULT_STATE_TRANSITION_COV = array([[1.0e-04, 0.0e+00, 0.0e+00, 0.0e+00],
                                        [0.0e+00, 1.0e-04, 0.0e+00, 0.0e+00],
                                        [0.0e+00, 0.0e+00, 2.5e-05, 0.0e+00],
                                        [0.0e+00, 0.0e+00, 0.0e+00, 1.0e-04]])
- compute_metrics(histories)[source]
Compute CartPole POMDP specific metrics from simulation histories.
- Parameters:
- Return type:
- Returns:
List of MetricValue objects containing the computed metrics
- get_actions()[source]
Get all possible actions in the discrete action space.
Note
Subclasses must implement this method to enumerate all possible actions. This is used by planning algorithms that need to iterate over actions.
- initial_observation_dist()[source]
Get the initial observation distribution.
- Return type:
- Returns:
Distribution over initial observations
Note
Subclasses must implement this method to define initial observations.
- initial_state_dist()[source]
Get the initial state distribution.
- Return type:
- Returns:
Distribution over initial states
Note
Subclasses must implement this method to define the starting distribution.
- is_equal_observation(observation1, observation2)[source]
Check if two observations are equal.
- Parameters:
- Return type:
- Returns:
True if observations are considered equal, False otherwise
Note
Subclasses must implement this method to define observation equality. This is particularly important for discrete observation spaces.
- is_terminal(state)[source]
Check if a state is terminal.
- Parameters:
state (ndarray) – State to check for terminal condition
- Return type:
- Returns:
True if the state is terminal, False otherwise
Note
Subclasses must implement this method to define terminal conditions.
- observation_model(next_state, action)[source]
Get the observation model for a given next state and action.
- Parameters:
- Return type:
- Returns:
Observation model that can sample observations
Note
Subclasses must implement this method to define observation generation.
- reward(state, action)[source]
Calculate the immediate reward for a state-action pair.
- Parameters:
- Return type:
- Returns:
Immediate reward value
Note
Subclasses must implement this method to define reward structure.
- reward_batch(states, action)[source]
Calculate rewards for a batch of states given a single action.
Provides a loop-based default that subclasses can override with vectorized numpy implementations for better performance.
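A vectorized override consistent with the documented reward structure (+1.0 per step alive, 0.0 when terminated) could look like the sketch below. The termination limits are hypothetical stand-ins for whatever `is_terminal` actually uses:

```python
import numpy as np

def reward_batch_sketch(states: np.ndarray, action: int) -> np.ndarray:
    """Vectorized reward: +1.0 for non-terminal states, 0.0 otherwise.

    states has shape (N, 4); the position/angle limits below are
    hypothetical, not taken from CartPolePOMDP.
    """
    x, theta = states[:, 0], states[:, 2]
    terminal = (np.abs(x) > 2.4) | (np.abs(theta) > 12 * np.pi / 180.0)
    return np.where(terminal, 0.0, 1.0)

batch = np.array([[0.0, 0.0, 0.01, 0.0],   # alive
                  [5.0, 0.0, 0.00, 0.0]])  # cart off the track
print(reward_batch_sketch(batch, action=1))  # [1. 0.]
```

The single `np.where` over a boolean mask replaces the per-state Python loop of the default implementation.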
- class POMDPPlanners.environments.cartpole_pomdp.CartPolePOMDPMetrics(*values)[source]
Bases: Enum
Metric names for CartPole POMDP environment.
- GOAL_REACHING_RATE = 'goal_reaching_rate'
- class POMDPPlanners.environments.cartpole_pomdp.CartPoleStateTransition(state, action, force_mag, total_mass, polemass_length, gravity, length, kinematics_integrator, tau, masspole, state_transition_dist)[source]
Bases: StateTransitionModel
Physics-based state transition model for CartPole POMDP.
This model implements the classical cart-pole dynamics with Gaussian process noise. The cart experiences forces that affect both cart acceleration and pole angular acceleration through coupled equations of motion, with additive Normal noise on the resulting next state.
- Parameters:
- state
Current state [cart_position, cart_velocity, pole_angle, pole_velocity]
- action
Force direction (0 for left, 1 for right)
- force_mag
Magnitude of applied force
- total_mass
Combined mass of cart and pole
- polemass_length
Pole mass times pole length (moment calculation)
- gravity
Gravitational acceleration constant
- length
Half the pole’s length
- kinematics_integrator
Integration method (“euler” or “semi-implicit euler”)
- tau
Time step for integration
- masspole
Mass of the pole
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>> from POMDPPlanners.utils.multivariate_normal import CovarianceParameterizedMultivariateNormal
>>> # Define initial state [position, velocity, angle, angular_velocity]
>>> state = np.array([0.0, 0.0, 0.1, 0.0])
>>> action = 1  # Apply right force
>>> # Create transition model with physics parameters and noise
>>> state_transition_cov = np.diag([1e-4, 1e-4, 2.5e-5, 1e-4])
>>> state_transition_dist = CovarianceParameterizedMultivariateNormal(state_transition_cov)
>>> transition = CartPoleStateTransition(
...     state=state,
...     action=action,
...     force_mag=10.0,
...     total_mass=1.1,
...     polemass_length=0.05,
...     gravity=9.8,
...     length=0.5,
...     kinematics_integrator="euler",
...     tau=0.02,
...     masspole=0.1,
...     state_transition_dist=state_transition_dist
... )
>>> # Simulate physics step
>>> next_state = transition.sample()[0]
>>> len(next_state) == 4  # [pos, vel, angle, ang_vel]
True
>>> isinstance(next_state, np.ndarray)
True
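The deterministic part of this step follows the classical coupled equations of motion. The sketch below mirrors the standard Barto–Sutton–Anderson/Gym formulation with Euler integration and the parameter values from the example above; the model documented here adds Gaussian process noise on top of such a step, and its exact implementation may differ in detail:

```python
import numpy as np

def cartpole_step_sketch(state, action, force_mag=10.0, gravity=9.8,
                         masspole=0.1, total_mass=1.1, length=0.5,
                         polemass_length=0.05, tau=0.02):
    """One deterministic Euler step of the classic cart-pole dynamics."""
    x, x_dot, theta, theta_dot = state
    force = force_mag if action == 1 else -force_mag
    costh, sinth = np.cos(theta), np.sin(theta)
    # Coupled equations of motion (Barto, Sutton & Anderson formulation)
    temp = (force + polemass_length * theta_dot**2 * sinth) / total_mass
    theta_acc = (gravity * sinth - costh * temp) / (
        length * (4.0 / 3.0 - masspole * costh**2 / total_mass))
    x_acc = temp - polemass_length * theta_acc * costh / total_mass
    # Euler integration with time step tau
    return np.array([x + tau * x_dot,
                     x_dot + tau * x_acc,
                     theta + tau * theta_dot,
                     theta_dot + tau * theta_acc])

next_state = cartpole_step_sketch(np.array([0.0, 0.0, 0.1, 0.0]), action=1)
print(next_state.shape)  # (4,)
```

The "semi-implicit euler" integrator variant updates the velocities first and then uses the new velocities to advance the positions.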
- probability(values)[source]
Calculate transition probabilities for given next states.
- Parameters:
values (List[ndarray]) – List of next state values to calculate probabilities for
- Return type:
- Returns:
Array of transition probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample next states from the transition model.
- Parameters:
n_samples (int) – Number of next state samples to generate. Defaults to 1.
- Return type:
- Returns:
List of sampled next states of length n_samples.
Note
Subclasses must implement this method according to their specific state transition dynamics.
Submodules
POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp module
CartPole POMDP Environment Implementation.
This module implements a CartPole balancing task as a POMDP, where an agent must balance a pole on a cart using discrete left/right force actions, with noisy observations of the cart-pole state.
The CartPole POMDP features: - Continuous 4D state space: [cart_position, cart_velocity, pole_angle, pole_velocity] - Discrete binary action space: [left_force, right_force] - Noisy continuous observations of the state - Physics-based dynamics simulation - Episode termination when pole falls beyond threshold or cart moves too far
- Classes:
CartPoleStateTransition: Physics-based state transition model
CartPoleObservation: Gaussian noise observation model
CartPolePOMDP: Main CartPole environment with POMDP formulation
POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_beliefs module
Vectorized particle belief updater for the CartPole POMDP.
This module implements a concrete VectorizedParticleBeliefUpdater that performs batched state transitions and observation log-likelihood evaluations for the CartPole environment, replacing per-particle Python loops with NumPy array operations.
- Classes:
CartPoleVectorizedUpdater: Batched updater for the CartPole POMDP.
- Functions:
create_cartpole_belief: Factory producing a configured belief for CartPolePOMDP.
- class POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_beliefs.CartPoleVectorizedUpdater(state_transition_dist, obs_dist, force_mag, gravity, masscart, masspole, total_mass, length, polemass_length, tau, kinematics_integrator)[source]
Bases: VectorizedParticleBeliefUpdater
Vectorized particle belief updater for the CartPole POMDP.
Performs all-particle transitions and observation log-likelihood evaluations using vectorized NumPy operations, replacing per-particle Python loops with batched array operations.
batch_transition applies the deterministic cart-pole physics to all particles and then adds a per-particle Gaussian process-noise sample drawn from state_transition_dist (mirroring CartPoleStateTransition.sample()). Observations follow a single Gaussian centred on the true state.
- Parameters:
- state_transition_dist
Process-noise distribution added after the deterministic physics step.
- obs_dist
Observation noise distribution.
- force_mag
Magnitude of force applied to the cart.
- gravity
Gravitational acceleration constant.
- masscart
Mass of the cart.
- masspole
Mass of the pole.
- total_mass
Combined mass of cart and pole.
- length
Half the pole’s length.
- polemass_length
Pole mass times pole half-length.
- tau
Integration time step.
- kinematics_integrator
Integration method (“euler” or “semi-implicit euler”).
Example
>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.cartpole_pomdp import CartPolePOMDP
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> env = CartPolePOMDP(discount_factor=0.99, noise_cov=noise_cov)
>>> updater = CartPoleVectorizedUpdater.from_environment(env)
>>> particles = np.random.uniform(-0.05, 0.05, (50, 4))
>>> action = 1
>>> next_p = updater.batch_transition(particles, action)
>>> next_p.shape
(50, 4)
>>> obs = np.array([0.0, 0.0, 0.0, 0.0])
>>> ll = updater.batch_observation_log_likelihood(next_p, action, obs)
>>> ll.shape
(50,)
- batch_observation_log_likelihood(next_particles, action, observation)[source]
Compute observation log-likelihoods for all particles at once.
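Under the Gaussian observation model described above (observation = true state + N(0, R) noise), this reduces to one multivariate-normal log-pdf evaluation per particle. A minimal numpy sketch, illustrative rather than the library's exact implementation:

```python
import numpy as np

def batch_gaussian_loglik_sketch(particles, observation, cov):
    """Log N(observation; particle, cov) for every row of particles.

    particles: (N, d), observation: (d,), cov: (d, d).
    """
    d = particles.shape[1]
    diff = observation - particles                      # (N, d) residuals
    cov_inv = np.linalg.inv(cov)
    # Per-row Mahalanobis distance in one einsum, no Python loop
    maha = np.einsum('ni,ij,nj->n', diff, cov_inv, diff)
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(cov)))
    return log_norm - 0.5 * maha                        # (N,)

rng = np.random.default_rng(0)
parts = rng.normal(size=(50, 4))
ll = batch_gaussian_loglik_sketch(parts, np.zeros(4), np.diag([0.1] * 4))
print(ll.shape)  # (50,)
```

A production implementation would typically use a Cholesky factor of the covariance instead of an explicit inverse and determinant for numerical stability.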
- batch_transition(particles, action)[source]
Transition all particles in a single batched operation.
- classmethod from_environment(env)[source]
Construct an updater from a CartPolePOMDP instance.
- Parameters:
env (CartPolePOMDP) – Environment to extract parameters from.
- Return type:
- Returns:
A new CartPoleVectorizedUpdater instance.
- POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_beliefs.create_cartpole_belief(env, belief_type=BeliefType.VECTORIZED_PARTICLE, n_particles=200, **kwargs)[source]
Create a ready-to-use belief for the CartPole POMDP.
For BeliefType.GAUSSIAN, the following keyword arguments are forwarded to create_cartpole_gaussian_belief():
- updater_type (GaussianBeliefUpdaterType): defaults to GaussianBeliefUpdaterType.UKF.
- initial_covariance (np.ndarray): defaults to np.eye(4) * (0.1**2 / 12).
- process_noise_scale (float): defaults to 1e-4.
- Parameters:
env (CartPolePOMDP) – CartPolePOMDP environment instance.
belief_type (BeliefType) – Desired belief representation. Defaults to BeliefType.VECTORIZED_PARTICLE.
n_particles (int) – Number of particles (ignored for GAUSSIAN). Defaults to 200.
**kwargs (Any) – Extra arguments forwarded to the Gaussian factory.
- Return type:
- Returns:
A configured Belief object.
- Raises:
ValueError – If belief_type is not supported.
Example
>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.cartpole_pomdp import CartPolePOMDP
>>> env = CartPolePOMDP(discount_factor=0.99,
...                     noise_cov=np.diag([0.1, 0.1, 0.1, 0.1]))
>>> belief = create_cartpole_belief(env, n_particles=50)
>>> belief.sample().shape
(4,)
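For the default BeliefType.VECTORIZED_PARTICLE, each belief update amounts to propagate, reweight, resample over the particle set. The self-contained numpy sketch below illustrates that cycle; the `transition` and `loglik` callables are hypothetical stand-ins for the updater's batched methods, not the library's API:

```python
import numpy as np

def particle_update_sketch(particles, transition, loglik, rng):
    """One bootstrap-filter step: propagate, reweight, systematic resample."""
    moved = transition(particles)        # (N, d) batched state transition
    logw = loglik(moved)                 # (N,) observation log-likelihoods
    logw = logw - logw.max()             # stabilise before exponentiating
    w = np.exp(logw)
    w = w / w.sum()
    # Systematic resampling: one uniform draw, N evenly spaced positions
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.minimum(np.searchsorted(np.cumsum(w), positions), n - 1)
    return moved[idx]

rng = np.random.default_rng(42)
parts = rng.uniform(-0.05, 0.05, size=(200, 4))
obs = np.zeros(4)
new_parts = particle_update_sketch(
    parts,
    transition=lambda p: p + rng.normal(scale=0.01, size=p.shape),
    loglik=lambda p: -0.5 * np.sum((p - obs) ** 2, axis=1) / 0.1,
    rng=rng,
)
print(new_parts.shape)  # (200, 4)
```

Shifting log-weights by their maximum before exponentiating avoids underflow when all likelihoods are small.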
POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_gaussian_beliefs module
Factory for pre-configured Gaussian beliefs for the CartPole POMDP.
This module provides a single factory function that creates a GaussianBelief instance pre-configured for the CartPolePOMDP environment, with an enum-based selector for the updater type (EKF or UKF).
The CartPole POMDP has nonlinear dynamics (coupled cart-pole physics) with a linear-Gaussian observation model (identity plus additive noise). Because the dynamics are nonlinear, a standard linear Kalman filter is not applicable; only EKF (which requires analytical Jacobians) and UKF (Jacobian-free sigma-point propagation) are supported.
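To make the EKF/UKF distinction concrete, the UKF's Jacobian-free predict step can be sketched with a standard unscented transform. This is a simplified weight scheme for illustration; the library's updater may parameterise its sigma points differently:

```python
import numpy as np

def ukf_predict_sketch(mean, cov, f, Q, kappa=0.0):
    """Unscented predict: push 2d+1 sigma points through a nonlinear f.

    mean: (d,), cov: (d, d), f: state transition, Q: process noise.
    Uses the basic kappa parameterisation (alpha/beta tuning omitted).
    """
    d = mean.shape[0]
    S = np.linalg.cholesky((d + kappa) * cov)
    # Sigma points: the mean plus/minus each column of the matrix square root
    sigma = np.vstack([mean, mean + S.T, mean - S.T])       # (2d+1, d)
    w = np.full(2 * d + 1, 1.0 / (2.0 * (d + kappa)))
    w[0] = kappa / (d + kappa)
    prop = np.array([f(s) for s in sigma])                  # no Jacobians needed
    new_mean = w @ prop
    diff = prop - new_mean
    new_cov = (w[:, None] * diff).T @ diff + Q
    return new_mean, new_cov

# For a linear f the unscented transform is exact: result is A @ P @ A.T + Q
A = np.array([[1.0, 0.02], [0.0, 1.0]])
Q = 1e-4 * np.eye(2)
nm, nP = ukf_predict_sketch(np.zeros(2), np.eye(2), lambda s: A @ s, Q)
```

An EKF would instead linearise f at the mean via its analytical Jacobian and propagate the covariance through that linearisation.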
- Classes:
GaussianBeliefUpdaterType: Enum selecting the Gaussian updater variant.
- Functions:
create_cartpole_gaussian_belief: Factory producing a configured GaussianBelief.
- class POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_gaussian_beliefs.GaussianBeliefUpdaterType(*values)[source]
Bases: Enum
Selector for the Gaussian belief updater variant.
- EKF
Extended Kalman filter (linearised via analytical Jacobians).
- UKF
Unscented Kalman filter (sigma-point propagation).
- EKF = 'ekf'
- UKF = 'ukf'
- POMDPPlanners.environments.cartpole_pomdp.cartpole_pomdp_gaussian_beliefs.create_cartpole_gaussian_belief(env, updater_type, initial_covariance=None, process_noise_scale=0.0001)[source]
Create a GaussianBelief configured for a CartPolePOMDP.
The CartPole POMDP has nonlinear dynamics:
x_{t+1} = f(x_t, u_t)            (deterministic cart-pole physics)
z_t = x_{t+1} + v,  v ~ N(0, R)
where R is env.noise_cov. A small process noise Q is added for numerical stability of the Kalman covariance updates.
- Parameters:
env (CartPolePOMDP) – CartPolePOMDP instance.
updater_type (GaussianBeliefUpdaterType) – Which Gaussian updater to use (EKF or UKF).
initial_covariance (Optional[ndarray]) – Initial belief covariance of shape (4, 4). Defaults to np.eye(4) * (0.1**2 / 12) (the variance of Uniform(-0.05, 0.05)).
process_noise_scale (float) – Diagonal scaling for the process noise covariance Q. Defaults to 1e-4.
- Return type:
- Returns:
A GaussianBelief with the selected updater.
Example
>>> import numpy as np
>>> from POMDPPlanners.environments.cartpole_pomdp import CartPolePOMDP
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> env = CartPolePOMDP(discount_factor=0.99, noise_cov=noise_cov)
>>> belief = create_cartpole_gaussian_belief(
...     env=env,
...     updater_type=GaussianBeliefUpdaterType.EKF,
... )
>>> belief.mean.shape
(4,)