POMDPPlanners.environments.mountain_car_pomdp package
Mountain Car POMDP Environment Module.
This module provides the Mountain Car POMDP environment implementation and related components for hill-climbing tasks with noisy observations.
- Classes:
MountainCarPOMDP: Main Mountain Car environment with POMDP formulation
MountainCarTransition: Physics-based state transition model
MountainCarObservation: Gaussian noise observation model
MountainCarPOMDPMetrics: Metric names for Mountain Car POMDP environment
- class POMDPPlanners.environments.mountain_car_pomdp.MountainCarObservation(next_state, action, obs_dist)[source]
Bases: ObservationModel

Noisy observation model for Mountain Car POMDP.
This model adds Gaussian noise to the true car state (position, velocity) to create partial observability. The agent receives noisy measurements of both position and velocity, making state estimation challenging.
- Parameters:
next_state (Tuple[float, float])
action (int)
obs_dist (CovarianceParameterizedMultivariateNormal)
- next_state
True state after action execution
- action
Action that was taken (not used in observation generation)
- mean
Expected observation (equals true state)
Example
Using the Mountain Car observation model:
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Define true state after physics step
>>> true_state = (-0.45, 0.02)  # [position, velocity]
>>> action = 1
>>>
>>> # Define observation noise covariance and create distribution
>>> from POMDPPlanners.utils.multivariate_normal import CovarianceParameterizedMultivariateNormal
>>> cov_matrix = np.array([[0.1**2, 0], [0, 0.01**2]])  # Position and velocity noise
>>> obs_dist = CovarianceParameterizedMultivariateNormal(cov_matrix)
>>>
>>> # Create observation model
>>> obs_model = MountainCarObservation(
...     next_state=true_state,
...     action=action,
...     obs_dist=obs_dist
... )
>>>
>>> # Sample noisy observation
>>> observation = obs_model.sample()[0]
>>> # Returns noisy [position, velocity] close to true_state
>>> print(f"True state: {true_state}")
True state: (-0.45, 0.02)
>>> print(f"Noisy observation: [{observation[0]:.3f}, {observation[1]:.3f}]")
Noisy observation: [-0.400, 0.019]
>>>
>>> # Calculate observation probability
>>> prob = obs_model.probability([observation])
>>> print(f"Observation probability: {prob[0]:.6f}")
Observation probability: 139.345607
- probability(values)[source]
Calculate observation probabilities for given values.
- Parameters:
values (List[ndarray]) – List of observation values to calculate probabilities for
- Return type:
- Returns:
Array of probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample observations from the observation model.
- Parameters:
n_samples (
int) – Number of observation samples to generate. Defaults to 1.- Return type:
- Returns:
List of sampled observations of length n_samples.
Note
Subclasses must implement this method according to their specific observation generation logic.
- class POMDPPlanners.environments.mountain_car_pomdp.MountainCarPOMDP(discount_factor, state_transition_cov=None, name='MountainCarPOMDP', output_dir=None, debug=False, use_queue_logger=False)[source]
Bases: DiscreteActionsEnvironment

Mountain Car problem formulated as a POMDP.
This environment simulates an underpowered car trying to reach the top of a steep mountain. The car must build momentum by oscillating back and forth to gain enough energy to reach the goal, with noisy observations of its state.
Problem Structure:
- State: [position, velocity] (continuous, position ∈ [-1.2, 0.6], velocity ∈ [-0.07, 0.07])
- Actions: [-1 (reverse), 0 (neutral), 1 (forward)] (discrete)
- Observations: Noisy state measurements (continuous)
- Rewards: 0 for reaching goal (position ≥ 0.5), -1 per time step otherwise
- Goal: Drive car to position ≥ 0.5 (top of mountain)
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> env = MountainCarPOMDP(discount_factor=0.99)
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
- Parameters:
- DEFAULT_STATE_TRANSITION_COV = array([[2.5e-05, 0.0e+00], [0.0e+00, 1.0e-06]])
- cache_visualization(history, cache_path)[source]
Cache visualization data for an episode history.
This method can be overridden by subclasses to provide environment-specific visualization caching capabilities.
- compute_metrics(histories)[source]
Compute Mountain Car POMDP specific metrics from simulation histories.
- Parameters:
- Return type:
- Returns:
List of MetricValue objects containing the computed metrics
- get_actions()[source]
Get all possible actions in the discrete action space.
Note
Subclasses must implement this method to enumerate all possible actions. This is used by planning algorithms that need to iterate over actions.
- initial_observation_dist()[source]
Get the initial observation distribution.
- Return type:
- Returns:
Distribution over initial observations
Note
Subclasses must implement this method to define initial observations.
- initial_state_dist()[source]
Get the initial state distribution.
- Return type:
- Returns:
Distribution over initial states
Note
Subclasses must implement this method to define the starting distribution.
- is_equal_observation(observation1, observation2)[source]
Check if two observations are equal.
- Parameters:
- Return type:
- Returns:
True if observations are considered equal, False otherwise
Note
Subclasses must implement this method to define observation equality. This is particularly important for discrete observation spaces.
- is_terminal(state)[source]
Check if a state is terminal.
- Parameters:
state (Tuple[float, float]) – State to check for terminal condition
- Return type:
- Returns:
True if the state is terminal, False otherwise
Note
Subclasses must implement this method to define terminal conditions.
- observation_model(next_state, action)[source]
Get the observation model for a given next state and action.
- Parameters:
- Return type:
- Returns:
Observation model that can sample observations
Note
Subclasses must implement this method to define observation generation.
- reward(state, action)[source]
Calculate the immediate reward for a state-action pair.
- Parameters:
- Return type:
- Returns:
Immediate reward value
Note
Subclasses must implement this method to define reward structure.
- reward_batch(states, action)[source]
Calculate rewards for a batch of states given a single action.
Provides a loop-based default that subclasses can override with vectorized numpy implementations for better performance.
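A vectorized override can be sketched as follows (the (N, 2) particle-array layout and the function name are illustrative assumptions; the goal threshold of 0.5 and the 0/-1 reward structure come from the problem description above):

```python
import numpy as np

# Hypothetical vectorized reward_batch override. Assumes `states` is an
# (N, 2) array of [position, velocity] rows; the documented reward is
# 0 once position >= 0.5 (goal) and -1 per time step otherwise.
GOAL_POSITION = 0.5

def reward_batch_vectorized(states, action):
    """Return an (N,) array of rewards for an (N, 2) array of states."""
    states = np.asarray(states, dtype=float)
    at_goal = states[:, 0] >= GOAL_POSITION
    # 0 reward at the goal, -1 everywhere else, in one array operation
    return np.where(at_goal, 0.0, -1.0)

rewards = reward_batch_vectorized(np.array([[-0.5, 0.0], [0.55, 0.03]]), action=1)
# rewards -> array([-1., 0.])
```

Replacing the per-state Python loop with a single `np.where` call is the kind of override the default implementation invites.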
- class POMDPPlanners.environments.mountain_car_pomdp.MountainCarPOMDPMetrics(*values)[source]
Bases: Enum

Metric names for Mountain Car POMDP environment.
- GOAL_REACHING_RATE = 'goal_reaching_rate'
- class POMDPPlanners.environments.mountain_car_pomdp.MountainCarTransition(state, action, power, gravity, max_speed, min_position, max_position, state_transition_dist)[source]
Bases: StateTransitionModel

Physics-based state transition model for Mountain Car POMDP.
This model implements the physics of a car on a sinusoidal hill surface with additive Gaussian process noise. The car’s velocity is affected by both the applied action (engine force) and gravitational force that depends on the slope of the hill.
The physics equations are:
- velocity += action * power + cos(3 * position) * (-gravity)
- position += velocity
After computing the deterministic next state, Normal noise is sampled from the provided distribution and added. The result is then clipped to respect position and velocity bounds.
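Putting the equations and the noise/clipping steps together, one transition can be sketched as below (the exact clipping order and the wall-stop rule at the left boundary are assumptions based on the classic Gym formulation; default constants follow the example further down):

```python
import numpy as np

# Sketch of a single Mountain Car physics step: deterministic dynamics,
# additive Gaussian process noise, then bound clipping.
def physics_step(position, velocity, action, power=0.001, gravity=0.0025,
                 max_speed=0.07, min_position=-1.2, max_position=0.6,
                 noise_cov=np.diag([2.5e-5, 1e-6]), rng=np.random.default_rng(0)):
    # Deterministic dynamics from the equations above
    velocity += action * power + np.cos(3 * position) * (-gravity)
    position += velocity
    # Additive Gaussian process noise on the deterministic next state
    position, velocity = np.array([position, velocity]) + rng.multivariate_normal(
        np.zeros(2), noise_cov)
    # Clip to respect position and velocity bounds
    velocity = np.clip(velocity, -max_speed, max_speed)
    position = np.clip(position, min_position, max_position)
    if position <= min_position and velocity < 0:
        velocity = 0.0  # assumed wall-stop rule: car halts at the left boundary
    return position, velocity

p, v = physics_step(-0.5, 0.0, action=1)
```

The noise covariance used here matches DEFAULT_STATE_TRANSITION_COV above.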
- Parameters:
- state
Current state (position, velocity) tuple
- action
Engine action (-1, 0, or 1)
- power
Engine power scaling factor
- gravity
Gravitational force constant
- max_speed
Maximum velocity magnitude
- min_position
Minimum position boundary
- max_position
Maximum position boundary
Example
Using the Mountain Car transition model:
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>> from POMDPPlanners.utils.multivariate_normal import CovarianceParameterizedMultivariateNormal
>>>
>>> # Define car state: position=-0.5 (in valley), velocity=0.0
>>> state = (-0.5, 0.0)
>>> action = 1  # Accelerate right/forward
>>>
>>> # Create transition model with noise
>>> state_transition_cov = np.diag([2.5e-5, 1e-6])
>>> state_transition_dist = CovarianceParameterizedMultivariateNormal(state_transition_cov)
>>> transition = MountainCarTransition(
...     state=state,
...     action=action,
...     power=0.001,
...     gravity=0.0025,
...     max_speed=0.07,
...     min_position=-1.2,
...     max_position=0.6,
...     state_transition_dist=state_transition_dist
... )
>>>
>>> # Simulate physics step
>>> next_state = transition.sample()[0]
>>> # Returns new [position, velocity] with physics and noise applied
>>> len(next_state) == 2
True
- probability(values)[source]
Calculate transition probabilities for given next states.
- Parameters:
values (List[ndarray]) – List of next state values to calculate probabilities for
- Return type:
- Returns:
Array of transition probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample next states from the transition model.
- Parameters:
n_samples (
int) – Number of next state samples to generate. Defaults to 1.- Return type:
- Returns:
List of sampled next states of length n_samples.
Note
Subclasses must implement this method according to their specific state transition dynamics.
Submodules
POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp module
Mountain Car POMDP Environment Implementation.
This module implements the classic Mountain Car problem as a POMDP, where an agent must drive an underpowered car up a steep mountain by building momentum through oscillating motion, with noisy observations of the car’s state.
The Mountain Car POMDP features:
- Continuous 2D state space: [position, velocity]
- Discrete action space: [-1 (reverse), 0 (neutral), 1 (forward)]
- Noisy continuous observations of position and velocity
- Physics-based dynamics with gravity and momentum
- Sparse reward: 0 for reaching goal, -1 per time step otherwise
The key challenge is that the car’s engine is too weak to drive directly up the mountain, so the agent must learn to build momentum by first moving away from the goal.
- Classes:
MountainCarTransition: Physics-based state transition model
MountainCarObservation: Gaussian noise observation model
MountainCarPOMDP: Main Mountain Car environment with POMDP formulation
The classes listed in this module (MountainCarObservation, MountainCarPOMDP, MountainCarPOMDPMetrics, and MountainCarTransition) are re-exported at the package level; their full documentation appears in the package entries earlier on this page.
POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_beliefs module
Vectorized particle belief updater for the Mountain Car POMDP.
This module implements a concrete VectorizedParticleBeliefUpdater that performs batched state transitions and observation log-likelihood evaluations for the Mountain Car environment, replacing per-particle Python loops with NumPy array operations.
- Classes:
MountainCarVectorizedUpdater: Batched updater for the Mountain Car POMDP.
- Functions:
create_mountain_car_belief: Factory producing a configured belief for MountainCarPOMDP.
- class POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_beliefs.MountainCarVectorizedUpdater(state_transition_dist, obs_dist, power, gravity, max_speed, min_position, max_position)[source]
Bases:
VectorizedParticleBeliefUpdaterVectorized particle belief updater for the Mountain Car POMDP.
Performs all-particle transitions and observation log-likelihood evaluations using vectorized NumPy operations, replacing per-particle Python loops with batched array operations.
batch_transition applies the deterministic car physics to all particles, then adds a per-particle Gaussian process-noise sample drawn from state_transition_dist (mirroring MountainCarTransition.sample()), and finally re-applies the position/velocity clipping and wall-stop boundary rule. Observations follow a single Gaussian centred on the true state.
- Parameters:
state_transition_dist (CovarianceParameterizedMultivariateNormal)
obs_dist (CovarianceParameterizedMultivariateNormal)
power (float)
gravity (float)
max_speed (float)
min_position (float)
max_position (float)
- state_transition_dist
Process-noise distribution added after the deterministic physics step.
- obs_dist
Observation noise distribution.
- power
Engine power scaling factor.
- gravity
Gravitational force constant.
- max_speed
Maximum velocity magnitude.
- min_position
Minimum position boundary.
- max_position
Maximum position boundary.
Example
>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.mountain_car_pomdp import MountainCarPOMDP
>>> env = MountainCarPOMDP(discount_factor=0.99)
>>> updater = MountainCarVectorizedUpdater.from_environment(env)
>>> particles = np.column_stack([
...     np.random.uniform(-0.6, -0.4, 50),
...     np.zeros(50),
... ])
>>> action = 1
>>> next_p = updater.batch_transition(particles, action)
>>> next_p.shape
(50, 2)
>>> obs = np.array([-0.5, 0.0])
>>> ll = updater.batch_observation_log_likelihood(next_p, action, obs)
>>> ll.shape
(50,)
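The batched transition described above can be sketched in plain NumPy as follows. This is an illustrative reimplementation, not the class's actual source; the (N, 2) particle layout and the default constants mirror the MountainCarTransition example:

```python
import numpy as np

# Sketch of an all-particles transition: vectorized physics, one batched
# Gaussian noise draw, then clipping and the wall-stop rule.
def batch_transition_sketch(particles, action, power=0.001, gravity=0.0025,
                            max_speed=0.07, min_position=-1.2, max_position=0.6,
                            noise_cov=np.diag([2.5e-5, 1e-6]),
                            rng=np.random.default_rng(0)):
    pos, vel = particles[:, 0], particles[:, 1]
    # Deterministic car physics applied to every particle at once
    vel = vel + action * power + np.cos(3 * pos) * (-gravity)
    pos = pos + vel
    # One Gaussian process-noise sample per particle, drawn in a single call
    noise = rng.multivariate_normal(np.zeros(2), noise_cov, size=len(particles))
    pos = np.clip(pos + noise[:, 0], min_position, max_position)
    vel = np.clip(vel + noise[:, 1], -max_speed, max_speed)
    # Wall-stop rule: particles pressed against the left wall lose velocity
    vel = np.where((pos <= min_position) & (vel < 0), 0.0, vel)
    return np.column_stack([pos, vel])

next_p = batch_transition_sketch(np.tile([-0.5, 0.0], (50, 1)), action=1)
# next_p.shape -> (50, 2)
```

The payoff is that the particle count N never appears in a Python loop, only in array shapes.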
- batch_observation_log_likelihood(next_particles, action, observation)[source]
Compute observation log-likelihoods for all particles at once.
- batch_transition(particles, action)[source]
Transition all particles in a single batched operation.
- classmethod from_environment(env)[source]
Construct an updater from a MountainCarPOMDP instance.
- Parameters:
env (MountainCarPOMDP) – Environment to extract parameters from.
- Return type:
- Returns:
A new MountainCarVectorizedUpdater instance.
- POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_beliefs.create_mountain_car_belief(env, belief_type=BeliefType.VECTORIZED_PARTICLE, n_particles=200, **kwargs)[source]
Create a ready-to-use belief for the Mountain Car POMDP.
For BeliefType.GAUSSIAN, the following keyword arguments are forwarded to create_mountain_car_gaussian_belief():
- updater_type (GaussianBeliefUpdaterType): defaults to GaussianBeliefUpdaterType.UKF.
- initial_covariance (np.ndarray): defaults to np.diag([0.2**2 / 12, 1e-4]).
- process_noise_scale (float): defaults to 1e-4.
- Parameters:
env (MountainCarPOMDP) – MountainCarPOMDP environment instance.
belief_type (BeliefType) – Desired belief representation. Defaults to BeliefType.VECTORIZED_PARTICLE.
n_particles (int) – Number of particles (ignored for GAUSSIAN). Defaults to 200.
**kwargs (Any) – Extra arguments forwarded to the Gaussian factory.
- Return type:
- Returns:
A configured Belief object.
- Raises:
ValueError – If belief_type is not supported.
Example
>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.mountain_car_pomdp import MountainCarPOMDP
>>> env = MountainCarPOMDP(discount_factor=0.99)
>>> belief = create_mountain_car_belief(env, n_particles=50)
>>> belief.sample().shape
(2,)
POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_gaussian_beliefs module
Factory for pre-configured Gaussian beliefs for the Mountain Car POMDP.
This module provides a single factory function that creates a GaussianBelief instance pre-configured for the MountainCarPOMDP environment, with an enum-based selector for the updater type (EKF or UKF).
The Mountain Car POMDP has nonlinear dynamics (velocity depends on
cos(3 * position)) with a linear-Gaussian observation model (identity
plus additive noise). Because the dynamics are nonlinear, a standard
linear Kalman filter is not applicable; only EKF (which requires analytical
Jacobians) and UKF (Jacobian-free sigma-point propagation) are supported.
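For the EKF case, the analytical Jacobian of the deterministic dynamics follows directly from the update equations; the sketch below ignores the clip/boundary nonlinearities, which is an assumption about how the linearisation is handled:

```python
import numpy as np

# Jacobian of the deterministic Mountain Car dynamics with respect to the
# state [position, velocity], ignoring clipping:
#   v' = v + action * power - gravity * cos(3 p)  =>  dv'/dp = 3 * gravity * sin(3 p)
#   p' = p + v'                                    =>  dp'/dp = 1 + dv'/dp, dp'/dv = 1
def dynamics_jacobian(position, gravity=0.0025):
    dvdp = 3.0 * gravity * np.sin(3.0 * position)
    return np.array([[1.0 + dvdp, 1.0],
                     [dvdp,       1.0]])

F = dynamics_jacobian(-0.5)
# F has shape (2, 2). The observation Jacobian is the identity, since
# observations are the state plus additive Gaussian noise.
```

The UKF avoids this derivation entirely by propagating sigma points through the (possibly clipped) dynamics.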
- Classes:
GaussianBeliefUpdaterType: Enum selecting the Gaussian updater variant.
- Functions:
create_mountain_car_gaussian_belief: Factory producing a configured GaussianBelief.
- class POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_gaussian_beliefs.GaussianBeliefUpdaterType(*values)[source]
Bases: Enum

Selector for the Gaussian belief updater variant.
- EKF
Extended Kalman filter (linearised via analytical Jacobians).
- UKF
Unscented Kalman filter (sigma-point propagation).
- EKF = 'ekf'
- UKF = 'ukf'
- POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_gaussian_beliefs.create_mountain_car_gaussian_belief(env, updater_type, initial_covariance=None, process_noise_scale=0.0001)[source]
Create a GaussianBelief configured for a MountainCarPOMDP.
The Mountain Car POMDP has nonlinear dynamics:
v_{t+1} = clip(v_t + action * power + cos(3 * p_t) * (-gravity))
p_{t+1} = clip(p_t + v_{t+1})
z_t = [p_{t+1}, v_{t+1}] + w,  w ~ N(0, R)
where R is env.cov_matrix. A small process noise Q is added for numerical stability of the Kalman covariance updates.
- Parameters:
env (MountainCarPOMDP) – MountainCarPOMDP instance.
updater_type (GaussianBeliefUpdaterType) – Which Gaussian updater to use (EKF or UKF).
initial_covariance (Optional[ndarray]) – Initial belief covariance of shape (2, 2). Defaults to np.diag([0.2**2 / 12, 1e-4]) (variance of Uniform(-0.6, -0.4) for position, small for velocity).
process_noise_scale (float) – Diagonal scaling for the process noise covariance Q. Defaults to 1e-4.
- Return type:
- Returns:
A GaussianBelief with the selected updater.
Example
>>> import numpy as np
>>> from POMDPPlanners.environments.mountain_car_pomdp import MountainCarPOMDP
>>> env = MountainCarPOMDP(discount_factor=0.99)
>>> belief = create_mountain_car_gaussian_belief(
...     env=env,
...     updater_type=GaussianBeliefUpdaterType.EKF,
... )
>>> belief.mean.shape
(2,)