POMDPPlanners.environments.mountain_car_pomdp package
Mountain Car POMDP Environment Module.
This module provides the Mountain Car POMDP environment implementation and related components for hill-climbing tasks with noisy observations.
- Classes:
MountainCarPOMDP: Main Mountain Car environment with POMDP formulation
MountainCarTransition: Physics-based state transition model
MountainCarObservation: Gaussian noise observation model
MountainCarPOMDPMetrics: Metric names for Mountain Car POMDP environment
- class POMDPPlanners.environments.mountain_car_pomdp.MountainCarObservation(next_state, action, obs_dist)[source]
Bases: ObservationModel

Noisy observation model for Mountain Car POMDP.
This model adds Gaussian noise to the true car state (position, velocity) to create partial observability. The agent receives noisy measurements of both position and velocity, making state estimation challenging.
- Parameters:
next_state (Tuple[float, float])
action (int)
obs_dist (CovarianceParameterizedMultivariateNormal)
- next_state
True state after action execution
- action
Action that was taken (not used in observation generation)
- mean
Expected observation (equals true state)
Example
Using the Mountain Car observation model:
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Define true state after physics step
>>> true_state = (-0.45, 0.02)  # [position, velocity]
>>> action = 1
>>>
>>> # Define observation noise covariance and create distribution
>>> from POMDPPlanners.utils.multivariate_normal import CovarianceParameterizedMultivariateNormal
>>> cov_matrix = np.array([[0.1**2, 0], [0, 0.01**2]])  # Position and velocity noise
>>> obs_dist = CovarianceParameterizedMultivariateNormal(cov_matrix)
>>>
>>> # Create observation model
>>> obs_model = MountainCarObservation(
...     next_state=true_state,
...     action=action,
...     obs_dist=obs_dist
... )
>>>
>>> # Sample noisy observation
>>> observation = obs_model.sample()[0]
>>> # Returns noisy [position, velocity] close to true_state
>>> print(f"True state: {true_state}")
True state: (-0.45, 0.02)
>>> print(f"Noisy observation: [{observation[0]:.3f}, {observation[1]:.3f}]")
Noisy observation: [-0.400, 0.019]
>>>
>>> # Calculate observation probability
>>> prob = obs_model.probability([observation])
>>> print(f"Observation probability: {prob[0]:.6f}")
Observation probability: 139.345607
- probability(values)[source]
Calculate observation probabilities for given values.
- Parameters:
values (List[ndarray]) – List of observation values to calculate probabilities for
- Return type:
- Returns:
Array of probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample observations from the observation model.
- Parameters:
n_samples (
int) – Number of observation samples to generate. Defaults to 1.- Return type:
- Returns:
List of sampled observations of length n_samples.
Note
Subclasses must implement this method according to their specific observation generation logic.
- class POMDPPlanners.environments.mountain_car_pomdp.MountainCarPOMDP(discount_factor, state_transition_cov=None, name='MountainCarPOMDP', output_dir=None, debug=False, use_queue_logger=False)[source]
Bases: DiscreteActionsEnvironment

Mountain Car problem formulated as a POMDP.
This environment simulates an underpowered car trying to reach the top of a steep mountain. The car must build momentum by oscillating back and forth to gain enough energy to reach the goal, with noisy observations of its state.
Problem Structure:
- State: [position, velocity] (continuous, position ∈ [-1.2, 0.6], velocity ∈ [-0.07, 0.07])
- Actions: [-1 (reverse), 0 (neutral), 1 (forward)] (discrete)
- Observations: Noisy state measurements (continuous)
- Rewards: 0 for reaching goal (position ≥ 0.5), -1 per time step otherwise
- Goal: Drive car to position ≥ 0.5 (top of mountain)
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> # Initialize environment
>>> env = MountainCarPOMDP(discount_factor=0.99)
>>>
>>> # Get initial state and actions
>>> initial_state = env.initial_state_dist().sample()[0]
>>> actions = env.get_actions()
>>>
>>> # Sample complete step using convenience method
>>> action = actions[0]
>>> next_state, observation, reward = env.sample_next_step(initial_state, action)
>>>
>>> # Check terminal condition
>>> env.is_terminal(initial_state)
False
- Parameters:
- DEFAULT_STATE_TRANSITION_COV = array([[2.5e-05, 0.0e+00], [0.0e+00, 1.0e-06]])
- cache_visualization(history, cache_path)[source]
Cache visualization data for an episode history.
This method can be overridden by subclasses to provide environment-specific visualization caching capabilities.
- compute_metrics(histories)[source]
Compute Mountain Car POMDP specific metrics from simulation histories.
- Parameters:
- Return type:
- Returns:
List of MetricValue objects containing the computed metrics
- get_actions()[source]
Get all possible actions in the discrete action space.
Note
Subclasses must implement this method to enumerate all possible actions. This is used by planning algorithms that need to iterate over actions.
- initial_observation_dist()[source]
Get the initial observation distribution.
- Return type:
- Returns:
Distribution over initial observations
Note
Subclasses must implement this method to define initial observations.
- initial_state_dist()[source]
Get the initial state distribution.
- Return type:
- Returns:
Distribution over initial states
Note
Subclasses must implement this method to define the starting distribution.
- is_equal_observation(observation1, observation2)[source]
Check if two observations are equal.
- Parameters:
- Return type:
- Returns:
True if observations are considered equal, False otherwise
Note
Subclasses must implement this method to define observation equality. This is particularly important for discrete observation spaces.
- is_terminal(state)[source]
Check if a state is terminal.
- Parameters:
state (Tuple[float, float]) – State to check for terminal condition
- Return type:
- Returns:
True if the state is terminal, False otherwise
Note
Subclasses must implement this method to define terminal conditions.
- observation_model(next_state, action)[source]
Get the observation model for a given next state and action.
- Parameters:
- Return type:
- Returns:
Observation model that can sample observations
Note
Subclasses must implement this method to define observation generation.
- reward(state, action)[source]
Calculate the immediate reward for a state-action pair.
- Parameters:
- Return type:
- Returns:
Immediate reward value
Note
Subclasses must implement this method to define reward structure.
- reward_batch(states, action)[source]
Calculate rewards for a batch of states given a single action.
Provides a loop-based default that subclasses can override with vectorized numpy implementations for better performance.
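A vectorized override can be sketched as follows (the (N, 2) particle-array layout and the function name are illustrative assumptions; the goal threshold of 0.5 and the 0/-1 reward structure come from the problem description above):

```python
import numpy as np

# Hypothetical vectorized reward_batch override. Assumes `states` is an
# (N, 2) array of [position, velocity] rows; the documented reward is
# 0 once position >= 0.5 (goal) and -1 per time step otherwise.
GOAL_POSITION = 0.5

def reward_batch_vectorized(states, action):
    """Return an (N,) array of rewards for an (N, 2) array of states."""
    states = np.asarray(states, dtype=float)
    at_goal = states[:, 0] >= GOAL_POSITION
    # 0 reward at the goal, -1 everywhere else, in one array operation
    return np.where(at_goal, 0.0, -1.0)

rewards = reward_batch_vectorized(np.array([[-0.5, 0.0], [0.55, 0.03]]), action=1)
# rewards -> array([-1., 0.])
```

Replacing the per-state Python loop with a single `np.where` call is the kind of override the default implementation invites.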
- class POMDPPlanners.environments.mountain_car_pomdp.MountainCarPOMDPMetrics(*values)[source]
Bases: Enum

Metric names for Mountain Car POMDP environment.
- GOAL_REACHING_RATE = 'goal_reaching_rate'
- class POMDPPlanners.environments.mountain_car_pomdp.MountainCarTransition(state, action, power, gravity, max_speed, min_position, max_position, state_transition_dist)[source]
Bases: StateTransitionModel

Physics-based state transition model for Mountain Car POMDP.
This model implements the physics of a car on a sinusoidal hill surface with additive Gaussian process noise. The car’s velocity is affected by both the applied action (engine force) and gravitational force that depends on the slope of the hill.
The physics equations are:
- velocity += action * power + cos(3 * position) * (-gravity)
- position += velocity
After computing the deterministic next state, Normal noise is sampled from the provided distribution and added. The result is then clipped to respect position and velocity bounds.
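Putting the equations and the noise/clipping steps together, one transition can be sketched as below (the exact clipping order and the wall-stop rule at the left boundary are assumptions based on the classic Gym formulation; default constants follow the example further down):

```python
import numpy as np

# Sketch of a single Mountain Car physics step: deterministic dynamics,
# additive Gaussian process noise, then bound clipping.
def physics_step(position, velocity, action, power=0.001, gravity=0.0025,
                 max_speed=0.07, min_position=-1.2, max_position=0.6,
                 noise_cov=np.diag([2.5e-5, 1e-6]), rng=np.random.default_rng(0)):
    # Deterministic dynamics from the equations above
    velocity += action * power + np.cos(3 * position) * (-gravity)
    position += velocity
    # Additive Gaussian process noise on the deterministic next state
    position, velocity = np.array([position, velocity]) + rng.multivariate_normal(
        np.zeros(2), noise_cov)
    # Clip to respect position and velocity bounds
    velocity = np.clip(velocity, -max_speed, max_speed)
    position = np.clip(position, min_position, max_position)
    if position <= min_position and velocity < 0:
        velocity = 0.0  # assumed wall-stop rule: car halts at the left boundary
    return position, velocity

p, v = physics_step(-0.5, 0.0, action=1)
```

The noise covariance used here matches DEFAULT_STATE_TRANSITION_COV above.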
- Parameters:
- state
Current state (position, velocity) tuple
- action
Engine action (-1, 0, or 1)
- power
Engine power scaling factor
- gravity
Gravitational force constant
- max_speed
Maximum velocity magnitude
- min_position
Minimum position boundary
- max_position
Maximum position boundary
Example
Using the Mountain Car transition model:
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>> from POMDPPlanners.utils.multivariate_normal import CovarianceParameterizedMultivariateNormal
>>>
>>> # Define car state: position=-0.5 (in valley), velocity=0.0
>>> state = (-0.5, 0.0)
>>> action = 1  # Accelerate right/forward
>>>
>>> # Create transition model with noise
>>> state_transition_cov = np.diag([2.5e-5, 1e-6])
>>> state_transition_dist = CovarianceParameterizedMultivariateNormal(state_transition_cov)
>>> transition = MountainCarTransition(
...     state=state,
...     action=action,
...     power=0.001,
...     gravity=0.0025,
...     max_speed=0.07,
...     min_position=-1.2,
...     max_position=0.6,
...     state_transition_dist=state_transition_dist
... )
>>>
>>> # Simulate physics step
>>> next_state = transition.sample()[0]
>>> # Returns new [position, velocity] with physics and noise applied
>>> len(next_state) == 2
True
- probability(values)[source]
Calculate transition probabilities for given next states.
- Parameters:
values (List[ndarray]) – List of next state values to calculate probabilities for
- Return type:
- Returns:
Array of transition probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample next states from the transition model.
- Parameters:
n_samples (
int) – Number of next state samples to generate. Defaults to 1.- Return type:
- Returns:
List of sampled next states of length n_samples.
Note
Subclasses must implement this method according to their specific state transition dynamics.
Submodules
POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp module
Mountain Car POMDP Environment Implementation.
This module implements the classic Mountain Car problem as a POMDP, where an agent must drive an underpowered car up a steep mountain by building momentum through oscillating motion, with noisy observations of the car’s state.
The Mountain Car POMDP features:
- Continuous 2D state space: [position, velocity]
- Discrete action space: [-1 (reverse), 0 (neutral), 1 (forward)]
- Noisy continuous observations of position and velocity
- Physics-based dynamics with gravity and momentum
- Sparse reward: 0 for reaching goal, -1 per time step otherwise
The key challenge is that the car’s engine is too weak to drive directly up the mountain, so the agent must learn to build momentum by first moving away from the goal.
- Classes:
MountainCarTransition: Physics-based state transition model
MountainCarObservation: Gaussian noise observation model
MountainCarPOMDP: Main Mountain Car environment with POMDP formulation
The classes listed in this module (MountainCarObservation, MountainCarPOMDP, MountainCarPOMDPMetrics, and MountainCarTransition) are re-exported at the package level; their full documentation appears in the package entries earlier on this page.
POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_beliefs module
Vectorized particle belief updater for the Mountain Car POMDP.
This module implements a concrete VectorizedParticleBeliefUpdater that performs batched state transitions and observation log-likelihood evaluations for the Mountain Car environment, replacing per-particle Python loops with NumPy array operations.
- Classes:
MountainCarVectorizedUpdater: Batched updater for the Mountain Car POMDP.
- Functions:
create_mountain_car_belief: Factory producing a configured belief for MountainCarPOMDP.
- class POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_beliefs.MountainCarVectorizedUpdater(state_transition_dist, obs_dist, power, gravity, max_speed, min_position, max_position)[source]
Bases:
VectorizedParticleBeliefUpdaterVectorized particle belief updater for the Mountain Car POMDP.
Performs all-particle transitions and observation log-likelihood evaluations using vectorized NumPy operations, replacing per-particle Python loops with batched array operations.
batch_transition applies the deterministic car physics to all particles, then adds a per-particle Gaussian process-noise sample drawn from state_transition_dist (mirroring MountainCarTransition.sample()), and finally re-applies the position/velocity clipping and wall-stop boundary rule. Observations follow a single Gaussian centred on the true state.
- Parameters:
state_transition_dist (CovarianceParameterizedMultivariateNormal)
obs_dist (CovarianceParameterizedMultivariateNormal)
power (float)
gravity (float)
max_speed (float)
min_position (float)
max_position (float)
- state_transition_dist
Process-noise distribution added after the deterministic physics step.
- obs_dist
Observation noise distribution.
- power
Engine power scaling factor.
- gravity
Gravitational force constant.
- max_speed
Maximum velocity magnitude.
- min_position
Minimum position boundary.
- max_position
Maximum position boundary.
Example
>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.mountain_car_pomdp import MountainCarPOMDP
>>> env = MountainCarPOMDP(discount_factor=0.99)
>>> updater = MountainCarVectorizedUpdater.from_environment(env)
>>> particles = np.column_stack([
...     np.random.uniform(-0.6, -0.4, 50),
...     np.zeros(50),
... ])
>>> action = 1
>>> next_p = updater.batch_transition(particles, action)
>>> next_p.shape
(50, 2)
>>> obs = np.array([-0.5, 0.0])
>>> ll = updater.batch_observation_log_likelihood(next_p, action, obs)
>>> ll.shape
(50,)
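The batched transition described above can be sketched in plain NumPy as follows. This is an illustrative reimplementation, not the class's actual source; the (N, 2) particle layout and the default constants mirror the MountainCarTransition example:

```python
import numpy as np

# Sketch of an all-particles transition: vectorized physics, one batched
# Gaussian noise draw, then clipping and the wall-stop rule.
def batch_transition_sketch(particles, action, power=0.001, gravity=0.0025,
                            max_speed=0.07, min_position=-1.2, max_position=0.6,
                            noise_cov=np.diag([2.5e-5, 1e-6]),
                            rng=np.random.default_rng(0)):
    pos, vel = particles[:, 0], particles[:, 1]
    # Deterministic car physics applied to every particle at once
    vel = vel + action * power + np.cos(3 * pos) * (-gravity)
    pos = pos + vel
    # One Gaussian process-noise sample per particle, drawn in a single call
    noise = rng.multivariate_normal(np.zeros(2), noise_cov, size=len(particles))
    pos = np.clip(pos + noise[:, 0], min_position, max_position)
    vel = np.clip(vel + noise[:, 1], -max_speed, max_speed)
    # Wall-stop rule: particles pressed against the left wall lose velocity
    vel = np.where((pos <= min_position) & (vel < 0), 0.0, vel)
    return np.column_stack([pos, vel])

next_p = batch_transition_sketch(np.tile([-0.5, 0.0], (50, 1)), action=1)
# next_p.shape -> (50, 2)
```

The payoff is that the particle count N never appears in a Python loop, only in array shapes.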
- batch_observation_log_likelihood(next_particles, action, observation)[source]
Compute observation log-likelihoods for all particles at once.
- batch_transition(particles, action)[source]
Transition all particles in a single batched operation.
- classmethod from_environment(env)[source]
Construct an updater from a MountainCarPOMDP instance.
- Parameters:
env (MountainCarPOMDP) – Environment to extract parameters from.
- Return type:
- Returns:
A new MountainCarVectorizedUpdater instance.
- POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_beliefs.create_mountain_car_belief(env, belief_type=BeliefType.VECTORIZED_PARTICLE, n_particles=200, **kwargs)[source]
Create a ready-to-use belief for the Mountain Car POMDP.
For BeliefType.GAUSSIAN, the following keyword arguments are forwarded to create_mountain_car_gaussian_belief():
- updater_type (GaussianBeliefUpdaterType): defaults to GaussianBeliefUpdaterType.UKF.
- initial_covariance (np.ndarray): defaults to np.diag([0.2**2 / 12, 1e-4]).
- process_noise_scale (float): defaults to 1e-4.
- Parameters:
env (MountainCarPOMDP) – MountainCarPOMDP environment instance.
belief_type (BeliefType) – Desired belief representation. Defaults to BeliefType.VECTORIZED_PARTICLE.
n_particles (int) – Number of particles (ignored for GAUSSIAN). Defaults to 200.
**kwargs (Any) – Extra arguments forwarded to the Gaussian factory.
- Return type:
- Returns:
A configured Belief object.
- Raises:
ValueError – If belief_type is not supported.
Example
>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.mountain_car_pomdp import MountainCarPOMDP
>>> env = MountainCarPOMDP(discount_factor=0.99)
>>> belief = create_mountain_car_belief(env, n_particles=50)
>>> belief.sample().shape
(2,)
POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_gaussian_beliefs module
Factory for pre-configured Gaussian beliefs for the Mountain Car POMDP.
This module provides a single factory function that creates a GaussianBelief instance pre-configured for the MountainCarPOMDP environment, with an enum-based selector for the updater type (EKF or UKF).
The Mountain Car POMDP has nonlinear dynamics (velocity depends on
cos(3 * position)) with a linear-Gaussian observation model (identity
plus additive noise). Because the dynamics are nonlinear, a standard
linear Kalman filter is not applicable; only EKF (which requires analytical
Jacobians) and UKF (Jacobian-free sigma-point propagation) are supported.
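For the EKF case, the analytical Jacobian of the deterministic dynamics follows directly from the update equations; the sketch below ignores the clip/boundary nonlinearities, which is an assumption about how the linearisation is handled:

```python
import numpy as np

# Jacobian of the deterministic Mountain Car dynamics with respect to the
# state [position, velocity], ignoring clipping:
#   v' = v + action * power - gravity * cos(3 p)  =>  dv'/dp = 3 * gravity * sin(3 p)
#   p' = p + v'                                    =>  dp'/dp = 1 + dv'/dp, dp'/dv = 1
def dynamics_jacobian(position, gravity=0.0025):
    dvdp = 3.0 * gravity * np.sin(3.0 * position)
    return np.array([[1.0 + dvdp, 1.0],
                     [dvdp,       1.0]])

F = dynamics_jacobian(-0.5)
# F has shape (2, 2). The observation Jacobian is the identity, since
# observations are the state plus additive Gaussian noise.
```

The UKF avoids this derivation entirely by propagating sigma points through the (possibly clipped) dynamics.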
- Classes:
GaussianBeliefUpdaterType: Enum selecting the Gaussian updater variant.
- Functions:
create_mountain_car_gaussian_belief: Factory producing a configured GaussianBelief.
- class POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_gaussian_beliefs.GaussianBeliefUpdaterType(*values)[source]
Bases: Enum

Selector for the Gaussian belief updater variant.
- EKF
Extended Kalman filter (linearised via analytical Jacobians).
- UKF
Unscented Kalman filter (sigma-point propagation).
- EKF = 'ekf'
- UKF = 'ukf'
- POMDPPlanners.environments.mountain_car_pomdp.mountain_car_pomdp_gaussian_beliefs.create_mountain_car_gaussian_belief(env, updater_type, initial_covariance=None, process_noise_scale=0.0001)[source]
Create a GaussianBelief configured for a MountainCarPOMDP.
The Mountain Car POMDP has nonlinear dynamics:
v_{t+1} = clip(v_t + action * power + cos(3 * p_t) * (-gravity))
p_{t+1} = clip(p_t + v_{t+1})
z_t = [p_{t+1}, v_{t+1}] + w,  w ~ N(0, R)
where R is env.cov_matrix. A small process noise Q is added for numerical stability of the Kalman covariance updates.
- Parameters:
env (MountainCarPOMDP) – MountainCarPOMDP instance.
updater_type (GaussianBeliefUpdaterType) – Which Gaussian updater to use (EKF or UKF).
initial_covariance (Optional[ndarray]) – Initial belief covariance of shape (2, 2). Defaults to np.diag([0.2**2 / 12, 1e-4]) (variance of Uniform(-0.6, -0.4) for position, small for velocity).
process_noise_scale (float) – Diagonal scaling for the process noise covariance Q. Defaults to 1e-4.
- Return type:
- Returns:
A GaussianBelief with the selected updater.
Example
>>> import numpy as np
>>> from POMDPPlanners.environments.mountain_car_pomdp import MountainCarPOMDP
>>> env = MountainCarPOMDP(discount_factor=0.99)
>>> belief = create_mountain_car_gaussian_belief(
...     env=env,
...     updater_type=GaussianBeliefUpdaterType.EKF,
... )
>>> belief.mean.shape
(2,)