POMDPPlanners.core package
Subpackages
- POMDPPlanners.core.belief package
  Classes: Belief, ExtendedKalmanFilterUpdater, GaussianBelief, GaussianBeliefUpdater, GaussianMixtureBelief, GaussianMixtureBeliefUpdater, LinearKalmanFilterUpdater, UnscentedKalmanFilterUpdater, UnweightedParticleBelief, UnweightedParticleBeliefStateUpdate, VectorizedParticleBeliefUpdater, VectorizedWeightedParticleBelief, WeightedParticleBelief, WeightedParticleBeliefReinvigoration, WeightedParticleBeliefStateUpdate
  Functions: get_initial_belief(), get_unique_support(), is_terminal_belief(), is_terminal_particle_belief(), sample_next_belief()
  - Submodules
    - POMDPPlanners.core.belief.base_belief module
    - POMDPPlanners.core.belief.belief_utils module
    - POMDPPlanners.core.belief.gaussian_belief module
    - POMDPPlanners.core.belief.gaussian_belief_updaters module: ExtendedKalmanFilterUpdater, GaussianBeliefUpdater, LinearKalmanFilterUpdater, UnscentedKalmanFilterUpdater
    - POMDPPlanners.core.belief.gaussian_mixture_belief module: GaussianMixtureBelief, GaussianMixtureBeliefUpdater
    - POMDPPlanners.core.belief.particle_beliefs module: UnweightedParticleBelief, UnweightedParticleBeliefStateUpdate, WeightedParticleBelief, WeightedParticleBeliefReinvigoration, WeightedParticleBeliefStateUpdate, get_unique_support()
    - POMDPPlanners.core.belief.vectorized_particle_belief_updater module
    - POMDPPlanners.core.belief.vectorized_weighted_particle_belief module: VectorizedWeightedParticleBelief
- POMDPPlanners.core.simulation package
  Classes: CategoricalHyperParameter, DataBaseInterface, EnvironmentRunParams, ExperimentVisualizer, History, HyperParameterRunParams, MetricValue, NumericalHyperParameter, ParallelizationLevel, SimulationTask, StepData, TaskManager, TaskManagerExternalDB
  Functions: history_to_discounted_return_value()
  - Submodules
    - POMDPPlanners.core.simulation.history module: History, StepData, history_to_discounted_return_value()
    - POMDPPlanners.core.simulation.hyperparameter_tuning module: CategoricalHyperParameter, HyperParamPlannerConfig, HyperParamPlannerConfigGenerator, HyperParameterOptimizationDirection, HyperParameterRunParams, NumericalHyperParameter, OptimizedPolicyResult, ParallelizationLevel, ParameterToOptimizeMapper
    - POMDPPlanners.core.simulation.metrics module
    - POMDPPlanners.core.simulation.simulation_configs module: EnvironmentRunParams, EvaluationExperimentConfigCreator, HyperparameterOptimizationExperimentConfigCreator, PlannerGenerator
    - POMDPPlanners.core.simulation.tasks module
    - POMDPPlanners.core.simulation.visualizers module
Submodules
POMDPPlanners.core.config_types module
Configuration data structures for POMDP components.
This module defines standardized configuration data structures used throughout the POMDP planning framework for component specification and experiment setup.
- Classes:
  - EnvironmentConfig: Configuration specification for environments
  - PolicyConfig: Configuration specification for policies
  - BeliefConfig: Configuration specification for beliefs
  - ExperimentConfig: Complete experiment specification with all components
- class POMDPPlanners.core.config_types.BeliefConfig(class_name, params)[source]
Bases: object
Configuration specification for belief representations.
This data class standardizes how belief configurations are specified, enabling dynamic creation of belief instances from configuration files.
- class_name
Name of the belief class to instantiate
- params
Dictionary of parameters to pass to the belief constructor
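The docs above describe the config classes as enabling "dynamic creation of belief instances from configuration files". The resolution mechanism itself is not shown, so here is a minimal standalone sketch of that pattern; the dataclass, the `UniformBelief` class, and the registry-based lookup are all illustrative assumptions, not the library's actual implementation (which may resolve class names via importlib instead).

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class BeliefConfigSketch:
    """Standalone stand-in for BeliefConfig: a class name plus constructor kwargs."""
    class_name: str
    params: Dict[str, Any] = field(default_factory=dict)

# Hypothetical belief class and registry; the real framework may resolve
# names differently (e.g. dynamic import by module path).
class UniformBelief:
    def __init__(self, n_particles):
        self.n_particles = n_particles

REGISTRY = {"UniformBelief": UniformBelief}

def build(config: BeliefConfigSketch):
    # Dynamic creation: look the class up by name, apply the saved params.
    return REGISTRY[config.class_name](**config.params)

belief = build(BeliefConfigSketch("UniformBelief", {"n_particles": 100}))
```

The same name-plus-params shape works for environments and policies, which is why all four config classes share it.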
- class POMDPPlanners.core.config_types.EnvironmentConfig(class_name, params)[source]
Bases: object
Configuration specification for POMDP environments.
This data class standardizes how environment configurations are specified, enabling dynamic creation of environment instances from configuration files.
- class_name
Name of the environment class to instantiate
- params
Dictionary of parameters to pass to the environment constructor
Example
Creating environment configurations:
>>> # Tiger POMDP configuration
>>> tiger_config = EnvironmentConfig(
...     class_name="TigerPOMDP",
...     params={
...         "discount_factor": 0.95,
...         "tiger_location": "left",
...         "reward_correct": 10.0,
...         "reward_incorrect": -100.0
...     }
... )
- class POMDPPlanners.core.config_types.ExperimentConfig(environment, policies, belief, num_episodes, num_steps)[source]
Bases: object
Complete experiment specification with all required components.
This data class aggregates all the components needed to run a POMDP experiment, including the environment, policies, belief representation, and execution parameters.
- Parameters:
- environment
Configured environment instance
- policies
List of policy instances to evaluate
- belief
Initial belief representation
- num_episodes
Number of episodes to run per policy
- num_steps
Maximum number of steps per episode
- environment: Environment
- class POMDPPlanners.core.config_types.PolicyConfig(class_name, params)[source]
Bases: object
Configuration specification for POMDP policies.
This data class standardizes how policy configurations are specified, enabling dynamic creation of policy instances from configuration files.
- class_name
Name of the policy class to instantiate
- params
Dictionary of parameters to pass to the policy constructor
Example
Creating policy configurations:
>>> # POMCP policy configuration
>>> pomcp_config = PolicyConfig(
...     class_name="POMCP",
...     params={
...         "num_simulations": 1000,
...         "exploration_constant": 1.0,
...         "max_depth": 10
...     }
... )
POMDPPlanners.core.cost module
Module for cost and reward calculation utilities.
This module provides utility functions for calculating expected costs and rewards from belief states, particularly for weighted particle beliefs.
- Functions:
  - belief_expectation_cost: Calculate expected cost from weighted particle belief
  - belief_expectation_reward: Calculate expected reward from weighted particle belief
- POMDPPlanners.core.cost.belief_expectation_cost(belief, action, env)[source]
Calculate expected cost for an action given a belief.
This function computes the expected immediate cost (negative reward) by taking the weighted average over all particles in the belief state.
- Parameters:
belief (Belief) – Belief representing state uncertainty
action (Any) – Action to evaluate
env (Environment) – Environment providing cost function
- Return type:
- Returns:
Expected immediate cost
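The weighted average described above can be sketched standalone, without the library's Belief and Environment types. The function and variable names below are illustrative; the real implementation operates on belief objects and calls the environment's cost function.

```python
import numpy as np

def expectation_cost_sketch(particles, weights, cost_fn):
    """Weighted average of per-particle immediate costs.

    particles: one state per entry
    weights: normalized particle weights (sum to 1)
    cost_fn: maps a state to its immediate cost; stands in for the
             environment's cost function for a fixed action
    """
    costs = np.array([cost_fn(p) for p in particles])
    return float(np.dot(weights, costs))

# Toy check: cost is the state's distance from the origin.
particles = np.array([0.0, 2.0])
weights = np.array([0.5, 0.5])
expected = expectation_cost_sketch(particles, weights, abs)  # 0.5*0 + 0.5*2 = 1.0
```

belief_expectation_reward is the same computation with the sign flipped, since cost is defined as negative reward.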
- POMDPPlanners.core.cost.belief_expectation_cost_belief_information_gain(belief, action, next_belief, env, entropy_weight=0.0, lower_clip=-inf, upper_clip=inf)[source]
- POMDPPlanners.core.cost.belief_expectation_cost_entropy_penalty(belief, action, env, entropy_weight=0.0, lower_clip=-inf, upper_clip=inf)[source]
- POMDPPlanners.core.cost.belief_expectation_cost_gaussian_belief(belief, action, env, n_samples=100)[source]
- Return type:
- Parameters:
belief (GaussianBelief)
action (Any)
env (Environment)
n_samples (int)
- POMDPPlanners.core.cost.belief_expectation_cost_gaussian_mixture_belief(belief, action, env, n_samples=100)[source]
- Return type:
- Parameters:
belief (GaussianMixtureBelief)
action (Any)
env (Environment)
n_samples (int)
- POMDPPlanners.core.cost.belief_expectation_cost_particle_belief(belief, action, env)[source]
Calculate expected cost for an action given a weighted particle belief.
This function computes the expected immediate cost (negative reward) by taking the weighted average over all particles in the belief state.
- Parameters:
belief (WeightedParticleBelief) – Weighted particle belief representing state uncertainty
action (Any) – Action to evaluate
env (Environment) – Environment providing reward function
- Return type:
- Returns:
Expected immediate cost (negative of expected reward)
- POMDPPlanners.core.cost.belief_expectation_reward(belief, action, env)[source]
Calculate expected reward for an action given a belief.
This function computes the expected immediate reward by taking the weighted average over all particles in the belief state.
- Parameters:
belief (Belief) – Belief representing state uncertainty
action (Any) – Action to evaluate
env (Environment) – Environment providing reward function
- Return type:
- Returns:
Expected immediate reward
- POMDPPlanners.core.cost.belief_expectation_reward_particle_belief(belief, action, env)[source]
Calculate expected reward for an action given a weighted particle belief.
This function computes the expected immediate reward by taking the weighted average over all particles in the belief state.
- Parameters:
belief (WeightedParticleBelief) – Weighted particle belief representing state uncertainty
action (Any) – Action to evaluate
env (Environment) – Environment providing reward function
- Return type:
- Returns:
Expected immediate reward
- POMDPPlanners.core.cost.particle_belief_entropy(belief)[source]
- Return type:
- Parameters:
belief (WeightedParticleBelief | WeightedParticleBeliefStateUpdate)
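The docstring for particle_belief_entropy is not shown, but WeightedParticleBelief exposes log_weights, normalized_weights, and an eps attribute, which suggests the usual Shannon entropy of the normalized weights. A standalone sketch under that assumption (the exact convention in the library may differ):

```python
import numpy as np

def particle_entropy_sketch(log_weights, eps=1e-12):
    """Shannon entropy of the normalized particle weights.

    Subtracting the max log-weight before exponentiating avoids overflow;
    eps guards the log against zero weights.
    """
    w = np.exp(log_weights - np.max(log_weights))
    w = w / w.sum()
    return float(-np.sum(w * np.log(w + eps)))

# Uniform weights over 4 particles -> entropy log(4) ~= 1.386.
h = particle_entropy_sketch(np.zeros(4))
```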
- POMDPPlanners.core.cost.particle_belief_expectation_cost_entropy_penalty(belief, action, env, entropy_weight=0.0, lower_clip=-inf, upper_clip=inf)[source]
- Return type:
- Parameters:
belief (WeightedParticleBelief)
action (Any)
env (Environment)
entropy_weight (float)
lower_clip (float)
upper_clip (float)
- POMDPPlanners.core.cost.particle_belief_expectation_cost_information_gain(belief, action, next_belief, env, entropy_weight=0.0, lower_clip=-inf, upper_clip=inf)[source]
- Return type:
- Parameters:
belief (WeightedParticleBelief)
action (Any)
next_belief (WeightedParticleBelief)
env (Environment)
entropy_weight (float)
lower_clip (float)
upper_clip (float)
POMDPPlanners.core.distributions module
Module for probability distribution implementations.
This module provides abstract and concrete implementations of probability distributions used throughout the POMDP planning framework.
- Classes:
  - Distribution: Abstract base class for all probability distributions
  - DiscreteDistribution: Implementation for discrete probability distributions
  - Numpy2DDistribution: Specialized distribution for 2D numpy array values
- class POMDPPlanners.core.distributions.DiscreteDistribution(values, probs)[source]
Bases: Distribution
Implementation of discrete probability distributions.
This class represents a discrete probability distribution over a finite set of values, with associated probabilities that sum to 1.
- values
List of possible values in the distribution support
- probs
Numpy array of probabilities corresponding to each value
Example
>>> import numpy as np
>>> # Create a distribution over actions
>>> actions = ["up", "down", "left", "right"]
>>> probs = np.array([0.4, 0.3, 0.2, 0.1])
>>> dist = DiscreteDistribution(actions, probs)
>>> # Sample actions
>>> samples = dist.sample(5)
>>> len(samples) == 5
True
>>> all(sample in actions for sample in samples)
True
>>> # Get probability of specific action
>>> prob_up = dist.probability(["up"])[0]
>>> bool(prob_up == 0.4)
True
- probability(values)[source]
Calculate probabilities for given values.
- Parameters:
values (List[Any]) – List of values to calculate probabilities for
- Return type:
- Returns:
Numpy array of probabilities corresponding to input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample values from the distribution.
- Parameters:
n_samples (int) – Number of samples to return. Defaults to 1.
- Return type:
- Returns:
List of n_samples independent samples from the distribution
Note
Subclasses must implement this method according to their specific distribution type and parameters.
- class POMDPPlanners.core.distributions.Distribution[source]
Bases: ABC
Abstract base class for probability distributions.
This class defines the interface that all probability distributions must implement, providing methods for sampling and probability calculation.
Note
This is an abstract base class and cannot be instantiated directly. Subclasses must implement the sample() method.
- probability(values)[source]
Calculate probabilities for given values.
- Parameters:
values (List[Any]) – List of values to calculate probabilities for
- Return type:
- Returns:
Numpy array of probabilities corresponding to input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- abstractmethod sample(n_samples=1)[source]
Sample values from the distribution.
- Parameters:
n_samples (int) – Number of samples to return. Defaults to 1.
- Return type:
- Returns:
List of n_samples independent samples from the distribution
Note
Subclasses must implement this method according to their specific distribution type and parameters.
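To make the interface concrete, here is a minimal self-contained subclass sketch. It mirrors the Distribution contract described above (abstract sample(), optional probability()) rather than importing the library; the class names are illustrative.

```python
import numpy as np
from abc import ABC, abstractmethod
from typing import Any, List

class DistributionSketch(ABC):
    """Standalone stand-in for the Distribution interface described above."""

    @abstractmethod
    def sample(self, n_samples: int = 1) -> List[Any]:
        """Return n_samples independent draws."""

    def probability(self, values: List[Any]) -> np.ndarray:
        # Optional, as in the docs: subclasses override only if needed.
        raise NotImplementedError

class CoinFlip(DistributionSketch):
    """Concrete subclass: a fair coin over {'heads', 'tails'}."""

    def sample(self, n_samples: int = 1) -> List[Any]:
        rng = np.random.default_rng(0)  # seeded here for reproducibility
        return list(rng.choice(["heads", "tails"], size=n_samples))

    def probability(self, values: List[Any]) -> np.ndarray:
        return np.full(len(values), 0.5)

flips = CoinFlip().sample(3)
```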
- class POMDPPlanners.core.distributions.Numpy2DDistribution(values, probs)[source]
Bases: Distribution
- probability(values)[source]
Calculate probabilities for given values.
- Parameters:
values (List[Any]) – List of values to calculate probabilities for
- Return type:
- Returns:
Numpy array of probabilities corresponding to input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- sample(n_samples=1)[source]
Sample values from the distribution.
- Parameters:
n_samples (int) – Number of samples to return. Defaults to 1.
- Return type:
- Returns:
List of n_samples independent samples from the distribution
Note
Subclasses must implement this method according to their specific distribution type and parameters.
POMDPPlanners.core.environment module
Module for POMDP environment abstractions.
This module provides the foundational classes and interfaces for defining POMDP environments, including abstract base classes for state transitions, observation models, and reward functions.
- Classes:
  - Environment: Abstract base class for POMDP environments
  - DiscreteActionsEnvironment: Specialized for discrete action spaces
  - ObservationModel: Abstract observation model interface
  - StateTransitionModel: Abstract state transition interface
  - EnvironmentGenerator: Factory pattern for environment creation
  - SpaceType: Enumeration for action/observation space types
  - SpaceInfo: Data class containing space type information
- class POMDPPlanners.core.environment.DiscreteActionsEnvironment(discount_factor, name, space_info, reward_range=None, output_dir=None, debug=False, use_queue_logger=False)[source]
Bases: Environment
Abstract base class for POMDP environments with discrete action spaces.
This class extends the base Environment class with additional functionality specific to environments that have finite, enumerable action sets.
Note
This is an abstract base class and cannot be instantiated directly. Subclasses must implement all abstract methods from Environment plus the get_actions() method.
- Parameters:
- abstractmethod get_actions()[source]
Get all possible actions in the discrete action space.
Note
Subclasses must implement this method to enumerate all possible actions. This is used by planning algorithms that need to iterate over actions.
- abstractmethod initial_observation_dist()[source]
Get the initial observation distribution.
- Return type:
- Returns:
Distribution over initial observations
Note
Subclasses must implement this method to define initial observations.
- abstractmethod initial_state_dist()[source]
Get the initial state distribution.
- Return type:
- Returns:
Distribution over initial states
Note
Subclasses must implement this method to define the starting distribution.
- abstractmethod is_equal_observation(observation1, observation2)[source]
Check if two observations are equal.
- Parameters:
- Return type:
- Returns:
True if observations are considered equal, False otherwise
Note
Subclasses must implement this method to define observation equality. This is particularly important for discrete observation spaces.
- abstractmethod is_terminal(state)[source]
Check if a state is terminal.
- Parameters:
state (Any) – State to check for terminal condition
- Return type:
- Returns:
True if the state is terminal, False otherwise
Note
Subclasses must implement this method to define terminal conditions.
- abstractmethod observation_model(next_state, action)[source]
Get the observation model for a given next state and action.
- Parameters:
- Return type:
- Returns:
Observation model that can sample observations
Note
Subclasses must implement this method to define observation generation.
- abstractmethod reward(state, action)[source]
Calculate the immediate reward for a state-action pair.
- Parameters:
- Return type:
- Returns:
Immediate reward value
Note
Subclasses must implement this method to define reward structure.
- class POMDPPlanners.core.environment.Environment(discount_factor, name, space_info, reward_range=None, output_dir=None, debug=False, use_queue_logger=False)[source]
Bases: ABC
Abstract base class for POMDP environments.
This is the core abstract class that all POMDP environments must inherit from. It defines the essential interface for POMDP environments including state transitions, observations, rewards, and terminal conditions.
Note
This is an abstract base class and cannot be instantiated directly. Subclasses must implement all abstract methods.
- Parameters:
- discount_factor
Discount factor for future rewards
- name
Environment identifier string
- space_info
Information about action and observation space types
- reward_range
Optional tuple containing (min_reward, max_reward)
- output_dir
Optional directory for logging output
- debug
Flag to enable debug logging
- cache_visualization(history, cache_path)[source]
Cache visualization data for an episode history.
This method can be overridden by subclasses to provide environment-specific visualization caching capabilities.
- compute_metrics(histories)[source]
Compute environment-specific metrics from episode histories.
This method can be overridden by subclasses to provide custom metric calculations beyond standard return and episode length.
- property config_id: str
Generate a deterministic identifier based on environment configuration.
Note
Uses custom serialization logic (not centralized serialize_value) to ensure:
- Deterministic dict key ordering for consistent hashing
- Compact format without __type__ markers
- Recursive handling of nested objects
Changing this serialization format would invalidate all cached results.
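The deterministic-ordering requirement in the note can be sketched in isolation: serialize with sorted keys, then hash the canonical string. This is an illustration of the property, not the library's actual format (which the note warns must not change).

```python
import hashlib
import json

def config_id_sketch(params):
    """Deterministic ID from a (possibly nested) params dict.

    sort_keys makes dict insertion order irrelevant, so equal configs
    always hash to the same identifier.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Key order does not change the identifier.
a = config_id_sketch({"discount_factor": 0.95, "name": "tiger"})
b = config_id_sketch({"name": "tiger", "discount_factor": 0.95})
```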
- classmethod from_dict(data)[source]
Reconstruct environment from dictionary.
Dynamically imports the environment class and instantiates it with the saved parameters.
- Parameters:
data (Dict[str, Any]) – Dictionary containing environment serialization data with keys: class, module, params, config_id
- Return type:
- Returns:
Reconstructed environment instance
- Raises:
ImportError – If environment class cannot be imported
ValueError – If required data fields are missing
TypeError – If parameters are invalid for environment constructor
Example
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> env = TigerPOMDP(discount_factor=0.95)
>>> env_dict = env.to_dict()
>>> reconstructed_env = Environment.from_dict(env_dict)
>>> reconstructed_env.discount_factor
0.95
- get_metric_names()[source]
Get names of environment-specific metrics.
This method returns the names of custom metrics that this environment computes in the compute_metrics() method. It enables users to discover what metrics are available for hyperparameter optimization.
- Return type:
- Returns:
List of metric names that this environment produces. Default implementation returns empty list for environments without custom metrics.
Note
Subclasses that override compute_metrics() should also override this method to return the names of metrics they produce. Use an Enum to ensure consistency between the names returned here and the names used in compute_metrics().
- abstractmethod initial_observation_dist()[source]
Get the initial observation distribution.
- Return type:
- Returns:
Distribution over initial observations
Note
Subclasses must implement this method to define initial observations.
- abstractmethod initial_state_dist()[source]
Get the initial state distribution.
- Return type:
- Returns:
Distribution over initial states
Note
Subclasses must implement this method to define the starting distribution.
- abstractmethod is_equal_observation(observation1, observation2)[source]
Check if two observations are equal.
- Parameters:
- Return type:
- Returns:
True if observations are considered equal, False otherwise
Note
Subclasses must implement this method to define observation equality. This is particularly important for discrete observation spaces.
- abstractmethod is_terminal(state)[source]
Check if a state is terminal.
- Parameters:
state (Any) – State to check for terminal condition
- Return type:
- Returns:
True if the state is terminal, False otherwise
Note
Subclasses must implement this method to define terminal conditions.
- property logger: Logger
Get logger instance for this environment.
The logger is implemented as a property to maintain pickle compatibility, as logger objects cannot be pickled directly.
- Returns:
Configured logger instance with hierarchical naming
- abstractmethod observation_model(next_state, action)[source]
Get the observation model for a given next state and action.
- Parameters:
- Return type:
- Returns:
Observation model that can sample observations
Note
Subclasses must implement this method to define observation generation.
- abstractmethod reward(state, action)[source]
Calculate the immediate reward for a state-action pair.
- Parameters:
- Return type:
- Returns:
Immediate reward value
Note
Subclasses must implement this method to define reward structure.
- reward_batch(states, action)[source]
Calculate rewards for a batch of states given a single action.
Provides a loop-based default that subclasses can override with vectorized numpy implementations for better performance.
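The loop-default-versus-vectorized-override pattern can be sketched standalone. The reward function used here (negative distance to the action) is a made-up example, not part of the library.

```python
import numpy as np

def reward_batch_loop(states, action, reward_fn):
    """Loop-based default: one reward call per state, as the base class does."""
    return np.array([reward_fn(s, action) for s in states])

def reward_batch_vectorized(states, action):
    """Hypothetical vectorized override for a reward of the form -|s - a|."""
    return -np.abs(states - action)

states = np.array([0.0, 1.0, 3.0])
loop = reward_batch_loop(states, 1.0, lambda s, a: -abs(s - a))
vec = reward_batch_vectorized(states, 1.0)  # same values, no Python loop
```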
- sample_next_step(state, action)[source]
Sample a complete state transition step.
This convenience method combines state transition, observation generation, and reward calculation in a single operation.
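The three pieces this convenience method combines can be sketched with plain callables standing in for state_transition_model, observation_model, and reward; the toy random-walk dynamics and all names below are illustrative assumptions.

```python
import numpy as np

def sample_next_step_sketch(state, action, transition_fn, observation_fn, reward_fn, rng):
    """One full step: sample next state, then an observation of it, then reward."""
    next_state = transition_fn(state, action, rng)
    observation = observation_fn(next_state, action, rng)
    reward = reward_fn(state, action)
    return next_state, observation, reward

rng = np.random.default_rng(0)
# Toy 1-D random walk with noisy observations and an action-cost reward.
next_state, obs, r = sample_next_step_sketch(
    state=0.0,
    action=1.0,
    transition_fn=lambda s, a, rng: s + a + rng.normal(scale=0.1),
    observation_fn=lambda s, a, rng: s + rng.normal(scale=0.05),
    reward_fn=lambda s, a: -abs(a),
    rng=rng,
)
```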
- abstractmethod state_transition_model(state, action)[source]
Get the state transition model for a given state-action pair.
- Parameters:
- Return type:
- Returns:
State transition model that can sample next states
Note
Subclasses must implement this method to define state dynamics.
- to_dict()[source]
Serialize environment to dictionary format.
Extracts environment class information and constructor parameters to enable JSON serialization and reconstruction.
- Returns:
  Dictionary with keys:
  - class: Full class path (module.ClassName)
  - module: Module name
  - params: Constructor parameters
  - config_id: Deterministic configuration identifier
- Return type:
Example
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> env = TigerPOMDP(discount_factor=0.95)
>>> env_dict = env.to_dict()
>>> 'class' in env_dict and 'params' in env_dict
True
Note
Uses centralized serialization system with registered SpaceInfo handler.
- class POMDPPlanners.core.environment.EnvironmentGenerator(name)[source]
Bases: ABC
Abstract base class for environment generators.
This class implements the factory pattern for creating environment instances. It’s useful for generating environments with randomized parameters or for creating multiple environment variants.
Note
This is an abstract base class and cannot be instantiated directly. Subclasses must implement the generate_environment() method.
- Parameters:
name (str)
- name
Identifier for the generator
- abstractmethod generate_environment()[source]
Generate a new environment instance.
- Return type:
- Returns:
Newly created environment instance
Note
Subclasses must implement this method to define environment creation logic. This may involve randomization, parameter sampling, or deterministic generation.
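A minimal standalone sketch of the factory pattern described here, with parameter randomization as the generation logic. The class names are illustrative, and the generator returns a plain dict where a real subclass would return an Environment instance.

```python
import numpy as np
from abc import ABC, abstractmethod

class GeneratorSketch(ABC):
    """Standalone stand-in for the EnvironmentGenerator factory interface."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def generate_environment(self):
        """Produce a new environment instance (or a config for one)."""

class RandomDiscountGenerator(GeneratorSketch):
    """Generates environment variants with a randomly sampled discount factor."""

    def __init__(self, name: str, seed: int = 0):
        super().__init__(name)
        self._rng = np.random.default_rng(seed)

    def generate_environment(self):
        # Each call yields a fresh variant; a real generator would
        # construct and return an Environment here.
        return {"name": self.name, "discount_factor": float(self._rng.uniform(0.9, 0.99))}

gen = RandomDiscountGenerator("tiger-variants")
env_cfg = gen.generate_environment()
```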
- class POMDPPlanners.core.environment.ObservationModel(next_state, action)[source]
Bases: Distribution, ABC
Abstract base class for POMDP observation models.
This class defines the interface for observation models that generate observations given a next state and action. Inherits from Distribution to provide sampling and probability calculation capabilities.
Note
This is an abstract base class and cannot be instantiated directly. Subclasses must implement the sample() method.
- next_state
The state after taking an action
- action
The action that was taken
- probability(values)[source]
Calculate observation probabilities for given values.
- Parameters:
values (List[Any]) – List of observation values to calculate probabilities for
- Return type:
- Returns:
Array of probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- abstractmethod sample(n_samples=1)[source]
Sample observations from the observation model.
- Parameters:
n_samples (int) – Number of observation samples to generate. Defaults to 1.
- Return type:
- Returns:
List of sampled observations of length n_samples.
Note
Subclasses must implement this method according to their specific observation generation logic.
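A hedged sketch of what a concrete observation model might look like. NoisyPositionObservation is hypothetical (not part of the library) and omits the Distribution base class so it runs standalone, but it implements the documented sample()/probability() contract: sample() returns a list of length n_samples, and probability() returns an array of densities:

```python
import numpy as np

# Hypothetical model: observes next_state corrupted by Gaussian noise.
# The real base class is POMDPPlanners.core.environment.ObservationModel.
class NoisyPositionObservation:
    def __init__(self, next_state, action, sigma=0.1, rng=None):
        self.next_state = next_state  # state after taking the action
        self.action = action          # the action that was taken
        self.sigma = sigma            # observation noise std. deviation
        self.rng = rng or np.random.default_rng(0)

    def sample(self, n_samples=1):
        # List of sampled observations of length n_samples, as documented.
        noise = self.rng.normal(0.0, self.sigma, size=n_samples)
        return list(self.next_state + noise)

    def probability(self, values):
        # Gaussian density of each observation given next_state.
        z = (np.asarray(values) - self.next_state) / self.sigma
        return np.exp(-0.5 * z**2) / (self.sigma * np.sqrt(2 * np.pi))

model = NoisyPositionObservation(next_state=1.0, action=0)
obs = model.sample(n_samples=3)
```

Overriding probability() is optional per the docs; models used only for forward simulation can leave the default NotImplementedError in place.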
- class POMDPPlanners.core.environment.SpaceInfo(action_space, observation_space)[source]
Bases: object
Data class containing space type information for an environment.
This class encapsulates the space type classifications for both actions and observations in a POMDP environment.
- action_space
The type of action space (discrete, continuous, or mixed)
- observation_space
The type of observation space (discrete, continuous, or mixed)
Example
Creating space info for different environment types:
>>> # Discrete actions, continuous observations
>>> space_info = SpaceInfo(
...     action_space=SpaceType.DISCRETE,
...     observation_space=SpaceType.CONTINUOUS
... )
- class POMDPPlanners.core.environment.SpaceType(*values)[source]
Bases: Enum
Enumeration for categorizing action and observation spaces.
This enum is used to classify the mathematical structure of action and observation spaces in POMDP environments.
- DISCRETE
Finite, countable spaces (e.g., {0, 1, 2, …})
- CONTINUOUS
Real-valued continuous spaces (e.g., R^n)
- MIXED
Combination of discrete and continuous elements
- CONTINUOUS = 'continuous'
- DISCRETE = 'discrete'
- MIXED = 'mixed'
- class POMDPPlanners.core.environment.StateTransitionModel(state, action)[source]
Bases: Distribution, ABC
Abstract base class for POMDP state transition models.
This class defines the interface for state transition models that generate next states given a current state and action. Inherits from Distribution to provide sampling and probability calculation capabilities.
Note
This is an abstract base class and cannot be instantiated directly. Subclasses must implement the sample() method.
- state
The current state
- action
The action to be taken
- probability(values)[source]
Calculate transition probabilities for given next states.
- Parameters:
values (List[Any]) – List of next state values to calculate probabilities for
- Return type:
- Returns:
Array of transition probabilities corresponding to the input values
- Raises:
NotImplementedError – This method is not implemented by default. Subclasses should override if probability calculation is needed.
- abstractmethod sample(n_samples=1)[source]
Sample next states from the transition model.
- Parameters:
n_samples (int) – Number of next state samples to generate. Defaults to 1.
- Return type:
- Returns:
List of sampled next states of length n_samples.
Note
Subclasses must implement this method according to their specific state transition dynamics.
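As a sketch of the transition-model contract, here is a hypothetical 1-D random-walk model (not from the library, and without the Distribution base so it runs standalone). The agent intends to step by action, but with probability slip_prob it moves the opposite way:

```python
import random

# Hypothetical transition model mirroring the documented interface of
# POMDPPlanners.core.environment.StateTransitionModel.
class RandomWalkTransition:
    def __init__(self, state, action, slip_prob=0.1):
        self.state = state          # current state (an integer position)
        self.action = action        # intended step: +1 or -1
        self.slip_prob = slip_prob  # chance of moving opposite to intent

    def sample(self, n_samples=1):
        # List of sampled next states of length n_samples, as documented.
        return [
            self.state + (self.action
                          if random.random() >= self.slip_prob
                          else -self.action)
            for _ in range(n_samples)
        ]

    def probability(self, values):
        # Transition probability of each candidate next state.
        return [
            (1 - self.slip_prob) if v == self.state + self.action
            else self.slip_prob if v == self.state - self.action
            else 0.0
            for v in values
        ]

model = RandomWalkTransition(state=0, action=+1)
next_states = model.sample(n_samples=5)
```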
POMDPPlanners.core.policy module
Module for POMDP policy abstractions and execution tracking.
This module provides the foundational interface for POMDP policies, including abstract base classes for policy implementations and data structures for tracking policy execution and performance metrics.
- Classes:
Policy: Abstract base class for all POMDP policies
PolicySpaceInfo: Space type information for policy compatibility
PolicyInfoVariable: Named tuple for policy execution metrics
PolicyRunData: Container for policy execution information
- class POMDPPlanners.core.policy.Policy(environment, discount_factor, name, log_path=None, debug=False, use_queue_logger=False)[source]
Bases: ABC
Abstract base class for POMDP policies.
This class defines the interface for POMDP policies that select actions based on belief states. All concrete policy implementations must inherit from this class and implement the action selection and space information methods.
Note
This is an abstract base class and cannot be instantiated directly. Subclasses must implement the action() and get_space_info() methods.
- Parameters:
- environment
The POMDP environment this policy operates in
- discount_factor
Discount factor for future rewards
- name
Unique identifier for the policy
- log_path
Optional directory for logging output
- debug
Flag to enable debug logging
- abstractmethod action(belief)[source]
Select action(s) based on the current belief state.
This is the core method that implements the policy’s decision-making logic. It takes a belief state and returns the selected action(s) along with execution information and performance metrics.
- Parameters:
belief (Belief) – Current belief state representing uncertainty over states
- Returns:
List of selected actions (typically single action, but supports multiple)
PolicyRunData with execution metrics and performance information
- Return type:
Note
Subclasses must implement this method with their specific planning or decision-making algorithm.
- abstractmethod classmethod get_info_variable_names()[source]
Get names of policy info variables that this policy produces.
This class method returns the names of metrics and performance data that the policy tracks during execution via PolicyInfoVariable objects. It enables users to discover what metrics are available for hyperparameter optimization before running simulations.
- Return type:
- Returns:
List of info variable names that this policy produces during action selection
Note
Subclasses must implement this method to declare what metrics they track. Use an Enum to ensure consistency between the names returned here and the names used when creating PolicyInfoVariable objects in the action() method.
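The Enum convention the note recommends can be sketched as follows; MyPlanner and its metric names are illustrative, not part of the library. The single Enum keeps the names returned by get_info_variable_names() and the names attached to metrics in action() from drifting apart:

```python
from enum import Enum

# Hypothetical metric names for an illustrative planner.
class MyPlannerInfo(Enum):
    PLANNING_TIME = "planning_time"
    NODES_EXPANDED = "nodes_expanded"

class MyPlanner:
    @classmethod
    def get_info_variable_names(cls):
        # Declares exactly the metrics that action() produces below.
        return [m.value for m in MyPlannerInfo]

    def action(self, belief):
        # ... run planning here, then report metrics under the same names
        # (in the real API these would be PolicyInfoVariable objects).
        info = [
            (MyPlannerInfo.PLANNING_TIME.value, 0.42),
            (MyPlannerInfo.NODES_EXPANDED.value, 128),
        ]
        return ["open-left"], info
```

Because both methods read from the same Enum, hyperparameter-optimization code can query the metric names before any simulation runs.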
- abstractmethod classmethod get_space_info()[source]
Get space type requirements for this policy class.
This class method specifies what types of action and observation spaces this policy implementation can handle, enabling compatibility checking with environments.
- Return type:
- Returns:
PolicySpaceInfo specifying required action and observation space types
Note
Subclasses must implement this method to declare their space compatibility. This is used for validation when pairing policies with environments.
- classmethod load(filepath)[source]
Load policy configuration from JSON file.
Reconstructs policy instance from saved constructor parameters. Creates both the environment and policy from the saved configuration.
- Parameters:
filepath (
Union[str,Path]) – Path to the saved policy configuration file- Return type:
- Returns:
Reconstructed policy instance
- Raises:
FileNotFoundError – If filepath does not exist
ValueError – If JSON format is invalid or unsupported
ImportError – If policy/environment classes cannot be imported
Example
>>> import tempfile
>>> from pathlib import Path
>>> from POMDPPlanners.planners import POMCP
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> # Create and save a policy
>>> env = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(environment=env, discount_factor=0.95, depth=10,
...                 exploration_constant=1.0, name="test", n_simulations=100)
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     filepath = Path(tmpdir) / "test_policy.json"
...     _ = planner.save(filepath)
...     # Load the policy back
...     loaded_planner = POMCP.load(filepath)
...     print(loaded_planner.depth)
10
- property logger: Logger
Get logger instance for this policy.
The logger is implemented as a property to maintain pickle compatibility, as logger objects cannot be pickled directly.
- Returns:
Configured logger instance with hierarchical naming
- save(filepath=None)[source]
Save policy configuration to JSON file.
Saves only constructor parameters needed to reconstruct the policy, not the full internal state. This enables human-readable policy configurations that can be versioned, inspected, and modified.
- Parameters:
filepath (Union[str, Path, None]) – Path where to save the policy configuration. If None, uses default location: saved_policies/{env_name}/{policy_class}/{policy_name}_{timestamp}.json
- Return type:
- Returns:
Path where policy was saved
- Raises:
ValueError – If policy parameters cannot be serialized
IOError – If file cannot be written
Example
>>> from POMDPPlanners.environments import TigerPOMDP
>>> from POMDPPlanners.planners import POMCP
>>> env = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(environment=env, discount_factor=0.95,
...                 depth=10, exploration_constant=1.0,
...                 name="test", n_simulations=100)
>>> # Save with default path
>>> filepath = planner.save()
>>> # Or save to custom path
>>> filepath = planner.save("my_policy.json")
- class POMDPPlanners.core.policy.PolicyInfoVariable(name, value)[source]
Bases: NamedTuple
Named tuple for storing policy execution metrics.
This structure stores key-value pairs of policy performance metrics that are collected during policy execution.
- name
Descriptive name of the metric (e.g., “nodes_expanded”, “planning_time”)
- value
Numeric value of the metric
- class POMDPPlanners.core.policy.PolicyRunData(info_variables)[source]
Bases: NamedTuple
Container for policy execution information and metrics.
This class aggregates all the information collected during a policy’s action selection process, including performance metrics and execution details.
- Parameters:
info_variables (List[PolicyInfoVariable])
- info_variables
List of policy-specific metrics and performance data
- info_variables: List[PolicyInfoVariable]
Alias for field number 0
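Constructing run data is plain NamedTuple usage; the classes below are stand-ins that mirror the documented fields (the real ones live in POMDPPlanners.core.policy), and the metric names are illustrative:

```python
from typing import List, NamedTuple

# Stand-ins mirroring the documented NamedTuples.
class PolicyInfoVariable(NamedTuple):
    name: str    # descriptive metric name
    value: float # numeric metric value

class PolicyRunData(NamedTuple):
    info_variables: List[PolicyInfoVariable]

run_data = PolicyRunData(info_variables=[
    PolicyInfoVariable(name="planning_time", value=0.42),
    PolicyInfoVariable(name="nodes_expanded", value=128),
])
```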
- class POMDPPlanners.core.policy.PolicySpaceInfo(action_space, observation_space)[source]
Bases: object
Data class containing space type requirements for policy compatibility.
This class specifies the action and observation space types that a policy is designed to work with, enabling compatibility checking with environments.
- action_space
Required action space type (discrete, continuous, or mixed)
- observation_space
Required observation space type (discrete, continuous, or mixed)
- class POMDPPlanners.core.policy.TrainablePolicy(environment, discount_factor, name, log_path=None, debug=False, use_queue_logger=False)[source]
Bases: Policy
Abstract base class for policies that support offline training.
Extends Policy with hooks that separate the model (what to compute) from the trainer (how to run the training loop), following the PyTorch Lightning pattern. Concrete subclasses implement these hooks, and PolicyTrainer orchestrates the loop.
- Parameters:
- abstractmethod begin_collecting()[source]
Signal the start of a data-collection phase.
- Return type:
- abstractmethod buffer_size()[source]
Return the number of examples currently in the replay buffer.
- Return type:
- abstractmethod collect_episodes_batched(initial_belief_fn, n_episodes, episode_length)[source]
Collect training data using fast batched (network-only) rollouts.
- abstractmethod finalize_episode(history)[source]
Process a completed episode into the replay buffer.
- abstractmethod get_metric_keys()[source]
Return the loss-metric key names produced by train_step().
- get_network()[source]
Return the underlying trainable network, or None if not applicable.
Override in concrete policies to enable weight-histogram logging in TensorBoardCallback.
POMDPPlanners.core.serialization module
Centralized serialization utilities for POMDPPlanners.
This module provides reusable serialization functions for converting Python objects to/from JSON-compatible formats. Used by Environment, Policy, Belief, and other components for configuration persistence.
- Key Functions:
serialize_value: Convert Python value to JSON-compatible format
deserialize_value: Convert JSON format back to Python value
extract_constructor_params: Extract constructor parameters from object
reconstruct_from_params: Reconstruct object from class path and parameters
save_to_json: Save data to JSON file with metadata
load_from_json: Load data from JSON file
serialize_stateful_object: Serialize using __getstate__ protocol
deserialize_stateful_object: Deserialize using __setstate__ protocol
Example
>>> from pathlib import Path
>>> import numpy as np
>>>
>>> # Serialize various types
>>> serialize_value(Path("/tmp/test"))
{'__type__': 'Path', 'value': '/tmp/test'}
>>>
>>> serialize_value(np.array([1, 2, 3]))
{'__type__': 'ndarray', 'value': [1, 2, 3], 'dtype': 'int64'}
>>>
>>> # Deserialize back to original types
>>> path_data = {'__type__': 'Path', 'value': '/tmp/test'}
>>> deserialize_value(path_data)
PosixPath('/tmp/test')
- POMDPPlanners.core.serialization.deserialize_stateful_object(data)[source]
Reconstruct object using __setstate__ protocol.
- Parameters:
data (Dict[str, Any]) – Serialized object data with keys: class, module, state
- Return type:
- Returns:
Reconstructed object
- Raises:
ImportError – If module cannot be imported
AttributeError – If class not found in module
AttributeError – If object has no __setstate__ method
- POMDPPlanners.core.serialization.deserialize_value(value, target_type=None)[source]
Deserialize JSON-compatible value to Python type.
- POMDPPlanners.core.serialization.extract_constructor_params(obj, exclude=('self',))[source]
Extract constructor parameters from object instance.
Uses inspect to discover constructor parameters and walks class hierarchy to capture all inherited parameters.
- POMDPPlanners.core.serialization.load_from_json(filepath)[source]
Load data from JSON file.
- Parameters:
- Return type:
- Returns:
Loaded data dictionary
- Raises:
FileNotFoundError – If file doesn’t exist
ValueError – If JSON is invalid
- POMDPPlanners.core.serialization.reconstruct_from_params(class_path, params)[source]
Reconstruct object from class path and parameters.
- Parameters:
- Return type:
- Returns:
Reconstructed object instance
- Raises:
ImportError – If module cannot be imported
AttributeError – If class not found in module
TypeError – If parameters are invalid for constructor
- POMDPPlanners.core.serialization.register_deserializer(type_class, deserializer)[source]
Register custom deserialization handler for a type.
- POMDPPlanners.core.serialization.register_serializer(type_class, serializer)[source]
Register custom serialization handler for a type.
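A sketch of a handler pair following the '__type__' tagging convention shown in the module example. GridSize and both functions are hypothetical; in the package they would be wired in via register_serializer/register_deserializer, but here they are called directly so the sketch runs standalone:

```python
from dataclasses import dataclass

# Hypothetical custom type to be serialized.
@dataclass
class GridSize:
    rows: int
    cols: int

def serialize_grid_size(value):
    # Tagged dict matching the '__type__' convention used by the module.
    return {"__type__": "GridSize", "value": [value.rows, value.cols]}

def deserialize_grid_size(data):
    rows, cols = data["value"]
    return GridSize(rows=rows, cols=cols)

# Round trip: serialize, then reconstruct an equal object.
round_tripped = deserialize_grid_size(serialize_grid_size(GridSize(3, 4)))
```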
- POMDPPlanners.core.serialization.save_to_json(filepath, data, metadata=None)[source]
Save data to JSON file with optional metadata.
- POMDPPlanners.core.serialization.serialize_stateful_object(obj)[source]
Serialize object using __getstate__ protocol.
- Parameters:
obj (Any) – Object with __getstate__ method
- Return type:
- Returns:
Dictionary with class info and state
- Raises:
ValueError – If object has no custom __getstate__ method
- POMDPPlanners.core.serialization.serialize_value(value)[source]
Serialize Python value to JSON-compatible format.
Handles common types used in POMDPPlanners:
- Primitives: str, int, float, bool, None
- Collections: list, tuple, set, dict
- NumPy: ndarray, integer, floating
- Path objects
- Enums
- Loggers (skipped)
- Custom types with registered handlers
POMDPPlanners.core.tree module
- class POMDPPlanners.core.tree.ActionNode(action, parent=None, children=(), data=None)[source]
Bases: BaseNode
- Parameters:
data (Any)
- get_belief_node_child(observation, environment)[source]
- Return type:
- Parameters:
observation (Any)
environment (Environment)
- property name
- property spec
- class POMDPPlanners.core.tree.BaseNode(parent=None, children=(), data=None)[source]
Bases: NodeMixin
- Parameters:
data (Any)
- class POMDPPlanners.core.tree.BeliefNode(belief, observation=None, weight=1.0, parent=None, children=(), data=None)[source]
Bases: BaseNode
- property name
- property spec
- update_belief(action, observation, pomdp, **kwargs)[source]
- Parameters:
action (Any)
observation (Any)
pomdp (Environment)
- POMDPPlanners.core.tree.get_optimal_action_cost_setting(belief_node)[source]
- Return type:
- Parameters:
belief_node (BeliefNode)
- POMDPPlanners.core.tree.get_optimal_action_reward_setting(belief_node)[source]
- Return type:
- Parameters:
belief_node (BeliefNode)
- POMDPPlanners.core.tree.print_tree(tree)[source]
- Parameters:
tree (BeliefNode | ActionNode)