POMDPPlanners.utils package
Subpackages
Submodules
POMDPPlanners.utils.action_samplers module
- class POMDPPlanners.utils.action_samplers.DiscreteActionSampler(actions=None)[source]
Bases: ActionSampler
Simple action sampler for discrete action spaces.
This class is designed to be fully serializable for use in parallel processing environments like joblib.
- sample(belief_node=None)[source]
Sample a random action from the discrete action space.
- Parameters:
belief_node (Optional[BeliefNode]) – Optional belief node (unused in this implementation)
- Return type:
- Returns:
Randomly sampled action from the action space
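Since the sampler is stateless apart from its action list, its behavior can be sketched with the standard library alone. The class below is a hypothetical stand-in, not the library's implementation, illustrating uniform sampling over a discrete action set while keeping the object trivially picklable:

```python
import random

# Hypothetical minimal stand-in for DiscreteActionSampler: holds a list of
# discrete actions and returns one uniformly at random, ignoring the belief node.
class SimpleDiscreteSampler:
    def __init__(self, actions):
        self.actions = list(actions)  # a plain list keeps the object picklable for joblib

    def sample(self, belief_node=None):
        return random.choice(self.actions)

sampler = SimpleDiscreteSampler(["listen", "open_left", "open_right"])
action = sampler.sample()
print(action in sampler.actions)  # → True
```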
- class POMDPPlanners.utils.action_samplers.UnitCircleActionSampler(max_action_magnitude=1.0)[source]
Bases: ActionSampler
Action sampler for 2D continuous action spaces within a unit circle.
This sampler generates 2D action vectors uniformly distributed within a circle of specified maximum magnitude. It’s particularly useful for navigation and continuous control problems where actions represent velocities or forces constrained to a circular region.
The sampler uses polar coordinate generation to ensure uniform distribution within the circle, avoiding the clustering near the center that would occur with naive rectangular sampling.
- Mathematical Foundation:
Angle θ ~ Uniform(0, 2π)
Radius r ~ √Uniform(0, 1) × max_magnitude
Action = [r·cos(θ), r·sin(θ)]
The square root transformation for radius ensures uniform area distribution within the circle rather than biasing toward the center.
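The polar-coordinate scheme above can be sketched with the standard library; this is an illustration of the stated math, not the class's actual code:

```python
import math
import random

def sample_in_circle(max_magnitude=1.0):
    """Sample a 2D point uniformly from a disc of radius max_magnitude.

    theta ~ Uniform(0, 2*pi); r = sqrt(Uniform(0, 1)) * max_magnitude.
    The square root on the radius compensates for the area of a thin ring
    growing linearly with r, giving uniform density over the disc.
    """
    theta = random.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(random.random()) * max_magnitude
    return (r * math.cos(theta), r * math.sin(theta))

x, y = sample_in_circle(max_magnitude=2.0)
print(math.hypot(x, y) <= 2.0)  # → True
```

Dropping the square root would concentrate samples near the center, which is exactly the clustering the docstring warns about.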
- Parameters:
max_action_magnitude (float) – Maximum magnitude of action vectors (circle radius)
- sample(belief_node=None)[source]
Sample an action from a unit circle.
- Parameters:
belief_node (Optional[BeliefNode]) – The current belief node (not used in this implementation)
- Returns:
A 2D action vector within the unit circle
- Return type:
POMDPPlanners.utils.belief_factory module
Top-level factory for creating environment-specific belief states.
This module provides a unified entry-point for constructing ready-to-use
Belief objects for any POMDP
environment in the library. It dispatches to per-environment factories when
a custom belief implementation exists (e.g., vectorized particle filters or
Gaussian beliefs), and falls back to a generic
WeightedParticleBelief
otherwise.
- Classes:
BeliefType: Enum of supported belief representations.
- Functions:
create_environment_belief: Top-level factory returning a configured Belief.
- class POMDPPlanners.utils.belief_factory.BeliefType(*values)[source]
Bases: Enum
Supported belief representations.
- PARTICLE
Standard weighted particle belief.
- VECTORIZED_PARTICLE
Vectorized weighted particle belief with batched NumPy updates.
- GAUSSIAN
Single Gaussian (mean + covariance) belief.
- GAUSSIAN_MIXTURE
Gaussian mixture model belief.
- GAUSSIAN = 'gaussian'
- GAUSSIAN_MIXTURE = 'gaussian_mixture'
- PARTICLE = 'particle'
- VECTORIZED_PARTICLE = 'vectorized_particle'
- POMDPPlanners.utils.belief_factory.create_environment_belief(env, belief_type=None, n_particles=200, **kwargs)[source]
Create a ready-to-use belief for the given environment.
When belief_type is None, the environment's default belief type is used (typically VECTORIZED_PARTICLE for environments that have a custom updater, or PARTICLE otherwise). For environments without a registered per-environment factory, the function falls back to a generic get_initial_belief() producing a WeightedParticleBelief.
- Parameters:
env (Environment) – POMDP environment instance.
belief_type (BeliefType | None) – Desired belief representation. None selects the environment default.
n_particles (int) – Number of particles (used by PARTICLE and VECTORIZED_PARTICLE types). Defaults to 200.
**kwargs (Any) – Forwarded to per-environment factories (e.g. updater_type for Gaussian light-dark beliefs).
- Return type:
- Returns:
A configured Belief object.
- Raises:
ValueError – If belief_type is not supported by the environment.
Example
>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> env = TigerPOMDP(discount_factor=0.95)
>>> belief = create_environment_belief(env, n_particles=100)
>>> belief.sample() in ["tiger_left", "tiger_right"]
True
POMDPPlanners.utils.config_loader module
- POMDPPlanners.utils.config_loader.load_config(config_path)[source]
Load and parse YAML configuration files for POMDP experiments.
This utility function provides a standardized way to load experimental configurations from YAML files. It’s commonly used to configure environments, planners, and simulation parameters for reproducible experiments.
The function uses PyYAML’s safe_load to prevent execution of arbitrary Python code, making it secure for loading untrusted configuration files.
- Parameters:
config_path (str) – Path to the YAML configuration file (absolute or relative)
- Return type:
- Returns:
Dictionary containing the parsed configuration parameters
- Raises:
FileNotFoundError – If the configuration file doesn’t exist
yaml.YAMLError – If the file contains invalid YAML syntax
Example
Loading experiment configuration:
>>> from POMDPPlanners.utils.config_loader import load_config
>>> import tempfile
>>> import os
>>> # Create a temporary config file for testing
>>> config_content = '''
... environment:
...   name: "TigerPOMDP"
...   discount_factor: 0.95
... planners:
...   - name: "POMCP"
...     n_simulations: 1000
... simulation:
...   episodes_per_run: 100
... '''
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
...     _ = f.write(config_content)
...     temp_config_path = f.name
>>> # Load experiment configuration
>>> config = load_config(temp_config_path)
>>> config['environment']['name']
'TigerPOMDP'
>>> config['environment']['discount_factor']
0.95
>>> config['planners'][0]['name']
'POMCP'
>>> config['simulation']['episodes_per_run']
100
>>> # Clean up
>>> os.unlink(temp_config_path)
Example
Using with environment configuration:
>>> # Create a temporary config file with environment and planners
>>> config_content = '''
... environment:
...   name: "TigerPOMDP"
...   discount_factor: 0.95
...   observation_accuracy: 0.85
... planners:
...   - name: "POMCP"
...     n_simulations: 100
...     depth: 5
...     exploration_constant: 1.0
... simulation:
...   episodes_per_run: 10
...   num_runs: 2
... '''
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
...     _ = f.write(config_content)
...     temp_config_path = f.name
>>> config = load_config(temp_config_path)
>>> # Create environment from config
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> env = TigerPOMDP(discount_factor=config['environment']['discount_factor'])
>>> env.discount_factor
0.95
>>> # Access planner configuration
>>> planner_config = config['planners'][0]
>>> planner_config['name']
'POMCP'
>>> planner_config['n_simulations']
100
>>> # Clean up
>>> os.unlink(temp_config_path)
Example
Handling configuration hierarchies and defaults:
>>> # Create a nested configuration file
>>> nested_config_content = '''
... defaults:
...   simulation:
...     episodes: 50
...     particles: 100
... experiments:
...   quick_test:
...     environment: "SanityPOMDP"
...     planners: ["POMCP"]
...   full_study:
...     environment: "TigerPOMDP"
...     planners: ["POMCP", "PFT_DPW"]
...     simulation:
...       episodes: 200
...       particles: 500
... '''
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
...     _ = f.write(nested_config_content)
...     temp_config_path = f.name
>>> config = load_config(temp_config_path)
>>> # Access different experiment configurations
>>> defaults = config['defaults']
>>> quick_config = config['experiments']['quick_test']
>>> full_config = config['experiments']['full_study']
>>> # Verify configuration structure
>>> defaults['simulation']['episodes']
50
>>> quick_config['environment']
'SanityPOMDP'
>>> full_config['simulation']['episodes']
200
>>> # Simple config merging test
>>> merged = {**defaults['simulation'], **full_config.get('simulation', {})}
>>> merged['episodes']
200
>>> merged['particles']
500
>>> # Clean up
>>> os.unlink(temp_config_path)
- Configuration Best Practices:
File Organization:
- Use descriptive filenames (e.g., pomcp_tiger_baseline.yaml)
- Organize configs by environment or study type
- Include version information in complex configurations
Parameter Naming:
- Use consistent naming conventions across configurations
- Group related parameters under sections
- Include comments explaining non-obvious parameters
Default Handling:
- Define sensible defaults for optional parameters
- Use inheritance or merging for parameter variants
- Validate required parameters after loading
- Security Considerations:
Uses yaml.safe_load() to prevent code execution
Suitable for loading user-provided configuration files
Automatically handles standard YAML data types safely
Does not support custom Python object instantiation
POMDPPlanners.utils.config_to_id module
- class POMDPPlanners.utils.config_to_id.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]
Bases: JSONEncoder
Custom JSON encoder for handling NumPy arrays and other NumPy types.
- default(o)[source]
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
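The same default() pattern that NumpyEncoder uses for NumPy types can be shown with the standard library alone. The sketch below converts sets (another type json cannot serialize natively) and delegates everything else to the base class; the encoder name and the set-to-list conversion are illustrative choices, not part of the library:

```python
import json

# Stdlib-only illustration of the default() hook: convert an otherwise
# unserializable object before delegating to the base class, which raises
# TypeError for anything still unknown.
class SetEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, set):
            return sorted(o)  # sets are not JSON-serializable; sorted lists are
        return super().default(o)

print(json.dumps({"actions": {"listen", "open_left"}}, cls=SetEncoder))
# → {"actions": ["listen", "open_left"]}
```

NumpyEncoder presumably applies the same idea to np.ndarray and NumPy scalar types (e.g. via .tolist() and .item()).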
POMDPPlanners.utils.distributed_computing module
- POMDPPlanners.utils.distributed_computing.run_distributed(func, kwargs_list, num_cpus=1, num_gpus=0, description='Running distributed tasks', unit='task', address=None, namespace='POMDPPlanners', runtime_env=None)[source]
Run a function in parallel across multiple machines using Ray.
- Parameters:
func (Callable) – The function to run in parallel
kwargs_list (List[Dict[str, Any]]) – List of keyword argument dictionaries, where each dict contains the kwargs for one function call
num_cpus (int) – Number of CPUs to allocate per task
num_gpus (int) – Number of GPUs to allocate per task
description (str) – Description for the progress bar
unit (str) – Unit label for the progress bar
address (Optional[str]) – Ray cluster address to connect to (if None, starts a local cluster)
namespace (str) – Ray namespace for the tasks
runtime_env (Optional[dict]) – Runtime environment configuration for Ray
- Return type:
- Returns:
List of results from each function call
- POMDPPlanners.utils.distributed_computing.run_parallel_locally(func, kwargs_list, n_jobs=1, description='Running parallel tasks', unit='task', cache_dir=None)[source]
Run a function in parallel with different keyword argument sets using joblib.
- Parameters:
func (Callable) – The function to run in parallel
kwargs_list (List[Dict[str, Any]]) – List of keyword argument dictionaries, where each dict contains the kwargs for one function call
n_jobs (int) – Number of parallel jobs to use
description (str) – Description for the progress bar
unit (str) – Unit label for the progress bar
cache_dir (Optional[str]) – Directory to store cached results. If None, caching is disabled.
- Return type:
- Returns:
List of results from each function call
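Both run_distributed and run_parallel_locally share the same calling convention: a function plus a list of per-task keyword-argument dicts, with results returned in order. The stdlib sketch below illustrates that convention only; the library itself uses Ray and joblib, and simulate_episode is a hypothetical stand-in workload:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_episode(env_name, n_steps):
    # Stand-in workload; a real task would run a POMDP episode.
    return f"{env_name}:{n_steps}"

# One dict of keyword arguments per task, mirroring kwargs_list.
kwargs_list = [
    {"env_name": "TigerPOMDP", "n_steps": 10},
    {"env_name": "TigerPOMDP", "n_steps": 20},
]

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(simulate_episode, **kw) for kw in kwargs_list]
    results = [f.result() for f in futures]  # preserves task order

print(results)  # → ['TigerPOMDP:10', 'TigerPOMDP:20']
```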
POMDPPlanners.utils.hyperparameter_tuning_and_eval module
POMDPPlanners.utils.logger module
- class POMDPPlanners.utils.logger.ConditionalMemoryHandler(capacity, target)[source]
Bases: MemoryHandler
Memory handler that buffers logs and only flushes on failure or explicit request.
This handler is designed for failure-only logging scenarios where you want to capture all logs during execution but only write them to disk/console when a failure occurs. This dramatically reduces I/O overhead for successful operations.
- should_flush
Flag to manually trigger flushing of buffered logs
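The failure-only buffering described above builds on the stdlib MemoryHandler that this class extends. The sketch below shows the base-class behavior (flush when a record at flushLevel arrives); ConditionalMemoryHandler adds its own flush conditions on top of this:

```python
import logging
import logging.handlers

class ListHandler(logging.Handler):
    """Target handler that collects messages so flushing is observable."""
    def __init__(self):
        super().__init__()
        self.records = []
    def emit(self, record):
        self.records.append(record.getMessage())

# Records stay in memory until one at flushLevel (ERROR) arrives,
# then everything buffered is written to the target at once.
target = ListHandler()
buffering = logging.handlers.MemoryHandler(
    capacity=1000, flushLevel=logging.ERROR, target=target
)

logger = logging.getLogger("episode_demo")
logger.setLevel(logging.INFO)
logger.addHandler(buffering)

logger.info("step 1 ok")
logger.info("step 2 ok")
print(len(target.records))      # → 0  (still buffered, no I/O yet)
logger.error("episode failed")  # triggers the flush
print(len(target.records))      # → 3  (all buffered records written)
buffering.close()
logger.removeHandler(buffering)
```

On a successful episode the buffer would simply be discarded, which is the I/O saving the docstring refers to.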
- class POMDPPlanners.utils.logger.QueueLoggerManager(max_handlers=100, cleanup_interval=60, handler_timeout=300)[source]
Bases: object
Centralized queue-based logger manager with individual task log files.
This manager solves the “too many open files” problem in heavy multiprocessing scenarios by using a single writer thread that handles all file I/O operations. Worker processes only interact with a memory queue, eliminating file descriptor leaks while maintaining individual log files per task.
Key benefits:
- Workers never open file handles directly (prevents FD leaks)
- Single writer thread manages individual file handlers per task
- Smart handler pooling and cleanup
- Maintains individual log files as before
- Scales to thousands of concurrent processes
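The single-writer architecture described here is the stdlib QueueHandler/QueueListener pattern: workers put records on an in-memory queue, and one listener thread performs all I/O. A minimal sketch (with a list-collecting handler standing in for the per-task file handlers):

```python
import logging
import logging.handlers
import queue

records = []
class Collector(logging.Handler):
    # Stand-in for the per-task file handlers the writer thread manages.
    def emit(self, record):
        records.append(record.getMessage())

log_queue = queue.Queue()
listener = logging.handlers.QueueListener(log_queue, Collector())
listener.start()  # the single writer thread

# A worker only touches the queue, never a file descriptor.
worker_logger = logging.getLogger("worker_7")
worker_logger.setLevel(logging.INFO)
worker_logger.addHandler(logging.handlers.QueueHandler(log_queue))

worker_logger.info("task started")
worker_logger.info("task finished")

listener.stop()  # drains the queue before stopping the writer thread
print(records)   # → ['task started', 'task finished']
```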
- class POMDPPlanners.utils.logger.TaskLoggerManager[source]
Bases: object
Manages task logger configuration and buffered handlers without polluting logger objects.
This manager maintains a registry of configured loggers and their associated memory handlers, eliminating the need to set dynamic attributes on logger objects.
- _configured_loggers
Dictionary mapping logger names to their configuration state
- _memory_handlers
Dictionary mapping logger names to their ConditionalMemoryHandler list
- _lock
Thread lock for safe concurrent access
- cleanup_logger(logger_name, episode_failed=False, log_only_on_failure=False)[source]
Clean up task logger resources with buffering awareness.
Handles cleanup for both buffered and non-buffered loggers:
- For buffered loggers: flushes on failure, discards on success
- For non-buffered loggers: always flushes
- get_or_create_logger(logger_name, output_dir, debug, console_output, use_queue, log_only_on_failure)[source]
Get or create a task logger with optional buffering.
- Parameters:
logger_name (str) – Unique name for the logger
output_dir (Optional[Path]) – Directory for log files (creates logs/ subdirectory if provided)
debug (bool) – Enable debug mode with verbose logging
console_output (bool) – Enable/disable console output
use_queue (bool) – Enable queue-based logging for multiprocessing
log_only_on_failure (bool) – Buffer logs in memory and only flush on failure
- Return type:
- Returns:
Configured logger instance ready for task execution
- POMDPPlanners.utils.logger.cleanup_all_loggers()[source]
Emergency cleanup of all logging resources.
This function should be called when shutting down the application to ensure all logging resources are properly cleaned up.
- POMDPPlanners.utils.logger.cleanup_task_logger(logger_name, episode_failed=False, log_only_on_failure=False)[source]
Clean up task logger resources with buffering awareness.
Handles cleanup for both buffered and non-buffered loggers:
- For buffered loggers: flushes on failure, discards on success
- For non-buffered loggers: always flushes
- POMDPPlanners.utils.logger.flush_buffered_task_logs(logger_name)[source]
Flush buffered logs to file/console when failure occurs.
This function triggers the flush of all buffered log records for loggers configured with ConditionalMemoryHandler (when log_only_on_failure is enabled).
- POMDPPlanners.utils.logger.get_logger(name, level=20, output_dir=None, debug=False, console_output=True, use_queue=False)[source]
Get a configured logger for POMDP experiments and algorithm execution.
This utility creates standardized loggers for tracking experimental progress, algorithm execution, and debugging information. It supports both console and file logging with configurable levels and formatting.
- Parameters:
name (str) – Logger identifier (typically module name or algorithm name)
level (int) – Base logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
output_dir (Optional[Path]) – Directory for log files (creates logs/ subdirectory if provided)
debug (bool) – Enable debug mode with verbose logging and detailed formatting
console_output (bool) – Enable/disable console output (useful for batch experiments)
use_queue (bool) – Enable queue-based logging for heavy multiprocessing scenarios
- Return type:
- Returns:
Configured logger instance ready for use
Note
Queue-based logging is recommended for heavy multiprocessing workloads to prevent “too many open files” errors. It uses a single writer thread to handle all file I/O operations while workers only interact with a memory queue, maintaining individual log files per task.
Example
Basic console logging:
>>> import logging
>>> logger = get_logger("POMCP_Tiger", level=logging.INFO)
>>> logger.info("Starting POMCP planning")
Example
File logging with output directory:
>>> from pathlib import Path
>>> logger = get_logger(
...     name="Experiment",
...     output_dir=Path("/tmp/test_logs"),
...     console_output=False
... )
>>> logger.info("Test message")
Example
Debug mode with detailed logging:
>>> debug_logger = get_logger(
...     name="Debug_Test",
...     debug=True,
...     console_output=False
... )
>>> debug_logger.debug("Debug info")
>>> debug_logger.info("Regular info")
- Logging Best Practices:
DEBUG: Detailed algorithm internals, variable values
INFO: Algorithm progress, major milestones, results
WARNING: Suboptimal conditions, parameter issues
ERROR: Algorithm failures, invalid inputs
CRITICAL: System failures, experiment termination
Note
Log files are automatically organized with timestamps in output_dir/logs/. The logger is thread-safe for use with distributed computing frameworks.
- POMDPPlanners.utils.logger.get_queue_logger_diagnostics()[source]
Get diagnostic information about the queue-based logging system.
- POMDPPlanners.utils.logger.get_queue_logger_manager()[source]
Get the global queue logger manager instance.
- Return type:
- POMDPPlanners.utils.logger.get_task_logger_manager()[source]
Get the global task logger manager instance.
- Return type:
- POMDPPlanners.utils.logger.reset_logger_state()[source]
Reset the global logger state for testing.
This function ensures clean state between test runs by:
- Stopping any running queue manager
- Clearing the global singleton
- Resetting the task logger manager
- Removing all Python loggers created by this module
- POMDPPlanners.utils.logger.setup_task_logger_with_buffering(logger_name, output_dir, debug, console_output, use_queue, log_only_on_failure)[source]
Set up a task logger with optional buffering for failure-only logging.
This is a convenience function that delegates to TaskLoggerManager to avoid setting dynamic attributes on logger objects.
- Parameters:
logger_name (str) – Unique name for the logger
output_dir (Optional[Path]) – Directory for log files (creates logs/ subdirectory if provided)
debug (bool) – Enable debug mode with verbose logging
console_output (bool) – Enable/disable console output
use_queue (bool) – Enable queue-based logging for multiprocessing
log_only_on_failure (bool) – Buffer logs in memory and only flush on failure
- Return type:
- Returns:
Configured logger instance ready for task execution
POMDPPlanners.utils.memory_tracker module
Memory tracking utilities for POMDP simulations.
This module provides memory tracking capabilities for monitoring memory usage during simulation execution. It includes both lightweight monitoring for production use and detailed profiling for development and debugging.
Key Features:
- Conditional profiling (only when enabled)
- Multiple tracking modes (lightweight, detailed, sampling-based)
- Memory leak detection with configurable thresholds
- Integration with existing logging infrastructure
- Support for tracemalloc, psutil, and memory_profiler
- class POMDPPlanners.utils.memory_tracker.MemoryTracker(logger=None, enable_tracking=None, tracking_mode='lightweight', sample_rate=1.0, interval_seconds=5.0, leak_threshold_mb=100.0)[source]
Bases: object
Memory tracking utility for simulator operations.
Provides configurable memory monitoring with minimal overhead when disabled. Supports multiple tracking modes and integrates with existing logging.
- Parameters:
- compare_snapshots(label1, label2, limit=10)[source]
Compare tracemalloc snapshots between two checkpoints.
POMDPPlanners.utils.multivariate_normal module
Module for efficient multivariate normal distribution with pre-computed Cholesky decomposition.
This module provides a multivariate normal distribution implementation optimized for scenarios where the covariance matrix is fixed but the mean varies (e.g., POMDP observation models). The expensive Cholesky decomposition O(n^3) is computed once during initialization, making subsequent sampling and PDF evaluations O(n^2).
- Classes:
CovarianceParameterizedMultivariateNormal: Efficient multivariate normal with pre-computed Cholesky
- class POMDPPlanners.utils.multivariate_normal.CovarianceParameterizedMultivariateNormal(covariance, validate=True)[source]
Bases: object
Multivariate normal distribution with pre-computed Cholesky decomposition.
This class provides efficient sampling and PDF computation for multivariate normal distributions where the covariance matrix is fixed but the mean can vary. The Cholesky decomposition is computed once during initialization, enabling O(n^2) operations for sampling and PDF evaluation.
This class does NOT inherit from the Distribution base class because it has a different interface - the mean is passed to methods rather than the constructor.
- covariance
The covariance matrix (read-only property).
- dim
Dimensionality of the distribution.
Example
>>> import numpy as np
>>> np.random.seed(42)
>>>
>>> # Create distribution with fixed covariance
>>> cov = np.array([[1.0, 0.5], [0.5, 2.0]])
>>> mvn = CovarianceParameterizedMultivariateNormal(cov)
>>>
>>> # Sample with different means
>>> mean1 = np.array([0.0, 0.0])
>>> samples1 = mvn.sample(mean1, n_samples=3)
>>> samples1.shape
(3, 2)
>>>
>>> # Compute PDF
>>> values = np.array([[0.0, 0.0], [1.0, 1.0]])
>>> pdf_values = mvn.pdf(values, mean1)
>>> pdf_values.shape
(2,)
- log_pdf(values, mean)[source]
Compute the log probability density function.
More numerically stable than pdf() for small probability values.
- Parameters:
- Return type:
- Returns:
Array of shape (n,) containing log PDF values at each point.
- Raises:
ValueError – If dimensions don’t match.
- pdf(values, mean)[source]
Compute the probability density function.
Uses the pre-computed Cholesky decomposition for efficient computation:
y = solve_triangular(L, (x - mean).T)
mahalanobis_sq = sum(y^2)
pdf = exp(log_normalization - 0.5 * mahalanobis_sq)
- Parameters:
- Return type:
- Returns:
Array of shape (n,) containing PDF values at each point.
- Raises:
ValueError – If dimensions don’t match.
- sample(mean, n_samples=1)[source]
Sample from the multivariate normal distribution.
Generates samples using the transformation: x = mean + L @ z, where z ~ N(0, I) and L is the Cholesky factor of the covariance.
- Parameters:
- Return type:
- Returns:
Array of shape (n_samples, dim) containing the samples.
- Raises:
ValueError – If mean dimension doesn’t match covariance dimension.
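The sampling transformation described above (x = mean + L @ z with z ~ N(0, I)) can be verified directly with NumPy. This is a sketch of the underlying technique, not the class's code: the covariance is factored once, then reused for every batch of samples:

```python
import numpy as np

rng = np.random.default_rng(0)

cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])
L = np.linalg.cholesky(cov)  # O(n^3), computed once and reused for every mean

def sample(mean, n_samples=1):
    # Row-wise version of x = mean + L @ z: each row z_i ~ N(0, I),
    # so z @ L.T has covariance L @ L.T = cov.  Cost is O(n^2) per sample.
    z = rng.standard_normal((n_samples, len(mean)))
    return mean + z @ L.T

samples = sample(np.array([0.0, 0.0]), n_samples=50_000)
# The empirical covariance should recover the fixed covariance matrix.
print(np.allclose(np.cov(samples.T), cov, atol=0.1))  # → True
```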
POMDPPlanners.utils.planner_episode_visualization module
POMDPPlanners.utils.simulations_caching module
- POMDPPlanners.utils.simulations_caching.cache_episode_simulation_results(environment, policy, initial_belief, results, cache_dir_path, general_config={})[source]
- POMDPPlanners.utils.simulations_caching.get_cache_key(environment, policy, initial_belief, general_config={})[source]
- Return type:
- Parameters:
environment (Environment)
policy (Policy)
initial_belief (Belief)
general_config (dict)
POMDPPlanners.utils.statistics_utils module
- POMDPPlanners.utils.statistics_utils.aggregate_weights_for_duplicate_values(values, weights)[source]
Aggregate weights for duplicate values to ensure unique values.
When the same value appears multiple times in the values array with different weights, this function combines them into a single entry with the sum of all weights for that value. This is useful for discrete distributions where duplicate values should be treated as a single outcome with aggregated probability.
- Parameters:
- Returns:
unique_values: Array of unique values (sorted)
aggregated_weights: Array of weights corresponding to unique values, normalized to sum to 1
- Return type:
- Raises:
ValueError – If arrays are empty or have mismatched lengths
Example
>>> import numpy as np
>>> from POMDPPlanners.utils.statistics_utils import aggregate_weights_for_duplicate_values
>>> values = np.array([1.0, 2.0, 2.0, 3.0])
>>> weights = np.array([0.3, 0.2, 0.3, 0.2])
>>> unique_vals, agg_weights = aggregate_weights_for_duplicate_values(values, weights)
>>> unique_vals
array([1., 2., 3.])
>>> bool(np.isclose(agg_weights, np.array([0.3, 0.5, 0.2])).all())
True
>>> bool(np.isclose(np.sum(agg_weights), 1.0))
True
- POMDPPlanners.utils.statistics_utils.confidence_interval(data, confidence=0.95)[source]
Calculate confidence interval for the mean using t-distribution.
Computes confidence intervals for algorithm performance means, providing statistical bounds on the true expected performance. This is essential for making statistically sound comparisons between POMDP algorithms.
Uses the t-distribution to account for sample size uncertainty, which is more appropriate than normal distribution for small to moderate sample sizes common in POMDP experiments.
- Parameters:
data (array-like) – Sample data (algorithm returns, rewards, or performance metrics)
confidence (float) – Confidence level (0 < confidence < 1, typically 0.95)
- Returns:
Tuple of (lower_bound, upper_bound) for the confidence interval
- Raises:
ValueError – If insufficient data or contains NaN values
- Return type:
Example
Statistical comparison of algorithm performance:
>>> import numpy as np
>>> # Algorithm performance from multiple runs
>>> pomcp_rewards = [12.3, 11.8, 13.1, 12.7, 11.9, 12.5, 13.0, 12.1, 12.8, 12.4]
>>> pft_rewards = [11.5, 13.2, 12.8, 11.9, 12.3, 13.5, 12.1, 12.9, 11.7, 12.6]
>>>
>>> # Calculate 95% confidence intervals
>>> pomcp_ci = confidence_interval(pomcp_rewards, confidence=0.95)
>>> pft_ci = confidence_interval(pft_rewards, confidence=0.95)
>>>
>>> # Verify confidence intervals are tuples with two elements
>>> isinstance(pomcp_ci, tuple) and len(pomcp_ci) == 2
True
>>> isinstance(pft_ci, tuple) and len(pft_ci) == 2
True
>>>
>>> # Verify confidence intervals contain the mean
>>> pomcp_mean = np.mean(pomcp_rewards)
>>> pft_mean = np.mean(pft_rewards)
>>> bool(pomcp_ci[0] <= pomcp_mean <= pomcp_ci[1])
True
>>> bool(pft_ci[0] <= pft_mean <= pft_ci[1])
True
>>>
>>> # Verify lower bound is less than upper bound
>>> pomcp_ci[0] < pomcp_ci[1]
True
>>> pft_ci[0] < pft_ci[1]
True
- POMDPPlanners.utils.statistics_utils.cvar_bound_const_eps(y_samp, y_sup, y_inf, eps, alpha=0.05)[source]
Calculate bounds for CVaR using a constant epsilon parameter.
- Parameters:
- Returns:
(lower_bound, upper_bound) representing the bounds for CVaR
- Return type:
- Raises:
ValueError – If alpha or eps are not between 0 and 1, or if y_samp is empty
- POMDPPlanners.utils.statistics_utils.cvar_confidence_interval(data, alpha=0.95, delta=0.05, dist_lower_bound=None, dist_upper_bound=None)[source]
Calculate the confidence interval for the CVaR of a dataset using the t-distribution.
- Parameters:
data – Array of values
alpha – Confidence level (default 0.95 for 95%)
delta – Significance level for the probabilistic bounds (default 0.05)
dist_lower_bound (Optional[float]) – Known lower bound of the distribution support. If None, uses min(data) as a conservative data-driven fallback.
dist_upper_bound (Optional[float]) – Known upper bound of the distribution support. If None, uses max(data) as a conservative data-driven fallback.
- Returns:
(lower_bound, upper_bound) of the confidence interval
- Return type:
- Raises:
ValueError – If data contains NaN values or has insufficient samples
- POMDPPlanners.utils.statistics_utils.cvar_estimator(vec, alpha)[source]
Calculate Conditional Value at Risk (CVaR) for risk-sensitive POMDP evaluation.
CVaR measures the expected value of the worst-case outcomes, providing a risk-sensitive performance metric that goes beyond simple mean rewards. This is particularly valuable for safety-critical applications where tail risk matters more than average performance.
- Mathematical Definition:
CVaR_α(X) = E[X | X ≥ VaR_α(X)]
Where VaR_α is the Value at Risk at confidence level α.
The implementation uses a vectorized approach for computational efficiency, calculating CVaR by integrating over the tail distribution above the α-quantile.
- Parameters:
- Return type:
- Returns:
CVaR value representing expected worst-case performance
- Raises:
ValueError – If alpha not in [0,1] or vec is empty
Example
Risk analysis of POMDP algorithm performance:
>>> import numpy as np
>>> from POMDPPlanners.utils.statistics_utils import cvar_estimator
>>> # Simulate algorithm returns from multiple episodes
>>> returns = np.array([12.5, 8.3, 15.7, -2.1, 9.8, 13.2, 6.4, 11.0, -1.5, 14.3])
>>> len(returns)
10
>>> # Calculate risk metrics
>>> mean_return = np.mean(returns)
>>> bool(mean_return > 8.0)  # Check reasonable mean
True
>>> cvar_90 = cvar_estimator(returns, alpha=0.9)  # Worst 10% outcomes
>>> cvar_95 = cvar_estimator(returns, alpha=0.95)  # Worst 5% outcomes
>>> isinstance(cvar_90, (float, np.floating))
True
>>> isinstance(cvar_95, (float, np.floating))
True
>>> cvar_95 <= cvar_90  # CVaR should be lower for higher alpha
True
Example
Comparing algorithm risk profiles:
>>> # Algorithm performance data from experiments
>>> pomcp_returns = np.array([10.2, 12.8, 9.5, 11.3, 8.7, 12.1, 10.9, 9.8, 11.5, 10.4])
>>> pft_returns = np.array([15.1, 7.2, 14.8, 13.3, 6.9, 15.5, 8.1, 14.2, 12.7, 9.3])
>>> pomcp_cvar = cvar_estimator(pomcp_returns, alpha=0.9)
>>> pft_cvar = cvar_estimator(pft_returns, alpha=0.9)
- Risk Assessment Applications:
Portfolio Analysis: Compare multiple algorithms’ risk-return profiles
Safety-Critical Systems: Evaluate worst-case performance guarantees
Robust Planning: Select algorithms with acceptable tail risk
Performance Bounds: Establish confidence intervals for worst-case scenarios
- Mathematical Properties:
Monotonic: CVaR_α ≥ VaR_α (CVaR is always at least as large as VaR)
Coherent: Satisfies subadditivity, monotonicity, positive homogeneity
Tail Sensitivity: Lower α values emphasize extreme outcomes more
Computational: More stable than VaR, especially for small samples
- POMDPPlanners.utils.statistics_utils.cvar_estimator_from_dist(values, weights, alpha)[source]
Calculate the Conditional Value at Risk (CVaR) from a discrete distribution.
- Parameters:
- Returns:
The CVaR value
- Return type:
- Raises:
ValueError – If alpha is not between 0 and 1, if arrays are empty, or if weights don’t sum to 1
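Using the definition given earlier (CVaR_α(X) = E[X | X ≥ VaR_α(X)]), CVaR of a discrete distribution reduces to a weighted tail average. The sketch below is an illustration of that definition, not the library's exact implementation, and its tie-handling at the quantile boundary may differ:

```python
import numpy as np

def discrete_upper_cvar(values, weights, alpha):
    """Sketch: expectation of X over the upper (1 - alpha) tail mass."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cum = np.cumsum(w)
    # Weight each outcome by how much of its mass lies above the alpha-quantile,
    # so an outcome straddling the quantile contributes only partially.
    tail_w = np.minimum(w, np.maximum(cum - alpha, 0.0))
    return float(np.sum(v * tail_w) / (1.0 - alpha))

values = [1.0, 2.0, 3.0, 4.0]
weights = [0.25, 0.25, 0.25, 0.25]
print(discrete_upper_cvar(values, weights, alpha=0.5))  # → 3.5  (mean of {3, 4})
```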
- POMDPPlanners.utils.statistics_utils.cvar_probabilistic_lower_bound_thomas(vec, alpha, delta, dist_lower_bound)[source]
Calculate a probabilistic lower bound for CVaR using Thomas’s method.
- Parameters:
- Returns:
The probabilistic lower bound for CVaR
- Return type:
- Raises:
ValueError – If alpha or delta are not between 0 and 1, or if vec is empty
- POMDPPlanners.utils.statistics_utils.cvar_probabilistic_upper_bound_thomas(vec, alpha, delta, dist_upper_bound)[source]
Calculate a probabilistic upper bound for CVaR using Thomas’s method.
- Parameters:
- Returns:
The probabilistic upper bound for CVaR
- Return type:
- Raises:
ValueError – If alpha or delta are not between 0 and 1, or if vec is empty
- POMDPPlanners.utils.statistics_utils.get_min_and_max_cost(min_immediate_cost, max_immediate_cost, depth, max_depth, gamma)[source]
Calculate the minimum and maximum costs over a time horizon using a discount factor.
- Parameters:
- Returns:
(min_cost, max_cost) representing the minimum and maximum costs over the time horizon
- Return type:
- Raises:
ValueError – If gamma is not between 0 and 1
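A plausible reading of this computation, sketched under the assumption that the remaining horizon is max_depth − depth steps (the library's exact horizon convention may differ):

```python
def min_max_discounted_cost(min_immediate_cost, max_immediate_cost,
                            depth, max_depth, gamma):
    # Geometric series: sum_{t=0}^{n-1} gamma^t = (1 - gamma^n) / (1 - gamma)
    # bounds the total discounted cost over the remaining n steps.
    n = max_depth - depth                 # assumed remaining horizon length
    scale = float(n) if gamma == 1.0 else (1.0 - gamma ** n) / (1.0 - gamma)
    return min_immediate_cost * scale, max_immediate_cost * scale
```

Scaling each bound by the same geometric factor works because every per-step cost lies between the immediate min and max, so the discounted sums are bracketed term by term.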
- POMDPPlanners.utils.statistics_utils.quantile_confidence_interval(data, alpha=0.95, conf_level=0.95)[source]
Compute a confidence interval for a target quantile from sample data.
- Parameters:
data – 1D array-like of samples
alpha – target quantile (e.g. 0.95 for 95% VaR)
conf_level – overall coverage (e.g. 0.95 for a 95% CI)
- Returns:
(lower_value, upper_value, k1, k2)
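The order-statistic construction behind such an interval can be sketched as follows; the helper names and the exact index conventions for k1 and k2 are illustrative and may differ from the library's:

```python
import math
import numpy as np

def binom_ppf(q, n, p):
    # Smallest k with Binomial(n, p) CDF >= q (simple linear scan).
    cdf = 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        if cdf >= q:
            return k
    return n

def quantile_ci_sketch(data, alpha=0.95, conf_level=0.95):
    # Distribution-free CI for the alpha-quantile: the number of samples
    # below the true quantile is Binomial(n, alpha), so binomial quantiles
    # pick order-statistic indices k1 <= k2 whose values bracket it with
    # roughly conf_level coverage.
    x = np.sort(np.asarray(data))
    n = len(x)
    tail = (1.0 - conf_level) / 2.0
    k1 = max(binom_ppf(tail, n, alpha), 1)
    k2 = min(binom_ppf(1.0 - tail, n, alpha) + 1, n)
    return x[k1 - 1], x[k2 - 1], k1, k2
```

Because the interval is built purely from order statistics, it needs no assumption about the shape of the underlying distribution.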
- POMDPPlanners.utils.statistics_utils.tv_distance(p, q, n_samples=1000, method='grid', **kwargs)[source]
Compute Total Variation distance between two distributions.
Total Variation distance measures how different two probability distributions are, with values ranging from 0 (identical) to 1 (completely different).
- Parameters:
p (Distribution) – First distribution
q (Distribution) – Second distribution
n_samples (int) – Number of samples (method-dependent usage)
method (str) – Estimation method: "grid", "monte_carlo", "averaged", or "mixture"
**kwargs – Additional method-specific parameters. For "grid": x_min, x_max, n_points; for "averaged": n_runs; "mixture" and "monte_carlo" use n_samples directly.
- Return type:
- Returns:
TV distance estimate
Example
>>> import numpy as np
>>> from POMDPPlanners.core.distributions import DiscreteDistribution
>>> np.random.seed(42)
>>> # Create two similar discrete distributions
>>> values = [0, 1, 2, 3, 4]
>>> p_probs = np.array([0.2, 0.3, 0.25, 0.15, 0.1])
>>> q_probs = np.array([0.25, 0.25, 0.25, 0.15, 0.1])
>>> p = DiscreteDistribution(values, p_probs)
>>> q = DiscreteDistribution(values, q_probs)
>>> tv = tv_distance(p, q, method="monte_carlo", n_samples=10000)
>>> isinstance(tv, float)
True
>>> 0.0 <= tv <= 1.0
True
- POMDPPlanners.utils.statistics_utils.tv_distance_averaged(p, q, n_samples=1000, n_runs=10)[source]
Compute TV distance by averaging multiple independent estimates.
Averaging n_runs independent estimates reduces the variance by a factor of n_runs, i.e. the standard error shrinks by sqrt(n_runs) compared to a single run.
- Parameters:
p (Distribution) – First distribution
q (Distribution) – Second distribution
n_samples (int) – Number of samples per run
n_runs (int) – Number of independent runs to average
- Return type:
- Returns:
Average TV distance estimate
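The averaging idea can be sketched generically; `estimate_once` below is a hypothetical zero-argument callable wrapping any single-run estimator:

```python
import numpy as np

def tv_distance_averaged_sketch(estimate_once, n_runs=10):
    # Average n_runs independent estimates.  The variance drops by a factor
    # of n_runs, so the standard error shrinks by sqrt(n_runs).
    return float(np.mean([estimate_once() for _ in range(n_runs)]))
```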
- POMDPPlanners.utils.statistics_utils.tv_distance_grid(p, q, x_min=-5.0, x_max=5.0, n_points=10000)[source]
Compute TV distance using grid-based numerical integration.
This method has zero sampling variance and is deterministic. Works well for continuous distributions with known support.
- Parameters:
p (Distribution) – First distribution
q (Distribution) – Second distribution
x_min (float) – Lower bound of integration range
x_max (float) – Upper bound of integration range
n_points (int) – Number of grid points (higher = more accurate)
- Return type:
- Returns:
TV distance between p and q
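The grid method evaluates TV(p, q) = ½ ∫ |p(x) − q(x)| dx numerically. The sketch below takes plain density callables instead of the library's Distribution objects:

```python
import numpy as np

def tv_grid_sketch(pdf_p, pdf_q, x_min=-8.0, x_max=9.0, n_points=20001):
    # TV(p, q) = 0.5 * integral of |p(x) - q(x)| dx, approximated by a
    # Riemann sum on a uniform grid: deterministic, zero sampling variance.
    xs = np.linspace(x_min, x_max, n_points)
    dx = xs[1] - xs[0]
    return 0.5 * float(np.sum(np.abs(pdf_p(xs) - pdf_q(xs))) * dx)
```

For two unit-variance Gaussians separated by one unit, the true value is 2Φ(0.5) − 1 ≈ 0.383, which the grid sum recovers to several decimal places when the integration range covers both densities' effective support.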
- POMDPPlanners.utils.statistics_utils.tv_distance_mixture_sampling(p, q, n_samples=2000)[source]
Compute TV distance using mixture sampling for better coverage.
Samples from mixture (p + q) / 2 to ensure good coverage of both distributions’ support.
- Parameters:
p (Distribution) – First distribution
q (Distribution) – Second distribution
n_samples (int) – Number of samples from the mixture
- Return type:
- Returns:
TV distance estimate
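Sampling from the mixture m = (p + q)/2 turns TV into the importance-weighted expectation ½·E_m[|p(x) − q(x)|/m(x)]. A sketch with hypothetical sampler/pdf callables (the library's Distribution-based API differs):

```python
import numpy as np

def tv_mixture_sketch(sample_p, sample_q, pdf_p, pdf_q, n_samples=2000, rng=None):
    # Draw x ~ m = (p + q)/2 by splitting samples evenly between the two
    # samplers, then estimate TV = 0.5 * E_m[ |p(x) - q(x)| / m(x) ].
    # The integrand is bounded by 2 (since |p - q| <= p + q = 2m), which
    # keeps the variance low and guarantees coverage of both supports.
    rng = rng or np.random.default_rng()
    half = n_samples // 2
    xs = np.concatenate([sample_p(half, rng), sample_q(n_samples - half, rng)])
    m = 0.5 * (pdf_p(xs) + pdf_q(xs))
    return 0.5 * float(np.mean(np.abs(pdf_p(xs) - pdf_q(xs)) / m))
```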
- POMDPPlanners.utils.statistics_utils.tv_distance_monte_carlo(p, q, n_samples=1000)[source]
Compute TV distance using basic Monte Carlo sampling (original method).
This is the original implementation that samples randomly from both distributions. Has higher variance than other methods.
- Parameters:
p (Distribution) – First distribution
q (Distribution) – Second distribution
n_samples (int) – Number of samples
- Return type:
- Returns:
TV distance estimate
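A basic Monte Carlo estimator can use the identity TV(p, q) = ½·E_{x∼p}[|1 − q(x)/p(x)|]. The sketch below uses hypothetical callables and samples only from p; the library's version samples from both distributions and may differ:

```python
import numpy as np

def tv_monte_carlo_sketch(sample_p, pdf_p, pdf_q, n_samples=1000, rng=None):
    # Importance-style estimate: TV(p, q) = 0.5 * E_{x~p}[ |1 - q(x)/p(x)| ].
    # Higher variance than the grid or mixture methods, since the density
    # ratio q/p is unbounded where q has heavier tails than p.
    rng = rng or np.random.default_rng()
    xs = sample_p(n_samples, rng)
    return 0.5 * float(np.mean(np.abs(1.0 - pdf_q(xs) / pdf_p(xs))))
```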
POMDPPlanners.utils.tree_statistics module
- class POMDPPlanners.utils.tree_statistics.TreeMetrics(*values)[source]
Bases: Enum
Metric names for MCTS tree statistics.
- ACTIONS_VISIT_COUNT_ENTROPY = 'actions_visit_count_entropy'
- IS_LEAF = 'is_leaf'
- MAX_ACTIONS_VISIT_COUNT = 'max_actions_visit_count'
- MIN_ACTIONS_VISIT_COUNT = 'min_actions_visit_count'
- N_ACTIONS_FROM_ROOT = 'n_actions_from_root'
- ROOT_VISIT_COUNT = 'root_visit_count'
- TREE_MAX_DEPTH = 'tree_max_depth'
- POMDPPlanners.utils.tree_statistics.compute_tree_metrics(tree)[source]
Compute comprehensive statistics for MCTS tree analysis and debugging.
Extracts key metrics from MCTS search trees to understand algorithm behavior, convergence properties, and search quality. These metrics are essential for algorithm debugging, parameter tuning, and performance analysis.
The function analyzes the root belief node’s action children to compute visitation statistics and exploration patterns that indicate search quality.
- Parameters:
tree (BeliefNode) – Root belief node of the MCTS search tree
- Returns:
min_actions_visit_count: Minimum visits to any action
max_actions_visit_count: Maximum visits to any action
actions_visit_count_entropy: Shannon entropy of action visit distribution
n_actions_from_root: Number of actions from the root node
root_visit_count: Number of visits to the root node
- Return type:
- Raises:
TypeError – If tree is not a BeliefNode instance
Example
>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> from POMDPPlanners.utils.tree_statistics import compute_tree_metrics
>>> from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>>
>>> # Create POMCP planner and run planning
>>> env = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(
...     environment=env,
...     discount_factor=0.95,
...     depth=20,
...     exploration_constant=1.0,
...     name="POMCP_Analysis",
...     n_simulations=100
... )
>>>
>>> initial_belief = get_initial_belief(env, n_particles=200)
>>> action, run_data = planner.action(initial_belief)
>>>
>>> # Extract tree metrics from run data
>>> metrics = run_data.info_variables
>>> len(metrics) > 0
True
>>> import numbers
>>> isinstance(metrics[0].value, numbers.Number)
True
- Metric Interpretation:
min_actions_visit_count:
- Higher values indicate more balanced exploration
- Very low values suggest some actions are barely explored
- Zero indicates a leaf node (no actions expanded)
max_actions_visit_count:
- Shows the most visited action's exploration intensity
- Compare with total simulations to assess concentration
- Higher values indicate convergence to a preferred action
actions_visit_count_entropy:
- Measures uniformity of action exploration
- Higher entropy = more uniform exploration
- Lower entropy = more concentrated search
- log₂(|A|) is the maximum possible entropy for |A| actions
Visit ratio (max/min):
- 1.0 = perfectly uniform exploration
- Higher values = more concentrated/converged search
- Very high ratios may indicate premature convergence
- Algorithm Debugging Applications:
Parameter Tuning:
- Low entropy → increase the exploration parameter
- High minimum visits but low entropy → decrease simulations
- Extreme visit ratios → adjust the exploration/exploitation balance
Convergence Analysis:
- Entropy decreasing over time indicates convergence
- Stable visit ratios suggest the algorithm has converged
- Oscillating metrics may indicate unstable parameters
Comparative Analysis:
- Compare entropy across algorithms to assess exploration quality
- Use visit patterns to understand different search strategies
- Identify algorithms with better exploration-exploitation tradeoffs
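The entropy metric discussed above can be reproduced directly from raw visit counts; a minimal illustrative helper, not the library's internal implementation:

```python
import numpy as np

def visit_count_entropy(visit_counts):
    # Shannon entropy (base 2) of the normalized action-visit distribution;
    # log2(n_actions) is the maximum, reached by perfectly uniform exploration.
    counts = np.asarray(visit_counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]                          # treat 0 * log(0) as 0
    return float(-(p * np.log2(p)).sum())
```

Uniform counts over four actions give the maximum log₂(4) = 2 bits, while all visits concentrated on one action give 0 bits, matching the interpretation guide above.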
- POMDPPlanners.utils.tree_statistics.get_v_values_sample(action_node)[source]
Return a sample of V-values from the given action node.
- Return type:
- Parameters:
action_node (ActionNode)
POMDPPlanners.utils.weighted_particle_beliefs module
- class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefContinuousLightDarkFullCoverage(particles, log_weights, ess_factor=0.5, reinvigoration_fraction=0.05, reinvigoration_cov_matrix=array([[1., 0.], [0., 1.]]))[source]
Bases: WeightedParticleBeliefReinvigoration
- Parameters:
- reinvigorate(action, observation, pomdp, belief)[source]
Implement reinvigoration for specific POMDP environment.
- Return type:
- Parameters:
action (Any)
observation (Any)
pomdp (Environment)
belief (WeightedParticleBeliefReinvigoration)
- class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefDiscreteLightDark(particles, log_weights, resampling=False, ess_factor=0.5, reinvigoration_fraction=0.2)[source]
Bases: WeightedParticleBeliefReinvigoration
- Parameters:
- reinvigorate(action, observation, pomdp, belief)[source]
Implement reinvigoration for specific POMDP environment.
- Return type:
- Parameters:
action (Any)
observation (Any)
pomdp (Environment)
belief (WeightedParticleBeliefReinvigoration)
- class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefDiscreteLightDarkFullCoverage(particles, log_weights, ess_factor=0.5, reinvigoration_fraction=0.05)[source]
Bases: WeightedParticleBeliefReinvigoration
- Parameters:
- reinvigorate(action, observation, pomdp, belief)[source]
Implement reinvigoration for specific POMDP environment.
- Return type:
- Parameters:
action (Any)
observation (Any)
pomdp (Environment)
belief (WeightedParticleBeliefReinvigoration)
- class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefSanityPOMDP(particles, log_weights, resampling=False, ess_factor=0.5, reinvigoration_fraction=0.2)[source]
Bases: WeightedParticleBeliefReinvigoration
- Parameters:
- POMDPPlanners.utils.weighted_particle_beliefs.create_belief(environment, belief_config)[source]
Create a belief instance from a belief config.
- Parameters:
environment (Environment) – The POMDP environment
belief_config (BeliefConfig) – BeliefConfig object for the belief
- Return type:
- Returns:
An instance of the specified belief class
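What such a config-driven factory typically does can be sketched with a simple class registry; every name below is illustrative, not the library's actual API:

```python
from dataclasses import dataclass, field

BELIEF_REGISTRY = {}

def register_belief(name):
    # Decorator that records a belief class under a config-friendly name.
    def decorator(cls):
        BELIEF_REGISTRY[name] = cls
        return cls
    return decorator

@dataclass
class BeliefConfigSketch:
    belief_class: str
    kwargs: dict = field(default_factory=dict)

@register_belief("weighted_particle_sketch")
class WeightedParticleBeliefSketch:
    def __init__(self, environment, n_particles=100):
        self.environment = environment
        self.n_particles = n_particles

def create_belief_sketch(environment, config):
    # Look up the configured class and instantiate it with the environment
    # and any extra keyword arguments carried by the config.
    try:
        cls = BELIEF_REGISTRY[config.belief_class]
    except KeyError:
        raise ValueError(f"Unknown belief class: {config.belief_class!r}")
    return cls(environment=environment, **config.kwargs)
```

The registry decouples call sites from concrete belief classes, which is what lets a single entry point serve many environment-specific belief types.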