POMDPPlanners.utils package

Subpackages

Submodules

POMDPPlanners.utils.action_samplers module

class POMDPPlanners.utils.action_samplers.DiscreteActionSampler(actions=None)[source]

Bases: ActionSampler

Simple action sampler for discrete action spaces.

This class is designed to be fully serializable for use in parallel processing environments like joblib.

Parameters:

actions (List[Any] | None)

sample(belief_node=None)[source]

Sample a random action from the discrete action space.

Parameters:

belief_node (Optional[BeliefNode]) – Optional belief node (unused in this implementation)

Return type:

Any

Returns:

Randomly sampled action from the action space
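As a behavioral sketch (the class below is illustrative only; the real implementation is POMDPPlanners.utils.action_samplers.DiscreteActionSampler):

```python
import random

# Illustrative sketch of DiscreteActionSampler's behavior; the real class
# lives in POMDPPlanners.utils.action_samplers and is kept fully
# serializable for use in joblib workers.
class DiscreteActionSamplerSketch:
    def __init__(self, actions=None):
        self.actions = list(actions) if actions else []

    def sample(self, belief_node=None):
        # belief_node is accepted for interface compatibility but unused
        return random.choice(self.actions)

sampler = DiscreteActionSamplerSketch(actions=["listen", "open_left", "open_right"])
assert sampler.sample() in ["listen", "open_left", "open_right"]
```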

class POMDPPlanners.utils.action_samplers.UnitCircleActionSampler(max_action_magnitude=1.0)[source]

Bases: ActionSampler

Action sampler for 2D continuous action spaces within a unit circle.

This sampler generates 2D action vectors uniformly distributed within a circle of specified maximum magnitude. It’s particularly useful for navigation and continuous control problems where actions represent velocities or forces constrained to a circular region.

The sampler uses polar coordinate generation to ensure uniform distribution within the circle, avoiding the clustering near the center that would occur with naive rectangular sampling.

Mathematical Foundation:
  • Angle θ ~ Uniform(0, 2π)

  • Radius r ~ √Uniform(0, 1) × max_magnitude

  • Action = [r·cos(θ), r·sin(θ)]

The square root transformation for radius ensures uniform area distribution within the circle rather than biasing toward the center.
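The polar-coordinate procedure above can be sketched in a few lines (a standalone illustration, not the library's implementation):

```python
import math
import random

def sample_in_circle(max_action_magnitude=1.0):
    """Uniform sample inside a circle via the polar method described above."""
    theta = random.uniform(0.0, 2.0 * math.pi)             # angle ~ Uniform(0, 2pi)
    r = math.sqrt(random.random()) * max_action_magnitude  # sqrt(U) for uniform area
    return (r * math.cos(theta), r * math.sin(theta))

x, y = sample_in_circle(max_action_magnitude=2.0)
assert math.hypot(x, y) <= 2.0  # sample always lies inside the circle
```

Dropping the square root (r = U × max_magnitude) would concentrate samples near the origin, since small radii enclose disproportionately little area.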

Parameters:

max_action_magnitude (float) – Maximum magnitude of action vectors (circle radius)

sample(belief_node=None)[source]

Sample an action from a unit circle.

Parameters:

belief_node (Optional[BeliefNode]) – The current belief node (not used in this implementation)

Returns:

A 2D action vector within the unit circle

Return type:

ndarray

POMDPPlanners.utils.belief_factory module

Top-level factory for creating environment-specific belief states.

This module provides a unified entry-point for constructing ready-to-use Belief objects for any POMDP environment in the library. It dispatches to per-environment factories when a custom belief implementation exists (e.g., vectorized particle filters or Gaussian beliefs), and falls back to a generic WeightedParticleBelief otherwise.

Classes:

BeliefType: Enum of supported belief representations.

Functions:

create_environment_belief: Top-level factory returning a configured Belief.

class POMDPPlanners.utils.belief_factory.BeliefType(*values)[source]

Bases: Enum

Supported belief representations.

PARTICLE

Standard weighted particle belief.

VECTORIZED_PARTICLE

Vectorized weighted particle belief with batched NumPy updates.

GAUSSIAN

Single Gaussian (mean + covariance) belief.

GAUSSIAN_MIXTURE

Gaussian mixture model belief.

GAUSSIAN = 'gaussian'
GAUSSIAN_MIXTURE = 'gaussian_mixture'
PARTICLE = 'particle'
VECTORIZED_PARTICLE = 'vectorized_particle'
POMDPPlanners.utils.belief_factory.create_environment_belief(env, belief_type=None, n_particles=200, **kwargs)[source]

Create a ready-to-use belief for the given environment.

When belief_type is None the environment’s default belief type is used (typically VECTORIZED_PARTICLE for environments that have a custom updater, or PARTICLE otherwise).

For environments without a registered per-environment factory the function falls back to a generic get_initial_belief() producing a WeightedParticleBelief.

Parameters:
  • env (Environment) – POMDP environment instance.

  • belief_type (BeliefType | None) – Desired belief representation. None selects the environment default.

  • n_particles (int) – Number of particles (used by PARTICLE and VECTORIZED_PARTICLE types). Defaults to 200.

  • **kwargs (Any) – Forwarded to per-environment factories (e.g. updater_type for Gaussian light-dark beliefs).

Return type:

Belief

Returns:

A configured Belief object.

Raises:

ValueError – If belief_type is not supported by the environment.

Example

>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> env = TigerPOMDP(discount_factor=0.95)
>>> belief = create_environment_belief(env, n_particles=100)
>>> belief.sample() in ["tiger_left", "tiger_right"]
True

POMDPPlanners.utils.config_loader module

POMDPPlanners.utils.config_loader.load_config(config_path)[source]

Load and parse YAML configuration files for POMDP experiments.

This utility function provides a standardized way to load experimental configurations from YAML files. It’s commonly used to configure environments, planners, and simulation parameters for reproducible experiments.

The function uses PyYAML’s safe_load to prevent execution of arbitrary Python code, making it secure for loading untrusted configuration files.

Parameters:

config_path (str) – Path to the YAML configuration file (absolute or relative)

Return type:

Dict[str, Any]

Returns:

Dictionary containing the parsed configuration parameters

Raises:
  • FileNotFoundError – If the configuration file doesn’t exist

  • yaml.YAMLError – If the file contains invalid YAML syntax

Example

Loading experiment configuration:

>>> from POMDPPlanners.utils.config_loader import load_config
>>> import tempfile
>>> import os
>>> # Create a temporary config file for testing
>>> config_content = '''
... environment:
...   name: "TigerPOMDP"
...   discount_factor: 0.95
... planners:
...   - name: "POMCP"
...     n_simulations: 1000
... simulation:
...   episodes_per_run: 100
... '''
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
...     _ = f.write(config_content)
...     temp_config_path = f.name
>>> # Load experiment configuration
>>> config = load_config(temp_config_path)
>>> config['environment']['name']
'TigerPOMDP'
>>> config['environment']['discount_factor']
0.95
>>> config['planners'][0]['name']
'POMCP'
>>> config['simulation']['episodes_per_run']
100
>>> # Clean up
>>> os.unlink(temp_config_path)

Example

Using with environment configuration:

>>> # Create a temporary config file with environment and planners
>>> config_content = '''
... environment:
...   name: "TigerPOMDP"
...   discount_factor: 0.95
...   observation_accuracy: 0.85
... planners:
...   - name: "POMCP"
...     n_simulations: 100
...     depth: 5
...     exploration_constant: 1.0
... simulation:
...   episodes_per_run: 10
...   num_runs: 2
... '''
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
...     _ = f.write(config_content)
...     temp_config_path = f.name
>>> config = load_config(temp_config_path)
>>> # Create environment from config
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> env = TigerPOMDP(discount_factor=config['environment']['discount_factor'])
>>> env.discount_factor
0.95
>>> # Access planner configuration
>>> planner_config = config['planners'][0]
>>> planner_config['name']
'POMCP'
>>> planner_config['n_simulations']
100
>>> # Clean up
>>> os.unlink(temp_config_path)

Example

Handling configuration hierarchies and defaults:

>>> # Create a nested configuration file
>>> nested_config_content = '''
... defaults:
...   simulation:
...     episodes: 50
...     particles: 100
... experiments:
...   quick_test:
...     environment: "SanityPOMDP"
...     planners: ["POMCP"]
...   full_study:
...     environment: "TigerPOMDP"
...     planners: ["POMCP", "PFT_DPW"]
...     simulation:
...       episodes: 200
...       particles: 500
... '''
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
...     _ = f.write(nested_config_content)
...     temp_config_path = f.name
>>> config = load_config(temp_config_path)
>>> # Access different experiment configurations
>>> defaults = config['defaults']
>>> quick_config = config['experiments']['quick_test']
>>> full_config = config['experiments']['full_study']
>>> # Verify configuration structure
>>> defaults['simulation']['episodes']
50
>>> quick_config['environment']
'SanityPOMDP'
>>> full_config['simulation']['episodes']
200
>>> # Simple config merging test
>>> merged = {**defaults['simulation'], **full_config.get('simulation', {})}
>>> merged['episodes']
200
>>> merged['particles']
500
>>> # Clean up
>>> os.unlink(temp_config_path)
Configuration Best Practices:

File Organization:
  • Use descriptive filenames (e.g., pomcp_tiger_baseline.yaml)

  • Organize configs by environment or study type

  • Include version information in complex configurations

Parameter Naming:
  • Use consistent naming conventions across configurations

  • Group related parameters under sections

  • Include comments explaining non-obvious parameters

Default Handling:
  • Define sensible defaults for optional parameters

  • Use inheritance or merging for parameter variants

  • Validate required parameters after loading

Security Considerations:
  • Uses yaml.safe_load() to prevent code execution

  • Suitable for loading user-provided configuration files

  • Automatically handles standard YAML data types safely

  • Does not support custom Python object instantiation

POMDPPlanners.utils.config_to_id module

class POMDPPlanners.utils.config_to_id.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: JSONEncoder

Custom JSON encoder for handling NumPy arrays and other NumPy types

default(o)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
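A hypothetical sketch of what a NumPy-aware default() can look like (the real encoder is POMDPPlanners.utils.config_to_id.NumpyEncoder; the exact conversions here are assumptions). The NumPy import is guarded so the sketch stays runnable without NumPy installed:

```python
import json

# Hypothetical sketch of a NumPy-aware JSON encoder; conversions below are
# illustrative assumptions, not the library's exact behavior.
class NumpyEncoderSketch(json.JSONEncoder):
    def default(self, o):
        try:
            import numpy as np
            if isinstance(o, np.ndarray):
                return o.tolist()   # arrays become nested lists
            if isinstance(o, np.generic):
                return o.item()     # NumPy scalars become Python scalars
        except ImportError:
            pass
        # Let the base class raise TypeError for anything else
        return super().default(o)

encoded = json.dumps({"gamma": 0.95, "dims": [2, 3]}, cls=NumpyEncoderSketch)
assert json.loads(encoded) == {"gamma": 0.95, "dims": [2, 3]}
```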
POMDPPlanners.utils.config_to_id.config_to_id(config_dict)[source]

Generate a unique ID from a configuration dictionary using hashing. Handles NumPy arrays and other NumPy types.

Parameters:

config_dict (dict) – The configuration dictionary to hash

Returns:

A unique hash string representing the configuration

Return type:

str
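The core idea — serialize deterministically, then hash — can be sketched as follows (a standalone illustration; the hypothetical default=str stands in for the real NumPy handling, and the hash algorithm is an assumption):

```python
import hashlib
import json

# Standalone sketch of the config_to_id idea: serialize the dict with
# sorted keys so key order never changes the ID, then hash the result.
def config_to_id_sketch(config_dict):
    serialized = json.dumps(config_dict, sort_keys=True, default=str)
    return hashlib.sha256(serialized.encode()).hexdigest()

id_a = config_to_id_sketch({"planner": "POMCP", "n_simulations": 1000})
id_b = config_to_id_sketch({"n_simulations": 1000, "planner": "POMCP"})
assert id_a == id_b  # key order does not change the ID
```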

POMDPPlanners.utils.distributed_computing module

POMDPPlanners.utils.distributed_computing.run_distributed(func, kwargs_list, num_cpus=1, num_gpus=0, description='Running distributed tasks', unit='task', address=None, namespace='POMDPPlanners', runtime_env=None)[source]

Run a function in parallel across multiple machines using Ray.

Parameters:
  • func (Callable) – The function to run in parallel

  • kwargs_list (List[Dict[str, Any]]) – List of keyword argument dictionaries, where each dict contains the kwargs for one function call

  • num_cpus (int) – Number of CPUs to allocate per task

  • num_gpus (int) – Number of GPUs to allocate per task

  • description (str) – Description for the progress bar

  • unit (str) – Unit label for the progress bar

  • address (Optional[str]) – Ray cluster address to connect to (if None, starts a local cluster)

  • namespace (str) – Ray namespace for the tasks

  • runtime_env (Optional[dict]) – Runtime environment configuration for Ray

Return type:

List[Any]

Returns:

List of results from each function call

POMDPPlanners.utils.distributed_computing.run_parallel_locally(func, kwargs_list, n_jobs=1, description='Running parallel tasks', unit='task', cache_dir=None)[source]

Run a function in parallel with different keyword argument sets using joblib.

Parameters:
  • func (Callable) – The function to run in parallel

  • kwargs_list (List[Dict[str, Any]]) – List of keyword argument dictionaries, where each dict contains the kwargs for one function call

  • n_jobs (int) – Number of parallel jobs to use

  • description (str) – Description for the progress bar

  • unit (str) – Unit label for the progress bar

  • cache_dir (Optional[str]) – Directory to store cached results. If None, caching is disabled.

Return type:

List[Any]

Returns:

List of results from each function call
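The kwargs_list calling convention shared by run_distributed and run_parallel_locally can be illustrated with a plain thread pool (this sketch uses the standard library rather than Ray or joblib, and omits progress bars and caching):

```python
from concurrent.futures import ThreadPoolExecutor

# Each dict in kwargs_list is unpacked into one call of func; results are
# returned in submission order, mirroring the helpers documented above.
def run_parallel_sketch(func, kwargs_list, max_workers=2):
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = [ex.submit(func, **kw) for kw in kwargs_list]
        return [f.result() for f in futures]

def simulate(seed, n_steps):
    return seed * n_steps

results = run_parallel_sketch(
    simulate,
    kwargs_list=[{"seed": 1, "n_steps": 10}, {"seed": 2, "n_steps": 10}],
)
assert results == [10, 20]
```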

POMDPPlanners.utils.hyperparameter_tuning_and_eval module

POMDPPlanners.utils.logger module

class POMDPPlanners.utils.logger.ConditionalMemoryHandler(capacity, target)[source]

Bases: MemoryHandler

Memory handler that buffers logs and only flushes on failure or explicit request.

This handler is designed for failure-only logging scenarios where you want to capture all logs during execution but only write them to disk/console when a failure occurs. This dramatically reduces I/O overhead for successful operations.

Attributes:

should_flush

Flag to manually trigger flushing of buffered logs

shouldFlush(record)[source]

Determine if buffered logs should be flushed.

Flushes occur when:
  • Manual trigger via trigger_flush()

  • ERROR or CRITICAL level message

  • Buffer capacity reached

Parameters:

record (LogRecord) – The log record being processed

Return type:

bool

Returns:

True if logs should be flushed, False otherwise

trigger_flush()[source]

Manually trigger flush of all buffered logs to target handler.
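The flush policy described above can be sketched on top of the standard library's MemoryHandler (an illustrative simplification; the real ConditionalMemoryHandler adds further bookkeeping):

```python
import io
import logging
from logging.handlers import MemoryHandler

# Sketch of the flush policy: manual trigger, ERROR/CRITICAL record, or
# full buffer. Anything below ERROR stays buffered until one of those fires.
class ConditionalMemoryHandlerSketch(MemoryHandler):
    def __init__(self, capacity, target):
        super().__init__(capacity, target=target)
        self.should_flush = False  # manual trigger flag

    def shouldFlush(self, record):
        return (
            self.should_flush
            or record.levelno >= logging.ERROR
            or len(self.buffer) >= self.capacity
        )

    def trigger_flush(self):
        self.should_flush = True
        self.flush()
        self.should_flush = False

stream = io.StringIO()
handler = ConditionalMemoryHandlerSketch(capacity=100, target=logging.StreamHandler(stream))
logger = logging.getLogger("cmh_sketch")
logger.handlers = [handler]
logger.propagate = False
logger.setLevel(logging.DEBUG)

logger.info("buffered quietly")
assert stream.getvalue() == ""            # nothing written yet
logger.error("failure!")                  # ERROR triggers a flush
assert "buffered quietly" in stream.getvalue()
```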

class POMDPPlanners.utils.logger.QueueLoggerManager(max_handlers=100, cleanup_interval=60, handler_timeout=300)[source]

Bases: object

Centralized queue-based logger manager with individual task log files.

This manager solves the “too many open files” problem in heavy multiprocessing scenarios by using a single writer thread that handles all file I/O operations. Worker processes only interact with a memory queue, eliminating file descriptor leaks while maintaining individual log files per task.

Key benefits:
  • Workers never open file handles directly (prevents FD leaks)

  • Single writer thread manages individual file handlers per task

  • Smart handler pooling and cleanup

  • Maintains individual log files as before

  • Scales to thousands of concurrent processes

Parameters:
  • max_handlers (int)

  • cleanup_interval (int)

  • handler_timeout (int)

get_queue_logger(task_id, cache_dir=None, debug=False, console_output=True)[source]

Get a logger that writes to the centralized queue with individual task file.

Parameters:
  • task_id (str) – Unique task identifier for individual log file

  • cache_dir (Optional[Path]) – Directory for log files

  • debug (bool) – Enable debug mode with verbose logging

  • console_output (bool) – Enable/disable console output

Return type:

Logger

Returns:

Configured logger instance that writes to the queue

start()[source]

Start the background logging thread.

stop()[source]

Stop the background logging thread and cleanup resources.

class POMDPPlanners.utils.logger.TaskLoggerManager[source]

Bases: object

Manages task logger configuration and buffered handlers without polluting logger objects.

This manager maintains a registry of configured loggers and their associated memory handlers, eliminating the need to set dynamic attributes on logger objects.

_configured_loggers

Dictionary mapping logger names to their configuration state

_memory_handlers

Dictionary mapping logger names to their ConditionalMemoryHandler list

_lock

Thread lock for safe concurrent access

cleanup_logger(logger_name, episode_failed=False, log_only_on_failure=False)[source]

Clean up task logger resources with buffering awareness.

Handles cleanup for both buffered and non-buffered loggers:
  • For buffered loggers: Flushes on failure, discards on success

  • For non-buffered loggers: Always flushes

Parameters:
  • logger_name (str) – Name of the logger to clean up

  • episode_failed (bool) – Whether the episode failed

  • log_only_on_failure (bool) – Whether buffering is enabled for this logger

Return type:

None

flush_buffered_logs(logger_name)[source]

Flush buffered logs to file/console when failure occurs.

Parameters:

logger_name (str) – Name of the logger to flush

Return type:

None

get_or_create_logger(logger_name, output_dir, debug, console_output, use_queue, log_only_on_failure)[source]

Get or create a task logger with optional buffering.

Parameters:
  • logger_name (str) – Unique name for the logger

  • output_dir (Optional[Path]) – Directory for log files (creates logs/ subdirectory if provided)

  • debug (bool) – Enable debug mode with verbose logging

  • console_output (bool) – Enable/disable console output

  • use_queue (bool) – Enable queue-based logging for multiprocessing

  • log_only_on_failure (bool) – Buffer logs in memory and only flush on failure

Return type:

Logger

Returns:

Configured logger instance ready for task execution

POMDPPlanners.utils.logger.cleanup_all_loggers()[source]

Emergency cleanup of all logging resources.

This function should be called when shutting down the application to ensure all logging resources are properly cleaned up.

POMDPPlanners.utils.logger.cleanup_task_logger(logger_name, episode_failed=False, log_only_on_failure=False)[source]

Clean up task logger resources with buffering awareness.

Handles cleanup for both buffered and non-buffered loggers:
  • For buffered loggers: Flushes on failure, discards on success

  • For non-buffered loggers: Always flushes

Parameters:
  • logger_name (str) – Name of the logger to clean up

  • episode_failed (bool) – Whether the episode failed

  • log_only_on_failure (bool) – Whether buffering is enabled for this logger

Return type:

None

POMDPPlanners.utils.logger.flush_buffered_task_logs(logger_name)[source]

Flush buffered logs to file/console when failure occurs.

This function triggers the flush of all buffered log records for loggers configured with ConditionalMemoryHandler (when log_only_on_failure is enabled).

Parameters:

logger_name (str) – Name of the logger to flush

Return type:

None

POMDPPlanners.utils.logger.get_logger(name, level=20, output_dir=None, debug=False, console_output=True, use_queue=False)[source]

Get a configured logger for POMDP experiments and algorithm execution.

This utility creates standardized loggers for tracking experimental progress, algorithm execution, and debugging information. It supports both console and file logging with configurable levels and formatting.

Parameters:
  • name (str) – Logger identifier (typically module name or algorithm name)

  • level (int) – Base logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)

  • output_dir (Optional[Path]) – Directory for log files (creates logs/ subdirectory if provided)

  • debug (bool) – Enable debug mode with verbose logging and detailed formatting

  • console_output (bool) – Enable/disable console output (useful for batch experiments)

  • use_queue (bool) – Enable queue-based logging for heavy multiprocessing scenarios

Return type:

Logger

Returns:

Configured logger instance ready for use

Note

Queue-based logging is recommended for heavy multiprocessing workloads to prevent “too many open files” errors. It uses a single writer thread to handle all file I/O operations while workers only interact with a memory queue, maintaining individual log files per task.

Example

Basic console logging:

>>> import logging
>>> logger = get_logger("POMCP_Tiger", level=logging.INFO)
>>> logger.info("Starting POMCP planning")

Example

File logging with output directory:

>>> from pathlib import Path
>>> logger = get_logger(
...     name="Experiment",
...     output_dir=Path("/tmp/test_logs"),
...     console_output=False
... )
>>> logger.info("Test message")

Example

Debug mode with detailed logging:

>>> debug_logger = get_logger(
...     name="Debug_Test",
...     debug=True,
...     console_output=False
... )
>>> debug_logger.debug("Debug info")
>>> debug_logger.info("Regular info")
Logging Best Practices:
  • DEBUG: Detailed algorithm internals, variable values

  • INFO: Algorithm progress, major milestones, results

  • WARNING: Suboptimal conditions, parameter issues

  • ERROR: Algorithm failures, invalid inputs

  • CRITICAL: System failures, experiment termination

Note

Log files are automatically organized with timestamps in output_dir/logs/. The logger is thread-safe for use with distributed computing frameworks.

POMDPPlanners.utils.logger.get_queue_logger_diagnostics()[source]

Get diagnostic information about the queue-based logging system.

Return type:

Dict[str, Any]

Returns:

Dictionary containing diagnostic information including queue size, handler count, writer thread status, and resource usage.

POMDPPlanners.utils.logger.get_queue_logger_manager()[source]

Get the global queue logger manager instance.

Return type:

QueueLoggerManager

POMDPPlanners.utils.logger.get_task_logger_manager()[source]

Get the global task logger manager instance.

Return type:

TaskLoggerManager

POMDPPlanners.utils.logger.reset_logger_state()[source]

Reset the global logger state for testing.

This function ensures clean state between test runs by:
  • Stopping any running queue manager

  • Clearing the global singleton

  • Resetting the task logger manager

  • Removing all Python loggers created by this module

POMDPPlanners.utils.logger.setup_task_logger_with_buffering(logger_name, output_dir, debug, console_output, use_queue, log_only_on_failure)[source]

Set up a task logger with optional buffering for failure-only logging.

This is a convenience function that delegates to TaskLoggerManager to avoid setting dynamic attributes on logger objects.

Parameters:
  • logger_name (str) – Unique name for the logger

  • output_dir (Optional[Path]) – Directory for log files (creates logs/ subdirectory if provided)

  • debug (bool) – Enable debug mode with verbose logging

  • console_output (bool) – Enable/disable console output

  • use_queue (bool) – Enable queue-based logging for multiprocessing

  • log_only_on_failure (bool) – Buffer logs in memory and only flush on failure

Return type:

Logger

Returns:

Configured logger instance ready for task execution

POMDPPlanners.utils.memory_tracker module

Memory tracking utilities for POMDP simulations.

This module provides memory tracking capabilities for monitoring memory usage during simulation execution. It includes both lightweight monitoring for production use and detailed profiling for development and debugging.

Key Features:
  • Conditional profiling (only when enabled)

  • Multiple tracking modes (lightweight, detailed, sampling-based)

  • Memory leak detection with configurable thresholds

  • Integration with existing logging infrastructure

  • Support for tracemalloc, psutil, and memory_profiler

class POMDPPlanners.utils.memory_tracker.MemoryTracker(logger=None, enable_tracking=None, tracking_mode='lightweight', sample_rate=1.0, interval_seconds=5.0, leak_threshold_mb=100.0)[source]

Bases: object

Memory tracking utility for simulator operations.

Provides configurable memory monitoring with minimal overhead when disabled. Supports multiple tracking modes and integrates with existing logging.

Parameters:
  • logger (Logger | None)

  • enable_tracking (bool | None)

  • tracking_mode (str)

  • sample_rate (float)

  • interval_seconds (float)

  • leak_threshold_mb (float)

checkpoint(label, force_gc=True)[source]

Record memory checkpoint.

Parameters:
  • label (str) – Label for this checkpoint.

  • force_gc (bool) – Whether to force garbage collection before measurement.

Return type:

Dict[str, Any]

Returns:

Dictionary with memory information.

cleanup()[source]

Clean up tracking resources.

Return type:

None

compare_snapshots(label1, label2, limit=10)[source]

Compare tracemalloc snapshots between two checkpoints.

Parameters:
  • label1 (str) – Label of first checkpoint.

  • label2 (str) – Label of second checkpoint.

  • limit (int) – Maximum number of differences to show.

Return type:

None

detect_memory_leak(threshold_mb=None)[source]

Detect potential memory leaks.

Parameters:

threshold_mb (Optional[float]) – Threshold for leak detection. Uses instance default if None.

Return type:

bool

Returns:

True if potential leak detected.

get_memory_growth()[source]

Get total memory growth from first to last checkpoint.

Return type:

float

get_peak_usage()[source]

Get peak memory usage in MB.

Return type:

float

log_summary()[source]

Log memory usage summary.

Return type:

None

save_report(filepath)[source]

Save memory tracking report to file.

Parameters:

filepath (Path) – Path to save the report.

Return type:

None

POMDPPlanners.utils.memory_tracker.memory_monitor(threshold_mb=1000.0, logger=None, enable_tracking=True)[source]

Context manager for memory monitoring.

Parameters:
  • threshold_mb (float) – Alert threshold for memory usage.

  • logger (Optional[Logger]) – Logger instance for output.

  • enable_tracking (bool) – Whether to enable tracking.

Yields:

MemoryTracker instance.
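A standalone sketch of the memory_monitor idea using only tracemalloc (the real helper yields a full MemoryTracker with psutil-based monitoring; the dict of stats here is an illustrative substitute):

```python
import tracemalloc
from contextlib import contextmanager

# Minimal context manager mirroring memory_monitor's shape: track memory
# while the block runs, then record peak usage and a threshold check.
@contextmanager
def memory_monitor_sketch(threshold_mb=1000.0):
    tracemalloc.start()
    stats = {}
    try:
        yield stats
    finally:
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        stats["peak_mb"] = peak / (1024 * 1024)
        stats["exceeded"] = stats["peak_mb"] > threshold_mb

with memory_monitor_sketch(threshold_mb=1.0) as stats:
    payload = [0] * 500_000  # allocate something measurable

assert stats["peak_mb"] > 0.0  # stats are filled in when the block exits
```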

POMDPPlanners.utils.memory_tracker.profile_memory(func)[source]

Decorator for memory profiling functions.

Only profiles if memory_profiler is available and tracking is enabled.

POMDPPlanners.utils.multivariate_normal module

Module for efficient multivariate normal distribution with pre-computed Cholesky decomposition.

This module provides a multivariate normal distribution implementation optimized for scenarios where the covariance matrix is fixed but the mean varies (e.g., POMDP observation models). The expensive Cholesky decomposition O(n^3) is computed once during initialization, making subsequent sampling and PDF evaluations O(n^2).

Classes:

CovarianceParameterizedMultivariateNormal: Efficient multivariate normal with pre-computed Cholesky

class POMDPPlanners.utils.multivariate_normal.CovarianceParameterizedMultivariateNormal(covariance, validate=True)[source]

Bases: object

Multivariate normal distribution with pre-computed Cholesky decomposition.

This class provides efficient sampling and PDF computation for multivariate normal distributions where the covariance matrix is fixed but the mean can vary. The Cholesky decomposition is computed once during initialization, enabling O(n^2) operations for sampling and PDF evaluation.

This class does NOT inherit from the Distribution base class because it has a different interface - the mean is passed to methods rather than the constructor.

Attributes:
covariance

The covariance matrix (read-only property).

dim

Dimensionality of the distribution.

Example

>>> import numpy as np
>>> np.random.seed(42)
>>>
>>> # Create distribution with fixed covariance
>>> cov = np.array([[1.0, 0.5], [0.5, 2.0]])
>>> mvn = CovarianceParameterizedMultivariateNormal(cov)
>>>
>>> # Sample with different means
>>> mean1 = np.array([0.0, 0.0])
>>> samples1 = mvn.sample(mean1, n_samples=3)
>>> samples1.shape
(3, 2)
>>>
>>> # Compute PDF
>>> values = np.array([[0.0, 0.0], [1.0, 1.0]])
>>> pdf_values = mvn.pdf(values, mean1)
>>> pdf_values.shape
(2,)
property covariance: ndarray

Return a copy of the covariance matrix.

property dim: int

Return the dimensionality of the distribution.

log_pdf(values, mean)[source]

Compute the log probability density function.

More numerically stable than pdf() for small probability values.

Parameters:
  • values (ndarray) – Array of shape (n, dim) or (dim,) containing points to evaluate.

  • mean (ndarray) – Mean vector of shape (dim,).

Return type:

ndarray

Returns:

Array of shape (n,) containing log PDF values at each point.

Raises:

ValueError – If dimensions don’t match.

pdf(values, mean)[source]

Compute the probability density function.

Uses the pre-computed Cholesky decomposition for efficient computation:

y = solve_triangular(L, (x - mean).T)
mahalanobis_sq = sum(y^2)
pdf = exp(log_normalization - 0.5 * mahalanobis_sq)

Parameters:
  • values (ndarray) – Array of shape (n, dim) or (dim,) containing points to evaluate.

  • mean (ndarray) – Mean vector of shape (dim,).

Return type:

ndarray

Returns:

Array of shape (n,) containing PDF values at each point.

Raises:

ValueError – If dimensions don’t match.

sample(mean, n_samples=1)[source]

Sample from the multivariate normal distribution.

Generates samples using the transformation: x = mean + L @ z, where z ~ N(0, I) and L is the Cholesky factor of the covariance.

Parameters:
  • mean (ndarray) – Mean vector of shape (dim,).

  • n_samples (int) – Number of samples to generate. Defaults to 1.

Return type:

ndarray

Returns:

Array of shape (n_samples, dim) containing the samples.

Raises:

ValueError – If mean dimension doesn’t match covariance dimension.
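The transformation above can be sketched directly in NumPy (an illustration of the technique, not the class's implementation):

```python
import numpy as np

# Factor the covariance once, then reuse the Cholesky factor L for any mean.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.5], [0.5, 2.0]])
L = np.linalg.cholesky(cov)              # O(n^3), computed once

def sample_with_mean(mean, n_samples=1):
    z = rng.standard_normal((n_samples, mean.shape[0]))  # z ~ N(0, I)
    return mean + z @ L.T                # x = mean + L z, O(n^2) per sample

samples = sample_with_mean(np.array([0.0, 0.0]), n_samples=2000)
assert samples.shape == (2000, 2)
# The empirical covariance approaches cov as the sample count grows
assert np.allclose(np.cov(samples.T), cov, atol=0.3)
```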

POMDPPlanners.utils.planner_episode_visualization module

POMDPPlanners.utils.simulations_caching module

POMDPPlanners.utils.simulations_caching.cache_episode_simulation_results(environment, policy, initial_belief, results, cache_dir_path, general_config={})[source]
Return type:

None

POMDPPlanners.utils.simulations_caching.get_cache_dir_path(cache_dir_path)[source]
Return type:

Path

Parameters:

cache_dir_path (Path)

POMDPPlanners.utils.simulations_caching.get_cache_key(environment, policy, initial_belief, general_config={})[source]
Return type:

str

POMDPPlanners.utils.simulations_caching.load_episode_simulation_results(environment, policy, initial_belief, cache_dir_path, general_config={})[source]
Return type:

List[History]


POMDPPlanners.utils.statistics_utils module

POMDPPlanners.utils.statistics_utils.aggregate_weights_for_duplicate_values(values, weights)[source]

Aggregate weights for duplicate values to ensure unique values.

When the same value appears multiple times in the values array with different weights, this function combines them into a single entry with the sum of all weights for that value. This is useful for discrete distributions where duplicate values should be treated as a single outcome with aggregated probability.

Parameters:
  • values (ndarray) – Array of values (may contain duplicates)

  • weights (ndarray) – Array of corresponding weights/probabilities

Returns:

  • unique_values: Array of unique values (sorted)

  • aggregated_weights: Array of weights corresponding to unique values, normalized to sum to 1

Return type:

Tuple[ndarray, ndarray]

Raises:

ValueError – If arrays are empty or have mismatched lengths

Example

>>> import numpy as np
>>> from POMDPPlanners.utils.statistics_utils import aggregate_weights_for_duplicate_values
>>> values = np.array([1.0, 2.0, 2.0, 3.0])
>>> weights = np.array([0.3, 0.2, 0.3, 0.2])
>>> unique_vals, agg_weights = aggregate_weights_for_duplicate_values(values, weights)
>>> unique_vals
array([1., 2., 3.])
>>> bool(np.isclose(agg_weights, np.array([0.3, 0.5, 0.2])).all())
True
>>> bool(np.isclose(np.sum(agg_weights), 1.0))
True
POMDPPlanners.utils.statistics_utils.confidence_interval(data, confidence=0.95)[source]

Calculate confidence interval for the mean using t-distribution.

Computes confidence intervals for algorithm performance means, providing statistical bounds on the true expected performance. This is essential for making statistically sound comparisons between POMDP algorithms.

Uses the t-distribution to account for sample size uncertainty, which is more appropriate than normal distribution for small to moderate sample sizes common in POMDP experiments.

Parameters:
  • data (array-like) – Sample data (algorithm returns, rewards, or performance metrics)

  • confidence (float) – Confidence level (0 < confidence < 1, typically 0.95)

Returns:

Tuple of (lower_bound, upper_bound) for the confidence interval

Raises:

ValueError – If insufficient data or contains NaN values

Return type:

Tuple[float, float]

Example

Statistical comparison of algorithm performance:

>>> import numpy as np
>>> # Algorithm performance from multiple runs
>>> pomcp_rewards = [12.3, 11.8, 13.1, 12.7, 11.9, 12.5, 13.0, 12.1, 12.8, 12.4]
>>> pft_rewards = [11.5, 13.2, 12.8, 11.9, 12.3, 13.5, 12.1, 12.9, 11.7, 12.6]
>>>
>>> # Calculate 95% confidence intervals
>>> pomcp_ci = confidence_interval(pomcp_rewards, confidence=0.95)
>>> pft_ci = confidence_interval(pft_rewards, confidence=0.95)
>>>
>>> # Verify confidence intervals are tuples with two elements
>>> isinstance(pomcp_ci, tuple) and len(pomcp_ci) == 2
True
>>> isinstance(pft_ci, tuple) and len(pft_ci) == 2
True
>>>
>>> # Verify confidence intervals contain the mean
>>> pomcp_mean = np.mean(pomcp_rewards)
>>> pft_mean = np.mean(pft_rewards)
>>> bool(pomcp_ci[0] <= pomcp_mean <= pomcp_ci[1])
True
>>> bool(pft_ci[0] <= pft_mean <= pft_ci[1])
True
>>>
>>> # Verify lower bound is less than upper bound
>>> pomcp_ci[0] < pomcp_ci[1]
True
>>> pft_ci[0] < pft_ci[1]
True
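A minimal sketch of such a t-based interval (illustrative only; the helper name t_mean_ci is not part of the library) could look like:

```python
import numpy as np
from scipy import stats

def t_mean_ci(data, confidence=0.95):
    """Two-sided confidence interval for the mean using the t-distribution (sketch)."""
    x = np.asarray(data, dtype=float)
    if x.size < 2 or np.isnan(x).any():
        raise ValueError("need at least 2 non-NaN samples")
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(x.size)                 # standard error of the mean
    t_crit = stats.t.ppf((1.0 + confidence) / 2.0, df=x.size - 1)
    return mean - t_crit * sem, mean + t_crit * sem
```

The t critical value replaces the normal z value, widening the interval appropriately for small sample sizes.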

POMDPPlanners.utils.statistics_utils.cvar_bound_const_eps(y_samp, y_sup, y_inf, eps, alpha=0.05)[source]

Calculate bounds for CVaR using a constant epsilon parameter.

Parameters:
  • y_samp (ndarray) – Array of sample values

  • y_sup (float) – Upper bound of the distribution

  • y_inf (float) – Lower bound of the distribution

  • eps (float) – Epsilon parameter for the bound calculation

  • alpha (float) – Confidence level (default 0.05)

Returns:

(lower_bound, upper_bound) representing the bounds for CVaR

Return type:

tuple[float, float]

Raises:

ValueError – If alpha or eps are not between 0 and 1, or if y_samp is empty

POMDPPlanners.utils.statistics_utils.cvar_confidence_interval(data, alpha=0.95, delta=0.05, dist_lower_bound=None, dist_upper_bound=None)[source]

Calculate the confidence interval for the CVaR of a dataset using the t-distribution.

Parameters:
  • data – Array of values

  • alpha – Confidence level (default 0.95 for 95%)

  • delta – Significance level for the probabilistic bounds (default 0.05)

  • dist_lower_bound (Optional[float]) – Known lower bound of the distribution support. If None, uses min(data) as a conservative data-driven fallback.

  • dist_upper_bound (Optional[float]) – Known upper bound of the distribution support. If None, uses max(data) as a conservative data-driven fallback.

Returns:

(lower_bound, upper_bound) of the confidence interval

Return type:

tuple

Raises:

ValueError – If data contains NaN values or has insufficient samples

POMDPPlanners.utils.statistics_utils.cvar_estimator(vec, alpha)[source]

Calculate Conditional Value at Risk (CVaR) for risk-sensitive POMDP evaluation.

CVaR measures the expected value of the worst-case outcomes, providing a risk-sensitive performance metric that goes beyond simple mean rewards. This is particularly valuable for safety-critical applications where tail risk matters more than average performance.

Mathematical Definition:

CVaR_α(X) = E[X | X ≤ VaR_{1−α}(X)]

Where VaR_{1−α} is the Value at Risk at level 1 − α, i.e. the (1 − α)-quantile of X. For returns, CVaR_α is the expected value over the worst (1 − α) fraction of outcomes.

The implementation uses a vectorized approach for computational efficiency, calculating CVaR by averaging over the tail of the distribution below the (1 − α)-quantile.

Parameters:
  • vec (ndarray) – Array of values (typically returns, costs, or performance metrics)

  • alpha (float) – Confidence level (0 < α ≤ 1), where higher values focus on worse outcomes

Return type:

float

Returns:

CVaR value representing expected worst-case performance

Raises:

ValueError – If alpha is not in (0, 1] or vec is empty

Example

Risk analysis of POMDP algorithm performance:

>>> import numpy as np
>>> from POMDPPlanners.utils.statistics_utils import cvar_estimator
>>> # Simulate algorithm returns from multiple episodes
>>> returns = np.array([12.5, 8.3, 15.7, -2.1, 9.8, 13.2, 6.4, 11.0, -1.5, 14.3])
>>> len(returns)
10
>>> # Calculate risk metrics
>>> mean_return = np.mean(returns)
>>> bool(mean_return > 8.0)  # Check reasonable mean
True
>>> cvar_90 = cvar_estimator(returns, alpha=0.9)  # Worst 10% outcomes
>>> cvar_95 = cvar_estimator(returns, alpha=0.95) # Worst 5% outcomes
>>> isinstance(cvar_90, (float, np.floating))
True
>>> isinstance(cvar_95, (float, np.floating))
True
>>> bool(cvar_95 <= cvar_90)  # CVaR should be lower for higher alpha
True

Example

Comparing algorithm risk profiles:

>>> # Algorithm performance data from experiments
>>> pomcp_returns = np.array([10.2, 12.8, 9.5, 11.3, 8.7, 12.1, 10.9, 9.8, 11.5, 10.4])
>>> pft_returns = np.array([15.1, 7.2, 14.8, 13.3, 6.9, 15.5, 8.1, 14.2, 12.7, 9.3])
>>> pomcp_cvar = cvar_estimator(pomcp_returns, alpha=0.9)
>>> pft_cvar = cvar_estimator(pft_returns, alpha=0.9)

Risk Assessment Applications:

Portfolio Analysis: Compare multiple algorithms’ risk-return profiles

Safety-Critical Systems: Evaluate worst-case performance guarantees

Robust Planning: Select algorithms with acceptable tail risk

Performance Bounds: Establish confidence intervals for worst-case scenarios

Mathematical Properties:
  • Monotonic: for returns, CVaR_α ≤ VaR_{1−α} (CVaR averages the outcomes at or below the quantile)

  • Coherent: Satisfies subadditivity, monotonicity, positive homogeneity

  • Tail Sensitivity: Higher α values emphasize more extreme outcomes

  • Computational: More stable than VaR, especially for small samples
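One common empirical estimator, consistent with the "worst outcomes" reading in the examples above, averages the lowest (1 − α) fraction of samples. A sketch (the name empirical_cvar is illustrative, not the library's implementation):

```python
import numpy as np

def empirical_cvar(vec, alpha):
    """CVaR of returns: mean of the worst (1 - alpha) fraction of samples (sketch)."""
    x = np.sort(np.asarray(vec, dtype=float))       # ascending: worst outcomes first
    if x.size == 0 or not 0.0 < alpha <= 1.0:
        raise ValueError("vec must be non-empty and alpha in (0, 1]")
    tail_mass = 1.0 - alpha
    if tail_mass == 0.0:
        return float(x[0])                          # alpha = 1: the single worst outcome
    k = tail_mass * x.size                          # tail size in samples (may be fractional)
    full = int(np.floor(k))
    frac = k - full
    # average the `full` whole tail samples plus a fractional piece of the next one
    total = x[:full].sum() + (frac * x[full] if full < x.size else 0.0)
    return float(total / k)
```

The fractional-sample handling makes the estimate continuous in alpha rather than jumping at sample boundaries.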

POMDPPlanners.utils.statistics_utils.cvar_estimator_from_dist(values, weights, alpha)[source]

Calculate the Conditional Value at Risk (CVaR) from a discrete distribution.

Parameters:
  • values (ndarray) – Array of values

  • weights (ndarray) – Array of corresponding weights/probabilities

  • alpha (float) – Confidence level (between 0 and 1)

Returns:

The CVaR value

Return type:

float

Raises:

ValueError – If alpha is not between 0 and 1, if arrays are empty, or if weights don’t sum to 1
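A sketch of this weighted variant under the same lower-tail convention (illustrative; the function name is not the library's):

```python
import numpy as np

def cvar_from_discrete(values, weights, alpha):
    """CVaR of a discrete distribution: mean over the worst (1 - alpha) probability mass (sketch)."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    if v.size == 0 or v.size != w.size:
        raise ValueError("values and weights must be non-empty and the same length")
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    order = np.argsort(v)                    # worst values first
    v, w = v[order], w[order]
    tail_mass = 1.0 - alpha
    if tail_mass <= 0.0:
        return float(v[0])
    remaining = tail_mass
    total = 0.0
    for vi, wi in zip(v, w):
        take = min(wi, remaining)            # consume probability mass from the bottom up
        total += take * vi
        remaining -= take
        if remaining <= 0.0:
            break
    return float(total / tail_mass)
```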

POMDPPlanners.utils.statistics_utils.cvar_probabilistic_lower_bound_thomas(vec, alpha, delta, dist_lower_bound)[source]

Calculate a probabilistic lower bound for CVaR using Thomas’s method.

Parameters:
  • vec (ndarray) – Array of values

  • alpha (float) – Confidence level (between 0 and 1)

  • delta (float) – Probability of the bound holding (between 0 and 1)

  • dist_lower_bound (float) – Lower bound of the distribution

Returns:

The probabilistic lower bound for CVaR

Return type:

float

Raises:

ValueError – If alpha or delta are not between 0 and 1, or if vec is empty

POMDPPlanners.utils.statistics_utils.cvar_probabilistic_upper_bound_thomas(vec, alpha, delta, dist_upper_bound)[source]

Calculate a probabilistic upper bound for CVaR using Thomas’s method.

Parameters:
  • vec (ndarray) – Array of values

  • alpha (float) – Confidence level (between 0 and 1)

  • delta (float) – Probability of the bound holding (between 0 and 1)

  • dist_upper_bound (float) – Upper bound of the distribution

Returns:

The probabilistic upper bound for CVaR

Return type:

float

Raises:

ValueError – If alpha or delta are not between 0 and 1, or if vec is empty

POMDPPlanners.utils.statistics_utils.get_min_and_max_cost(min_immediate_cost, max_immediate_cost, depth, max_depth, gamma)[source]

Calculate the minimum and maximum costs over a time horizon using a discount factor.

Parameters:
  • min_immediate_cost (float) – Minimum immediate cost

  • max_immediate_cost (float) – Maximum immediate cost

  • depth (int) – Current depth in the search tree

  • max_depth (int) – Maximum depth of the search tree

  • gamma (float) – Discount factor (between 0 and 1)

Returns:

(min_cost, max_cost) representing the minimum and maximum costs over the time horizon

Return type:

tuple[float, float]

Raises:

ValueError – If gamma is not between 0 and 1
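The horizon bounds follow from a geometric series over the remaining H = max_depth − depth steps; a sketch (name illustrative, and the exact endpoint handling for gamma = 1 may differ in the library):

```python
def discounted_cost_bounds(min_immediate_cost, max_immediate_cost, depth, max_depth, gamma):
    """Min/max cumulative discounted cost over the remaining horizon (sketch)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be between 0 and 1")
    horizon = max_depth - depth              # remaining steps
    # sum_{t=0}^{horizon-1} gamma^t, handling the gamma -> 1 limit explicitly
    geo = horizon if gamma == 1.0 else (1.0 - gamma ** horizon) / (1.0 - gamma)
    return min_immediate_cost * geo, max_immediate_cost * geo
```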

POMDPPlanners.utils.statistics_utils.quantile_confidence_interval(data, alpha=0.95, conf_level=0.95)[source]

Parameters:
  • data (array-like) – 1D array of samples

  • alpha (float) – Target quantile (e.g. 0.95 for the 95% VaR)

  • conf_level (float) – Overall coverage (e.g. 0.95 for a 95% CI)

Returns:

(lower_value, upper_value, k1, k2)
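One standard distribution-free construction for such a quantile CI uses binomial order statistics; a sketch returning the same (lower, upper, k1, k2) shape (illustrative, not necessarily the library's exact method):

```python
import numpy as np
from scipy import stats

def quantile_ci(data, alpha=0.95, conf_level=0.95):
    """Distribution-free CI for the alpha-quantile via order statistics (sketch)."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    tail = (1.0 - conf_level) / 2.0
    # order-statistic indices bracketing the alpha-quantile, from the Binomial(n, alpha) CDF
    k1 = int(stats.binom.ppf(tail, n, alpha))
    k2 = int(stats.binom.ppf(1.0 - tail, n, alpha))
    k1 = max(k1, 0)
    k2 = min(k2, n - 1)
    return x[k1], x[k2], k1, k2
```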

POMDPPlanners.utils.statistics_utils.tv_distance(p, q, n_samples=1000, method='grid', **kwargs)[source]

Compute Total Variation distance between two distributions.

Total Variation distance measures how different two probability distributions are, with values ranging from 0 (identical) to 1 (completely different).

Parameters:
  • p (Distribution) – First distribution

  • q (Distribution) – Second distribution

  • n_samples (int) – Number of samples (method-dependent usage)

  • method (str) – Estimation method - “grid”, “monte_carlo”, “averaged”, “mixture”

  • **kwargs – Additional method-specific parameters. For “grid”: x_min, x_max, n_points; for “averaged”: n_runs; for “mixture” and “monte_carlo”: n_samples is used directly

Return type:

float

Returns:

TV distance estimate

Example

>>> import numpy as np
>>> from POMDPPlanners.core.distributions import DiscreteDistribution
>>> np.random.seed(42)
>>> # Create two similar discrete distributions
>>> values = [0, 1, 2, 3, 4]
>>> p_probs = np.array([0.2, 0.3, 0.25, 0.15, 0.1])
>>> q_probs = np.array([0.25, 0.25, 0.25, 0.15, 0.1])
>>> p = DiscreteDistribution(values, p_probs)
>>> q = DiscreteDistribution(values, q_probs)
>>> tv = tv_distance(p, q, method="monte_carlo", n_samples=10000)
>>> isinstance(tv, float)
True
>>> 0.0 <= tv <= 1.0
True
POMDPPlanners.utils.statistics_utils.tv_distance_averaged(p, q, n_samples=1000, n_runs=10)[source]

Compute TV distance by averaging multiple independent estimates.

Averaging n_runs independent estimates reduces the variance by a factor of n_runs (the standard error by a factor of sqrt(n_runs)) compared to a single run.

Parameters:
  • p (Distribution) – First distribution

  • q (Distribution) – Second distribution

  • n_samples (int) – Number of samples per run

  • n_runs (int) – Number of independent runs to average

Return type:

float

Returns:

Average TV distance estimate

POMDPPlanners.utils.statistics_utils.tv_distance_grid(p, q, x_min=-5.0, x_max=5.0, n_points=10000)[source]

Compute TV distance using grid-based numerical integration.

This method has zero sampling variance and is deterministic. Works well for continuous distributions with known support.

Parameters:
  • p (Distribution) – First distribution

  • q (Distribution) – Second distribution

  • x_min (float) – Lower bound of integration range

  • x_max (float) – Upper bound of integration range

  • n_points (int) – Number of grid points (higher = more accurate)

Return type:

float

Returns:

TV distance between p and q
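The grid method is, in essence, a numerical evaluation of TV(p, q) = ½ ∫ |p(x) − q(x)| dx. A self-contained sketch for two Gaussian densities (illustrative, independent of the library's Distribution interface):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def tv_grid(pdf_p, pdf_q, x_min=-5.0, x_max=5.0, n_points=10000):
    """Deterministic TV estimate: 0.5 * integral of |p - q| over a grid (sketch)."""
    xs = np.linspace(x_min, x_max, n_points)
    dx = xs[1] - xs[0]
    diff = np.abs(pdf_p(xs) - pdf_q(xs))
    return 0.5 * float(diff.sum() * dx)    # Riemann-sum approximation of the integral

# TV between N(0,1) and N(1,1); the closed form is 2*Phi(1/2) - 1
tv = tv_grid(lambda x: gaussian_pdf(x, 0.0, 1.0),
             lambda x: gaussian_pdf(x, 1.0, 1.0),
             x_min=-6.0, x_max=7.0)
```

Because no sampling is involved, repeated calls return identical values; accuracy is governed only by the grid resolution and the integration range covering the distributions' support.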

POMDPPlanners.utils.statistics_utils.tv_distance_mixture_sampling(p, q, n_samples=2000)[source]

Compute TV distance using mixture sampling for better coverage.

Samples from mixture (p + q) / 2 to ensure good coverage of both distributions’ support.

Parameters:
  • p (Distribution) – First distribution

  • q (Distribution) – Second distribution

  • n_samples (int) – Number of samples drawn from the mixture (p + q) / 2

Return type:

float

Returns:

TV distance estimate

POMDPPlanners.utils.statistics_utils.tv_distance_monte_carlo(p, q, n_samples=1000)[source]

Compute TV distance using basic Monte Carlo sampling (original method).

This is the original implementation that samples randomly from both distributions. Has higher variance than other methods.

Parameters:
  • p (Distribution) – First distribution

  • q (Distribution) – Second distribution

  • n_samples (int) – Number of samples drawn from each distribution

Return type:

float

Returns:

TV distance estimate

POMDPPlanners.utils.tree_statistics module

class POMDPPlanners.utils.tree_statistics.TreeMetrics(*values)[source]

Bases: Enum

Metric names for MCTS tree statistics.

ACTIONS_VISIT_COUNT_ENTROPY = 'actions_visit_count_entropy'
IS_LEAF = 'is_leaf'
MAX_ACTIONS_VISIT_COUNT = 'max_actions_visit_count'
MIN_ACTIONS_VISIT_COUNT = 'min_actions_visit_count'
N_ACTIONS_FROM_ROOT = 'n_actions_from_root'
ROOT_VISIT_COUNT = 'root_visit_count'
TREE_MAX_DEPTH = 'tree_max_depth'
POMDPPlanners.utils.tree_statistics.compute_tree_metrics(tree)[source]

Compute comprehensive statistics for MCTS tree analysis and debugging.

Extracts key metrics from MCTS search trees to understand algorithm behavior, convergence properties, and search quality. These metrics are essential for algorithm debugging, parameter tuning, and performance analysis.

The function analyzes the root belief node’s action children to compute visitation statistics and exploration patterns that indicate search quality.

Parameters:

tree (BeliefNode) – Root belief node of the MCTS search tree

Returns:

  • min_actions_visit_count: Minimum visits to any action

  • max_actions_visit_count: Maximum visits to any action

  • actions_visit_count_entropy: Shannon entropy of action visit distribution

  • n_actions_from_root: Number of actions from the root node

  • root_visit_count: Number of visits to the root node

Return type:

List[PolicyInfoVariable]

Raises:

TypeError – If tree is not a BeliefNode instance

Example

>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> from POMDPPlanners.utils.tree_statistics import compute_tree_metrics
>>> from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>>
>>> # Create POMCP planner and run planning
>>> env = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(
...     environment=env,
...     discount_factor=0.95,
...     depth=20,
...     exploration_constant=1.0,
...     name="POMCP_Analysis",
...     n_simulations=100
... )
>>>
>>> initial_belief = get_initial_belief(env, n_particles=200)
>>> action, run_data = planner.action(initial_belief)
>>>
>>> # Extract tree metrics from run data
>>> metrics = run_data.info_variables
>>> len(metrics) > 0
True
>>> import numbers
>>> isinstance(metrics[0].value, numbers.Number)
True

Metric Interpretation:

min_actions_visit_count:
  • Higher values indicate more balanced exploration

  • Very low values suggest some actions are barely explored

  • Zero indicates a leaf node (no actions expanded)

max_actions_visit_count:
  • Shows the most visited action’s exploration intensity

  • Compare with total simulations to assess concentration

  • Higher values indicate convergence to a preferred action

actions_visit_count_entropy:
  • Measures uniformity of action exploration

  • Higher entropy = more uniform exploration

  • Lower entropy = more concentrated search

  • log₂(|A|) is the maximum possible entropy for |A| actions

Visit Ratio (max/min):
  • 1.0 = perfectly uniform exploration

  • Higher values = more concentrated/converged search

  • Very high ratios may indicate premature convergence
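The entropy metric above can be computed directly from the root's action visit counts; a small sketch (the helper name is illustrative):

```python
import numpy as np

def visit_count_entropy(visit_counts):
    """Shannon entropy (in bits) of the normalized action visit distribution (sketch)."""
    counts = np.asarray(visit_counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]                       # by convention, 0 * log(0) contributes 0
    return float(-np.sum(p * np.log2(p)))
```

Uniform visits over 4 actions give log₂(4) = 2 bits; visits concentrated on a single action give 0.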

Algorithm Debugging Applications:

Parameter Tuning:
  • Low entropy → increase the exploration parameter

  • High minimum visits but low entropy → decrease the number of simulations

  • Extreme visit ratios → adjust the exploration/exploitation balance

Convergence Analysis:
  • Entropy decreasing over time indicates convergence

  • Stable visit ratios suggest the algorithm has converged

  • Oscillating metrics may indicate unstable parameters

Comparative Analysis:
  • Compare entropy across algorithms to assess exploration quality

  • Use visit patterns to understand different search strategies

  • Identify algorithms with better exploration-exploitation tradeoffs

POMDPPlanners.utils.tree_statistics.get_v_values_sample(action_node)[source]
Return type:

ndarray

Parameters:

action_node (ActionNode)

POMDPPlanners.utils.weighted_particle_beliefs module

class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefContinuousLightDarkFullCoverage(particles, log_weights, ess_factor=0.5, reinvigoration_fraction=0.05, reinvigoration_cov_matrix=array([[1., 0.], [0., 1.]]))[source]

Bases: WeightedParticleBeliefReinvigoration

Parameters:
  • particles – Initial particle set

  • log_weights – Log-weights of the particles

  • ess_factor (float) – Effective sample size threshold factor

  • reinvigoration_fraction (float) – Fraction of particles to reinvigorate

  • reinvigoration_cov_matrix (ndarray) – Covariance matrix used for reinvigoration
reinvigorate(action, observation, pomdp, belief)[source]

Implement reinvigoration for specific POMDP environment.

Return type:

WeightedParticleBeliefReinvigoration

Parameters:
  • action – The action that was executed

  • observation – The observation that was received

  • pomdp – The POMDP environment

  • belief – The belief to reinvigorate
class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefDiscreteLightDark(particles, log_weights, resampling=False, ess_factor=0.5, reinvigoration_fraction=0.2)[source]

Bases: WeightedParticleBeliefReinvigoration

Parameters:
  • particles – Initial particle set

  • log_weights – Log-weights of the particles

  • resampling (bool) – Whether to resample on update

  • ess_factor (float) – Effective sample size threshold factor

  • reinvigoration_fraction (float) – Fraction of particles to reinvigorate
reinvigorate(action, observation, pomdp, belief)[source]

Implement reinvigoration for specific POMDP environment.

Return type:

WeightedParticleBeliefReinvigoration

Parameters:
  • action – The action that was executed

  • observation – The observation that was received

  • pomdp – The POMDP environment

  • belief – The belief to reinvigorate
class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefDiscreteLightDarkFullCoverage(particles, log_weights, ess_factor=0.5, reinvigoration_fraction=0.05)[source]

Bases: WeightedParticleBeliefReinvigoration

Parameters:
  • particles – Initial particle set

  • log_weights – Log-weights of the particles

  • ess_factor (float) – Effective sample size threshold factor

  • reinvigoration_fraction (float) – Fraction of particles to reinvigorate
reinvigorate(action, observation, pomdp, belief)[source]

Implement reinvigoration for specific POMDP environment.

Return type:

WeightedParticleBeliefReinvigoration

Parameters:
  • action – The action that was executed

  • observation – The observation that was received

  • pomdp – The POMDP environment

  • belief – The belief to reinvigorate
class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefSanityPOMDP(particles, log_weights, resampling=False, ess_factor=0.5, reinvigoration_fraction=0.2)[source]

Bases: WeightedParticleBeliefReinvigoration

Parameters:
  • particles – Initial particle set

  • log_weights – Log-weights of the particles

  • resampling (bool) – Whether to resample on update

  • ess_factor (float) – Effective sample size threshold factor

  • reinvigoration_fraction (float) – Fraction of particles to reinvigorate
reinvigorate(action, observation, pomdp, belief)[source]

Reinvigorate particles by sampling from initial state distribution.

Return type:

Belief

Parameters:
  • action – The action that was executed

  • observation – The observation that was received

  • pomdp – The POMDP environment

  • belief – The belief to reinvigorate
POMDPPlanners.utils.weighted_particle_beliefs.create_belief(environment, belief_config)[source]

Create a belief instance from a belief config.

Parameters:
  • environment (Environment) – The POMDP environment

  • belief_config (BeliefConfig) – BeliefConfig object for the belief

Return type:

Belief

Returns:

An instance of the specified belief class

POMDPPlanners.utils.weighted_particle_beliefs.get_initial_belief(environment, n_particles, resampling=True)[source]

Create initial belief from environment’s initial state distribution.

Return type:

Belief

Parameters:
  • environment (Environment) – The POMDP environment

  • n_particles (int) – Number of particles in the initial belief

  • resampling (bool) – Whether to enable resampling