POMDPPlanners.utils package

Subpackages

Submodules

POMDPPlanners.utils.action_samplers module

class POMDPPlanners.utils.action_samplers.DiscreteActionSampler(actions=None)[source]

Bases: ActionSampler

Simple action sampler for discrete action spaces.

This class is designed to be fully serializable for use in parallel processing environments like joblib.

Parameters:

actions (List[Any] | None)

sample(belief_node=None)[source]

Sample a random action from the discrete action space.

Parameters:

belief_node (Optional[BeliefNode]) – Optional belief node (unused in this implementation)

Return type:

Any

Returns:

Randomly sampled action from the action space
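As a behavioral sketch (the class below is illustrative only; the real implementation is POMDPPlanners.utils.action_samplers.DiscreteActionSampler):

```python
import random

# Illustrative sketch of DiscreteActionSampler's behavior; the real class
# lives in POMDPPlanners.utils.action_samplers and is kept fully
# serializable for use in joblib workers.
class DiscreteActionSamplerSketch:
    def __init__(self, actions=None):
        self.actions = list(actions) if actions else []

    def sample(self, belief_node=None):
        # belief_node is accepted for interface compatibility but unused
        return random.choice(self.actions)

sampler = DiscreteActionSamplerSketch(actions=["listen", "open_left", "open_right"])
assert sampler.sample() in ["listen", "open_left", "open_right"]
```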

class POMDPPlanners.utils.action_samplers.UnitCircleActionSampler(max_action_magnitude=1.0)[source]

Bases: ActionSampler

Action sampler for 2D continuous action spaces within a unit circle.

This sampler generates 2D action vectors uniformly distributed within a circle of specified maximum magnitude. It’s particularly useful for navigation and continuous control problems where actions represent velocities or forces constrained to a circular region.

The sampler uses polar coordinate generation to ensure uniform distribution within the circle, avoiding the clustering near the center that would occur with naive rectangular sampling.

Mathematical Foundation:
  • Angle θ ~ Uniform(0, 2π)

  • Radius r ~ √Uniform(0, 1) × max_magnitude

  • Action = [r·cos(θ), r·sin(θ)]

The square root transformation for radius ensures uniform area distribution within the circle rather than biasing toward the center.
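The polar-coordinate procedure above can be sketched in a few lines (a standalone illustration, not the library's implementation):

```python
import math
import random

def sample_in_circle(max_action_magnitude=1.0):
    """Uniform sample inside a circle via the polar method described above."""
    theta = random.uniform(0.0, 2.0 * math.pi)             # angle ~ Uniform(0, 2pi)
    r = math.sqrt(random.random()) * max_action_magnitude  # sqrt(U) for uniform area
    return (r * math.cos(theta), r * math.sin(theta))

x, y = sample_in_circle(max_action_magnitude=2.0)
assert math.hypot(x, y) <= 2.0  # sample always lies inside the circle
```

Dropping the square root (r = U × max_magnitude) would concentrate samples near the origin, since small radii enclose disproportionately little area.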

Parameters:

max_action_magnitude (float) – Maximum magnitude of action vectors (circle radius)

sample(belief_node=None)[source]

Sample an action from a unit circle.

Parameters:

belief_node (Optional[BeliefNode]) – The current belief node (not used in this implementation)

Returns:

A 2D action vector within the unit circle

Return type:

ndarray

POMDPPlanners.utils.belief_factory module

Top-level factory for creating environment-specific belief states.

This module provides a unified entry-point for constructing ready-to-use Belief objects for any POMDP environment in the library. It dispatches to per-environment factories when a custom belief implementation exists (e.g., vectorized particle filters or Gaussian beliefs), and falls back to a generic WeightedParticleBelief otherwise.

Classes:

BeliefType: Enum of supported belief representations.

Functions:

create_environment_belief: Top-level factory returning a configured Belief.

class POMDPPlanners.utils.belief_factory.BeliefType(*values)[source]

Bases: Enum

Supported belief representations.

PARTICLE

Standard weighted particle belief.

VECTORIZED_PARTICLE

Vectorized weighted particle belief with batched NumPy updates.

GAUSSIAN

Single Gaussian (mean + covariance) belief.

GAUSSIAN_MIXTURE

Gaussian mixture model belief.

GAUSSIAN = 'gaussian'
GAUSSIAN_MIXTURE = 'gaussian_mixture'
PARTICLE = 'particle'
VECTORIZED_PARTICLE = 'vectorized_particle'
POMDPPlanners.utils.belief_factory.create_environment_belief(env, belief_type=None, n_particles=200, **kwargs)[source]

Create a ready-to-use belief for the given environment.

When belief_type is None the environment’s default belief type is used (typically VECTORIZED_PARTICLE for environments that have a custom updater, or PARTICLE otherwise).

For environments without a registered per-environment factory the function falls back to a generic get_initial_belief() producing a WeightedParticleBelief.

Parameters:
  • env (Environment) – POMDP environment instance.

  • belief_type (BeliefType | None) – Desired belief representation. None selects the environment default.

  • n_particles (int) – Number of particles (used by PARTICLE and VECTORIZED_PARTICLE types). Defaults to 200.

  • **kwargs (Any) – Forwarded to per-environment factories (e.g. updater_type for Gaussian light-dark beliefs).

Return type:

Belief

Returns:

A configured Belief object.

Raises:

ValueError – If belief_type is not supported by the environment.

Example

>>> import numpy as np
>>> np.random.seed(42)
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> env = TigerPOMDP(discount_factor=0.95)
>>> belief = create_environment_belief(env, n_particles=100)
>>> belief.sample() in ["tiger_left", "tiger_right"]
True

POMDPPlanners.utils.config_loader module

POMDPPlanners.utils.config_loader.load_config(config_path)[source]

Load and parse YAML configuration files for POMDP experiments.

This utility function provides a standardized way to load experimental configurations from YAML files. It’s commonly used to configure environments, planners, and simulation parameters for reproducible experiments.

The function uses PyYAML’s safe_load to prevent execution of arbitrary Python code, making it secure for loading untrusted configuration files.

Parameters:

config_path (str) – Path to the YAML configuration file (absolute or relative)

Return type:

Dict[str, Any]

Returns:

Dictionary containing the parsed configuration parameters

Raises:
  • FileNotFoundError – If the configuration file doesn’t exist

  • yaml.YAMLError – If the file contains invalid YAML syntax

Example

Loading experiment configuration:

>>> from POMDPPlanners.utils.config_loader import load_config
>>> import tempfile
>>> import os
>>> # Create a temporary config file for testing
>>> config_content = '''
... environment:
...   name: "TigerPOMDP"
...   discount_factor: 0.95
... planners:
...   - name: "POMCP"
...     n_simulations: 1000
... simulation:
...   episodes_per_run: 100
... '''
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
...     _ = f.write(config_content)
...     temp_config_path = f.name
>>> # Load experiment configuration
>>> config = load_config(temp_config_path)
>>> config['environment']['name']
'TigerPOMDP'
>>> config['environment']['discount_factor']
0.95
>>> config['planners'][0]['name']
'POMCP'
>>> config['simulation']['episodes_per_run']
100
>>> # Clean up
>>> os.unlink(temp_config_path)

Example

Using with environment configuration:

>>> # Create a temporary config file with environment and planners
>>> config_content = '''
... environment:
...   name: "TigerPOMDP"
...   discount_factor: 0.95
...   observation_accuracy: 0.85
... planners:
...   - name: "POMCP"
...     n_simulations: 100
...     depth: 5
...     exploration_constant: 1.0
... simulation:
...   episodes_per_run: 10
...   num_runs: 2
... '''
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
...     _ = f.write(config_content)
...     temp_config_path = f.name
>>> config = load_config(temp_config_path)
>>> # Create environment from config
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> env = TigerPOMDP(discount_factor=config['environment']['discount_factor'])
>>> env.discount_factor
0.95
>>> # Access planner configuration
>>> planner_config = config['planners'][0]
>>> planner_config['name']
'POMCP'
>>> planner_config['n_simulations']
100
>>> # Clean up
>>> os.unlink(temp_config_path)

Example

Handling configuration hierarchies and defaults:

>>> # Create a nested configuration file
>>> nested_config_content = '''
... defaults:
...   simulation:
...     episodes: 50
...     particles: 100
... experiments:
...   quick_test:
...     environment: "SanityPOMDP"
...     planners: ["POMCP"]
...   full_study:
...     environment: "TigerPOMDP"
...     planners: ["POMCP", "PFT_DPW"]
...     simulation:
...       episodes: 200
...       particles: 500
... '''
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
...     _ = f.write(nested_config_content)
...     temp_config_path = f.name
>>> config = load_config(temp_config_path)
>>> # Access different experiment configurations
>>> defaults = config['defaults']
>>> quick_config = config['experiments']['quick_test']
>>> full_config = config['experiments']['full_study']
>>> # Verify configuration structure
>>> defaults['simulation']['episodes']
50
>>> quick_config['environment']
'SanityPOMDP'
>>> full_config['simulation']['episodes']
200
>>> # Simple config merging test
>>> merged = {**defaults['simulation'], **full_config.get('simulation', {})}
>>> merged['episodes']
200
>>> merged['particles']
500
>>> # Clean up
>>> os.unlink(temp_config_path)
Configuration Best Practices:

File Organization:
  • Use descriptive filenames (e.g., pomcp_tiger_baseline.yaml)

  • Organize configs by environment or study type

  • Include version information in complex configurations

Parameter Naming:
  • Use consistent naming conventions across configurations

  • Group related parameters under sections

  • Include comments explaining non-obvious parameters

Default Handling:
  • Define sensible defaults for optional parameters

  • Use inheritance or merging for parameter variants

  • Validate required parameters after loading

Security Considerations:
  • Uses yaml.safe_load() to prevent code execution

  • Suitable for loading user-provided configuration files

  • Automatically handles standard YAML data types safely

  • Does not support custom Python object instantiation

POMDPPlanners.utils.config_to_id module

class POMDPPlanners.utils.config_to_id.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: JSONEncoder

Custom JSON encoder for handling NumPy arrays and other NumPy types

default(o)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
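A hypothetical sketch of what a NumPy-aware default() can look like (the real encoder is POMDPPlanners.utils.config_to_id.NumpyEncoder; the exact conversions here are assumptions). The NumPy import is guarded so the sketch stays runnable without NumPy installed:

```python
import json

# Hypothetical sketch of a NumPy-aware JSON encoder; conversions below are
# illustrative assumptions, not the library's exact behavior.
class NumpyEncoderSketch(json.JSONEncoder):
    def default(self, o):
        try:
            import numpy as np
            if isinstance(o, np.ndarray):
                return o.tolist()   # arrays become nested lists
            if isinstance(o, np.generic):
                return o.item()     # NumPy scalars become Python scalars
        except ImportError:
            pass
        # Let the base class raise TypeError for anything else
        return super().default(o)

encoded = json.dumps({"gamma": 0.95, "dims": [2, 3]}, cls=NumpyEncoderSketch)
assert json.loads(encoded) == {"gamma": 0.95, "dims": [2, 3]}
```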
POMDPPlanners.utils.config_to_id.config_to_id(config_dict)[source]

Generate a unique ID from a configuration dictionary using hashing. Handles NumPy arrays and other NumPy types.

Parameters:

config_dict (dict) – The configuration dictionary to hash

Returns:

A unique hash string representing the configuration

Return type:

str
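The core idea — serialize deterministically, then hash — can be sketched as follows (a standalone illustration; the hypothetical default=str stands in for the real NumPy handling, and the hash algorithm is an assumption):

```python
import hashlib
import json

# Standalone sketch of the config_to_id idea: serialize the dict with
# sorted keys so key order never changes the ID, then hash the result.
def config_to_id_sketch(config_dict):
    serialized = json.dumps(config_dict, sort_keys=True, default=str)
    return hashlib.sha256(serialized.encode()).hexdigest()

id_a = config_to_id_sketch({"planner": "POMCP", "n_simulations": 1000})
id_b = config_to_id_sketch({"n_simulations": 1000, "planner": "POMCP"})
assert id_a == id_b  # key order does not change the ID
```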

POMDPPlanners.utils.distributed_computing module

POMDPPlanners.utils.distributed_computing.run_distributed(func, kwargs_list, num_cpus=1, num_gpus=0, description='Running distributed tasks', unit='task', address=None, namespace='POMDPPlanners', runtime_env=None)[source]

Run a function in parallel across multiple machines using Ray.

Parameters:
  • func (Callable) – The function to run in parallel

  • kwargs_list (List[Dict[str, Any]]) – List of keyword argument dictionaries, where each dict contains the kwargs for one function call

  • num_cpus (int) – Number of CPUs to allocate per task

  • num_gpus (int) – Number of GPUs to allocate per task

  • description (str) – Description for the progress bar

  • unit (str) – Unit label for the progress bar

  • address (Optional[str]) – Ray cluster address to connect to (if None, starts a local cluster)

  • namespace (str) – Ray namespace for the tasks

  • runtime_env (Optional[dict]) – Runtime environment configuration for Ray

Return type:

List[Any]

Returns:

List of results from each function call

POMDPPlanners.utils.distributed_computing.run_parallel_locally(func, kwargs_list, n_jobs=1, description='Running parallel tasks', unit='task', cache_dir=None)[source]

Run a function in parallel with different keyword argument sets using joblib.

Parameters:
  • func (Callable) – The function to run in parallel

  • kwargs_list (List[Dict[str, Any]]) – List of keyword argument dictionaries, where each dict contains the kwargs for one function call

  • n_jobs (int) – Number of parallel jobs to use

  • description (str) – Description for the progress bar

  • unit (str) – Unit label for the progress bar

  • cache_dir (Optional[str]) – Directory to store cached results. If None, caching is disabled.

Return type:

List[Any]

Returns:

List of results from each function call
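The kwargs_list calling convention shared by run_distributed and run_parallel_locally can be illustrated with a plain thread pool (this sketch uses the standard library rather than Ray or joblib, and omits progress bars and caching):

```python
from concurrent.futures import ThreadPoolExecutor

# Each dict in kwargs_list is unpacked into one call of func; results are
# returned in submission order, mirroring the helpers documented above.
def run_parallel_sketch(func, kwargs_list, max_workers=2):
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = [ex.submit(func, **kw) for kw in kwargs_list]
        return [f.result() for f in futures]

def simulate(seed, n_steps):
    return seed * n_steps

results = run_parallel_sketch(
    simulate,
    kwargs_list=[{"seed": 1, "n_steps": 10}, {"seed": 2, "n_steps": 10}],
)
assert results == [10, 20]
```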

POMDPPlanners.utils.hyperparameter_tuning_and_eval module

POMDPPlanners.utils.logger module

class POMDPPlanners.utils.logger.ConditionalMemoryHandler(capacity, target)[source]

Bases: MemoryHandler

Memory handler that buffers logs and only flushes on failure or explicit request.

This handler is designed for failure-only logging scenarios where you want to capture all logs during execution but only write them to disk/console when a failure occurs. This dramatically reduces I/O overhead for successful operations.

Attributes:

should_flush

Flag to manually trigger flushing of buffered logs

shouldFlush(record)[source]

Determine if buffered logs should be flushed.

Flushes occur when:
  • Manual trigger via trigger_flush()

  • ERROR or CRITICAL level message

  • Buffer capacity reached

Parameters:

record (LogRecord) – The log record being processed

Return type:

bool

Returns:

True if logs should be flushed, False otherwise

trigger_flush()[source]

Manually trigger flush of all buffered logs to target handler.
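The flush policy described above can be sketched on top of the standard library's MemoryHandler (an illustrative simplification; the real ConditionalMemoryHandler adds further bookkeeping):

```python
import io
import logging
from logging.handlers import MemoryHandler

# Sketch of the flush policy: manual trigger, ERROR/CRITICAL record, or
# full buffer. Anything below ERROR stays buffered until one of those fires.
class ConditionalMemoryHandlerSketch(MemoryHandler):
    def __init__(self, capacity, target):
        super().__init__(capacity, target=target)
        self.should_flush = False  # manual trigger flag

    def shouldFlush(self, record):
        return (
            self.should_flush
            or record.levelno >= logging.ERROR
            or len(self.buffer) >= self.capacity
        )

    def trigger_flush(self):
        self.should_flush = True
        self.flush()
        self.should_flush = False

stream = io.StringIO()
handler = ConditionalMemoryHandlerSketch(capacity=100, target=logging.StreamHandler(stream))
logger = logging.getLogger("cmh_sketch")
logger.handlers = [handler]
logger.propagate = False
logger.setLevel(logging.DEBUG)

logger.info("buffered quietly")
assert stream.getvalue() == ""            # nothing written yet
logger.error("failure!")                  # ERROR triggers a flush
assert "buffered quietly" in stream.getvalue()
```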

class POMDPPlanners.utils.logger.QueueLoggerManager(max_handlers=100, cleanup_interval=60, handler_timeout=300)[source]

Bases: object

Centralized queue-based logger manager with individual task log files.

This manager solves the “too many open files” problem in heavy multiprocessing scenarios by using a single writer thread that handles all file I/O operations. Worker processes only interact with a memory queue, eliminating file descriptor leaks while maintaining individual log files per task.

Key benefits:
  • Workers never open file handles directly (prevents FD leaks)

  • Single writer thread manages individual file handlers per task

  • Smart handler pooling and cleanup

  • Maintains individual log files as before

  • Scales to thousands of concurrent processes

Parameters:
  • max_handlers (int)

  • cleanup_interval (int)

  • handler_timeout (int)

get_queue_logger(task_id, cache_dir=None, debug=False, console_output=True)[source]

Get a logger that writes to the centralized queue with individual task file.

Parameters:
  • task_id (str) – Unique task identifier for individual log file

  • cache_dir (Optional[Path]) – Directory for log files

  • debug (bool) – Enable debug mode with verbose logging

  • console_output (bool) – Enable/disable console output

Return type:

Logger

Returns:

Configured logger instance that writes to the queue

start()[source]

Start the background logging thread.

stop()[source]

Stop the background logging thread and cleanup resources.

class POMDPPlanners.utils.logger.TaskLoggerManager[source]

Bases: object

Manages task logger configuration and buffered handlers without polluting logger objects.

This manager maintains a registry of configured loggers and their associated memory handlers, eliminating the need to set dynamic attributes on logger objects.

_configured_loggers

Dictionary mapping logger names to their configuration state

_memory_handlers

Dictionary mapping logger names to their ConditionalMemoryHandler list

_lock

Thread lock for safe concurrent access

cleanup_logger(logger_name, episode_failed=False, log_only_on_failure=False)[source]

Clean up task logger resources with buffering awareness.

Handles cleanup for both buffered and non-buffered loggers:
  • For buffered loggers: Flushes on failure, discards on success

  • For non-buffered loggers: Always flushes

Parameters:
  • logger_name (str) – Name of the logger to clean up

  • episode_failed (bool) – Whether the episode failed

  • log_only_on_failure (bool) – Whether buffering is enabled for this logger

Return type:

None

flush_buffered_logs(logger_name)[source]

Flush buffered logs to file/console when failure occurs.

Parameters:

logger_name (str) – Name of the logger to flush

Return type:

None

get_or_create_logger(logger_name, output_dir, debug, console_output, use_queue, log_only_on_failure)[source]

Get or create a task logger with optional buffering.

Parameters:
  • logger_name (str) – Unique name for the logger

  • output_dir (Optional[Path]) – Directory for log files (creates logs/ subdirectory if provided)

  • debug (bool) – Enable debug mode with verbose logging

  • console_output (bool) – Enable/disable console output

  • use_queue (bool) – Enable queue-based logging for multiprocessing

  • log_only_on_failure (bool) – Buffer logs in memory and only flush on failure

Return type:

Logger

Returns:

Configured logger instance ready for task execution

POMDPPlanners.utils.logger.cleanup_all_loggers()[source]

Emergency cleanup of all logging resources.

This function should be called when shutting down the application to ensure all logging resources are properly cleaned up.

POMDPPlanners.utils.logger.cleanup_task_logger(logger_name, episode_failed=False, log_only_on_failure=False)[source]

Clean up task logger resources with buffering awareness.

Handles cleanup for both buffered and non-buffered loggers:
  • For buffered loggers: Flushes on failure, discards on success

  • For non-buffered loggers: Always flushes

Parameters:
  • logger_name (str) – Name of the logger to clean up

  • episode_failed (bool) – Whether the episode failed

  • log_only_on_failure (bool) – Whether buffering is enabled for this logger

Return type:

None

POMDPPlanners.utils.logger.flush_buffered_task_logs(logger_name)[source]

Flush buffered logs to file/console when failure occurs.

This function triggers the flush of all buffered log records for loggers configured with ConditionalMemoryHandler (when log_only_on_failure is enabled).

Parameters:

logger_name (str) – Name of the logger to flush

Return type:

None

POMDPPlanners.utils.logger.get_logger(name, level=20, output_dir=None, debug=False, console_output=True, use_queue=False)[source]

Get a configured logger for POMDP experiments and algorithm execution.

This utility creates standardized loggers for tracking experimental progress, algorithm execution, and debugging information. It supports both console and file logging with configurable levels and formatting.

Parameters:
  • name (str) – Logger identifier (typically module name or algorithm name)

  • level (int) – Base logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)

  • output_dir (Optional[Path]) – Directory for log files (creates logs/ subdirectory if provided)

  • debug (bool) – Enable debug mode with verbose logging and detailed formatting

  • console_output (bool) – Enable/disable console output (useful for batch experiments)

  • use_queue (bool) – Enable queue-based logging for heavy multiprocessing scenarios

Return type:

Logger

Returns:

Configured logger instance ready for use

Note

Queue-based logging is recommended for heavy multiprocessing workloads to prevent “too many open files” errors. It uses a single writer thread to handle all file I/O operations while workers only interact with a memory queue, maintaining individual log files per task.

Example

Basic console logging:

>>> import logging
>>> logger = get_logger("POMCP_Tiger", level=logging.INFO)
>>> logger.info("Starting POMCP planning")

Example

File logging with output directory:

>>> from pathlib import Path
>>> logger = get_logger(
...     name="Experiment",
...     output_dir=Path("/tmp/test_logs"),
...     console_output=False
... )
>>> logger.info("Test message")

Example

Debug mode with detailed logging:

>>> debug_logger = get_logger(
...     name="Debug_Test",
...     debug=True,
...     console_output=False
... )
>>> debug_logger.debug("Debug info")
>>> debug_logger.info("Regular info")
Logging Best Practices:
  • DEBUG: Detailed algorithm internals, variable values

  • INFO: Algorithm progress, major milestones, results

  • WARNING: Suboptimal conditions, parameter issues

  • ERROR: Algorithm failures, invalid inputs

  • CRITICAL: System failures, experiment termination

Note

Log files are automatically organized with timestamps in output_dir/logs/. The logger is thread-safe for use with distributed computing frameworks.

POMDPPlanners.utils.logger.get_queue_logger_diagnostics()[source]

Get diagnostic information about the queue-based logging system.

Return type:

Dict[str, Any]

Returns:

Dictionary containing diagnostic information including queue size, handler count, writer thread status, and resource usage.

POMDPPlanners.utils.logger.get_queue_logger_manager()[source]

Get the global queue logger manager instance.

Return type:

QueueLoggerManager

POMDPPlanners.utils.logger.get_task_logger_manager()[source]

Get the global task logger manager instance.

Return type:

TaskLoggerManager

POMDPPlanners.utils.logger.reset_logger_state()[source]

Reset the global logger state for testing.

This function ensures clean state between test runs by:
  • Stopping any running queue manager

  • Clearing the global singleton

  • Resetting the task logger manager

  • Removing all Python loggers created by this module

POMDPPlanners.utils.logger.setup_task_logger_with_buffering(logger_name, output_dir, debug, console_output, use_queue, log_only_on_failure)[source]

Set up a task logger with optional buffering for failure-only logging.

This is a convenience function that delegates to TaskLoggerManager to avoid setting dynamic attributes on logger objects.

Parameters:
  • logger_name (str) – Unique name for the logger

  • output_dir (Optional[Path]) – Directory for log files (creates logs/ subdirectory if provided)

  • debug (bool) – Enable debug mode with verbose logging

  • console_output (bool) – Enable/disable console output

  • use_queue (bool) – Enable queue-based logging for multiprocessing

  • log_only_on_failure (bool) – Buffer logs in memory and only flush on failure

Return type:

Logger

Returns:

Configured logger instance ready for task execution

POMDPPlanners.utils.memory_tracker module

Memory tracking utilities for POMDP simulations.

This module provides memory tracking capabilities for monitoring memory usage during simulation execution. It includes both lightweight monitoring for production use and detailed profiling for development and debugging.

Key Features:
  • Conditional profiling (only when enabled)

  • Multiple tracking modes (lightweight, detailed, sampling-based)

  • Memory leak detection with configurable thresholds

  • Integration with existing logging infrastructure

  • Support for tracemalloc, psutil, and memory_profiler

class POMDPPlanners.utils.memory_tracker.MemoryTracker(logger=None, enable_tracking=None, tracking_mode='lightweight', sample_rate=1.0, interval_seconds=5.0, leak_threshold_mb=100.0)[source]

Bases: object

Memory tracking utility for simulator operations.

Provides configurable memory monitoring with minimal overhead when disabled. Supports multiple tracking modes and integrates with existing logging.

Parameters:
  • logger (Logger | None)

  • enable_tracking (bool | None)

  • tracking_mode (str)

  • sample_rate (float)

  • interval_seconds (float)

  • leak_threshold_mb (float)

checkpoint(label, force_gc=True)[source]

Record memory checkpoint.

Parameters:
  • label (str) – Label for this checkpoint.

  • force_gc (bool) – Whether to force garbage collection before measurement.

Return type:

Dict[str, Any]

Returns:

Dictionary with memory information.

cleanup()[source]

Clean up tracking resources.

Return type:

None

compare_snapshots(label1, label2, limit=10)[source]

Compare tracemalloc snapshots between two checkpoints.

Parameters:
  • label1 (str) – Label of first checkpoint.

  • label2 (str) – Label of second checkpoint.

  • limit (int) – Maximum number of differences to show.

Return type:

None

detect_memory_leak(threshold_mb=None)[source]

Detect potential memory leaks.

Parameters:

threshold_mb (Optional[float]) – Threshold for leak detection. Uses instance default if None.

Return type:

bool

Returns:

True if potential leak detected.

get_memory_growth()[source]

Get total memory growth from first to last checkpoint.

Return type:

float

get_peak_usage()[source]

Get peak memory usage in MB.

Return type:

float

log_summary()[source]

Log memory usage summary.

Return type:

None

save_report(filepath)[source]

Save memory tracking report to file.

Parameters:

filepath (Path) – Path to save the report.

Return type:

None

POMDPPlanners.utils.memory_tracker.memory_monitor(threshold_mb=1000.0, logger=None, enable_tracking=True)[source]

Context manager for memory monitoring.

Parameters:
  • threshold_mb (float) – Alert threshold for memory usage.

  • logger (Optional[Logger]) – Logger instance for output.

  • enable_tracking (bool) – Whether to enable tracking.

Yields:

MemoryTracker instance.
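A standalone sketch of the memory_monitor idea using only tracemalloc (the real helper yields a full MemoryTracker with psutil-based monitoring; the dict of stats here is an illustrative substitute):

```python
import tracemalloc
from contextlib import contextmanager

# Minimal context manager mirroring memory_monitor's shape: track memory
# while the block runs, then record peak usage and a threshold check.
@contextmanager
def memory_monitor_sketch(threshold_mb=1000.0):
    tracemalloc.start()
    stats = {}
    try:
        yield stats
    finally:
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        stats["peak_mb"] = peak / (1024 * 1024)
        stats["exceeded"] = stats["peak_mb"] > threshold_mb

with memory_monitor_sketch(threshold_mb=1.0) as stats:
    payload = [0] * 500_000  # allocate something measurable

assert stats["peak_mb"] > 0.0  # stats are filled in when the block exits
```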

POMDPPlanners.utils.memory_tracker.profile_memory(func)[source]

Decorator for memory profiling functions.

Only profiles if memory_profiler is available and tracking is enabled.

POMDPPlanners.utils.multivariate_normal module

Module for efficient multivariate normal distribution with pre-computed Cholesky decomposition.

This module provides a multivariate normal distribution implementation optimized for scenarios where the covariance matrix is fixed but the mean varies (e.g., POMDP observation models). The expensive Cholesky decomposition O(n^3) is computed once during initialization, making subsequent sampling and PDF evaluations O(n^2).

Classes:

CovarianceParameterizedMultivariateNormal: Efficient multivariate normal with pre-computed Cholesky

class POMDPPlanners.utils.multivariate_normal.CovarianceParameterizedMultivariateNormal(covariance, validate=True)[source]

Bases: object

Multivariate normal distribution with pre-computed Cholesky decomposition.

This class provides efficient sampling and PDF computation for multivariate normal distributions where the covariance matrix is fixed but the mean can vary. The Cholesky decomposition is computed once during initialization, enabling O(n^2) operations for sampling and PDF evaluation.

This class does NOT inherit from the Distribution base class because it has a different interface - the mean is passed to methods rather than the constructor.

Attributes:
covariance

The covariance matrix (read-only property).

dim

Dimensionality of the distribution.

Example

>>> import numpy as np
>>> np.random.seed(42)
>>>
>>> # Create distribution with fixed covariance
>>> cov = np.array([[1.0, 0.5], [0.5, 2.0]])
>>> mvn = CovarianceParameterizedMultivariateNormal(cov)
>>>
>>> # Sample with different means
>>> mean1 = np.array([0.0, 0.0])
>>> samples1 = mvn.sample(mean1, n_samples=3)
>>> samples1.shape
(3, 2)
>>>
>>> # Compute PDF
>>> values = np.array([[0.0, 0.0], [1.0, 1.0]])
>>> pdf_values = mvn.pdf(values, mean1)
>>> pdf_values.shape
(2,)
property covariance: ndarray

Return a copy of the covariance matrix.

property dim: int

Return the dimensionality of the distribution.

log_pdf(values, mean)[source]

Compute the log probability density function.

More numerically stable than pdf() for small probability values.

Parameters:
  • values (ndarray) – Array of shape (n, dim) or (dim,) containing points to evaluate.

  • mean (ndarray) – Mean vector of shape (dim,).

Return type:

ndarray

Returns:

Array of shape (n,) containing log PDF values at each point.

Raises:

ValueError – If dimensions don’t match.

pdf(values, mean)[source]

Compute the probability density function.

Uses the pre-computed Cholesky decomposition for efficient computation:

y = solve_triangular(L, (x - mean).T)
mahalanobis_sq = sum(y^2)
pdf = exp(log_normalization - 0.5 * mahalanobis_sq)

Parameters:
  • values (ndarray) – Array of shape (n, dim) or (dim,) containing points to evaluate.

  • mean (ndarray) – Mean vector of shape (dim,).

Return type:

ndarray

Returns:

Array of shape (n,) containing PDF values at each point.

Raises:

ValueError – If dimensions don’t match.

sample(mean, n_samples=1)[source]

Sample from the multivariate normal distribution.

Generates samples using the transformation: x = mean + L @ z, where z ~ N(0, I) and L is the Cholesky factor of the covariance.

Parameters:
  • mean (ndarray) – Mean vector of shape (dim,).

  • n_samples (int) – Number of samples to generate. Defaults to 1.

Return type:

ndarray

Returns:

Array of shape (n_samples, dim) containing the samples.

Raises:

ValueError – If mean dimension doesn’t match covariance dimension.
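The transformation above can be sketched directly in NumPy (an illustration of the technique, not the class's implementation):

```python
import numpy as np

# Factor the covariance once, then reuse the Cholesky factor L for any mean.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.5], [0.5, 2.0]])
L = np.linalg.cholesky(cov)              # O(n^3), computed once

def sample_with_mean(mean, n_samples=1):
    z = rng.standard_normal((n_samples, mean.shape[0]))  # z ~ N(0, I)
    return mean + z @ L.T                # x = mean + L z, O(n^2) per sample

samples = sample_with_mean(np.array([0.0, 0.0]), n_samples=2000)
assert samples.shape == (2000, 2)
# The empirical covariance approaches cov as the sample count grows
assert np.allclose(np.cov(samples.T), cov, atol=0.3)
```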

POMDPPlanners.utils.planner_episode_visualization module

POMDPPlanners.utils.simulations_caching module

POMDPPlanners.utils.simulations_caching.cache_episode_simulation_results(environment, policy, initial_belief, results, cache_dir_path, general_config={})[source]
Return type:

None

POMDPPlanners.utils.simulations_caching.get_cache_dir_path(cache_dir_path)[source]
Return type:

Path

Parameters:

cache_dir_path (Path)

POMDPPlanners.utils.simulations_caching.get_cache_key(environment, policy, initial_belief, general_config={})[source]
Return type:

str

POMDPPlanners.utils.simulations_caching.load_episode_simulation_results(environment, policy, initial_belief, cache_dir_path, general_config={})[source]
Return type:

List[History]


POMDPPlanners.utils.statistics_utils module

POMDPPlanners.utils.statistics_utils.aggregate_weights_for_duplicate_values(values, weights)[source]

Aggregate weights for duplicate values to ensure unique values.

When the same value appears multiple times in the values array with different weights, this function combines them into a single entry with the sum of all weights for that value. This is useful for discrete distributions where duplicate values should be treated as a single outcome with aggregated probability.

Parameters:
  • values (ndarray) – Array of values (may contain duplicates)

  • weights (ndarray) – Array of corresponding weights/probabilities

Returns:

  • unique_values: Array of unique values (sorted)

  • aggregated_weights: Array of weights corresponding to unique values, normalized to sum to 1

Return type:

Tuple[ndarray, ndarray]

Raises:

ValueError – If arrays are empty or have mismatched lengths

Example

>>> import numpy as np
>>> from POMDPPlanners.utils.statistics_utils import aggregate_weights_for_duplicate_values
>>> values = np.array([1.0, 2.0, 2.0, 3.0])
>>> weights = np.array([0.3, 0.2, 0.3, 0.2])
>>> unique_vals, agg_weights = aggregate_weights_for_duplicate_values(values, weights)
>>> unique_vals
array([1., 2., 3.])
>>> bool(np.isclose(agg_weights, np.array([0.3, 0.5, 0.2])).all())
True
>>> bool(np.isclose(np.sum(agg_weights), 1.0))
True
POMDPPlanners.utils.statistics_utils.confidence_interval(data, confidence=0.95)[source]

Calculate confidence interval for the mean using t-distribution.

Computes confidence intervals for algorithm performance means, providing statistical bounds on the true expected performance. This is essential for making statistically sound comparisons between POMDP algorithms.

Uses the t-distribution to account for sample size uncertainty, which is more appropriate than normal distribution for small to moderate sample sizes common in POMDP experiments.

Parameters:
  • data (array-like) – Sample data (algorithm returns, rewards, or performance metrics)

  • confidence (float) – Confidence level (0 < confidence < 1, typically 0.95)

Returns:

Tuple of (lower_bound, upper_bound) for the confidence interval

Raises:

ValueError – If insufficient data or contains NaN values

Return type:

Tuple[float, float]

Example

Statistical comparison of algorithm performance:

>>> import numpy as np
>>> # Algorithm performance from multiple runs
>>> pomcp_rewards = [12.3, 11.8, 13.1, 12.7, 11.9, 12.5, 13.0, 12.1, 12.8, 12.4]
>>> pft_rewards = [11.5, 13.2, 12.8, 11.9, 12.3, 13.5, 12.1, 12.9, 11.7, 12.6]
>>>
>>> # Calculate 95% confidence intervals
>>> pomcp_ci = confidence_interval(pomcp_rewards, confidence=0.95)
>>> pft_ci = confidence_interval(pft_rewards, confidence=0.95)
>>>
>>> # Verify confidence intervals are tuples with two elements
>>> isinstance(pomcp_ci, tuple) and len(pomcp_ci) == 2
True
>>> isinstance(pft_ci, tuple) and len(pft_ci) == 2
True
>>>
>>> # Verify confidence intervals contain the mean
>>> pomcp_mean = np.mean(pomcp_rewards)
>>> pft_mean = np.mean(pft_rewards)
>>> bool(pomcp_ci[0] <= pomcp_mean <= pomcp_ci[1])
True
>>> bool(pft_ci[0] <= pft_mean <= pft_ci[1])
True
>>>
>>> # Verify lower bound is less than upper bound
>>> pomcp_ci[0] < pomcp_ci[1]
True
>>> pft_ci[0] < pft_ci[1]
True
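A minimal sketch of such a t-based interval (illustrative only; the helper name t_mean_ci is not part of the library) could look like:

```python
import numpy as np
from scipy import stats

def t_mean_ci(data, confidence=0.95):
    """Two-sided confidence interval for the mean using the t-distribution (sketch)."""
    x = np.asarray(data, dtype=float)
    if x.size < 2 or np.isnan(x).any():
        raise ValueError("need at least 2 non-NaN samples")
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(x.size)                 # standard error of the mean
    t_crit = stats.t.ppf((1.0 + confidence) / 2.0, df=x.size - 1)
    return mean - t_crit * sem, mean + t_crit * sem
```

The t critical value replaces the normal z value, widening the interval appropriately for small sample sizes.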

POMDPPlanners.utils.statistics_utils.cvar_bound_const_eps(y_samp, y_sup, y_inf, eps, alpha=0.05)[source]

Calculate bounds for CVaR using a constant epsilon parameter.

Parameters:
  • y_samp (ndarray) – Array of sample values

  • y_sup (float) – Upper bound of the distribution

  • y_inf (float) – Lower bound of the distribution

  • eps (float) – Epsilon parameter for the bound calculation

  • alpha (float) – Confidence level (default 0.05)

Returns:

(lower_bound, upper_bound) representing the bounds for CVaR

Return type:

tuple[float, float]

Raises:

ValueError – If alpha or eps are not between 0 and 1, or if y_samp is empty

POMDPPlanners.utils.statistics_utils.cvar_confidence_interval(data, alpha=0.95, delta=0.05, dist_lower_bound=None, dist_upper_bound=None)[source]

Calculate the confidence interval for the CVaR of a dataset using the t-distribution.

Parameters:
  • data – Array of values

  • alpha – Confidence level (default 0.95 for 95%)

  • delta – Significance level for the probabilistic bounds (default 0.05)

  • dist_lower_bound (Optional[float]) – Known lower bound of the distribution support. If None, uses min(data) as a conservative data-driven fallback.

  • dist_upper_bound (Optional[float]) – Known upper bound of the distribution support. If None, uses max(data) as a conservative data-driven fallback.

Returns:

(lower_bound, upper_bound) of the confidence interval

Return type:

tuple

Raises:

ValueError – If data contains NaN values or has insufficient samples

POMDPPlanners.utils.statistics_utils.cvar_estimator(vec, alpha)[source]

Calculate Conditional Value at Risk (CVaR) for risk-sensitive POMDP evaluation.

CVaR measures the expected value of the worst-case outcomes, providing a risk-sensitive performance metric that goes beyond simple mean rewards. This is particularly valuable for safety-critical applications where tail risk matters more than average performance.

Mathematical Definition:

CVaR_α(X) = E[X | X ≤ VaR_{1−α}(X)]

Where VaR_{1−α} is the Value at Risk at level 1 − α, i.e. the (1 − α)-quantile of X. For returns, CVaR_α is the expected value over the worst (1 − α) fraction of outcomes.

The implementation uses a vectorized approach for computational efficiency, calculating CVaR by averaging over the tail of the distribution below the (1 − α)-quantile.

Parameters:
  • vec (ndarray) – Array of values (typically returns, costs, or performance metrics)

  • alpha (float) – Confidence level (0 < α ≤ 1), where higher values focus on worse outcomes

Return type:

float

Returns:

CVaR value representing expected worst-case performance

Raises:

ValueError – If alpha is not in (0, 1] or vec is empty

Example

Risk analysis of POMDP algorithm performance:

>>> import numpy as np
>>> from POMDPPlanners.utils.statistics_utils import cvar_estimator
>>> # Simulate algorithm returns from multiple episodes
>>> returns = np.array([12.5, 8.3, 15.7, -2.1, 9.8, 13.2, 6.4, 11.0, -1.5, 14.3])
>>> len(returns)
10
>>> # Calculate risk metrics
>>> mean_return = np.mean(returns)
>>> bool(mean_return > 8.0)  # Check reasonable mean
True
>>> cvar_90 = cvar_estimator(returns, alpha=0.9)  # Worst 10% outcomes
>>> cvar_95 = cvar_estimator(returns, alpha=0.95) # Worst 5% outcomes
>>> isinstance(cvar_90, (float, np.floating))
True
>>> isinstance(cvar_95, (float, np.floating))
True
>>> bool(cvar_95 <= cvar_90)  # CVaR should be lower for higher alpha
True

Example

Comparing algorithm risk profiles:

>>> # Algorithm performance data from experiments
>>> pomcp_returns = np.array([10.2, 12.8, 9.5, 11.3, 8.7, 12.1, 10.9, 9.8, 11.5, 10.4])
>>> pft_returns = np.array([15.1, 7.2, 14.8, 13.3, 6.9, 15.5, 8.1, 14.2, 12.7, 9.3])
>>> pomcp_cvar = cvar_estimator(pomcp_returns, alpha=0.9)
>>> pft_cvar = cvar_estimator(pft_returns, alpha=0.9)

Risk Assessment Applications:

Portfolio Analysis: Compare multiple algorithms’ risk-return profiles

Safety-Critical Systems: Evaluate worst-case performance guarantees

Robust Planning: Select algorithms with acceptable tail risk

Performance Bounds: Establish confidence intervals for worst-case scenarios

Mathematical Properties:
  • Monotonic: for returns, CVaR_α ≤ VaR_{1−α} (CVaR averages the outcomes at or below the quantile)

  • Coherent: Satisfies subadditivity, monotonicity, positive homogeneity

  • Tail Sensitivity: Higher α values emphasize more extreme outcomes

  • Computational: More stable than VaR, especially for small samples
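One common empirical estimator, consistent with the "worst outcomes" reading in the examples above, averages the lowest (1 − α) fraction of samples. A sketch (the name empirical_cvar is illustrative, not the library's implementation):

```python
import numpy as np

def empirical_cvar(vec, alpha):
    """CVaR of returns: mean of the worst (1 - alpha) fraction of samples (sketch)."""
    x = np.sort(np.asarray(vec, dtype=float))       # ascending: worst outcomes first
    if x.size == 0 or not 0.0 < alpha <= 1.0:
        raise ValueError("vec must be non-empty and alpha in (0, 1]")
    tail_mass = 1.0 - alpha
    if tail_mass == 0.0:
        return float(x[0])                          # alpha = 1: the single worst outcome
    k = tail_mass * x.size                          # tail size in samples (may be fractional)
    full = int(np.floor(k))
    frac = k - full
    # average the `full` whole tail samples plus a fractional piece of the next one
    total = x[:full].sum() + (frac * x[full] if full < x.size else 0.0)
    return float(total / k)
```

The fractional-sample handling makes the estimate continuous in alpha rather than jumping at sample boundaries.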

POMDPPlanners.utils.statistics_utils.cvar_estimator_from_dist(values, weights, alpha)[source]

Calculate the Conditional Value at Risk (CVaR) from a discrete distribution.

Parameters:
  • values (ndarray) – Array of values

  • weights (ndarray) – Array of corresponding weights/probabilities

  • alpha (float) – Confidence level (between 0 and 1)

Returns:

The CVaR value

Return type:

float

Raises:

ValueError – If alpha is not between 0 and 1, if arrays are empty, or if weights don’t sum to 1
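A sketch of this weighted variant under the same lower-tail convention (illustrative; the function name is not the library's):

```python
import numpy as np

def cvar_from_discrete(values, weights, alpha):
    """CVaR of a discrete distribution: mean over the worst (1 - alpha) probability mass (sketch)."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    if v.size == 0 or v.size != w.size:
        raise ValueError("values and weights must be non-empty and the same length")
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    order = np.argsort(v)                    # worst values first
    v, w = v[order], w[order]
    tail_mass = 1.0 - alpha
    if tail_mass <= 0.0:
        return float(v[0])
    remaining = tail_mass
    total = 0.0
    for vi, wi in zip(v, w):
        take = min(wi, remaining)            # consume probability mass from the bottom up
        total += take * vi
        remaining -= take
        if remaining <= 0.0:
            break
    return float(total / tail_mass)
```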

POMDPPlanners.utils.statistics_utils.cvar_probabilistic_lower_bound_thomas(vec, alpha, delta, dist_lower_bound)[source]

Calculate a probabilistic lower bound for CVaR using Thomas’s method.

Parameters:
  • vec (ndarray) – Array of values

  • alpha (float) – Confidence level (between 0 and 1)

  • delta (float) – Probability of the bound holding (between 0 and 1)

  • dist_lower_bound (float) – Lower bound of the distribution

Returns:

The probabilistic lower bound for CVaR

Return type:

float

Raises:

ValueError – If alpha or delta are not between 0 and 1, or if vec is empty

POMDPPlanners.utils.statistics_utils.cvar_probabilistic_upper_bound_thomas(vec, alpha, delta, dist_upper_bound)[source]

Calculate a probabilistic upper bound for CVaR using Thomas’s method.

Parameters:
  • vec (ndarray) – Array of values

  • alpha (float) – Confidence level (between 0 and 1)

  • delta (float) – Probability of the bound holding (between 0 and 1)

  • dist_upper_bound (float) – Upper bound of the distribution

Returns:

The probabilistic upper bound for CVaR

Return type:

float

Raises:

ValueError – If alpha or delta are not between 0 and 1, or if vec is empty

POMDPPlanners.utils.statistics_utils.get_min_and_max_cost(min_immediate_cost, max_immediate_cost, depth, max_depth, gamma)[source]

Calculate the minimum and maximum costs over a time horizon using a discount factor.

Parameters:
  • min_immediate_cost (float) – Minimum immediate cost

  • max_immediate_cost (float) – Maximum immediate cost

  • depth (int) – Current depth in the search tree

  • max_depth (int) – Maximum depth of the search tree

  • gamma (float) – Discount factor (between 0 and 1)

Returns:

(min_cost, max_cost) representing the minimum and maximum costs over the time horizon

Return type:

tuple[float, float]

Raises:

ValueError – If gamma is not between 0 and 1
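The horizon bounds follow from a geometric series over the remaining H = max_depth − depth steps; a sketch (name illustrative, and the exact endpoint handling for gamma = 1 may differ in the library):

```python
def discounted_cost_bounds(min_immediate_cost, max_immediate_cost, depth, max_depth, gamma):
    """Min/max cumulative discounted cost over the remaining horizon (sketch)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be between 0 and 1")
    horizon = max_depth - depth              # remaining steps
    # sum_{t=0}^{horizon-1} gamma^t, handling the gamma -> 1 limit explicitly
    geo = horizon if gamma == 1.0 else (1.0 - gamma ** horizon) / (1.0 - gamma)
    return min_immediate_cost * geo, max_immediate_cost * geo
```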

POMDPPlanners.utils.statistics_utils.quantile_confidence_interval(data, alpha=0.95, conf_level=0.95)[source]

Parameters:
  • data (array-like) – 1D array of samples

  • alpha (float) – Target quantile (e.g. 0.95 for the 95% VaR)

  • conf_level (float) – Overall coverage (e.g. 0.95 for a 95% CI)

Returns:

(lower_value, upper_value, k1, k2)
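One standard distribution-free construction for such a quantile CI uses binomial order statistics; a sketch returning the same (lower, upper, k1, k2) shape (illustrative, not necessarily the library's exact method):

```python
import numpy as np
from scipy import stats

def quantile_ci(data, alpha=0.95, conf_level=0.95):
    """Distribution-free CI for the alpha-quantile via order statistics (sketch)."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    tail = (1.0 - conf_level) / 2.0
    # order-statistic indices bracketing the alpha-quantile, from the Binomial(n, alpha) CDF
    k1 = int(stats.binom.ppf(tail, n, alpha))
    k2 = int(stats.binom.ppf(1.0 - tail, n, alpha))
    k1 = max(k1, 0)
    k2 = min(k2, n - 1)
    return x[k1], x[k2], k1, k2
```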

POMDPPlanners.utils.statistics_utils.tv_distance(p, q, n_samples=1000, method='grid', **kwargs)[source]

Compute Total Variation distance between two distributions.

Total Variation distance measures how different two probability distributions are, with values ranging from 0 (identical) to 1 (completely different).

Parameters:
  • p (Distribution) – First distribution

  • q (Distribution) – Second distribution

  • n_samples (int) – Number of samples (method-dependent usage)

  • method (str) – Estimation method - “grid”, “monte_carlo”, “averaged”, “mixture”

  • **kwargs – Additional method-specific parameters. For “grid”: x_min, x_max, n_points; for “averaged”: n_runs; for “mixture” and “monte_carlo”: n_samples is used directly

Return type:

float

Returns:

TV distance estimate

Example

>>> import numpy as np
>>> from POMDPPlanners.core.distributions import DiscreteDistribution
>>> np.random.seed(42)
>>> # Create two similar discrete distributions
>>> values = [0, 1, 2, 3, 4]
>>> p_probs = np.array([0.2, 0.3, 0.25, 0.15, 0.1])
>>> q_probs = np.array([0.25, 0.25, 0.25, 0.15, 0.1])
>>> p = DiscreteDistribution(values, p_probs)
>>> q = DiscreteDistribution(values, q_probs)
>>> tv = tv_distance(p, q, method="monte_carlo", n_samples=10000)
>>> isinstance(tv, float)
True
>>> 0.0 <= tv <= 1.0
True
POMDPPlanners.utils.statistics_utils.tv_distance_averaged(p, q, n_samples=1000, n_runs=10)[source]

Compute TV distance by averaging multiple independent estimates.

Averaging n_runs independent estimates reduces the variance by a factor of n_runs (the standard error by a factor of sqrt(n_runs)) compared to a single run.

Parameters:
  • p (Distribution) – First distribution

  • q (Distribution) – Second distribution

  • n_samples (int) – Number of samples per run

  • n_runs (int) – Number of independent runs to average

Return type:

float

Returns:

Average TV distance estimate

POMDPPlanners.utils.statistics_utils.tv_distance_grid(p, q, x_min=-5.0, x_max=5.0, n_points=10000)[source]

Compute TV distance using grid-based numerical integration.

This method has zero sampling variance and is deterministic. Works well for continuous distributions with known support.

Parameters:
  • p (Distribution) – First distribution

  • q (Distribution) – Second distribution

  • x_min (float) – Lower bound of integration range

  • x_max (float) – Upper bound of integration range

  • n_points (int) – Number of grid points (higher = more accurate)

Return type:

float

Returns:

TV distance between p and q
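The grid method is, in essence, a numerical evaluation of TV(p, q) = ½ ∫ |p(x) − q(x)| dx. A self-contained sketch for two Gaussian densities (illustrative, independent of the library's Distribution interface):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def tv_grid(pdf_p, pdf_q, x_min=-5.0, x_max=5.0, n_points=10000):
    """Deterministic TV estimate: 0.5 * integral of |p - q| over a grid (sketch)."""
    xs = np.linspace(x_min, x_max, n_points)
    dx = xs[1] - xs[0]
    diff = np.abs(pdf_p(xs) - pdf_q(xs))
    return 0.5 * float(diff.sum() * dx)    # Riemann-sum approximation of the integral

# TV between N(0,1) and N(1,1); the closed form is 2*Phi(1/2) - 1
tv = tv_grid(lambda x: gaussian_pdf(x, 0.0, 1.0),
             lambda x: gaussian_pdf(x, 1.0, 1.0),
             x_min=-6.0, x_max=7.0)
```

Because no sampling is involved, repeated calls return identical values; accuracy is governed only by the grid resolution and the integration range covering the distributions' support.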

POMDPPlanners.utils.statistics_utils.tv_distance_mixture_sampling(p, q, n_samples=2000)[source]

Compute TV distance using mixture sampling for better coverage.

Samples from mixture (p + q) / 2 to ensure good coverage of both distributions’ support.

Parameters:
  • p (Distribution) – First distribution

  • q (Distribution) – Second distribution

  • n_samples (int) – Number of samples drawn from the mixture (p + q) / 2

Return type:

float

Returns:

TV distance estimate

POMDPPlanners.utils.statistics_utils.tv_distance_monte_carlo(p, q, n_samples=1000)[source]

Compute TV distance using basic Monte Carlo sampling (original method).

This is the original implementation that samples randomly from both distributions. Has higher variance than other methods.

Parameters:
  • p (Distribution) – First distribution

  • q (Distribution) – Second distribution

  • n_samples (int) – Number of samples drawn from each distribution

Return type:

float

Returns:

TV distance estimate

POMDPPlanners.utils.tree_statistics module

class POMDPPlanners.utils.tree_statistics.TreeMetrics(*values)[source]

Bases: Enum

Metric names for MCTS tree statistics.

ACTIONS_VISIT_COUNT_ENTROPY = 'actions_visit_count_entropy'
IS_LEAF = 'is_leaf'
MAX_ACTIONS_VISIT_COUNT = 'max_actions_visit_count'
MIN_ACTIONS_VISIT_COUNT = 'min_actions_visit_count'
N_ACTIONS_FROM_ROOT = 'n_actions_from_root'
ROOT_VISIT_COUNT = 'root_visit_count'
TREE_MAX_DEPTH = 'tree_max_depth'
POMDPPlanners.utils.tree_statistics.compute_tree_metrics(tree)[source]

Compute comprehensive statistics for MCTS tree analysis and debugging.

Extracts key metrics from MCTS search trees to understand algorithm behavior, convergence properties, and search quality. These metrics are essential for algorithm debugging, parameter tuning, and performance analysis.

The function analyzes the root belief node’s action children to compute visitation statistics and exploration patterns that indicate search quality.

Parameters:

tree (BeliefNode) – Root belief node of the MCTS search tree

Returns:

  • min_actions_visit_count: Minimum visits to any action

  • max_actions_visit_count: Maximum visits to any action

  • actions_visit_count_entropy: Shannon entropy of action visit distribution

  • n_actions_from_root: Number of actions from the root node

  • root_visit_count: Number of visits to the root node

Return type:

List[PolicyInfoVariable]

Raises:

TypeError – If tree is not a BeliefNode instance

Example

>>> import numpy as np
>>> np.random.seed(42)  # For reproducible results
>>>
>>> from POMDPPlanners.utils.tree_statistics import compute_tree_metrics
>>> from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
>>> from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
>>> from POMDPPlanners.core.belief import get_initial_belief
>>>
>>> # Create POMCP planner and run planning
>>> env = TigerPOMDP(discount_factor=0.95)
>>> planner = POMCP(
...     environment=env,
...     discount_factor=0.95,
...     depth=20,
...     exploration_constant=1.0,
...     name="POMCP_Analysis",
...     n_simulations=100
... )
>>>
>>> initial_belief = get_initial_belief(env, n_particles=200)
>>> action, run_data = planner.action(initial_belief)
>>>
>>> # Extract tree metrics from run data
>>> metrics = run_data.info_variables
>>> len(metrics) > 0
True
>>> import numbers
>>> isinstance(metrics[0].value, numbers.Number)
True

Metric Interpretation:

min_actions_visit_count:
  • Higher values indicate more balanced exploration

  • Very low values suggest some actions are barely explored

  • Zero indicates a leaf node (no actions expanded)

max_actions_visit_count:
  • Shows the most visited action’s exploration intensity

  • Compare with total simulations to assess concentration

  • Higher values indicate convergence to a preferred action

actions_visit_count_entropy:
  • Measures uniformity of action exploration

  • Higher entropy = more uniform exploration

  • Lower entropy = more concentrated search

  • log₂(|A|) is the maximum possible entropy for |A| actions

Visit Ratio (max/min):
  • 1.0 = perfectly uniform exploration

  • Higher values = more concentrated/converged search

  • Very high ratios may indicate premature convergence
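The entropy metric above can be computed directly from the root's action visit counts; a small sketch (the helper name is illustrative):

```python
import numpy as np

def visit_count_entropy(visit_counts):
    """Shannon entropy (in bits) of the normalized action visit distribution (sketch)."""
    counts = np.asarray(visit_counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]                       # by convention, 0 * log(0) contributes 0
    return float(-np.sum(p * np.log2(p)))
```

Uniform visits over 4 actions give log₂(4) = 2 bits; visits concentrated on a single action give 0.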

Algorithm Debugging Applications:

Parameter Tuning:
  • Low entropy → increase the exploration parameter

  • High minimum visits but low entropy → decrease the number of simulations

  • Extreme visit ratios → adjust the exploration/exploitation balance

Convergence Analysis:
  • Entropy decreasing over time indicates convergence

  • Stable visit ratios suggest the algorithm has converged

  • Oscillating metrics may indicate unstable parameters

Comparative Analysis:
  • Compare entropy across algorithms to assess exploration quality

  • Use visit patterns to understand different search strategies

  • Identify algorithms with better exploration-exploitation tradeoffs

POMDPPlanners.utils.tree_statistics.get_v_values_sample(action_node)[source]
Return type:

ndarray

Parameters:

action_node (ActionNode)

POMDPPlanners.utils.weighted_particle_beliefs module

class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefContinuousLightDarkFullCoverage(particles, log_weights, ess_factor=0.5, reinvigoration_fraction=0.05, reinvigoration_cov_matrix=array([[1., 0.], [0., 1.]]))[source]

Bases: WeightedParticleBeliefReinvigoration

Parameters:
  • particles – Initial particle set

  • log_weights – Log-weights of the particles

  • ess_factor (float) – Effective sample size threshold factor

  • reinvigoration_fraction (float) – Fraction of particles to reinvigorate

  • reinvigoration_cov_matrix (ndarray) – Covariance matrix used for reinvigoration
reinvigorate(action, observation, pomdp, belief)[source]

Implement reinvigoration for specific POMDP environment.

Return type:

WeightedParticleBeliefReinvigoration

Parameters:
  • action – The action that was executed

  • observation – The observation that was received

  • pomdp – The POMDP environment

  • belief – The belief to reinvigorate
class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefDiscreteLightDark(particles, log_weights, resampling=False, ess_factor=0.5, reinvigoration_fraction=0.2)[source]

Bases: WeightedParticleBeliefReinvigoration

Parameters:
  • particles – Initial particle set

  • log_weights – Log-weights of the particles

  • resampling (bool) – Whether to resample on update

  • ess_factor (float) – Effective sample size threshold factor

  • reinvigoration_fraction (float) – Fraction of particles to reinvigorate
reinvigorate(action, observation, pomdp, belief)[source]

Implement reinvigoration for specific POMDP environment.

Return type:

WeightedParticleBeliefReinvigoration

Parameters:
  • action – The action that was executed

  • observation – The observation that was received

  • pomdp – The POMDP environment

  • belief – The belief to reinvigorate
class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefDiscreteLightDarkFullCoverage(particles, log_weights, ess_factor=0.5, reinvigoration_fraction=0.05)[source]

Bases: WeightedParticleBeliefReinvigoration

Parameters:
  • particles – Initial particle set

  • log_weights – Log-weights of the particles

  • ess_factor (float) – Effective sample size threshold factor

  • reinvigoration_fraction (float) – Fraction of particles to reinvigorate
reinvigorate(action, observation, pomdp, belief)[source]

Implement reinvigoration for specific POMDP environment.

Return type:

WeightedParticleBeliefReinvigoration

Parameters:
  • action – The action that was executed

  • observation – The observation that was received

  • pomdp – The POMDP environment

  • belief – The belief to reinvigorate
class POMDPPlanners.utils.weighted_particle_beliefs.WeightedParticleBeliefSanityPOMDP(particles, log_weights, resampling=False, ess_factor=0.5, reinvigoration_fraction=0.2)[source]

Bases: WeightedParticleBeliefReinvigoration

Parameters:
  • particles – Initial particle set

  • log_weights – Log-weights of the particles

  • resampling (bool) – Whether to resample on update

  • ess_factor (float) – Effective sample size threshold factor

  • reinvigoration_fraction (float) – Fraction of particles to reinvigorate
reinvigorate(action, observation, pomdp, belief)[source]

Reinvigorate particles by sampling from initial state distribution.

Return type:

Belief

Parameters:
  • action – The action that was executed

  • observation – The observation that was received

  • pomdp – The POMDP environment

  • belief – The belief to reinvigorate
POMDPPlanners.utils.weighted_particle_beliefs.create_belief(environment, belief_config)[source]

Create a belief instance from a belief config.

Parameters:
  • environment (Environment) – The POMDP environment

  • belief_config (BeliefConfig) – BeliefConfig object for the belief

Return type:

Belief

Returns:

An instance of the specified belief class

POMDPPlanners.utils.weighted_particle_beliefs.get_initial_belief(environment, n_particles, resampling=True)[source]

Create initial belief from environment’s initial state distribution.

Return type:

Belief

Parameters:
  • environment (Environment) – The POMDP environment

  • n_particles (int) – Number of particles in the initial belief

  • resampling (bool) – Whether to enable resampling