POMDPPlanners.tests.test_utils package

Submodules

POMDPPlanners.tests.test_utils.confidence_interval_utils module

Utility functions for confidence interval testing in environment tests.

POMDPPlanners.tests.test_utils.confidence_interval_utils.verify_metrics_within_confidence_intervals(metrics, tolerance=0.0)[source]

Verify that metric values are within their confidence intervals.

Parameters:
  • metrics (List[MetricValue]) – List of MetricValue objects containing name, value, and confidence bounds

  • tolerance (float) – Additional tolerance to add to confidence bounds (default 0.0)

Raises:

AssertionError – If any metric value is not within its confidence interval

Return type:

None

Purpose: Validates that all metric values are within their confidence intervals

Given: A list of MetricValue objects with values and confidence bounds
When: The verification function is called
Then: All metric values are within their confidence intervals

Test type: unit
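
A minimal usage sketch follows. The MetricValue constructor shown is an assumption for illustration only; its import path is omitted and the field names (name, value, lower_bound, upper_bound) are hypothetical, so check the actual dataclass before copying.

>>> from POMDPPlanners.tests.test_utils.confidence_interval_utils import verify_metrics_within_confidence_intervals
>>> # Hypothetical MetricValue fields; adjust to the real definition.
>>> metric = MetricValue(name="mean_reward", value=9.8, lower_bound=9.5, upper_bound=10.2)
>>> verify_metrics_within_confidence_intervals([metric], tolerance=0.1)  # silent on success, AssertionError on violation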

POMDPPlanners.tests.test_utils.history_builders module

Helper functions for building test History objects.

This module provides utility functions to reduce code duplication when constructing History objects in tests.

POMDPPlanners.tests.test_utils.history_builders.build_test_history(steps, reach_terminal=False, discount_factor=0.95, average_state_sampling_time=0.0, average_action_time=0.0, average_observation_time=0.0, average_belief_update_time=0.0, average_reward_time=0.0, actual_num_steps=None, policy_run_data=None)[source]

Build a History object with standard test parameters.

This helper function reduces code duplication when constructing History objects in tests by providing sensible defaults for timing parameters.

Parameters:
  • steps (List[StepData]) – List of StepData objects representing the episode history.

  • reach_terminal (bool) – Whether the terminal state was reached. Defaults to False.

  • discount_factor (float) – Discount factor for the history. Defaults to 0.95.

  • average_state_sampling_time (float) – Average time for state sampling. Defaults to 0.0.

  • average_action_time (float) – Average time for action selection. Defaults to 0.0.

  • average_observation_time (float) – Average time for observation generation. Defaults to 0.0.

  • average_belief_update_time (float) – Average time for belief updates. Defaults to 0.0.

  • average_reward_time (float) – Average time for reward calculation. Defaults to 0.0.

  • actual_num_steps (Optional[int]) – Actual number of steps in the history. If None, uses len(steps).

  • policy_run_data (Optional[List[PolicyRunData]]) – Policy run data for the history. If None, uses empty list.

Return type:

History

Returns:

History object with standard timing values and provided steps.

Example

>>> from POMDPPlanners.core.simulation import StepData
>>> from POMDPPlanners.core.belief import WeightedParticleBelief
>>> import numpy as np
>>> belief = WeightedParticleBelief(particles=[0, 1], log_weights=np.array([0.0, -0.1]))
>>> steps = [StepData(state=0, action=1, next_state=0, observation=2, reward=10.0, belief=belief)]
>>> history = build_test_history(steps, reach_terminal=True)
>>> history.reach_terminal_state
True
>>> history.actual_num_steps
1

POMDPPlanners.tests.test_utils.test_action_samplers module

POMDPPlanners.tests.test_utils.test_belief_factory module

POMDPPlanners.tests.test_utils.test_confidence_interval_utils module

POMDPPlanners.tests.test_utils.test_config_loader module

POMDPPlanners.tests.test_utils.test_config_to_id module

class POMDPPlanners.tests.test_utils.test_config_to_id.TestConfigToId[source]

Bases: object

Test cases for the config_to_id function.

test_deterministic_hashing()[source]

Test that config_to_id produces deterministic results across multiple calls.

Purpose: Validates that identical configurations always produce identical hash IDs

Given: The same configuration dictionary used multiple times
When: config_to_id is called repeatedly on the same configuration
Then: All calls produce identical hash IDs

Test type: unit

test_empty_dict_handling()[source]

Test config_to_id with empty dictionary.

Purpose: Validates that empty dictionaries produce consistent hash IDs

Given: Empty dictionary configuration
When: config_to_id is called on the empty dictionary
Then: Consistent hash ID is produced

Test type: unit

test_hash_collision_resistance()[source]

Test that similar configurations produce different hash IDs.

Purpose: Validates that small changes in configuration produce different hash IDs

Given: Multiple configurations with small differences
When: config_to_id is called on each configuration
Then: Each configuration produces a unique hash ID

Test type: unit

test_key_order_independence()[source]

Test that dictionary key order does not affect hash ID.

Purpose: Validates that dictionaries with same content but different key order produce same hash

Given: Two dictionaries with identical content but different key insertion order
When: config_to_id is called on both dictionaries
Then: Both produce identical hash IDs

Test type: unit

test_mixed_data_types()[source]

Test config_to_id with mixed data types including edge cases.

Purpose: Validates proper handling of configurations with diverse data types

Given: Configuration containing strings, numbers, lists, None, and boolean values
When: config_to_id is called on the configuration
Then: All types are properly serialized and consistent hash is produced

Test type: unit

test_nested_dict_handling()[source]

Test config_to_id with nested dictionary structures.

Purpose: Validates that nested dictionaries are properly handled in configuration hashing

Given: Configurations with nested dictionary structures
When: config_to_id is called on the configurations
Then: Nested structures are properly serialized and consistent hashes are produced

Test type: unit

test_numpy_array_handling()[source]

Test config_to_id with NumPy arrays in configuration.

Purpose: Validates that NumPy arrays are properly handled in configuration hashing

Given: Configurations containing various NumPy array types
When: config_to_id is called on the configurations
Then: Arrays are properly serialized and consistent hashes are produced

Test type: unit

test_numpy_scalar_handling()[source]

Test config_to_id with NumPy scalar types in configuration.

Purpose: Validates that NumPy scalars are properly handled in configuration hashing

Given: Configurations containing various NumPy scalar types
When: config_to_id is called on the configurations
Then: Scalars are properly serialized and consistent hashes are produced

Test type: unit

test_simple_dict_hashing()[source]

Test hashing of simple dictionary configurations.

Purpose: Validates that simple dictionaries produce consistent hash IDs

Given: Simple dictionaries with basic Python types
When: config_to_id is called on the dictionaries
Then: Consistent hexadecimal hash strings are returned

Test type: unit
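
The properties these tests assert can be summarized in a short sketch. The import path for config_to_id is an assumption (its defining module is not shown in this documentation); the outputs follow from the documented behavior (determinism, key-order independence, collision resistance).

>>> from POMDPPlanners.utils.config_to_id import config_to_id  # hypothetical import path
>>> a = config_to_id({"planner": "pomcp", "depth": 10})
>>> b = config_to_id({"depth": 10, "planner": "pomcp"})  # same content, different key order
>>> a == b  # key-order independence
True
>>> a == config_to_id({"planner": "pomcp", "depth": 10})  # deterministic across calls
True
>>> a == config_to_id({"planner": "pomcp", "depth": 11})  # small changes give different IDs
False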

class POMDPPlanners.tests.test_utils.test_config_to_id.TestConfigToIdIntegration[source]

Bases: object

Integration tests for config_to_id with real POMDP components.

test_cartpole_environment_config_id_consistency()[source]

Test that CartPole POMDP environments produce consistent config IDs.

Purpose: Validates that identical CartPole POMDP configurations produce identical config IDs

Given: Multiple CartPole POMDP instances with identical parameters
When: config_to_id is called on their configuration dictionaries
Then: All instances produce identical config IDs

Test type: integration

test_cartpole_environment_different_configs_produce_different_ids()[source]

Test that different CartPole configurations produce different config IDs.

Purpose: Validates that CartPole environments with different parameters produce different config IDs

Given: CartPole POMDP instances with different parameters
When: config_to_id is called on their configuration dictionaries
Then: Each configuration produces a unique config ID

Test type: integration

test_complex_pomdp_configuration_with_numpy_arrays()[source]

Test config_to_id with complex POMDP configuration containing NumPy arrays.

Purpose: Validates that complex POMDP configurations with nested NumPy arrays produce consistent hashes

Given: A complex configuration dictionary with CartPole environment, POMCP planner, and belief parameters
When: config_to_id is called on the configuration
Then: A consistent hash ID is produced that handles all NumPy arrays correctly

Test type: integration

test_config_to_id_order_invariance_with_pomdp_objects()[source]

Test that config_to_id is invariant to dictionary key order with POMDP objects.

Purpose: Validates that dictionary key order does not affect config_to_id with complex POMDP objects

Given: Two dictionaries with identical POMDP content but different key ordering
When: config_to_id is called on both dictionaries
Then: Both produce identical config IDs

Test type: integration

test_config_to_id_with_objects_having_config_id_attribute()[source]

Test config_to_id with objects that have config_id attributes (environments, beliefs, etc.).

Purpose: Validates that objects with config_id attributes are properly serialized using their config_id

Given: A configuration containing POMDP objects with config_id attributes
When: config_to_id is called on the configuration
Then: Objects are serialized using their config_id rather than their full state

Test type: integration

test_pomcp_planner_config_id_consistency()[source]

Test that POMCP planners produce consistent config IDs.

Purpose: Validates that identical POMCP configurations produce identical config IDs

Given: Multiple POMCP planner instances with identical parameters
When: config_to_id is called on their configuration dictionaries
Then: All instances produce identical config IDs

Test type: integration

test_weighted_particle_belief_config_id_consistency()[source]

Test that WeightedParticleBelief instances produce consistent config IDs.

Purpose: Validates that identical weighted particle beliefs produce identical config IDs

Given: Multiple WeightedParticleBelief instances with identical particles and weights
When: config_id property is accessed
Then: All instances produce identical config IDs

Test type: integration

test_weighted_particle_belief_different_configs_produce_different_ids()[source]

Test that different WeightedParticleBelief configurations produce different config IDs.

Purpose: Validates that weighted particle beliefs with different configurations produce different config IDs

Given: WeightedParticleBelief instances with different particles, weights, or parameters
When: config_id property is accessed
Then: Each configuration produces a unique config ID

Test type: integration

class POMDPPlanners.tests.test_utils.test_config_to_id.TestNumpyEncoder[source]

Bases: object

Test cases for the NumpyEncoder JSON serialization class.

test_config_id_object_encoding()[source]

Test encoding of objects with config_id attribute.

Purpose: Validates that objects with config_id are serialized with class metadata

Given: A mock object with config_id attribute and class information
When: Object is serialized using the encoder
Then: Object is converted to dict with class and config_id information

Test type: unit

test_getstate_exception_fallback()[source]

Test fallback to string when __getstate__ raises exception.

Purpose: Validates proper fallback when __getstate__ method fails

Given: A mock object with __getstate__ that raises exception
When: Object is serialized using the encoder
Then: Object is converted to string representation

Test type: unit

test_getstate_object_encoding()[source]

Test encoding of objects with __getstate__ method.

Purpose: Validates that objects with __getstate__ are serialized with their state

Given: A mock object with __getstate__ method
When: Object is serialized using the encoder
Then: Object is converted to dict with class and state information

Test type: unit

test_numpy_array_encoding()[source]

Test encoding of NumPy arrays to lists.

Purpose: Validates that NumPy arrays are properly converted to Python lists

Given: A NumpyEncoder instance and various NumPy arrays
When: Arrays are serialized using the encoder
Then: Arrays are converted to their list representations

Test type: unit

test_numpy_scalar_encoding()[source]

Test encoding of NumPy scalar types to Python primitives.

Purpose: Validates that NumPy scalars are converted to native Python types

Given: A NumpyEncoder instance and various NumPy scalar types
When: Scalars are serialized using the encoder
Then: Scalars are converted to equivalent Python primitives

Test type: unit

test_object_without_config_id_uses_getstate()[source]

Test that objects without config_id fall back to __getstate__ mechanism.

Purpose: Validates that objects without config_id attribute use __getstate__ fallback

Given: A mock object without config_id attribute but with __getstate__ method
When: Object is serialized using the encoder
Then: Encoder uses __getstate__ mechanism for serialization

Test type: unit

test_unsupported_type_fallback()[source]

Test behavior with object types that fall back to __getstate__.

Purpose: Validates that objects without custom handling use __getstate__ fallback

Given: An object type that doesn’t match NumPy or config_id patterns
When: Object is serialized using the encoder
Then: Object is handled by __getstate__ fallback mechanism

Test type: unit
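
Taken together, these tests describe a JSON encoder with a layered fallback strategy: NumPy arrays, NumPy scalars, config_id objects, __getstate__, then str(). The sketch below illustrates that strategy only; it is not the project's NumpyEncoder, and the exact output dict format is an assumption.

>>> import json
>>> import numpy as np
>>> class NumpyEncoderSketch(json.JSONEncoder):
...     def default(self, obj):
...         if isinstance(obj, np.ndarray):
...             return obj.tolist()  # arrays become plain lists
...         if isinstance(obj, np.generic):
...             return obj.item()  # NumPy scalars become Python primitives
...         if hasattr(obj, "config_id"):
...             return {"class": type(obj).__name__, "config_id": obj.config_id}
...         try:
...             return {"class": type(obj).__name__, "state": obj.__getstate__()}
...         except Exception:
...             return str(obj)  # last-resort string fallback
...
>>> json.dumps({"weights": np.array([1, 2])}, cls=NumpyEncoderSketch)
'{"weights": [1, 2]}'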

POMDPPlanners.tests.test_utils.test_hyperparameter_tuning_and_eval module

POMDPPlanners.tests.test_utils.test_logger module

POMDPPlanners.tests.test_utils.test_logger.test_cleanup_task_logger_failure_flushes_logs(tmp_path)[source]

Test that cleanup_task_logger flushes buffered logs on failure.

Purpose: Validates that failed episodes with log_only_on_failure flush buffered logs

Given: Logger with buffered messages and log_only_on_failure=True
When: cleanup_task_logger is called with episode_failed=True
Then: Buffered logs are flushed and written to file

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_cleanup_task_logger_success_discards_logs(tmp_path)[source]

Test that cleanup_task_logger discards buffered logs on success.

Purpose: Validates that successful episodes with log_only_on_failure discard buffered logs

Given: Logger with buffered messages and log_only_on_failure=True
When: cleanup_task_logger is called with episode_failed=False
Then: Buffered logs are discarded without writing to file

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_conditional_memory_handler_buffering(tmp_path)[source]

Test that ConditionalMemoryHandler buffers logs in memory.

Purpose: Validates that ConditionalMemoryHandler holds logs in buffer without writing to disk

Given: A ConditionalMemoryHandler wrapping a FileHandler
When: INFO level messages are logged
Then: Messages are buffered in memory and not written to the file

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_conditional_memory_handler_flush_on_error(tmp_path)[source]

Test that ConditionalMemoryHandler auto-flushes on ERROR level.

Purpose: Validates that ERROR level messages trigger automatic flush of buffered logs

Given: A ConditionalMemoryHandler with buffered INFO messages
When: An ERROR level message is logged
Then: All buffered messages (including ERROR) are flushed to file

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_conditional_memory_handler_manual_flush(tmp_path)[source]

Test that ConditionalMemoryHandler.trigger_flush() works correctly.

Purpose: Validates that manual flush trigger writes buffered logs to file

Given: A ConditionalMemoryHandler with buffered INFO messages
When: trigger_flush() is called manually
Then: All buffered messages are flushed to file

Test type: unit
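
The buffer-then-flush behavior exercised by the three ConditionalMemoryHandler tests above closely mirrors the standard library's logging.handlers.MemoryHandler, which holds records in memory and flushes them to a target handler when a record at or above flushLevel arrives. A sketch using the stdlib analogue, not the project's handler:

>>> import logging, logging.handlers
>>> target = logging.FileHandler("episode.log")
>>> buffered = logging.handlers.MemoryHandler(capacity=10000, flushLevel=logging.ERROR, target=target)
>>> logger = logging.getLogger("buffered_sketch")
>>> logger.setLevel(logging.INFO)
>>> logger.addHandler(buffered)
>>> logger.info("held in memory, not yet on disk")
>>> logger.error("triggers a flush of everything buffered so far")
>>> buffered.flush()  # manual flush, analogous to trigger_flush()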

POMDPPlanners.tests.test_utils.test_logger.test_flush_buffered_task_logs_flushes_handlers(tmp_path)[source]

Test that flush_buffered_task_logs flushes buffered logs to file.

Purpose: Validates that flush function triggers flush of all buffered logs

Given: Logger with buffered messages (log_only_on_failure=True)
When: flush_buffered_task_logs is called
Then: Buffered logs are written to file

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_logger_console_output_false(capsys)[source]

Test logger console output false.

Purpose: Validates that logger does not output messages to console when console_output=False

Given: Logger configured with debug=True and console_output=False
When: Info message is logged
Then: Message does not appear in captured stdout or stderr output

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_logger_console_output_true(capsys)[source]

Test logger console output true.

Purpose: Validates that logger correctly outputs messages to console when console_output=True

Given: Logger configured with debug=True and console_output=True
When: Info message is logged
Then: Message appears in captured stdout or stderr output

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_logger_file_output(tmp_path)[source]

Test logger file output.

Purpose: Validates that logger correctly writes messages to log files when output_dir is specified

Given: Logger configured with debug=True, output_dir=tmp_path, and console_output=False
When: Info message is logged
Then: Log file is created in logs subdirectory containing the test message

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_logger_no_duplicate_handlers(tmp_path)[source]

Test logger no duplicate handlers.

Purpose: Validates that repeated get_logger calls do not create duplicate handlers causing message duplication

Given: Logger called twice with identical parameters (debug=True, output_dir=tmp_path, console_output=True)
When: Messages are logged after each get_logger call
Then: Each message appears exactly once in the log file without duplication

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_logger_no_handlers_no_io(capsys, tmp_path)[source]

Test logger with no handlers performs no I/O.

Purpose: Validates that logger with console_output=False and output_dir=None has zero handlers and performs no I/O operations

Given: Logger configured with console_output=False and output_dir=None
When: Multiple messages are logged at various levels
Then: Logger has zero handlers (or only NullHandlers which perform no I/O), no console output, and no files are created

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_logger_no_multiple_files_on_repeated_calls(tmp_path)[source]

Test that repeated get_logger calls with same name don’t create multiple log files.

Purpose: Validates that repeated calls to get_logger with same name reuse existing logger and don’t create new log files with different timestamps

Given: Multiple sequential calls to get_logger with the same logger name
When: Messages are logged after each get_logger call
Then: Only ONE log file is created, containing all messages from all calls

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_logger_reuses_handlers_with_same_name(tmp_path)[source]

Test that get_logger with same name returns logger with existing handlers.

Purpose: Validates that calling get_logger multiple times with the same name returns the same logger instance with the same handlers

Given: Logger created with specific name and output directory
When: get_logger is called again with the same name
Then: Same logger instance is returned with same handler configuration

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_backwards_compatibility(tmp_path)[source]

Test queue logger backwards compatibility with existing code.

Purpose: Validates that queue logger maintains compatibility with existing get_logger calls

Given: Logger calls with and without use_queue parameter
When: Both types of loggers are used
Then: Both work correctly and produce expected log files

Test type: integration

POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_basic_functionality(tmp_path)[source]

Test basic queue-based logger functionality.

Purpose: Validates that queue-based logger works correctly for basic logging operations

Given: Logger configured with use_queue=True, output_dir=tmp_path, console_output=False
When: Info and warning messages are logged
Then: Messages are written to log file via queue mechanism without blocking

Test type: unit
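
The non-blocking queue mechanism these tests exercise can be illustrated with the standard library's QueueHandler/QueueListener pair, where a background thread drains the queue and performs the file I/O. This is the stdlib pattern, not the project's implementation:

>>> import logging, logging.handlers, queue
>>> log_queue = queue.Queue(-1)
>>> file_handler = logging.FileHandler("task.log")
>>> listener = logging.handlers.QueueListener(log_queue, file_handler)
>>> listener.start()
>>> logger = logging.getLogger("queued_sketch")
>>> logger.setLevel(logging.INFO)
>>> logger.addHandler(logging.handlers.QueueHandler(log_queue))
>>> logger.info("enqueued without blocking on file I/O")
>>> listener.stop()  # flushes remaining records and joins the writer thread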

POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_cleanup()[source]

Test queue logger cleanup functionality.

Purpose: Validates that cleanup functions properly shut down queue logger resources

Given: Active queue logger with writer thread and handlers
When: cleanup_all_loggers is called
Then: All resources are properly cleaned up and system returns to clean state

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_diagnostics(tmp_path)[source]

Test queue logger diagnostics functionality.

Purpose: Validates that diagnostic functions provide accurate information about queue logger state

Given: Queue logger system with multiple active loggers
When: Diagnostics are requested
Then: Accurate information about queue size, handlers, and system state is returned

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_handler_management(tmp_path)[source]

Test queue logger handler pooling and management.

Purpose: Validates that queue logger properly manages file handlers with pooling

Given: Multiple queue loggers accessing the same task repeatedly
When: Handlers are created and reused
Then: Handler count is managed efficiently without resource leaks

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_individual_task_files(tmp_path)[source]

Test queue logger creates individual files per task.

Purpose: Validates that queue-based logger maintains individual log files for different tasks

Given: Multiple queue loggers with different task IDs
When: Messages are logged from each task
Then: Separate log files are created for each task with correct content

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_setup_task_logger_with_buffering_adds_memory_handlers(tmp_path)[source]

Test that setup_task_logger_with_buffering adds ConditionalMemoryHandler when enabled.

Purpose: Validates that log_only_on_failure=True wraps handlers with ConditionalMemoryHandler

Given: Logger configuration with log_only_on_failure=True
When: setup_task_logger_with_buffering is called
Then: Handlers are wrapped with ConditionalMemoryHandler for buffering

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_setup_task_logger_with_buffering_creates_logger(tmp_path)[source]

Test that setup_task_logger_with_buffering creates a properly configured logger.

Purpose: Validates that setup function creates logger with correct configuration

Given: Logger name and configuration parameters
When: setup_task_logger_with_buffering is called with log_only_on_failure=False
Then: Logger is created with file and console handlers as expected

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_setup_task_logger_with_buffering_reuses_existing(tmp_path)[source]

Test that setup_task_logger_with_buffering reuses existing configured logger.

Purpose: Validates that repeated calls with same logger name return the same logger

Given: Logger already created and configured
When: setup_task_logger_with_buffering is called again with same name
Then: Same logger instance is returned without duplicate handlers

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_task_logger_manager_state_tracking(tmp_path)[source]

Test that TaskLoggerManager correctly tracks logger state.

Purpose: Validates that TaskLoggerManager maintains accurate state of configured loggers

Given: TaskLoggerManager with multiple loggers created
When: Loggers are created and configured
Then: Manager correctly tracks which loggers are configured and their handlers

Test type: unit

POMDPPlanners.tests.test_utils.test_logger.test_task_logger_manager_thread_safety(tmp_path)[source]

Test that TaskLoggerManager is thread-safe for concurrent access.

Purpose: Validates that TaskLoggerManager can handle concurrent logger creation safely

Given: Multiple threads creating loggers simultaneously
When: Threads call setup_task_logger_with_buffering concurrently
Then: All loggers are created correctly without race conditions

Test type: unit
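
A lock-guarded, configure-once pattern is the usual way to get the thread safety and state tracking these two tests describe. The sketch below illustrates the idea only; it is not the TaskLoggerManager API, and get_task_logger is a hypothetical helper name.

>>> import logging, threading
>>> _setup_lock = threading.Lock()
>>> def get_task_logger(name, path):
...     logger = logging.getLogger(name)
...     with _setup_lock:
...         if not logger.handlers:  # configure at most once, even under concurrent calls
...             logger.setLevel(logging.INFO)
...             logger.addHandler(logging.FileHandler(path))
...     return logger

Because the handler check runs inside the lock, concurrent callers cannot attach duplicate handlers, and repeated calls return the same configured logger instance.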

POMDPPlanners.tests.test_utils.test_memory_tracker module

POMDPPlanners.tests.test_utils.test_multivariate_normal module

POMDPPlanners.tests.test_utils.test_planner_episode_visualization module

POMDPPlanners.tests.test_utils.test_probability_utils module

Utilities for testing state transition probability methods.

This module provides general utility functions for validating that probability methods correctly match empirical sampling distributions across different POMDP environments.

POMDPPlanners.tests.test_utils.test_probability_utils.validate_continuous_observation_model_pdf_consistency(observation_model, num_samples=1000, seed=42)[source]

Validate continuous observation model by checking PDF consistency properties.

This function validates that a continuous observation model’s probability() method returns valid PDF values by checking:

1. All PDF values are non-negative
2. PDF values near the mean are higher than those far from the mean
3. The PDF computation is deterministic (same input gives same output)

This is appropriate for continuous distributions where JS divergence is not meaningful.

Parameters:
  • observation_model (ObservationModel) – Observation model to test

  • num_samples (int) – Number of samples to draw. Defaults to 1000.

  • seed (int) – Random seed for reproducibility. Defaults to 42.

Returns:

Dictionary with the following keys:

  • 'pdf_values_non_negative': Whether all PDF values are >= 0

  • 'pdf_deterministic': Whether probability() is deterministic

  • 'pdf_values': Array of computed PDF values

  • 'num_samples': Number of samples used

Return type:

dict

Example

>>> from POMDPPlanners.environments.cartpole_pomdp import CartPoleObservation
>>> import numpy as np
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> true_state = np.array([0.1, 0.05, 0.02, -0.1])
>>> obs_model = CartPoleObservation(next_state=true_state, action=0, noise_cov=noise_cov)
>>> results = validate_continuous_observation_model_pdf_consistency(obs_model)
>>> results['pdf_values_non_negative']
True
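
The three consistency checks can also be reproduced against a known Gaussian using SciPy in place of a project observation model; this sketch illustrates the checks themselves, not the module's internals.

>>> import numpy as np
>>> from scipy.stats import multivariate_normal
>>> mean = np.array([0.1, 0.05, 0.02, -0.1])
>>> dist = multivariate_normal(mean=mean, cov=np.diag([0.1] * 4))
>>> samples = dist.rvs(size=1000, random_state=42)
>>> pdf = dist.pdf(samples)
>>> bool(np.all(pdf >= 0))  # check 1: non-negative
True
>>> bool(dist.pdf(mean) >= pdf.max())  # check 2: density peaks at the mean
True
>>> bool(np.allclose(pdf, dist.pdf(samples)))  # check 3: deterministic
True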

POMDPPlanners.tests.test_utils.test_probability_utils.validate_continuous_observation_model_with_ks_test(observation_model, num_samples=1000, significance_level=0.05, seed=42)[source]

Validate continuous observation model using Kolmogorov-Smirnov test.

This function validates that a continuous observation model’s sample() method generates samples consistent with its probability() method by using the KS test. Unlike JS divergence, which is inappropriate for continuous distributions, the KS test properly handles continuous data by comparing cumulative distribution functions.

The method works by:

1. Sampling observations from the observation model
2. Computing PDF values for each sample using probability()
3. Computing the CDF from the PDF values
4. Running KS test to compare empirical CDF against theoretical CDF
5. Verifying the p-value exceeds the significance level

Parameters:
  • observation_model (ObservationModel) – Observation model to test (should have continuous observations)

  • num_samples (int) – Number of samples to draw. Defaults to 1000.

  • significance_level (float) – Significance level for KS test (reject if p < this). Defaults to 0.05.

  • seed (int) – Random seed for reproducibility. Defaults to 42.

Returns:

Dictionary with the following keys:

  • 'ks_statistic': KS test statistic

  • 'p_value': p-value from KS test

  • 'passed': Whether the test passed (p_value >= significance_level)

  • 'num_samples': Number of samples used

  • 'significance_level': Significance level used

  • 'pdf_values_valid': Whether all PDF values are non-negative

Return type:

dict

Raises:

AssertionError – If the KS test fails (p-value < significance level) or if PDF values are negative

Example

>>> from POMDPPlanners.environments.cartpole_pomdp import CartPoleObservation
>>> import numpy as np
>>> np.random.seed(42)
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> true_state = np.array([0.1, 0.05, 0.02, -0.1])
>>> obs_model = CartPoleObservation(next_state=true_state, action=0, noise_cov=noise_cov)
>>> results = validate_continuous_observation_model_with_ks_test(obs_model, seed=42)
>>> print(f"KS p-value: {results['p_value']:.4f}")
KS p-value: 0.0000
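
The same KS procedure can be sketched with SciPy directly: draw samples, then compare their empirical CDF with the model's theoretical CDF. This uses a plain normal distribution for illustration, not a POMDP observation model.

>>> from scipy.stats import kstest, norm
>>> samples = norm(loc=0.0, scale=1.0).rvs(size=1000, random_state=42)
>>> result = kstest(samples, norm(loc=0.0, scale=1.0).cdf)
>>> consistent = result.pvalue >= 0.05  # True when the samples match the model at the 5% level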

POMDPPlanners.tests.test_utils.test_probability_utils.validate_distribution_probability_matches_empirical(distribution, num_samples=1000, max_js_divergence=0.05, max_wasserstein_distance=0.1, normalization_tolerance=0.01, seed=42, distance_metric='js')[source]

Test that computed probabilities match empirical sampling distribution.

This function validates that a distribution’s probability() method correctly computes probabilities by comparing the computed probability distribution against an empirical distribution obtained through repeated sampling.

The method works by:

1. Sampling many values (default 1000) from the distribution
2. Computing empirical PDF by counting value occurrences
3. Computing theoretical probabilities for all sampled values using probability()
4. Normalizing the computed probabilities
5. Measuring distance between the two distributions (JS divergence or Wasserstein)

Parameters:
  • distribution (Distribution) – Distribution to test

  • num_samples (int) – Number of samples for building distributions. Defaults to 1000.

  • max_js_divergence (float) – Maximum allowed Jensen-Shannon divergence. Defaults to 0.05.

  • max_wasserstein_distance (float) – Maximum allowed Wasserstein distance. Defaults to 0.1.

  • normalization_tolerance (float) – Tolerance for probability normalization check. Defaults to 0.01.

  • seed (int) – Random seed for reproducibility. Defaults to 42.

  • distance_metric (str) – Distance metric to use (“js” for Jensen-Shannon or “wasserstein”). Defaults to “js”.

Returns:

Dictionary with the following keys:

  • 'unique_values': List of unique sampled values

  • 'empirical_probs': Empirical probability distribution (normalized counts)

  • 'computed_probs': Computed probabilities from probability() method (normalized)

  • 'distance': Distance between distributions (JS divergence or Wasserstein)

  • 'distance_metric': Which distance metric was used

  • 'probabilities_normalized': Whether computed probabilities sum to ~1.0

  • 'num_unique_values': Number of unique values found

Return type:

dict

Raises:

AssertionError – If computed probabilities don’t match empirical distribution within the specified tolerance, or if probabilities don’t normalize properly

Example

>>> import numpy as np
>>> from POMDPPlanners.core.distributions import DiscreteDistribution
>>> dist = DiscreteDistribution(["a", "b", "c"], np.array([0.5, 0.3, 0.2]))
>>> results = validate_distribution_probability_matches_empirical(dist, seed=42)
>>> print(f"JS Divergence: {results['distance']:.6f}")
JS Divergence: 0.002131
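
The distance computation in step 5 can be sketched with SciPy. Note that scipy.spatial.distance.jensenshannon returns the JS distance, so it is squared here to obtain the divergence; this illustrates the comparison only, not the module's internals.

>>> import numpy as np
>>> from scipy.spatial.distance import jensenshannon
>>> values, probs = ["a", "b", "c"], np.array([0.5, 0.3, 0.2])
>>> rng = np.random.default_rng(42)
>>> samples = rng.choice(values, size=1000, p=probs)
>>> empirical = np.array([(samples == v).mean() for v in values])
>>> js_divergence = jensenshannon(empirical, probs, base=2) ** 2
>>> bool(js_divergence <= 0.05)  # within the default max_js_divergence
True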

POMDPPlanners.tests.test_utils.test_probability_utils.validate_observation_probability_matches_empirical_distribution(observation_model, num_samples=1000, max_js_divergence=0.05, max_wasserstein_distance=0.1, normalization_tolerance=0.01, seed=42, distance_metric='js', check_normalization=True)[source]

Test that computed probabilities match empirical sampling distribution.

This function validates that an observation model’s probability() method correctly computes observation probabilities by comparing the computed probability distribution against an empirical distribution obtained through repeated sampling.

The method works by:

1. Sampling many observations (default 1000) from the observation model
2. Computing empirical PDF by counting observation occurrences
3. Computing theoretical probabilities for all sampled observations using probability()
4. Normalizing the computed probabilities
5. Measuring distance between the two distributions (JS divergence or Wasserstein)

Parameters:
  • observation_model (ObservationModel) – Observation model to test

  • num_samples (int) – Number of samples for building distributions. Defaults to 1000.

  • max_js_divergence (float) – Maximum allowed Jensen-Shannon divergence. Defaults to 0.05.

  • max_wasserstein_distance (float) – Maximum allowed Wasserstein distance. Defaults to 0.1.

  • normalization_tolerance (float) – Tolerance for probability normalization check. Defaults to 0.01.

  • seed (int) – Random seed for reproducibility. Defaults to 42.

  • distance_metric (str) – Distance metric to use (“js” for Jensen-Shannon or “wasserstein”). Defaults to “js”.

  • check_normalization (bool) – Whether to verify that computed probabilities sum to ~1.0. Defaults to True.

Returns:

Dictionary with the following keys:

  • 'unique_observations': List of unique sampled observations

  • 'empirical_probs': Empirical probability distribution (normalized counts)

  • 'computed_probs': Computed probabilities from probability() method (normalized)

  • 'distance': Distance between distributions (JS divergence or Wasserstein)

  • 'distance_metric': Which distance metric was used

  • 'probabilities_normalized': Whether computed probabilities sum to ~1.0

  • 'num_unique_observations': Number of unique observations found

Return type:

dict

Raises:

AssertionError – If computed probabilities don’t match empirical distribution within the specified tolerance, or if probabilities don’t normalize properly

Example

>>> from POMDPPlanners.environments.tiger_pomdp import TigerObservation
>>> obs_model = TigerObservation(next_state="tiger_left", action="listen")
>>> results = validate_observation_probability_matches_empirical_distribution(obs_model, seed=42)
>>> print(f"JS Divergence: {results['distance']:.6f}")
JS Divergence: 0.002958

POMDPPlanners.tests.test_utils.test_probability_utils.validate_probability_matches_empirical_distribution(transition_model, num_samples=1000, max_js_divergence=0.05, max_wasserstein_distance=0.1, normalization_tolerance=0.01, seed=42, distance_metric='js')[source]

Test that computed probabilities match empirical sampling distribution.

This function validates that a state transition model’s probability() method correctly computes transition probabilities by comparing the computed probability distribution against an empirical distribution obtained through repeated sampling.

The method works by:

1. Sampling many states (default 1000) from the transition model
2. Computing empirical PDF by counting state occurrences
3. Computing theoretical probabilities for all sampled states using probability()
4. Normalizing the computed probabilities
5. Measuring distance between the two distributions (JS divergence or Wasserstein)

Parameters:
  • transition_model (StateTransitionModel) – State transition model to test

  • num_samples (int) – Number of samples for building distributions. Defaults to 1000.

  • max_js_divergence (float) – Maximum allowed Jensen-Shannon divergence. Defaults to 0.05.

  • max_wasserstein_distance (float) – Maximum allowed Wasserstein distance. Defaults to 0.1.

  • normalization_tolerance (float) – Tolerance for probability normalization check. Defaults to 0.01.

  • seed (int) – Random seed for reproducibility. Defaults to 42.

  • distance_metric (str) – Distance metric to use (“js” for Jensen-Shannon or “wasserstein”). Defaults to “js”.

Returns:

Dictionary with the following keys:

  • 'unique_states': List of unique sampled states

  • 'empirical_probs': Empirical probability distribution (normalized counts)

  • 'computed_probs': Computed probabilities from probability() method (normalized)

  • 'distance': Distance between distributions (JS divergence or Wasserstein)

  • 'distance_metric': Which distance metric was used

  • 'probabilities_normalized': Whether computed probabilities sum to ~1.0

  • 'num_unique_states': Number of unique states found

Return type:

dict

Raises:

AssertionError – If computed probabilities don’t match empirical distribution within the specified tolerance, or if probabilities don’t normalize properly

Example

>>> from POMDPPlanners.environments.tiger_pomdp import TigerStateTransition
>>> transition = TigerStateTransition(state="tiger_left", action="listen")
>>> results = validate_probability_matches_empirical_distribution(transition, seed=42)
>>> print(f"JS Divergence: {results['distance']:.6f}")
JS Divergence: 0.000000
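
When distance_metric='wasserstein' is selected, the comparison requires numeric states. A sketch of that branch using scipy.stats.wasserstein_distance, with an assumed numeric support in place of a real transition model (illustrative only):

>>> import numpy as np
>>> from scipy.stats import wasserstein_distance
>>> support, probs = np.array([0.0, 1.0, 2.0]), np.array([0.5, 0.3, 0.2])
>>> rng = np.random.default_rng(42)
>>> samples = rng.choice(support, size=1000, p=probs)
>>> distance = wasserstein_distance(samples, support, v_weights=probs)
>>> bool(distance <= 0.1)  # within the default max_wasserstein_distance
True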

POMDPPlanners.tests.test_utils.test_simulations_caching module

POMDPPlanners.tests.test_utils.test_statistics_utils module

POMDPPlanners.tests.test_utils.test_tree_statistics module

POMDPPlanners.tests.test_utils.test_weighted_particle_belief module