POMDPPlanners.tests.test_utils package
Subpackages
- POMDPPlanners.tests.test_utils.test_visualization package
- Submodules
- POMDPPlanners.tests.test_utils.test_visualization.test_metrics_plots module
- POMDPPlanners.tests.test_utils.test_visualization.test_policy_simulation_plots module
- POMDPPlanners.tests.test_utils.test_visualization.test_returns_plots module
- POMDPPlanners.tests.test_utils.test_visualization.test_tree_plots module
Submodules
POMDPPlanners.tests.test_utils.confidence_interval_utils module
Utility functions for confidence interval testing in environment tests.
- POMDPPlanners.tests.test_utils.confidence_interval_utils.verify_metrics_within_confidence_intervals(metrics, tolerance=0.0)[source]
Verify that metric values are within their confidence intervals.
- Parameters:
metrics (List[MetricValue]) – List of MetricValue objects containing name, value, and confidence bounds
tolerance (float) – Additional tolerance to add to confidence bounds (default 0.0)
- Raises:
AssertionError – If any metric value is not within its confidence interval
- Return type:
None
Purpose: Validates that all metric values are within their confidence intervals
Given: A list of MetricValue objects with values and confidence bounds
When: The verification function is called
Then: All metric values are within their confidence intervals
Test type: unit
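The check described above amounts to a loop that asserts each value lies inside its (optionally widened) interval. The sketch below uses a hypothetical MetricValue shape with explicit lower/upper bound fields; the project's actual field names may differ.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-in for the MetricValue object described above;
# the real class's field names may differ.
@dataclass
class MetricValue:
    name: str
    value: float
    ci_lower: float
    ci_upper: float


def verify_within_ci(metrics: List[MetricValue], tolerance: float = 0.0) -> None:
    """Raise AssertionError if any metric falls outside its widened CI."""
    for m in metrics:
        low = m.ci_lower - tolerance
        high = m.ci_upper + tolerance
        assert low <= m.value <= high, (
            f"Metric {m.name!r}: value {m.value} outside [{low}, {high}]"
        )


verify_within_ci([MetricValue("return", 9.8, 9.0, 10.5)])  # passes silently
```

The tolerance parameter mirrors the documented behavior: it widens both confidence bounds before the containment check.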
POMDPPlanners.tests.test_utils.history_builders module
Helper functions for building test History objects.
This module provides utility functions to reduce code duplication when constructing History objects in tests.
- POMDPPlanners.tests.test_utils.history_builders.build_test_history(steps, reach_terminal=False, discount_factor=0.95, average_state_sampling_time=0.0, average_action_time=0.0, average_observation_time=0.0, average_belief_update_time=0.0, average_reward_time=0.0, actual_num_steps=None, policy_run_data=None)[source]
Build a History object with standard test parameters.
This helper function reduces code duplication when constructing History objects in tests by providing sensible defaults for timing parameters.
- Parameters:
steps (List[StepData]) – List of StepData objects representing the episode history.
reach_terminal (bool) – Whether the terminal state was reached. Defaults to False.
discount_factor (float) – Discount factor for the history. Defaults to 0.95.
average_state_sampling_time (float) – Average time for state sampling. Defaults to 0.0.
average_action_time (float) – Average time for action selection. Defaults to 0.0.
average_observation_time (float) – Average time for observation generation. Defaults to 0.0.
average_belief_update_time (float) – Average time for belief updates. Defaults to 0.0.
average_reward_time (float) – Average time for reward calculation. Defaults to 0.0.
actual_num_steps (Optional[int]) – Actual number of steps in the history. If None, uses len(steps).
policy_run_data (Optional[List[PolicyRunData]]) – Policy run data for the history. If None, uses empty list.
- Return type:
History
- Returns:
History object with standard timing values and provided steps.
Example
>>> from POMDPPlanners.core.simulation import StepData
>>> from POMDPPlanners.core.belief import WeightedParticleBelief
>>> import numpy as np
>>> belief = WeightedParticleBelief(particles=[0, 1], log_weights=np.array([0.0, -0.1]))
>>> steps = [StepData(state=0, action=1, next_state=0, observation=2, reward=10.0, belief=belief)]
>>> history = build_test_history(steps, reach_terminal=True)
>>> history.reach_terminal_state
True
>>> history.actual_num_steps
1
POMDPPlanners.tests.test_utils.test_action_samplers module
POMDPPlanners.tests.test_utils.test_belief_factory module
POMDPPlanners.tests.test_utils.test_confidence_interval_utils module
POMDPPlanners.tests.test_utils.test_config_loader module
POMDPPlanners.tests.test_utils.test_config_to_id module
- class POMDPPlanners.tests.test_utils.test_config_to_id.TestConfigToId[source]
Bases: object
Test cases for the config_to_id function.
- test_deterministic_hashing()[source]
Test that config_to_id produces deterministic results across multiple calls.
Purpose: Validates that identical configurations always produce identical hash IDs
Given: The same configuration dictionary used multiple times
When: config_to_id is called repeatedly on the same configuration
Then: All calls produce identical hash IDs
Test type: unit
- test_empty_dict_handling()[source]
Test config_to_id with empty dictionary.
Purpose: Validates that empty dictionaries produce consistent hash IDs
Given: Empty dictionary configuration
When: config_to_id is called on the empty dictionary
Then: Consistent hash ID is produced
Test type: unit
- test_hash_collision_resistance()[source]
Test that similar configurations produce different hash IDs.
Purpose: Validates that small changes in configuration produce different hash IDs
Given: Multiple configurations with small differences
When: config_to_id is called on each configuration
Then: Each configuration produces a unique hash ID
Test type: unit
- test_key_order_independence()[source]
Test that dictionary key order does not affect hash ID.
Purpose: Validates that dictionaries with same content but different key order produce same hash
Given: Two dictionaries with identical content but different key insertion order
When: config_to_id is called on both dictionaries
Then: Both produce identical hash IDs
Test type: unit
- test_mixed_data_types()[source]
Test config_to_id with mixed data types including edge cases.
Purpose: Validates proper handling of configurations with diverse data types
Given: Configuration containing strings, numbers, lists, None, and boolean values
When: config_to_id is called on the configuration
Then: All types are properly serialized and consistent hash is produced
Test type: unit
- test_nested_dict_handling()[source]
Test config_to_id with nested dictionary structures.
Purpose: Validates that nested dictionaries are properly handled in configuration hashing
Given: Configurations with nested dictionary structures
When: config_to_id is called on the configurations
Then: Nested structures are properly serialized and consistent hashes are produced
Test type: unit
- test_numpy_array_handling()[source]
Test config_to_id with NumPy arrays in configuration.
Purpose: Validates that NumPy arrays are properly handled in configuration hashing
Given: Configurations containing various NumPy array types
When: config_to_id is called on the configurations
Then: Arrays are properly serialized and consistent hashes are produced
Test type: unit
- test_numpy_scalar_handling()[source]
Test config_to_id with NumPy scalar types in configuration.
Purpose: Validates that NumPy scalars are properly handled in configuration hashing
Given: Configurations containing various NumPy scalar types
When: config_to_id is called on the configurations
Then: Scalars are properly serialized and consistent hashes are produced
Test type: unit
- test_simple_dict_hashing()[source]
Test hashing of simple dictionary configurations.
Purpose: Validates that simple dictionaries produce consistent hash IDs
Given: Simple dictionaries with basic Python types
When: config_to_id is called on the dictionaries
Then: Consistent hexadecimal hash strings are returned
Test type: unit
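The properties these tests exercise (determinism, key-order independence, collision resistance, hexadecimal output) are characteristic of hashing a canonical JSON serialization. Below is a minimal sketch of that idea, not the project's actual config_to_id implementation, which additionally handles NumPy types and objects with config_id attributes via its NumpyEncoder.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hypothetical sketch: serialize the config as canonical JSON with
    sorted keys, then hash it, so key insertion order cannot affect
    the result."""
    payload = json.dumps(config, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Deterministic and key-order independent:
assert config_hash({"planner": "POMCP", "depth": 10}) == \
       config_hash({"depth": 10, "planner": "POMCP"})

# Small changes produce different IDs:
assert config_hash({"depth": 10}) != config_hash({"depth": 11})
```

Sorting the keys before serialization is what makes the hash independent of dictionary insertion order.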
- class POMDPPlanners.tests.test_utils.test_config_to_id.TestConfigToIdIntegration[source]
Bases: object
Integration tests for config_to_id with real POMDP components.
- test_cartpole_environment_config_id_consistency()[source]
Test that CartPole POMDP environments produce consistent config IDs.
Purpose: Validates that identical CartPole POMDP configurations produce identical config IDs
Given: Multiple CartPole POMDP instances with identical parameters
When: config_to_id is called on their configuration dictionaries
Then: All instances produce identical config IDs
Test type: integration
- test_cartpole_environment_different_configs_produce_different_ids()[source]
Test that different CartPole configurations produce different config IDs.
Purpose: Validates that CartPole environments with different parameters produce different config IDs
Given: CartPole POMDP instances with different parameters
When: config_to_id is called on their configuration dictionaries
Then: Each configuration produces a unique config ID
Test type: integration
- test_complex_pomdp_configuration_with_numpy_arrays()[source]
Test config_to_id with complex POMDP configuration containing NumPy arrays.
Purpose: Validates that complex POMDP configurations with nested NumPy arrays produce consistent hashes
Given: A complex configuration dictionary with CartPole environment, POMCP planner, and belief parameters
When: config_to_id is called on the configuration
Then: A consistent hash ID is produced that handles all NumPy arrays correctly
Test type: integration
- test_config_to_id_order_invariance_with_pomdp_objects()[source]
Test that config_to_id is invariant to dictionary key order with POMDP objects.
Purpose: Validates that dictionary key order does not affect config_to_id with complex POMDP objects
Given: Two dictionaries with identical POMDP content but different key ordering
When: config_to_id is called on both dictionaries
Then: Both produce identical config IDs
Test type: integration
- test_config_to_id_with_objects_having_config_id_attribute()[source]
Test config_to_id with objects that have config_id attributes (environments, beliefs, etc.).
Purpose: Validates that objects with config_id attributes are properly serialized using their config_id
Given: A configuration containing POMDP objects with config_id attributes
When: config_to_id is called on the configuration
Then: Objects are serialized using their config_id rather than their full state
Test type: integration
- test_pomcp_planner_config_id_consistency()[source]
Test that POMCP planners produce consistent config IDs.
Purpose: Validates that identical POMCP configurations produce identical config IDs
Given: Multiple POMCP planner instances with identical parameters
When: config_to_id is called on their configuration dictionaries
Then: All instances produce identical config IDs
Test type: integration
- test_weighted_particle_belief_config_id_consistency()[source]
Test that WeightedParticleBelief instances produce consistent config IDs.
Purpose: Validates that identical weighted particle beliefs produce identical config IDs
Given: Multiple WeightedParticleBelief instances with identical particles and weights
When: config_id property is accessed
Then: All instances produce identical config IDs
Test type: integration
- test_weighted_particle_belief_different_configs_produce_different_ids()[source]
Test that different WeightedParticleBelief configurations produce different config IDs.
Purpose: Validates that weighted particle beliefs with different configurations produce different config IDs
Given: WeightedParticleBelief instances with different particles, weights, or parameters
When: config_id property is accessed
Then: Each configuration produces a unique config ID
Test type: integration
- class POMDPPlanners.tests.test_utils.test_config_to_id.TestNumpyEncoder[source]
Bases: object
Test cases for the NumpyEncoder JSON serialization class.
- test_config_id_object_encoding()[source]
Test encoding of objects with config_id attribute.
Purpose: Validates that objects with config_id are serialized with class metadata
Given: A mock object with config_id attribute and class information
When: Object is serialized using the encoder
Then: Object is converted to dict with class and config_id information
Test type: unit
- test_getstate_exception_fallback()[source]
Test fallback to string when __getstate__ raises exception.
Purpose: Validates proper fallback when __getstate__ method fails
Given: A mock object with __getstate__ that raises exception
When: Object is serialized using the encoder
Then: Object is converted to string representation
Test type: unit
- test_getstate_object_encoding()[source]
Test encoding of objects with __getstate__ method.
Purpose: Validates that objects with __getstate__ are serialized with their state
Given: A mock object with __getstate__ method
When: Object is serialized using the encoder
Then: Object is converted to dict with class and state information
Test type: unit
- test_numpy_array_encoding()[source]
Test encoding of NumPy arrays to lists.
Purpose: Validates that NumPy arrays are properly converted to Python lists
Given: A NumpyEncoder instance and various NumPy arrays
When: Arrays are serialized using the encoder
Then: Arrays are converted to their list representations
Test type: unit
- test_numpy_scalar_encoding()[source]
Test encoding of NumPy scalar types to Python primitives.
Purpose: Validates that NumPy scalars are converted to native Python types
Given: A NumpyEncoder instance and various NumPy scalar types
When: Scalars are serialized using the encoder
Then: Scalars are converted to equivalent Python primitives
Test type: unit
- test_object_without_config_id_uses_getstate()[source]
Test that objects without config_id fall back to __getstate__ mechanism.
Purpose: Validates that objects without config_id attribute use __getstate__ fallback
Given: A mock object without config_id attribute but with __getstate__ method
When: Object is serialized using the encoder
Then: Encoder uses __getstate__ mechanism for serialization
Test type: unit
- test_unsupported_type_fallback()[source]
Test behavior with object types that fall back to __getstate__.
Purpose: Validates that objects without custom handling use __getstate__ fallback
Given: An object type that doesn't match NumPy or config_id patterns
When: Object is serialized using the encoder
Then: Object is handled by __getstate__ fallback mechanism
Test type: unit
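The behaviors described by these tests (arrays to lists, NumPy scalars to primitives, config_id summarization, __getstate__ fallback, string fallback on failure) can be mirrored with a small json.JSONEncoder subclass. This is an illustrative reconstruction only; the real NumpyEncoder's details may differ.

```python
import json

import numpy as np


class SketchNumpyEncoder(json.JSONEncoder):
    """Illustrative encoder mirroring the behaviors the tests above
    describe; not the project's actual NumpyEncoder."""

    def default(self, o):
        if isinstance(o, np.ndarray):
            return o.tolist()          # arrays become plain lists
        if isinstance(o, np.generic):
            return o.item()            # NumPy scalars become Python primitives
        if hasattr(o, "config_id"):
            # Objects exposing config_id are summarized by it, not full state.
            return {"class": type(o).__name__, "config_id": o.config_id}
        try:
            return {"class": type(o).__name__, "state": o.__getstate__()}
        except Exception:
            return str(o)              # last-resort string fallback


encoded = json.dumps({"weights": np.array([1, 2, 3]), "n": np.int64(7)},
                     cls=SketchNumpyEncoder, sort_keys=True)
```

Note that `default()` is only invoked for values the standard encoder cannot serialize, which is why the NumPy and fallback branches live there.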
POMDPPlanners.tests.test_utils.test_hyperparameter_tuning_and_eval module
POMDPPlanners.tests.test_utils.test_logger module
- POMDPPlanners.tests.test_utils.test_logger.test_cleanup_task_logger_failure_flushes_logs(tmp_path)[source]
Test that cleanup_task_logger flushes buffered logs on failure.
Purpose: Validates that failed episodes with log_only_on_failure flush buffered logs
Given: Logger with buffered messages and log_only_on_failure=True
When: cleanup_task_logger is called with episode_failed=True
Then: Buffered logs are flushed and written to file
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_cleanup_task_logger_success_discards_logs(tmp_path)[source]
Test that cleanup_task_logger discards buffered logs on success.
Purpose: Validates that successful episodes with log_only_on_failure discard buffered logs
Given: Logger with buffered messages and log_only_on_failure=True
When: cleanup_task_logger is called with episode_failed=False
Then: Buffered logs are discarded without writing to file
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_conditional_memory_handler_buffering(tmp_path)[source]
Test that ConditionalMemoryHandler buffers logs in memory.
Purpose: Validates that ConditionalMemoryHandler holds logs in buffer without writing to disk
Given: A ConditionalMemoryHandler wrapping a FileHandler
When: INFO level messages are logged
Then: Messages are buffered in memory and not written to the file
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_conditional_memory_handler_flush_on_error(tmp_path)[source]
Test that ConditionalMemoryHandler auto-flushes on ERROR level.
Purpose: Validates that ERROR level messages trigger automatic flush of buffered logs
Given: A ConditionalMemoryHandler with buffered INFO messages
When: An ERROR level message is logged
Then: All buffered messages (including ERROR) are flushed to file
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_conditional_memory_handler_manual_flush(tmp_path)[source]
Test that ConditionalMemoryHandler.trigger_flush() works correctly.
Purpose: Validates that manual flush trigger writes buffered logs to file
Given: A ConditionalMemoryHandler with buffered INFO messages
When: trigger_flush() is called manually
Then: All buffered messages are flushed to file
Test type: unit
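The buffer-then-flush behavior exercised by these tests resembles the standard library's logging.handlers.MemoryHandler, which holds records in memory and writes them to a target handler only when a record at or above flushLevel arrives or when flush() is called explicitly. A stdlib-only sketch of the same pattern (this is not the project's ConditionalMemoryHandler):

```python
import io
import logging
import logging.handlers

# Target handler that would normally be a FileHandler; a StringIO stream
# keeps the sketch self-contained.
stream = io.StringIO()
target = logging.StreamHandler(stream)

# Records are buffered until an ERROR-level record arrives (or flush()
# is called), then the whole buffer is written to the target.
buffered = logging.handlers.MemoryHandler(
    capacity=10_000, flushLevel=logging.ERROR, target=target
)

log = logging.getLogger("episode_sketch")
log.setLevel(logging.INFO)
log.addHandler(buffered)

log.info("step 1")             # buffered: nothing written yet
assert stream.getvalue() == ""

log.error("episode failed")    # triggers flush of the whole buffer
assert "step 1" in stream.getvalue()
assert "episode failed" in stream.getvalue()
```

A successful episode would instead discard the buffer (e.g. by clearing it) rather than flushing, which matches the cleanup_task_logger behavior described above.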
- POMDPPlanners.tests.test_utils.test_logger.test_flush_buffered_task_logs_flushes_handlers(tmp_path)[source]
Test that flush_buffered_task_logs flushes buffered logs to file.
Purpose: Validates that flush function triggers flush of all buffered logs
Given: Logger with buffered messages (log_only_on_failure=True)
When: flush_buffered_task_logs is called
Then: Buffered logs are written to file
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_logger_console_output_false(capsys)[source]
Test logger console output false.
Purpose: Validates that logger does not output messages to console when console_output=False
Given: Logger configured with debug=True and console_output=False
When: Info message is logged
Then: Message does not appear in captured stdout or stderr output
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_logger_console_output_true(capsys)[source]
Test logger console output true.
Purpose: Validates that logger correctly outputs messages to console when console_output=True
Given: Logger configured with debug=True and console_output=True
When: Info message is logged
Then: Message appears in captured stdout or stderr output
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_logger_file_output(tmp_path)[source]
Test logger file output.
Purpose: Validates that logger correctly writes messages to log files when output_dir is specified
Given: Logger configured with debug=True, output_dir=tmp_path, and console_output=False
When: Info message is logged
Then: Log file is created in logs subdirectory containing the test message
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_logger_no_duplicate_handlers(tmp_path)[source]
Test logger no duplicate handlers.
Purpose: Validates that repeated get_logger calls do not create duplicate handlers causing message duplication
Given: Logger called twice with identical parameters (debug=True, output_dir=tmp_path, console_output=True)
When: Messages are logged after each get_logger call
Then: Each message appears exactly once in the log file without duplication
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_logger_no_handlers_no_io(capsys, tmp_path)[source]
Test logger with no handlers performs no I/O.
Purpose: Validates that logger with console_output=False and output_dir=None has zero handlers and performs no I/O operations
Given: Logger configured with console_output=False and output_dir=None
When: Multiple messages are logged at various levels
Then: Logger has zero handlers (or only NullHandlers which perform no I/O), no console output, and no files are created
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_logger_no_multiple_files_on_repeated_calls(tmp_path)[source]
Test that repeated get_logger calls with same name don’t create multiple log files.
Purpose: Validates that repeated calls to get_logger with same name reuse existing logger and don't create new log files with different timestamps
Given: Multiple sequential calls to get_logger with the same logger name
When: Messages are logged after each get_logger call
Then: Only ONE log file is created, containing all messages from all calls
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_logger_reuses_handlers_with_same_name(tmp_path)[source]
Test that get_logger with same name returns logger with existing handlers.
Purpose: Validates that calling get_logger multiple times with the same name returns the same logger instance with the same handlers
Given: Logger created with specific name and output directory
When: get_logger is called again with the same name
Then: Same logger instance is returned with same handler configuration
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_backwards_compatibility(tmp_path)[source]
Test queue logger backwards compatibility with existing code.
Purpose: Validates that queue logger maintains compatibility with existing get_logger calls
Given: Logger calls with and without use_queue parameter
When: Both types of loggers are used
Then: Both work correctly and produce expected log files
Test type: integration
- POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_basic_functionality(tmp_path)[source]
Test basic queue-based logger functionality.
Purpose: Validates that queue-based logger works correctly for basic logging operations
Given: Logger configured with use_queue=True, output_dir=tmp_path, console_output=False
When: Info and warning messages are logged
Then: Messages are written to log file via queue mechanism without blocking
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_cleanup()[source]
Test queue logger cleanup functionality.
Purpose: Validates that cleanup functions properly shut down queue logger resources
Given: Active queue logger with writer thread and handlers
When: cleanup_all_loggers is called
Then: All resources are properly cleaned up and system returns to clean state
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_diagnostics(tmp_path)[source]
Test queue logger diagnostics functionality.
Purpose: Validates that diagnostic functions provide accurate information about queue logger state
Given: Queue logger system with multiple active loggers
When: Diagnostics are requested
Then: Accurate information about queue size, handlers, and system state is returned
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_handler_management(tmp_path)[source]
Test queue logger handler pooling and management.
Purpose: Validates that queue logger properly manages file handlers with pooling
Given: Multiple queue loggers accessing the same task repeatedly
When: Handlers are created and reused
Then: Handler count is managed efficiently without resource leaks
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_queue_logger_individual_task_files(tmp_path)[source]
Test queue logger creates individual files per task.
Purpose: Validates that queue-based logger maintains individual log files for different tasks
Given: Multiple queue loggers with different task IDs
When: Messages are logged from each task
Then: Separate log files are created for each task with correct content
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_setup_task_logger_with_buffering_adds_memory_handlers(tmp_path)[source]
Test that setup_task_logger_with_buffering adds ConditionalMemoryHandler when enabled.
Purpose: Validates that log_only_on_failure=True wraps handlers with ConditionalMemoryHandler
Given: Logger configuration with log_only_on_failure=True
When: setup_task_logger_with_buffering is called
Then: Handlers are wrapped with ConditionalMemoryHandler for buffering
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_setup_task_logger_with_buffering_creates_logger(tmp_path)[source]
Test that setup_task_logger_with_buffering creates a properly configured logger.
Purpose: Validates that setup function creates logger with correct configuration
Given: Logger name and configuration parameters
When: setup_task_logger_with_buffering is called with log_only_on_failure=False
Then: Logger is created with file and console handlers as expected
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_setup_task_logger_with_buffering_reuses_existing(tmp_path)[source]
Test that setup_task_logger_with_buffering reuses existing configured logger.
Purpose: Validates that repeated calls with same logger name return the same logger
Given: Logger already created and configured
When: setup_task_logger_with_buffering is called again with same name
Then: Same logger instance is returned without duplicate handlers
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_task_logger_manager_state_tracking(tmp_path)[source]
Test that TaskLoggerManager correctly tracks logger state.
Purpose: Validates that TaskLoggerManager maintains accurate state of configured loggers
Given: TaskLoggerManager with multiple loggers created
When: Loggers are created and configured
Then: Manager correctly tracks which loggers are configured and their handlers
Test type: unit
- POMDPPlanners.tests.test_utils.test_logger.test_task_logger_manager_thread_safety(tmp_path)[source]
Test that TaskLoggerManager is thread-safe for concurrent access.
Purpose: Validates that TaskLoggerManager can handle concurrent logger creation safely
Given: Multiple threads creating loggers simultaneously
When: Threads call setup_task_logger_with_buffering concurrently
Then: All loggers are created correctly without race conditions
Test type: unit
POMDPPlanners.tests.test_utils.test_memory_tracker module
POMDPPlanners.tests.test_utils.test_multivariate_normal module
POMDPPlanners.tests.test_utils.test_planner_episode_visualization module
POMDPPlanners.tests.test_utils.test_probability_utils module
Utilities for testing state transition probability methods.
This module provides general utility functions for validating that probability methods correctly match empirical sampling distributions across different POMDP environments.
- POMDPPlanners.tests.test_utils.test_probability_utils.validate_continuous_observation_model_pdf_consistency(observation_model, num_samples=1000, seed=42)[source]
Validate continuous observation model by checking PDF consistency properties.
This function validates that a continuous observation model's probability() method returns valid PDF values by checking:
1. All PDF values are non-negative
2. PDF values near the mean are higher than those far from the mean
3. The PDF computation is deterministic (same input gives same output)
This is appropriate for continuous distributions where JS divergence is not meaningful.
- Parameters:
observation_model (ObservationModel) – Observation model to test
num_samples (int) – Number of samples to draw. Defaults to 1000.
seed (int) – Random seed for reproducibility. Defaults to 42.
- Returns:
Dictionary with the following keys:
'pdf_values_non_negative': Whether all PDF values are >= 0
'pdf_deterministic': Whether probability() is deterministic
'pdf_values': Array of computed PDF values
'num_samples': Number of samples used
- Return type:
dict
Example
>>> from POMDPPlanners.environments.cartpole_pomdp import CartPoleObservation
>>> import numpy as np
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> true_state = np.array([0.1, 0.05, 0.02, -0.1])
>>> obs_model = CartPoleObservation(next_state=true_state, action=0, noise_cov=noise_cov)
>>> results = validate_continuous_observation_model_pdf_consistency(obs_model)
>>> results['pdf_values_non_negative']
True
- POMDPPlanners.tests.test_utils.test_probability_utils.validate_continuous_observation_model_with_ks_test(observation_model, num_samples=1000, significance_level=0.05, seed=42)[source]
Validate continuous observation model using Kolmogorov-Smirnov test.
This function validates that a continuous observation model's sample() method generates samples consistent with its probability() method by using the KS test. Unlike JS divergence, which is inappropriate for continuous distributions, the KS test properly handles continuous data by comparing cumulative distribution functions.
The method works by:
1. Sampling observations from the observation model
2. Computing PDF values for each sample using probability()
3. Computing the CDF from the PDF values
4. Running KS test to compare empirical CDF against theoretical CDF
5. Verifying the p-value exceeds the significance level
- Parameters:
observation_model (ObservationModel) – Observation model to test (should have continuous observations)
num_samples (int) – Number of samples to draw. Defaults to 1000.
significance_level (float) – Significance level for KS test (reject if p < this). Defaults to 0.05.
seed (int) – Random seed for reproducibility. Defaults to 42.
- Returns:
Dictionary with the following keys:
'ks_statistic': KS test statistic
'p_value': p-value from KS test
'passed': Whether the test passed (p_value >= significance_level)
'num_samples': Number of samples used
'significance_level': Significance level used
'pdf_values_valid': Whether all PDF values are non-negative
- Return type:
dict
- Raises:
AssertionError – If the KS test fails (p-value < significance level) or if PDF values are negative
Example
>>> from POMDPPlanners.environments.cartpole_pomdp import CartPoleObservation
>>> import numpy as np
>>> np.random.seed(42)
>>> noise_cov = np.diag([0.1, 0.1, 0.1, 0.1])
>>> true_state = np.array([0.1, 0.05, 0.02, -0.1])
>>> obs_model = CartPoleObservation(next_state=true_state, action=0, noise_cov=noise_cov)
>>> results = validate_continuous_observation_model_with_ks_test(obs_model, seed=42)
>>> print(f"KS p-value: {results['p_value']:.4f}")
KS p-value: 0.0000
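The core idea of the KS-based check can be illustrated in one dimension with SciPy directly: draw samples from a model and compare their empirical CDF against the model's theoretical CDF. This is only a sketch of the idea; the project's function operates on its own ObservationModel objects, whose API is not shown here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# Compare the empirical CDF of the samples against the CDF of the claimed
# model, N(0, 1): a large p-value means "consistent with the model".
good = stats.kstest(samples, "norm")

# Against a deliberately wrong model, U(0, 1), the KS test rejects clearly,
# since roughly half the normal samples fall below 0.
bad = stats.kstest(samples, "uniform")
assert bad.pvalue < 0.05
```

The significance_level parameter in the documented function plays the same role as the 0.05 threshold here: the validation fails when the p-value falls below it.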
- POMDPPlanners.tests.test_utils.test_probability_utils.validate_distribution_probability_matches_empirical(distribution, num_samples=1000, max_js_divergence=0.05, max_wasserstein_distance=0.1, normalization_tolerance=0.01, seed=42, distance_metric='js')[source]
Test that computed probabilities match empirical sampling distribution.
This function validates that a distribution’s probability() method correctly computes probabilities by comparing the computed probability distribution against an empirical distribution obtained through repeated sampling.
The method works by:
1. Sampling many values (default 1000) from the distribution
2. Computing empirical PDF by counting value occurrences
3. Computing theoretical probabilities for all sampled values using probability()
4. Normalizing the computed probabilities
5. Measuring distance between the two distributions (JS divergence or Wasserstein)
- Parameters:
distribution (Distribution) – Distribution to test
num_samples (int) – Number of samples for building distributions. Defaults to 1000.
max_js_divergence (float) – Maximum allowed Jensen-Shannon divergence. Defaults to 0.05.
max_wasserstein_distance (float) – Maximum allowed Wasserstein distance. Defaults to 0.1.
normalization_tolerance (float) – Tolerance for probability normalization check. Defaults to 0.01.
seed (int) – Random seed for reproducibility. Defaults to 42.
distance_metric (str) – Distance metric to use ("js" for Jensen-Shannon or "wasserstein"). Defaults to "js".
- Returns:
Dictionary with the following keys:
'unique_values': List of unique sampled values
'empirical_probs': Empirical probability distribution (normalized counts)
'computed_probs': Computed probabilities from probability() method (normalized)
'distance': Distance between distributions (JS divergence or Wasserstein)
'distance_metric': Which distance metric was used
'probabilities_normalized': Whether computed probabilities sum to ~1.0
'num_unique_values': Number of unique values found
- Return type:
dict
- Raises:
AssertionError – If computed probabilities don’t match empirical distribution within the specified tolerance, or if probabilities don’t normalize properly
Example
>>> import numpy as np
>>> from POMDPPlanners.core.distributions import DiscreteDistribution
>>> dist = DiscreteDistribution(["a", "b", "c"], np.array([0.5, 0.3, 0.2]))
>>> results = validate_distribution_probability_matches_empirical(dist, seed=42)
>>> print(f"JS Divergence: {results['distance']:.6f}")
JS Divergence: 0.002131
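The five steps above can be sketched directly with NumPy and SciPy for a simple discrete distribution. This illustrates only the validation idea; the real function works on the project's Distribution objects and supports a Wasserstein variant as well.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)
values = ["a", "b", "c"]
true_probs = np.array([0.5, 0.3, 0.2])

# 1. Sample many values from the distribution
samples = rng.choice(values, size=1000, p=true_probs)

# 2. Empirical distribution from value counts
uniq, counts = np.unique(samples, return_counts=True)
empirical = counts / counts.sum()

# 3-4. Theoretical probabilities for the sampled values, normalized
lookup = dict(zip(values, true_probs))
computed = np.array([lookup[v] for v in uniq])
computed = computed / computed.sum()

# 5. Jensen-Shannon distance between the two distributions; for a correct
# probability() implementation this should be small.
js = jensenshannon(empirical, computed, base=2)
assert js < 0.2  # deliberately loose bound for this sketch
```

With 1000 samples the empirical counts track the true probabilities closely, which is why a small distance threshold (the function's default is 0.05) is a meaningful pass/fail criterion.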
- POMDPPlanners.tests.test_utils.test_probability_utils.validate_observation_probability_matches_empirical_distribution(observation_model, num_samples=1000, max_js_divergence=0.05, max_wasserstein_distance=0.1, normalization_tolerance=0.01, seed=42, distance_metric='js', check_normalization=True)[source]
Test that computed probabilities match empirical sampling distribution.
This function validates that an observation model’s probability() method correctly computes observation probabilities by comparing the computed probability distribution against an empirical distribution obtained through repeated sampling.
The method works by:
1. Sampling many observations (default 1000) from the observation model
2. Computing empirical PDF by counting observation occurrences
3. Computing theoretical probabilities for all sampled observations using probability()
4. Normalizing the computed probabilities
5. Measuring distance between the two distributions (JS divergence or Wasserstein)
- Parameters:
observation_model (ObservationModel) – Observation model to test
num_samples (int) – Number of samples for building distributions. Defaults to 1000.
max_js_divergence (float) – Maximum allowed Jensen-Shannon divergence. Defaults to 0.05.
max_wasserstein_distance (float) – Maximum allowed Wasserstein distance. Defaults to 0.1.
normalization_tolerance (float) – Tolerance for probability normalization check. Defaults to 0.01.
seed (int) – Random seed for reproducibility. Defaults to 42.
distance_metric (str) – Distance metric to use ("js" for Jensen-Shannon or "wasserstein"). Defaults to "js".
check_normalization (bool) – Whether to check that computed probabilities are normalized. Defaults to True.
- Returns:
Dictionary containing:
'unique_observations': List of unique sampled observations
'empirical_probs': Empirical probability distribution (normalized counts)
'computed_probs': Computed probabilities from the probability() method (normalized)
'distance': Distance between distributions (JS divergence or Wasserstein)
'distance_metric': Which distance metric was used
'probabilities_normalized': Whether computed probabilities sum to ~1.0
'num_unique_observations': Number of unique observations found
- Return type:
- Raises:
AssertionError – If computed probabilities don’t match empirical distribution within the specified tolerance, or if probabilities don’t normalize properly
Example
>>> from POMDPPlanners.environments.tiger_pomdp import TigerObservation
>>> obs_model = TigerObservation(next_state="tiger_left", action="listen")
>>> results = validate_observation_probability_matches_empirical_distribution(obs_model, seed=42)
>>> print(f"JS Divergence: {results['distance']:.6f}")
JS Divergence: 0.002958
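When distance_metric is "wasserstein" and the support is numeric, the max_wasserstein_distance threshold applies instead of the JS bound. A minimal sketch of a 1-D Wasserstein (W1) distance between two discrete distributions on a shared support; the `wasserstein_1d` helper is an illustrative assumption, not this module's implementation:

```python
import numpy as np

def wasserstein_1d(support, p, q):
    """W1 distance between two distributions on a shared numeric 1-D support."""
    order = np.argsort(support)
    s = np.asarray(support, dtype=float)[order]
    p = np.asarray(p, dtype=float)[order]
    q = np.asarray(q, dtype=float)[order]
    # W1 is the integral of |CDF_p - CDF_q| across the support
    cdf_gap = np.abs(np.cumsum(p - q))
    return float(np.sum(cdf_gap[:-1] * np.diff(s)))

# Moving 0.3 of the mass from value 0.0 to value 2.0 costs 0.3 * 2 = 0.6
empirical = np.array([0.5, 0.3, 0.2])
computed = np.array([0.2, 0.3, 0.5])
distance = wasserstein_1d([0.0, 1.0, 2.0], empirical, computed)
assert abs(distance - 0.6) < 1e-9
```

Unlike JS divergence, which only compares probabilities pointwise, W1 weighs how far mass has to move, which is why it is the natural choice for ordered numeric observations.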
- POMDPPlanners.tests.test_utils.test_probability_utils.validate_probability_matches_empirical_distribution(transition_model, num_samples=1000, max_js_divergence=0.05, max_wasserstein_distance=0.1, normalization_tolerance=0.01, seed=42, distance_metric='js')[source]
Test that computed probabilities match empirical sampling distribution.
This function validates that a state transition model’s probability() method correctly computes transition probabilities by comparing the computed probability distribution against an empirical distribution obtained through repeated sampling.
The method works by:
1. Sampling many states (default 1000) from the transition model
2. Computing the empirical PDF by counting state occurrences
3. Computing theoretical probabilities for all sampled states using probability()
4. Normalizing the computed probabilities
5. Measuring the distance between the two distributions (JS divergence or Wasserstein)
- Parameters:
transition_model (StateTransitionModel) – State transition model to test
num_samples (int) – Number of samples for building distributions. Defaults to 1000.
max_js_divergence (float) – Maximum allowed Jensen-Shannon divergence. Defaults to 0.05.
max_wasserstein_distance (float) – Maximum allowed Wasserstein distance. Defaults to 0.1.
normalization_tolerance (float) – Tolerance for probability normalization check. Defaults to 0.01.
seed (int) – Random seed for reproducibility. Defaults to 42.
distance_metric (str) – Distance metric to use ("js" for Jensen-Shannon or "wasserstein"). Defaults to "js".
- Returns:
Dictionary containing:
'unique_states': List of unique sampled states
'empirical_probs': Empirical probability distribution (normalized counts)
'computed_probs': Computed probabilities from the probability() method (normalized)
'distance': Distance between distributions (JS divergence or Wasserstein)
'distance_metric': Which distance metric was used
'probabilities_normalized': Whether computed probabilities sum to ~1.0
'num_unique_states': Number of unique states found
- Return type:
- Raises:
AssertionError – If computed probabilities don’t match empirical distribution within the specified tolerance, or if probabilities don’t normalize properly
Example
>>> from POMDPPlanners.environments.tiger_pomdp import TigerStateTransition
>>> transition = TigerStateTransition(state="tiger_left", action="listen")
>>> results = validate_probability_matches_empirical_distribution(transition, seed=42)
>>> print(f"JS Divergence: {results['distance']:.6f}")
JS Divergence: 0.000000
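All three validators also report a 'probabilities_normalized' flag governed by normalization_tolerance. A minimal sketch of that check, assuming a simple sum-to-one test (the `probabilities_normalized` helper name is illustrative, not this module's API):

```python
import numpy as np

def probabilities_normalized(probs, tolerance=0.01):
    """True when the probabilities sum to 1 within the given tolerance."""
    return abs(float(np.sum(probs)) - 1.0) <= tolerance

ok = probabilities_normalized([0.5, 0.3, 0.2])   # sums to 1.0
bad = probabilities_normalized([0.5, 0.3, 0.1])  # sums to 0.9
assert ok and not bad
```

A model whose probability() method fails this check cannot meaningfully pass the distribution comparison, which is why the validators surface normalization separately from the distance metric.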