POMDPPlanners.simulations.simulation_apis package

Submodules

POMDPPlanners.simulations.simulation_apis.dask_simulations_api module

POMDPPlanners.simulations.simulation_apis.local_simulations_api module

POMDPPlanners.simulations.simulation_apis.pbs_simulations_api module

POMDPPlanners.simulations.simulation_apis.simulations_api_interface module

Abstract interface for POMDP simulation APIs.

This module defines the abstract base class interface that all simulation API implementations must follow. It ensures consistent method signatures across local, distributed, and cluster-based simulation execution modes.

class POMDPPlanners.simulations.simulation_apis.simulations_api_interface.SimulationsAPIInterface(cache_dir_path=None, debug=False)[source]

Bases: ABC

Abstract base class defining the interface for POMDP simulation APIs.

This interface ensures that all simulation API implementations (local, Dask, PBS, etc.) provide a consistent set of methods with standardized signatures. Subclasses must implement all abstract methods to provide specific execution strategies while maintaining API compatibility.

The interface defines methods for:

  • Running simulations with multiple environments and policies

  • Hyperparameter optimization

  • Comprehensive benchmarking with hyperparameter tuning

  • Optimize-and-evaluate workflows

  • All-hyperparameter benchmarks

Note

This is an abstract base class and cannot be instantiated directly. Use concrete implementations like LocalSimulationsAPI or DaskSimulationsAPI.

Parameters:
  • cache_dir_path (Path | None)

  • debug (bool)
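The abstract-base-class contract described above can be illustrated with a minimal, self-contained sketch. The class and method names below are simplified stand-ins, not the actual SimulationsAPIInterface signatures; the point is only that the ABC refuses direct instantiation while a concrete subclass that implements the abstract method works:

```python
from abc import ABC, abstractmethod

# Illustrative sketch of the interface pattern (names simplified; not the
# real SimulationsAPIInterface API).
class SimulationsAPISketch(ABC):
    def __init__(self, cache_dir_path=None, debug=False):
        self.cache_dir_path = cache_dir_path
        self.debug = debug

    @abstractmethod
    def run_multiple_environments_and_policies(self, environment_run_params):
        """Each concrete backend supplies its own execution strategy."""

class LocalSketch(SimulationsAPISketch):
    def run_multiple_environments_and_policies(self, environment_run_params):
        # A real implementation would execute episodes; this one just echoes
        # an empty result per configuration.
        return {p: [] for p in environment_run_params}

try:
    SimulationsAPISketch()  # abstract: cannot be instantiated directly
except TypeError as exc:
    print("ABC refused:", type(exc).__name__)

api = LocalSketch(debug=True)
print(api.run_multiple_environments_and_policies(["tiger_env"]))
```

This mirrors how LocalSimulationsAPI or DaskSimulationsAPI plug different execution strategies behind one call signature.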

abstractmethod run_all_benchmark_environments_on_planner_generators(generators, n_particles=30, num_episodes=10, num_steps=20, alpha=0.1, confidence_interval_level=0.95, experiment_name='All_Benchmark_Environments_On_Planner_Generators', n_jobs=-1, cache_dir_path=None, clear_cache_on_start=False, enable_profiling=False, profiling_output_limit=50, cache_visualizations=True, is_risk_averse=False)[source]

Run all benchmark environments on planner generators.

This method runs every benchmark environment against the planners produced by the supplied generators and collects the results for statistical analysis.

Parameters:
  • generators (Sequence[PlannerGenerator]) – Sequence of PlannerGenerator objects.

  • n_particles (int) – Number of particles for belief representation.

  • num_episodes (int) – Number of episodes for optimization.

  • num_steps (int) – Maximum steps per episode for optimization.

  • alpha (float) – Statistical significance level for confidence intervals.

  • confidence_interval_level (float) – Confidence level for statistical analysis.

  • experiment_name (str) – Name for the experiment.

  • n_jobs (int) – Number of parallel jobs for execution.

  • cache_dir_path (Optional[Path]) – Optional path for storing simulation results.

  • clear_cache_on_start (bool) – Whether to clear existing cache before starting simulation.

  • enable_profiling (bool) – Whether to enable performance profiling.

  • profiling_output_limit (int) – Maximum number of profiling entries to display.

  • cache_visualizations (bool) – Whether to cache visualizations.

  • is_risk_averse (bool) – Whether to run risk-averse benchmark.

Returns:

  • Dict[str, Dict[str, list]]: Raw simulation results organized by environment name, then policy name, containing lists of History objects for each episode.

  • pd.DataFrame: Statistical summary with confidence intervals, performance metrics, and policy configuration details for analysis and comparison.

Return type:

Tuple[Dict[str, Dict[str, list]], DataFrame]
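The first element of the returned tuple is keyed by environment name, then by policy name. A hedged sketch of consuming that structure (the environment/policy names are invented, and plain floats stand in for the History objects a real run would hold):

```python
from statistics import mean

# Hypothetical raw-results structure: env name -> policy name -> per-episode
# records (real runs contain History objects; floats stand in here).
results = {
    "TigerEnv": {
        "POMCP": [1.2, 0.8, 1.1],
        "Random": [-0.5, -0.2, -0.9],
    },
}

# Summarize the mean return per (environment, policy) pair.
summary = {
    (env, policy): mean(episodes)
    for env, policies in results.items()
    for policy, episodes in policies.items()
}
for (env, policy), avg in sorted(summary.items()):
    print(f"{env}/{policy}: {avg:.3f}")
```

In practice the second tuple element (the DataFrame) already carries this kind of summary with confidence intervals; the sketch only shows the shape of the raw dictionary.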

abstractmethod run_all_hyperparameter_benchmarks(policy_space_info, particles=30, num_episodes=10, num_steps=20, n_trials=100, discount_factor=0.95, time_out_in_seconds=3.0, evaluation_episodes=3, evaluation_steps=6, evaluation_n_jobs=1, optimization_n_jobs=-1, is_risk_averse=False, confidence_interval_level=0.95, alpha=0.05, cache_dir_path=None, experiment_name='All_Hyperparameter_Benchmarks', debug=False, cache_visualizations=True)[source]

Run all hyperparameter benchmarks with optimization.

This method runs hyperparameter optimization for all compatible environments and planners for a given policy space, followed by evaluation.

Parameters:
  • policy_space_info (PolicySpaceInfo) – Policy space information specifying action and observation space types for compatibility matching.

  • particles (int) – Number of particles for belief representation.

  • num_episodes (int) – Number of episodes for optimization.

  • num_steps (int) – Maximum steps per episode for optimization.

  • n_trials (int) – Number of optimization trials.

  • discount_factor (float) – Discount factor for the MDP.

  • time_out_in_seconds (float) – Timeout for planner execution.

  • evaluation_episodes (int) – Number of episodes for evaluation.

  • evaluation_steps (int) – Maximum steps per episode for evaluation.

  • evaluation_n_jobs (int) – Number of parallel jobs for evaluation.

  • optimization_n_jobs (int) – Number of parallel jobs for optimization (-1 uses all cores).

  • is_risk_averse (bool) – Whether to run risk-averse benchmark.

  • confidence_interval_level (float) – Confidence level for intervals.

  • alpha (float) – Significance level for statistical tests.

  • cache_dir_path (Optional[Path]) – Optional path for storing results.

  • experiment_name (str) – Name for the experiment.

  • debug (bool) – Enable debug mode.

  • cache_visualizations (bool) – Whether to cache visualizations.

Returns:

  • Dict[str, Dict[str, list]]: Raw simulation results organized by environment and policy names.

  • pd.DataFrame: Statistical summary with performance metrics and comparisons.

Return type:

Tuple[Dict[str, Dict[str, list]], DataFrame]

abstractmethod run_hyperparameter_optimization(environment_run_params, experiment_name='POMDP_Hyperparameter_Optimization', n_jobs=-1, cache_dir_path=None, clear_cache_on_start=False, debug=False, confidence_interval_level=0.95, alpha=0.05, use_queue_logger=False, parallelization_level=ParallelizationLevel.OPTUNA_TRIALS)[source]

Run hyperparameter optimization for POMDP policies using Optuna.

This method provides a high-level interface for hyperparameter optimization by wrapping the HyperParameterOptimizer class. It supports optimization of multiple environment-policy configurations with comprehensive MLflow tracking and statistical analysis.

The optimization uses Optuna’s advanced algorithms (TPE, CMA-ES, etc.) to efficiently search the hyperparameter space and find optimal configurations for POMDP policies.

Parameters:
  • environment_run_params (List[HyperParameterRunParams]) – List of HyperParameterRunParams configurations, each specifying an environment, policy class, hyperparameter ranges, and optimization settings. Each configuration must include the required n_trials parameter.

  • experiment_name (str) – Name for the MLflow experiment tracking. Used to organize optimization runs and enable comparison across different experiments.

  • n_jobs (int) – Number of parallel jobs for episode execution. Use -1 to use all available CPU cores/workers, or specify a positive integer for a specific number of cores.

  • cache_dir_path (Optional[Path]) – Optional path for storing optimization results, logs, and MLflow artifacts. If None, results are stored in the current working directory.

  • clear_cache_on_start (bool) – Whether to clear existing cache before starting optimization. Useful for ensuring clean runs when debugging or testing.

  • debug (bool) – Whether to enable debug-level logging output. When True, provides detailed information about optimization progress and internal operations.

  • confidence_interval_level (float) – Confidence level for statistical analysis (between 0.0 and 1.0). Used for computing confidence intervals in performance statistics. Defaults to 0.95 for 95% confidence intervals.

  • alpha (float) – Significance level for statistical tests (between 0.0 and 1.0). Used for hypothesis testing and confidence interval calculations. Defaults to 0.05 for 5% significance level.

  • use_queue_logger (bool) – Whether to use queue-based logging for distributed execution scenarios. Defaults to False for local execution.

  • parallelization_level (ParallelizationLevel) – Controls where parallelization is applied. OPTUNA_TRIALS (default) parallelizes across Optuna trials while running episodes sequentially. EPISODES parallelizes across episodes within each trial while running trials sequentially.

Returns:

List of optimization results, one per input configuration, each containing the optimized policy with its best hyperparameters, the environment reference, and optimization metadata.

Return type:

List[OptimizedPolicyResult]

Raises:
  • ValueError – If any configuration contains invalid parameters or missing required fields like n_trials.

  • TypeError – If policy classes are not Policy subclasses.

  • RuntimeError – If optimization fails for any configuration.
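The parallelization_level distinction above (parallel trials with sequential episodes, versus sequential trials with parallel episodes) can be illustrated with a rough stdlib sketch. The trial/episode counts and the work function are invented for the example; both strategies yield the same values and differ only in where workers are spent:

```python
from concurrent.futures import ThreadPoolExecutor

def run_episode(trial_id, episode_id):
    # Stand-in for one simulated episode; returns a deterministic fake score.
    return trial_id * 10 + episode_id

def trials_parallel(n_trials, n_episodes, workers=4):
    # OPTUNA_TRIALS style: trials fan out to workers, episodes run sequentially.
    def one_trial(t):
        return [run_episode(t, e) for e in range(n_episodes)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one_trial, range(n_trials)))

def episodes_parallel(n_trials, n_episodes, workers=4):
    # EPISODES style: each trial in turn fans its episodes out to workers.
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for t in range(n_trials):
            results.append(list(pool.map(lambda e, t=t: run_episode(t, e),
                                         range(n_episodes))))
    return results

print(trials_parallel(3, 2))  # → [[0, 1], [10, 11], [20, 21]]
```

Parallelizing across trials usually suits many cheap trials; parallelizing across episodes suits few trials whose episodes are individually expensive.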

abstractmethod run_hyperparameter_tuning_experiment_with_benchmarks(generators, particles=30, num_episodes=10, num_steps=20, n_trials=100, discount_factor=0.95, time_out_in_seconds=3.0, evaluation_episodes=3, evaluation_steps=6, evaluation_n_jobs=1, optimization_n_jobs=-1, is_risk_averse=False, confidence_interval_level=0.95, alpha=0.05, cache_dir_path=None, experiment_name='Comprehensive_Benchmark', debug=False, cache_visualizations=True)[source]

Run comprehensive benchmark with hyperparameter optimization.

This method runs hyperparameter optimization followed by policy evaluation for comprehensive benchmarking. It optimizes for average return across all configured environments and benchmark planners.

Parameters:
  • generators (Sequence[HyperParamPlannerConfigGenerator]) – Hyperparameter configuration generators list.

  • particles (int) – Number of particles for belief representation.

  • num_episodes (int) – Number of episodes for optimization.

  • num_steps (int) – Maximum steps per episode for optimization.

  • n_trials (int) – Number of optimization trials.

  • discount_factor (float) – Discount factor for the MDP.

  • time_out_in_seconds (float) – Timeout for planner execution.

  • evaluation_episodes (int) – Number of episodes for evaluation.

  • evaluation_steps (int) – Maximum steps per episode for evaluation.

  • evaluation_n_jobs (int) – Number of parallel jobs for evaluation.

  • optimization_n_jobs (int) – Number of parallel jobs for optimization (-1 uses all cores).

  • is_risk_averse (bool) – Whether to run risk-averse benchmark.

  • confidence_interval_level (float) – Confidence level for intervals.

  • alpha (float) – Significance level for statistical tests.

  • cache_dir_path (Optional[Path]) – Optional path for storing results.

  • experiment_name (str) – Name for the experiment.

  • debug (bool) – Enable debug mode.

  • cache_visualizations (bool) – Whether to cache visualizations.

Returns:

  • Dict[str, Dict[str, list]]: Raw simulation results organized by environment and policy names.

  • pd.DataFrame: Statistical summary with performance metrics and comparisons.

Return type:

Tuple[Dict[str, Dict[str, list]], DataFrame]

abstractmethod run_multiple_environments_and_policies(environment_run_params, alpha, confidence_interval_level, experiment_name='POMDP_Planning_Comparison', debug=False, scheduler_address=None, n_jobs=-1, cache_dir_path=None, clear_cache_on_start=False, enable_profiling=False, profiling_output_limit=50)[source]

Run POMDP simulations with multiple environments and policies.

This method executes POMDP simulations for the given environment and policy configurations. The specific execution strategy (local, distributed, cluster) is determined by the concrete implementation.

Parameters:
  • environment_run_params (List[EnvironmentRunParams]) – List of environment configurations for simulation. Each configuration specifies an environment, belief state, policies, number of episodes, and number of steps per episode.

  • alpha (float) – Statistical significance level for confidence intervals (e.g., 0.05 for 95% CI). Used for computing risk metrics like Conditional Value at Risk (CVaR).

  • confidence_interval_level (float) – Confidence level for statistical analysis (e.g., 0.95). Determines the width of confidence intervals for performance metrics.

  • experiment_name (str) – Name for the experiment and MLflow tracking. Used to organize results and enable comparison across different experimental runs.

  • debug (bool) – Whether to enable debug-level logging output. When True, provides detailed information about simulation progress and internal operations.

  • scheduler_address (Optional[str]) – Address of the Dask scheduler for distributed execution. If None, uses local execution (LocalSimulationsAPI) or creates a local Dask cluster (DaskSimulationsAPI). Format: “tcp://scheduler-ip:port”. This parameter is ignored by LocalSimulationsAPI.

  • n_jobs (int) – Number of parallel jobs for execution. Use -1 to use all available CPU cores/workers, or specify a positive integer for a specific number.

  • cache_dir_path (Optional[Path]) – Optional path for storing simulation results, logs, and artifacts. If None, results are stored in the current working directory.

  • clear_cache_on_start (bool) – Whether to clear existing cache before starting simulation. Useful for ensuring clean runs when debugging or testing.

  • enable_profiling (bool) – Whether to enable performance profiling using cProfile. Generates detailed timing information for optimization analysis.

  • profiling_output_limit (int) – Maximum number of profiling entries to display when profiling is enabled. Helps focus on the most time-consuming operations.

Returns:

  • Dict[str, Dict[str, list]]: Raw simulation results organized by environment name, then policy name, containing lists of History objects for each episode.

  • pd.DataFrame: Statistical summary with confidence intervals, performance metrics, and policy configuration details for analysis and comparison.

Return type:

Tuple[Dict[str, Dict[str, list]], DataFrame]

abstractmethod run_optimize_and_evaluate(configs, evaluation_episodes=100, evaluation_steps=100, evaluation_n_jobs=1, optimization_n_jobs=-1, confidence_interval_level=0.95, alpha=0.05, cache_dir_path=None, experiment_name='Optimize_And_Evaluate', debug=False, cache_visualizations=True)[source]

Run hyperparameter optimization and evaluation.

This method runs hyperparameter optimization for the provided configurations, then evaluates the optimized policies.

Parameters:
  • configs (List[HyperParameterRunParams]) – List of hyperparameter run configurations.

  • evaluation_episodes (int) – Number of episodes for evaluation.

  • evaluation_steps (int) – Maximum steps per episode for evaluation.

  • evaluation_n_jobs (int) – Number of parallel jobs for evaluation.

  • optimization_n_jobs (int) – Number of parallel jobs for optimization (-1 uses all cores).

  • confidence_interval_level (float) – Confidence level for intervals.

  • alpha (float) – Significance level for statistical tests.

  • cache_dir_path (Optional[Path]) – Optional path for storing results.

  • experiment_name (str) – Name for the experiment.

  • debug (bool) – Enable debug mode.

  • cache_visualizations (bool) – Whether to cache visualizations.

Returns:

  • Dict[str, Dict[str, list]]: Raw simulation results organized by environment and policy names.

  • pd.DataFrame: Statistical summary with performance metrics and comparisons.

Return type:

Tuple[Dict[str, Dict[str, list]], DataFrame]