# Hyperparameter Tuning Examples
This page demonstrates how to perform hyperparameter optimization for POMDP planners using the POMDPPlanners framework. We'll show how to tune several algorithms across different environments with the framework's Optuna-based optimizer.
## Overview
Hyperparameter tuning is crucial for achieving optimal performance in POMDP planning. The framework provides a comprehensive hyperparameter optimization system that:
- Uses Optuna for efficient parameter search
- Supports both numerical and categorical parameters
- Provides MLflow integration for experiment tracking
- Handles multiple environments and algorithms simultaneously
- Includes statistical analysis and confidence intervals
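For orientation, the sketch below shows roughly how numerical and categorical parameter definitions translate into Optuna's suggestion API. This is a minimal illustration at the plain-Optuna level, not the framework's actual wrapper code; the `evaluate_planner` helper is a hypothetical stand-in.

```python
import optuna

def evaluate_planner(exploration_constant: float, rollout: str) -> float:
    # Hypothetical stand-in for running evaluation episodes with the
    # sampled parameters and averaging the returns.
    return -abs(exploration_constant - 10.0) + (1.0 if rollout == "heuristic" else 0.0)

def objective(trial: optuna.Trial) -> float:
    # Roughly what NumericalHyperParameter("exploration_constant", 0.1, 100.0)
    # corresponds to at the Optuna level:
    exploration_constant = trial.suggest_float("exploration_constant", 0.1, 100.0)
    # ...and what CategoricalHyperParameter("rollout", ["random", "heuristic"])
    # corresponds to:
    rollout = trial.suggest_categorical("rollout", ["random", "heuristic"])
    return evaluate_planner(exploration_constant, rollout)

# cf. HyperParameterOptimizationDirection.MAXIMIZE
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params, study.best_value)
```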
## Basic Hyperparameter Optimization
Let’s start with a simple example optimizing POMCP on the Tiger POMDP:
```python
from pathlib import Path

from POMDPPlanners.simulations.simulations_api import SimulationsAPI
from POMDPPlanners.configs.environment_configs import EnvironmentConfigsAPI
from POMDPPlanners.configs.planners_hyperparam_configs import PlannersHyperparamConfigs
from POMDPPlanners.core.simulation import (
    NumericalHyperParameter, CategoricalHyperParameter
)
from POMDPPlanners.core.simulation.hyperparameter_tuning import (
    HyperParameterRunParams, HyperParameterOptimizationDirection
)
from POMDPPlanners.planners.mcts_planners.pomcp import POMCP

# Initialize the API
api = SimulationsAPI(
    cache_dir_path=Path("./hyperparameter_results"),
    debug=True
)

# Create environment configuration
env_configs = EnvironmentConfigsAPI(discount_factor=0.95)
tiger_env, tiger_belief = env_configs.tiger_pomdp_config(n_particles=1000)

# Create planner configuration
planner_configs = PlannersHyperparamConfigs(discount_factor=0.95)

# Define hyperparameter optimization configuration
optimization_configs = [
    HyperParameterRunParams(
        environment=tiger_env,
        belief=tiger_belief,
        policy_cls=POMCP,
        hyper_parameters=[
            NumericalHyperParameter("exploration_constant", 0.1, 100.0),
            NumericalHyperParameter("depth", 5, 30),
            NumericalHyperParameter("min_samples_per_node", 1, 20)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "name": "OptimizedPOMCP_Tiger"
        },
        num_episodes=50,   # Episodes for final evaluation
        num_steps=30,      # Steps per episode
        n_trials=100,      # Number of optimization trials
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    )
]

# Run hyperparameter optimization
results = api.run_hyperparameter_optimization(
    environment_run_params=optimization_configs,
    experiment_name="Tiger_POMCP_Optimization",
    n_jobs=4,  # Use 4 CPU cores
)

# Analyze results
for i, result in enumerate(results):
    print(f"Configuration {i+1} Results:")
    print(f"  Environment: {result.environment.__class__.__name__}")
    print(f"  Policy: {result.policy.__class__.__name__}")
    print(f"  Best hyperparameters: {result.chosen_hyper_parameters}")
    print(f"  Policy name: {result.policy.name}")
```
## Multi-Environment Optimization
Now let’s optimize multiple algorithms on different environments:
```python
from POMDPPlanners.planners.mcts_planners.sparse_pft import SparsePFT
from POMDPPlanners.planners.mcts_planners.pomcpow import POMCPOW
from POMDPPlanners.planners.planners_utils.dpw import SimpleActionSampler

# (Imports from the previous example are assumed.)

# Initialize APIs
api = SimulationsAPI(
    cache_dir_path=Path("./multi_env_optimization"),
    debug=True
)
env_configs = EnvironmentConfigsAPI(discount_factor=0.95)
planner_configs = PlannersHyperparamConfigs(discount_factor=0.95)

# Create environments
rock_sample_env, rock_sample_belief = env_configs.rock_sample_pomdp_config(n_particles=1000)
laser_tag_env, laser_tag_belief = env_configs.laser_tag_pomdp_config(n_particles=1000)

# Create action sampler for POMCPOW
action_sampler = SimpleActionSampler()

# Define multiple optimization configurations
optimization_configs = [
    # SparsePFT on Rock Sample
    HyperParameterRunParams(
        environment=rock_sample_env,
        belief=rock_sample_belief,
        policy_cls=SparsePFT,
        hyper_parameters=[
            NumericalHyperParameter("depth", 5, 15),
            NumericalHyperParameter("c_ucb", 0.0, 50.0),
            NumericalHyperParameter("beta_ucb", 0.0, 50.0),
            NumericalHyperParameter("belief_child_num", 3, 15)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "gamma": 0.95,
            "name": "OptimizedSparsePFT_RockSample"
        },
        num_episodes=30,
        num_steps=25,
        n_trials=80,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    ),
    # POMCPOW on Laser Tag
    HyperParameterRunParams(
        environment=laser_tag_env,
        belief=laser_tag_belief,
        policy_cls=POMCPOW,
        hyper_parameters=[
            NumericalHyperParameter("exploration_constant", 0.0, 50.0),
            NumericalHyperParameter("depth", 5, 15),
            NumericalHyperParameter("k_a", 1, 10),
            NumericalHyperParameter("alpha_a", 0.01, 0.5),
            NumericalHyperParameter("k_o", 1, 10),
            NumericalHyperParameter("alpha_o", 0.01, 0.5)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "name": "OptimizedPOMCPOW_LaserTag",
            "action_sampler": action_sampler,
            "time_out_in_seconds": 3.0
        },
        num_episodes=30,
        num_steps=25,
        n_trials=80,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    ),
    # SparsePFT on Laser Tag
    HyperParameterRunParams(
        environment=laser_tag_env,
        belief=laser_tag_belief,
        policy_cls=SparsePFT,
        hyper_parameters=[
            NumericalHyperParameter("depth", 5, 15),
            NumericalHyperParameter("c_ucb", 0.0, 50.0),
            NumericalHyperParameter("beta_ucb", 0.0, 50.0),
            NumericalHyperParameter("belief_child_num", 3, 15)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "gamma": 0.95,
            "name": "OptimizedSparsePFT_LaserTag"
        },
        num_episodes=30,
        num_steps=25,
        n_trials=80,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    )
]

# Run multi-environment optimization
results = api.run_hyperparameter_optimization(
    environment_run_params=optimization_configs,
    experiment_name="Multi_Environment_Algorithm_Optimization",
    n_jobs=4,
)

# Analyze and compare results
print("=== Multi-Environment Optimization Results ===")
for i, result in enumerate(results):
    env_name = result.environment.__class__.__name__
    policy_name = result.policy.__class__.__name__
    best_params = result.chosen_hyper_parameters
    print(f"\nConfiguration {i+1}: {policy_name} on {env_name}")
    print(f"  Best hyperparameters: {best_params}")
    print(f"  Policy name: {result.policy.name}")
```
## Using Predefined Hyperparameter Configurations
The framework provides predefined hyperparameter configurations for common algorithms:
```python
from POMDPPlanners.configs.planners_hyperparam_configs import PlannersHyperparamConfigs
from POMDPPlanners.planners.planners_utils.dpw import SimpleActionSampler

# Initialize configuration APIs
env_configs = EnvironmentConfigsAPI(discount_factor=0.95)
planner_configs = PlannersHyperparamConfigs(discount_factor=0.95)

# Create environments
rock_sample_env, rock_sample_belief = env_configs.rock_sample_pomdp_config(n_particles=1000)
laser_tag_env, laser_tag_belief = env_configs.laser_tag_pomdp_config(n_particles=1000)

# Create action sampler
action_sampler = SimpleActionSampler()

# Use predefined configurations
sparse_pft_config = planner_configs.sparse_pft_config(
    env=rock_sample_env,
    name="PredefinedSparsePFT_RockSample"
)
pomcpow_config = planner_configs.pomcpow_config(
    env=laser_tag_env,
    action_sampler=action_sampler,
    name="PredefinedPOMCPOW_LaserTag"
)

# Convert to optimization parameters
optimization_configs = [
    HyperParameterRunParams(
        environment=rock_sample_env,
        belief=rock_sample_belief,
        policy_cls=sparse_pft_config.policy_cls,
        hyper_parameters=sparse_pft_config.hyper_parameters,
        constant_parameters=sparse_pft_config.constant_parameters,
        num_episodes=40,
        num_steps=30,
        n_trials=100,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    ),
    HyperParameterRunParams(
        environment=laser_tag_env,
        belief=laser_tag_belief,
        policy_cls=pomcpow_config.policy_cls,
        hyper_parameters=pomcpow_config.hyper_parameters,
        constant_parameters=pomcpow_config.constant_parameters,
        num_episodes=40,
        num_steps=30,
        n_trials=100,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    )
]

# Run optimization with predefined configurations
results = api.run_hyperparameter_optimization(
    environment_run_params=optimization_configs,
    experiment_name="Predefined_Config_Optimization",
    n_jobs=4,
)
```
## Advanced Optimization with Risk-Averse Environments
Let’s optimize algorithms on risk-averse environment configurations:
```python
from POMDPPlanners.configs.environment_configs import RiskAverseEnvironmentConfigsAPI

# Initialize risk-averse environment configurations
risk_averse_env_configs = RiskAverseEnvironmentConfigsAPI(discount_factor=0.95)

# Create risk-averse environments
risk_rock_sample_env, risk_rock_sample_belief = risk_averse_env_configs.rock_sample_pomdp_config(n_particles=1000)
risk_laser_tag_env, risk_laser_tag_belief = risk_averse_env_configs.laser_tag_pomdp_config(n_particles=1000)

# Define risk-aware optimization configurations
risk_optimization_configs = [
    # SparsePFT on Risk-Averse Rock Sample
    HyperParameterRunParams(
        environment=risk_rock_sample_env,
        belief=risk_rock_sample_belief,
        policy_cls=SparsePFT,
        hyper_parameters=[
            NumericalHyperParameter("depth", 5, 15),
            NumericalHyperParameter("c_ucb", 0.0, 50.0),
            NumericalHyperParameter("beta_ucb", 0.0, 50.0),
            NumericalHyperParameter("belief_child_num", 3, 15)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "gamma": 0.95,
            "name": "RiskAverseSparsePFT_RockSample"
        },
        num_episodes=50,
        num_steps=30,
        n_trials=120,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    ),
    # POMCPOW on Risk-Averse Laser Tag
    HyperParameterRunParams(
        environment=risk_laser_tag_env,
        belief=risk_laser_tag_belief,
        policy_cls=POMCPOW,
        hyper_parameters=[
            NumericalHyperParameter("exploration_constant", 0.0, 50.0),
            NumericalHyperParameter("depth", 5, 15),
            NumericalHyperParameter("k_a", 1, 10),
            NumericalHyperParameter("alpha_a", 0.01, 0.5),
            NumericalHyperParameter("k_o", 1, 10),
            NumericalHyperParameter("alpha_o", 0.01, 0.5)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "name": "RiskAversePOMCPOW_LaserTag",
            "action_sampler": action_sampler,
            "time_out_in_seconds": 3.0
        },
        num_episodes=50,
        num_steps=30,
        n_trials=120,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    )
]

# Run risk-aware optimization
risk_results = api.run_hyperparameter_optimization(
    environment_run_params=risk_optimization_configs,
    experiment_name="Risk_Averse_Optimization",
    n_jobs=4,
)

print("=== Risk-Averse Optimization Results ===")
for i, result in enumerate(risk_results):
    env_name = result.environment.__class__.__name__
    policy_name = result.policy.__class__.__name__
    best_params = result.chosen_hyper_parameters
    print(f"\nRisk-Averse Configuration {i+1}: {policy_name} on {env_name}")
    print(f"  Best hyperparameters: {best_params}")
```
## Optimization with Custom Parameter Types
You can also use categorical parameters for algorithm selection:
```python
from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
from POMDPPlanners.planners.sparse_sampling_planners.sparse_sampling import SparseSamplingDiscreteActionsPlanner

# Define optimization with categorical parameters
categorical_optimization_configs = [
    HyperParameterRunParams(
        environment=tiger_env,
        belief=tiger_belief,
        policy_cls=POMCP,  # Base class; the actual algorithm is determined by the categorical parameter
        hyper_parameters=[
            CategoricalHyperParameter("algorithm_type", ["POMCP", "SparseSampling"]),
            NumericalHyperParameter("exploration_constant", 0.1, 100.0),
            NumericalHyperParameter("depth", 5, 20),
            NumericalHyperParameter("num_simulations", 100, 2000)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "name": "CategoricalOptimized"
        },
        num_episodes=30,
        num_steps=25,
        n_trials=60,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    )
]
```
Note: this setup requires a custom policy class that can dispatch on the categorical `algorithm_type` parameter; out of the box, each `policy_cls` maps to a single algorithm. For a straightforward comparison, separate configurations per algorithm (as in the earlier examples) are usually the simpler choice.
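As a rough illustration of what such a dispatcher could look like, here is a minimal sketch. The `AlgorithmSelector` class is hypothetical and not part of the framework, and the real policy interface may require more than constructor forwarding.

```python
# Hypothetical sketch only: this wrapper is not part of POMDPPlanners.
# It maps the categorical "algorithm_type" value to a concrete planner
# class and forwards the remaining hyperparameters to its constructor.
class AlgorithmSelector:
    ALGORITHMS = {
        "POMCP": POMCP,
        "SparseSampling": SparseSamplingDiscreteActionsPlanner,
    }

    def __new__(cls, algorithm_type: str, **planner_kwargs):
        # Returning an instance of another class from __new__ means the
        # caller receives the selected planner directly.
        return cls.ALGORITHMS[algorithm_type](**planner_kwargs)
```

Under this (hypothetical) scheme, the configuration above would pass `policy_cls=AlgorithmSelector` so each trial instantiates whichever planner the categorical parameter selects.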
## Analyzing Optimization Results
After optimization, you can analyze the results and use the optimized policies:
```python
import pandas as pd

# Extract optimization results
optimized_policies = [result.policy for result in results]
optimization_metadata = [result.optimization_metadata for result in results]

# Create a comparison table
comparison_data = []
for i, (policy, metadata) in enumerate(zip(optimized_policies, optimization_metadata)):
    comparison_data.append({
        'Configuration': i + 1,
        'Environment': policy.environment.__class__.__name__,
        'Algorithm': policy.__class__.__name__,
        'Policy_Name': policy.name,
        'Best_Parameters': str(metadata.get('best_params', {})),
        'Best_Value': metadata.get('best_value', 0.0),
        'N_Trials': metadata.get('n_trials', 0)
    })

comparison_df = pd.DataFrame(comparison_data)
print("=== Optimization Results Comparison ===")
print(comparison_df.to_string(index=False))

# Use optimized policies for further analysis
print(f"\nSuccessfully optimized {len(optimized_policies)} policies:")
for policy in optimized_policies:
    print(f"  - {policy.name} ({policy.__class__.__name__})")

# You can now use these optimized policies in simulations
from POMDPPlanners.core.simulation import EnvironmentRunParams

# Create simulation configurations with optimized policies
simulation_configs = []
for policy in optimized_policies:
    simulation_configs.append(
        EnvironmentRunParams(
            environment=policy.environment,
            belief=rock_sample_belief if "RockSample" in policy.name else laser_tag_belief,
            policies=[policy],
            num_episodes=100,
            num_steps=30
        )
    )

# Run final evaluation simulation
final_results, final_stats = api.run_multiple_environments_and_policies_local_run(
    environment_run_params=simulation_configs,
    alpha=0.05,
    confidence_interval_level=0.95,
    experiment_name="Final_Optimized_Policy_Evaluation",
    n_jobs=4
)

print("\n=== Final Evaluation Results ===")
print(final_stats[['environment', 'policy', 'average_return',
                   'average_return_ci_lower', 'average_return_ci_upper']].to_string(index=False))
```
## Best Practices for Hyperparameter Optimization
- **Parameter Range Selection**
  - Start with wide ranges and narrow them based on initial results
  - Use logarithmic scales for parameters that vary over orders of magnitude (see the sketch after this item)
  - Respect problem-specific constraints (e.g., depth should not exceed episode length)
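Whether `NumericalHyperParameter` supports a log scale directly is not shown in this guide; at the underlying Optuna level, a log-scaled search space looks like this:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # Log-scale sampling spends equal search effort per order of magnitude,
    # which suits parameters like exploration constants spanning 0.01-100.
    c = trial.suggest_float("exploration_constant", 0.01, 100.0, log=True)
    return -abs(c - 1.0)  # dummy objective for illustration

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```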
- **Trial Configuration**
  - Start with fewer trials (50-100) for initial exploration
  - Increase trials (200-500) for final optimization
  - Use more episodes (50-100) for reliable performance estimates
- **Computational Resources**
  - Use parallel execution (`n_jobs=-1`) when available
  - Consider distributed computing for large-scale optimization
  - Monitor memory usage with large particle counts
- **Evaluation Strategy**
  - Use consistent evaluation metrics across all configurations
  - Consider multiple performance metrics (average return, success rate, planning time)
  - Validate optimized policies on held-out test episodes (see the sketch after this item)
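As a concrete version of the last point, the tuned policies can be re-evaluated on a fresh batch of episodes that played no role in the Optuna search. This sketch reuses the evaluation API shown earlier; the experiment name and episode count are just examples.

```python
# Re-evaluate the tuned policies on held-out episodes to guard against
# overfitting the hyperparameters to the tuning episodes.
held_out_configs = [
    EnvironmentRunParams(
        environment=policy.environment,
        belief=rock_sample_belief if "RockSample" in policy.name else laser_tag_belief,
        policies=[policy],
        num_episodes=200,  # larger held-out batch for a tighter estimate
        num_steps=30
    )
    for policy in optimized_policies
]

held_out_results, held_out_stats = api.run_multiple_environments_and_policies_local_run(
    environment_run_params=held_out_configs,
    alpha=0.05,
    confidence_interval_level=0.95,
    experiment_name="Held_Out_Validation",
    n_jobs=4
)
```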
- **MLflow Integration**
  - All optimization runs are automatically tracked in MLflow
  - Use descriptive experiment names for easy organization
  - Compare results across different optimization runs (see the sketch after this item)
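For instance, assuming the default local MLflow tracking setup, past runs can be pulled into a DataFrame for comparison (the experiment name below matches the first example on this page):

```python
import mlflow

# Load all runs of a tracked experiment into a pandas DataFrame.
# Assumes the default (local) MLflow tracking URI used by the framework.
runs = mlflow.search_runs(experiment_names=["Tiger_POMCP_Optimization"])
print(runs[["run_id", "status", "start_time"]].head())

# Alternatively, browse runs interactively with the MLflow UI:
#   mlflow ui
```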
## Troubleshooting Common Issues
- **Low Performance After Optimization**
  - Check that parameter ranges are appropriate for the problem
  - Verify that the optimization direction is correct (maximize vs. minimize)
  - Ensure sufficient trials and episodes for reliable estimates
- **Optimization Taking Too Long**
  - Reduce the number of trials or episodes
  - Use fewer particles in the belief representation
  - Decrease the planning depth or timeout limits
- **Memory Issues**
  - Reduce the particle count in the belief representation
  - Use smaller planning depths
  - Consider sparse belief representations
- **Convergence Problems**
  - Increase the number of trials
  - Adjust parameter ranges based on initial results
  - Try a different Optuna sampler (see the sketch below)
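On the last point: Optuna ships several samplers beyond the default TPE. Where exactly the framework lets you configure the study is not covered in this guide, so the sketch below is at the plain-Optuna level.

```python
import optuna

# The default sampler is TPESampler; alternatives can help when parameters
# interact strongly or the space is mostly continuous.
tpe = optuna.samplers.TPESampler(multivariate=True, seed=0)  # models parameter interactions
cmaes = optuna.samplers.CmaEsSampler(seed=0)                 # continuous spaces (needs the cmaes package)
rand = optuna.samplers.RandomSampler(seed=0)                 # cheap, surprisingly strong baseline

study = optuna.create_study(direction="maximize", sampler=tpe)
```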
## Next Steps
- Try the Planners Comparison Study for comparing optimized policies
- See the Basic Usage Examples for using optimized policies in simulations
- Check the [Simulations API](../api/simulations) for advanced simulation features
- Explore the [Core API](../api/core) for the detailed API reference