Hyperparameter Tuning Examples

This page demonstrates how to perform hyperparameter optimization for POMDP planners with the POMDPPlanners framework. We’ll show how to tune different algorithms on various environments using Optuna-based search.

Overview

Hyperparameter tuning is crucial for achieving optimal performance in POMDP planning. The framework provides a comprehensive hyperparameter optimization system that:

  • Uses Optuna for efficient parameter search (see the conceptual sketch below)

  • Supports both numerical and categorical parameters

  • Provides MLflow integration for experiment tracking

  • Handles multiple environments and algorithms simultaneously

  • Includes statistical analysis and confidence intervals
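
Under the hood, each optimization trial samples a candidate set of hyperparameters, evaluates the planner over a number of episodes, and reports the resulting metric back to Optuna. The snippet below is a conceptual sketch only, not the framework's actual internals; evaluate_planner is a hypothetical stand-in for the episode rollout and returns a toy score here so the sketch runs on its own:

import optuna

def evaluate_planner(params):
    # Hypothetical stand-in: the framework would run the planner for a number
    # of episodes with these hyperparameters and return the average return.
    # A toy score is used here so the sketch is runnable by itself.
    return -(params["exploration_constant"] - 10.0) ** 2 - params["depth"]

def objective(trial):
    params = {
        "exploration_constant": trial.suggest_float("exploration_constant", 0.1, 100.0),
        "depth": trial.suggest_int("depth", 5, 30),
    }
    return evaluate_planner(params)

study = optuna.create_study(direction="maximize")  # corresponds to MAXIMIZE average_return
study.optimize(objective, n_trials=100)
print(study.best_params, study.best_value)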

Basic Hyperparameter Optimization

Let’s start with a simple example optimizing POMCP on the Tiger POMDP:

from pathlib import Path
from POMDPPlanners.simulations.simulations_api import SimulationsAPI
from POMDPPlanners.configs.environment_configs import EnvironmentConfigsAPI
from POMDPPlanners.configs.planners_hyperparam_configs import PlannersHyperparamConfigs
from POMDPPlanners.core.simulation import (
    NumericalHyperParameter, CategoricalHyperParameter
)
from POMDPPlanners.core.simulation.hyperparameter_tuning import (
    HyperParameterRunParams, HyperParameterOptimizationDirection
)
from POMDPPlanners.planners.mcts_planners.pomcp import POMCP

# Initialize the API
api = SimulationsAPI(
    cache_dir_path=Path("./hyperparameter_results"),
    debug=True
)

# Create environment configuration
env_configs = EnvironmentConfigsAPI(discount_factor=0.95)
tiger_env, tiger_belief = env_configs.tiger_pomdp_config(n_particles=1000)

# Create planner configuration
planner_configs = PlannersHyperparamConfigs(discount_factor=0.95)

# Define hyperparameter optimization configuration
optimization_configs = [
    HyperParameterRunParams(
        environment=tiger_env,
        belief=tiger_belief,
        policy_cls=POMCP,
        hyper_parameters=[
            NumericalHyperParameter("exploration_constant", 0.1, 100.0),
            NumericalHyperParameter("depth", 5, 30),
            NumericalHyperParameter("min_samples_per_node", 1, 20)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "name": "OptimizedPOMCP_Tiger"
        },
        num_episodes=50,       # Episodes for final evaluation
        num_steps=30,          # Steps per episode
        n_trials=100,         # Number of optimization trials
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    )
]

# Run hyperparameter optimization
results = api.run_hyperparameter_optimization(
    environment_run_params=optimization_configs,
    experiment_name="Tiger_POMCP_Optimization",
    n_jobs=4,  # Use 4 CPU cores
)

# Analyze results
for i, result in enumerate(results):
    print(f"Configuration {i+1} Results:")
    print(f"  Environment: {result.environment.__class__.__name__}")
    print(f"  Policy: {result.policy.__class__.__name__}")
    print(f"  Best hyperparameters: {result.chosen_hyper_parameters}")
    print(f"  Policy name: {result.policy.name}")

Multi-Environment Optimization

Now let’s optimize multiple algorithms on different environments:

from POMDPPlanners.planners.mcts_planners.sparse_pft import SparsePFT
from POMDPPlanners.planners.mcts_planners.pomcpow import POMCPOW
from POMDPPlanners.planners.planners_utils.dpw import SimpleActionSampler

# Initialize APIs
api = SimulationsAPI(
    cache_dir_path=Path("./multi_env_optimization"),
    debug=True
)

env_configs = EnvironmentConfigsAPI(discount_factor=0.95)
planner_configs = PlannersHyperparamConfigs(discount_factor=0.95)

# Create environments
rock_sample_env, rock_sample_belief = env_configs.rock_sample_pomdp_config(n_particles=1000)
laser_tag_env, laser_tag_belief = env_configs.laser_tag_pomdp_config(n_particles=1000)

# Create action sampler for POMCPOW
action_sampler = SimpleActionSampler()

# Define multiple optimization configurations
optimization_configs = [
    # SparsePFT on Rock Sample
    HyperParameterRunParams(
        environment=rock_sample_env,
        belief=rock_sample_belief,
        policy_cls=SparsePFT,
        hyper_parameters=[
            NumericalHyperParameter("depth", 5, 15),
            NumericalHyperParameter("c_ucb", 0.0, 50.0),
            NumericalHyperParameter("beta_ucb", 0.0, 50.0),
            NumericalHyperParameter("belief_child_num", 3, 15)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "gamma": 0.95,
            "name": "OptimizedSparsePFT_RockSample"
        },
        num_episodes=30,
        num_steps=25,
        n_trials=80,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    ),

    # POMCPOW on Laser Tag
    HyperParameterRunParams(
        environment=laser_tag_env,
        belief=laser_tag_belief,
        policy_cls=POMCPOW,
        hyper_parameters=[
            NumericalHyperParameter("exploration_constant", 0.0, 50.0),
            NumericalHyperParameter("depth", 5, 15),
            NumericalHyperParameter("k_a", 1, 10),
            NumericalHyperParameter("alpha_a", 0.01, 0.5),
            NumericalHyperParameter("k_o", 1, 10),
            NumericalHyperParameter("alpha_o", 0.01, 0.5)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "name": "OptimizedPOMCPOW_LaserTag",
            "action_sampler": action_sampler,
            "time_out_in_seconds": 3.0
        },
        num_episodes=30,
        num_steps=25,
        n_trials=80,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    ),

    # SparsePFT on Laser Tag
    HyperParameterRunParams(
        environment=laser_tag_env,
        belief=laser_tag_belief,
        policy_cls=SparsePFT,
        hyper_parameters=[
            NumericalHyperParameter("depth", 5, 15),
            NumericalHyperParameter("c_ucb", 0.0, 50.0),
            NumericalHyperParameter("beta_ucb", 0.0, 50.0),
            NumericalHyperParameter("belief_child_num", 3, 15)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "gamma": 0.95,
            "name": "OptimizedSparsePFT_LaserTag"
        },
        num_episodes=30,
        num_steps=25,
        n_trials=80,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    )
]

# Run multi-environment optimization
results = api.run_hyperparameter_optimization(
    environment_run_params=optimization_configs,
    experiment_name="Multi_Environment_Algorithm_Optimization",
    n_jobs=4,
)

# Analyze and compare results
print("=== Multi-Environment Optimization Results ===")
for i, result in enumerate(results):
    env_name = result.environment.__class__.__name__
    policy_name = result.policy.__class__.__name__
    best_params = result.chosen_hyper_parameters

    print(f"\nConfiguration {i+1}: {policy_name} on {env_name}")
    print(f"  Best hyperparameters: {best_params}")
    print(f"  Policy name: {result.policy.name}")

Using Predefined Hyperparameter Configurations

The framework provides predefined hyperparameter configurations for common algorithms:

from POMDPPlanners.configs.planners_hyperparam_configs import PlannersHyperparamConfigs
from POMDPPlanners.planners.planners_utils.dpw import SimpleActionSampler

# Initialize configuration APIs
env_configs = EnvironmentConfigsAPI(discount_factor=0.95)
planner_configs = PlannersHyperparamConfigs(discount_factor=0.95)

# Create environments
rock_sample_env, rock_sample_belief = env_configs.rock_sample_pomdp_config(n_particles=1000)
laser_tag_env, laser_tag_belief = env_configs.laser_tag_pomdp_config(n_particles=1000)

# Create action sampler
action_sampler = SimpleActionSampler()

# Use predefined configurations
sparse_pft_config = planner_configs.sparse_pft_config(
    env=rock_sample_env,
    name="PredefinedSparsePFT_RockSample"
)

pomcpow_config = planner_configs.pomcpow_config(
    env=laser_tag_env,
    action_sampler=action_sampler,
    name="PredefinedPOMCPOW_LaserTag"
)
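
# The predefined config bundles the planner class, its tunable hyperparameters,
# and fixed constructor arguments; you can inspect them before launching the
# optimization (attribute names as used further below):
for hp in sparse_pft_config.hyper_parameters:
    print(hp)
print(sparse_pft_config.constant_parameters)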

# Convert to optimization parameters
optimization_configs = [
    HyperParameterRunParams(
        environment=rock_sample_env,
        belief=rock_sample_belief,
        policy_cls=sparse_pft_config.policy_cls,
        hyper_parameters=sparse_pft_config.hyper_parameters,
        constant_parameters=sparse_pft_config.constant_parameters,
        num_episodes=40,
        num_steps=30,
        n_trials=100,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    ),

    HyperParameterRunParams(
        environment=laser_tag_env,
        belief=laser_tag_belief,
        policy_cls=pomcpow_config.policy_cls,
        hyper_parameters=pomcpow_config.hyper_parameters,
        constant_parameters=pomcpow_config.constant_parameters,
        num_episodes=40,
        num_steps=30,
        n_trials=100,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    )
]

# Run optimization with predefined configurations
results = api.run_hyperparameter_optimization(
    environment_run_params=optimization_configs,
    experiment_name="Predefined_Config_Optimization",
    n_jobs=4,
)

Advanced Optimization with Risk-Averse Environments

Let’s optimize algorithms on risk-averse environment configurations:

from POMDPPlanners.configs.environment_configs import RiskAverseEnvironmentConfigsAPI

# Initialize risk-averse environment configurations
risk_averse_env_configs = RiskAverseEnvironmentConfigsAPI(discount_factor=0.95)

# Create risk-averse environments
risk_rock_sample_env, risk_rock_sample_belief = risk_averse_env_configs.rock_sample_pomdp_config(n_particles=1000)
risk_laser_tag_env, risk_laser_tag_belief = risk_averse_env_configs.laser_tag_pomdp_config(n_particles=1000)

# Define risk-aware optimization configurations
risk_optimization_configs = [
    # SparsePFT on Risk-Averse Rock Sample
    HyperParameterRunParams(
        environment=risk_rock_sample_env,
        belief=risk_rock_sample_belief,
        policy_cls=SparsePFT,
        hyper_parameters=[
            NumericalHyperParameter("depth", 5, 15),
            NumericalHyperParameter("c_ucb", 0.0, 50.0),
            NumericalHyperParameter("beta_ucb", 0.0, 50.0),
            NumericalHyperParameter("belief_child_num", 3, 15)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "gamma": 0.95,
            "name": "RiskAverseSparsePFT_RockSample"
        },
        num_episodes=50,
        num_steps=30,
        n_trials=120,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    ),

    # POMCPOW on Risk-Averse Laser Tag
    HyperParameterRunParams(
        environment=risk_laser_tag_env,
        belief=risk_laser_tag_belief,
        policy_cls=POMCPOW,
        hyper_parameters=[
            NumericalHyperParameter("exploration_constant", 0.0, 50.0),
            NumericalHyperParameter("depth", 5, 15),
            NumericalHyperParameter("k_a", 1, 10),
            NumericalHyperParameter("alpha_a", 0.01, 0.5),
            NumericalHyperParameter("k_o", 1, 10),
            NumericalHyperParameter("alpha_o", 0.01, 0.5)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "name": "RiskAversePOMCPOW_LaserTag",
            "action_sampler": action_sampler,
            "time_out_in_seconds": 3.0
        },
        num_episodes=50,
        num_steps=30,
        n_trials=120,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    )
]

# Run risk-aware optimization
risk_results = api.run_hyperparameter_optimization(
    environment_run_params=risk_optimization_configs,
    experiment_name="Risk_Averse_Optimization",
    n_jobs=4,
)

print("=== Risk-Averse Optimization Results ===")
for i, result in enumerate(risk_results):
    env_name = result.environment.__class__.__name__
    policy_name = result.policy.__class__.__name__
    best_params = result.chosen_hyper_parameters

    print(f"\nRisk-Averse Configuration {i+1}: {policy_name} on {env_name}")
    print(f"  Best hyperparameters: {best_params}")

Optimization with Custom Parameter Types

You can also use categorical parameters for algorithm selection:

from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
from POMDPPlanners.planners.sparse_sampling_planners.sparse_sampling import SparseSamplingDiscreteActionsPlanner

# Define optimization with categorical parameters
categorical_optimization_configs = [
    HyperParameterRunParams(
        environment=tiger_env,
        belief=tiger_belief,
        policy_cls=POMCP,  # Placeholder; see the note below on categorical algorithm selection
        hyper_parameters=[
            CategoricalHyperParameter("algorithm_type", ["POMCP", "SparseSampling"]),
            NumericalHyperParameter("exploration_constant", 0.1, 100.0),
            NumericalHyperParameter("depth", 5, 20),
            NumericalHyperParameter("num_simulations", 100, 2000)
        ],
        constant_parameters={
            "discount_factor": 0.95,
            "name": "CategoricalOptimized"
        },
        num_episodes=30,
        num_steps=25,
        n_trials=60,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    )
]

# Note: selecting the algorithm via a categorical parameter requires a custom
# policy class that dispatches on it. In practice it is simpler to define one
# HyperParameterRunParams per algorithm, as sketched below.
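
For completeness, here is how the separate-configuration approach might look for comparing two algorithms on the Tiger POMDP. The POMCP ranges mirror those used above; the SparseSamplingDiscreteActionsPlanner hyperparameter names ("depth", "num_samples") are assumptions for illustration, so check the planner's constructor for the actual argument names:

per_algorithm_configs = [
    HyperParameterRunParams(
        environment=tiger_env,
        belief=tiger_belief,
        policy_cls=POMCP,
        hyper_parameters=[
            NumericalHyperParameter("exploration_constant", 0.1, 100.0),
            NumericalHyperParameter("depth", 5, 20)
        ],
        constant_parameters={"discount_factor": 0.95, "name": "POMCP_Tiger_Compare"},
        num_episodes=30,
        num_steps=25,
        n_trials=60,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    ),
    HyperParameterRunParams(
        environment=tiger_env,
        belief=tiger_belief,
        policy_cls=SparseSamplingDiscreteActionsPlanner,
        hyper_parameters=[
            # Assumed parameter names -- verify against the planner's constructor
            NumericalHyperParameter("depth", 2, 6),
            NumericalHyperParameter("num_samples", 5, 50)
        ],
        constant_parameters={"discount_factor": 0.95, "name": "SparseSampling_Tiger_Compare"},
        num_episodes=30,
        num_steps=25,
        n_trials=60,
        direction=HyperParameterOptimizationDirection.MAXIMIZE,
        parameter_to_optimize="average_return"
    )
]

comparison_results = api.run_hyperparameter_optimization(
    environment_run_params=per_algorithm_configs,
    experiment_name="Tiger_Algorithm_Comparison",
    n_jobs=4,
)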

Analyzing Optimization Results

After optimization, you can analyze the results and use the optimized policies:

import pandas as pd
import matplotlib.pyplot as plt

# Extract optimization results
optimized_policies = [result.policy for result in results]
optimization_metadata = [result.optimization_metadata for result in results]

# Create a comparison table
comparison_data = []
for i, (policy, metadata) in enumerate(zip(optimized_policies, optimization_metadata)):
    comparison_data.append({
        'Configuration': i+1,
        'Environment': policy.environment.__class__.__name__,
        'Algorithm': policy.__class__.__name__,
        'Policy_Name': policy.name,
        'Best_Parameters': str(metadata.get('best_params', {})),
        'Best_Value': metadata.get('best_value', 0.0),
        'N_Trials': metadata.get('n_trials', 0)
    })

comparison_df = pd.DataFrame(comparison_data)
print("=== Optimization Results Comparison ===")
print(comparison_df.to_string(index=False))

# Use optimized policies for further analysis
print(f"\nSuccessfully optimized {len(optimized_policies)} policies:")
for policy in optimized_policies:
    print(f"  - {policy.name} ({policy.__class__.__name__})")

# You can now use these optimized policies in simulations
from POMDPPlanners.core.simulation import EnvironmentRunParams

# Create simulation configurations with optimized policies
simulation_configs = []
for i, policy in enumerate(optimized_policies):
    simulation_configs.append(
        EnvironmentRunParams(
            environment=policy.environment,
            belief=rock_sample_belief if "RockSample" in policy.name else laser_tag_belief,
            policies=[policy],
            num_episodes=100,
            num_steps=30
        )
    )

# Run final evaluation simulation
final_results, final_stats = api.run_multiple_environments_and_policies_local_run(
    environment_run_params=simulation_configs,
    alpha=0.05,
    confidence_interval_level=0.95,
    experiment_name="Final_Optimized_Policy_Evaluation",
    n_jobs=4
)

print("\n=== Final Evaluation Results ===")
print(final_stats[['environment', 'policy', 'average_return', 'average_return_ci_lower', 'average_return_ci_upper']].to_string(index=False))
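
Because the statistics table includes confidence-interval columns, you can also chart the final evaluation. This is a minimal sketch that assumes the column names shown above and reuses the matplotlib import from the top of this example:

labels = final_stats['policy'] + " / " + final_stats['environment']
means = final_stats['average_return']
# Asymmetric error bars derived from the CI bounds
yerr = [
    means - final_stats['average_return_ci_lower'],
    final_stats['average_return_ci_upper'] - means
]

plt.figure(figsize=(8, 4))
plt.bar(labels, means, yerr=yerr, capsize=4)
plt.ylabel("Average return")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()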

Best Practices for Hyperparameter Optimization

Parameter Range Selection
  • Start with wide ranges and narrow down based on initial results

  • Use logarithmic scales for parameters that vary over orders of magnitude (see the sketch after this list)

  • Consider problem-specific constraints (e.g., depth should not exceed episode length)
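
Whether log scaling is exposed directly through NumericalHyperParameter depends on the framework version, so check its constructor; in raw Optuna it is a single flag on the suggest call, as this standalone sketch shows:

import optuna

def objective(trial):
    # log=True samples uniformly in log space, which suits parameters such as
    # exploration constants that can span several orders of magnitude.
    c = trial.suggest_float("exploration_constant", 1e-2, 1e2, log=True)
    return -abs(c - 1.0)  # toy score so the sketch runs on its own

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)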

Trial Configuration
  • Start with fewer trials (50-100) for initial exploration

  • Increase trials (200-500) for final optimization

  • Use more episodes (50-100) for reliable performance estimates

Computational Resources
  • Use parallel execution (n_jobs=-1) when available

  • Consider using distributed computing for large-scale optimization

  • Monitor memory usage with large particle counts

Evaluation Strategy
  • Use consistent evaluation metrics across all configurations

  • Consider multiple performance metrics (average return, success rate, planning time)

  • Validate optimized policies on held-out test episodes

MLflow Integration
  • All optimization runs are automatically tracked in MLflow

  • Use descriptive experiment names for easy organization

  • Compare results across different optimization runs (a query sketch is shown below)
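
To pull tracked runs into a notebook or script, the standard MLflow client API works. This is a minimal sketch assuming the default local tracking URI and one of the experiment names used above:

import mlflow

# Point this at your tracking server if runs are not stored locally:
# mlflow.set_tracking_uri("http://localhost:5000")
runs = mlflow.search_runs(experiment_names=["Tiger_POMCP_Optimization"])
print(runs[["run_id", "status", "start_time"]].head())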

Troubleshooting Common Issues

Low Performance After Optimization
  • Check if parameter ranges are appropriate for the problem

  • Verify that the optimization direction is correct (maximize vs minimize)

  • Ensure sufficient trials and episodes for reliable estimates

Optimization Taking Too Long
  • Reduce the number of trials or episodes

  • Use fewer particles in belief representation

  • Decrease the planning depth or timeout limits

Memory Issues
  • Reduce particle count in belief representation

  • Use smaller planning depths

  • Consider using sparse belief representations

Convergence Problems
  • Increase the number of trials

  • Adjust parameter ranges based on initial results

  • Consider using a different Optuna sampler (optimization algorithm), as sketched below
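
Whether POMDPPlanners lets you pass a custom sampler through its API is version-dependent, so treat this as background on plain Optuna: the sampler is chosen when the study is created, and swapping the default TPE sampler for another one can help on difficult search spaces.

import optuna

study = optuna.create_study(
    direction="maximize",
    # Default is TPESampler; CmaEsSampler (requires the cmaes package) or
    # RandomSampler are common alternatives for purely numerical spaces.
    sampler=optuna.samplers.CmaEsSampler(),
)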

Next Steps

  • Try Planners Comparison Study for comparing optimized policies

  • See Basic Usage Examples for using optimized policies in simulations

  • Check the Simulations API documentation (../api/simulations) for advanced simulation features

  • Explore the Core API documentation (../api/core) for a detailed API reference