Quickstart Guide
================

This guide will get you up and running with POMDPPlanners in just a few minutes.

Your First POMDP Solution
-------------------------

Let's solve the classic Tiger POMDP problem using POMCP:

.. code-block:: python

   from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
   from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
   from POMDPPlanners.core.belief import get_initial_belief

   # Create the environment and initial belief
   env = TigerPOMDP(discount_factor=0.95)
   belief = get_initial_belief(env, n_particles=500)

   # Create the planner
   planner = POMCP(
       environment=env,
       discount_factor=0.95,
       depth=10,
       exploration_constant=50.0,
       name="tiger_planner",
       n_simulations=1000,
   )

   # Plan: returns a list of actions (length=1 for closed-loop planning)
   actions, run_data = planner.action(belief)
   action = actions[0]
   print(f"Recommended action: {action}")

   # Execute: sample the next state, observation, and reward
   state = belief.sample()
   next_state, observation, reward = env.sample_next_step(state=state, action=action)
   print(f"Observation: {observation}, Reward: {reward}")

Running a Complete Episode
--------------------------

Use ``run_episode`` to run a full episode with automatic belief updates:

.. code-block:: python

   from POMDPPlanners.simulations.episodes import run_episode
   from POMDPPlanners.utils.logger import get_logger

   logger = get_logger("quickstart")

   history = run_episode(
       environment=env,
       policy=planner,
       initial_belief=belief,
       num_steps=20,
       logger=logger,
   )

   total_reward = sum(step.reward for step in history.history if step.reward is not None)
   print(f"Steps: {len(history.history)}, Total reward: {total_reward:.2f}")

   # Each step exposes: action, observation, reward, state
   for i, step in enumerate(history.history[:5]):
       print(f"Step {i}: action={step.action}, obs={step.observation}, reward={step.reward}")

Core Concepts
-------------

**Environments**

Environments can be created directly or via ``EnvironmentConfigsAPI``:

.. code-block:: python

   # Direct construction
   from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
   env = TigerPOMDP(discount_factor=0.95)

   # Via config API (also returns a ready-made initial belief)
   from POMDPPlanners.configs.environment_configs import EnvironmentConfigsAPI
   config_api = EnvironmentConfigsAPI(discount_factor=0.95)
   env, belief = config_api.tiger_pomdp_config(n_particles=500)

   # Discrete environments expose their state/action/observation spaces
   print(env.states)        # ['tiger_left', 'tiger_right']
   print(env.actions)       # ['listen', 'open_left', 'open_right']
   print(env.observations)  # ['hear_left', 'hear_right', 'hear_nothing']

   # Core interaction method
   next_state, observation, reward = env.sample_next_step(state=state, action=action)
   done = env.is_terminal(next_state)

**Belief States**

.. code-block:: python

   from POMDPPlanners.core.belief import get_initial_belief

   belief = get_initial_belief(env, n_particles=500)

   # Sample a single state from the belief
   state = belief.sample()

   # Inspect the weighted distribution
   distribution = belief.to_unique_support_distribution()

**Planners**

All planners share the same interface: ``planner.action(belief)`` returns
``(List[action], PolicyRunData)``. A single-element list means closed-loop planning
(the planner replans each step); a multi-element list means open-loop planning
(the sequence is executed before replanning).

.. code-block:: python

   from POMDPPlanners.planners.mcts_planners.pomcp import POMCP

   planner = POMCP(
       environment=env,
       discount_factor=0.95,
       depth=10,
       exploration_constant=50.0,
       name="my_planner",
       n_simulations=1000,
   )

   actions, run_data = planner.action(belief)
   action = actions[0]  # closed-loop: take the single planned action

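The distinction only matters when you drive the environment yourself rather than through
``run_episode``. The loop below is a minimal illustrative sketch, not a library helper: it
executes whatever sequence ``planner.action`` returns using only the ``sample_next_step``
call shown above, and deliberately omits the belief update (``run_episode`` handles that for you).

.. code-block:: python

   # Illustrative sketch only: execute every planned action before replanning.
   # Belief updating is intentionally omitted; in practice use run_episode,
   # which keeps the belief in sync after each step.
   actions, run_data = planner.action(belief)
   state = belief.sample()
   for action in actions:  # length 1 for a closed-loop planner such as POMCP
       state, observation, reward = env.sample_next_step(state=state, action=action)
       print(f"action={action}, obs={observation}, reward={reward}")
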
Continuous Action Spaces
------------------------

For environments with continuous actions, pair ``PFT_DPW`` with an action sampler:

.. code-block:: python

   import numpy as np

   from POMDPPlanners.environments.light_dark_pomdp.continuous_light_dark_pomdp import (
       ContinuousLightDarkPOMDP,
       RewardModelType,
   )
   from POMDPPlanners.planners.mcts_planners.pft_dpw import PFT_DPW
   from POMDPPlanners.planners.planners_utils.dpw import ActionSampler
   from POMDPPlanners.core.belief import get_initial_belief

   env = ContinuousLightDarkPOMDP(
       discount_factor=0.95,
       goal_state=np.array([10, 5]),
       start_state=np.array([0, 5]),
       reward_model_type=RewardModelType.STANDARD,
   )

   class VelocityActionSampler(ActionSampler):
       def sample(self, belief_node=None):
           angle = np.random.uniform(0, 2 * np.pi)
           speed = np.random.uniform(0, 1.0)
           return np.array([speed * np.cos(angle), speed * np.sin(angle)])

   planner = PFT_DPW(
       environment=env,
       discount_factor=0.95,
       depth=10,
       name="navigation_planner",
       action_sampler=VelocityActionSampler(),
       n_simulations=500,
   )

   belief = get_initial_belief(env, n_particles=500)
   actions, _ = planner.action(belief)
   print(f"Navigation action: {actions[0]}")

Comparing Planners
------------------

Use ``LocalSimulationsAPI`` to run a statistically rigorous multi-planner,
multi-environment comparison study:

.. code-block:: python

   from pathlib import Path

   from POMDPPlanners.configs.environment_configs import EnvironmentConfigsAPI
   from POMDPPlanners.planners.mcts_planners.pomcpow import POMCPOW
   from POMDPPlanners.planners.mcts_planners.pft_dpw import PFT_DPW
   from POMDPPlanners.simulations.simulation_apis.local_simulations_api import LocalSimulationsAPI
   from POMDPPlanners.core.simulation import EnvironmentRunParams
   from POMDPPlanners.utils.action_samplers import DiscreteActionSampler

   config_api = EnvironmentConfigsAPI(discount_factor=0.95)
   env, belief = config_api.continuous_observations_discrete_actions_light_dark_pomdp_config(
       n_particles=500
   )
   action_sampler = DiscreteActionSampler(actions=env.get_actions())

   planners = [
       POMCPOW(
           environment=env,
           discount_factor=0.95,
           depth=10,
           exploration_constant=100.0,
           k_o=10,
           k_a=4,
           alpha_o=0.01,
           alpha_a=0.01,
           action_sampler=action_sampler,
           n_simulations=1500,
           name="POMCPOW",
       ),
       PFT_DPW(
           environment=env,
           discount_factor=0.95,
           depth=10,
           k_a=4,
           alpha_a=0.01,
           k_o=10,
           alpha_o=0.01,
           exploration_constant=100.0,
           action_sampler=action_sampler,
           n_simulations=1500,
           name="PFT_DPW",
       ),
   ]

   run_params = [
       EnvironmentRunParams(
           environment=env,
           belief=belief,
           policies=planners,
           num_episodes=100,
           num_steps=30,
       )
   ]

   api = LocalSimulationsAPI(cache_dir_path=Path("./results"))
   results, stats_df = api.run_multiple_environments_and_policies_with_initial_debug_run(
       environment_run_params=run_params,
       alpha=0.05,
       confidence_interval_level=0.95,
       experiment_name="planner_comparison",
       n_jobs=-1,
   )

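The ``_df`` suffix and the column indexing in the tuning example below suggest that
``stats_df`` is a pandas DataFrame of per-policy summary statistics. As a rough
illustration (the column names here are borrowed from that example and are an
assumption, so check ``stats_df.columns`` in your installation), you could rank
planners by mean return:

.. code-block:: python

   # Assumed column names; verify against stats_df.columns before relying on them.
   summary = stats_df[["policy_name", "mean_total_return", "ci_lower", "ci_upper"]]
   print(summary.sort_values("mean_total_return", ascending=False))
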
Hyperparameter Tuning
---------------------

Automatically find the best hyperparameters using Optuna, then evaluate the optimised policy:

.. code-block:: python

   from pathlib import Path

   from POMDPPlanners.environments.tiger_pomdp import TigerPOMDP
   from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
   from POMDPPlanners.core.belief import get_initial_belief
   from POMDPPlanners.core.simulation import NumericalHyperParameter
   from POMDPPlanners.core.simulation.hyperparameter_tuning import (
       HyperParamPlannerConfig,
       HyperParameterRunParams,
       HyperParameterOptimizationDirection,
   )
   from POMDPPlanners.simulations.simulation_apis.local_simulations_api import LocalSimulationsAPI

   env = TigerPOMDP(discount_factor=0.95)
   belief = get_initial_belief(env, n_particles=200)

   optimization_config = HyperParameterRunParams(
       environment=env,
       belief=belief,
       hyper_param_planner_config=HyperParamPlannerConfig(
           policy_cls=POMCP,
           hyper_parameters=[
               NumericalHyperParameter(0.1, 100.0, "exploration_constant"),
               NumericalHyperParameter(3, 10, "depth"),
           ],
           constant_parameters={
               "discount_factor": 0.95,
               "n_simulations": 500,
               "name": "OptimizedPOMCP",
           },
       ),
       num_episodes=20,
       num_steps=30,
       n_trials=50,
       parameters_to_optimize=[
           ("average_return", HyperParameterOptimizationDirection.MAXIMIZE)
       ],
   )

   api = LocalSimulationsAPI(cache_dir_path=Path("./tuning_results"), debug=True)

   # Optimise then evaluate in one call
   results, stats_df = api.run_optimize_and_evaluate(
       configs=[optimization_config],
       evaluation_episodes=100,
       evaluation_steps=30,
       evaluation_n_jobs=-1,
       optimization_n_jobs=-1,
       confidence_interval_level=0.95,
       alpha=0.05,
       experiment_name="tiger_pomcp_tuning",
   )

   print(stats_df[["environment_name", "policy_name", "mean_total_return", "ci_lower", "ci_upper"]])

Use predefined search spaces from ``PlannersHyperparamConfigs`` to skip writing parameter ranges by hand:

.. code-block:: python

   from POMDPPlanners.configs.planners_hyperparam_configs import PlannersHyperparamConfigs
   from POMDPPlanners.utils.action_samplers import DiscreteActionSampler

   action_sampler = DiscreteActionSampler(actions=env.get_actions())
   planner_configs = PlannersHyperparamConfigs(discount_factor=0.95)
   predefined = planner_configs.pomcpow_config(
       env=env, action_sampler=action_sampler, name="POMCPOW_Tuned"
   )

Viewing Results
---------------

All simulation runs and optimization trials are tracked in MLflow. After any run, launch the UI from the cache directory:

.. code-block:: bash

   cd ./results   # or whichever cache_dir_path you used
   mlflow ui

Then open http://localhost:5000 to browse metrics, compare runs, and inspect confidence intervals.

Available Environments
----------------------

.. code-block:: python

   from POMDPPlanners.configs.environment_configs import EnvironmentConfigsAPI

   config_api = EnvironmentConfigsAPI(discount_factor=0.95)

   # Classic
   env, belief = config_api.tiger_pomdp_config(n_particles=500)

   # Navigation (discrete actions, continuous observations)
   env, belief = config_api.continuous_observations_discrete_actions_light_dark_pomdp_config(n_particles=500)

   # Navigation (fully continuous)
   env, belief = config_api.continuous_observations_continuous_actions_light_dark_pomdp_config(n_particles=500)

   # Manipulation
   env, belief = config_api.push_pomdp_config(n_particles=500)

   # Classic control
   env, belief = config_api.cartpole_pomdp_config(n_particles=500)
   env, belief = config_api.mountain_car_pomdp_config(n_particles=500)

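If you want to smoke-test several environments at once, the loop below is a small sketch
rather than a library helper: it only instantiates each config shown above and samples one
state from the returned initial belief.

.. code-block:: python

   # Illustrative sketch: build every environment config and sample one state
   # from its initial belief, just to confirm the setup works end to end.
   config_api = EnvironmentConfigsAPI(discount_factor=0.95)
   configs = [
       config_api.tiger_pomdp_config,
       config_api.continuous_observations_discrete_actions_light_dark_pomdp_config,
       config_api.continuous_observations_continuous_actions_light_dark_pomdp_config,
       config_api.push_pomdp_config,
       config_api.cartpole_pomdp_config,
       config_api.mountain_car_pomdp_config,
   ]
   for make_config in configs:
       env, belief = make_config(n_particles=500)
       print(f"{type(env).__name__}: sample state = {belief.sample()}")
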
Available Planners
------------------

.. code-block:: python

   # POMCP: discrete actions and observations
   from POMDPPlanners.planners.mcts_planners.pomcp import POMCP
   planner = POMCP(environment=env, discount_factor=0.95, depth=10,
                   exploration_constant=50.0, name="pomcp", n_simulations=1000)

   # POMCPOW: continuous actions/observations via double progressive widening
   from POMDPPlanners.planners.mcts_planners.pomcpow import POMCPOW
   planner = POMCPOW(environment=env, discount_factor=0.95, depth=10,
                     exploration_constant=100.0, k_o=10, k_a=4, alpha_o=0.01, alpha_a=0.01,
                     action_sampler=action_sampler, n_simulations=1500, name="pomcpow")

   # PFT-DPW: particle filter trees with double progressive widening
   from POMDPPlanners.planners.mcts_planners.pft_dpw import PFT_DPW
   planner = PFT_DPW(environment=env, discount_factor=0.95, depth=10,
                     k_a=4, alpha_a=0.01, k_o=10, alpha_o=0.01, exploration_constant=100.0,
                     action_sampler=action_sampler, n_simulations=1500, name="pft_dpw")

   # Sparse Sampling: simple model-based baseline (depth=2, branching_factor=10)
   from POMDPPlanners.planners.sparse_sampling_planners.sparse_sampling import SparseSamplingDiscreteActionsPlanner
   planner = SparseSamplingDiscreteActionsPlanner(env, branching_factor=10, depth=2)

Next Steps
----------

**Run the example notebooks**

.. code-block:: bash

   jupyter notebook docs/examples/basic_usage.ipynb
   jupyter notebook docs/examples/planners_comparison.ipynb
   jupyter notebook docs/examples/hyperparameter_tuning.ipynb
   jupyter notebook docs/examples/advanced_optimization.ipynb

**API Reference**

Browse the complete API documentation: :doc:`api/POMDPPlanners`