POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero package

Submodules

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_belief_representation module

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_beta_zero module

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_beta_zero_action_sampler module

Tests for BetaZeroActionSampler network-guided action sampling.

This module tests the BetaZeroActionSampler class, including fallback behaviour, discrete and continuous network-guided sampling, and pickle serialisation.

class POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_beta_zero_action_sampler.SimpleFallbackSampler

Bases: ActionSampler

sample(belief_node=None)

Sample a new action for progressive widening.

Parameters: belief_node – Optional belief node context for informed sampling
Returns: A sampled action compatible with the environment’s action space

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_beta_zero_action_sampler.test_continuous_sampling_centered_on_predicted_mean()

Test that continuous samples are in a reasonable range around the network mean.

Purpose: Validates that, for a continuous action space, the sampler produces action vectors whose components lie within a plausible range of the network’s predicted mean (not wildly divergent).
Given: A BetaZeroActionSampler with a continuous BetaZeroNetwork (action_dim=2), a ParticleMeanStdRepresentation, and no discrete actions list.
When: sample is called 50 times with a valid belief node.
Then: All samples are finite numpy arrays of the correct shape (action_dim,), and the empirical mean of the samples is within 5 standard deviations of the network’s predicted mean.

Test type: unit
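The property checked here can be illustrated with a small stand-alone sketch. It assumes, purely for illustration, that each action component is drawn independently from a Gaussian centred on the predicted mean; the helper sample_continuous and the fixed mean/std values are hypothetical stand-ins for the network output, not part of POMDPPlanners, and plain lists replace numpy arrays to keep the sketch dependency-free.

```python
import math
import random


def sample_continuous(mean, std, rng):
    """Hypothetical stand-in: draw each component independently from N(mean_i, std_i**2)."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


rng = random.Random(0)
mean = [0.5, -1.0]  # pretend network-predicted mean (action_dim=2)
std = [0.2, 0.3]    # pretend network-predicted standard deviation

samples = [sample_continuous(mean, std, rng) for _ in range(50)]

# Checks mirroring the test: correct length, finite values,
# and an empirical mean within 5 standard deviations of the predicted mean.
assert all(len(s) == len(mean) for s in samples)
assert all(math.isfinite(x) for s in samples for x in s)
emp_mean = [sum(col) / len(samples) for col in zip(*samples)]
assert all(abs(e - m) <= 5 * s for e, m, s in zip(emp_mean, mean, std))
```

With 50 draws the empirical mean has a standard error of roughly std/7, so a 5-standard-deviation tolerance is deliberately loose: it catches gross errors such as sampling around the wrong centre without making the test flaky.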

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_beta_zero_action_sampler.test_discrete_sampling_follows_policy()

Test that discrete sampling produces a non-uniform distribution guided by the network.

Purpose: Validates that, when a network and belief representation are attached, the sampler draws actions according to the network’s softmax policy rather than uniformly at random.
Given: A BetaZeroActionSampler with a discrete BetaZeroNetwork, a ParticleMeanStdRepresentation, and a list of three actions.
When: sample is called 100 times with a valid belief node.
Then: The distribution of sampled actions is non-uniform. Specifically, the most frequently sampled action is selected more than 1/3 of the time (the uniform expectation), indicating that the network policy influences sampling.

Test type: unit
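Softmax-guided discrete sampling of the kind this test describes can be sketched with the standard library alone. The logits below are a hypothetical stand-in for the network's policy-head output, and sample_discrete is an illustrative helper, not the BetaZeroActionSampler API.

```python
import math
import random
from collections import Counter


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_discrete(actions, logits, rng):
    """Hypothetical stand-in: draw one action with probability softmax(logits)."""
    return rng.choices(actions, weights=softmax(logits), k=1)[0]


rng = random.Random(0)
actions = ["left", "right", "listen"]
logits = [0.1, 2.0, 0.3]  # pretend policy output; softmax gives roughly [0.11, 0.75, 0.14]

counts = Counter(sample_discrete(actions, logits, rng) for _ in range(100))
most_common_action, freq = counts.most_common(1)[0]

# Same criterion as the test: the modal action exceeds the uniform rate of 1/3.
assert freq / 100 > 1 / 3
```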

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_beta_zero_action_sampler.test_fallback_without_belief_node()

Test that the fallback sampler is used when belief_node is None.

Purpose: Validates that BetaZeroActionSampler delegates to the fallback sampler when no belief node context is provided.
Given: A BetaZeroActionSampler with a SimpleFallbackSampler and no network/representation attached.
When: sample is called with belief_node=None.
Then: The returned action equals “fallback_action” from the fallback sampler.

Test type: unit

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_beta_zero_action_sampler.test_fallback_without_network()

Test that the fallback sampler is used when the network is not set.

Purpose: Validates that BetaZeroActionSampler delegates to the fallback sampler when set_network_and_representation has not been called, even if a valid belief node is provided.
Given: A BetaZeroActionSampler without a network/representation, and a valid belief node.
When: sample is called with the belief node.
Then: The returned action equals “fallback_action” from the fallback sampler.

Test type: unit
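The two fallback tests together pin down a simple delegation rule: use the network only when both a belief node and an attached network/representation are available. The sketch below illustrates that rule under stated assumptions; SketchBetaZeroSampler is hypothetical, the network and representation are modelled as plain callables, and the two-argument signature of set_network_and_representation is assumed rather than taken from the library.

```python
class SimpleFallbackSampler:
    """Plain-class stand-in for the test helper: always returns the same sentinel."""

    def sample(self, belief_node=None):
        return "fallback_action"


class SketchBetaZeroSampler:
    """Hypothetical sketch of the delegation rule; not the library's class."""

    def __init__(self, fallback):
        self.fallback = fallback
        self._network = None
        self._belief_representation = None

    def set_network_and_representation(self, network, representation):
        self._network = network
        self._belief_representation = representation

    def sample(self, belief_node=None):
        # Fall back when there is no belief context or nothing to query.
        if belief_node is None or self._network is None or self._belief_representation is None:
            return self.fallback.sample(belief_node)
        features = self._belief_representation(belief_node)
        return self._network(features)


sampler = SketchBetaZeroSampler(SimpleFallbackSampler())
assert sampler.sample(belief_node=None) == "fallback_action"      # no belief node
assert sampler.sample(belief_node=object()) == "fallback_action"  # no network attached

sampler.set_network_and_representation(lambda f: f"net:{f}", lambda b: "feat")
assert sampler.sample(belief_node=None) == "fallback_action"      # still no belief node
assert sampler.sample(belief_node=object()) == "net:feat"
```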

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_beta_zero_action_sampler.test_pickle_round_trip()

Test full pickle.dumps → pickle.loads round trip for BetaZeroActionSampler.

Purpose: Validates that BetaZeroActionSampler can be pickled and unpickled using the full pickle protocol, which is critical for joblib/multiprocessing compatibility.
Given: A BetaZeroActionSampler with network and representation attached.
When: The sampler is pickled with pickle.dumps and then unpickled with pickle.loads (simulating joblib/multiprocessing serialisation).
Then: The unpickled sampler retains all non-network attributes (fallback, actions, noise_scale), has network and representation set to None, and correctly delegates to the fallback when used.

Test type: unit

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_beta_zero_action_sampler.test_pickle_round_trip_continuous()

Test full pickle round trip for BetaZeroActionSampler with continuous actions.

Purpose: Validates that a BetaZeroActionSampler for continuous action spaces can be pickled and unpickled successfully.
Given: A BetaZeroActionSampler configured for continuous actions with network and representation attached.
When: The sampler is pickled and unpickled via pickle.dumps/pickle.loads.
Then: The unpickled sampler preserves all attributes and functions correctly.

Test type: unit

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_beta_zero_action_sampler.test_pickle_serialization()

Test that BetaZeroActionSampler serialisation strips the network.

Purpose: Validates that __getstate__ removes the network and belief representation, that pickle.dumps succeeds, and that a manual reconstruction via __setstate__ restores a working sampler that falls back correctly.
Given: A BetaZeroActionSampler with network and representation attached.
When: __getstate__ is called (as pickle.dumps does internally), the state is inspected, and a fresh instance is reconstructed via __setstate__.
Then: The serialised state has _network and _belief_representation set to None, pickle.dumps succeeds, and the reconstructed sampler retains the fallback sampler, actions list, and noise_scale, and delegates to the fallback when sampled without re-attaching a network.

Test type: unit
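The serialisation pattern these pickle tests describe (strip the network in __getstate__, restore everything else in __setstate__) can be sketched as follows. SketchSampler and FallbackSampler are hypothetical stand-ins; only the attribute names _network, _belief_representation, fallback, actions and noise_scale come from the test descriptions. Lambdas are used as the "network" because the standard pickle module cannot serialise them, which makes the need for stripping visible.

```python
import pickle


class FallbackSampler:
    def sample(self, belief_node=None):
        return "fallback_action"


class SketchSampler:
    """Hypothetical sketch of a sampler that drops its network when pickled."""

    def __init__(self, fallback, actions=None, noise_scale=0.1):
        self.fallback = fallback
        self.actions = actions
        self.noise_scale = noise_scale
        self._network = None
        self._belief_representation = None

    def __getstate__(self):
        state = self.__dict__.copy()
        # Networks may hold unpicklable handles; replace them with None.
        state["_network"] = None
        state["_belief_representation"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)

    def sample(self, belief_node=None):
        if belief_node is None or self._network is None:
            return self.fallback.sample(belief_node)
        return self._network(belief_node)


s = SketchSampler(FallbackSampler(), actions=["a", "b"], noise_scale=0.3)
s._network = lambda b: "net_action"  # stdlib pickle cannot serialise lambdas
s._belief_representation = lambda b: b

state = s.__getstate__()
assert state["_network"] is None and state["_belief_representation"] is None

restored = pickle.loads(pickle.dumps(s))
assert restored.actions == ["a", "b"] and restored.noise_scale == 0.3
assert restored._network is None
assert restored.sample(belief_node=object()) == "fallback_action"
```

Because the network is never shipped, a worker process that receives the sampler has to re-attach its own network before network-guided sampling resumes; until then it falls back.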

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_beta_zero_network module

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_puct module

Tests for the BetaZero PUCT action selection module.

This module tests the PUCT selection rule and progressive widening with PUCT used in BetaZero, covering formula correctness, Q-value normalisation, prior weighting, and widening threshold behaviour on the arena Tree backend.

class POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_puct.SimpleActionSampler

Bases: ActionSampler

sample(belief_node=None)

Sample a new action for progressive widening.

Parameters: belief_node – Optional belief node context for informed sampling
Returns: A sampled action compatible with the environment’s action space

class POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_puct.TestPUCTActionProgressiveWidening

Bases: object

Tests for the puct_action_progressive_widening_arena function.

test_widening_adds_action_at_threshold()

Test that progressive widening adds new actions when the threshold is met and stops when it is not.

Purpose: Validates the widening gate condition len(children) <= k_a * N^alpha_a. When the condition holds, a new action should be sampled; once it no longer holds, existing actions should be selected via PUCT.

Given: A belief node at depth 1 (non-root) with k_a=1.0 and alpha_a=0.5, and a SimpleActionSampler that produces unique action names.
When: The node starts with 0 children and visit_count=0 (widening condition met), so the first call adds a child. visit_count and children are then manipulated so that a second call still meets the threshold (adding another child), and finally the node is put into a state where the threshold is exceeded, so PUCT selection is used instead.
Then: New actions are created while the widening condition is met, and an existing action is returned (via PUCT) once the condition is no longer met.

Test type: unit
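The widening gate len(children) <= k_a * N^alpha_a can be evaluated directly. The helper should_widen below is a hypothetical illustration of that inequality, and the (children, visits) pairs are illustrative values, not the exact states used in the test.

```python
def should_widen(num_children, visit_count, k_a, alpha_a):
    """Widening gate from the test description: len(children) <= k_a * N**alpha_a."""
    return num_children <= k_a * visit_count ** alpha_a


# With k_a=1.0 and alpha_a=0.5 the bound is sqrt(N).
assert should_widen(0, 0, 1.0, 0.5)      # 0 <= 0: first call adds a child
assert should_widen(1, 1, 1.0, 0.5)      # 1 <= 1: still widening
assert should_widen(2, 4, 1.0, 0.5)      # 2 <= 2: still widening
assert not should_widen(3, 4, 1.0, 0.5)  # 3 > 2: select an existing child via PUCT
```

Note that the non-strict inequality is what lets an unvisited node (N=0, bound 0) add its very first child.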

class POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_puct.TestPUCTSelection

Bases: object

Tests for the puct_selection_arena function.

test_low_visit_high_prior_preferred_over_high_visit_low_prior()

Test that an under-visited action with a high prior beats a well-visited action with a low prior.

Purpose: Validates the interplay between visit counts and priors in the exploration bonus: a low N(b,a) combined with a high P(a|b) should produce a large exploration term, allowing an under-visited action to be preferred over a heavily visited one.

Given: Two actions with equal Q-values. Action 0 has N=100 and prior=0.1; Action 1 has N=1 and prior=0.9. Parent visits=101.
When: puct_selection_arena is called with a sufficiently large exploration constant.
Then: Action 1 (low visits, high prior) is selected.

Test type: unit
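With equal Q-values, only the exploration terms differ between the two actions. The sketch below computes them assuming the AlphaZero-style bonus c · P(a|b) · sqrt(N(b)) / (1 + N(b,a)); this form is an assumption, and puct_selection_arena may use a different variant.

```python
import math


def exploration_bonus(prior, visits, parent_visits, c):
    """Assumed AlphaZero-style PUCT bonus: c * P(a|b) * sqrt(N(b)) / (1 + N(b,a))."""
    return c * prior * math.sqrt(parent_visits) / (1 + visits)


c = 1.0
bonus_a0 = exploration_bonus(prior=0.1, visits=100, parent_visits=101, c=c)
bonus_a1 = exploration_bonus(prior=0.9, visits=1, parent_visits=101, c=c)

# Roughly 0.00995 for action 0 versus 4.52 for action 1.
assert bonus_a1 > bonus_a0
```

The ratio of the two bonuses is (0.9 / 2) / (0.1 / 101) = 454.5, independent of c and of the parent visit count, so the low-visit, high-prior action dominates the exploration term by a wide margin.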

test_puct_converges_to_highest_q_with_many_visits()

Test that PUCT selects the highest-Q action when visit counts are large.

Purpose: Validates that the exploration bonus vanishes relative to the Q-value term when actions have been visited many times, causing PUCT to behave greedily with respect to Q.

Given: A belief node with 3 actions having Q=[10.0, 5.0, 8.0], very high visit counts [10000, 10000, 10000], parent_visits=30000, and uniform priors.
When: puct_selection_arena is called with a moderate exploration constant.
Then: The action with the highest Q-value (10.0) is selected.

Test type: unit
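A small scoring function makes the convergence argument concrete. It assumes min-max normalisation of sibling Q-values followed by the AlphaZero-style bonus; puct_scores is a hypothetical helper, not the library's puct_selection_arena.

```python
import math


def puct_scores(q_values, visits, priors, parent_visits, c):
    """Min-max-normalised Q plus an AlphaZero-style exploration bonus (assumed form)."""
    q_min, q_max = min(q_values), max(q_values)
    span = q_max - q_min
    q_norm = [(q - q_min) / span if span > 0 else 0.0 for q in q_values]
    return [
        qn + c * p * math.sqrt(parent_visits) / (1 + n)
        for qn, p, n in zip(q_norm, priors, visits)
    ]


scores = puct_scores(
    q_values=[10.0, 5.0, 8.0],
    visits=[10000, 10000, 10000],
    priors=[1 / 3] * 3,
    parent_visits=30000,
    c=1.0,
)
best = max(range(len(scores)), key=scores.__getitem__)

# Equal visits and priors give every action the same bonus (about 0.0058),
# so the ordering is set by the normalised Q-values [1.0, 0.0, 0.6].
assert best == 0
```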

test_puct_manual_formula_computation()

Test PUCT scores against a manual computation with known values.

Purpose: Validates the numerical correctness of the PUCT formula by comparing the selected action to a hand-computed expected result.

Given: Q=[1.0, 0.5], N=[10, 5], parent_N=15, priors=[0.6, 0.4], c=1.0.
When: puct_selection_arena is called.
Then: Action a0 (index 0) is selected because its PUCT score is higher.

Test type: unit
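The expected result can be checked by hand. Assuming min-max normalisation maps Q=[1.0, 0.5] to [1.0, 0.0] and the bonus takes the AlphaZero-style form below (both are assumptions about puct_selection_arena, not confirmed by this page), the two scores are:

```latex
\begin{aligned}
U(b,a) &= c \, P(a \mid b) \, \frac{\sqrt{N(b)}}{1 + N(b,a)} \\
\mathrm{score}(a_0) &= 1.0 + 1.0 \cdot 0.6 \cdot \frac{\sqrt{15}}{1 + 10} \approx 1.0 + 0.211 = 1.211 \\
\mathrm{score}(a_1) &= 0.0 + 1.0 \cdot 0.4 \cdot \frac{\sqrt{15}}{1 + 5} \approx 0.258
\end{aligned}
```

Without normalisation, a1 would score 0.5 + 0.258 = 0.758, so a0 is selected under either reading.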

test_puct_selects_highest_prior_when_all_unvisited()

Test that PUCT selects the action with the highest prior when all visit counts are zero.

Purpose: Validates that with zero visit counts and equal Q-values the exploration bonus is dominated entirely by the prior probabilities.

Given: A belief node with 3 action children, all with Q=0 and N=0, and action priors [0.1, 0.7, 0.2].
When: puct_selection_arena is called with exploration_constant=1.0.
Then: The action child corresponding to the highest prior (0.7) is selected.

Test type: unit

test_q_values_normalized_to_01()

Test that Q-values are normalised to [0, 1] before computing PUCT scores.

Purpose: Validates that arbitrarily scaled Q-values are mapped to the unit interval so that the exploration constant is problem-independent.

Given: Q=[100, 200] with high visit counts and uniform priors.
When: puct_selection_arena is called with a small exploration constant so that the Q-term dominates.
Then: The action with Q=200 (normalised to 1.0) is selected.

Test type: unit
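Min-max normalisation over sibling actions is one common way to map arbitrary Q-values onto [0, 1]; whether puct_selection_arena uses exactly this scheme is an assumption. The sketch also shows why normalisation makes the exploration constant problem-independent: rescaling every Q-value by a positive constant leaves the normalised values unchanged. The equal-values branch is a hypothetical choice made for illustration.

```python
def normalise_q(q_values):
    """Min-max normalisation of sibling Q-values to [0, 1] (assumed scheme)."""
    q_min, q_max = min(q_values), max(q_values)
    if q_max == q_min:
        # Assumption for this sketch: equal siblings carry no Q-based preference.
        return [0.0 for _ in q_values]
    return [(q - q_min) / (q_max - q_min) for q in q_values]


assert normalise_q([100.0, 200.0]) == [0.0, 1.0]
assert normalise_q([-5.0, 0.0, 5.0]) == [0.0, 0.5, 1.0]
# Scaling all Q-values by 100 does not change the normalised result.
assert normalise_q([1.0, 2.0]) == normalise_q([100.0, 200.0])
```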

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_training module

POMDPPlanners.tests.test_planners.test_mcts_planners.test_beta_zero.test_training_buffer module