unreal.LearningAgentsTrainer

class unreal.LearningAgentsTrainer(outer: Object | None = None, name: Name | str = 'None')

Bases: LearningAgentsManagerListener

The ULearningAgentsTrainer is the core class for reinforcement learning training. It defines how agent rewards, completions, and resets are implemented, and provides methods for orchestrating the training process.

To use this class, you need to implement the GatherAgentRewards and GatherAgentCompletions functions, which define the rewards and penalties the agent receives and the conditions that cause an episode to end. See ULearningAgentsInteractor to understand how observations and actions work.
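For illustration, a minimal Python sketch of such a trainer subclass is shown below. It assumes these callbacks can be overridden from Python via @unreal.uclass() / @unreal.ufunction(override=True); the helpers compute_reward_for, is_goal_reached, and respawn_agent are hypothetical placeholders for your own game logic:

  import unreal

  @unreal.uclass()
  class MyTrainer(unreal.LearningAgentsTrainer):

      @unreal.ufunction(override=True)
      def gather_agent_reward(self, agent_id):
          # Reward for the step the given agent just took.
          return compute_reward_for(agent_id)  # hypothetical helper

      @unreal.ufunction(override=True)
      def gather_agent_completion(self, agent_id):
          # End the episode when the goal is reached; otherwise keep running.
          # Enum value names are assumed from ELearningAgentsCompletion.
          if is_goal_reached(agent_id):  # hypothetical helper
              return unreal.LearningAgentsCompletionEnum.TERMINATION
          return unreal.LearningAgentsCompletionEnum.RUNNING

      @unreal.ufunction(override=True)
      def reset_agent_episode(self, agent_id):
          # Put the agent back into a valid starting state for a new episode.
          respawn_agent(agent_id)  # hypothetical helper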

C++ Source:

  • Plugin: LearningAgents

  • Module: LearningAgentsTraining

  • File: LearningAgentsTrainer.h

Editor Properties: (see get_editor_property/set_editor_property)

  • critic (LearningAgentsCritic): [Read-Only] The current critic.

  • has_training_failed (bool): [Read-Only] True if trainer encountered an unrecoverable error during training (e.g. the trainer process timed out). Otherwise, false. This exists mainly to keep the editor from locking up if something goes wrong during training.

  • interactor (LearningAgentsInteractor): [Read-Only] The agent interactor associated with this component.

  • is_setup (bool): [Read-Only] True if this object has been setup. Otherwise, false.

  • is_training (bool): [Read-Only] True if training is currently in-progress. Otherwise, false.

  • manager (LearningAgentsManager): [Read-Only] The manager this object is associated with.

  • policy (LearningAgentsPolicy): [Read-Only] The current policy for experience gathering.

  • visual_logger_objects (Map[Name, LearningAgentsVisualLoggerObject]): [Read-Only] The visual logger objects associated with this listener.

begin_training(trainer_training_settings=[], trainer_game_settings=[], trainer_path_settings=[], reset_agents_on_begin=True) None

Begins the training process with the provided settings.

Parameters:
end_training() None

Stops the training process.
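As a sketch, assuming the manager, interactor, policy, and critic have already been set up and the trainer constructed (see make_trainer below), training could be started and stopped like this:

  # Start training with default settings; agents are reset at the start.
  trainer.begin_training(reset_agents_on_begin=True)

  # ... run your per-tick training loop (see run_training / process_experience) ...

  # Stop the external training process when you are done.
  trainer.end_training()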

gather_agent_completion(agent_id) LearningAgentsCompletionEnum

This callback should be overridden by the Trainer and gathers the completion for a given agent.

Parameters:

agent_id (int32) – Agent id to gather completion for.

Returns:

out_completion (LearningAgentsCompletionEnum): Output completion for the given agent.

Return type:

LearningAgentsCompletionEnum

gather_agent_completions(agent_ids) Array[LearningAgentsCompletionEnum]

This callback can be overridden by the Trainer and gathers all the completions for the given set of agents. By default this will call GatherAgentCompletion on each agent.

Parameters:

agent_ids (Array[int32]) – Agents to gather completions for.

Returns:

out_completions (Array[LearningAgentsCompletionEnum]): Output completions for each agent in AgentIds

Return type:

Array[LearningAgentsCompletionEnum]

gather_agent_reward(agent_id) float

This callback should be overridden by the Trainer and gathers the reward value for the given agent.

Parameters:

agent_id (int32) – Agent id to gather reward for.

Returns:

out_reward (float): Output reward for the given agent.

Return type:

float

gather_agent_rewards(agent_ids) Array[float]

This callback can be overridden by the Trainer and gathers all the reward values for the given set of agents. By default this will call GatherAgentReward on each agent.

Parameters:

agent_ids (Array[int32]) – Agents to gather rewards for.

Returns:

out_rewards (Array[float]): Output rewards for each agent in AgentIds

Return type:

Array[float]
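A hedged sketch of overriding this batch variant; compute_reward_for is a hypothetical placeholder, and the returned array is assumed to contain one reward per entry in AgentIds, in the same order:

  @unreal.uclass()
  class MyBatchTrainer(unreal.LearningAgentsTrainer):

      @unreal.ufunction(override=True)
      def gather_agent_rewards(self, agent_ids):
          # One reward per agent id, in the same order as agent_ids.
          return [compute_reward_for(agent_id) for agent_id in agent_ids]  # hypothetical helper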

gather_completions() None

Call this function when it is time to evaluate the completions for your agents. This should be done at the beginning of each iteration of your training loop (after the initial step): having taken an action, let the simulation advance into the next state before evaluating the completions.

gather_rewards() None

Call this function when it is time to evaluate the rewards for your agents. This should be done at the beginning of each iteration of your training loop (after the initial step): having taken an action, let the simulation advance into the next state before evaluating the rewards.
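A short ordering sketch for these two calls within one iteration (the action from the previous tick has already been applied, so the agents are now in their next state); see process_experience below for the full loop:

  trainer.gather_rewards()       # reward for the transition just taken
  trainer.gather_completions()   # did that transition end the episode?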

get_completion(agent_id=-1) LearningAgentsCompletionEnum

Gets the current completion for an agent. Should be called only after GatherCompletions.

Parameters:

agent_id (int32) – The AgentId to look-up the completion for

Returns:

The completion type

Return type:

LearningAgentsCompletionEnum

get_episode_step_num(agent_id=-1) int32

Gets the number of steps recorded in an episode for the given agent.

Parameters:

agent_id (int32) – The AgentId to look-up the number of recorded episode steps for

Returns:

The number of recorded episode steps

Return type:

int32

get_episode_time(agent_id=-1) float

Gets the current elapsed episode time for the given agent.

Parameters:

agent_id (int32) – The AgentId to look-up the episode time for

Returns:

The elapsed episode time

Return type:

float
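A hypothetical monitoring snippet using these two getters to report on an agent's current episode (agent id 0 is assumed to be valid, and trainer is an already set-up trainer):

  steps = trainer.get_episode_step_num(agent_id=0)
  elapsed = trainer.get_episode_time(agent_id=0)
  unreal.log(f"Agent 0: {steps} steps recorded, {elapsed:.1f}s elapsed this episode")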

get_reward(agent_id=-1) float

Gets the current reward for an agent. Should be called only after GatherRewards.

Parameters:

agent_id (int32) – The AgentId to look-up the reward for

Returns:

The reward

Return type:

float

has_completion(agent_id=-1) bool

Returns true if GatherCompletions has been called and the completion has already been set for the given agent.

Parameters:

agent_id (int32) –

Return type:

bool

has_reward(agent_id=-1) bool

Returns true if GatherRewards has been called and the reward has already been set for the given agent.

Parameters:

agent_id (int32) –

Return type:

bool
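A hypothetical debug check, run after GatherRewards and GatherCompletions; the agent_ids list is assumed to come from your manager:

  for agent_id in agent_ids:
      if trainer.has_reward(agent_id) and trainer.has_completion(agent_id):
          unreal.log(f"Agent {agent_id}: reward={trainer.get_reward(agent_id):.3f}, "
                     f"completion={trainer.get_completion(agent_id)}")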

has_training_failed() bool

Returns true if the trainer has failed to communicate with the external training process. This can be used in combination with RunTraining to avoid filling the logs with errors.

Returns:

True if the training has failed. Otherwise, false.

Return type:

bool
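For example, a per-tick guard of this kind (the tick function itself is hypothetical) stops calling RunTraining once the trainer has failed:

  def tick_training(trainer):
      if trainer.has_training_failed():
          return  # stop calling run_training so the log is not flooded with errors
      trainer.run_training()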

is_training() bool

Returns true if the trainer is currently training. Otherwise, false.

Return type:

bool

classmethod make_trainer(manager, interactor, policy, critic, class_, name='Trainer', trainer_settings=[]) LearningAgentsTrainer

Constructs the trainer and runs the setup functions for rewards and completions.

Parameters:
Return type:

LearningAgentsTrainer
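A construction sketch, assuming the manager, interactor, policy, and critic objects have already been created (for example via their own make_* functions):

  trainer = unreal.LearningAgentsTrainer.make_trainer(
      manager=manager,
      interactor=interactor,
      policy=policy,
      critic=critic,
      class_=unreal.LearningAgentsTrainer,  # or your own trainer subclass
      name='Trainer',
  )

If the trainer object already exists, setup_trainer (below) performs the equivalent initialization on that instance.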

process_experience(reset_agents_on_update=True) None

Call this function at the end of each step of your training loop. This takes the current observations, actions, and rewards and moves them into the episode experience buffer. All agents with full episode buffers, or those which have been signaled complete, will be reset. If enough experience has been gathered, it will be sent to the training process, an iteration of training will be run, and the updated policy will be synced back.

Parameters:

reset_agents_on_update (bool) – If true, reset all agents whenever an updated policy is received.
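Putting the pieces together, a hedged sketch of one iteration of a manual training loop (the per-tick function and the policy variable are assumptions; the call order follows the one documented for run_training below):

  def tick_training_step(trainer, policy):
      trainer.gather_rewards()       # rewards for the transition just taken
      trainer.gather_completions()   # completions in the new state
      trainer.process_experience(reset_agents_on_update=True)
      policy.run_inference()         # choose the next actions with the (possibly updated) policy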

reset_agent_episode(agent_id) None

This callback should be overridden by the Trainer and resets the episode for the given agent.

Parameters:

agent_id (int32) – The id of the agent that needs resetting.

reset_agent_episodes(agent_ids) None

This callback can be overridden by the Trainer and resets all episodes for each agent in the given set. By default this will call ResetAgentEpisode on each agent.

Parameters:

agent_ids (Array[int32]) – The ids of the agents that need resetting.

run_training(trainer_training_settings=[], trainer_game_settings=[], trainer_path_settings=[], reset_agents_on_begin=True, reset_agents_on_update=True) None

Convenience function that runs a basic training loop. If training has not yet been started, this will start it and then call RunInference. On each subsequent call, it will call GatherRewards, GatherCompletions, and ProcessExperience, followed by RunInference.

Parameters:
setup_trainer(manager, interactor, policy, critic, trainer_settings=[]) None

Initializes the trainer and runs the setup functions for rewards and completions.

Parameters: