unreal.LearningAgentsCritic

class unreal.LearningAgentsCritic(outer: Object | None = None, name: Name | str = 'None')

Bases: LearningAgentsManagerListener

A critic used for training the policy. It can also be used at inference time to estimate the discounted return.

C++ Source:

  • Plugin: LearningAgents

  • Module: LearningAgents

  • File: LearningAgentsCritic.h

Editor Properties: (see get_editor_property/set_editor_property)

  • critic_network (LearningAgentsNeuralNetwork): [Read-Only] The underlying neural network.

  • interactor (LearningAgentsInteractor): [Read-Only] The agent interactor this critic is associated with.

  • is_setup (bool): [Read-Only] True if this object has been setup. Otherwise, false.

  • manager (LearningAgentsManager): [Read-Only] The manager this object is associated with.

  • policy (LearningAgentsPolicy): [Read-Only] The policy this critic is associated with.

  • visual_logger_objects (Map[Name, LearningAgentsVisualLoggerObject]): [Read-Only] The visual logger objects associated with this listener.

evaluate_critic() → None

Calling this function will run the underlying neural network on the previously buffered observations to populate the output value buffer. This should be called after the corresponding agent interactor’s EncodeObservations.
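
A minimal per-tick sketch of that ordering (the variable names interactor and critic are illustrative, standing for an already set-up LearningAgentsInteractor and LearningAgentsCritic):

  # Sketch: the expected call order each tick. The objects interactor
  # and critic are assumed to be already set up.
  interactor.encode_observations()  # buffer and encode observations first
  critic.evaluate_critic()          # then populate the output value buffer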

get_critic_network_asset() → LearningAgentsNeuralNetwork

Gets the current Network Asset being used.

Return type:

LearningAgentsNeuralNetwork

get_estimated_discounted_return(agent_id=-1) → float

Gets an estimate of the average discounted return expected by an agent according to the critic, i.e. the total sum of future rewards, scaled by the discount factor that was used during training. This value can be useful if you want to make a decision based on how well the agent thinks it is doing at achieving its task. This should be called only after EvaluateCritic.

Parameters:

agent_id (int32) – The AgentId to look up the estimated discounted return for

Returns:

The estimated average discounted return according to the critic

Return type:

float
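
For example, a sketch of reading the estimate for one agent after an evaluation (critic and agent_id are assumed to be a set-up LearningAgentsCritic and a valid id registered with its manager):

  import unreal

  # Sketch: query the critic's value estimate for a single agent.
  critic.evaluate_critic()
  estimated_return = critic.get_estimated_discounted_return(agent_id=agent_id)
  unreal.log("Estimated discounted return: {0:.2f}".format(estimated_return))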

classmethod make_critic(manager, interactor, policy, class_=None, name='Critic', critic_neural_network_asset=None, reinitialize_critic_network=True, critic_settings=[], seed=1234) → LearningAgentsCritic

Constructs a Critic to be used with the given agent interactor and critic settings.

Parameters:
  • manager (LearningAgentsManager) – The input Manager

  • interactor (LearningAgentsInteractor) – The input Interactor object

  • policy (LearningAgentsPolicy) – The input Policy object

  • class_ (type(Class)) – The critic class

  • name (Name) – The critic name

  • critic_neural_network_asset (LearningAgentsNeuralNetwork) – Optional Network Asset to use. If this is not provided, the asset is empty, or reinitialize_critic_network is set, then a new neural network object will be created according to the given CriticSettings.

  • reinitialize_critic_network (bool) – Whether to reinitialize the critic network

  • critic_settings (LearningAgentsCriticSettings) – The critic settings to use on creation of a new critic network

  • seed (int32) – Random seed to use for initializing the critic weights

Return type:

LearningAgentsCritic
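
For example, a construction sketch, assuming manager, interactor, and policy are already-created Learning Agents objects and that default-constructed critic settings are acceptable (all variable names here are illustrative):

  import unreal

  # Sketch: construct a critic from existing Learning Agents objects.
  # Passing no network asset creates a fresh network from the settings.
  critic = unreal.LearningAgentsCritic.make_critic(
      manager,
      interactor,
      policy,
      name="Critic",
      critic_neural_network_asset=None,
      reinitialize_critic_network=True,
      critic_settings=unreal.LearningAgentsCriticSettings(),
      seed=1234,
  )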

setup_critic(manager, interactor, policy, critic_neural_network_asset=None, reinitialize_critic_network=True, critic_settings=[], seed=1234) → None

Initializes a critic to be used with the given agent interactor and critic settings.

Parameters:
  • manager (LearningAgentsManager) – The input Manager

  • interactor (LearningAgentsInteractor) – The input Interactor object

  • policy (LearningAgentsPolicy) – The input Policy object

  • critic_neural_network_asset (LearningAgentsNeuralNetwork) – Optional Network Asset to use. If this is not provided, the asset is empty, or reinitialize_critic_network is set, then a new neural network object will be created according to the given CriticSettings.

  • reinitialize_critic_network (bool) – Whether to reinitialize the critic network

  • critic_settings (LearningAgentsCriticSettings) – The critic settings to use on creation of a new critic network

  • seed (int32) – Random seed to use for initializing the critic weights
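
A sketch mirroring MakeCritic, for initializing a critic instance that already exists, such as one spawned from a Blueprint subclass (object names are again illustrative):

  # Sketch: initialize an existing critic in place. The objects critic,
  # manager, interactor, and policy are assumed to already exist.
  critic.setup_critic(
      manager,
      interactor,
      policy,
      reinitialize_critic_network=True,
      seed=1234,
  )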