unreal.LearningAgentsImitationTrainerTrainingSettings

class unreal.LearningAgentsImitationTrainerTrainingSettings

Bases: StructBase

The configurable settings for the training process.

C++ Source:

  • Plugin: LearningAgents

  • Module: LearningAgentsTraining

  • File: LearningAgentsImitationTrainer.h

Editor Properties: (see get_editor_property/set_editor_property)

  • action_entropy_weight (float): [Read-Write] Weighting used for the entropy bonus. Larger values encourage greater action noise and therefore more exploration, but can make actions very noisy.

  • action_regularization_weight (float): [Read-Write] Weight used to regularize actions. Larger values encourage smaller actions, but values that are too large can cause actions to collapse to zero.

  • batch_size (uint32): [Read-Write] Batch size to use for training. Smaller values tend to produce better results at the cost of slowing down training. Large batch sizes are much more computationally efficient when training on the GPU.

  • device (LearningAgentsTrainerDevice): [Read-Write] The device to train on.

  • learning_rate (float): [Read-Write] Learning rate of the policy network. Typical values are between 0.001 and 0.0001.

  • learning_rate_decay (float): [Read-Write] Amount by which to multiply the learning rate every 1000 iterations.

  • number_of_iterations (int32): [Read-Write] The number of iterations to run before ending training.

  • random_seed (int32): [Read-Write] The seed used for any random sampling the trainer will perform, e.g. for weight initialization.

  • save_snapshots (bool): [Read-Write] If true, snapshots of the trained networks will be emitted to the intermediate directory.

  • use_tensorboard (bool): [Read-Write] If true, TensorBoard logs will be emitted to the intermediate directory.

  • weight_decay (float): [Read-Write] Amount of weight decay to apply to the network. Larger values encourage smaller network weights, but too large a value can cause the weights to collapse to zero.

  • window (uint32): [Read-Write] The number of consecutive steps of observations and actions over which to train the policy. Increasing this value encourages the policy to use its memory effectively, but too large a value can make training unstable. Since the memory state is not known during imitation learning, this value should generally be slightly larger than when doing reinforcement learning.
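
A minimal usage sketch, assuming the script runs in the Unreal Editor's Python environment with the Learning Agents plugins enabled. Property names follow the Editor Properties list above; the values shown are illustrative only, and LearningAgentsTrainerDevice.GPU is assumed to be a valid enum value.

  import unreal

  # Construct the settings struct and override a few defaults via
  # set_editor_property (see the Editor Properties list above).
  settings = unreal.LearningAgentsImitationTrainerTrainingSettings()
  settings.set_editor_property('learning_rate', 0.0001)
  settings.set_editor_property('learning_rate_decay', 0.99)
  settings.set_editor_property('batch_size', 128)
  settings.set_editor_property('window', 16)
  settings.set_editor_property('number_of_iterations', 100000)
  settings.set_editor_property('use_tensorboard', True)
  # Assumes LearningAgentsTrainerDevice exposes a GPU value; use CPU otherwise.
  settings.set_editor_property('device', unreal.LearningAgentsTrainerDevice.GPU)

  # Values can be read back with get_editor_property.
  print(settings.get_editor_property('learning_rate'))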