Temporal Difference

This synapse implements temporal difference (TD) learning, a method for learning reward predictions. As a result of this learning, the target neuron learns to predict the expected reward value for the given inputs. To update the weight, the synapse computes the error in the reward prediction, as follows:

δ(t) = r(t) + γ·V(t) − V(t − 1)

In the above equation, δ (the error in the reward prediction) is computed for the tth iteration, V stands for the target neuron's activation value, and r stands for the reward value. γ is the reward discount factor.
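A minimal Python sketch of this error computation (the function and variable names are illustrative, not part of any simulator's API):

```python
def td_error(reward, v_current, v_previous, gamma):
    """Reward-prediction error: delta(t) = r(t) + gamma*V(t) - V(t-1)."""
    return reward + gamma * v_current - v_previous

# Example: a reward of 1.0 arrives while the prediction rises slightly.
print(td_error(reward=1.0, v_current=0.5, v_previous=0.4, gamma=0.9))  # ≈ 1.05
```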

This δ value is then used to update the strength of the synapse, as follows:

Δw = η·δ·x

where w is the strength of the synapse, x is the input neuron's activation value, and η is the learning rate.
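A matching sketch of the strength update, again with illustrative names:

```python
def update_strength(strength, delta, x, eta):
    """Apply the TD update: w <- w + eta * delta * x."""
    return strength + eta * delta * x

# A positive error together with an active input strengthens the synapse.
print(update_strength(strength=0.2, delta=1.05, x=1.0, eta=0.1))  # ≈ 0.305
```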

After each update, the strength of this synapse is clipped so that it remains between the lower and upper bounds specified for the synapse.
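Such clipping can be sketched as below; the default bound values used here are hypothetical:

```python
def clip_strength(strength, lower=-10.0, upper=10.0):
    """Constrain the synapse strength to the interval [lower, upper]."""
    return max(lower, min(upper, strength))

print(clip_strength(12.3))  # 10.0, clipped to the upper bound
print(clip_strength(0.5))   # 0.5, already within bounds
```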

Learning rate

This parameter, denoted η in the equations above, controls the rate at which the strength of the synapse changes.

Reward discount factor

This parameter, denoted γ in the equations above, should be set between 0 and 1. The smaller the value, the less weight is given to future rewards.
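As a rough numerical illustration (a general property of discounting, assumed here rather than stated in this document), a reward arriving k iterations in the future is effectively weighted by γ^k:

```python
# Effective weight given to a reward k iterations ahead: gamma ** k.
for gamma in (0.9, 0.5, 0.1):
    print(gamma, [round(gamma ** k, 3) for k in range(4)])
# 0.9 [1.0, 0.9, 0.81, 0.729]
# 0.5 [1.0, 0.5, 0.25, 0.125]
# 0.1 [1.0, 0.1, 0.01, 0.001]
```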

For a demonstration of the TDSynapse in action, see the accompanying example.