Temporal Difference
This type of synapse implements temporal difference (TD) learning, a method for learning reward predictions. As a result of this type of learning, the target neuron learns to predict the expected reward value for the given inputs. To update the weight, the synapse first computes the error in its reward prediction, as follows:

δ(t) = r(t) + γ·V(t) − V(t−1)
In the above equation, delta (the error in reward expectation) is computed for the t-th iteration, V stands for the target neuron's activation value, and r stands for the reward value. Gamma is the reward discount factor.
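As a concrete illustration, here is a minimal Python sketch of this computation. The function and argument names are hypothetical, not part of the simulator's API, and the equation form assumed is standard TD(0) as reconstructed above:

```python
def td_error(reward, value_now, value_prev, gamma=0.9):
    """TD error: delta(t) = r(t) + gamma * V(t) - V(t-1).

    reward     -- r(t), the reward received at iteration t
    value_now  -- V(t), the target neuron's current activation
    value_prev -- V(t-1), the target neuron's previous activation
    gamma      -- reward discount factor, between 0 and 1
    """
    return reward + gamma * value_now - value_prev
```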
This delta value is then used to update the strength of the synapse, as follows:

w(t+1) = w(t) + η·δ(t)·x(t)
where x is the input neuron's activation value and eta is the learning rate.
The strength of this synapse is clipped so that it remains between the lower and upper bounds specified for the synapse.
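Putting the update and the clipping together, a hedged sketch of one full weight update follows. The names and default bounds are illustrative assumptions, not the simulator's actual values:

```python
def update_strength(strength, x, delta, eta=0.1, lower=-10.0, upper=10.0):
    """One TD weight update: w <- w + eta * delta * x, clipped to [lower, upper].

    strength -- current synapse strength w(t)
    x        -- input (source) neuron's activation value
    delta    -- TD error computed by td_error()
    eta      -- learning rate
    """
    strength += eta * delta * x
    return max(lower, min(upper, strength))  # keep strength within the synapse's bounds

# Example: a positive prediction error strengthens an active input.
delta = td_error(reward=1.0, value_now=0.5, value_prev=0.2)  # delta = 1.25
w = update_strength(strength=0.3, x=1.0, delta=delta)        # w = 0.425
```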
Learning Rate: This value controls the rate at which the synapse strength changes. It is denoted by eta in the equations above.
Reward Discount Factor: The value of this parameter should be set between 0 and 1. The smaller the value assigned to this parameter, the less weight is given to future rewards. This parameter is denoted as gamma in the equations above.
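One way to build intuition for this parameter: a reward expected k iterations in the future is effectively weighted by gamma^k, so small values of gamma make the prediction nearly myopic. A quick numeric check with illustrative values:

```python
# Effective weight given to a reward three iterations in the future.
for gamma in (0.9, 0.5, 0.1):
    print(f"gamma={gamma}: weight on reward 3 steps ahead = {gamma ** 3:.3f}")
# gamma=0.9 -> 0.729, gamma=0.5 -> 0.125, gamma=0.1 -> 0.001
```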