STDP, rate-based plasticity and neural noise in evolved agents
Spike-timing-dependent plasticity (STDP) has been a hot research topic in neuroscience over the last 5 years. This mechanism affects synaptic strength asymmetrically, depending on the precise relative timing of pre- and post-synaptic spikes in the millisecond range (left). There is much debate regarding the function of STDP in regulating plasticity at the much longer perceptual and behavioural timescales. Investigating the role of STDP in fully embodied agents can be a way of resolving some of these issues. It is, however, impossible to design such circuits by hand because of the large range of timescales involved. The task is even more complex if we want to investigate the role of neural noise. Fortunately, it may be possible to use evolutionary robotics to synthesize such STDP-based neurocontrollers.
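To make the asymmetry concrete, the following is a minimal sketch of a commonly used exponential STDP window, in which a pre-synaptic spike shortly before a post-synaptic one strengthens the synapse and the reverse order weakens it. The function name, amplitudes and time constants are illustrative assumptions and are not taken from the evolved controllers described here.

```python
import math

# Illustrative parameters (assumptions, not values from the paper).
A_PLUS = 0.01     # maximum potentiation per spike pair
A_MINUS = 0.012   # maximum depression per spike pair
TAU_PLUS = 20.0   # potentiation time constant, in ms
TAU_MINUS = 20.0  # depression time constant, in ms


def stdp_delta_w(t_pre, t_post):
    """Weight change for a single pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:
        # Pre fires before post: potentiation, decaying with the interval.
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:
        # Post fires before pre: depression, decaying with the interval.
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    # Simultaneous spikes: no change in this sketch.
    return 0.0
```

For example, `stdp_delta_w(0.0, 5.0)` is positive and `stdp_delta_w(5.0, 0.0)` is negative, and both shrink in magnitude as the interval between the two spikes grows.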
Comparative studies have been carried out for different kinds of plastic neural networks with low and high levels of neural noise. In all cases, the evolved controllers are highly robust against internal synaptic decay and other perturbations. The importance of the precise timing of spikes is demonstrated by randomizing the spike trains. In the low neural noise scenario, weight values undergo rhythmic changes at the mesoscale due to bursting, but during periods of high activity they are finely regulated at the microscale by synchronous or entrained firing. Spike train randomization results in loss of performance in this case. In contrast, in the high neural noise scenario, robots are robust to loss of information in the timing of the spike trains. This demonstrates the counterintuitive result that plasticity which depends on precise spike timing can work even in its absence, provided the behavioural strategies make use of robust longer-term invariants of sensorimotor interaction. A comparison with a rate-based model of synaptic plasticity shows that, under similarly noisy conditions, asymmetric spike-timing-dependent plasticity achieves better performance by means of an efficient reduction in weight variance over time. Performance also degrades when the level of noise is reduced, showing that random firing has a functional value.
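One simple way to remove timing information from a spike train while keeping its mean firing rate is to redistribute the spikes uniformly at random within an observation window. The sketch below illustrates that idea only; the randomization procedure actually used in the study is not described here, and the function name and arguments are hypothetical.

```python
import random


def randomize_spike_train(spike_times, t_start, t_end, rng=None):
    """Return a spike train with the same number of spikes in
    [t_start, t_end) as the input, placed at uniformly random times.

    The spike count (and hence the mean rate over the window) is
    preserved, but the precise timing of individual spikes is lost.
    """
    rng = rng or random.Random()
    # Count the spikes that fall inside the window.
    n_spikes = sum(1 for t in spike_times if t_start <= t < t_end)
    # Draw new spike times uniformly and return them in temporal order.
    return sorted(rng.uniform(t_start, t_end) for _ in range(n_spikes))
```

Passing a seeded generator, for example `randomize_spike_train(train, 0.0, 100.0, rng=random.Random(1))`, makes the shuffled train reproducible across runs.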
Di Paolo, E. A. (2002). Spike timing dependent plasticity for evolved robots. Adaptive Behavior, 10(3/4), pp. 243-263.
Single-trial learning using STDP
Single-trial learning is studied in an evolved robot model of STDP. Robots must perform positive phototaxis but must learn to perform negative phototaxis in the presence of a short-lived aversive sound stimulus. STDP acts in the millisecond range and depends asymmetrically on the relative timing of pre- and post-synaptic spikes. Although STDP has been used in learning models of input prediction, these models require the iterated presentation of the input pattern, and it is hard to see how this mechanism could sustain single-trial learning over a timescale of tens of seconds. An incremental evolutionary approach is used to answer this question. The evolved robots succeed in learning the appropriate behaviour, yet learning depends not on achieving the right synaptic configuration but on achieving the right pattern of neural activity. Robot performance during positive phototaxis is quite robust to loss of spike-timing information; in contrast, this loss is catastrophic for learning negative phototaxis, where entrained firing is common. Tests show that the final weight configuration carries no information about whether a robot is performing one behaviour or the other. Fixing weights, however, degrades performance, demonstrating that plasticity is used to sustain the neural activity corresponding both to the normal phototaxis condition and to the learned behaviour.
Di Paolo, E. A. (2003). Evolving spike-timing dependent plasticity for single-trial learning in robots. Philosophical Transactions of the Royal Society A, 361, pp. 2299-2319.