Introduction

The application of artificial neural networks (ANNs) to EEG analysis may yield improvements in classification accuracy over more traditional, linear methods, particularly in the case of single-trial analysis. Statistical models require a priori determination of exactly what features of the signal are significant, whereas analysis by ANN demands only a reasonable choice of network architecture. Generalised ANNs also have the potential to perform better than linear methods such as discriminant analysis because of their capacity to implement nonlinear boundaries in the problem space.

A completely connected, feed-forward ANN such as the ones used in this study consists of a set of neuron-like `units' that generate outputs by applying a nonlinear function to a weighted sum of all their inputs. Each of these units belongs to a particular layer of the ANN. It takes inputs from every unit in the layer below it (or from the input vector, in the case of the bottom layer), and delivers its output to every unit in the layer above it (or to the output vector, in the case of the top layer). The weights on all these connections between units of neighbouring layers are initially random, and so the performance of the network begins at a chance level. During the training of the network, input vectors are presented to the bottom layer and data propagate forward to produce an output vector at the top layer. An error measure is generated based on the difference between the desired output vector and the output vector that was actually produced. Back-propagation, a gradient descent method, is used to propagate a correction for this error backward along the connections between units, altering the weights of these connections. Trained networks are then tested on new data that were not part of the training set. The resulting performance measures reflect how well the networks have generalised from their training examples.
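The forward pass and weight update described above can be illustrated with a minimal sketch. It is not the network used in this study: the sigmoid nonlinearity, the squared-error measure, the 2-4-1 layout, the XOR training set, the learning rate, and all function names are assumptions chosen only to keep the example small. It also reports error on the training patterns themselves, so it says nothing about generalisation to new data.

```python
import math
import random


def sigmoid(x):
    """Nonlinear function applied to each unit's weighted input sum (assumed)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_inputs, n_units, rng):
    """Random initial weights: one row per unit, last entry is a bias weight."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]


def forward(layers, x):
    """Propagate an input vector upward; return the activations of every layer."""
    activations = [list(x)]
    for layer in layers:
        below = activations[-1] + [1.0]  # constant input for the bias weight
        activations.append(
            [sigmoid(sum(w * a for w, a in zip(unit, below))) for unit in layer])
    return activations


def train_step(layers, x, target, rate):
    """One back-propagation update (gradient descent on squared error)."""
    activations = forward(layers, x)
    output = activations[-1]
    # Error signal at the top layer: (actual - desired) times the sigmoid slope.
    deltas = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
    for i in range(len(layers) - 1, -1, -1):
        layer, below = layers[i], activations[i]
        if i > 0:
            # Pass the error backward along the (not yet altered) connections.
            next_deltas = [
                sum(unit[j] * d for unit, d in zip(layer, deltas)) * a * (1.0 - a)
                for j, a in enumerate(below)]
        for unit, d in zip(layer, deltas):
            for j, a in enumerate(below + [1.0]):
                unit[j] -= rate * d * a  # alter the weight on this connection
        if i > 0:
            deltas = next_deltas


def mean_squared_error(layers, data):
    """Squared output error, averaged over the patterns in data."""
    return sum(sum((o - t) ** 2 for o, t in zip(forward(layers, x)[-1], target))
               for x, target in data) / len(data)


def train_xor_demo(epochs=3000, rate=0.5, seed=0):
    """Train a 2-4-1 network on XOR; return the error before and after training."""
    rng = random.Random(seed)
    layers = [init_layer(2, 4, rng), init_layer(4, 1, rng)]
    data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
    before = mean_squared_error(layers, data)
    for _ in range(epochs):
        for x, target in data:
            train_step(layers, x, target, rate)
    return before, mean_squared_error(layers, data)
```

Calling `train_xor_demo()` returns the mean squared error with the initial random weights and again after training. XOR is used because no linear boundary separates its two classes, which is the situation in which a layered network with nonlinear units has an advantage over linear methods.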

The standard methodology in the analysis of event-related potentials has been to analyse only the averages of trials under each experimental condition. Given the very low signal-to-noise ratio of event-related potentials, this makes sense. However, averaging removes not only the noise in which the signal of interest is embedded, but also useful information that may not be phase-locked to stimulus onset [Makeig 1993]. Each average must be built from a large number of single trials, and this renders the amount of information extracted from each trial very small.
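The effect of averaging on activity that is not phase-locked can be seen in a small simulation. The sketch below uses entirely synthetic data: the Gaussian-shaped evoked waveform, the sinusoid with a random phase on each trial, the unit-variance noise, and the trial count are all assumptions made for illustration, not properties of any recording discussed here.

```python
import math
import random


def evoked(t):
    """Phase-locked component: the same waveform at the same latency in every trial."""
    return math.exp(-((t - 40) / 10.0) ** 2)


def simulate_trial(rng, n_samples=100):
    """One synthetic trial: evoked response + random-phase oscillation + noise."""
    phase = rng.uniform(0.0, 2.0 * math.pi)  # differs from trial to trial
    return [evoked(t)
            + math.sin(2.0 * math.pi * t / 20.0 + phase)  # not phase-locked
            + rng.gauss(0.0, 1.0)                         # background noise
            for t in range(n_samples)]


def average_trials(trials):
    """Point-by-point average across trials."""
    return [sum(samples) / len(trials) for samples in zip(*trials)]


def rms_residual(waveform):
    """RMS of what remains after subtracting the known evoked component."""
    return math.sqrt(sum((v - evoked(t)) ** 2 for t, v in enumerate(waveform))
                     / len(waveform))


def averaging_demo(n_trials=400, seed=0):
    """Return the residual RMS of one single trial and of the across-trial average."""
    rng = random.Random(seed)
    trials = [simulate_trial(rng) for _ in range(n_trials)]
    return rms_residual(trials[0]), rms_residual(average_trials(trials))
```

In this simulation the residual of the average is far smaller than that of a single trial. The oscillation is present at full amplitude in every trial, yet it cancels in the average along with the noise, because its phase varies from trial to trial.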

Several investigators have explored the application of ANNs to single-trial EEG analysis, with varying degrees of success. Jansen [1990] attempted to automate the recognition of K-complexes using ANNs. In the hope of increasing the proportion of useful information in the input to his networks, he passed the raw EEG through filters tailored to the morphologies of K-complexes and sleep spindles. Nevertheless, the performances of his networks on test sets reached only 55% to 67% correct classification, and Jansen concluded, `The neural net approaches explored in the present study do not seem to be adequate for the detection of K-complexes.' Bankman & al. [1992] also tried detecting K-complexes using ANNs but had to preprocess the signal with a custom-built feature detector in order to achieve reliable (90%) identification with a low rate of false positives (8%). They concluded, `feedforward ANNs cannot be expected to perform K-complex detection based on raw EEG data'.

Gabor and Seyal [1992] trained an ANN to recognise high-voltage spikes in the multiple-channel continuous EEG of epilepsy patients by feeding the network the derivative of the EEG during the rising and falling edges of peaks. Jandó & al. [1993] solved the same detection problem using EEG from epileptic rats, but used raw EEG from a single channel instead of detecting peaks in several channels and computing the derivative. They found that training went faster by an order of magnitude when the input was Fourier-transformed before presentation to the network.

In another use of Fourier-transformed single-trial data, Venturini, Lytton, and Sejnowski [1992] derived reliable, causal estimates of vigilance in sonar operators from power spectra computed over short intervals of continuous EEG. Their analysis of weights revealed that power changes in certain frequency bands are good predictors of attentional state.

Attentional tasks in which the P3 can be used as a marker seem to offer a good chance of success for single-trial analysis. The P3 is a very high-amplitude, long-duration response and often can be distinguished at a glance in plots of single trials. Donchin & al. [1988] used linear analyses of single-trial P3 in a prototype mental prosthesis. They highlighted the rows and columns of a matrix of characters on a computer screen in turn, and compared P3 amplitudes averaged over several repetitions to decide which position was being attended to. Normal subjects were able to attain an output rate of 2.3 characters per minute. Using waveforms from an auditory oddball paradigm averaged over 100 trials as input to an ANN, Wu & al. [1993] achieved 81% correct classification of subjects as either normal or suffering from multiple sclerosis. P3 latency and waveform were cited as major factors in this decision problem, though no analysis of network weights was done to suggest additional or alternative factors.
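The row-and-column decision used in the prototype of Donchin & al. can be sketched roughly as follows. This is an illustration of the general idea only, not their implementation: the function names, the 3-by-3 matrix, and the amplitude values are hypothetical, and choosing the row and column with the largest mean amplitude is an assumed decision rule.

```python
def mean(values):
    return sum(values) / len(values)


def pick_character(matrix, row_amplitudes, column_amplitudes):
    """Choose the character at the row and column with the largest mean P3.

    row_amplitudes[r] holds one P3 amplitude per repetition in which row r
    was highlighted; column_amplitudes[c] holds the same for column c.
    """
    row_means = [mean(a) for a in row_amplitudes]
    column_means = [mean(a) for a in column_amplitudes]
    row = max(range(len(row_means)), key=lambda i: row_means[i])
    column = max(range(len(column_means)), key=lambda i: column_means[i])
    return matrix[row][column]


# Made-up amplitudes (arbitrary units), three repetitions per row and column.
example_matrix = ["ABC", "DEF", "GHI"]
example_rows = [[2.1, 1.8, 2.5], [6.0, 7.2, 5.1], [1.9, 2.2, 2.0]]
example_columns = [[2.0, 2.4, 1.7], [1.5, 2.9, 2.2], [5.8, 6.5, 7.0]]
chosen = pick_character(example_matrix, example_rows, example_columns)  # "F"
```

Averaging over repetitions before comparing trades speed for reliability, which is one reason for the modest output rate reported above.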

Seeing some success in the application of ANNs to single-trial EEG analysis, we undertook to bring this method to our own population of autistic subjects. Infantile autism is a behaviourally diagnosed syndrome involving severe deficiencies in socialisation, language, and cognition [Kanner 1943]. Some autistic people are mute or have difficulty using spoken language purposively. Autistic people seem to maintain unusually narrow foci of attention [Townsend & Courchesne 1994], and have difficulty shifting their attention rapidly [Courchesne & al. 1994]. However, when focussed on a task that requires only maintenance of attention, autistic people score similarly to normal controls, despite attenuation and unpredictability of attention-related evoked potentials such as the auditory P3 and the P700 [Courchesne & al. 1990, 1994]. This raises the question, have autistic people developed alternative, compensatory cognitive strategies for responding to attention-getting stimuli, and if so, are these strategies reflected in some other features of the evoked potential? ANN analyses may shed light on this question.

Also, in cases of impairment in spoken language, successful single-trial analysis using ANNs may pave the way toward a mental prosthesis that would allow people with autism to communicate without much resort to human facilitators or the controversial technique of facilitated communication. The prototype of Donchin & al. was a first step in this direction.

Methods