This document is a roughly HTML-ised version of a talk given at the NSYN meeting in Edinburgh, Scotland, on 28 February 1996. Please email me comments, but remember that this is just the slides from an introductory talk!
Why would anyone want a `new' sort of computer?
Some algorithms and architectures.
What new applications are likely?
Some useful sources of information.
What are (everyday) computer systems good at... and not so good at?
Good at | Not so good at |
---|---|
Fast arithmetic | Interacting with noisy data or data from the environment |
Doing | Massive parallelism |
 | Fault tolerance |
 | Adapting to circumstances |
Where can neural network systems help?
Neural networks are a form of multiprocessor computer system, with
A biological neuron may have as many as 10,000 different inputs, and may send its output (the presence or absence of a short-duration spike) to many other neurons. Neurons are wired up in a 3-dimensional pattern.
Real brains, however, are orders of magnitude more complex than any artificial neural network so far considered.
Example: A simple single unit adaptive network:
The network has 2 inputs and one output. All are binary. The output is
1 if $W_0 I_0 + W_1 I_1 + W_b > 0$
0 if $W_0 I_0 + W_1 I_1 + W_b \le 0$
We want it to learn simple OR: output a 1 if either $I_0$ or $I_1$ is 1.
The network adapts as follows: change the weight by an amount proportional to the difference between the desired output and the actual output.
As an equation:
$\Delta W_i = \eta (D - Y) I_i$
where $\eta$ is the learning rate, D is the desired output, and Y is the actual output.
This is called the Perceptron Learning Rule, and goes back to the early 1960s.
We expose the net to the patterns:
$I_0$ | $I_1$ | Desired output |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
We train the network on these examples.
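The training procedure can be sketched in a few lines of Python. This is only an illustrative sketch: the zero initial weights, the learning rate of 0.25, and the treatment of the bias weight as a weight on a constant input of 1 are assumptions, not values taken from the talk, so the number of epochs it needs may differ from the figure quoted in the talk.

```python
# Minimal sketch of the single-unit network and the Perceptron Learning Rule.
# ASSUMPTIONS (not from the talk): zero initial weights, eta = 0.25, and a
# bias input fixed at 1 so that W_b is trained like the other weights.

OR_PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def output(weights, inputs):
    """Return 1 if W_0*I_0 + W_1*I_1 + W_b > 0, else 0."""
    w0, w1, wb = weights
    i0, i1 = inputs
    return 1 if w0 * i0 + w1 * i1 + wb > 0 else 0


def train_epoch(weights, patterns, eta):
    """Apply Delta W_i = eta * (D - Y) * I_i once for every pattern."""
    w = list(weights)
    errors = 0
    for inputs, desired in patterns:
        delta = desired - output(w, inputs)
        if delta != 0:
            errors += 1
        w[0] += eta * delta * inputs[0]
        w[1] += eta * delta * inputs[1]
        w[2] += eta * delta * 1  # the bias input is always 1
    return w, errors


def train(patterns, eta=0.25, max_epochs=100):
    """Train until an epoch produces no errors, or max_epochs is reached."""
    weights = [0.0, 0.0, 0.0]
    for epoch in range(1, max_epochs + 1):
        weights, errors = train_epoch(weights, patterns, eta)
        print(f"epoch {epoch}: weights = {weights}, errors = {errors}")
        if errors == 0:  # (D - Y) = 0 for every pattern, so nothing changes
            break
    return weights


if __name__ == "__main__":
    train(OR_PATTERNS)
```

With these assumed settings the loop stops once an epoch passes with no errors, which is the same stopping condition as in the talk: the weights no longer change.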
Weights after each epoch (exposure to complete set of patterns)
At this point (8) the network has finished learning. Since (D-Y)=0 for all patterns, the weights cease adapting.
Single perceptrons are limited in what they can learn:
If we have two inputs, the decision surface is a line. Setting $W_0 I_0 + W_1 I_1 + W_b = 0$ and solving for $I_1$ gives its equation:
$I_1 = -(W_0/W_1) I_0 - (W_b/W_1)$
In general, they implement a simple hyperplane decision surface.
This restricts the possible mappings available.
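One way to see the restriction is to search for weights that reproduce a given truth table. The sketch below tries a coarse grid of weights for a single threshold unit. XOR is used as the counter-example; it is not mentioned in the talk but is the standard case that no single line can separate. The grid range and step are arbitrary assumptions, and a grid search is a demonstration, not a proof.

```python
# Sketch: look for (W_0, W_1, W_b) that make a single threshold unit
# reproduce a truth table. Grid range and step are illustrative assumptions.
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
OR_TARGETS = [0, 1, 1, 1]
XOR_TARGETS = [0, 1, 1, 0]  # not in the talk; standard non-separable example


def output(w0, w1, wb, i0, i1):
    """Threshold unit: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if w0 * i0 + w1 * i1 + wb > 0 else 0


def find_weights(targets, steps=range(-8, 9), scale=0.25):
    """Return the first grid point that matches all targets, or None."""
    grid = [s * scale for s in steps]
    for w0, w1, wb in product(grid, repeat=3):
        if all(output(w0, w1, wb, i0, i1) == t
               for (i0, i1), t in zip(INPUTS, targets)):
            return (w0, w1, wb)
    return None


if __name__ == "__main__":
    print("OR :", find_weights(OR_TARGETS))
    print("XOR:", find_weights(XOR_TARGETS))
```

The search finds weights for OR but returns `None` for XOR: no single hyperplane puts (0,1) and (1,0) on one side and (0,0) and (1,1) on the other.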
Back-Propagated Delta Rule Networks (BP) (sometimes known as multi-layer perceptrons (MLPs)) and Radial Basis Function Networks (RBF) are both well-known developments of the Delta rule for single layer networks (itself a development of the Perceptron Learning Rule). Both can learn arbitrary mappings or classifications. Further, the inputs (and outputs) can have real values.
BP is a development from the simple Delta rule in which extra hidden layers (layers additional to the input and output layers, not connected externally) are added. The network topology is constrained to be feedforward, i.e. loop-free: generally connections are allowed from the input layer to the first (and possibly only) hidden layer; from the first hidden layer to the second, ..., and from the last hidden layer to the output layer.
The hidden layer learns to recode (or to provide a representation for) the inputs. More than one hidden layer can be used.
The architecture is more powerful than single-layer networks: it can be shown that any mapping can be learned, given two hidden layers (of units).
The units are a little more complex than those in the original perceptron: their input/output graph is
As a function:
$Y = 1 / (1 + \exp(-k \sum W_{in} X_{in}))$
The graph shows the output for k=0.5, 1, and 10, as the activation varies from -10 to 10.
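The unit function is easy to tabulate for the three values of k mentioned. The sketch below is illustrative only; the function name `unit_output` and the sample activations are assumptions.

```python
import math


def unit_output(weights, inputs, k=1.0):
    """Y = 1 / (1 + exp(-k * sum of W_in * X_in))."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-k * activation))


if __name__ == "__main__":
    # One input with weight 1, so the activation equals the input value.
    for k in (0.5, 1, 10):
        row = [round(unit_output([1.0], [a], k), 3) for a in (-10, -1, 0, 1, 10)]
        print(f"k={k}: {row}")
```

Larger k makes the curve steeper around zero activation, approaching the hard threshold of the original perceptron; smaller k gives a gentler slope.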
The weight change rule is a development of the perceptron learning rule. Weights are changed by an amount proportional to the error at that unit times the output of the unit feeding into the weight.
Running the network consists of
For each data pair to be learned, a forward pass and a backward pass are performed. This is repeated over and over again until the error is at a low enough level (or we give up).
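The forward pass, backward pass, and repeat-until-low-error loop can be sketched as follows for a network with one hidden layer and a single output unit. This is a minimal sketch under stated assumptions, not the talk's implementation: the network size, random initial weights, learning rate, the squared-error measure, and the stopping threshold are all assumptions.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def make_network(n_in, n_hidden, seed=0):
    """Small random weights; the last weight of each unit is its bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
              for _ in range(n_hidden)]
    out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, out


def forward(net, x):
    """Forward pass: return the hidden-unit outputs and the network output."""
    hidden, out = net
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(unit, xb))) for unit in hidden]
    y = sigmoid(sum(w * v for w, v in zip(out, h + [1.0])))
    return h, y


def train_pattern(net, x, d, eta=0.5):
    """One forward pass and one backward pass for a single data pair."""
    hidden, out = net
    h, y = forward(net, x)
    # Error term at the output unit (includes the sigmoid slope y * (1 - y)).
    delta_out = (d - y) * y * (1.0 - y)
    # Back-propagate the output error through the (old) output weights.
    delta_h = [delta_out * out[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    # Weight change = eta * (error at unit) * (output of the unit feeding it).
    for j, v in enumerate(h + [1.0]):
        out[j] += eta * delta_out * v
    for j, unit in enumerate(hidden):
        for i, v in enumerate(list(x) + [1.0]):
            unit[i] += eta * delta_h[j] * v
    return 0.5 * (d - y) ** 2  # error measured before the update


def train(net, patterns, eta=0.5, target_error=0.01, max_epochs=5000):
    """Repeat over all pairs until the summed error is low, or give up."""
    total = float("inf")
    for epoch in range(1, max_epochs + 1):
        total = sum(train_pattern(net, x, d, eta) for x, d in patterns)
        if total < target_error:
            return epoch, total
    return max_epochs, total


if __name__ == "__main__":
    or_patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    net = make_network(n_in=2, n_hidden=2)
    print(train(net, or_patterns))
```

The hidden-unit error terms are computed from the output weights as they were before the update, so both layers are adjusted against the same forward pass.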
Radial basis function networks are also feedforward, but have only one hidden layer.
Like BP, RBF nets can learn arbitrary mappings: the primary difference is in the hidden layer.
RBF hidden layer units have a receptive field which has a centre: that is, a particular input value at which they have a maximal output. Their output tails off as the input moves away from this point.
Generally, the hidden unit function is a Gaussian:
Gaussians with three different standard deviations.
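A Gaussian hidden unit can be sketched as below. The exact form used in the talk is not given in the text, so the unnormalised form exp(-d²/(2·SD²)), with a peak of 1 at the centre, is an assumption, as are the function and parameter names.

```python
import math


def rbf_unit(x, centre, sd):
    """Gaussian receptive field: 1.0 at the centre, falling off with distance."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-dist_sq / (2.0 * sd ** 2))


if __name__ == "__main__":
    # Output at increasing distance from the centre, for three SDs.
    for sd in (0.5, 1.0, 2.0):
        print(sd, [round(rbf_unit([d], [0.0], sd), 3) for d in (0, 1, 2, 3)])
```

A larger standard deviation gives a wider receptive field: the unit still responds noticeably to inputs well away from its centre.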
RBF networks are trained by
Generally, the centres and SDs are decided on first by examining the vectors in the training data. The output layer weights are then trained using the Delta rule. BP is the most widely applied neural network technique. RBFs are gaining in popularity.
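The two-stage RBF training described above (fix the centres and SDs from the training vectors, then train the output weights with the Delta rule) can be sketched as follows. The choices here are assumptions made for illustration: one centre per training vector, a single shared SD of 0.5, a linear output unit without a bias, and XOR as the example mapping.

```python
import math


def rbf_unit(x, centre, sd):
    """Gaussian hidden unit with its maximum output at the centre."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-dist_sq / (2.0 * sd ** 2))


def hidden_outputs(x, centres, sd):
    return [rbf_unit(x, c, sd) for c in centres]


def train_rbf(patterns, sd=0.5, eta=0.5, epochs=200):
    # Stage 1: decide the centres by examining the training vectors.
    # ASSUMPTION: one centre per training vector, one shared SD.
    centres = [x for x, _ in patterns]
    weights = [0.0] * len(centres)
    # Stage 2: train the output layer weights with the Delta rule.
    for _ in range(epochs):
        for x, d in patterns:
            phi = hidden_outputs(x, centres, sd)
            y = sum(w * p for w, p in zip(weights, phi))
            for j, p in enumerate(phi):
                weights[j] += eta * (d - y) * p
    return centres, weights


def predict(x, centres, weights, sd=0.5):
    phi = hidden_outputs(x, centres, sd)
    return sum(w * p for w, p in zip(weights, phi))


if __name__ == "__main__":
    xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    centres, weights = train_rbf(xor)
    for x, d in xor:
        print(x, d, round(predict(x, centres, weights), 3))
```

Because the hidden layer is fixed before the weights are trained, only a single layer of linear weights has to be learned, which is why the simple Delta rule is enough here.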
Nets can be
RBFs have the advantage that one can add extra units with centres near parts of the input which are difficult to classify. Both BP and RBFs can also be used for processing time-varying data: one can consider a window on the data:
Networks of this form (finite-impulse response) have been used in many applications.
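The windowing idea can be sketched as below: each window of consecutive samples becomes one input vector for a feedforward net. Pairing each window with the next sample as the desired output is an assumption made for illustration (a one-step-ahead prediction task); the talk only describes the window itself.

```python
def windows(series, width):
    """Return (window, next_value) pairs.

    Each window holds `width` consecutive samples and is one input vector;
    the sample that follows it is used as the desired output.
    """
    return [(series[i:i + width], series[i + width])
            for i in range(len(series) - width)]


if __name__ == "__main__":
    for window, target in windows([1, 2, 3, 4, 5], 3):
        print(window, "->", target)
```

The window width fixes how much of the past the network can see, which is what makes the resulting network finite-impulse response.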
There are also networks whose architectures are specialised for processing time-series.
Simple Perceptrons, BP, and RBF networks need a teacher to tell the network what the desired output should be. These are supervised networks.
In an unsupervised net, the network adapts purely in response to its inputs. Such networks can learn to pick out structure in their input.
Although learning in these nets can be slow, running the trained net is very fast - even on a computer simulation of a neural net.
- takes a high-dimensional input and clusters it, while retaining some topological ordering in the output.
After training, an input will cause the output units in some area to become active.
Such clustering (and dimensionality reduction) is very useful as a preprocessing stage, whether for further neural network data processing, or for more traditional techniques.
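An unsupervised clustering net with topologically ordered outputs can be sketched as below. The sketch follows the general scheme of a Kohonen-style self-organising map with the output units arranged along a line; identifying the network described above with that scheme is an assumption, as are the Gaussian neighbourhood, the linear decay schedule, and all parameter values.

```python
import math
import random


def best_unit(units, x):
    """Index of the output unit whose weight vector is closest to x."""
    return min(range(len(units)),
               key=lambda j: sum((w - xi) ** 2 for w, xi in zip(units[j], x)))


def update(units, x, rate, radius):
    """Move the winning unit and its neighbours on the line towards x."""
    winner = best_unit(units, x)
    for j, unit in enumerate(units):
        # Neighbourhood strength falls off with distance along the line.
        strength = math.exp(-((j - winner) ** 2) / (2.0 * radius ** 2))
        for i in range(len(unit)):
            unit[i] += rate * strength * (x[i] - unit[i])
    return winner


def train_map(data, n_units, epochs=50, rate=0.5, radius=2.0, seed=0):
    """Adapt purely in response to the inputs; no desired outputs are used."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        # ASSUMPTION: learning rate and radius decay linearly over training.
        frac = 1.0 - epoch / epochs
        for x in data:
            update(units, x, rate * frac, max(radius * frac, 0.5))
    return units


if __name__ == "__main__":
    data = [[0.1, 0.1, 0.2], [0.2, 0.1, 0.1], [0.9, 0.8, 0.9], [0.8, 0.9, 0.9]]
    units = train_map(data, n_units=5)
    print([best_unit(units, x) for x in data])
```

Because neighbouring units are pulled towards the same inputs, similar inputs tend to activate nearby output units, which is the topological ordering referred to above. Running the trained map is a single nearest-unit lookup.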
..... or are they just a solution in search of a problem?
Neural networks cannot do anything that cannot be done using traditional computing techniques, BUT they can do some things which would otherwise be very difficult.
In particular, they can form a model from their training data (or possibly input data) alone.
This is particularly useful with sensory data, or with data from a complex (e.g. chemical, manufacturing, or commercial) process. There may be an algorithm, but it is not known, or has too many variables. It is easier to let the network learn from examples.
The Neural Computing Applications Forum runs meetings (with attendees from industry, commerce and academe) on applications of Neural Networks. Contact NCAF through Dr. Tom Harris, (44) 1784 477271.
Internet addresses: NeuroNet at Kings College, London. http://www.neuronet.ph.kcl.ac.uk/
DTI NeuroComputing Web http://www.globalweb.co.uk/nctt/
IEEE Neural Networks Council http://www.ieee.org/nnc/index.html
CRC NCRC Institute for Information Technology Artificial Intelligence subject index has a useful entry on Neural Networks.
Backpropagater's review http://www.mcs.com/~drt/bprefs.html
News comp.ai.neural-nets has a very useful set of frequently asked questions (FAQs), available as a WWW document at: ftp://ftp.sas.com/pub/neural/FAQ.html
Some further information about applications can be found at the Stimulation Initiative for European Neural Applications (SIENA) pages, and there is also an interesting page about applications.
For more information on Neural Networks in the Process Industries, try A. Bulsari's home page.
The company BrainMaker has a nice list of references on applications.
The best journal for application-oriented information is
Neural Computing and Applications, Springer-Verlag. (address: Sweetapple Ho, Catteshall Rd., Godalming, GU7 3DJ)
There are a lot of books on Neural Computing. See the FAQ above for a much longer list.
For a not-too-mathematical introduction, try
Fausett L., Fundamentals of Neural Networks, Prentice-Hall, 1994. ISBN 0 13 042250 9 or
Gurney K., An Introduction to Neural Networks, UCL Press, 1997, ISBN 1 85728 503 4
A great deal of research is going on in neural networks worldwide.
This ranges from basic research into new and more efficient learning algorithms, to networks which can respond to temporally varying patterns (both ongoing at Stirling), to techniques for implementing neural networks directly in silicon. One chip is already commercially available, but it does not include adaptation. Edinburgh University have implemented a neural network chip, and are working on the learning problem.
Production of a learning chip would allow the application of this technology to a whole range of problems where the price of a PC and software cannot be justified.
There is particular interest in sensory and sensing applications: nets which learn to interpret real-world sensors and learn about their environment.