A neural network is, in essence, an attempt to simulate the brain. Neural network theory revolves around the idea that certain key properties of biological neurons can be extracted and applied to simulations, thus creating a simulated (and very much simplified) brain. The first important thing to understand, then, is that the components of an artificial neural network are an attempt to recreate the computing potential of the brain. The second important thing to understand, however, is that no one has ever claimed to simulate anything as complex as an actual brain. Whereas the human brain is estimated to have something on the order of ten to a hundred billion neurons, a typical artificial neural network (ANN) of the kind discussed here is not likely to have more than 1,000 artificial neurons.
Before discussing the specifics of artificial neural nets, though, let us examine what makes real neural nets - brains - function the way they do. Perhaps the single most important concept in neural net research is the idea of connection strength. Neuroscience has given us good evidence for the idea that connection strengths - that is, how strongly one neuron influences those neurons connected to it - are the real information holders in the brain. Learning, repetition of a task, even exposure to a new or continuing stimulus can cause the brain's connection strengths to change, with some synaptic connections being reinforced and new ones created, while others weaken or in some cases disappear altogether.
The second essential element of neural connectivity is the excitation/inhibition distinction. In human brains, each neuron is either excitatory or inhibitory, which is to say that its activation will either increase or decrease the firing rates of connected neurons, respectively. The amount of excitation or inhibition produced is, of course, dependent on the connection strength - a stronger connection means more inhibition or excitation, a weaker connection means less.
The third important component in determining a neuron's response is called the transfer function. Without getting into more technical detail, the transfer function describes how a neuron's firing rate varies with the input it receives. A very sensitive neuron may fire with very little input, for example. A neuron may have a threshold, firing rarely below that threshold and vigorously above it. A neuron may have a bell-curve style firing pattern, increasing its firing rate up to a maximum, and then levelling off or decreasing when over-stimulated. A neuron may sum its inputs, or average them, or do something entirely more complicated. Each of these behaviours can be represented mathematically, and that representation is called the transfer function.
It is often convenient to forget the transfer function and think of neurons as simple addition machines: more activity in equals more activity out. This is not really accurate, though, and to develop a good understanding of an artificial neural network, the transfer function must be taken into account.
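The firing patterns described above can be written down as small functions. What follows is only a rough sketch: the function names, the default threshold of 0.5, and the bell-curve parameters are assumptions chosen for illustration, not values taken from neuroscience or from any particular neural net package.

```python
import math

def threshold(total_input, theta=0.5):
    """All-or-nothing: silent below the threshold, firing fully at or above it."""
    return 1.0 if total_input >= theta else 0.0

def sigmoid(total_input):
    """A smooth threshold: output rises from 0 towards 1 and then levels off."""
    return 1.0 / (1.0 + math.exp(-total_input))

def bell_curve(total_input, preferred=1.0, width=0.5):
    """Strongest response at a preferred input, weaker when under- or over-stimulated."""
    return math.exp(-((total_input - preferred) ** 2) / (2 * width ** 2))

def linear(total_input):
    """The 'simple addition machine': more activity in, more activity out."""
    return total_input

# Compare how each transfer function responds to the same inputs.
for x in (0.0, 0.5, 1.0, 2.0):
    print(x, threshold(x), round(sigmoid(x), 3), round(bell_curve(x), 3), linear(x))
```

Running the loop makes the point of the paragraph above visible: the same input produces quite different outputs depending on which transfer function the neuron uses.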
Armed with these three concepts (connection strength, inhibition/excitation, and the transfer function), we can now look at how artificial neural nets are constructed. In theory, an artificial neuron (often called a 'node') captures all the important elements of a biological one. Nodes are connected to each other, and the strength of each connection is normally given a numeric value ranging from -1.0 for maximum inhibition to +1.0 for maximum excitation. All values between the two are acceptable, with higher magnitude values indicating stronger connection strength. The transfer function in artificial neurons, whether in a computer simulation or in actual microchips wired together, is typically built right into the nodes' design.
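To make the idea of a node concrete, here is a minimal sketch of a single artificial neuron. The name `node_output`, the choice of a sigmoid as the built-in transfer function, and the example numbers are all assumptions made for illustration; real systems differ in their details.

```python
import math

def sigmoid(x):
    """Assumed transfer function: a smooth threshold between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(activations, strengths, transfer=sigmoid):
    """Weight each incoming activation by its connection strength, sum, then apply the transfer function."""
    for s in strengths:
        if not -1.0 <= s <= 1.0:
            raise ValueError("connection strengths are assumed to lie between -1.0 and +1.0")
    net_input = sum(a * s for a, s in zip(activations, strengths))
    return transfer(net_input)

# Three incoming connections: strongly excitatory, inhibitory, weakly excitatory.
incoming = [0.9, 0.4, 0.7]
strengths = [0.8, -0.5, 0.1]
print(node_output(incoming, strengths))
```

Note that the sign of each strength carries the inhibition/excitation distinction, and its magnitude carries the connection strength, so one number per connection covers both concepts.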
Perhaps the most significant difference between artificial and biological neural nets is their organization. While many types of artificial neural nets exist, most are organized according to the same basic structure (see diagram). There are three components to this organization: a set of input nodes, one or more layers of 'hidden' nodes, and a set of output nodes. The input nodes take in information, and are akin to sensory organs. Whether the information is in the form of a digitised picture, a series of stock values, or just about any other form that can be numerically expressed, this is where the net gets its initial data. The information is supplied as activation values: each node is given a number, with higher numbers representing greater activation. This is just like human neurons, except that rather than conveying their activation level by firing more frequently, as biological neurons do, artificial neurons indicate activation by passing this activation value to connected nodes.
After the input nodes receive this initial activation, it is passed through the network. Connection strengths, inhibition/excitation conditions, and transfer functions determine how much of the activation value is passed on to the next node. Each node sums the activation values it receives, arrives at its own activation value (after modifying its activation level according to its transfer function), and then passes that along to the next nodes in the network. Thus the activation flows through the net in one direction, from the input nodes, through the hidden layers, until eventually the output nodes are activated. If a network is properly trained, this output should reflect the input in some meaningful way. For instance, a gender recognition net might be presented with a picture of a man or woman at its input nodes and be required to set an output node to 0.0 if the picture depicts a man, or 1.0 for a woman.
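The flow of activation from input nodes to output nodes can be sketched in a few lines. This is an illustrative toy, not a trained network: the layer sizes, the hand-picked connection strengths, the three-number 'picture', and the use of a sigmoid transfer function are all assumptions.

```python
import math

def sigmoid(x):
    """Assumed transfer function for every node."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(activations, strengths):
    """strengths[j][i] is the connection from node i in the previous layer to node j in this one."""
    return [sigmoid(sum(a * w for a, w in zip(activations, row))) for row in strengths]

def forward(inputs, layers):
    """Pass activation in one direction: input nodes, then each later layer in turn."""
    activations = inputs
    for strengths in layers:
        activations = layer_forward(activations, strengths)
    return activations

# Hypothetical net: 3 input nodes, 2 hidden nodes, 1 output node.
hidden = [[0.5, -0.3, 0.8], [-0.6, 0.9, 0.1]]
output = [[0.7, -0.4]]
pixels = [0.2, 0.9, 0.5]  # stand-in for a digitised picture
result = forward(pixels, [hidden, output])
print(result)  # a single activation value between 0.0 and 1.0
```

With untrained, arbitrary strengths like these, the output value means nothing yet; it only acquires meaning once the strengths have been adjusted through learning.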
In this way, the network communicates its knowledge to the outside world.
How They Learn
Having explained that connection strengths are storehouses of knowledge in neural net architectures, it should come as no surprise that learning in neural nets is primarily a process of adjusting connection strengths. In neural nets of the type described so far, the most popular method of learning is called back-propagation. To begin, the network is initialised: all the connection strengths are set randomly, and the network sits as a blank slate. The network is then presented with some information. Let us suppose that we are designing the "gender detector" mentioned earlier, and that the input nodes are receiving a digitised version of a photograph. The activation flows through the net (albeit haphazardly, since we have not yet set the connection strengths to anything but random values), and eventually the output node registers an activation level. However, since the net has not yet been trained, its responses will initially be random. This is where back-propagation steps in. The net's response is compared with the correct response for that picture (i.e. 0.0 for male, 1.0 for female). Then, working backwards from the output node, each connection strength is adjusted so that the next time the net is shown that picture, its answer will be closer to the desired one. (The process by which each connection strength is adjusted involves mathematics more complicated than this course requires. Students who are interested will find that some of the papers provided at the bottom of this chapter discuss these methods in more detail.)
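For readers who would like a glimpse of the arithmetic anyway, the following is a minimal sketch of back-propagation on a tiny net with one hidden layer. It rests on several assumptions that the text above does not specify: sigmoid transfer functions, a squared-error comparison between output and correct answer, a learning rate of 0.5, no bias terms, and a three-number stand-in for the photograph. For simplicity the strengths are not kept within the -1.0 to +1.0 range after adjustment.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_net(n_inputs, n_hidden, rng):
    """The 'blank slate': every connection strength starts at a random value."""
    return {
        "hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "output": [rng.uniform(-1, 1) for _ in range(n_hidden)],
    }

def forward(net, inputs):
    """Activation flows from the inputs, through the hidden nodes, to one output node."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in net["hidden"]]
    out = sigmoid(sum(w * h for w, h in zip(net["output"], hidden)))
    return hidden, out

def backprop_step(net, inputs, target, rate=0.5):
    """Forward pass, compare with the correct answer, then adjust strengths working backwards."""
    hidden, out = forward(net, inputs)
    # Error signal at the output node (squared error, sigmoid slope).
    delta_out = (out - target) * out * (1.0 - out)
    # Error signal for each hidden node, passed back along its output connection.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(net["output"], hidden)]
    # Adjust the output connections first, then the input-to-hidden connections.
    net["output"] = [w - rate * delta_out * h for w, h in zip(net["output"], hidden)]
    net["hidden"] = [[w - rate * d * x for w, x in zip(row, inputs)]
                     for row, d in zip(net["hidden"], delta_hidden)]
    return (out - target) ** 2  # squared error measured before this adjustment

rng = random.Random(0)            # fixed seed so the run is repeatable
net = init_net(n_inputs=3, n_hidden=2, rng=rng)
picture = [0.2, 0.9, 0.5]         # stand-in for a digitised photograph
errors = [backprop_step(net, picture, target=1.0) for _ in range(50)]
print(errors[0], errors[-1])      # error on the first and the last presentation
```

Here the same picture is shown fifty times, and the error shrinks with each presentation, which is exactly the sense in which the net's answer gets "closer to the desired one". A real training run would cycle through many different pictures.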
This whole process (input, processing, comparing the output with the correct answer, and adjusting connection strengths) is called one 'back-propagation cycle', or often just one 'iteration'. The net is then presented with another picture, its answer is compared with the correct answer, and the connection strengths are adjusted where needed. Training can often take hundreds or thousands of iterations. Eventually, the net should become fairly proficient at identifying males and females. There is always a risk, however, that the net has not learned to discriminate males from females, but rather has effectively memorized the response for each picture. To test for this, the pictures (or whatever input is being used) should be divided into two groups: the training set and the transfer set. The training set is used during back-propagation cycles, and the transfer set is used once learning is complete. If the net performs as well on the novel transfer stimuli as it did on the training set, then we conclude that learning has occurred.
Successes and Failures
It's fine in theory to talk about neural nets that tell males from females, but if that were all they were useful for, they would be a sad project indeed. In fact, neural nets have been enjoying growing success in a number of fields, and, significantly, their successes tend to be in fields that posed large difficulties for symbolic AI. Neural networks are, by design, pattern processors - they can identify trends and important features, even in relatively complex information. What's more, they can work with less-than-perfect information, such as blurry or static-filled pictures, which has been an insurmountable difficulty for symbolic AI systems. Discerning patterns allows neural nets to read handwriting, detect potential sites for new mining and oil extraction, make stock market predictions, and even learn to drive.
Interestingly, neural nets seem to be good at the same things we are, and struggle with the same things we struggle with. Symbolic AI is very good at producing machines that play grandmaster-level chess, that deduce logic theorems, and that compute complex mathematical functions. But symbolic AI has enormous difficulty with things like processing a visual scene (discussed in a later chapter), dealing with noisy or imperfect data, and adapting to change. Neural nets are almost the exact reverse - their strength lies in the complex, fault-tolerant, parallel processing involved in vision, and their weaknesses are in formal reasoning and rule-following. Although humans are capable of both forms of intellectual functioning, it is generally thought that humans possess exceptional pattern recognition ability. In contrast, the limited capacity of human information processing systems often makes us less-than-perfect in tasks requiring abstract reasoning and logic.
Critics charge that a neural net's inability to learn something like logic, which has distinct and unbreakable rules, proves that neural nets cannot be an explanation of how the mind works. Neural net advocates have countered that a large part of the problem is that abstract rule-following ability requires many more nodes than current artificial neural nets implement. Some attempts are now being made at producing larger networks, but the computational load increases dramatically as nodes are added, making larger networks very difficult to build. Another set of critics charges that neural nets are too simplistic to be considered accurate models of human brain function. While artificial neural networks do contain some neuron-like attributes (connection strengths, inhibition/excitation, etc.), they overlook many other factors which may be significant to the brain's functioning. The nervous system uses many different neurotransmitters, for instance, and artificial neural nets do not account for those differences. Different neurons have different conduction velocities, different energy supplies, even different spatial locations, which may be significant. Moreover, brains do not start as a jumbled, randomised set of connection strengths; there is a great deal of organization present even during fetal development. Any or all of these factors may be essential to the functioning of the brain, and without their inclusion in artificial neural network models, it is possible that the models end up oversimplified.
One of the fundamental objections that has been raised against back-propagation style networks like the ones discussed here is that humans seem to learn even in the absence of an explicit 'teacher' who corrects our outputs and models the response. For neural networks to succeed as a model of cognition, it is imperative that they produce a more biologically (or psychologically) plausible simulation of learning. In fact, research is being conducted with a new type of neural net, known as an 'Unsupervised Neural Net', which appears to learn successfully in the absence of an external teacher.
Introductory Papers
These papers have been assembled to provide students with a more developed understanding of neural net architectures and the current issues in research. The first three articles should be read by all students; they are clear and well explained, and although they avoid some of the more technical detail, they provide an excellent beginner's understanding of the field. The remaining three are included for students interested in more technical details. Students interested in neural networks are encouraged to read them for a more thorough understanding, but they introduce some calculus and computer science that is beyond the scope of this course.