Neural Network Benchmarks
The CMU repository contains many types of data. For convenience, we
have roughly divided these data sets into the following categories:
- I/O Mappings: data sets that map input data vectors directly to output
data vectors. The output is completely determined by the local input vector.
- Temporal I/O Mappings: I/O mappings in which the correct output vector may
be partially determined by non-local input vectors.
- Time Series: problems that take points along a single series of data and
use that information to predict an element of the series at some point in
the future.
Please read the benchmark descriptions carefully and report your results using
the same methods as those described for the benchmark in question. If you have
any questions concerning how to properly use a benchmark, please do not
hesitate to contact us at neural-bench@cs.cmu.edu. It is vitally
important that new results be reported in a consistent manner so that
comparisons are not made between dissimilar methods.
Download a gzip'ed tar file of all the benchmarks (303.5k).
The task is to train a network to produce the sum, mod 2, of N binary inputs
-- otherwise known as computing the odd parity function. See also the
XOR benchmark, which is the 2-input case of
parity.
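As a minimal sketch of the target function only (not the repository's own data files or reporting method), the full N-input parity table can be enumerated as follows. The function name `parity_dataset` is an illustrative choice, not something defined by the benchmark.

```python
from itertools import product


def parity_dataset(n):
    """Return (inputs, targets) covering all 2**n binary input patterns.

    Each target is the sum of the inputs mod 2 (odd parity).
    """
    inputs = [list(bits) for bits in product((0, 1), repeat=n)]
    targets = [sum(bits) % 2 for bits in inputs]
    return inputs, targets


if __name__ == "__main__":
    inputs, targets = parity_dataset(3)
    for x, y in zip(inputs, targets):
        print(x, "->", y)
```

With `n = 2` the table reduces to the four XOR patterns.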
This is the data set used by Gorman and Sejnowski in their study of the
classification of sonar signals using a neural network. The task is to train
a network to discriminate between sonar signals bounced off a metal cylinder
and those bounced off a roughly cylindrical rock.
The task is to learn to discriminate between two sets of training points which
lie on two distinct spirals in the x-y plane. These spirals coil three times
around the origin and around one another. This appears to be a very difficult
task for back-propagation
networks and their relatives. Problems like this one, whose inputs are points on the
2-D plane, are interesting because we can display the 2-D receptive field of any unit
in the network.
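The geometry described above can be sketched as two interleaved spirals, one being the other rotated by 180 degrees about the origin. The point count, maximum radius, and linear radius schedule below are assumptions chosen for illustration; the benchmark description should be consulted for the exact training set.

```python
import math


def two_spirals(points_per_spiral=97, turns=3.0, max_radius=6.5):
    """Return a list of (x, y, label) samples lying on two spirals.

    All default parameter values are illustrative assumptions.
    """
    samples = []
    for i in range(points_per_spiral):
        # The angle sweeps `turns` full revolutions over the spiral.
        angle = turns * 2.0 * math.pi * i / (points_per_spiral - 1)
        # The radius grows linearly and is never zero, so the two
        # spirals do not share a point at the origin.
        radius = max_radius * (i + 1) / points_per_spiral
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        samples.append((x, y, 1))    # first spiral
        samples.append((-x, -y, 0))  # second spiral: reflected through the origin
    return samples


if __name__ == "__main__":
    data = two_spirals()
    print(len(data), "points; first two:", data[:2])
```

Because every input is a point in the plane, a trained network's output can be evaluated on a grid of (x, y) values and plotted directly against these samples.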
Speaker-independent recognition of the eleven steady-state vowels of British
English, using a specified training set of LPC-derived log area ratios.
The task is to train a network to produce the boolean exclusive-or function of
two variables. This is perhaps the simplest learning problem that is not
linearly separable. It therefore cannot be performed by a perceptron-like
network with only a single layer of trainable weights. In its various forms,
XOR has been the most popular learning benchmark in recent literature. XOR is
a special case of the
parity function, but here we will treat it
as a separate benchmark in its own right.
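To illustrate why one hidden layer is enough, here is a hand-wired sketch using threshold units. The weights are set by hand rather than learned, so this shows only that a solution exists; it is not a training procedure, and the unit names are illustrative.

```python
def step(z):
    """Hard threshold unit: outputs 1 when the net input is positive."""
    return 1 if z > 0 else 0


def xor_net(x1, x2):
    """Two hidden threshold units feeding one output unit."""
    h_or = step(x1 + x2 - 0.5)   # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)  # fires only if both inputs are 1
    # Output is "OR but not AND", which is exclusive-or.
    return step(h_or - h_and - 0.5)


if __name__ == "__main__":
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, "->", xor_net(x1, x2))
```

A single threshold unit cannot do the same job: it would need w1 + b > 0 and w2 + b > 0 together with b <= 0 and w1 + w2 + b <= 0, and adding the first two inequalities contradicts the last two.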
This is an updated and corrected version of the data set used by Sejnowski and
Rosenberg in their influential study of speech generation using neural
networks. The file nettalk.data contains a list of 20,008 English words,
along with a phonetic transcription for each word. The task is to train a
network to produce the proper phonemes, given a string of letters as input.
This is an example of an input/output mapping task that exhibits strong global
regularities, but also a large number of more specialized rules and
exceptional cases.
This is the data set used by Ning Qian and Terry Sejnowski in their study using
a neural net to predict the secondary structure of certain globular proteins.
The idea is to take a linear sequence of amino acids and to predict, for each
of these amino acids, what secondary structure it is a part of within the
protein. There are three choices: alpha-helix, beta-sheet, random-coil. The
data set contains both a large set of training data and a distinct set of data
that can be used for testing the resulting network. Qian and Sejnowski use a
NetTalk-like approach and report an
accuracy of 64.3% on the test set. They speculate that this is about the best
that can be done using only local context.
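A "local context" approach of this kind can be sketched as a sliding window: each amino acid is presented together with a fixed number of neighbours on either side, and the target is the structure label of the centre residue. The window half-width, the padding symbol, the label characters, and the toy sequence below are all assumptions made for illustration, not values taken from the data set.

```python
def context_windows(sequence, labels, half_width=6, pad="-"):
    """Pair each residue's surrounding window with its structure label.

    `sequence` and `labels` are equal-length strings; positions beyond
    either end of the chain are filled with `pad`.
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    padded = pad * half_width + sequence + pad * half_width
    width = 2 * half_width + 1
    return [(padded[i:i + width], label) for i, label in enumerate(labels)]


if __name__ == "__main__":
    # Hypothetical toy chain: h = alpha-helix, e = beta-sheet, _ = coil.
    for window, label in context_windows("ACDEF", "hhe__", half_width=1):
        print(window, "->", label)
```

Any dependency that reaches beyond the window is invisible to a network trained on these pairs, which is the limitation the speculation about local context refers to.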
The task is to use currently available points in a chaotic time series to
predict a future point, t+P. Since the Mackey-Glass time series is chaotic, this
is a difficult problem for values of P greater than its characteristic period
of approximately 50.
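As a rough sketch of the setup, the series can be generated from the Mackey-Glass delay differential equation dx/dt = a x(t - tau) / (1 + x(t - tau)^10) - b x(t) and then cut into (past points, future point) pairs. The parameter values (a = 0.2, b = 0.1, tau = 17), the crude unit-step Euler integration, the constant initial history, and the lag and horizon choices are commonly used settings assumed here for illustration; they are not taken from the benchmark description.

```python
def mackey_glass(n_points, tau=17, a=0.2, b=0.1, x0=1.2):
    """Generate n_points of the series with unit-step Euler integration.

    Assumes a constant initial history x(t) = x0 for t <= 0.
    """
    history = [x0] * (tau + 1)
    for _ in range(n_points):
        x_t = history[-1]
        x_delay = history[-1 - tau]  # value tau steps in the past
        dx = a * x_delay / (1.0 + x_delay ** 10) - b * x_t
        history.append(x_t + dx)
    return history[tau + 1:]


def prediction_pairs(series, lags=(0, 6, 12, 18), horizon=6):
    """Build ([x(t - lag) for each lag], x(t + horizon)) training pairs."""
    pairs = []
    for t in range(max(lags), len(series) - horizon):
        inputs = [series[t - lag] for lag in lags]
        pairs.append((inputs, series[t + horizon]))
    return pairs


if __name__ == "__main__":
    series = mackey_glass(500)
    pairs = prediction_pairs(series)
    print(len(pairs), "pairs; first:", pairs[0])
```

Here `horizon` plays the role of P in the text; increasing it toward and beyond the characteristic period is what makes the prediction task hard.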
neural-bench@cs.cmu.edu (Last updated: 21-Jan-97)