Neural Network Benchmarks


The CMU repository contains many types of data. For convenience, we have roughly divided these data sets into the categories listed below. Please read the benchmark descriptions carefully and report your results using the same methods as those described for the benchmark in question. If you have any questions concerning how to properly use a benchmark, please do not hesitate to contact us at neural-bench@cs.cmu.edu. It is vitally important that new results be reported in a consistent manner so that comparisons are not made between dissimilar methods.

Download a gzip'ed tar file of all the benchmarks (303.5k)


I/O Mappings

Parity ( gzip'ed tar archive - 14.6k )

The task is to train a network to produce the sum, mod 2, of N binary inputs -- otherwise known as computing the odd parity function. See also the XOR benchmark, which is the 2-input case of parity.
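As a minimal sketch of the target function (not the format of the files in the archive), the full truth table for N-input parity can be enumerated as follows. The function name parity_dataset is our own illustrative choice.

```python
from itertools import product


def parity_dataset(n):
    """Return every n-bit input tuple paired with the sum of its bits mod 2."""
    return [(bits, sum(bits) % 2) for bits in product((0, 1), repeat=n)]


if __name__ == "__main__":
    # Print the 3-input parity table, one pattern per line.
    for bits, target in parity_dataset(3):
        print(bits, "->", target)
```

With n = 2 the table reduces to the XOR truth table.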

Sonar, Mines vs. Rocks ( gzip'ed tar archive - 46.5k )

This is the data set used by Gorman and Sejnowski in their study of the classification of sonar signals using a neural network. The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock.

Two Spirals ( gzip'ed tar archive - 15.7k )

The task is to learn to discriminate between two sets of training points which lie on two distinct spirals in the x-y plane. These spirals coil three times around the origin and around one another. This appears to be a very difficult task for back-propagation networks and their relatives. Problems like this one, whose inputs are points on the 2-D plane, are interesting because we can display the 2-D receptive field of any unit in the network.
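The sketch below generates two interleaved spirals of this general shape for experimentation. It is not the generator shipped in the archive: the number of points per spiral, the radius range, and the function name two_spirals are all assumptions chosen for illustration. Only the three turns around the origin come from the description above.

```python
import math


def two_spirals(points_per_spiral=97, turns=3, min_radius=0.5, max_radius=6.5):
    """Return (x, y, label) triples for two interleaved spirals.

    All default constants except `turns` are illustrative assumptions.
    The second spiral is the first one reflected through the origin.
    """
    data = []
    steps = points_per_spiral - 1
    for i in range(points_per_spiral):
        fraction = i / steps
        angle = 2 * math.pi * turns * fraction
        radius = min_radius + (max_radius - min_radius) * fraction
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        data.append((x, y, 1))    # point on the first spiral
        data.append((-x, -y, 0))  # mirrored point on the second spiral
    return data
```

Because each point on one spiral has a mirror image on the other, the two classes are balanced and neither can be separated from the other by a single straight line.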

Vowel Recognition (Deterding data) ( gzip'ed tar archive - 43.3k )

The task is speaker-independent recognition of the eleven steady-state vowels of British English, using a specified training set of LPC-derived log area ratios.

XOR ( gzip'ed tar archive - 13.2k )

The task is to train a network to produce the boolean exclusive-or function of two variables. This is perhaps the simplest learning problem that is not linearly separable. It therefore cannot be performed by a perceptron-like network with only a single layer of trainable weights. In its various forms, XOR has been the most popular learning benchmark in recent literature. XOR is a special case of the parity function, but here we will treat it as a separate benchmark in its own right.
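To illustrate why one hidden layer is enough, here is a sketch of a network with two hidden threshold units that computes XOR. The weights are set by hand rather than learned, and the names step and xor_net are illustrative; the benchmark itself asks for these weights to be found by training.

```python
def step(z):
    """Hard threshold unit: output 1 when the net input is positive."""
    return 1 if z > 0 else 0


def xor_net(a, b):
    """Two hidden threshold units feeding one output unit (hand-set weights)."""
    h_or = step(a + b - 0.5)    # fires when at least one input is 1
    h_and = step(a + b - 1.5)   # fires only when both inputs are 1
    return step(h_or - h_and - 0.5)  # "OR but not AND" is exclusive-or


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor_net(a, b))
```

Each hidden unit draws one line in the input plane; the output unit combines the two half-planes, which a single threshold unit on the raw inputs cannot do.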


Temporal I/O Mappings

NetTalk Corpus ( gzip'ed tar archive - 175.9k )

This is an updated and corrected version of the data set used by Sejnowski and Rosenberg in their influential study of speech generation using neural networks. The file nettalk.data contains a list of 20,008 English words, along with a phonetic transcription for each word. The task is to train a network to produce the proper phonemes, given a string of letters as input. This is an example of an input/output mapping task that exhibits strong global regularities, but also a large number of more specialized rules and exceptional cases.

Secondary Structure of Globular Proteins ( gzip'ed tar archive - 36.8k )

This is the data set used by Ning Qian and Terry Sejnowski in their study using a neural net to predict the secondary structure of certain globular proteins. The idea is to take a linear sequence of amino acids and to predict, for each of these amino acids, what secondary structure it is a part of within the protein. There are three choices: alpha-helix, beta-sheet, or random-coil. The data set contains both a large set of training data and a distinct set of data that can be used for testing the resulting network. Qian and Sejnowski use a NetTalk-like approach and report an accuracy of 64.3% on the test set. They speculate that this is about the best that can be done using only local context.
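A "local context" input of this kind can be sketched as a sliding window of one-hot encoded residues centred on the position being classified. The window width, the spacer symbol, the one-letter amino acid codes, and the function name encode_windows below are assumptions made for illustration; they are not taken from the data files or from the original study.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
SPACER = "-"                          # assumed symbol for positions past either end


def encode_windows(sequence, half_width=6):
    """Return one one-hot input vector per residue in `sequence`.

    Each vector covers the residue itself plus `half_width` neighbours on
    each side, using 21 units per position (20 amino acids + spacer).
    """
    alphabet = AMINO_ACIDS + SPACER
    padded = SPACER * half_width + sequence + SPACER * half_width
    width = 2 * half_width + 1
    windows = []
    for i in range(len(sequence)):
        vector = []
        for ch in padded[i:i + width]:
            vector.extend(1 if ch == symbol else 0 for symbol in alphabet)
        windows.append(vector)
    return windows
```

A network trained on such vectors sees nothing outside the window, which is the sense in which its accuracy is limited by local context.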


Time Series

Mackey-Glass Chaotic Time Series ( gzip'ed tar archive - 26.9k )

The task is to use currently available points in a chaotic time series to predict a future point at time t+P. Since the Mackey-Glass time series is chaotic, this is a difficult problem for values of P greater than its characteristic period of approximately 50.
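For readers who want to experiment before downloading the archive, the sketch below generates a Mackey-Glass series and builds input/target pairs for a prediction horizon P. The delay-differential equation dx/dt = a x(t - tau) / (1 + x(t - tau)^10) - b x(t), the constants a = 0.2, b = 0.1, tau = 17, the unit Euler step, the initial value, and the input lags are commonly used choices that we assume here; none of them is stated above, and the archived data may have been produced differently.

```python
def mackey_glass(n_points, tau=17, a=0.2, b=0.1, x0=1.2):
    """Integrate the Mackey-Glass equation with a unit-step Euler scheme.

    All parameter defaults are assumptions, not values from the benchmark.
    """
    history = [x0] * (tau + 1)  # constant history for t <= 0
    for _ in range(n_points):
        delayed = history[-(tau + 1)]  # x(t - tau)
        current = history[-1]          # x(t)
        history.append(current + a * delayed / (1 + delayed ** 10) - b * current)
    return history[tau + 1:]


def make_pairs(series, lags=(0, 6, 12, 18), horizon=6):
    """Pair past samples x(t - lag) with the target x(t + horizon)."""
    pairs = []
    for t in range(max(lags), len(series) - horizon):
        inputs = [series[t - lag] for lag in lags]
        pairs.append((inputs, series[t + horizon]))
    return pairs
```

Increasing the horizon argument corresponds to increasing P in the description above, which makes the prediction task harder.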


neural-bench@cs.cmu.edu (Last updated: 21-Jan-97)