Optimal Brain Surgeon and general network pruning
Abstract
The use of information from all second-order derivatives of the error function to perform network pruning (i.e., removing unimportant weights from a trained network) is investigated. Pruning is performed in order to improve generalization, simplify networks, reduce hardware or storage requirements, increase the speed of further training, and, in some cases, enable rule extraction. The method, Optimal Brain Surgeon (OBS), is significantly better than magnitude-based methods and Optimal Brain Damage, which often remove the wrong weights. OBS permits the pruning of more weights than other methods (for the same error on the training set), and thus yields better generalization on test data. Crucial to OBS is a recursion relation for calculating the inverse Hessian matrix H^-1 from training data and structural information of the net. OBS permits a 76%, a 62%, and a 90% reduction in weights over backpropagation with weight decay on three benchmark MONK's problems. Of OBS, Optimal Brain Damage, and a magnitude-based method, only OBS deletes the correct weights from a trained XOR network in every case. Finally, whereas Sejnowski and Rosenberg used 18,000 weights in their NETtalk network, we used OBS to prune a network to just 1,560 weights, yielding better generalization.
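For readers who want the mechanics behind the abstract's claims, the following is a minimal NumPy sketch of the two quantities OBS relies on: the recursive construction of the inverse Hessian H^-1 from per-sample gradients, and the saliency-based deletion of a single weight with the compensating update of the remaining weights. The function names, the single-output simplification, and the regularization constant `alpha` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def inverse_hessian(grads, alpha=1e-4):
    """Build H^-1 recursively from per-sample output gradients X_k = df(x_k; w)/dw.

    Uses the rank-one update
        H_{m+1}^{-1} = H_m^{-1} - (H_m^{-1} X X^T H_m^{-1}) / (P/alpha + X^T H_m^{-1} X),
    starting from H_0^{-1} = (1/alpha) I, so the full Hessian is never formed
    or explicitly inverted.
    """
    P = len(grads)
    n = grads[0].size
    H_inv = np.eye(n) / alpha
    for X in grads:
        X = X.reshape(-1, 1)
        HX = H_inv @ X
        H_inv = H_inv - (HX @ HX.T) / (P / alpha + float(X.T @ HX))
    return H_inv

def obs_prune_one(w, H_inv):
    """Delete the weight with smallest saliency L_q = w_q^2 / (2 [H^-1]_qq)
    and adjust all remaining weights by
        delta_w = -(w_q / [H^-1]_qq) * H^-1 e_q,
    which sets w_q to zero while minimizing the increase in error.
    Returns the updated weight vector and the pruned index.
    """
    diag = np.diag(H_inv)
    saliency = w ** 2 / (2.0 * diag)
    q = int(np.argmin(saliency))
    delta_w = -(w[q] / diag[q]) * H_inv[:, q]
    return w + delta_w, q
```

In practice one would recompute (or incrementally update) H^-1 and repeat `obs_prune_one` until the predicted error increase exceeds a tolerance; this sketch only shows a single pruning step.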
Additional Information
© 1993 IEEE. Supported in part by grants AFOSR 91-0060 and DAAL03-91-C-0010 to T. Kailath, who in turn provided constant encouragement. Deep thanks go to Jerome Friedman (Stanford) for pointers to relevant statistics literature.
Attached Files
Published - Optimal_Brain_Surgeon_and_general_network_pruning.pdf (537.4 kB; md5: 0e4acf2b5271c28906a07fcd3f55796e)
Additional details
- Eprint ID: 54981
- Resolver ID: CaltechAUTHORS:20150219-074543150
- Funding: Air Force Office of Scientific Research (AFOSR), grant 91-0060
- Created: 2015-02-27 (from EPrint's datestamp field)
- Updated: 2021-11-10 (from EPrint's last_modified field)