Zeng, Zheng and Goodman, Rodney M. and Smyth, Padhraic (1993) Learning finite state machines with self-clustering recurrent networks. Neural Computation, 5 (6). pp. 976-990. ISSN 0899-7667 http://resolver.caltech.edu/CaltechAUTHORS:ZENnc93
- Published Version
Use this Persistent URL to link to this item: http://resolver.caltech.edu/CaltechAUTHORS:ZENnc93
Recent work has shown that recurrent neural networks have the ability to learn finite state automata from examples. In particular, networks using second-order units have been successful at this task. In studying the performance and learning behavior of such networks we have found that the second-order network model attempts to form clusters in activation space as its internal representation of states. However, these learned states become unstable as longer and longer test input strings are presented to the network. In essence, the network “forgets” where the individual states are in activation space. In this paper we propose a new method to force such a network to learn stable states by introducing discretization into the network and using a pseudo-gradient learning rule to perform training. The essence of the learning rule is that in doing gradient descent, it makes use of the gradient of a sigmoid function as a heuristic hint in place of that of the hard-limiting function, while still using the discretized value in the feedback update path. The new structure uses isolated points in activation space instead of vague clusters as its internal representation of states. It is shown to have similar capabilities in learning finite state automata as the original network, but without the instability problem. The proposed pseudo-gradient learning rule may also be used as a basis for training other types of networks that have hard-limiting threshold activation functions.
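The pseudo-gradient idea in the abstract can be illustrated with a minimal sketch. It is not the paper's training procedure: the toy parity automaton, the one-hot state and symbol encodings, the learning rate, the one-step training targets taken directly from the transition table, and the names `step`, `pseudo_gradient_update`, `train` and `run_string` are all assumptions made for illustration. What the sketch keeps from the abstract is the core trick: the hard-limited (discretized) state is what gets fed back, while the slope of the sigmoid stands in for the gradient of the hard-limiting function during the weight update.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def step(W, state, symbol):
    """One second-order update: net_i = sum_jk W[i][j][k] * state[j] * symbol[k].

    Returns the analog sigmoid activations and their hard-limited version.
    """
    analog = []
    for i in range(len(W)):
        net = sum(W[i][j][k] * state[j] * symbol[k]
                  for j in range(len(state))
                  for k in range(len(symbol)))
        analog.append(sigmoid(net))
    # Discretization: only these 0/1 values are fed back as the next state.
    discrete = [1.0 if h > 0.5 else 0.0 for h in analog]
    return analog, discrete


def pseudo_gradient_update(W, state, symbol, target, lr=1.0):
    """Update W in place using the sigmoid slope as a surrogate gradient."""
    analog, discrete = step(W, state, symbol)
    for i in range(len(W)):
        err = discrete[i] - target[i]            # error on the discretized output
        slope = analog[i] * (1.0 - analog[i])    # sigmoid slope replaces the step's gradient
        for j in range(len(state)):
            for k in range(len(symbol)):
                W[i][j][k] -= lr * err * slope * state[j] * symbol[k]
    return discrete


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


# Hypothetical toy target: parity automaton (state 0 = even number of 1s seen).
TRANSITIONS = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def train(epochs=20):
    # Deliberately wrong small initial weights, so learning has work to do.
    W = [[[0.1 if (i + j + k) % 2 == 1 else -0.1 for k in range(2)]
          for j in range(2)]
         for i in range(2)]
    for _ in range(epochs):
        for (s, a), nxt in TRANSITIONS.items():
            pseudo_gradient_update(W, one_hot(s, 2), one_hot(a, 2), one_hot(nxt, 2))
    return W


def run_string(W, bits):
    """Process a bit string, feeding back only the discretized state."""
    state = one_hot(0, 2)
    for b in bits:
        _, state = step(W, state, one_hot(b, 2))
    return state
```

Because `run_string` feeds back exact 0/1 values, the state after any number of input symbols is one of a few isolated points rather than a position inside a cluster, so there is nothing to drift as the string grows. For example, `run_string(train(), [1] * 1001)` lands on the same "odd" state as a single `1`.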
|Additional Information:||© 1993 Massachusetts Institute of Technology. Received 15 June 1992; accepted 8 March 1993. Posted Online April 4, 2008. The research described in this paper was supported in part by ONR and ARPA under Grants AFOSR-90-0199 and N00014-92-J-1860. In addition this work was carried out in part by the Jet Propulsion Laboratories, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.|
|Usage Policy:||No commercial reproduction, distribution, display or performance rights in this work are provided.|
|Deposited By:||Tony Diaz|
|Deposited On:||17 Jun 2009 21:39|
|Last Modified:||26 Dec 2012 10:53|