Publications about 'learning'
Articles in journal or book chapters
  1. P. Kuusela, D. Ocone, and E.D. Sontag. Learning Complexity Dimensions for a Continuous-Time Control System. SIAM J. Control Optim., 43(3):872-898, 2004. [PDF] [doi:] Keyword(s): theory of computing and complexity, VC dimension.
    This paper takes a computational learning theory approach to a problem of linear systems identification. It is assumed that input signals have only a finite number k of frequency components, and systems to be identified have dimension no greater than n. The main result establishes that the sample complexity needed for identification scales polynomially with n and logarithmically with k.

  2. E.D. Sontag. VC dimension of neural networks. In C.M. Bishop, editor, Neural Networks and Machine Learning, pages 69-95. Springer, Berlin, 1998. [PDF] Keyword(s): neural networks, VC dimension, learning, neural networks, shattering.
    The Vapnik-Chervonenkis (VC) dimension is an integer which helps to characterize distribution-independent learning of binary concepts from positive and negative samples. This paper, based on lectures delivered at the Isaac Newton Institute in August of 1997, presents a brief introduction, establishes various elementary results, and discusses how to estimate the VC dimension in several examples of interest in neural network theory. (It does not address the learning and estimation-theoretic applications of VC dimension, and the applications to uniform convergence theorems for empirical probabilities, for which many suitable references are available.)

  3. E.D. Sontag. A learning result for continuous-time recurrent neural networks. Systems Control Lett., 34(3):151-158, 1998. [PDF] [doi:] Keyword(s): neural networks, VC dimension, recurrent neural networks.
    The following learning problem is considered, for continuous-time recurrent neural networks having sigmoidal activation functions. Given a ``black box'' representing an unknown system, measurements of output derivatives are collected, for a set of randomly generated inputs, and a network is used to approximate the observed behavior. It is shown that the number of inputs needed for reliable generalization (the sample complexity of the learning problem) is upper bounded by an expression that grows polynomially with the dimension of the network and logarithmically with the number of output derivatives being matched.

  4. B. DasGupta and E.D. Sontag. Sample complexity for learning recurrent perceptron mappings. IEEE Trans. Inform. Theory, 42(5):1479-1487, 1996. [PDF] Keyword(s): neural networks, VC dimension, recurrent neural networks.
    Recurrent perceptron classifiers generalize the usual perceptron model. They correspond to linear transformations of input vectors obtained by means of "autoregressive moving-average schemes", or infinite impulse response filters, and allow taking into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data. The results are expressed in the context of the theory of probably approximately correct (PAC) learning.

  5. B. DasGupta, H.T. Siegelmann, and E.D. Sontag. On the complexity of training neural networks with continuous activation functions. IEEE Trans. Neural Networks, 6:1490-1504, 1995. [PDF] Keyword(s): neural networks, analog computing, theory of computing, neural networks, computational complexity, machine learning.
    Blum and Rivest showed that any possible neural net learning algorithm based on fixed architectures faces severe computational barriers. This paper extends their NP-completeness result, which applied only to nets based on hard threshold activations, to nets that employ a particular continuous activation. In view of neural network practice, this is a more relevant result to understanding the limitations of backpropagation and related techniques.

  6. B. DasGupta, H.T. Siegelmann, and E.D. Sontag. On the Intractability of Loading Neural Networks. In V. P. Roychowdhury, Siu K. Y., and Orlitsky A., editors, Theoretical Advances in Neural Computation and Learning, pages 357-389. Kluwer Academic Publishers, 1994. [PDF] Keyword(s): analog computing, neural networks, computational complexity, machine learning.

  7. E.D. Sontag. Neural networks for control. In H. L. Trentelman and J. C. Willems, editors, Essays on control: perspectives in the theory and its applications (Groningen, 1993), volume 14 of Progr. Systems Control Theory, pages 339-380. Birkhäuser Boston, Boston, MA, 1993. Note: A longer version (tech report with more details) is here: [PDF] Keyword(s): neural networks, recurrent neural networks, neural networks.
    This paper has an expository introduction to two related topics: (a) Some mathematical results regarding "neural networks", and (b) so-called "neurocontrol" and "learning control" (each part can be read independently of the other). It was prepared for a short course given at the 1993 European Control Conference.

  8. E.D. Sontag. Capabilities and training of feedforward nets. In Neural networks (New Brunswick, NJ, 1990), pages 303-321. Academic Press, Boston, MA, 1991. [PDF] Keyword(s): neural networks, neural networks.
    This paper surveys recent work by the author on learning and representational capabilities of feedforward nets. The learning results show that, among two possible variants of the so-called backpropagation training method for sigmoidal nets, both of which variants are used in practice, one is a better generalization of the older perceptron training algorithm than the other. The representation results show that nets consisting of sigmoidal neurons have at least twice the representational capabilities of nets that use classical threshold neurons, at least when this increase is quantified in terms of classification power. On the other hand, threshold nets are shown to be more useful when approximating implicit functions, as illustrated with an application to a typical control problem.

  9. E.D. Sontag and H.J. Sussmann. Back propagation separates where perceptrons do. Neural Networks, 4(2):243-249, 1991. [PDF] [doi:] Keyword(s): neural networks, neural networks.
    Feedforward nets with sigmoidal activation functions are often designed by minimizing a cost criterion. It has been pointed out before that this technique may be outperformed by the classical perceptron learning rule, at least on some problems. In this paper, we show that no such pathologies can arise if the error criterion is of a threshold LMS type, i.e., is zero for values ``beyond'' the desired target values. More precisely, we show that if the data are linearly separable, and one considers nets with no hidden neurons, then an error function as above cannot have any local minima that are not global. In addition, the proof gives the following stronger result, under the stated hypotheses: the continuous gradient adjustment procedure is such that from any initial weight configuration a separating set of weights is obtained in finite time. This is a precise analogue of the Perceptron Learning Theorem. The results are then compared with the more classical pattern recognition problem of threshold LMS with linear activations, where no spurious local minima exist even for nonseparable data: here it is shown that even if using the threshold criterion, such bad local minima may occur, if the data are not separable and sigmoids are used. keywords = { neural networks , feedforward neural nets },

Conference articles
  1. E.D. Sontag. Some learning and systems-theoretic questions regarding recurrent neural networks. In Proc. Conf. on Information Sciences and Systems (CISS 97), Johns Hopkins, Baltimore, MD, March 1997, pages 630-635, 1997. Keyword(s): neural networks, VC dimension, recurrent neural networks.

  2. B. Dasgupta and E.D. Sontag. Sample complexity for learning recurrent perceptron mappings. In D.S. Touretzky, M.C. Moser, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 204-210, 1996. MIT Press, Cambridge, MA. Keyword(s): neural networks, VC dimension, recurrent neural networks.

  3. B. DasGupta, H. T. Siegelmann, and E.D. Sontag. On a learnability question associated to neural networks with continuous activations (extended abstract). In COLT '94: Proceedings of the seventh annual conference on Computational learning theory, New York, NY, USA, pages 47-56, 1994. ACM Press. [doi:] Keyword(s): analog computing, neural networks, computational complexity, machine learning.

  4. C. Darken, M.J. Donahue, L. Gurvits, and E.D. Sontag. Rate of approximation results motivated by robust neural network learning. In COLT '93: Proceedings of the sixth annual conference on Computational learning theory, New York, NY, USA, pages 303-309, 1993. ACM Press. [doi:] Keyword(s): neural networks, optimization problems, approximation theory.

  5. A. Macintyre and E.D. Sontag. Finiteness results for sigmoidal neural networks. In STOC '93: Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, New York, NY, USA, pages 325-334, 1993. ACM Press. [PDF] [doi:] Keyword(s): neural networks, theory of computing and complexity.
    This paper deals with analog circuits. It establishes the finiteness of VC dimension, teaching dimension, and several other measures of sample complexity which arise in learning theory. It also shows that the equivalence of behaviors, and the loading problem, are effectively decidable, modulo a widely believed conjecture in number theory. The results, the first ones that are independent of weight size, apply when the gate function is the "standard sigmoid" commonly used in neural networks research. The proofs rely on very recent developments in the elementary theory of real numbers with exponentiation. (Some weaker conclusions are also given for more general analytic gate functions.) Applications to learnability of sparse polynomials are also mentioned.

  6. H.T. Siegelmann, E.D. Sontag, and C.L. Giles. The Complexity of Language Recognition by Neural Networks. In Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture - Information Processing '92, Volume 1, pages 329-335, 1992. North-Holland. Keyword(s): neural networks, computational complexity, machine learning, recurrent neural networks, theory of computing and complexity.

  7. T. Asano, J. Hershberger, J. Pach, E.D. Sontag, D. Souivaine, and S. Suri. Separating bi-chromatic points by parallel lines. In Proceedings of the Second Canadian Conf. on Computational Geometry, Ottawa, Canada, 1990, pages 46-49, 1990. [PDF] Keyword(s): computational geometry.
    Given a 2-coloring of the vertices of a regular n-gon P, how many parallel lines are needed to separate the vertices into monochromatic subsets? We prove that floor(n/2) is a tight upper bound, and also provide an O(n log n) time algorithm to determine the direction that gives the minimum number of lines. If the polygon is a non-regular convex polygon, then n-3 lines may be necessary, while n-2 lines always suffice. This problem arises in machine learning and has implications about the representational capabilities of some neural networks.

Internal reports
  1. E.D. Sontag. Some remarks on the backpropagation algorithm for neural net learning. Technical report SYCON-88-02, Rutgers Center for Systems and Control, 1988. [PDF] Keyword(s): neural networks, neural networks.
    This is a very old informal report that discusses the study of local minima of quadratic loss functions for fitting errors in sigmoidal neural net learning. It also includes several remarks concerning the growth of weights during gradient descent. There is nothing very interesting here - far better knowledge is now available - but the report was placed here by request.



This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders.

Last modified: Thu Nov 23 10:40:56 2017
Author: sontag.

This document was translated from BibTEX by bibtex2html