BACK TO INDEX

Publications about 'neural networks'
Articles in journal or book chapters
  1. W. Maass, P. Joshi, and E.D. Sontag. Computational aspects of feedback in neural circuits. PLoS Computational Biology, 3:e165 1-20, 2007. [PDF] Keyword(s): neural networks, feedback linearization, computation by cortical microcircuits.
    Abstract:
    It had previously been shown that generic cortical microcircuit models can perform complex real-time computations on continuous input streams, provided that these computations can be carried out with a rapidly fading memory. We investigate in this article the computational capability of such circuits in the more realistic case where not only readout neurons, but in addition a few neurons within the circuit have been trained for specific tasks. This is essentially equivalent to the case where the output of trained readout neurons is fed back into the circuit. We show that this new model overcomes the limitation of a rapidly fading memory. In fact, we prove that in the idealized case without noise it can carry out any conceivable digital or analog computation on time-varying inputs. But even with noise the resulting computational model can perform a large class of biologically relevant real-time computations that require a non-fading memory.


  2. W. Maass, P. Joshi, and E.D. Sontag. Principles of real-time computing with feedback applied to cortical microcircuit models. In Advances in Neural Information Processing Systems 18. MIT Press, Cambridge, 2006. Keyword(s): neural networks.


  3. W. Maass and E.D. Sontag. Neural Systems as Nonlinear Filters. Neural Comput., 12(8):1743-1772, 2000. [PDF] [doi:http://dx.doi.org/10.1162/089976600300015123] Keyword(s): neural networks, Volterra series.
    Abstract:
    We analyze computations on temporal patterns and spatio-temporal patterns in formal network models whose temporal dynamics arises from empirically established quantitative models for short term dynamics at biological synapses. We give a complete characterization of all linear and nonlinear filters that can be approximated by such dynamic network models: it is the class of all filters that can be approximated by Volterra series. This characterization is shown to be rather stable with regard to changes in the model. For example it is shown that synaptic facilitation and one layer of neurons suffices for approximating arbitrary filters from this class. Our results provide a new complexity hierarchy for all filters that are approximable by Volterra series, which appears to be closer related to the actual cost of implementing such filters in neural hardware than preceding complexity measures. Our results also provide a new parameterization for approximations to such filters in terms of parameters that are arguable related to those that are tunable in biological neural systems.


  4. W. Maass and E.D. Sontag. Analog neural nets with Gaussian or other common noise distributions cannot recognize arbitrary regular languages. Neural Comput., 11(3):771-782, 1999. [PDF] [doi:http://dx.doi.org/10.1162/089976699300016656] Keyword(s): neural networks.
    Abstract:
    We consider recurrent analog neural nets where the output of each gate is subject to Gaussian noise, or any other common noise distribution that is nonzero on a large set. We show that many regular languages cannot be recognized by networks of this type, and we give a precise characterization of those languages which can be recognized. This result implies severe constraints on possibilities for constructing recurrent analog neural nets that are robust against realistic types of analog noise. On the other hand we present a method for constructing feedforward analog neural nets that are robust with regard to analog noise of this type.


  5. E.D. Sontag and Y. Qiao. Further results on controllability of recurrent neural networks. Systems Control Lett., 36(2):121-129, 1999. [PDF] Keyword(s): controllability, recurrent neural networks, neural networks.
    Abstract:
    This paper studies controllability properties of recurrent neural networks. The new contributions are: (1) an extension of the result in "Complete controllability of continuous-time recurrent neural networks" to a slightly different model, where inputs appear in an affine form, (2) a formulation and proof of a necessary and sufficient condition, in terms of local-local controllability, and (3) a complete analysis of the 2-dimensional case for which the hypotheses made in previous work do not apply.


  6. E.D. Sontag. Automata and neural networks. In The handbook of brain theory and neural networks, pages 119-122. MIT Press, Cambridge, MA, USA, 1998. Keyword(s): neural networks.


  7. E.D. Sontag. VC dimension of neural networks. In C.M. Bishop, editor, Neural Networks and Machine Learning, pages 69-95. Springer, Berlin, 1998. [PDF] Keyword(s): neural networks, VC dimension, learning, neural networks, shattering.
    Abstract:
    The Vapnik-Chervonenkis (VC) dimension is an integer which helps to characterize distribution-independent learning of binary concepts from positive and negative samples. This paper, based on lectures delivered at the Isaac Newton Institute in August of 1997, presents a brief introduction, establishes various elementary results, and discusses how to estimate the VC dimension in several examples of interest in neural network theory. (It does not address the learning and estimation-theoretic applications of VC dimension, and the applications to uniform convergence theorems for empirical probabilities, for which many suitable references are available.)


  8. P. Koiran and E.D. Sontag. Vapnik-Chervonenkis dimension of recurrent neural networks. Discrete Appl. Math., 86(1):63-79, 1998. [PDF] [doi:http://dx.doi.org/10.1016/S0166-218X(98)00014-6] Keyword(s): neural networks, recurrent neural networks.
    Abstract:
    This paper provides lower and upper bounds for the VC dimension of recurrent networks. Several types of activation functions are discussed, including threshold, polynomial, piecewise-polynomial and sigmoidal functions. The bounds depend on two independent parameters: the number w of weights in the network, and the length k of the input sequence. Ignoring multiplicative constants, the main results say roughly the following: 1. For architectures whose activation is any fixed nonlinear polynomial, the VC dimension is proportional to wk. 2. For architectures whose activation is any fixed piecewise polynomial, the VC dimension is between wk and w**2k. 3. For architectures with threshold activations, the VC dimension is between wlog(k/w) and the smallest of wklog(wk) and w**2+wlog(wk). 4. For the standard sigmoid tanh(x), the VC dimension is between wk and w**4 k**2.


  9. E.D. Sontag. A learning result for continuous-time recurrent neural networks. Systems Control Lett., 34(3):151-158, 1998. [PDF] [doi:http://dx.doi.org/10.1016/S0167-6911(98)00006-1] Keyword(s): neural networks, VC dimension, recurrent neural networks.
    Abstract:
    The following learning problem is considered, for continuous-time recurrent neural networks having sigmoidal activation functions. Given a ``black box'' representing an unknown system, measurements of output derivatives are collected, for a set of randomly generated inputs, and a network is used to approximate the observed behavior. It is shown that the number of inputs needed for reliable generalization (the sample complexity of the learning problem) is upper bounded by an expression that grows polynomially with the dimension of the network and logarithmically with the number of output derivatives being matched.


  10. P. Koiran and E.D. Sontag. Vapnik-Chervonenkis dimension of recurrent neural networks. In Computational learning theory (Jerusalem, 1997), volume 1208 of Lecture Notes in Comput. Sci., pages 223-237. Springer-Verlag, London, UK, 1997. Keyword(s): neural networks, VC dimension, recurrent neural networks.


  11. E.D. Sontag. Recurrent neural networks: Some systems-theoretic aspects. In M. Karny, K. Warwick, and V. Kurkova, editors, Dealing with Complexity: a Neural Network Approach, pages 1-12. Springer-Verlag, London, 1997. [PDF] Keyword(s): neural networks, recurrent neural networks.
    Abstract:
    This paper provides an exposition of some recent results regarding system-theoretic aspects of continuous-time recurrent (dynamic) neural networks with sigmoidal activation functions. The class of systems is introduced and discussed, and a result is cited regarding their universal approximation properties. Known characterizations of controllability, observability, and parameter identifiability are reviewed, as well as a result on minimality. Facts regarding the computational power of recurrent nets are also mentioned.


  12. M. J. Donahue, L. Gurvits, C. Darken, and E.D. Sontag. Rates of convex approximation in non-Hilbert spaces. Constr. Approx., 13(2):187-220, 1997. [PDF] Keyword(s): neural networks, optimization, approximation theory.
    Abstract:
    This paper deals with sparse approximations by means of convex combinations of elements from a predetermined "basis" subset S of a function space. Specifically, the focus is on the rate at which the lowest achievable error can be reduced as larger subsets of S are allowed when constructing an approximant. The new results extend those given for Hilbert spaces by Jones and Barron, including in particular a computationally attractive incremental approximation scheme. Bounds are derived for broad classes of Banach spaces. The techniques used borrow from results regarding moduli of smoothness in functional analysis as well as from the theory of stochastic processes on function spaces.


  13. P. Koiran and E.D. Sontag. Neural networks with quadratic VC dimension. J. Comput. System Sci., 54(1, part 2):190-198, 1997. Note: (1st Annual Dagstuhl Seminar on Neural Computing, 1994). [PDF] [doi:http://dx.doi.org/10.1006/jcss.1997.1479] Keyword(s): neural networks, VC dimension.
    Abstract:
    This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles the open question of whether whether the well-known O(w log w) bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid generalization are discussed.


  14. R. Koplon and E.D. Sontag. Using Fourier-neural recurrent networks to fit sequential input/output data. Neurocomputing, 15:225-248, 1997. [PDF] Keyword(s): neural networks, recurrent neural networks.
    Abstract:
    This paper suggests the use of Fourier-type activation functions in fully recurrent neural networks. The main theoretical advantage is that, in principle, the problem of recovering internal coefficients from input/output data is solvable in closed form.


  15. E.D. Sontag. Shattering all sets of k points in `general position' requires (k-1)/2 parameters. Neural Comput., 9(2):337-348, 1997. [PDF] Keyword(s): neural networks, VC dimension.
    Abstract:
    For classes of concepts defined by certain classes of analytic functions depending on k parameters, there are nonempty open sets of samples of length 2k+2 which cannot be shattered. A slighly weaker result is also proved for piecewise-analytic functions. The special case of neural networks is discussed.


  16. E.D. Sontag and H.J. Sussmann. Complete controllability of continuous-time recurrent neural networks. Systems Control Lett., 30(4):177-183, 1997. [PDF] [doi:http://dx.doi.org/10.1016/S0167-6911(97)00002-9] Keyword(s): neural networks, recurrent neural networks.
    Abstract:
    This paper presents a characterization of controllability for the class of control systems commonly called (continuous-time) recurrent neural networks. The characterization involves a simple condition on the input matrix, and is proved when the activation function is the hyperbolic tangent.


  17. Y. Yang, E.D. Sontag, and H.J. Sussmann. Global stabilization of linear discrete-time systems with bounded feedback. Systems Control Lett., 30(5):273-281, 1997. [PDF] [doi:http://dx.doi.org/10.1016/S0167-6911(97)00021-2] Keyword(s): discrete-time, saturation.
    Abstract:
    This paper deals with the problem of global stabilization of linear discrete time systems by means of bounded feedback laws. The main result proved is an analog of one proved for the continuous time case by the authors, and shows that such stabilization is possible if and only if the system is stabilizable with arbitrary controls and the transition matrix has spectral radius less or equal to one. The proof provides in principle an algorithm for the construction of such feedback laws, which can be implemented either as cascades or as parallel connections (``single hidden layer neural networks'') of simple saturation functions.


  18. B. DasGupta and E.D. Sontag. Sample complexity for learning recurrent perceptron mappings. IEEE Trans. Inform. Theory, 42(5):1479-1487, 1996. [PDF] Keyword(s): neural networks, VC dimension, recurrent neural networks.
    Abstract:
    Recurrent perceptron classifiers generalize the usual perceptron model. They correspond to linear transformations of input vectors obtained by means of "autoregressive moving-average schemes", or infinite impulse response filters, and allow taking into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data. The results are expressed in the context of the theory of probably approximately correct (PAC) learning.


  19. E.D. Sontag. Critical points for least-squares problems involving certain analytic functions, with applications to sigmoidal nets. Adv. Comput. Math., 5(2-3):245-268, 1996. [PDF] Keyword(s): subanalytic sets, semianalytic sets, analytic geometry, critical points, approximation theory, neural networks.
    Abstract:
    This paper deals with nonlinear least-squares problems involving the fitting to data of parameterized analytic functions. For generic regression data, a general result establishes the countability, and under stronger assumptions finiteness, of the set of functions giving rise to critical points of the quadratic loss function. In the special case of what are usually called "single-hidden layer neural networks", which are built upon the standard sigmoidal activation tanh(x) or equivalently 1/(1+exp(-x)), a rough upper bound for this cardinality is provided as well.


  20. B. DasGupta, H.T. Siegelmann, and E.D. Sontag. On the complexity of training neural networks with continuous activation functions. IEEE Trans. Neural Networks, 6:1490-1504, 1995. [PDF] Keyword(s): neural networks, analog computing, theory of computing, neural networks, computational complexity, machine learning.
    Abstract:
    Blum and Rivest showed that any possible neural net learning algorithm based on fixed architectures faces severe computational barriers. This paper extends their NP-completeness result, which applied only to nets based on hard threshold activations, to nets that employ a particular continuous activation. In view of neural network practice, this is a more relevant result to understanding the limitations of backpropagation and related techniques.


  21. H. T. Siegelmann and E.D. Sontag. On the computational power of neural nets. J. Comput. System Sci., 50(1):132-150, 1995. [PDF] [doi:http://dx.doi.org/10.1006/jcss.1995.1013] Keyword(s): neural networks, recurrent neural networks, analog computing, theory of computing, neural networks, computational complexity, super-Turing computation.
    Abstract:
    This paper deals with finite size networks which consist of interconnections of synchronously evolving processors. Each processor updates its state by applying a "sigmoidal" function to a rational-coefficient linear combination of the previous states of all units. We prove that one may simulate all Turing Machines by such nets. In particular, one can simulate any multi-stack Turing Machine in real time, and there is a net made up of 886 processors which computes a universal partial-recursive function. Products (high order nets) are not required, contrary to what had been stated in the literature. Non-deterministic Turing Machines can be simulated by non-deterministic rational nets, also in real time. The simulation result has many consequences regarding the decidability, or more generally the complexity, of questions about recursive nets.


  22. B. DasGupta, H.T. Siegelmann, and E.D. Sontag. On the Intractability of Loading Neural Networks. In V. P. Roychowdhury, Siu K. Y., and Orlitsky A., editors, Theoretical Advances in Neural Computation and Learning, pages 357-389. Kluwer Academic Publishers, 1994. [PDF] Keyword(s): analog computing, neural networks, computational complexity, machine learning.


  23. W. Maass, G. Schnitger, and E.D. Sontag. A comparison of the computational power of sigmoid and Boolean threshold circuits. In V. P. Roychowdhury, Siu K. Y., and Orlitsky A., editors, Theoretical Advances in Neural Computation and Learning, pages 127-151. Kluwer Academic Publishers, 1994. [PDF] Keyword(s): neural networks, boolean systems.
    Abstract:
    We examine the power of constant depth circuits with sigmoid threshold gates for computing boolean functions. It is shown that, for depth 2, constant size circuits of this type are strictly more powerful than constant size boolean threshold circuits (i.e. circuits with linear threshold gates). On the other hand it turns out that, for any constant depth d, polynomial size sigmoid threshold circuits with polynomially bounded weights compute exactly the same boolean functions as the corresponding circuits with linear threshold gates.


  24. F. Albertini and E.D. Sontag. State observability in recurrent neural networks. Systems Control Lett., 22(4):235-244, 1994. [PDF] [doi:http://dx.doi.org/10.1016/0167-6911(94)90054-X] Keyword(s): neural networks, recurrent neural networks, observability, identifiability.
    Abstract:
    This paper concerns recurrent networks x'=s(Ax+Bu), y=Cx, where s is a sigmoid, in both discrete time and continuous time. Our main result is that observability can be characterized, if one assumes certain conditions on the nonlinearity and on the system, in a manner very analogous to that of the linear case. Recall that for the latter, observability is equivalent to the requirement that there not be any nontrivial A-invariant subspace included in the kernel of C. We show that the result generalizes in a natural manner, except that one now needs to restrict attention to certain special "coordinate" subspaces.


  25. H. T. Siegelmann and E.D. Sontag. Analog computation via neural networks. Theoret. Comput. Sci., 131(2):331-360, 1994. [PDF] [doi:http://dx.doi.org/10.1016/0304-3975(94)90178-3] Keyword(s): analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks, neural networks, computational complexity.
    Abstract:
    We consider recurrent networks with real-valued weights. If allowed exponential time for computation, they turn out to have unbounded power. However, under polynomial-time constraints there are limits on their capabilities, though being more powerful than Turing Machines. Moreover, there is a precise correspondence between nets and standard non-uniform circuits with equivalent resources, and as a consequence one has lower bound constraints on what they can compute. We note that these networks are not likely to solve polynomially NP-hard problems, as the equality "P=NP" in our model implies the almost complete collapse of the standard polynomial hierarchy. We show that a large class of different networks and dynamical system models have no more computational power than this neural (first-order) model with real weights. The results suggest the following Church-like Thesis of Time-bounded Analog Computing: "Any reasonable analog computer will have no more power (up to polynomial time) than first-order recurrent networks."


  26. H.J. Sussmann, E.D. Sontag, and Y. Yang. A general result on the stabilization of linear systems using bounded controls. IEEE Trans. Automat. Control, 39(12):2411-2425, 1994. [PDF] Keyword(s): saturation, neural networks, global stability, nonlinear stability.
    Abstract:
    We present two constructions of controllers that globally stabilize linear systems subject to control saturation. We allow essentially arbitrary saturation functions. The only conditions imposed on the system are the obvious necessary ones, namely that no eigenvalues of the uncontrolled system have positive real part and that the standard stabilizability rank condition hold. One of the constructions is in terms of a "neural-network type" one-hidden layer architecture, while the other one is in terms of cascades of linear maps and saturations.


  27. F. Albertini, E.D. Sontag, and V. Maillot. Uniqueness of weights for neural networks. In R. Mammone, editor, Artificial Neural Networks for Speech and Vision, pages 115-125. Chapman and Hall, London, 1993. [PDF] Keyword(s): neural networks, recurrent neural networks.
    Abstract:
    In this short expository survey, we sketch various known facts about uniqueness of weights in neural networks, including results about recurrent nets, and we provide a new and elementary complex-variable proof of a uniqueness result that applies in the single hidden layer case.


  28. E.D. Sontag. Neural networks for control. In H. L. Trentelman and J. C. Willems, editors, Essays on control: perspectives in the theory and its applications (Groningen, 1993), volume 14 of Progr. Systems Control Theory, pages 339-380. Birkhäuser Boston, Boston, MA, 1993. Note: A longer version (tech report with more details) is here: http://www.math.rutgers.edu/(tilde)sontag/FTPDIR/neural-nets-siemens.pdf. [PDF] Keyword(s): neural networks, recurrent neural networks, neural networks.
    Abstract:
    This paper has an expository introduction to two related topics: (a) Some mathematical results regarding "neural networks", and (b) so-called "neurocontrol" and "learning control" (each part can be read independently of the other). It was prepared for a short course given at the 1993 European Control Conference.


  29. F. Albertini and E.D. Sontag. For neural networks, function determines form. Neural Networks, 6(7):975-990, 1993. [PDF] Keyword(s): neural networks, identifiability, recurrent neural networks, realization theory, observability, neural networks.
    Abstract:
    This paper shows that the weights of continuous-time feedback neural networks x'=s(Ax+Bu), y=Cx (where s is a sigmoid) are uniquely identifiable from input/output measurements. Under very weak genericity assumptions, the following is true: Assume given two nets, whose neurons all have the same nonlinear activation function s; if the two nets have equal behaviors as "black boxes" then necessarily they must have the same number of neurons and -except at most for sign reversals at each node- the same weights. Moreover, even if the activations are not a priori known to coincide, they are shown to be also essentially determined from the external measurements.


  30. E.D. Sontag. Feedback stabilization using two-hidden-layer nets. IEEE Trans. Neural Networks, 3:981-990, 1992. [PDF] Keyword(s): neural networks, feedback stabilization.
    Abstract:
    This paper compares the representational capabilities of one hidden layer and two hidden layer nets consisting of feedforward interconnections of linear threshold units. It is remarked that for certain problems two hidden layers are required, contrary to what might be in principle expected from the known approximation theorems. The differences are not based on numerical accuracy or number of units needed, nor on capabilities for feature extraction, but rather on a much more basic classification into "direct" and "inverse" problems. The former correspond to the approximation of continuous functions, while the latter are concerned with approximating one-sided inverses of continuous functions - and are often encountered in the context of inverse kinematics determination or in control questions. A general result is given showing that nonlinear control systems can be stabilized using two hidden layers, but not in general using just one.


  31. E.D. Sontag. Feedforward nets for interpolation and classification. J. Comput. System Sci., 45(1):20-48, 1992. [PDF] [doi:http://dx.doi.org/10.1016/0022-0000(92)90039-L] Keyword(s): neural networks, VC dimension, boolean systems.
    Abstract:
    This paper deals with single-hidden-layer feedforward nets, studying various aspects of classification power and interpolation capability. In particular, a worst-case analysis shows that direct input to output connections in threshold nets double the recognition but not the interpolation power, while using sigmoids rather than thresholds allows doubling both. For other measures of classification, including the Vapnik-Chervonenkis dimension, the effect of direct connections or sigmoidal activations is studied in the special case of two-dimensional inputs.


  32. E.D. Sontag. Capabilities and training of feedforward nets. In Neural networks (New Brunswick, NJ, 1990), pages 303-321. Academic Press, Boston, MA, 1991. [PDF] Keyword(s): neural networks, neural networks.
    Abstract:
    This paper surveys recent work by the author on learning and representational capabilities of feedforward nets. The learning results show that, among two possible variants of the so-called backpropagation training method for sigmoidal nets, both of which variants are used in practice, one is a better generalization of the older perceptron training algorithm than the other. The representation results show that nets consisting of sigmoidal neurons have at least twice the representational capabilities of nets that use classical threshold neurons, at least when this increase is quantified in terms of classification power. On the other hand, threshold nets are shown to be more useful when approximating implicit functions, as illustrated with an application to a typical control problem.


  33. H. T. Siegelmann and E.D. Sontag. Turing computability with neural nets. Appl. Math. Lett., 4(6):77-80, 1991. [PDF] Keyword(s): neural networks, computational complexity, recurrent neural networks.
    Abstract:
    This paper shows the existence of a finite neural network, made up of sigmoidal neurons, which simulates a universal Turing machine. It is composed of less than 100,000 synchronously evolving processors, interconnected linearly. High-order connections are not required. (Note: this paper was placed here by special request. The results in this paper have been by now improved considerably: see the JCSS pape which among other aspects provides a polynomial time simulation. This paper, based on a unary encoding, results in an exponential slowdown).


  34. E.D. Sontag and H.J. Sussmann. Back propagation separates where perceptrons do. Neural Networks, 4(2):243-249, 1991. [PDF] [doi:http://dx.doi.org/10.1016/0893-6080(91)90008-S] Keyword(s): neural networks, neural networks.
    Abstract:
    Feedforward nets with sigmoidal activation functions are often designed by minimizing a cost criterion. It has been pointed out before that this technique may be outperformed by the classical perceptron learning rule, at least on some problems. In this paper, we show that no such pathologies can arise if the error criterion is of a threshold LMS type, i.e., is zero for values ``beyond'' the desired target values. More precisely, we show that if the data are linearly separable, and one considers nets with no hidden neurons, then an error function as above cannot have any local minima that are not global. In addition, the proof gives the following stronger result, under the stated hypotheses: the continuous gradient adjustment procedure is such that from any initial weight configuration a separating set of weights is obtained in finite time. This is a precise analogue of the Perceptron Learning Theorem. The results are then compared with the more classical pattern recognition problem of threshold LMS with linear activations, where no spurious local minima exist even for nonseparable data: here it is shown that even if using the threshold criterion, such bad local minima may occur, if the data are not separable and sigmoids are used. keywords = { neural networks , feedforward neural nets },


  35. E.D. Sontag. Sigmoids distinguish more efficiently than Heavisides. Neural Computation, 1:470-472, 1989. [PDF] Keyword(s): neural networks, boolean systems.
    Abstract:
    Every dichotomy on a 2k-point set in Rn can be implemented by a neural net with a single hidden layer containing k sigmoidal neurons. If the neurons were of a hardlimiter (Heaviside) type, 2k-1 would be in general needed.


  36. E.D. Sontag and H.J. Sussmann. Backpropagation can give rise to spurious local minima even for networks without hidden layers. Complex Systems, 3(1):91-106, 1989. [PDF] Keyword(s): neural networks, neural networks.
    Abstract:
    We give an example of a neural net without hidden layers and with a sigmoid transfer function, together with a training set of binary vectors, for which the sum of the squared errors, regarded as a function of the weights, has a local minimum which is not a global minimum. The example consists of a set of 125 training instances, with four weights and a threshold to be learnt. We do not know if substantially smaller binary examples exist.


Conference articles
  1. T. Natschläger, W. Maass, E.D. Sontag, and A. Zador. Processing of time series by neural circuits with biologically realistic synaptic dynamics. In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, Advances in Neural Information Processing Systems 13 (NIPS2000), pages 145-151, 2000. MIT Press, Cambridge. [PDF] Keyword(s): neural networks, Volterra series.
    Abstract:
    Experimental data show that biological synapses are dynamic, i.e., their weight changes on a short time scale by several hundred percent in dependence of the past input to the synapse. In this article we explore the consequences that this synaptic dynamics entails for the computational power of feedforward neural networks. It turns out that even with just a single hidden layer such networks can approximate a surprisingly large large class of nonlinear filters: all filters that can be characterized by Volterra series. This result is robust with regard to various changes in the model for synaptic dynamics. Furthermore we show that simple gradient descent suffices to approximate a given quadratic filter by a rather small neural system with dynamic synapses.


  2. W. Maass and E.D. Sontag. A precise characterization of the class of languages recognized by neural nets under Gaussian and other common noise distributions. In Proceedings of the 1998 conference on Advances in neural information processing systems II, Cambridge, MA, USA, pages 281-287, 1999. MIT Press. Keyword(s): neural networks.


  3. E.D. Sontag and Y. Qiao. Remarks on controllability of recurrent neural networks. In Proc. IEEE Conf. Decision and Control, Tampa, Dec. 1998, IEEE Publications, 1998, pages 501-506, 1998. Keyword(s): neural networks, recurrent neural networks.


  4. E.D. Sontag. Some learning and systems-theoretic questions regarding recurrent neural networks. In Proc. Conf. on Information Sciences and Systems (CISS 97), Johns Hopkins, Baltimore, MD, March 1997, pages 630-635, 1997. Keyword(s): neural networks, VC dimension, recurrent neural networks.


  5. B. Dasgupta and E.D. Sontag. Sample complexity for learning recurrent perceptron mappings. In D.S. Touretzky, M.C. Moser, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 204-210, 1996. MIT Press, Cambridge, MA. Keyword(s): neural networks, VC dimension, recurrent neural networks.


  6. P. Koiran and E.D. Sontag. Neural networks with quadratic VC dimension. In D.S. Touretzky, M.C. Moser, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 197-203, 1996. MIT Press, Cambridge, MA. Keyword(s): neural networks, VC dimension.


  7. E.D. Sontag. Critical points for neural net least-squares problems. In Proc. 1995 IEEE Internat. Conf. Neural Networks, IEEE Publications, 1995, pages 2949-2954, 1995. Keyword(s): neural networks.


  8. B. DasGupta, H. T. Siegelmann, and E.D. Sontag. On a learnability question associated to neural networks with continuous activations (extended abstract). In COLT '94: Proceedings of the seventh annual conference on Computational learning theory, New York, NY, USA, pages 47-56, 1994. ACM Press. [doi:http://doi.acm.org/10.1145/180139.181009] Keyword(s): analog computing, neural networks, computational complexity, machine learning.


  9. R. Koplon and E.D. Sontag. Techniques for parameter reconstruction in Fourier-Neural recurrent networks. In Proc. IEEE Conf. Decision and Control, Orlando, Dec. 1994, IEEE Publications, 1994, pages 213-218, 1994. Keyword(s): neural networks, recurrent neural networks.


  10. F. Albertini and E.D. Sontag. Identifiability of discrete-time neural networks. In Proc. European Control Conf., Groningen, June 1993, pages 460-465, 1993. Keyword(s): neural networks, recurrent neural networks.


  11. F. Albertini and E.D. Sontag. State observability in recurrent neural networks. In Proc. IEEE Conf. Decision and Control, San Antonio, Dec. 1993, IEEE Publications, 1993, pages 3706-3707, 1993. Keyword(s): neural networks, observability, recurrent neural networks.


  12. F. Albertini and E.D. Sontag. Uniqueness of weights for recurrent nets. In Systems and Networks: Mathematical Theory and Applications, Proc. MTNS '93, Vol. 2, Akad. Verlag, Regensburg, pages 599-602, 1993. Note: Full version, never submitted for publication, is here: http://www.math.rutgers.edu/(tilde)sontag/FTPDIR/93mtns-nn-extended.pdf. [PDF] Keyword(s): neural networks, identifiability, recurrent neural networks.
    Abstract:
    This paper concerns recurrent networks x'=s(Ax+Bu), y=Cx, where s is a sigmoid, in both discrete time and continuous time. The paper establishes parameter identifiability under stronger assumptions on the activation than in "For neural networks, function determines form", but on the other hand deals with arbitrary (nonzero) initial states.


  13. J. L. Balcázar, R. Gavaldà, H. T. Siegelmann, and E.D. Sontag. Some structural complexity aspects of neural computation. In Proceedings of the Eighth Annual Structure in Complexity Theory Conference (San Diego, CA, 1993), Los Alamitos, CA, pages 253-265, 1993. IEEE Comput. Soc. Press. Keyword(s): analog computing, neural networks, computational complexity, super-Turing computation, theory of computing and complexity.


  14. C. Darken, M.J. Donahue, L. Gurvits, and E.D. Sontag. Rate of approximation results motivated by robust neural network learning. In COLT '93: Proceedings of the sixth annual conference on Computational learning theory, New York, NY, USA, pages 303-309, 1993. ACM Press. [doi:http://doi.acm.org/10.1145/168304.168357] Keyword(s): neural networks, optimization problems, approximation theory.


  15. A. Macintyre and E.D. Sontag. Finiteness results for sigmoidal neural networks. In STOC '93: Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, New York, NY, USA, pages 325-334, 1993. ACM Press. [PDF] [doi:http://doi.acm.org/10.1145/167088.167192] Keyword(s): neural networks, theory of computing and complexity.
    Abstract:
    This paper deals with analog circuits. It establishes the finiteness of VC dimension, teaching dimension, and several other measures of sample complexity which arise in learning theory. It also shows that the equivalence of behaviors, and the loading problem, are effectively decidable, modulo a widely believed conjecture in number theory. The results, the first ones that are independent of weight size, apply when the gate function is the "standard sigmoid" commonly used in neural networks research. The proofs rely on very recent developments in the elementary theory of real numbers with exponentiation. (Some weaker conclusions are also given for more general analytic gate functions.) Applications to learnability of sparse polynomials are also mentioned.


  16. H.T. Siegelmann and E.D. Sontag. Analog computation via neural networks. In Proc. 2nd Israel Symposium on Theory of Computing and Systems (ISTCS93), IEEE Computer Society Press, 1993, 1993. Keyword(s): analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks.


  17. F. Albertini and E.D. Sontag. For neural networks, function determines form. In Proc. IEEE Conf. Decision and Control, Tucson, Dec. 1992, IEEE Publications, 1992, pages 26-31, 1992. Keyword(s): neural networks, recurrent neural networks.


  18. H.T. Siegelmann and E.D. Sontag. On the computational power of neural nets. In COLT '92: Proceedings of the fifth annual workshop on Computational learning theory, New York, NY, USA, pages 440-449, 1992. ACM Press. [doi:http://doi.acm.org/10.1145/130385.130432] Keyword(s): analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks.


  19. H.T. Siegelmann and E.D. Sontag. Some results on computing with neural nets. In Proc. IEEE Conf. Decision and Control, Tucson, Dec. 1992, IEEE Publications, 1992, pages 1476-1481, 1992. Keyword(s): analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks.


  20. H.T. Siegelmann, E.D. Sontag, and C.L. Giles. The Complexity of Language Recognition by Neural Networks. In Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture - Information Processing '92, Volume 1, pages 329-335, 1992. North-Holland. Keyword(s): neural networks, computational complexity, machine learning, recurrent neural networks, theory of computing and complexity.


  21. E.D. Sontag. Neural nets as systems models and controllers. In Proc. Seventh Yale Workshop on Adaptive and Learning Systems, Yale University, 1992, pages 73-79, 1992. [PDF] Keyword(s): neural networks, recurrent neural networks, neural networks.
    Abstract:
    A conference paper. Placed here because it was requested, but contains little that is not also contained in the survey on neural nets mentioned above.


  22. E.D. Sontag. Systems combining linearity and saturations, and relations to neural nets. In Nonlinear Control Systems Design 1992, IFAC Symposia Series, 1993, M. Fliess Ed., Pergamon Press, Oxford, 1993, pages 15-21, 1992. Note: (Also in Proc. Nonlinear Control Systems Design Symp., Bordeaux, June 1992, M. Fliess, Ed., IFAC Publications, pp. 242-247). Keyword(s): neural networks, recurrent neural networks.


  23. W. Maass, G. Schnitger, and E.D. Sontag. On the computational power of sigmoid versus Boolean threshold circuits (extended abstract). In Proceedings of the 32nd annual symposium on Foundations of computer science, Los Alamitos, CA, USA, pages 767-776, 1991. IEEE Computer Society Press. Keyword(s): neural networks, theory of computing and complexity.


  24. T. Asano, J. Hershberger, J. Pach, E.D. Sontag, D. Souivaine, and S. Suri. Separating bi-chromatic points by parallel lines. In Proceedings of the Second Canadian Conf. on Computational Geometry, Ottawa, Canada, 1990, pages 46-49, 1990. [PDF] Keyword(s): computational geometry.
    Abstract:
    Given a 2-coloring of the vertices of a regular n-gon P, how many parallel lines are needed to separate the vertices into monochromatic subsets? We prove that floor(n/2) is a tight upper bound, and also provide an O(n log n) time algorithm to determine the direction that gives the minimum number of lines. If the polygon is a non-regular convex polygon, then n-3 lines may be necessary, while n-2 lines always suffice. This problem arises in machine learning and has implications about the representational capabilities of some neural networks.


  25. H. Dewan and E.D. Sontag. Extrapolatory methods for speeding up the BP algorithm. In Proc. Int. Joint Conf. on Neural Networks, Washington, DC, Jan. 1990, IEEE Publications, 1990, pages I.613-616, 1990. Keyword(s): neural networks.


  26. E.D. Sontag. Comparing sigmoids and heavisides. In Proc. Conf. Info. Sci. and Systems, Princeton, 1990, pages 654-659, 1990. Keyword(s): neural networks, boolean systems.


  27. E.D. Sontag. Remarks on interpolation and recognition using neural nets. In NIPS-3: Proceedings of the 1990 conference on Advances in neural information processing systems 3, San Francisco, CA, USA, pages 939-945, 1990. Morgan Kaufmann Publishers Inc.. Keyword(s): neural networks.


  28. E.D. Sontag and H.J. Sussmann. Remarks on local minima in backpropagation. In Proc. Conf. Info. Sciences and Systems, Johns Hopkins University Press, 1989, pages 432-435, 1989. Keyword(s): neural networks.


Internal reports
  1. E.D. Sontag. Sigmoids distinguish more efficiently than Heavisides. Technical report SYCON-89-12, Rutgers Center for Systems and Control, 1989. Keyword(s): neural networks.


  2. E.D. Sontag. Some remarks on the backpropagation algorithm for neural net learning. Technical report SYCON-88-02, Rutgers Center for Systems and Control, 1988. [PDF] Keyword(s): neural networks, neural networks.
    Abstract:
    This is a very old informal report that discusses the study of local minima of quadratic loss functions for fitting errors in sigmoidal neural net learning. It also includes several remarks concerning the growth of weights during gradient descent. There is nothing very interesting here - far better knowledge is now available - but the report was placed here by request.



BACK TO INDEX




Disclaimer:

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders.




Last modified: Thu Nov 23 10:40:56 2017
Author: sontag.


This document was translated from BibTEX by bibtex2html