Research Project: Collaborative Research: New Perspectives on Deep Learning: Bridging Approximation, Statistical, and Algorithmic Theories
Co-Principal Investigators
- Devore, Ronald
Abstract or Project Summary
Deep Learning (DL) has led to a renaissance in neural network methods in data-driven science and engineering. The development of DL systems and applications, including computer vision and natural language understanding, has been driven primarily by experiments and engineering practice; mathematical analysis has only begun to provide insight into these complex machine learning systems. This lack of basic understanding has contributed to serious shortcomings, ranging from fragility and susceptibility to corrupted data to uninterpretable behavior. These problems can be traced to fundamental gaps in the mathematical understanding of DL. This project tackles the challenge by bringing approximation, statistical, and algorithmic theories together to develop new mathematical foundations for DL. The goals of the project are to mathematically characterize the strengths and limitations of DL models, to understand the properties of DL models trained on examples of desired behavior (training data), and to understand the tradeoffs between the performance of DL systems and the size of the training dataset. While DL is already in widespread use, its continued success requires a far more complete mathematical understanding and principled approaches to guide its use and reliable application. The project will provide practitioners with clearer guidance on the strengths, limitations, and best approaches to using DL. Broader impacts of the project also include education and mentoring, including the training of graduate students in mathematical fields such as approximation theory, signal processing, statistics, and machine learning and, most importantly, in how these fields collectively inform the theory and practice of DL.
DL seeks to learn an unknown function from data using compositions (layers) of linear combinations of simple functions (neurons). The shortcomings of DL can be traced to fundamental gaps in its mathematical theory, including the following issues: the function spaces that capture the salient properties of DL applications are poorly understood; the characteristics of functions learned through neural network training remain mysterious; the ability of DL models to discriminate between data distributions has not yet been quantified satisfactorily; and understanding of the tradeoffs between accuracy and training set size is lacking. To tackle these challenges, the project builds innovative bridges between approximation theory, nonparametric statistics, learning theory, and algorithms to develop new mathematical foundations for DL. This includes the development of new model classes of functions that are naturally suited to characterizing the properties, strengths, and limitations of deep neural network architectures and applications; novel approaches to understanding the roles of regularization and sparsity in DL; fundamental frameworks for quantifying the discrimination power of DL and generative adversarial networks; and innovative theory to make DL algorithms more data efficient through the use of side information, partial differential equations, and richer forms of data than conventional function evaluations.
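To make the "compositions of linear combinations of simple functions" view concrete, the following is a minimal sketch, not part of the project itself: a fully connected network evaluated as a composition of affine maps and an elementwise nonlinearity. The choice of ReLU as the neuron, the function names, and the layer dimensions are all illustrative assumptions.

```python
import numpy as np

def relu(z):
    """A simple neuron nonlinearity: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def deep_net(x, weights, biases):
    """Evaluate f(x) = W_L * sigma(... sigma(W_1 x + b_1) ...) + b_L,
    i.e. a composition of layers, each a linear combination of its
    inputs followed by a simple nonlinearity."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(W @ x + b)              # one hidden layer
    return weights[-1] @ x + biases[-1]  # final affine (output) layer

# Example: a randomly initialized 3-layer network mapping R^4 -> R^1.
rng = np.random.default_rng(0)
dims = [4, 8, 8, 1]
weights = [rng.standard_normal((m, n)) for n, m in zip(dims[:-1], dims[1:])]
biases = [rng.standard_normal(m) for m in dims[1:]]
print(deep_net(rng.standard_normal(4), weights, biases))
```

In this view, "learning" means choosing the weights and biases from training data; the project's questions concern which functions such compositions can represent, what training actually selects, and how accuracy scales with the amount of data.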