Browsing by Author "Pati, Debdeep"
Now showing 1 - 2 of 2
Research Project: Collaborative Research: Scalable Bayesian Methods for Complex Data with Optimality Guarantees
Statistics; TAMU; https://hdl.handle.net/20.500.14641/662; National Science Foundation

Spectacular advances in data acquisition, processing, and storage present the opportunity to analyze datasets of ever-increasing size and complexity in applications such as social and biological networks, epidemiology, genomics, and Internet recommender systems. Underlying the massive size and dimension of these data, there is often a parsimonious structure. The Bayesian approach to statistical inference is attractive in this context: it incorporates structural assumptions through prior distributions, enables probabilistic modeling of complex phenomena, and provides an automatic characterization of uncertainty. This research project aims to advance the elicitation and translation of prior knowledge about the low-dimensional skeleton of big data, providing realistic uncertainty characterization while maintaining computational efficiency. Bayesian computation poses substantial challenges in high-dimensional and big data problems. The research aims to develop cutting-edge computational strategies and software packages, with implementations to be made publicly available. The project involves graduate students in the research.

The research project focuses on theoretical foundations and computational strategies for Bayesian methods in high-dimensional and big data problems motivated by applications in social networks and epidemiology. Techniques for systematically developing and evaluating prior distributions in high-dimensional problems will be investigated, with special emphasis on the trade-off between statistical efficiency and computational scalability. Specific directions include efficient algorithms for posterior sampling with shrinkage priors, a theoretical framework for divide-and-conquer strategies in big data problems (a toy consensus Monte Carlo sketch appears at the end of this page), fast algorithms for clustering nodes in large networks with an unknown number of communities, and methods for discovering structure in sparse contingency tables. The algorithms will be motivated by a rigorous theoretical understanding of the behavior of the posterior distribution, with particular emphasis on proper quantification of uncertainty in a distributed computing framework. Software will be developed for each application.

Research Project: Prior Calibration and Algorithmic Guarantees Under Parameter Restrictions
Statistics; TAMU; https://hdl.handle.net/20.500.14641/662; National Science Foundation

Statistical learning of many real systems can be significantly enhanced by harnessing and translating domain knowledge into meaningful parameter restrictions. With the advent of high-throughput datasets, such restrictions often apply to high-dimensional parameter spaces, complicating inference. This research aims to develop novel statistical methods and computational algorithms for such problems, drawing motivation from a number of real applications. Working within a nonparametric Bayes framework, the first part of the research project emphasizes the importance of calibrating prior distributions in these constrained problems and theoretically quantifying the impact of the constraints on parameter learning. The second part aims to develop efficient Markov chain Monte Carlo and variational algorithms for these problems and to analyze their convergence behavior.
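As a toy illustration of posterior computation on a restricted parameter space (a minimal sketch, not the project's actual algorithms), the following random-walk Metropolis sampler targets a posterior whose mean parameter is constrained to be nonnegative; the half-normal prior, simulated data, and step size are all illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated data from a normal model whose mean is restricted to be >= 0.
    y = rng.normal(loc=1.5, scale=1.0, size=50)

    def log_post(theta, y, sigma=1.0, tau=2.0):
        # Half-normal(tau) prior times N(theta, sigma^2) likelihood; returning
        # -inf below zero encodes the parameter restriction in the target itself.
        if theta < 0:
            return -np.inf
        return -0.5 * theta**2 / tau**2 - 0.5 * np.sum((y - theta) ** 2) / sigma**2

    theta = 1.0
    lp = log_post(theta, y)
    draws = []
    for _ in range(5000):
        prop = theta + 0.3 * rng.normal()          # random-walk proposal
        lp_prop = log_post(prop, y)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            theta, lp = prop, lp_prop
        draws.append(theta)

    print("posterior mean estimate:", np.mean(draws[1000:]))

Rejecting proposals through a zero-mass prior is the simplest way to respect a constraint, but it scales poorly as constraints multiply; the prior-calibration question studied in this project is precisely how such restrictions should shape the prior and the sampler in higher dimensions.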
The PIs will also propose undergraduate courses focusing on the modeling and applied components of Bayesian methods. When teaching the courses, the PIs will use everyday as well as scientific examples across disciplines to inspire students' learning. The Activity-Based Learning (ABL) courses aim to enrich students' academic experience and learning outcomes by connecting theory with practice and concepts with methods, using data and insights obtained through engagement with the larger world.

The research project is motivated by statistical and computing challenges posed by a number of real scientific applications in which complex restrictions are placed on key parameters, necessitating novel statistical methods and associated computational algorithms. Operating in a Bayesian paradigm, which enables incorporation of constraints in a principled framework and provides readily available uncertainty estimates often sought in scientific applications, major emphasis will be placed on calibration of prior distributions on these constrained spaces. Examples will be provided where seemingly innocuous prior choices routinely used in practice can lead to biased inferences in specific situations. A rigorous theoretical understanding of such phenomena will be provided, along with the development of alternative default priors on these constrained spaces. The methodological and theoretical developments will be accompanied by efficient computational algorithms that use novel approximation techniques in the context of Markov chain Monte Carlo and variational algorithms and that meet the scalability demanded by the specific applications and beyond. The algorithm development will be paralleled by novel convergence analysis, bridging ideas between the optimization and sampling literatures.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
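Referring back to the first project's divide-and-conquer direction, the sketch below illustrates the general consensus Monte Carlo idea (Scott et al., 2016) on a Gaussian toy model; the shard count, simulated data, and conjugate subposterior form are illustrative assumptions, not the project's method:

    import numpy as np

    rng = np.random.default_rng(1)

    # Simulated "big" dataset: normal with unknown mean, known sigma = 1.
    y = rng.normal(loc=2.0, scale=1.0, size=100_000)
    shards = np.array_split(y, 10)          # divide the data across 10 workers

    # Each worker samples its own Gaussian subposterior (flat prior, known
    # variance): theta | y_s ~ N(mean(y_s), 1/len(y_s)).
    n_draws = 4000
    sub_draws, weights = [], []
    for ys in shards:
        var_s = 1.0 / len(ys)
        sub_draws.append(rng.normal(ys.mean(), np.sqrt(var_s), size=n_draws))
        weights.append(1.0 / var_s)         # precision weight for the combine step

    # Conquer step: precision-weighted average of per-shard draws.
    W = np.array(weights)
    combined = (W[:, None] * np.array(sub_draws)).sum(axis=0) / W.sum()

    print("combined posterior mean:", combined.mean())
    print("full-data posterior mean:", y.mean())

For Gaussian subposteriors the precision-weighted combination recovers the full-data posterior exactly; the theoretical framework described in the first project concerns, among other things, when such distributed combinations still deliver proper uncertainty quantification in non-Gaussian settings.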