RESEARCH INTERESTS:
David van Dyk's scholarly work focuses on methodological and computational issues involved in the Bayesian analysis of highly structured statistical models and emphasizes serious interdisciplinary research, especially in astrophysics, solar physics, and particle physics. Today's data analysis pipelines often involve a network of research groups, where the output from one group's analysis is an input to subsequent analyses. Van Dyk is interested in developing principled methods for uncertainty quantification in such settings, particularly when researchers in the analysis chain use different but possibly overlapping data sets and/or employ non-congenial model assumptions. In the context of Astrostatistics, he aims to develop models and methods that capture the complexities inherent in astrophysical data, including varying detector sensitivities, observational errors, background contamination, selection effects, overlapping sources, etc. These methods must be robust to patterns of missing data and/or non-representative data. The overall models typically exhibit multilevel or hierarchical structures and combine computationally efficient data-driven or learning methods with science-driven models; they are designed to leverage efficient computational techniques while enabling principled estimation and uncertainty quantification for scientifically meaningful parameters. Model selection, model checking, and sensitivity-analysis techniques are fundamental in the context of these models.
Van Dyk is particularly interested in using the complexity of the data, instruments, and models used in astrophysics to develop general-purpose statistical methods, for example, to improve the efficiency of computationally intensive methods involving data augmentation, such as EM-type algorithms and various Markov chain Monte Carlo methods.
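To make the data-augmentation idea concrete, here is a minimal EM sketch for a toy two-component Gaussian mixture, in which the unobserved component labels play the role of the augmented "missing" data. The model, data, and variable names below are illustrative assumptions, not drawn from van Dyk's papers.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: a two-component Gaussian mixture (illustrative only).
y = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 1.0, 100)])

# EM treats the unknown component labels as missing data (the augmentation).
pi, mu1, mu2, sigma = 0.5, -1.0, 1.0, 1.0
for _ in range(200):
    # E-step: posterior probability that each point belongs to component 1.
    d1 = pi * np.exp(-0.5 * ((y - mu1) / sigma) ** 2)
    d2 = (1 - pi) * np.exp(-0.5 * ((y - mu2) / sigma) ** 2)
    w = d1 / (d1 + d2)
    # M-step: update parameters via expected complete-data sufficient statistics.
    pi = w.mean()
    mu1 = (w * y).sum() / w.sum()
    mu2 = ((1 - w) * y).sum() / (1 - w).sum()
    sigma = np.sqrt((w * (y - mu1) ** 2 + (1 - w) * (y - mu2) ** 2).mean())

print(pi, mu1, mu2, sigma)
```

Each E-step/M-step pair provably does not decrease the observed-data likelihood; much of the methodological work in this area concerns choosing the augmentation scheme so that such iterations converge quickly.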
Professor van Dyk founded and continues to coordinate the CHASC International Center for Astrostatistics. This group includes astronomers and statisticians and provides open forums for discussing statistical issues that arise in astrophysical research.
INTRODUCTORY READING:
In these pages you can find a (nearly) complete list of David van Dyk's publications. Here we mention a selection of his research articles that serve as an introduction to various threads of his work.
A recent paper, for example, describes a general-purpose method for correcting for non-representative training sets in machine learning using techniques for causal inference in observational studies (Download paper). A 2018 JASA paper shows how work accounting for calibration uncertainty in astronomical instruments led to a new family of multiplicative shrinkage estimators (Download paper). Work on a common statistical testing problem in particle physics where the standard regularity conditions of the Likelihood Ratio Test fail led van Dyk and collaborators to a new general testing strategy (Download paper). To address computational challenges arising in the Bayesian fitting of complex astrophysical models, van Dyk developed a number of new Markov chain Monte Carlo techniques (e.g., the Partially Collapsed Gibbs Sampler, the MH within PCG sampler, and the Repelling-Attracting Metropolis Algorithm).
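Those samplers are specialized, but they all build on the basic Metropolis accept/reject step. As a point of reference only (not an implementation of the samplers named above), here is a minimal random-walk Metropolis sketch targeting a bimodal toy density, the kind of multimodal posterior that motivates methods such as the Repelling-Attracting Metropolis Algorithm. The target, step size, and chain length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative bimodal target: an equal mixture of two unit-variance
# normals centered at -3 and +3, in log scale (up to a constant).
def log_target(x):
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

# Plain random-walk Metropolis; specialized samplers modify the proposal
# and conditioning structure of schemes like this one.
x, step, draws = 0.0, 1.0, []
for _ in range(10000):
    prop = x + step * rng.normal()
    # Accept with probability min(1, target(prop) / target(x)).
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop
    draws.append(x)

print(np.mean(draws), np.std(draws))
```

With a small step size, a chain like this can linger in one mode for long stretches, which is exactly the failure mode that mode-jumping proposals are designed to overcome.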
An introduction to Professor van Dyk's earlier work on efficient statistical computation can be found in his review paper on EM-type algorithms and Data Augmentation algorithms, jointly authored with Xiao-Li Meng (Download paper).
If you are interested in van Dyk's work on statistical methods in astronomy, you might want to read the CHASC review paper on the analysis of high-resolution high-energy astrophysical spectra and images (Download paper) or a 2008 AoAS paper on the analysis of stellar evolution using color-magnitude diagrams (Download paper). His 2014 review paper describes the statistical methods used by particle physicists in the search for and discovery of the Higgs boson (Download paper).
The gravitational field of a galaxy can act as a lens and deflect the light emitted by a more distant object, sometimes causing multiple images of the same object to appear in the sky. Since the light in each gravitationally lensed image traverses a different path length from the source to the Earth, fluctuations in the source brightness are observed in the several images at different times. The time delay between these fluctuations can be inferred from the time series of brightness data, or light curves, of each image. This delay can then be used to constrain cosmological parameters. This paper introduces a new Bayesian method to accurately estimate this time delay.
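The paper develops a fully Bayesian approach; as a much cruder illustration of the underlying estimation task, the sketch below simulates two light curves in which one is a noisy, time-shifted copy of the other, and recovers the delay by a simple grid search over shifts. The signal, noise level, and grid are illustrative assumptions and are not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated light curves of two lensed images: the second is a
# time-shifted, noisy copy of the first (all values illustrative).
t = np.linspace(0.0, 100.0, 400)
true_delay = 12.5

def source(s):
    return np.sin(0.2 * s) + 0.5 * np.sin(0.05 * s)

curve1 = source(t) + 0.05 * rng.normal(size=t.size)
curve2 = source(t - true_delay) + 0.05 * rng.normal(size=t.size)

# Shift curve1 by a candidate delay, interpolate onto the shared grid,
# and score the mean squared discrepancy over the overlapping times.
def mse_at_delay(delay):
    shifted = np.interp(t, t + delay, curve1)
    keep = (t >= t[0] + max(delay, 0.0)) & (t <= t[-1] + min(delay, 0.0))
    return np.mean((shifted[keep] - curve2[keep]) ** 2)

delays = np.linspace(0.0, 30.0, 601)
est = delays[np.argmin([mse_at_delay(d) for d in delays])]
print(f"estimated delay: {est:.2f} (true: {true_delay})")
```

A Bayesian treatment replaces this point estimate with a posterior distribution over the delay, typically by modeling the latent source light curve explicitly, so that the uncertainty propagates into the cosmological constraints.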
In this paper you can read about how Bayesian hierarchical modeling of observations of a specific type of supernova can be used to map the expansion history of the Universe. Finally, a 2024 paper shows how modern machine learning methods can be used to build a Bayesian prior distribution that dramatically improves the quality of fitted images of faint high-energy sources while enabling the quantification of uncertainty in image features (Download paper).