Monday, January 13, 2025
9:40 am
Welcome & Introduction
Yisong Yue, Caltech
9:45 am
AI-Assisted Approaches to Data Collection and Inference
Tijana Zrnic
Recent breakthroughs in AI offer tremendous potential to reduce the costs of data collection. For example, there is growing interest in leveraging large language models (LLMs) as efficient substitutes for human judgment in tasks such as model evaluation and survey research. However, AI systems are not without flaws—generative language models often lack factual accuracy, and predictive models remain vulnerable to subtle perturbations. These issues are particularly concerning when critical decisions, such as scientific discoveries or policy choices, rely on AI-generated outputs. In this talk, I will present recent and ongoing work on AI-assisted approaches to data collection and statistical inference. Rather than treating AI as a replacement for data collection, our methods leverage AI to strategically guide data collection and improve the power of subsequent inferences, all the while retaining provable validity guarantees. I will demonstrate the benefits of this methodology through examples from computational social science, proteomics, and more.
11:00 am
Towards an algorithmic theory for high-dimensional statistical inference
Sidhanth Mohanty
When does a statistical inference problem admit an efficient algorithm? An emerging body of research studies this question by trying to understand the power and limitations of various algorithmic paradigms for statistical inference: convex programming, Markov chain Monte Carlo (MCMC) algorithms, and message-passing algorithms, to name a few.
Of these, MCMC algorithms are easy to adapt to new inference problems and perform strongly in practice, which makes them promising as a universal algorithm for inference. However, provable guarantees for MCMC have been scarce, lacking even for simple toy models of inference.
In this talk, I will survey some recent strides my collaborators and I have made toward provable guarantees for MCMC in inference, along with some of the mathematical tools we developed to study the behavior of slow-mixing Markov chains.
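For readers unfamiliar with the paradigm, a minimal Metropolis sampler shows what an MCMC algorithm looks like. This toy example (the `metropolis` function and the Gaussian target) is an illustrative sketch of the general paradigm only, not the algorithms or models analyzed in the talk.

```python
import math
import random

random.seed(0)

def metropolis(log_density, x0, n_steps, step=1.0):
    """Metropolis sampler with a symmetric Gaussian proposal.

    Proposes a local move and accepts it with probability
    min(1, p(proposal) / p(current)), so the chain's stationary
    distribution is the (unnormalized) target density p.
    """
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step)
        if math.log(random.random()) < log_density(proposal) - log_density(x):
            x = proposal
        samples.append(x)
    return samples

# Toy target: standard normal, log p(x) = -x^2 / 2 up to a constant.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=20_000)
mean = sum(samples) / len(samples)
```

The subtlety the talk addresses is exactly what this sketch hides: for hard inference problems, such chains can mix slowly, and proving guarantees about their behavior is the difficult part.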
1:30 pm
Many Facets of Boolean Function Analysis
Kewen Wu
Boolean functions are perhaps the most basic objects in computer science. Analyzing them has led to numerous exciting questions, rich theories, and surprising connections to other topics.
Among many facets of Boolean function analysis, I will present my explorations on
(1) "How efficient can proofs be"
A probabilistically checkable proof (PCP) is a short proof that can be verified by reading only a constant number of locations. The celebrated PCP theorem asserts that the Boolean formula satisfiability problem has a PCP over a binary alphabet with polynomial length. Relying on structured reductions and property testing of Boolean functions, my work provides a smooth trade-off between the alphabet size and proof length in the PCP theorem. This established the so-called "Parameterized Inapproximability Hypothesis" and received the Best Paper award at STOC'24.
(2) "Inevitable structure in chaos"
Ramsey theory is the study of finding structure in chaos. A famous problem in Ramsey theory dating back to 1960, the sunflower conjecture asks for sunflower structures in arbitrarily large systems of small sets. Using pseudorandomness and the switching lemma from Boolean function analysis, my work significantly advanced the quantitative bounds on the sunflower conjecture, which subsequently led to breakthroughs in theoretical computer science and mathematics. This work received the Best Paper award at STOC'20 and was later published in the Annals of Mathematics.
(3) "Quantum advantage, security, and cost"
Using various tools from Boolean function analysis, my work has established unconditional, exponential quantum advantages in many models and provided rigorous, optimal (post-quantum) security for generic cryptographic heuristics. To save expensive quantum resources in practice, my work has also designed optimal quantum circuits for these heuristics, which have been patented by Google.
2:45 pm
Finding the line between quantum and classical
Ewin Tang
Quantum computers—computers which exploit quantum mechanics—are poised to reshape the landscape of computation. But understanding when 'quantum' can help us solve algorithmic tasks is tricky, particularly for those tasks with the greatest potential for real-world impact. In this talk, I will survey my research in quantum algorithms to understand: where will a quantum computer be useful? And where will we instead be better off with our usual, classical computers?
4:00 pm
Learning Theoretic Foundations for Modern (Data) Science
Allen Liu
In this talk, I will explain how fundamental problems in computational learning theory are at the core of modern problems in machine learning and scientific applications and how algorithmic insights in mathematically tractable models can inspire new solutions in a wide variety of domains.
First, I will explore algorithmic foundations for model stealing of language models. Model stealing, where a learner tries to recover an unknown model through query access, is a critical problem in machine learning, as it threatens the security of proprietary models. Here, we aim to build a theoretical foundation. We give an efficient algorithm for learning any low-rank language model from queries, showing that a natural notion of the "rank" of a language model provides a useful proxy for the complexity of stealing.
I will then introduce Hamiltonian learning, a central computational task toward understanding and benchmarking quantum systems. Crucially, Hamiltonian learning is most relevant in the low-temperature regime, where interesting quantum phenomena arise, yet previous work gave efficient algorithms only at high temperature. We give an efficient algorithm at any temperature. I will highlight how the lens of learning theory plays a key role in identifying and circumventing previous barriers.