Below are the abstracts for the talks, in order of presentation.
Convergence and Optimality of the EM Algorithm Under Multi-Component Gaussian Mixture Models
Gaussian mixture models (GMMs) are fundamental statistical tools for modeling heterogeneous data. Due to the nonconcavity of the likelihood function, the Expectation-Maximization (EM) algorithm is widely used for parameter estimation of each Gaussian component. Existing analyses of the EM algorithm’s convergence to the true parameter focus on either the two-component case or multi-component settings with known mixing probabilities and isotropic covariance matrices.
In this work, we study the convergence of the EM algorithm for multi-component GMMs in full generality. The population-level EM is shown to converge to the true parameter when the smallest separation among all pairs of Gaussian components exceeds a logarithmic factor of the largest separation and the reciprocal of the minimal mixing probabilities. At the sample level, the EM algorithm is shown to be minimax rate-optimal, up to a logarithmic factor. We develop two distinct novel analytical approaches, each tailored to a different regime of separation, reflecting two complementary perspectives on the use of EM. As a byproduct of our analysis, we show that the EM algorithm, when used for community detection, also achieves the minimax optimal rate of misclustering error under milder separation conditions than spectral clustering and Lloyd’s algorithm, an interesting result in its own right. Our analysis allows the number of components, the minimal mixing probabilities, the separation between Gaussian components and the dimension to grow with the sample size. Simulation studies corroborate our theoretical findings.
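As a concrete illustration of the algorithm the abstract analyzes, here is a minimal EM iteration for a K-component one-dimensional Gaussian mixture with unknown weights, means, and variances. The data, initialization, and iteration count are illustrative choices, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated one-dimensional Gaussian components.
X = np.concatenate([rng.normal(-4.0, 1.0, 300), rng.normal(4.0, 1.0, 300)])

def em_gmm(X, K, n_iter=100):
    """EM for a K-component 1-D Gaussian mixture (weights, means, variances)."""
    n = len(X)
    mu = np.quantile(X, np.linspace(0.1, 0.9, K))  # crude quantile initialization
    var = np.full(K, X.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        logp = (-0.5 * (X[:, None] - mu) ** 2 / var
                - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        logp -= logp.max(axis=1, keepdims=True)   # stabilize before exponentiating
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates.
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * X[:, None]).sum(axis=0) / nk
        var = (r * (X[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

pi, mu, var = em_gmm(X, K=2)
print(np.sort(mu))  # estimated means, close to the true centers -4 and 4
```

The separation condition in the abstract matters here: with well-separated components and a reasonable initialization, the iterates converge to the true parameters; with overlapping components, EM can stall at poor stationary points.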
Sparsity Meets Low-Rank: A Stein-Based Framework for Unbiased Risk Estimation in Matrix Denoising
Low-rank matrix approximation serves as a cornerstone for recovering structured data in fields from medical imaging to recommendation systems. However, its performance hinges on the delicate calibration of regularization parameters, a task that remains non-trivial for estimators defined as solutions to regularized problems. This work leverages Stein’s Unbiased Risk Estimate (SURE) to provide a principled framework for risk estimation of spectral shrinkage estimators. By characterizing the differentiability of these estimators, we derive an intuitive interpretation of the risk formula by linking degrees of freedom to a notion of effective matrix rank. We also extend this framework to a broad class of spectral operators, specifically addressing singular value sparsity and slow decay patterns. Finally, we propose extensions for weighted cost functions and non-Gaussian noise, offering a path towards more robust low-rank methods that can be applied to high-stakes applications like clinical imaging.
Scalable Gaussian Process Inference via Deep Generative Models: A New Framework for High-Cadence Time Series
Hierarchical Bayesian Copula Model for Probabilistic Population Projection
Population forecasts inform critical decisions in public policy and economic planning, yet existing state-of-the-art methods often underestimate predictive uncertainty by modeling key demographic variables, such as fertility rates and life expectancy, as independent stochastic processes. This work proposes a hierarchical Bayesian framework for probabilistic population forecasting that models the joint dynamics of fertility and mortality across countries, regions, and time. Dependence structure will be represented using copulas, which allow flexible joint modeling while preserving interpretable marginal structures. The hierarchical design enables partial pooling across countries and regions, allowing countries with sparse data to be partially informed by broader regional patterns of demographic change while retaining country-specific trajectories. By producing more faithful joint predictive uncertainty quantification, this work delivers uncertainty-aware population projections that better support evidence-based policy and equitable decision-making for all countries worldwide. This project is still in progress.
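The copula idea above can be sketched in a few lines: a Gaussian copula induces dependence between two demographic quantities while leaving each marginal free. The marginal distributions and correlation below are purely hypothetical placeholders, not the project's fitted model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Gaussian copula linking two demographic quantities with arbitrary marginals.
# Illustrative marginals only: Beta-distributed fertility index, Gamma-distributed
# life-expectancy gain.
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal([0.0, 0.0], cov, size=10_000)
u = stats.norm.cdf(z)                        # correlated uniform marginals
fertility = stats.beta(2, 5).ppf(u[:, 0])    # transform to marginal 1
life_exp = stats.gamma(3).ppf(u[:, 1])       # transform to marginal 2

# The dependence survives the marginal transforms (rank correlation near rho).
print(round(stats.spearmanr(fertility, life_exp)[0], 2))
```

This separation of dependence from marginals is exactly what lets the hierarchical model pool dependence structure across countries while keeping country-specific marginal trajectories.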
A Spatiotemporal Analysis of Extreme Fire Weather in British Columbia: Identifying a Structural Change Point from 1981 to 2023
Daily Severity Rating (DSR), a component of the Canadian Fire Weather Index, reflects the intensity and persistence of fire-conducive conditions. This study evaluates whether extreme DSR has increased significantly in British Columbia from 1981 to 2023 and whether a structural change point marks an acceleration consistent with climate-driven warming. Using a 20×20 km grid derived from blended station and reanalysis data, we analyze the upper tail of the DSR distribution via annual 95th percentiles and quantify persistence using sliding windows of exposure days. Distribution-free scan permutation tests are applied to detect statistically significant change points. Climatic data were clustered to reveal internally homogeneous spatial fire-weather clusters, and cluster-specific change-point analyses were then conducted to identify localized shifts in upper-tail persistence that may be masked in province-wide summaries. Results identify a significant upward shift in extreme DSR and increased persistence of exposure windows, indicating an emerging regime of more sustained and severe fire-weather conditions across the province.
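The distribution-free scan-plus-permutation recipe can be sketched on a toy annual series. The data below are synthetic stand-ins (not the study's DSR grid), with a level shift injected at a known year:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical annual 95th-percentile series with an injected level shift.
years = np.arange(1981, 2024)
x = rng.normal(5.0, 1.0, len(years))
x[years >= 2005] += 2.0   # change point placed at 2005 for illustration

def scan_stat(x):
    """Scan over candidate split points; return the max two-sample statistic."""
    n = len(x)
    best, best_k = 0.0, None
    for k in range(5, n - 5):   # keep a few points on each side
        t = np.sqrt(k * (n - k) / n) * abs(x[:k].mean() - x[k:].mean())
        if t > best:
            best, best_k = t, k
    return best, best_k

obs, k_hat = scan_stat(x)

# Distribution-free calibration: permuting the series simulates the no-change null.
perm = [scan_stat(rng.permutation(x))[0] for _ in range(500)]
p_value = (1 + sum(s >= obs for s in perm)) / (1 + len(perm))
print(int(years[k_hat]), p_value)
```

Because the null distribution is obtained by permutation, no parametric assumption on the annual series is needed, which is the sense in which the test is distribution-free.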
Adaptive Window Selection for Financial Risk Forecasting
Risk forecasts in financial regulation and internal management are calculated from historical data. Unknown structural changes in financial data pose a substantial challenge in selecting an appropriate look-back window for risk modeling and forecasting. We develop a data-driven online learning method, called bootstrap-based adaptive window selection (BAWS), that adaptively determines the window size in a sequential manner. A central component of BAWS is to compare the realized scores against a data-dependent threshold, which is evaluated via a bootstrap procedure. The proposed method is applicable to forecasting risk measures that are elicitable individually or jointly, such as the Value-at-Risk (VaR) and the pair of VaR and the corresponding Expected Shortfall. Through simulation studies and empirical analyses, we demonstrate that BAWS generally outperforms the standard rolling window approach and the recently developed method of stability-based adaptive window selection, especially when there are structural changes in the data-generating process.
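The core tension the abstract describes, that no single look-back window is right when the data-generating process changes, is easy to demonstrate. The sketch below scores a short and a long rolling window for historical-simulation VaR on synthetic returns with a volatility break; it illustrates the window-selection problem only and does not implement BAWS or its bootstrap threshold:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic returns with a structural break in volatility at t = 500.
ret = np.concatenate([rng.normal(0, 1.0, 500), rng.normal(0, 3.0, 500)])

def var_forecast(history, alpha=0.05):
    """Historical-simulation VaR: the empirical lower alpha-quantile."""
    return np.quantile(history, alpha)

def pinball(var, r, alpha=0.05):
    """Quantile (pinball) loss, a strictly consistent scoring function for VaR."""
    return (alpha - (r < var)) * (r - var)

# Score a long and a short look-back window out-of-sample after the break.
scores = {}
for w in (100, 500):
    losses = [pinball(var_forecast(ret[t - w:t]), ret[t])
              for t in range(600, len(ret))]
    scores[w] = float(np.mean(losses))
print(scores)
```

After the break, the short window tracks the new regime while the long window's forecasts remain contaminated by pre-break data; an adaptive scheme like BAWS aims to pick the window size sequentially rather than fixing it in advance.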
Dynamic Pareto Optima in Multi-Period Pure-Exchange Economies
We study a problem of optimal allocation in a discrete-time multi-period pure-exchange economy, where agents have preferences over stochastic endowment processes that are represented by strongly time-consistent dynamic risk measures. We introduce the notion of dynamic Pareto-optimal allocation processes and show that such processes can be constructed recursively starting with the allocation at the terminal time. We further derive a comonotone improvement theorem for allocation processes, and we provide a recursive approach to constructing comonotone dynamic Pareto optima when the agents’ preferences are coherent and satisfy a property that we call equidistribution-preserving. In the special case where each agent’s dynamic risk measure is of the distortion type, we provide a closed-form characterization of comonotone dynamic Pareto optima. We illustrate our results in a two-period setting.
A Pragmatic Method for Comparing Clusterings with Overlaps and Outliers
Clustering algorithms are an essential part of the unsupervised data science ecosystem, and extrinsic evaluation of clustering algorithms requires a method for comparing the detected clustering to a ground truth clustering. In a general setting, the detected and ground truth clusterings may have outliers (objects belonging to no cluster), overlapping clusters (objects may belong to more than one cluster), or both, but methods for comparing these clusterings are currently undeveloped. In this note, we define a pragmatic similarity measure for comparing clusterings with overlaps and outliers, show that it has several desirable properties, and experimentally confirm that it is not subject to several common biases afflicting other clustering comparison measures.
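The talk's measure is not spelled out in the abstract; purely for intuition about the setting, the sketch below scores two clusterings that may contain overlaps (an object in several clusters) and outliers (an object in none) using symmetrized best-match Jaccard similarity, a simple baseline of my own construction, not the proposed measure:

```python
# Compare two clusterings given as lists of sets; overlaps and outliers allowed.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def one_sided(C1, C2):
    """Average, over clusters of C1, of the best Jaccard match in C2."""
    return sum(max(jaccard(c, d) for d in C2) for c in C1) / len(C1)

def clustering_similarity(C1, C2):
    """Symmetrize so neither clustering is privileged as 'ground truth'."""
    return 0.5 * (one_sided(C1, C2) + one_sided(C2, C1))

truth = [{1, 2, 3}, {3, 4, 5}]          # object 3 overlaps two clusters
found = [{1, 2}, {3, 4, 5}, {6}]        # object 6 is clustered, absent from truth
print(round(clustering_similarity(truth, found), 3))  # → 0.694
```

Note how the spurious singleton {6} drags the score down only through one direction of the average; biases of exactly this kind are what a carefully designed comparison measure must control.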
Sequential Probability Assignment against Smoothed Adversaries with Unknown Base Measure
Smoothed online learning has recently been studied as a way to bypass hardness results for the fully adversarial setting, which can be overly pessimistic. In this framework, the adversary is constrained in the sense that contexts are generated by distributions whose densities are bounded with respect to some base measure \(\mu\). Most prior work makes the strong assumption that \(\mu\) is known to the learner, with notable exceptions including Block et al. (2024) and Blanchard (2025). In this paper, we study sequential probability assignment (a.k.a. online learning with log loss) with smooth, well-specified data in the more general setting where \(\mu\) is unknown, going beyond the Lipschitz losses studied in Block et al. (2024) and Blanchard (2025). Our main result is a regret upper bound in terms of a complexity notion that has been shown to characterize learning with i.i.d. data (Bilodeau et al., 2023). We also prove a matching lower bound showing that our upper bound is essentially tight for a broad range of classes, which also implies a separation in the difficulty of smoothed online learning between regimes where \(\mu\) is known and where it is unknown.
Reducing Dimensionality and Multicollinearity in K-mer Data Using Correlation-Based Clustering and Penalized Regression
K-mers are nucleotide sequences derived from DNA and serve as biomarkers for detecting pathogens and antimicrobial resistance. However, k-mer data are high-dimensional, sparse, and highly collinear, limiting the performance of traditional penalized regression models. To address these challenges, a two-stage framework is proposed. In stage one, pairwise Kendall correlations among k-mers are computed and transformed into a pseudo-Euclidean distance to cluster co-occurring k-mers, reducing multicollinearity and dimensionality. In stage two, representative k-mers from each cluster are selected for a penalized regression model in which balanced resampling and stability selection are used to address severe class imbalance and enhance robustness. This framework is applied to 156 swine microbiome samples containing over 26,000 k-mers to predict swine type (piglet or sow) and farm type (pasture or conventional), improving model stability and interpretability while preserving biological relevance.
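Stage one of the framework can be sketched on a toy count matrix: compute pairwise Kendall correlations, map them to a distance, and cluster so that collinear columns collapse into groups. The data (two artificially collinear blocks), distance transform, and cluster count below are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np
from scipy.stats import kendalltau
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(5)

# Toy "k-mer" count matrix: 30 samples x 8 features, built as two collinear
# blocks (columns 0-3 share one latent count, columns 4-7 share another).
base = rng.poisson(5, size=(30, 2)).astype(float)
X = np.column_stack([base[:, [0]] + rng.poisson(1, (30, 4)),
                     base[:, [1]] + rng.poisson(1, (30, 4))])

# Pairwise Kendall correlation -> distance matrix.
p = X.shape[1]
D = np.zeros((p, p))
for i in range(p):
    for j in range(i + 1, p):
        tau = kendalltau(X[:, i], X[:, j])[0]
        D[i, j] = D[j, i] = np.sqrt(0.5 * (1.0 - tau))  # one correlation-to-distance map

# Hierarchical clustering on the condensed distance matrix.
labels = fcluster(linkage(squareform(D), method="average"), t=2, criterion="maxclust")
print(labels)  # co-occurring columns land in the same cluster
```

Choosing one representative per cluster then hands the downstream penalized regression a far smaller, far less collinear design matrix.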
Predicting and improving test-time scaling laws via reward tail-guided search
Test-time scaling has emerged as a critical avenue for enhancing the reasoning capabilities of Large Language Models (LLMs). Though the straightforward "best-of-\(N\)" (BoN) strategy has already demonstrated significant improvements in performance, it lacks principled guidance on the choice of \(N\), budget allocation, and multi-stage decision-making, thereby leaving substantial room for optimization. While many works have explored such optimization, rigorous theoretical guarantees remain limited. In this work, we propose new methodologies to predict and improve scaling properties via tail-guided search.
By estimating the tail distribution of rewards, our method predicts the scaling law of LLMs without the need for exhaustive evaluations. Leveraging this prediction tool, we introduce Scaling-Law Guided (SLG) Search, a new test-time algorithm that dynamically allocates compute to identify and exploit intermediate states with the highest predicted potential.
We theoretically prove that SLG achieves vanishing regret compared to perfect-information oracles, and attains expected rewards that would otherwise require a polynomially larger compute budget under BoN. Empirically, we validate our framework across different LLMs and reward models, confirming that tail-guided allocation consistently achieves higher reward yields than Best-of-\(N\) under identical compute budgets. Our code is available at https://github.com/PotatoJnny/Scaling-Law-Guided-search.
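The BoN baseline, and why its scaling is governed by the reward tail, can be seen in a few lines: the expected best-of-\(N\) reward is the expected maximum of \(N\) draws from the reward distribution, so heavier right tails mean larger gains from increasing \(N\). The Gaussian rewards below are a stand-in for a real reward model, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(6)

def best_of_n(n_draws, sample_reward, trials=2000):
    """Monte Carlo estimate of E[max of n_draws rewards] (the BoN value)."""
    return float(np.mean([max(sample_reward() for _ in range(n_draws))
                          for _ in range(trials)]))

gauss = lambda: rng.normal()  # stand-in reward distribution
curve = {n: round(best_of_n(n, gauss), 2) for n in (1, 4, 16)}
print(curve)  # for standard normal rewards, grows roughly like sqrt(2 log N)
```

Estimating this tail from a few samples, rather than exhaustively evaluating, is what lets a method predict the scaling curve and then allocate compute where the predicted gain from more draws is largest.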
Fast computation and marginal density estimation in nonparametric exponential family mixtures
We study the computational and statistical properties of the approximate nonparametric maximum likelihood estimator (NPMLE) for a broad class of exponential family mixture models. This framework includes Gaussian location mixtures and scaled chi-square mixtures as important special cases. We first develop a data compression strategy that reduces the computational cost of the approximate NPMLE to logarithmic order in the sample size. We then show that, for a broad class of approximate NPMLEs, the resulting marginal density estimator attains an almost parametric convergence rate.
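The data-compression idea can be illustrated with binning: replace the \(n\) observations by histogram counts on a small grid, then run a weighted fixed-grid EM to approximate the NPMLE. This is a generic sketch under illustrative choices (Gaussian location mixture, grid sizes, iteration count), not the paper's algorithm or its logarithmic-order guarantee:

```python
import numpy as np

rng = np.random.default_rng(7)

# Large sample from a two-atom Gaussian location mixture (atoms at -2 and 2).
n = 100_000
X = np.concatenate([rng.normal(-2, 1, n // 2), rng.normal(2, 1, n // 2)])

# Compress: histogram on a modest grid instead of keeping all n points.
edges = np.linspace(X.min(), X.max(), 64)
counts, _ = np.histogram(X, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])

# Weighted EM over a fixed grid of candidate atoms (approximate NPMLE):
# only the 63 bin centers enter the likelihood, weighted by their counts.
atoms = np.linspace(-4, 4, 41)
w = np.full(len(atoms), 1.0 / len(atoms))
lik = np.exp(-0.5 * (centers[:, None] - atoms) ** 2) / np.sqrt(2 * np.pi)
for _ in range(200):
    post = lik * w
    post /= post.sum(axis=1, keepdims=True)
    w = (counts[:, None] * post).sum(axis=0) / counts.sum()

# The mixing weights concentrate near the true atoms at -2 and 2.
print(atoms[w > 0.05])
```

The point of compression is that each EM sweep costs on the order of (bins × atoms) rather than (n × atoms), while the fitted marginal density, the quantity the abstract shows converges at an almost parametric rate, is essentially unchanged.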
From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD
To understand feature learning dynamics in neural networks, recent theoretical works have focused on gradient-based learning of Gaussian single-index models, where the label is a nonlinear function of a latent one-dimensional projection of the input. While the sample complexity of online SGD is determined by the information exponent of the link function, recent works improved on this by performing multiple gradient steps on the same sample with different learning rates, yielding a non-correlational update rule whose sample complexity is instead governed by the (potentially much smaller) generative exponent. However, this picture is only valid when these learning rates are sufficiently large. In this paper, we characterize the relationship between learning rate(s) and sample complexity for a broad class of gradient-based algorithms that encompasses both correlational and non-correlational updates. We demonstrate that, in certain cases, there is a phase transition from an “information exponent regime” with small learning rate to a “generative exponent regime” with large learning rate. Our framework covers prior analyses of one-pass SGD and SGD with batch reuse, while also introducing a new layer-wise training algorithm that leverages a two-timescale approach (via different learning rates for each layer) to go beyond correlational queries without reusing samples or modifying the squared-error loss. Our theoretical study demonstrates that the choice of learning rate is as important as the design of the algorithm in achieving statistical and computational efficiency.