Emergence and scaling laws in SGD learning of neural networks [slides]
Learning low-dimensional polynomials with SGD [slides]
Feature learning in two-layer neural networks under structured data [slides]
Precise learning curve of overparameterized models [slides]
* denotes alphabetical ordering or equal contribution. For a complete list, see [Google Scholar].
Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws. Gérard Ben Arous*, Murat A. Erdogdu*, N. Mert Vural*, and Denny Wu*. NeurIPS 2025.
Emergence and scaling laws in SGD learning of shallow neural networks. Yunwei Ren, Eshaan Nichani, Denny Wu, and Jason D. Lee. NeurIPS 2025.
When do transformers outperform feedforward and recurrent networks? A statistical perspective. Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, and Murat A. Erdogdu. NeurIPS 2025.
Learning compositional functions with transformers from easy-to-hard data. Zixuan Wang, Eshaan Nichani, Alberto Bietti, Alex Damian, Daniel Hsu, Jason D. Lee, and Denny Wu. COLT 2025.
Propagation of chaos in one-hidden-layer neural networks beyond logarithmic time. Margalit Glasgow, Denny Wu, and Joan Bruna. COLT 2025.
Nonlinear transformers can perform inference-time feature learning. Naoki Nishikawa, Yujin Song, Kazusato Oko, Denny Wu, and Taiji Suzuki. ICML 2025.
Metastable dynamics of chain-of-thought reasoning: provable benefits of search, RL and distillation. Juno Kim, Denny Wu, Jason D. Lee, and Taiji Suzuki. ICML 2025.
Learning multi-index models with neural networks via mean-field Langevin dynamics. Alireza Mousavi-Hosseini, Denny Wu, and Murat A. Erdogdu. ICLR 2025.
Pretrained transformer efficiently learns low-dimensional target functions in-context. Kazusato Oko*, Yujin Song*, Taiji Suzuki*, and Denny Wu*. NeurIPS 2024.
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit. Jason D. Lee*, Kazusato Oko*, Taiji Suzuki*, and Denny Wu*. NeurIPS 2024.
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations. Kazusato Oko*, Yujin Song*, Taiji Suzuki*, and Denny Wu*. COLT 2024.
Nonlinear spiked covariance matrices and signal propagation in deep neural networks. Zhichao Wang, Denny Wu, and Zhou Fan. COLT 2024.
SILVER: single-loop variance reduction and application to federated learning. Kazusato Oko, Shunta Akiyama, Denny Wu, Tomoya Murata, and Taiji Suzuki. ICML 2024.
Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data. Atsushi Nitanda*, Kazusato Oko*, Taiji Suzuki*, and Denny Wu*. ICLR 2024.
Why is parameter averaging beneficial in SGD? An objective smoothing perspective. Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda, and Denny Wu. AISTATS 2024.
Gradient-based feature learning under structured data. Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, and Murat A. Erdogdu. NeurIPS 2023.
Learning in the presence of low-dimensional structure: a spiked random matrix perspective. Jimmy Ba*, Murat A. Erdogdu*, Taiji Suzuki*, Zhichao Wang*, and Denny Wu*. NeurIPS 2023.
Convergence of mean-field Langevin dynamics: time-space discretization, stochastic gradient, and variance reduction. Taiji Suzuki, Denny Wu, and Atsushi Nitanda. NeurIPS 2023.
Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond. Taiji Suzuki, Denny Wu, Atsushi Nitanda, and Kazusato Oko. NeurIPS 2023.
Primal and dual analysis of entropic fictitious play for finite-sum problems. Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, and Taiji Suzuki. ICML 2023.
Uniform-in-time propagation of chaos for the mean-field Langevin dynamics. Taiji Suzuki, Atsushi Nitanda, and Denny Wu. ICLR 2023.
High-dimensional asymptotics of feature learning: how one gradient step improves the representation. Jimmy Ba*, Murat A. Erdogdu*, Taiji Suzuki*, Zhichao Wang*, Denny Wu*, and Greg Yang*. NeurIPS 2022.
Two-layer neural network on infinite-dimensional data: global optimization guarantee in the mean-field regime. Naoki Nishikawa, Taiji Suzuki, Atsushi Nitanda, and Denny Wu. NeurIPS 2022.
Convex analysis of the mean-field Langevin dynamics. Atsushi Nitanda, Denny Wu, and Taiji Suzuki. AISTATS 2022.
Particle stochastic dual coordinate ascent: exponential convergent algorithm for mean-field neural network optimization. Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, and Denny Wu. ICLR 2022.
Understanding the variance collapse of SVGD in high dimensions. Jimmy Ba*, Murat A. Erdogdu*, Marzyeh Ghassemi*, Taiji Suzuki*, Shengyang Sun*, Denny Wu*, and Tianzong Zhang*. ICLR 2022.
Particle dual averaging: optimization of mean field neural networks with global convergence rate analysis. Atsushi Nitanda, Denny Wu, and Taiji Suzuki. NeurIPS 2021.
When does preconditioning help or hurt generalization? Shun-ichi Amari*, Jimmy Ba*, Roger Grosse*, Xuechen Li*, Atsushi Nitanda*, Taiji Suzuki*, Denny Wu*, and Ji Xu*. ICLR 2021.
On the optimal weighted $\ell_2$ regularization in overparameterized linear regression. Denny Wu* and Ji Xu*. NeurIPS 2020.
Generalization of two-layer neural networks: an asymptotic viewpoint. Jimmy Ba*, Murat A. Erdogdu*, Taiji Suzuki*, Denny Wu*, and Tianzong Zhang*. ICLR 2020.
Stochastic Runge-Kutta accelerates Langevin Monte Carlo and beyond. Xuechen Li, Denny Wu, Lester Mackey, and Murat A. Erdogdu. NeurIPS 2019.
Post selection inference with incomplete maximum mean discrepancy estimator. Makoto Yamada*, Denny Wu*, Yao-Hung Hubert Tsai, Ichiro Takeuchi, Ruslan Salakhutdinov, and Kenji Fukumizu. ICLR 2019.