Learning (sum of) low-dimensional polynomials with SGD [slides]
Feature learning in two-layer neural networks [slides]
Precise learning curve of overparameterized models [slides]
* denotes alphabetical ordering or equal contribution. For a complete list, see [Google Scholar].
Learning multi-index models with neural networks via mean-field Langevin dynamics. Alireza Mousavi-Hosseini, Denny Wu, and Murat A. Erdogdu. Preprint.
Pretrained transformer efficiently learns low-dimensional target functions in-context. Kazusato Oko*, Yujin Song*, Taiji Suzuki*, and Denny Wu*. NeurIPS 2024.
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit. Jason D. Lee*, Kazusato Oko*, Taiji Suzuki*, and Denny Wu*. NeurIPS 2024.
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations. Kazusato Oko*, Yujin Song*, Taiji Suzuki*, and Denny Wu*. COLT 2024.
Nonlinear spiked covariance matrices and signal propagation in deep neural networks. Zhichao Wang, Denny Wu, and Zhou Fan. COLT 2024.
SILVER: single-loop variance reduction and application to federated learning. Kazusato Oko, Shunta Akiyama, Denny Wu, Tomoya Murata, and Taiji Suzuki. ICML 2024.
Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data. Atsushi Nitanda*, Kazusato Oko*, Taiji Suzuki*, and Denny Wu*. ICLR 2024.
Why is parameter averaging beneficial in SGD? An objective smoothing perspective. Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda, and Denny Wu. AISTATS 2024.
Gradient-based feature learning under structured data. Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, and Murat A. Erdogdu. NeurIPS 2023.
Learning in the presence of low-dimensional structure: a spiked random matrix perspective. Jimmy Ba*, Murat A. Erdogdu*, Taiji Suzuki*, Zhichao Wang*, and Denny Wu*. NeurIPS 2023.
Convergence of mean-field Langevin dynamics: time-space discretization, stochastic gradient, and variance reduction. Taiji Suzuki, Denny Wu, and Atsushi Nitanda. NeurIPS 2023.
Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond. Taiji Suzuki, Denny Wu, Atsushi Nitanda, and Kazusato Oko. NeurIPS 2023.
Primal and dual analysis of entropic fictitious play for finite-sum problems. Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, and Taiji Suzuki. ICML 2023.
Uniform-in-time propagation of chaos for the mean-field Langevin dynamics. Taiji Suzuki, Atsushi Nitanda, and Denny Wu. ICLR 2023.
High-dimensional asymptotics of feature learning: how one gradient step improves the representation. Jimmy Ba*, Murat A. Erdogdu*, Taiji Suzuki*, Zhichao Wang*, Denny Wu*, and Greg Yang*. NeurIPS 2022.
Two-layer neural network on infinite-dimensional data: global optimization guarantee in the mean-field regime. Naoki Nishikawa, Taiji Suzuki, Atsushi Nitanda, and Denny Wu. NeurIPS 2022.
Convex analysis of the mean-field Langevin dynamics. Atsushi Nitanda, Denny Wu, and Taiji Suzuki. AISTATS 2022.
Particle stochastic dual coordinate ascent: exponential convergent algorithm for mean-field neural network optimization. Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, and Denny Wu. ICLR 2022.
Understanding the variance collapse of SVGD in high dimensions. Jimmy Ba*, Murat A. Erdogdu*, Marzyeh Ghassemi*, Taiji Suzuki*, Shengyang Sun*, Denny Wu*, and Tianzong Zhang*. ICLR 2022.
Particle dual averaging: optimization of mean field neural networks with global convergence rate analysis. Atsushi Nitanda, Denny Wu, and Taiji Suzuki. NeurIPS 2021.
When does preconditioning help or hurt generalization? Shun-ichi Amari*, Jimmy Ba*, Roger Grosse*, Xuechen Li*, Atsushi Nitanda*, Taiji Suzuki*, Denny Wu*, and Ji Xu*. ICLR 2021.
On the optimal weighted $\ell_2$ regularization in overparameterized linear regression. Denny Wu* and Ji Xu*. NeurIPS 2020.
Generalization of two-layer neural networks: an asymptotic viewpoint. Jimmy Ba*, Murat A. Erdogdu*, Taiji Suzuki*, Denny Wu*, and Tianzong Zhang*. ICLR 2020.
Stochastic Runge-Kutta accelerates Langevin Monte Carlo and beyond. Xuechen Li, Denny Wu, Lester Mackey, and Murat A. Erdogdu. NeurIPS 2019.
Post-selection inference with incomplete maximum mean discrepancy estimator. Makoto Yamada*, Denny Wu*, Yao-Hung Hubert Tsai, Ichiro Takeuchi, Ruslan Salakhutdinov, and Kenji Fukumizu. ICLR 2019.