Academic Seminar
December 3: Seminar by Professor Xiaoming Huo
Posted: 2021-12-03
Title: Two Statistical Results in Deep Learning
Speaker: Xiaoming Huo, Professor, Georgia Institute of Technology, USA
Time: December 3, 2021, 16:00-17:00
Venue: Laboratory Conference Room 312 (East Campus)
Host: Professor Lu Yao
Abstract:
This talk has two parts.
Regularization Matters for Generalization of Overparametrized Deep Neural Networks under Noisy Observations. In part one, we study the generalization properties of overparameterized deep neural networks (DNNs) with ReLU activations. Under the non-parametric regression framework, it is assumed that the ground-truth function belongs to a reproducing kernel Hilbert space (RKHS) induced by the neural tangent kernel (NTK) of a ReLU DNN, and that the observations in the dataset are noisy. We prove that, without a delicate adoption of early stopping, the overparameterized DNN trained by vanilla gradient descent does not recover the ground-truth function: the estimated DNN's L2 prediction error is bounded away from 0. As a complement to this result, we show that L2-regularized gradient descent enables the overparameterized DNN to achieve the minimax optimal convergence rate of the L2 prediction error, without early stopping. Notably, the rate we obtain is faster than the one known in the literature.
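As a toy illustration of the contrast described above (not code from the talk), the sketch below fits a small set of noisy one-dimensional observations with a wide ReLU network, once by plain full-batch gradient descent and once by gradient descent with an explicit L2 penalty, and then reports the L2 prediction error against the noiseless ground truth. The network width, learning rate, penalty strength, and number of steps are illustrative choices, not values from the paper.

```python
# Illustrative sketch only: an overparameterized ReLU network on noisy 1-D regression,
# trained by full-batch gradient descent with and without an explicit L2 penalty.
# All hyperparameters below are arbitrary illustrative choices.
import torch

torch.manual_seed(0)
n, width = 50, 2048                                  # few samples, many parameters
x = torch.rand(n, 1) * 2 - 1
y = torch.sin(3 * x) + 0.3 * torch.randn(n, 1)       # noisy observations of a smooth target

def make_net():
    return torch.nn.Sequential(
        torch.nn.Linear(1, width), torch.nn.ReLU(), torch.nn.Linear(width, 1))

def train(l2=0.0, lr=1e-3, steps=10000):
    net = make_net()
    opt = torch.optim.SGD(net.parameters(), lr=lr)   # full batch => plain gradient descent
    for _ in range(steps):                           # no early stopping
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(net(x), y)
        loss = loss + l2 * sum(p.pow(2).sum() for p in net.parameters())
        loss.backward()
        opt.step()
    xt = torch.linspace(-1, 1, 200).unsqueeze(1)     # L2 prediction error vs. ground truth
    return torch.mean((net(xt) - torch.sin(3 * xt)) ** 2).item()

print("vanilla GD        :", train(l2=0.0))
print("L2-regularized GD :", train(l2=1e-3))
```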
Directional Bias Helps SGD to Generalize. In part two, we study the Stochastic Gradient Descent (SGD) algorithm in kernel regression. Specifically, SGD with a moderate, annealing step size converges along the direction corresponding to the large eigenvalue of the kernel matrix, whereas Gradient Descent (GD) with a moderate or small step size converges along the direction corresponding to the small eigenvalue. For a general squared risk minimization problem, we show that a directional bias towards a large eigenvalue of the Hessian (which is the kernel matrix in our case) results in an estimator that is closer to the ground truth. Applying this result to kernel regression, the directional bias helps the SGD estimator generalize better. This result offers one way to explain how noise helps generalization when learning with a nontrivial step size, and may be useful for promoting further understanding of stochastic algorithms in deep learning.
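The following small numerical sketch (not from the talk) shows one way to probe such a directional bias: it builds an RBF kernel matrix on synthetic data, rewrites kernel least squares in the eigenbasis of the kernel matrix so that coordinates correspond to eigen-directions (restricted to the leading directions for numerical stability), runs GD with a small step size and SGD with an annealing step size, and prints how much of the remaining error lies along the largest and smallest kept eigen-directions. The kernel, data, and step-size schedule are hypothetical choices for illustration and are not tuned to reproduce the theorems.

```python
# Illustrative sketch only (not code from the talk): compare which eigen-direction of
# the kernel matrix the remaining optimization error lies along for GD and for SGD on
# a kernel least-squares problem. All data and hyperparameters are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = np.sort(rng.uniform(-1, 1, n))
y = np.sin(3 * x) + 0.1 * rng.normal(size=n)         # noisy observations

K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1)  # RBF kernel matrix
lam, U = np.linalg.eigh(K)                           # eigenvalues in ascending order
k = 10                                               # keep the k leading eigen-directions
lam, U = lam[-k:], U[:, -k:]                         # (for numerical stability only)
Phi = U * np.sqrt(lam)                               # features with Phi @ Phi.T ~ K

# Objective (1/2n)||Phi @ theta - y||^2: its Hessian is diag(lam)/n, so the j-th
# coordinate of theta corresponds to the j-th kept eigen-direction of the kernel matrix.
theta_star = np.linalg.lstsq(Phi, y, rcond=None)[0]

def error_split(theta):
    e = np.abs(theta - theta_star)
    e = e / (np.linalg.norm(e) + 1e-12)              # unit-normalized error vector
    return {"along largest eig": e[-1], "along smallest kept eig": e[0]}

def gd(lr, steps=3000):                              # full-gradient descent
    th = np.zeros(k)
    for _ in range(steps):
        th -= lr * Phi.T @ (Phi @ th - y) / n
    return th

def sgd(lr0, steps=3000):                            # one observation per step
    th = np.zeros(k)
    for t in range(steps):
        i = rng.integers(n)
        lr = lr0 / (1 + 0.01 * t)                    # annealing step size
        th -= lr * Phi[i] * (Phi[i] @ th - y[i])
    return th

print("GD  (small step)   :", error_split(gd(lr=0.5 * n / lam[-1])))
print("SGD (annealed step):", error_split(sgd(lr0=1.0 / lam[-1])))
```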
Speaker Biography:
Xiaoming Huo received his B.S. in mathematics from the University of Science and Technology of China in 1993, and his M.S. in electrical engineering and Ph.D. in statistics from Stanford University in 1997 and 1999, respectively. Since August 1999, he has been on the faculty of the School of Industrial and Systems Engineering at the Georgia Institute of Technology, ranked first in the United States, serving first as assistant professor and now as full professor. His research interests include statistical theory, statistical computing, and issues related to data analysis. He has made many contributions in areas such as sparse representation, wavelets, and statistical problems of detectability; the related papers have appeared in top journals, and some are highly cited. Professor Huo has held an IEEE fellowship since May 2004 and served as an IPAM research fellow beginning in September 2004.