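Title: Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix
Time: Friday, May 27, 2022, 16:00
Venue: Conference Room 1008, Block B, Science and Education Building, Feicuihu Campus
Speaker: Zemin Zheng (University of Science and Technology of China)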
Host: School of Economics
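About the speaker: Zemin Zheng is a professor in the School of Management at the University of Science and Technology of China, where he chairs the Department of Statistics and Finance and supervises doctoral students. His research focuses on high-dimensional statistical inference and big data problems. He has produced original results on several key research topics across this area, with more than 20 papers published in leading journals in China and abroad, including Journal of the Royal Statistical Society: Series B (JRSSB), Annals of Statistics (AOS), Operations Research (OR), Journal of Machine Learning Research (JMLR), and Journal of Business & Economic Statistics (JBES), which are top international journals in statistics, machine learning, econometrics, and management optimization. He has received a New Researcher Award from the Institute of Mathematical Statistics, an outstanding research award from the University of Southern California, and the Young Faculty Career Award from the USTC Overseas Alumni Foundation. He was also selected for the Young Innovative Talents Program of the Organization Department of the CPC Central Committee and named to the Forbes China 30 Under 30 list.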
Abstract: The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been utilized in multivariate regression, factor analysis, biclustering, vector time series modeling, among others. The appeal of this factorization is owing to its power in discovering a highly-interpretable latent association network, either between samples and variables or between responses and predictors. However, many existing methods are either ad hoc without a general performance guarantee, or are computationally intensive, rendering them unsuitable for large-scale studies. We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach. In the first stage of division, we consider both sequential and parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems, and establish the statistical underpinnings of these commonly-adopted and yet poorly understood deflation methods. In the second stage of division, we innovate a contended stagewise learning technique, consisting of a sequence of simple incremental updates, to efficiently trace out the whole solution paths of CURE. Our algorithm has a much lower computational complexity than alternating convex search, and the choice of the step size enables a flexible and principled tradeoff between statistical accuracy and computational efficiency. Our work is among the first to enable stagewise learning for non-convex problems, and the idea can be applicable in many multi-convex problems. Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach.
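For readers unfamiliar with deflation, the sequential idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the speaker's CURE formulation or the contended stagewise algorithm: each unit-rank step here is a generic soft-thresholded power iteration applied directly to a matrix, swapped in only for illustration. The function names, the penalty `lam`, and the fixed iteration count are all assumptions.

```python
import math

def soft_threshold(x, lam):
    # Shrink each entry toward zero by lam; small entries become exactly zero.
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in x]

def normalize(x):
    # Scale to unit Euclidean norm; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(a * a for a in x))
    return [a / norm for a in x] if norm > 0 else x

def sparse_rank_one(M, lam, iters=100):
    # Illustrative unit-rank step (an assumption, not the CURE estimator):
    # alternate soft-thresholded power updates for the left and right vectors.
    n, p = len(M), len(M[0])
    u = [0.0] * n
    v = normalize([1.0] * p)
    for _ in range(iters):
        Mv = [sum(M[i][j] * v[j] for j in range(p)) for i in range(n)]
        u = normalize(soft_threshold(Mv, lam))
        Mtu = [sum(M[i][j] * u[i] for i in range(n)) for j in range(p)]
        v = normalize(soft_threshold(Mtu, lam))
    d = sum(u[i] * M[i][j] * v[j] for i in range(n) for j in range(p))
    return d, u, v

def sequential_deflation(M, rank, lam):
    # Extract one sparse unit-rank layer at a time, then subtract it
    # from the residual before estimating the next layer.
    R = [row[:] for row in M]
    layers = []
    for _ in range(rank):
        d, u, v = sparse_rank_one(R, lam)
        layers.append((d, u, v))
        R = [[R[i][j] - d * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return layers
```

Calling `sequential_deflation(M, rank=2, lam=0.1)` on a small matrix built from two layers with non-overlapping supports returns one `(d, u, v)` triple per layer, with zeros in `u` and `v` outside each layer's support.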