(0)

学术报告-Novel Geometric Algorithms for Learning from Big Data

来源:计算机科学与技术学院作者: 发布时间:2017-07-07 浏览次数:6

中国矿业大学学术报告

报告题目:Novel Geometric Algorithms for Learning from Big Data

人: 徐金辉 教授,博导纽约州立大学布法罗分校

    间: 2017 79 下午4:00-5:00 

    点:计算机学院计B518

主办单位:计算机学院

欢迎全校师生踊跃参加!

报告摘要:Big Data provide great potential for us to improve many aspects of our daily life. However, due to their intrinsic nature, they also impose many obstacles for us to fully utilize them. In this talk, I will address several such obstacles and present our geometric solutions. Particularly, I will use constrained clustering as an example to illustrated our ideas. Constrained clustering is a commonly encountered unsupervised learning problem. Due to the additional constraints, its optimal solution no longer has the so-called locality property, which is a key property required by many existing algorithms for achieving quality guaranteed solutions. To overcome the difficulty caused by the loss of locality, we present a unified framework, called Peeling-and-Enclosing, to iteratively solve two variants of the constrained clustering problems, constrained k-means clustering (k-CMeans) and constrained k-median clustering (k-CMedian). Our framework is based on a standalone geometric technique, called Simplex Lemma, which enables us to efficiently approximate the mean (or median) point of an unknown set of points by searching a small-size grid, independent of the dimensionality of the space, in a simplex (or its surrounding region), and thus can be used to handle high dimensional data. With this technique, our framework generates, in nearly linear time, a small set of candidates for the k means or k medians, and ensures that one of them achieves a  (1 + ε)- approximation. Combining this unified framework with a problem-specific selection algorithm, we obtain a (1 + ε)-approximation for each constrained clustering problem. Our framework improves considerably the best known results for many constrained clustering problems, including some longstanding open problems.