2021-03-01 | Peilin Zhong:Processing Massive Datasets via Efficient Data Representations



The computation on massivedata has been the foundation of many important breakthroughs in machinelearning and data mining. In this talk, I will show how to process massive datavia efficient data representations.

As a concrete example, I willpresent and dive into a recent result (STOC'20) that develops a(1+\varepsilon)-approximate parallel algorithm for computing shortest paths inundirected graphs, using \poly(\log n) parallel time and m\poly(\log n)work/total space for n-nodes m-edges graphs. For (1+\varepsilon)-approximation,all prior algorithms with \poly(\log n) parallel time require at least\Omega(mn^{c}) work/total space for some constant c>0. Improving thislong-standing upper bound obtained by Cohen (STOC'94) has been open for morethan 25 years. We developed several new tools of independent interest. One ofthem is an efficient representation of the input graph --- low hop emulator ---a \poly(\log n)-approximate emulator graph in which every shortest path has atmost O(\log\log n) hops (edges).

In the talk, I will also givea brief overview of how to use efficient data representations to developalgorithms for other modern machine learning tasks such as training generativeadversarial networks.



31  10:00--11:00



Peilin Zhong is afifth year Ph.D. student in computer science at Columbia University (Undersupervision of Alexandr Andoni, Clifford Stein, and Mihalis Yannakakis). He waspart of the Yao Class at Tsinghua University and graduated in 2016 with abachelor’s degree in engineering. He is one of the recipients of the 2019Google PhD fellowship.

Hisresearch interests are mainly in algorithm design and analysis. In particular,he is interested in parallel and massively parallel algorithms, sketching,streaming algorithms, graph algorithms, machine learning, high dimensionalgeometry, metric embedding, and other algorithms related to large-scale datacomputation.



ZoomID: 65928464685