Talk Title: Data Analytics as a Service for Data Scientists
Speaker: Prof. Chen Li
Affiliation: UC Irvine
Host: Prof. Xuemin Lin (林学民)
Time: Thursday, September 20, 10:30-12:00
Venue: Room East 201, Mathematics Building, Zhongbei Campus
Speaker Bio:
Chen Li is a professor in the Department of Computer Science at UC Irvine. He received his Ph.D. in Computer Science from Stanford University, and his M.S. and B.S. in Computer Science from Tsinghua University, China. His research interests are in the field of data management, including data-intensive computing, query processing and optimization, visualization, and text analytics. His current focus is building open source systems for data management and analytics. He is a recipient of an NSF CAREER Award, several test-of-time publication awards, and many grants and industry gifts. He was once a part-time Visiting Research Scientist at Google. He also took a roller-coaster ride starting a company to commercialize university research.
Abstract:
Data scientists and domain experts often face challenges when dealing with large amounts of data, especially because of the scale of the data and their limited IT knowledge and infrastructure maintenance skills. In this talk, I will present several software solutions we are developing to provide data analytics as a service to these users. These solutions include Apache AsterixDB, an open source parallel database; Cloudberry, a middleware system that supports big data visualization; and Texera, a system for text analytics using interactive declarative workflows. These solutions can be integrated to support data ingestion, storage, indexing, querying, visualization, and analytics. As an example, I will report our experience using these solutions to manage large-scale social media data (e.g., billions of tweets occupying terabytes) as a service to researchers in disciplines such as social science and public health from several schools and universities.