Biography

Education

Ph.D., Computer Science and Engineering, CUHK 2019 - 2023
B.Eng. with First Class Honours, Computer Engineering, CUHK 2014 - 2019

Research Interests

  • Big data systems: timeseries management systems and databases.
  • Storage engines: LSM-tree-based key-value stores.
  • File systems and in-storage computing.

Publications

MirrorKV: An Efficient Key-Value Store on Hybrid Cloud Storage with Balanced Performance of Compaction and Querying

Z. Wang and Z. Shao

The 43rd ACM SIGMOD International Conference on Management of Data (SIGMOD 2024)(CCF-A)

ForestTI: A Scalable Inverted-Index-Oriented Timeseries Management System with Flexible Memory Efficiency

Z. Wang and Z. Shao

The 42nd ACM SIGMOD International Conference on Management of Data (SIGMOD 2023)(CCF-A)

TimeUnion: An Efficient Architecture with Unified Data Model for Timeseries Management Systems on Hybrid Cloud Storage

Z. Wang and Z. Shao

The 41st ACM SIGMOD International Conference on Management of Data (SIGMOD 2022)(CCF-A)

Heracles: An Efficient Storage Model and Data Flushing for Performance Monitoring Timeseries

Z. Wang, J. Xue, and Z. Shao

The 47th International Conference on Very Large Data Bases (VLDB 2021)(CCF-A), Volume 14(6), 1080-1092

A Spatio-Temporal Series Data Model with Efficient Indexing and Layout for Cloud-Based Trajectory Data Management

Y. Guo, Z. Wang, J. Xue, and Z. Shao

The 40th International Conference on Data Engineering (ICDE 2024)(CCF-A)

Lightning Talk: Model, Framework and Integration for In-Storage Computing with Computational SSDs

T. Wang, J. Xue, Z. Du, Z. Wang, Y. Cui, and Z. Shao

The 60th ACM/IEEE Design Automation Conference (DAC 2023)(CCF-A)(invited paper)

BSCache: A Brisk Semantic Caching Scheme for Cloud-based Performance Monitoring Timeseries Systems

K. Zhang, Z. Wang, and Z. Shao

Proceedings of the 51st International Conference on Parallel Processing (ICPP 2022)(CCF-B)

TagTree: Global Tagging Index with Efficient Querying for Time Series Databases

J. Xue, Z. Wang, and Z. Shao

The 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022)(CCF-B)

Working Experience

Postdoctoral Fellow at CUHK 11/2023 - 07/2024
Huawei Cloud Database Innovation Lab (internship) 06/2022 - 08/2022
Google Summer of Code in Prometheus (internship) 06/2019 - 08/2019

Research Experience

Big Data Systems

Timeseries Management Systems

A thorough study of the main design decisions in timeseries management systems, covering the data model, in-memory data management, and persistent data management.

  • Data model: To remove the data redundancy among timeseries from the same data source, we propose a unified data model for both the tags and the data samples of timeseries, with a novel compression mechanism and a two-level indexing design.
  • Memory data management: To reduce memory overhead and maintain more timeseries within limited memory, we design a flexible inverted index that dynamically adapts its structure to memory pressure.
  • Persistent data management: To sustain ingestion of large volumes of timeseries data, we design a dynamic time-partitioned LSM-tree that offers high insertion throughput, good space efficiency, and efficient handling of out-of-order data.
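
As a rough illustration of the time-partitioning idea in the last point, the sketch below buckets samples by time window so that an out-of-order sample is routed to the older partition it belongs to. It is a minimal in-memory toy, not the LSM-tree design itself: the class name, the fixed window width `PARTITION_SECONDS`, and the tuple layout are all assumptions made for the example, and the dynamic partition sizing described above is not modelled.

```python
import bisect
from collections import defaultdict

PARTITION_SECONDS = 3600  # hypothetical fixed window; the design above is dynamic


class TimePartitionedStore:
    """Toy in-memory store that buckets samples by time window."""

    def __init__(self, partition_seconds=PARTITION_SECONDS):
        self.partition_seconds = partition_seconds
        # partition start time -> sorted list of (timestamp, series_id, value)
        self.partitions = defaultdict(list)

    def _partition_start(self, timestamp):
        return timestamp - (timestamp % self.partition_seconds)

    def insert(self, series_id, timestamp, value):
        # An out-of-order sample lands in the (older) partition it belongs to,
        # and insort keeps that partition sorted by timestamp.
        bucket = self.partitions[self._partition_start(timestamp)]
        bisect.insort(bucket, (timestamp, series_id, value))

    def query(self, series_id, start, end):
        """Return (timestamp, value) pairs with start <= timestamp <= end."""
        results = []
        first = self._partition_start(start)
        last = self._partition_start(end)
        # Only partitions overlapping the queried range are visited.
        for pstart in range(first, last + 1, self.partition_seconds):
            for ts, sid, value in self.partitions.get(pstart, []):
                if sid == series_id and start <= ts <= end:
                    results.append((ts, value))
        return results
```

Because each partition covers a bounded time range, a late sample only touches one small bucket and a range query skips every partition outside the requested interval.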

Storage Engines

LSM-Tree-Based Key-Value Stores with Hybrid Cloud Storage

LSM-tree-based key-value stores are widely used as the storage engines of big data systems. As data volumes grow, deploying these systems on the cloud is a natural trend. However, existing LSM-tree designs cannot adapt to cloud storage because of the large performance gap between fast and slow storage. We design MirrorKV to balance read and write performance: it separates keys and values into two mirrored LSM-trees for better data locality and read performance, and it uses different compaction mechanisms for fast and slow storage to improve write performance.
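
The general idea of separating keys from values can be sketched as follows. This is a generic key-value separation toy, not MirrorKV's actual structure: a plain dictionary and an append-only list stand in for the two mirrored LSM-trees, and the class and method names are hypothetical.

```python
class SeparatedKVStore:
    """Toy key-value separation: small keys and large values live apart."""

    def __init__(self):
        # key -> position of its value in value_log (stands in for the key tree)
        self.key_index = {}
        # append-only value storage (stands in for the value tree)
        self.value_log = []

    def put(self, key, value):
        self.value_log.append(value)
        self.key_index[key] = len(self.value_log) - 1

    def get(self, key):
        position = self.key_index.get(key)
        if position is None:
            return None
        return self.value_log[position]

    def scan_keys(self, start, end):
        """Range scan over keys only; no value is read."""
        return sorted(k for k in self.key_index if start <= k <= end)
```

Keeping keys in their own compact structure means lookups and key-range scans touch little data, and a value is fetched only once its key has been located.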

File Systems and In-Storage Computing

A Monolithic Software/Hardware Co-Design Key-Value File System

To mitigate the metadata manipulation overhead and I/O amplification of traditional file systems designed for block storage, we implement a file system with a key-value interface that offloads data management to our computational storage platform.

  • Host-side key-value file system: It translates file semantics (inodes and page contents) into the corresponding key-value commands.
  • Host-storage communication: We customize the Linux NVMe driver to bypass the Linux block layer and transmit the key-value commands.
  • Storage-side design: We carefully design the flash translation layer (FTL) to handle the received key-value commands and manage the physical space of the SSD.
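
The host-side translation step can be pictured with a small sketch that turns a file write into key-value commands. Everything concrete here is an assumption made for illustration: the key layout (`i:<ino>` for inodes, `p:<ino>:<page>` for pages), the 4096-byte page size, the `PUT` command tuple, and the restriction to page-aligned writes are not taken from the actual implementation.

```python
PAGE_SIZE = 4096  # assumed page size


def inode_key(ino):
    # Hypothetical key layout for inode metadata.
    return f"i:{ino}".encode()


def page_key(ino, page_index):
    # Hypothetical key layout for one page of file contents.
    return f"p:{ino}:{page_index}".encode()


def write_to_commands(ino, offset, data):
    """Translate a page-aligned file write into (op, key, value) commands."""
    if offset % PAGE_SIZE != 0:
        raise ValueError("this sketch only handles page-aligned writes")
    commands = []
    # One PUT per page touched by the write.
    for start in range(0, len(data), PAGE_SIZE):
        page_index = (offset + start) // PAGE_SIZE
        chunk = data[start:start + PAGE_SIZE]
        commands.append(("PUT", page_key(ino, page_index), chunk))
    # Record the end of the written range in the inode entry; a real file
    # system would merge this with the existing metadata instead.
    commands.append(("PUT", inode_key(ino), str(offset + len(data)).encode()))
    return commands
```

For example, `write_to_commands(7, 4096, b"a" * 5000)` yields two page commands (pages 1 and 2 of inode 7) followed by one inode command, with no block addresses involved on the host side.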

Awards

  • Dean's List of Faculty of Engineering, CUHK: 2016-2017, 2018-2019
  • CUHK New Asia College Scholarship: 2018

Teaching Experience

CSCI3150: Introduction to Operating Systems

  • Fall 2019
  • Spring, Fall 2020
  • Spring, Fall 2021

Professional Experience

Participation & Talks

  • SIGMOD 2024, Santiago, Chile
  • SIGMOD 2023, Seattle, WA, USA. Remote participation
  • SIGMOD 2022, Philadelphia, PA, USA. Remote participation
  • VLDB 2021, Copenhagen, Denmark. Remote participation

External Reviewer

Journal

  • ACM Transactions on Database Systems (TODS)
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)

Conference

  • Design Automation Conference (DAC)
  • International Conference on Computer Design (ICCD)
  • Design, Automation and Test in Europe Conference (DATE)
  • International Conference on Computer-Aided Design (ICCAD)
  • Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)