Biography
Education
-
Ph.D., Computer Science and Engineering, CUHK
2019 - 2023
-
B.Eng., with First Class Honours, Computer Engineering, CUHK
2014 - 2019
My research includes:
-
Big data systems: timeseries management systems and databases.
-
Storage engines: LSM-tree-based key-value stores.
-
File systems and in-storage computing.
Publications
-
MirrorKV: An Efficient Key-Value Store on Hybrid Cloud Storage with Balanced Performance of Compaction and Querying.
Zhiqi Wang, and Zili Shao.
The 43rd ACM SIGMOD International Conference on Management of Data (SIGMOD 2024)(CCF-A).
[paper]
-
ForestTI: A Scalable Inverted-Index-Oriented Timeseries Management System with Flexible Memory Efficiency.
Zhiqi Wang, and Zili Shao.
The 42nd ACM SIGMOD International Conference on Management of Data (SIGMOD 2023)(CCF-A).
[code] [paper]
-
TimeUnion: An Efficient Architecture with Unified Data Model for Timeseries Management Systems on Hybrid Cloud Storage.
Zhiqi Wang, and Zili Shao.
The 41st ACM SIGMOD International Conference on Management of Data (SIGMOD 2022)(CCF-A).
[code] [paper]
-
Heracles: An Efficient Storage Model and Data Flushing for Performance Monitoring Timeseries.
Zhiqi Wang, Jin Xue, and Zili Shao.
The 47th International Conference on Very Large Data Bases (VLDB 2021)(CCF-A), Volume 14(6), 1080-1092.
[code] [paper]
-
A Spatio-Temporal Series Data Model with Efficient Indexing and Layout for Cloud-Based Trajectory Data Management.
Yang Guo, Zhiqi Wang, Jin Xue, and Zili Shao.
The 40th International Conference on Data Engineering (ICDE 2024)(CCF-A).
[code] [paper]
-
Lightning Talk: Model, Framework and Integration for In-Storage Computing with Computational SSDs.
Tianyu Wang, Jin Xue, Zelin Du, Zhiqi Wang, Yaotian Cui, and Zili Shao.
The 60th ACM/IEEE Design Automation Conference (DAC 2023)(CCF-A)(invited paper).
[paper]
-
BSCache: A Brisk Semantic Caching Scheme for Cloud-based Performance Monitoring Timeseries Systems.
Kai Zhang, Zhiqi Wang, and Zili Shao.
Proceedings of the 51st International Conference on Parallel Processing (ICPP 2022)(CCF-B).
[code] [paper]
-
TagTree: Global Tagging Index with Efficient Querying for Time Series Databases.
Jin Xue, Zhiqi Wang, and Zili Shao.
The 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022)(CCF-B).
[code] [paper]
Working Experience
-
Postdoctoral Fellow at CUHK
11/2023 - 07/2024
-
Huawei Cloud Database Innovation Lab (internship)
06/2022 - 08/2022
-
Google Summer of Code in Prometheus (internship)
06/2019 - 08/2019
Research Experience
Big Data Systems
-
Timeseries Management Systems
A thorough study of the main design decisions of timeseries management systems, including the data model, memory data management, and persistent data management.
- Data model: To reduce the data redundancy among timeseries from the same data source, we propose a unified data model for both the tags and the data samples of timeseries, with a novel compression mechanism and a two-level indexing design.
- Memory data management: To mitigate the memory overhead and maintain more timeseries with limited memory, we design a flexible inverted index that dynamically adapts its structure to the memory pressure.
- Persistent data management: To handle the heavy ingestion of big timeseries data, we design a dynamic time-partitioned LSM-tree that delivers high insertion throughput, decent space efficiency, and efficient out-of-order data handling.
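The idea of time partitioning with out-of-order handling can be illustrated with a minimal in-memory sketch. It is not the dynamic time-partitioned LSM-tree itself: the class name `TimePartitionedBuffer`, the fixed `partition_width`, and the in-memory lists are all assumptions made for illustration only.

```python
import bisect
from collections import defaultdict


class TimePartitionedBuffer:
    """Toy buffer that groups samples into fixed-width time partitions.

    Hypothetical illustration only; a real system would persist partitions
    and could resize them dynamically.
    """

    def __init__(self, partition_width: int):
        self.partition_width = partition_width
        # partition start timestamp -> sorted list of (timestamp, value)
        self.partitions = defaultdict(list)

    def insert(self, timestamp: int, value: float) -> None:
        # A late (out-of-order) sample is routed to the partition that covers
        # its own timestamp, so only that partition is touched.
        start = timestamp - timestamp % self.partition_width
        bisect.insort(self.partitions[start], (timestamp, value))

    def scan(self, t_min: int, t_max: int):
        """Return samples with t_min <= timestamp <= t_max in time order."""
        result = []
        for start in sorted(self.partitions):
            # Skip partitions that cannot overlap the query range.
            if start + self.partition_width <= t_min or start > t_max:
                continue
            for ts, value in self.partitions[start]:
                if t_min <= ts <= t_max:
                    result.append((ts, value))
        return result
```

Because each partition covers a disjoint time range, a range query only needs to visit the partitions that overlap the requested interval.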
Storage Engines
-
LSM-Tree-Based Key-Value Stores with Hybrid Cloud Storage
LSM-tree-based key-value stores are widely used as the storage engines of big data systems. As data volumes grow, deploying these systems on the cloud is a natural trend. However, existing LSM-tree designs cannot adapt to cloud storage because of the huge performance gap between fast and slow storage. We design MirrorKV to balance read and write performance: it separates keys and values into two mirrored LSM-trees for better data locality and read performance, and applies different compaction mechanisms to fast and slow storage to improve write performance.
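The general idea of key-value separation can be sketched as follows. This is a generic toy with a key index and an append-only value log, not MirrorKV's mirrored LSM-trees; the class `SeparatedKVStore` and its methods are hypothetical names chosen for illustration.

```python
class SeparatedKVStore:
    """Toy store that keeps keys and values in separate structures.

    Hypothetical illustration of key-value separation in general.
    """

    def __init__(self):
        self.key_index = {}  # key -> position of its latest value in value_log
        self.value_log = []  # append-only list of values

    def put(self, key, value):
        # Values are appended; the key side only stores a small pointer.
        self.value_log.append(value)
        self.key_index[key] = len(self.value_log) - 1

    def get(self, key):
        pos = self.key_index.get(key)
        return None if pos is None else self.value_log[pos]

    def keys_in_range(self, low, high):
        # A key-only scan never reads the (potentially large) values.
        return sorted(k for k in self.key_index if low <= k <= high)
```

Keeping the key side small is what makes key-only operations cheap; the cost is an extra lookup to fetch each value.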
File Systems and In-Storage Computing
-
A Monolithic Software/Hardware Co-Design Key-Value File System
To mitigate the metadata manipulation overhead and I/O amplification of traditional file systems designed for block storage, we implement a file system with a key-value interface, which offloads data management to our computational storage platform.
- Host-side key-value file system: It translates file semantics (inodes and page contents) into the corresponding key-value commands.
- Host-storage communication: We customize the Linux NVMe driver to bypass the Linux block layer and transmit the key-value commands.
- Storage-side design: We carefully design the flash translation layer (FTL) to handle the received key-value commands and manage the physical space of the SSD.
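The host-side translation of file semantics into key-value commands can be sketched roughly as below. The key layout (`i:` and `p:` prefixes), the 4096-byte `PAGE_SIZE`, the `("PUT", key, value)` tuple format, and the minimal inode record are all assumptions for illustration; they are not the actual command format of the file system.

```python
PAGE_SIZE = 4096  # assumed page size


def inode_key(ino: int) -> bytes:
    # Hypothetical key layout: prefix plus 8-byte big-endian inode number.
    return b"i:" + ino.to_bytes(8, "big")


def page_key(ino: int, page_index: int) -> bytes:
    # Hypothetical key layout: prefix, inode number, then page index.
    return b"p:" + ino.to_bytes(8, "big") + page_index.to_bytes(8, "big")


def write_to_commands(ino: int, offset: int, data: bytes):
    """Translate a page-aligned file write into ("PUT", key, value) commands."""
    if offset % PAGE_SIZE != 0:
        raise ValueError("this sketch only handles page-aligned offsets")
    commands = []
    # One PUT per page of file content.
    for i in range(0, len(data), PAGE_SIZE):
        page_index = (offset + i) // PAGE_SIZE
        commands.append(("PUT", page_key(ino, page_index), data[i:i + PAGE_SIZE]))
    # Stand-in for inode metadata: record the end offset of this write.
    end_offset = offset + len(data)
    commands.append(("PUT", inode_key(ino), end_offset.to_bytes(8, "big")))
    return commands
```

For example, `write_to_commands(7, 4096, b"a" * 5000)` produces two page commands followed by one inode command.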
Awards
-
Dean's List of Faculty of Engineering, CUHK: 2016-2017, 2018-2019
-
CUHK New Asia College Scholarship in 2018
Teaching Experience
CSCI3150: Introduction to Operating Systems
-
Fall 2019
-
Spring, Fall 2020
-
Spring, Fall 2021
Professional Experience
-
Participation & Talks
- SIGMOD 2024, Santiago, Chile
- SIGMOD 2023, Seattle, WA, USA. Remote participation
- SIGMOD 2022, Philadelphia, PA, USA. Remote participation
- VLDB 2021, Copenhagen, Denmark. Remote participation
-
External Reviewer
- Journal
- ACM Transactions on Database Systems (TODS)
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
- Conference
- Design Automation Conference (DAC)
- International Conference on Computer Design (ICCD)
- Design Automation and Test in Europe Conference (DATE)
- International Conference on Computer Aided Design (ICCAD)
- Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)