2:35pm, Wege Auditorium
Building Deletion-Compliant Data Systems
Data-intensive applications fueled the evolution of log-structured merge (LSM) based key-value engines that employ the out-of-place paradigm to support high ingestion rates with low read/write interference. These benefits, however, come at the cost of treating deletes as second-class citizens. A delete inserts a tombstone that invalidates older instances of the deleted key. State-of-the-art LSM-engines do not provide guarantees as to how fast a tombstone will propagate to persist the deletion. Further, these LSM-engines only support deletion on the sort key. To delete on another attribute (e.g., timestamp), the entire tree is read and re-written.
In this talk, I will present Lethe, an LSM-based key-value storage, that introduces a set of delete-aware compaction policies and a new physical data storage layout, which together, allows for efficient and timely persistent deletion of data. Lethe is capable of supporting any user-defined threshold for the delete persistence latency offering higher read throughput and lower space amplification, with a modest increase in write amplification. In addition, by physically storing data in an interweaved order on the sort and the delete key, Lethe supports efficient range deletes on a secondary (delete) key without employing a costly full tree merge.
Subhadeep Sarkar is an assistant professor of Computer Science at Brandeis University. His current research interests are data systems, storage layouts, access methods, and their intersection with data privacy. Before joining Brandeis, Subhadeep spent 7 years as a post-doctoral researcher – 5 years at Boston University and, prior to that, 2 years at INRIA Rennes, France. He completed his Ph.D. in 2017 from the IIT Kharagpur. Subhadeep’s research results are published at leading database conferences and journals, including ACM SIGMOD, VLDB, EDBT, IEEE ICDE, and ACM Transactions on Database Systems.