Publications
Building brick by brick
2025
- [TPCTC] Tectonic: Bridging Synthetic and Real-World Workloads for Key-Value Benchmarking. Alexander H. Ott, Shubham Kaushik, Boao Chen, and 1 more author. 17th TPC Technology Conference on Performance Evaluation & Benchmarking, Sep 2025
Key-value stores are the backbone of many modern SQL- and NoSQL-based data systems, serving a variety of real-world applications. Despite their widespread adoption, existing key-value benchmarks fall short across multiple dimensions when accurately replicating complex and dynamic real-world workloads. For instance, state-of-the-art key-value benchmarks, such as YCSB, KVBench, and db_bench, are unable to (i) emulate dynamic workloads where the workload composition and distribution change arbitrarily over time; (ii) generate composite keys with different prefix distributions; and (iii) generate workloads with varied degrees of data sortedness. These limitations result in inaccurate performance evaluations and limit the ability to understand how a commercial key-value store performs under dynamically shifting workloads. In this paper, we introduce Tectonic, a Rust-based, highly configurable, and resource-efficient key-value workload generator designed to model the temporal, structural, and dynamic properties of real-world workloads. Tectonic offers (i) fine-grained control over data access patterns for inserts, updates, merges, point and range queries, and point and range deletes; (ii) configurable composite key generation/selection strategies; (iii) dynamic workload generation where the workload properties change over time; and (iv) generation of workloads with user-specified data sortedness. Tectonic does so (v) at a 2× higher throughput than the state-of-the-art, (vi) with a main memory footprint that is up to 84% lower. By bridging the gap between synthetic and production workloads, Tectonic enables in-depth analysis of key-value data systems under conditions that better reflect the demands of real-world applications. We benchmark Tectonic’s performance against YCSB and KVBench in terms of latency, resource utilization, and ability to emulate production workloads. The code for Tectonic is available at: https://github.com/SSD-Brandeis/tectonic.
2024
- [DBTest] Anatomy of LSM Memory Buffer: Insights & Implications. Shubham Kaushik and Subhadeep Sarkar. Proceedings of the Tenth International Workshop on Testing Database Systems, Jun 2024
The log-structured merge (LSM) tree is an ingestion-optimized data structure that is widely used in modern NoSQL key-value stores. To support high throughput for writes, LSM-trees maintain an in-memory buffer that absorbs the incoming entries before writing them to slower secondary storage. We point out that the choice of the data structure and implementation of the memory buffer has a significant impact on the overall performance of LSM-based storage engines. In fact, even with the same implementation of the buffer, the performance of a storage engine can vary by up to several orders of magnitude if there is a shift in the input workload. In this paper, we benchmark the performance of LSM-based storage engines with different memory buffer implementations and under different workload characteristics. We experiment with four buffer implementations, namely, (i) vector, (ii) skip-list, (iii) hash skip-list, and (iv) hash linked-list, and for each implementation, we vary its design choices (such as bucket count in a hash skip-list and prefix length in a hash linked-list). We present a comprehensive performance benchmark for each buffer configuration, and highlight how the relative performance of the different buffer implementations varies with a shift in input workload. Lastly, we present a guideline for selecting the appropriate buffer implementation for a given workload and performance goal.
2019
- [JCSE] Fault Modelling of an Object-Oriented System using CPN. Shubham Kaushik and Ratneshwer. International Journal of Computer Sciences and Engineering, May 2019
Object-oriented development is a mechanism in which objects provide services to other objects by various means such as inheritance and polymorphism. Faults in object-oriented software may occur at two levels, i.e., the object level and the interaction level (when one object provides services to, or receives services from, others). A formal representation of an object-oriented system may help in understanding the behavior of software faults. Identifying faults at earlier stages may help during the development and testing stages. In this paper, an attempt has been made to model several faults in an object-oriented system with the help of Colored Petri Nets. First, a formal representation of object-oriented properties is depicted using Colored Petri Nets. Second, various possible faults are modeled using different programming scenarios. The main emphasis is on faults that may arise due to objects and their interactions, i.e., inheritance and polymorphism state. Such information may be useful during the testing and maintenance phases of software development.