Performance Evaluation of Delta Lake for Data Reliability and Analytic Workloads
DOI: https://doi.org/10.53469/jpce.2026.08(02).04
Keywords: Delta Lake optimization, transactional data lakes, big data architecture, Apache Spark performance, Z-order clustering
Abstract: Delta Lake is an open-source storage layer that enhances data lakes with ACID transactional guarantees, scalable metadata handling, and unified batch/stream processing on Apache Spark. It has become integral to modern data architectures by providing reliability, schema enforcement, and support for time travel. However, achieving low-latency, high-throughput query execution over large-scale Delta tables requires deliberate optimization across multiple system layers. This paper examines Delta Lake's underlying architecture, including its transaction log, snapshot isolation model, and Parquet-based file layout, and presents advanced performance tuning techniques. These include optimizing partitioning schemes for effective pruning, leveraging data skipping via file-level statistics, reducing file fragmentation through compaction, utilizing Spark caching for reuse, applying Z-order clustering for multi-column filtering efficiency, and maintaining compact, query-friendly metadata.
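To make the techniques named in the abstract concrete, the following PySpark sketch illustrates partitioning for pruning, compaction with Z-order clustering via Delta Lake's OPTIMIZE command, and Spark caching for reuse. It is a minimal illustration, not the paper's evaluated setup: the table paths and column names (events, event_date, user_id, country) are hypothetical, and it assumes a Spark session configured with the open-source delta-spark package.

```python
# Minimal sketch of the Delta Lake tuning steps discussed above.
# Assumes the delta-spark package is available; paths and column
# names (events, event_date, user_id, country) are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-tuning-sketch")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

df = spark.read.json("/data/raw/events")  # hypothetical source path

# Partition on a low-cardinality column so queries filtering on it
# can prune entire directories at planning time.
(df.write.format("delta")
   .partitionBy("event_date")
   .mode("overwrite")
   .save("/data/delta/events"))

# Compact small files and Z-order by frequently filtered columns, so
# file-level min/max statistics enable data skipping on both columns.
spark.sql("OPTIMIZE delta.`/data/delta/events` ZORDER BY (user_id, country)")

# Cache a hot subset in memory for repeated interactive queries.
recent = (spark.read.format("delta")
          .load("/data/delta/events")
          .where("event_date >= '2026-01-01'"))
recent.cache()
recent.count()  # action that materializes the cache
```

In this sketch, partitioning handles coarse-grained pruning on a single column, while Z-ordering covers multi-column selective filters that partitioning alone cannot serve; the two are complementary rather than interchangeable.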
License
Copyright (c) 2026 Josiah Ravikumar, Rupini Arulmozhi

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

