Performance Evaluation of Delta Lake for Data Reliability and Analytic Workloads
DOI: https://doi.org/10.53469/jpce.2026.08(02).04
Keywords: Delta Lake optimization, transactional data lakes, big data architecture, Apache Spark performance, Z-order clustering
Abstract: Delta Lake is an open-source storage layer that enhances data lakes with ACID transactional guarantees, scalable metadata handling, and unified batch/stream processing on Apache Spark. It has become integral to modern data architectures by providing reliability, schema enforcement, and support for time travel. However, achieving low-latency, high-throughput query execution over large-scale Delta tables requires deliberate optimization across multiple system layers. This paper examines Delta Lake's underlying architecture, including its transaction log, snapshot isolation model, and Parquet-based file layout, and presents advanced performance tuning techniques. These include optimizing partitioning schemes for effective pruning, leveraging data skipping via file-level statistics, reducing file fragmentation through compaction, utilizing Spark caching for reuse, applying Z-order clustering for multi-column filtering efficiency, and maintaining compact, query-friendly metadata.
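To make the techniques named in the abstract concrete, the following PySpark sketch illustrates partitioning for pruning, compaction with Z-order clustering via Delta Lake's OPTIMIZE command, and Spark caching for reuse. It is a minimal illustration, not the paper's evaluated setup: the table paths and column names (events, event_date, user_id, country) are hypothetical, and it assumes a Spark session configured with the open-source delta-spark package.

```python
# Minimal sketch of the Delta Lake tuning steps discussed above.
# Assumes the delta-spark package is available; paths and column
# names (events, event_date, user_id, country) are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-tuning-sketch")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

df = spark.read.json("/data/raw/events")  # hypothetical source path

# Partition on a low-cardinality column so queries filtering on it
# can prune entire directories at planning time.
(df.write.format("delta")
   .partitionBy("event_date")
   .mode("overwrite")
   .save("/data/delta/events"))

# Compact small files and Z-order by frequently filtered columns, so
# file-level min/max statistics enable data skipping on both columns.
spark.sql("OPTIMIZE delta.`/data/delta/events` ZORDER BY (user_id, country)")

# Cache a hot subset in memory for repeated interactive queries.
recent = (spark.read.format("delta")
          .load("/data/delta/events")
          .where("event_date >= '2026-01-01'"))
recent.cache()
recent.count()  # action that materializes the cache
```

In this sketch, partitioning handles coarse-grained pruning on a single column, while Z-ordering covers multi-column selective filters that partitioning alone cannot serve; the two are complementary rather than interchangeable.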
License
Copyright (c) 2026 Josiah Ravikumar, Rupini Arulmozhi

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

