At the Spark AI Summit 2020, Databricks revealed that pairing Spark 3.0 with its new Delta Engine delivers a major boost in query performance and data optimization.
Building on Spark 3.0, Databricks showcased its new Delta Engine at the Spark AI Summit 2020. For those unaware, the summit is a conference where developers, data scientists and engineers discuss and showcase their work in machine learning and AI. This was the first time the summit was held virtually.
Matei Zaharia, CTO of Databricks and original creator of Apache Spark, traced Spark's evolution since its inception and highlighted the new features in Spark 3.0, which is said to comprise over 3,000 patches.
Among the many developments, the most prominent is the new Adaptive Query Execution (AQE), which re-optimizes a query's execution plan at runtime based on statistics collected as the query runs.
Zaharia said, “This [AQE] makes it much easier to run Spark because you don't need to configure these things in advance, so it will actually adapt and optimize based on your data and also leads to better performance in many cases.”
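As an illustration of "not configuring things in advance", AQE in Spark 3.0 is switched on with a single configuration flag; a minimal `spark-defaults.conf` fragment might look like the sketch below (the two sub-feature flags shown are optional refinements, and all three keys default to off in Spark 3.0):

```properties
# Enable Adaptive Query Execution (opt-in in Spark 3.0)
spark.sql.adaptive.enabled                       true
# Optional AQE refinements: merge small shuffle partitions after a stage,
# and split skewed partitions when joining
spark.sql.adaptive.coalescePartitions.enabled    true
spark.sql.adaptive.skewJoin.enabled              true
```

The same keys can also be set per-session via `spark.conf.set(...)` instead of cluster-wide.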
Delta Engine is a high-performance query engine for Delta Lake. Built on Spark 3.0, it adds an improved query optimizer and a caching layer to the system, helping companies manage their data. It accelerates SQL workloads and includes a native vectorized execution engine.
Databricks CEO and co-founder Ali Ghodsi said, “Every company wants to be a data company. If you think about what that actually means -- it requires a new way of empowering people working with data, enabling them to organize around the data they need to collaborate and get to the answers they need more quickly.”
Starbucks was an early adopter of this technology: its own data analytics platform, “Brewkit,” was built with Microsoft Azure and Databricks Delta Lake.
Vish Subramanian, director of data and analytics engineering at Starbucks, revealed, “Delta Lake has now helped us build out our historical data and live data aggregations together, to make sure we are now giving our store partners real-time insights on data based on history and on current time.”