Spark, Hadoop, Hive and Programming: Adaptive Execution in Spark

Thursday, July 4, 2019

Adaptive Execution in Spark

Adaptive Query Execution (also known as Adaptive Optimisation or Adaptive Execution) is a query optimisation technique in Spark SQL. Instead of fixing the physical plan before the query starts, the Spark planner can switch to an alternative execution plan at runtime, using statistics collected while the query is running.

Quoting the description of a talk by the authors of Adaptive Query Execution:

At runtime, the adaptive execution mode can change shuffle join to broadcast join if it finds the size of one table is less than the broadcast threshold. It can also handle skewed input data for join and change the partition number of the next stage to better fit the data scale. In general, adaptive execution decreases the effort involved in tuning SQL query parameters and improves the execution performance by choosing a better execution plan and parallelism at runtime.
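
The first rule in the quote (switch to a broadcast join when one side turns out to be smaller than the broadcast threshold) can be illustrated with a toy model. This is only a sketch of the decision, not Spark's actual planner code: the function name `choose_join_strategy` and its arguments are hypothetical, and the default of 10 MB is the documented default of `spark.sql.autoBroadcastJoinThreshold`.

```python
# Toy model of the runtime join decision described in the quote above.
# This is NOT Spark's implementation; names here are hypothetical.

# 10 MB, the documented default of spark.sql.autoBroadcastJoinThreshold (in bytes).
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024


def choose_join_strategy(left_bytes, right_bytes,
                         threshold=DEFAULT_BROADCAST_THRESHOLD):
    """Pick 'broadcast' if the smaller input is below the threshold, else 'shuffle'.

    The sizes stand in for the runtime statistics that become known
    once the upstream stages have finished.
    """
    if min(left_bytes, right_bytes) < threshold:
        return "broadcast"
    return "shuffle"


MB = 1024 * 1024
print(choose_join_strategy(500 * MB, 2 * MB))    # small right side: broadcast
print(choose_join_strategy(500 * MB, 300 * MB))  # both sides large: shuffle
```

The point of doing this at runtime is that the real sizes are often unknown at planning time, for example after a selective filter, so a static plan would have to fall back to the shuffle join.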

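In the Spark 2.x releases current when this post was written, Adaptive Query Execution is disabled by default (it has been enabled by default since Spark 3.2.0). Set the spark.sql.adaptive.enabled configuration property to true to enable it.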
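
As a minimal sketch, the property can be set when the session is built. The example below assumes PySpark is installed. The helper names `adaptive_conf` and `build_session` and the application name are made up for illustration; only `spark.sql.adaptive.enabled` comes from the text above.

```python
def adaptive_conf(enabled=True):
    """Return the Spark SQL setting used in this sketch as string key/value pairs."""
    # Spark configuration values are strings, so the boolean becomes "true"/"false".
    return {"spark.sql.adaptive.enabled": str(enabled).lower()}


def build_session(app_name="adaptive-execution-demo"):
    """Create (or reuse) a SparkSession with adaptive execution switched on.

    Requires pyspark. It is imported here, inside the function, so that
    adaptive_conf() can be used and tested without a Spark installation.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in adaptive_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

After `spark = build_session()`, the value can be checked with `spark.conf.get("spark.sql.adaptive.enabled")`. The same property can also be passed on the command line, for example `spark-submit --conf spark.sql.adaptive.enabled=true`.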

References:

An Adaptive Execution Engine For Apache Spark SQL (video), by Carson Wang
An adaptive execution mode for Spark SQL, by Carson Wang (Intel) and Yucai Yu (Intel), Strata Data Conference, Singapore, December 7, 2017
https://issues.apache.org/jira/browse/SPARK-23128
https://issues.apache.org/jira/browse/SPARK-9850
