coalesce() reduces the number of partitions in a DataFrame.
repartition() can either increase or decrease the number of partitions in a DataFrame.
repartition() performs a full shuffle of the data and creates roughly equal-sized partitions. coalesce() combines existing partitions to avoid a full shuffle.
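A minimal PySpark sketch of the two calls follows. The helper name `partition_counts`, the `run_demo` wrapper, the local master setting, and the partition numbers are illustrative assumptions rather than anything prescribed above; only `coalesce`, `repartition`, `spark.range`, and `rdd.getNumPartitions` are actual Spark API calls.

```python
def partition_counts(df, fewer, more):
    """Return the partition counts of df, df.coalesce(fewer), and df.repartition(more).

    `df` is expected to be a Spark DataFrame; `fewer` and `more` are
    hypothetical target partition counts chosen by the caller.
    """
    original = df.rdd.getNumPartitions()
    # coalesce() merges existing partitions and avoids a full shuffle.
    coalesced = df.coalesce(fewer).rdd.getNumPartitions()
    # repartition() performs a full shuffle to reach the requested count.
    repartitioned = df.repartition(more).rdd.getNumPartitions()
    return original, coalesced, repartitioned


def run_demo():
    # Imported here so the helper above can be defined without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[4]")  # assumption: a local session is enough for a demo
        .appName("coalesce-vs-repartition")
        .getOrCreate()
    )
    try:
        # Start with 8 partitions, then ask for fewer and for more.
        df = spark.range(0, 1000, numPartitions=8)
        print(partition_counts(df, fewer=2, more=16))
    finally:
        spark.stop()
```

Calling `run_demo()` on a machine with PySpark installed prints the three partition counts, which makes it easy to see that coalesce() lowers the count while repartition() can raise it.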
Summary of Differences
coalesce() | repartition() |
Can only reduce the number of partitions. | Can increase or decrease the number of partitions. |
Minimizes data movement by merging existing partitions instead of performing a full shuffle. | Triggers a full network shuffle, which increases data movement. |
Can produce unevenly sized partitions. | Produces roughly equal-sized partitions. |
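The partition-size and data-movement rows of the table can be illustrated with a toy model in plain Python. This is only a sketch and not Spark's implementation: the functions `toy_coalesce` and `toy_repartition` are hypothetical, partitions are modeled as plain lists, and the merge order is a simplification (Spark chooses which partitions to merge differently).

```python
def toy_coalesce(partitions, n):
    """Merge whole partitions into at most n groups; no partition is split."""
    if n < 1:
        raise ValueError("n must be at least 1")
    n = min(n, len(partitions))  # merging can never increase the count
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)  # every row of a partition moves together
    return merged


def toy_repartition(partitions, n):
    """Redistribute every row round-robin across n new partitions (a full shuffle)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rows = [row for part in partitions for row in part]
    shuffled = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        shuffled[i % n].append(row)  # rows are dealt out one at a time
    return shuffled


# Four skewed partitions holding 10, 2, 1, and 3 rows.
skewed = [list(range(0, 10)), list(range(10, 12)), [12], list(range(13, 16))]

print([len(p) for p in toy_coalesce(skewed, 2)])     # [11, 5]
print([len(p) for p in toy_repartition(skewed, 2)])  # [8, 8]
```

In this model the merged partitions inherit the skew of the inputs, while the row-by-row redistribution evens the sizes out at the cost of touching every row.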