Spark SQL Performance Tuning
For some workloads, performance can be improved by caching data in memory or by turning on some experimental options.
Caching Data In Memory
Spark SQL can cache tables using an in-memory columnar format by calling spark.catalog.cacheTable("tableName") or dataFrame.cache(). Spark SQL will then scan only the required columns and automatically tune compression to minimize memory usage and GC pressure. You can call spark.catalog.uncacheTable("tableName") to remove the table from memory.
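As a rough illustration of the calls above, the following Scala sketch caches a small temporary view, queries a single column, and then releases the cache. The view name "events", its columns, the application name, and the local master setting are all assumptions made for this example and do not come from the original text.

```scala
import org.apache.spark.sql.SparkSession

object CacheTableSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would normally
    // get its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("CacheTableSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data registered as a temporary view named "events".
    val events = Seq((1, "click", 0.5), (2, "view", 1.5), (3, "click", 2.5))
      .toDF("id", "kind", "value")
    events.createOrReplaceTempView("events")

    // Mark the view for caching in the in-memory columnar format.
    // This call is lazy: the data is cached when an action first scans it.
    spark.catalog.cacheTable("events")

    // Running an action populates the cache; later queries that need
    // only some columns can read just those columns from it.
    spark.sql("SELECT id FROM events").count()
    println(s"events cached: ${spark.catalog.isCached("events")}")

    // Release the memory once the table is no longer needed.
    spark.catalog.uncacheTable("events")

    spark.stop()
  }
}
```

Calling events.cache() on the DataFrame instead of spark.catalog.cacheTable("events") would be the equivalent choice when no table name is registered.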