[SPARK-52516] Memory Leak with coalesce foreachpartitions and v2 datasources - ASF Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.3
Fix Version/s: 4.1.0, 4.0.1, 3.5.7
Component/s: Spark Core
Labels:
- pull-request-available

Description

Doing the following should not leak any significant amount of memory.

sparkSession.sql("select * from icebergcatalog.db.table").coalesce(4).foreachPartition( 
    (iterator) -> { while (iterator.hasNext()) iterator.next(); }
);

Some of the details are covered in this thread:

https://github.com/apache/iceberg/issues/13297

In summary, there is a bug where a callback that captures a heavy reference and is registered through

context.addTaskCompletionListener

can lead to an OOM, because the callback keeps those heavy references reachable and so prevents them from being garbage collected. In particular, a coalesce piles up "sub-tasks" whose references cannot be cleaned up until the coalesce task completes.
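The retention pattern described above can be sketched without Spark. This is a minimal illustration only: `FakeTaskContext`, `HeavyReader`, and `readCoalesced` are hypothetical stand-ins invented for the example, not Spark or Iceberg classes, and the listener type is simplified to `Runnable`.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerLeakSketch {
    /** Hypothetical stand-in for a task context; not Spark's TaskContext. */
    static final class FakeTaskContext {
        private final List<Runnable> completionListeners = new ArrayList<>();

        void addTaskCompletionListener(Runnable listener) {
            completionListeners.add(listener);
        }

        int listenerCount() {
            return completionListeners.size();
        }

        /** Runs every listener, then drops them; only now can captured objects be collected. */
        void markTaskCompleted() {
            completionListeners.forEach(Runnable::run);
            completionListeners.clear();
        }
    }

    /** Hypothetical reader that owns a large buffer. */
    static final class HeavyReader implements AutoCloseable {
        final byte[] buffer = new byte[1024 * 1024];
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Reads several input partitions inside one task, as a coalesced task would. */
    static int readCoalesced(FakeTaskContext context, int partitions) {
        for (int i = 0; i < partitions; i++) {
            HeavyReader reader = new HeavyReader();
            // The method reference captures `reader`, so the context keeps it
            // reachable until the whole task completes.
            context.addTaskCompletionListener(reader::close);
            // ... the partition would be fully consumed here; the reader is no longer needed ...
        }
        return context.listenerCount();
    }

    public static void main(String[] args) {
        FakeTaskContext context = new FakeTaskContext();
        int retained = readCoalesced(context, 8);
        System.out.println("readers still reachable before task completion: " + retained);
        context.markTaskCompleted();
        System.out.println("listeners after task completion: " + context.listenerCount());
    }
}
```

In this sketch all eight readers stay reachable through the listener list even though each was finished long before the task ended, which mirrors the pile-up the report attributes to coalesce.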

The same issue appears in two different Scala classes:

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala

Iceberg is affected by the first; users of the v2 Parquet readers are affected by the second.

The proposed solution is to use a delegate class that de-references the heavy objects on iterator exhaustion or close. This only requires changes local to those classes, without any public API changes.
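One way to picture the delegate approach is the sketch below. It is an assumption-laden illustration, not the code from the linked pull request: `ReleasingIterator` and `isReleased` are hypothetical names, and a real implementation would also close the underlying reader before dropping it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ReleasingIteratorSketch {
    /** Hypothetical delegate: the only object a long-lived callback needs to capture. */
    static final class ReleasingIterator<T> implements Iterator<T>, AutoCloseable {
        private Iterator<T> delegate; // heavy reference; null once released

        ReleasingIterator(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            if (delegate == null) {
                return false;
            }
            if (delegate.hasNext()) {
                return true;
            }
            close(); // exhausted: drop the heavy reference right away
            return false;
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return delegate.next();
        }

        /** Idempotent; safe to call again from a task completion callback. */
        @Override
        public void close() {
            delegate = null;
        }

        boolean isReleased() {
            return delegate == null;
        }
    }

    public static void main(String[] args) {
        ReleasingIterator<Integer> rows =
            new ReleasingIterator<>(Arrays.asList(1, 2, 3).iterator());
        // A completion callback would capture only the light wrapper.
        Runnable completionListener = rows::close;

        int sum = 0;
        while (rows.hasNext()) {
            sum += rows.next();
        }
        System.out.println(sum + " " + rows.isReleased()); // prints "6 true"
        completionListener.run(); // harmless no-op after exhaustion
    }
}
```

The callback still exists until the task completes, but it now pins only a small wrapper whose heavy field is already null, so the underlying reader becomes collectable as soon as its partition is exhausted or closed.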

The proposed changes were tested on Spark 3.4.x but not on 4.0.0, although I believe 4.0.0 is likely impacted as well.

Issue Links

depends upon

SPARK-52809 Don't hold reader and iterator references for all partitions in task completion listeners for metric update

Closed

links to

GitHub Pull Request #51528

People

Assignee: L. C. Hsieh

Reporter: Joshua Kolash

Votes: 0

Watchers: 1

Dates

Created: 17/Jun/25 19:24

Updated: 15/Feb/26 19:20

Resolved: 18/Jul/25 15:53