[SPARK-17816] [Core] Fix ConcurrentModificationException issue in BlockStatusesAccumulator#15371
seyfe wants to merge 15 commits into
ok to test |
|
Test build #66425 has finished for PR 15371 at commit
|
|
Unfortunately, this PR doesn't fix the java.util.ConcurrentModificationException. I can still repro it. I will spend more time on it tomorrow morning. |
srowen left a comment
Yeah I'm not sure this is the issue. It's being modified while it's being serialized.
I don't see why this would help. You add a wrapper, but, synchronizing your local access to it doesn't do anything because nothing else is synchronizing on it.
PS can the List[(BlockId,BlockStatus)] type just be part of the match predicate?
What does this do -- just puts a copy before the work of mapping? I could see how that would tend to help.
This wasn't the root cause but it's something nice to have. If you prefer, I can revert this line.
Seems OK if it's related cleanup, and potentially helps a closely related manifestation
|
One comment: once you figured out the proper fix, please add some comments inline so they don't get accidentally removed in the future. |
|
@seyfe Thanks for reporting this one. Actually, it's different from SPARK-17463. Could you create a new ticket for this issue, please? The cause is we send a mutable TaskInfo to listeners but we may still update TaskInfo's fields (e.g., accumulables) in another thread... Ideally, all events sent to the listeners should be immutable. |
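A minimal sketch of the failure mode described above, in plain Java rather than Spark code. The names `MutableEventDemo` and `iterateWhileMutating` are hypothetical, and the "other thread" is simulated by mutating the list from inside the loop so that the outcome is deterministic. It only illustrates why handing a live mutable list to a consumer is risky and why an immutable snapshot avoids the problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

final class MutableEventDemo {
    // Iterates over `iterated` and appends to `shared` on every step,
    // standing in for a second thread that keeps updating the shared list.
    // Returns true if the iterator detected the concurrent modification.
    static boolean iterateWhileMutating(List<String> shared, List<String> iterated) {
        try {
            for (String ignored : iterated) {
                shared.add("late-update");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> accumulables = new ArrayList<>(Arrays.asList("a", "b"));

        // The consumer iterates the live list while it is being updated.
        System.out.println(iterateWhileMutating(accumulables, accumulables)); // prints true

        // The consumer iterates an immutable snapshot taken beforehand.
        List<String> snapshot = Collections.unmodifiableList(new ArrayList<>(accumulables));
        System.out.println(iterateWhileMutating(accumulables, snapshot)); // prints false
    }
}
```

With real threads the exception is intermittent rather than guaranteed, which matches the "I can still repro it" experience earlier in the thread.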
|
Hi @zsxwing. I have a fix ready and testing it now. I will create a new ticket and send an updated PR today. |
|
Test build #66473 has finished for PR 15371 at commit
|
|
@seyfe The issue is |
|
@zsxwing.
|
@seyfe The comment you are deleting explains why it's safe: the driver doesn't modify |
|
@zsxwing . |
|
I also want to point out that below is the core part of fix. Rest of the code changes are side-effects of it. |
srowen left a comment
Dumb question, but how big is this? The solution here is to copy the data structure, which is a good defensive move, as long as it's not big and nothing is actually relying on observing changes to the underlying data. Is that valid?
|
Hi @srowen, this PR doesn't introduce any extra data copy operations. It moves the data copy code from
I checked the data size using 3 different pipelines. 99% of the time the ArrayList has fewer than 4 items. There was one case where it maxed out at 4000 items, but that was less than 1% of the time. I ran my test with 4000 executors, which I think is why the number 4000 came up.
I debated other options as well. Moving JSON serialization into
I don't know the answer to the second part of your question (below), but existing behavior is unchanged. The only change is that we can convert the ArrayList to a Scala List inside a synchronized block, so we won't get
|
|
I see the new copy (of course or else this wouldn't help) but where is a copy removed? I'm probably overlooking it. A |
|
Hi @srowen, this is where I removed the extra copy operation. I changed this line because the conversion is already done by BlockStatusesAccumulator.
|
|
@seyfe I'm taking my words back. Yea, |
_seq.synchronized is wrong. Collections.synchronizedList uses its internal mutex to lock instead of this.
Why change them to a Scala List? Just changing this one to java.util.Collections.unmodifiableList(new ArrayList[(BlockId, BlockStatus)](_seq)) should be enough.
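For illustration, here is a small Java sketch of the "copy, then wrap as unmodifiable" idea suggested above. `SnapshotAccumulator` is a hypothetical stand-in, not the Spark class, and it assumes the backing list was created with the single-argument `Collections.synchronizedList`, so that synchronizing on the list object guards the same lock as its own methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an accumulator that collects items from several threads.
final class SnapshotAccumulator<T> {
    private final List<T> items = Collections.synchronizedList(new ArrayList<T>());

    void add(T item) {
        items.add(item);
    }

    // Returns a read-only copy, so callers never iterate the live list.
    List<T> value() {
        synchronized (items) {
            return Collections.unmodifiableList(new ArrayList<T>(items));
        }
    }

    public static void main(String[] args) {
        SnapshotAccumulator<String> acc = new SnapshotAccumulator<>();
        acc.add("block-1");
        List<String> snapshot = acc.value();
        acc.add("block-2");
        System.out.println(snapshot.size());    // prints 1
        System.out.println(acc.value().size()); // prints 2
    }
}
```

The trade-off is one copy per call to `value()`, which is cheap as long as the list stays small.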
Thanks for the review, @zsxwing. I checked the Javadoc, which says that iterating over the returned list is not thread-safe unless the caller synchronizes on it, and it suggests the usage below. That's why I used _seq.synchronized.
https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html
List list = Collections.synchronizedList(new ArrayList());
    ...
synchronized (list) {
    Iterator i = list.iterator(); // Must be in synchronized block
    while (i.hasNext())
        foo(i.next());
}
Regarding your second question, JsonProtocol uses it as a Scala collection, which is why I converted it to a Scala collection, so we won't need to convert it again.
1. Sorry, I didn't notice this line (mutex = this;) in Collections.synchronizedList...
2. I just took a look at CollectionAccumulator. I think we can just make BlockStatusesAccumulator extend CollectionAccumulator. This would eliminate the duplicated code.
Regarding #2, I took a look at CollectionAccumulator as well, and it seems like a good idea. Let me give it a try.
|
Test build #66534 has finished for PR 15371 at commit
|
|
Test build #66535 has finished for PR 15371 at commit
|
|
Test build #66537 has finished for PR 15371 at commit
|
|
Hi @zsxwing, the test failed with the error below. I don't think it's related to my change. Should we just re-run the test, or do you have any suggestions? |
|
retest this please |
|
Test build #66539 has finished for PR 15371 at commit
|
|
I don't know if it's related, but I found a bug in the last iteration. We also need to override the copy and copyAndReset methods; otherwise it throws a java.lang.ClassCastException. |
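One plausible way such a ClassCastException can arise, sketched in plain Java with hypothetical classes (`ListAccumulator`, `InheritedCopyAccumulator`, `OverriddenCopyAccumulator`). This is an assumption about the mechanism, not something the thread confirms: if a subclass inherits a `copy()` that constructs the parent type, any caller that casts the copy back to the subclass fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parent accumulator whose copy() always builds the parent type.
class ListAccumulator {
    protected final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    ListAccumulator copy() {
        ListAccumulator c = new ListAccumulator();
        c.items.addAll(items);
        return c;
    }
}

// Inherits copy(), so its copies are plain ListAccumulator instances.
class InheritedCopyAccumulator extends ListAccumulator { }

// Overrides copy() with a covariant return type, so copies keep the subclass type.
class OverriddenCopyAccumulator extends ListAccumulator {
    @Override
    OverriddenCopyAccumulator copy() {
        OverriddenCopyAccumulator c = new OverriddenCopyAccumulator();
        c.items.addAll(items);
        return c;
    }
}

final class CopyTypeDemo {
    // Casts the copy back to the runtime class of the original accumulator.
    static boolean copyPreservesType(ListAccumulator acc) {
        try {
            acc.getClass().cast(acc.copy());
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(copyPreservesType(new InheritedCopyAccumulator()));  // prints false
        System.out.println(copyPreservesType(new OverriddenCopyAccumulator())); // prints true
    }
}
```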
|
Test build #66541 has finished for PR 15371 at commit
|
|
@zsxwing, I built it and the test pipelines work fine, so the fix is good. But I don't know how to fix the MiMa tests. Would you mind helping me with this? |
|
@seyfe I think we can remove |
|
@zsxwing, I think that is a good idea. I searched, and that is the only place we use |
|
Test build #66627 has finished for PR 15371 at commit
|
|
retest this please |
|
Test build #66632 has finished for PR 15371 at commit
|
da2311a to 5e00dc3
|
Test build #66669 has finished for PR 15371 at commit
|
|
LGTM. Thanks! Merging to master and |
|
There are some conflicts with 2.0. @seyfe could you submit a PR for branch-2.0, please? Thanks! |
…kStatusesAccumulator
Change BlockStatusesAccumulator to return an immutable object when the value method is called.
Existing tests, plus I verified this change by running a pipeline that consistently reproduces this issue.
This is the stack trace for this exception:
`
java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
at java.util.ArrayList$Itr.next(ArrayList.java:851)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableLike$class.to(TraversableLike.scala:590)
at scala.collection.AbstractTraversable.to(Traversable.scala:104)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:294)
at scala.collection.AbstractTraversable.toList(Traversable.scala:104)
at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:314)
at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$5.apply(JsonProtocol.scala:291)
at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$5.apply(JsonProtocol.scala:291)
at scala.Option.map(Option.scala:146)
at org.apache.spark.util.JsonProtocol$.accumulableInfoToJson(JsonProtocol.scala:291)
at org.apache.spark.util.JsonProtocol$$anonfun$taskInfoToJson$12.apply(JsonProtocol.scala:283)
at org.apache.spark.util.JsonProtocol$$anonfun$taskInfoToJson$12.apply(JsonProtocol.scala:283)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.util.JsonProtocol$.taskInfoToJson(JsonProtocol.scala:283)
at org.apache.spark.util.JsonProtocol$.taskEndToJson(JsonProtocol.scala:145)
at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:76)
`
Author: Ergin Seyfe <eseyfe@fb.com>
Closes apache#15371 from seyfe/race_cond_jsonprotocal.