Skip to content

[SPARK-23390][SQL] Flaky Test Suite: FileBasedDataSourceSuite in Spark 2.3/hadoop 2.7#20584

Closed
cloud-fan wants to merge 1 commit into
apache:masterfrom
cloud-fan:flaky-test
Closed

[SPARK-23390][SQL] Flaky Test Suite: FileBasedDataSourceSuite in Spark 2.3/hadoop 2.7#20584
cloud-fan wants to merge 1 commit into
apache:masterfrom
cloud-fan:flaky-test

Conversation

@cloud-fan

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This test only fails with sbt on Hadoop 2.7, I can't reproduce it locally, but here is my speculation by looking at the code:

  1. FileSystem.delete doesn't delete the directory entirely, somehow we can still open the file as a 0-length empty file.(just speculation)
  2. ORC intentionally allow empty files, and the reader fails during reading without closing the file stream.

This PR improves the test to make sure all files are deleted and can't be opened.

How was this patch tested?

N/A

@cloud-fan

Copy link
Copy Markdown
Contributor Author

val fs = thirdPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
// Make sure all data files are deleted and can't be opened.
files.foreach(f => fs.delete(f, false))
assert(fs.delete(thirdPath, true))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm .. but it asserts true on the delete completion .. I would be surprised if it's something not guaranteed ..

@HyukjinKwon

Copy link
Copy Markdown
Member

BTW, my rough wild guess was that case 2. (reading it but not closing it) happens in schema inference path.

@gatorsmile

gatorsmile commented Feb 12, 2018

Copy link
Copy Markdown
Member

LGTM, I would merge this first if Jenkins does not fail and see whether this can help fix the flaky tests.

@HyukjinKwon

Copy link
Copy Markdown
Member

I won't get in the way but I am less sure on this. I thought this is also flaky in PR builder too anyway.

@cloud-fan

cloud-fan commented Feb 12, 2018

Copy link
Copy Markdown
Contributor Author

BTW, my rough wild guess was that case 2. (reading it but not closing it) happens in schema inference path.

According to the log, the leaked file stream was created when building the ORC columnar reader. Schema inference is fine.

@HyukjinKwon

Copy link
Copy Markdown
Member

You are right. I have run out of ideas. LGTM too for a try if it happens more frequently in spark-branch-2.3-test-sbt-hadoop-2.7.

@sameeragarwal

Copy link
Copy Markdown
Member

LGTM, seems plausible!

@kiszk

kiszk commented Feb 12, 2018

Copy link
Copy Markdown
Member

I am also thinking about this. I agree with this.

According to the log, the leaked file stream was created when building the ORC columnar reader.

I am suspicious about relationship between afterEach() and addTaskCompletionListener (call close()). But, not sure. Let us try this approach first.

@SparkQA

SparkQA commented Feb 12, 2018

Copy link
Copy Markdown

Test build #87321 has finished for PR 20584 at commit 51bb48a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sameeragarwal

Copy link
Copy Markdown
Member

merging this to master/2.3. Thanks!

asfgit pushed a commit that referenced this pull request Feb 12, 2018
…k 2.3/hadoop 2.7

## What changes were proposed in this pull request?

This test only fails with sbt on Hadoop 2.7, I can't reproduce it locally, but here is my speculation by looking at the code:
1. FileSystem.delete doesn't delete the directory entirely, somehow we can still open the file as a 0-length empty file.(just speculation)
2. ORC intentionally allow empty files, and the reader fails during reading without closing the file stream.

This PR improves the test to make sure all files are deleted and can't be opened.

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20584 from cloud-fan/flaky-test.

(cherry picked from commit 6efd5d1)
Signed-off-by: Sameer Agarwal <sameerag@apache.org>
@asfgit asfgit closed this in 6efd5d1 Feb 12, 2018
@cloud-fan

Copy link
Copy Markdown
Contributor Author

I am suspicious about relationship between afterEach() and addTaskCompletionListener (call close()). But, not sure. Let us try this approach first.

This is one of my speculations. There 2 possibilities I can think of: 1) the task completion listener is not called before afterEach. 2) the orc columnar reader's close method doesn't close the file stream.

For 1), seems we've fixed it in c5a31d1 . For 2), I'm not sure and may need help from ORC folks.

@HyukjinKwon

HyukjinKwon commented Feb 12, 2018

Copy link
Copy Markdown
Member

I think I rushed to take a look Initially. Thanks for fixing this.

robert3005 pushed a commit to palantir/spark that referenced this pull request Feb 12, 2018
…k 2.3/hadoop 2.7

## What changes were proposed in this pull request?

This test only fails with sbt on Hadoop 2.7, I can't reproduce it locally, but here is my speculation by looking at the code:
1. FileSystem.delete doesn't delete the directory entirely, somehow we can still open the file as a 0-length empty file.(just speculation)
2. ORC intentionally allow empty files, and the reader fails during reading without closing the file stream.

This PR improves the test to make sure all files are deleted and can't be opened.

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#20584 from cloud-fan/flaky-test.
@gatorsmile

Copy link
Copy Markdown
Member

@dongjoon-hyun

dongjoon-hyun commented Feb 12, 2018

Copy link
Copy Markdown
Member

My bad. Thank you, guys. For the following, I'll investigate it more.

According to the log, the leaked file stream was created when building the ORC columnar reader.

@dongjoon-hyun

dongjoon-hyun commented Feb 12, 2018

Copy link
Copy Markdown
Member

This patch helps branch-2.3/sbt/hadoop2.7. So, I'm seriously monitoring the latest consecutive failures at sbt and hadoop-2.6 at master branch, too.

  • 4210 Passed!
  • 4209 Failed with FileBasedDataSourceSuite and ParquetQuerySuite
  • 4208 This patch landed here but failed with StreamingOuterJoinSuite and ReceiverSuite.
  • 4207 Failed with ParquetQuerySuite
  • 4206 Failed with BufferHolderSparkSubmitSuite

@dongjoon-hyun

Copy link
Copy Markdown
Member

For the following case, I'll make a PR for Spark ORC columnar reader very soon.

  1. the orc columnar reader's close method doesn't close the file stream.

@dongjoon-hyun

Copy link
Copy Markdown
Member

I created a PR, #20590 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants