Describe the bug
Running the example never ends. It correctly shows the check results, but the application never exits because some deequ thread stays alive. On Ctrl+C, the following message appears:
^CException ignored in: <module 'threading' from '/usr/lib64/python3.6/threading.py'>
Traceback (most recent call last):
  File "/usr/lib64/python3.6/threading.py", line 1294, in _shutdown
    t.join()
  File "/usr/lib64/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/usr/lib64/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/context.py", line 256, in signal_handler
KeyboardInterrupt:
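The traceback shows the interrupt landing in threading._shutdown, which at interpreter exit joins every non-daemon thread that is still alive. To narrow down which thread is holding the process open, a minimal diagnostic sketch like the one below could be pasted at the end of the script. It uses only the standard library; the helper names live_non_daemon_threads and describe are hypothetical and are not part of pydeequ or PySpark.

```python
import threading


def live_non_daemon_threads():
    """Return the threads (other than the main one) that block interpreter exit."""
    main = threading.main_thread()
    return [t for t in threading.enumerate()
            if t is not main and not t.daemon]


def describe(threads):
    """Format each thread as a short, printable line."""
    return ["{} (daemon={})".format(t.name, t.daemon) for t in threads]


if __name__ == "__main__":
    # Any thread printed here must finish before the process can exit.
    for line in describe(live_non_daemon_threads()):
        print(line)
```

If the list is non-empty after checkResult_df.show(), the thread names should hint at which component started them.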
To Reproduce
Just run the following code, as described here:
from pyspark.sql import SparkSession, Row
import pydeequ

spark = (SparkSession
    .builder
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate())

df = spark.sparkContext.parallelize([
    Row(a="foo", b=1, c=5),
    Row(a="bar", b=2, c=6),
    Row(a="baz", b=3, c=None)]).toDF()

from pydeequ.checks import *
from pydeequ.verification import *

check = Check(spark, CheckLevel.Warning, "Review Check")

checkResult = VerificationSuite(spark) \
    .onData(df) \
    .addCheck(
        check.hasSize(lambda x: x >= 3) \
        .hasMin("b", lambda x: x == 0) \
        .isComplete("c") \
        .isUnique("a") \
        .isContainedIn("a", ["foo", "bar", "baz"]) \
        .isNonNegative("b")) \
    .run()

checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()
Command used to run it:
$ SPARK_VERSION=2.3 PYSPARK_PYTHON=/usr/bin/python3.6 spark-submit --master yarn --deploy-mode client --packages com.amazon.deequ:deequ:1.0.5 --exclude-packages net.sourceforge.f2j:arpack_combined_all pydeequ-test.py
Expected behavior
The example application exits once the check results have been shown.
Infrastructure