Spark v2.1.1 is used; that version or higher needs to be installed on the system. The IPython shell is used for running document clustering. The environment variable SPARK_HOME should point to the home directory of the downloaded Spark.
$ sudo apt-get install ipython
$ cd $SPARK_HOME
$ PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
%run doc_clustering.py --input-file NLP-test.json --feature-extractor=Word2Vec --total-clusters=20 --max-iterations=20
%run doc_clustering.py --input-file NLP-test.json --feature-extractor=TFIDF --total-clusters=20 --max-iterations=20
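The options passed to doc_clustering.py in the two commands above could be parsed roughly as follows. This is only a sketch, assuming the script uses Python's standard argparse module; the actual script may handle its arguments differently. The function name parse_cli_args, the help texts, and the default values are hypothetical and chosen for illustration. Only the four flag names and the two extractor values (Word2Vec, TFIDF) come from the commands above.

```python
import argparse


def parse_cli_args(argv=None):
    """Parse the four options shown in the %run commands (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Document clustering options (hypothetical sketch)."
    )
    # Path to the input documents, e.g. NLP-test.json
    parser.add_argument("--input-file", required=True,
                        help="JSON file containing the documents")
    # The two extractor names come from the commands above
    parser.add_argument("--feature-extractor", choices=["Word2Vec", "TFIDF"],
                        default="TFIDF",  # assumed default
                        help="how documents are turned into feature vectors")
    parser.add_argument("--total-clusters", type=int, default=20,  # assumed default
                        help="number of clusters to produce")
    parser.add_argument("--max-iterations", type=int, default=20,  # assumed default
                        help="upper bound on clustering iterations")
    return parser.parse_args(argv)


# Same arguments as the Word2Vec command above, passed as an explicit list
args = parse_cli_args([
    "--input-file", "NLP-test.json",
    "--feature-extractor=Word2Vec",
    "--total-clusters=20",
    "--max-iterations=20",
])
print(args.input_file, args.feature_extractor,
      args.total_clusters, args.max_iterations)
```

argparse accepts both the space-separated form (--input-file NLP-test.json) and the equals form (--feature-extractor=Word2Vec), which is why the commands above can mix the two styles.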