Economic event type detection on the SentiFM dataset using biLSTM and SVM for the paper: Gilles Jacobs, Els Lefever, and Véronique Hoste. 2018. Economic event detection in company-specific news text. In Proceedings of the 1st Workshop on Economics and NLP (ECONLP). ACL 2018, Melbourne, AUS, 1-10.
This repo includes data and code for company-specific sentence level event type classification for the English SentiFM dataset.
Please cite the original paper when using the dataset.
This code fully replicates the experiments described in the paper: pre-processing, word-vector creation and evaluation, hyperparameter optimization in cross-validation, and holdout prediction.
- Install non-python dependencies:
  - Install CUDA if not installed (it is already on phil)
  - Install the system packages:
      sudo apt-get install libopenblas-base python-dev
  - Download and unpack the latest Stanford CoreNLP:
      cd ~/software; wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip ; unzip stanford-corenlp-full-2018-02-27.zip; rm stanford-corenlp-full-2018-02-27.zip
    And set the envvar for the python CoreNLP package to use:
      CORENLP_HOME=~/software/stanford-corenlp-full-2018-02-27
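Before launching the pipeline, it may help to confirm that CORENLP_HOME is actually visible to Python. The following is a minimal sketch using only the standard library; the helper name `resolve_corenlp_home` is hypothetical and not part of this repo.

```python
import os
from pathlib import Path


def resolve_corenlp_home(env=None):
    """Return the CoreNLP directory named by CORENLP_HOME, or raise if unusable.

    Hypothetical helper for illustration; it is not part of this repo.
    """
    env = os.environ if env is None else env
    raw = env.get("CORENLP_HOME")
    if not raw:
        raise RuntimeError("CORENLP_HOME is not set")
    # Expand a literal "~" in case the value was stored unexpanded.
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"CORENLP_HOME does not point to a directory: {path}")
    return path
```

Calling `resolve_corenlp_home()` with no arguments reads the real environment; passing a dict makes the check easy to try out without touching your shell.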
- Configure Keras to use TensorFlow:
  - set $HOME/.keras/keras.json to:
      {
          "image_data_format": "channels_last",
          "epsilon": 1e-07,
          "floatx": "float32",
          "backend": "tensorflow"
      }
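If you prefer not to edit the file by hand, the same Keras configuration can be written from Python. This is a sketch under the assumption that overwriting any existing keras.json is acceptable; `write_keras_config` is a hypothetical helper, not part of this repo.

```python
import json
from pathlib import Path

# The settings listed above, as a Python dict.
KERAS_CONFIG = {
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow",
}


def write_keras_config(home=None):
    """Write KERAS_CONFIG to <home>/.keras/keras.json and return the path.

    Hypothetical helper; it overwrites an existing keras.json.
    """
    home = Path.home() if home is None else Path(home)
    config_path = home / ".keras" / "keras.json"
    # Create the .keras directory if it does not exist yet.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(KERAS_CONFIG, indent=4))
    return config_path
```

Passing an explicit `home` directory lets you inspect the generated file before placing it in your real home directory.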
Set your experiment storage/output paths and experimental settings in settings.py
- settings.py: for defining the experimental constants for crossvalidation optim. & testing, & wordvector training.
- featurize.py: feature engineering: tokenisation, indexing, sequencing & making the embedding matrix.
- crossvalidate.py: run validation test & multi-label crossvalidation experiment using featurized data.
- crossvalidate.py: run validation test & one-vs-rest crossvalidation experiment using featurized data.
- datahandler.py: loading, parsing, writing, making splits and general handling of dataset.
- classifier.py: custom sklearn-compatible classifiers and classifier handling.
- scorer.py: custom classifier scoring for logging multiple scores in crossvalidation.
- wordvectors_train.py: script for training GloVe word vectors.
- wordvectors_eval.py: script for evaluating trained GloVe vectors with the Google analogy suite.
- util.py: commonly used, general pythonic utility functions.
- clean_output_dir.py: removes empty dirs made as output when testing.
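To make the feature-engineering steps named for featurize.py concrete (tokenisation, indexing, sequencing, embedding matrix), here is a small pure-Python sketch. It is not the code in featurize.py: the function names, the whitespace tokeniser, and the choice of index 0 for padding and 1 for out-of-vocabulary words are all assumptions made for illustration.

```python
from collections import Counter

PAD, OOV = 0, 1  # assumed reserved indices: padding and out-of-vocabulary


def tokenise(sentence):
    """Lowercase and split on whitespace (a stand-in for a real tokeniser)."""
    return sentence.lower().split()


def build_index(token_lists):
    """Map each word to an integer index >= 2, most frequent words first."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=2)}


def to_sequence(tokens, index, max_len):
    """Convert tokens to indices, truncate to max_len, then right-pad with PAD."""
    ids = [index.get(tok, OOV) for tok in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def embedding_matrix(index, vectors, dim):
    """Row i holds the vector of the word indexed i; other rows stay all-zero."""
    matrix = [[0.0] * dim for _ in range(len(index) + 2)]
    for word, i in index.items():
        if word in vectors:
            matrix[i] = list(vectors[word])
    return matrix
```

In use, you would tokenise every sentence, build the index from the training tokens only, convert each sentence with `to_sequence`, and pass the matrix from `embedding_matrix` (filled from your trained word vectors) to the embedding layer of the model.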
Best score: 0.75 F1.