Commit a6d0b42

Merge branch 'rename-data-files' into add-code
# Conflicts:
#	data/LICENSE
2 parents 1b24417 + e401a0d commit a6d0b42

6 files changed

+16
-8
lines changed

data/LICENSE

Lines changed: 5 additions & 4 deletions
@@ -1,8 +1,9 @@
-There are two datasets in this folder:
+There are four dataset files in this folder:
 
-- airlines_raw_merged.csv.bz2 and airlines_500_mergedb.csv.bz2 are derived from Kaggle 'Customer Support on Twitter' dataset, https://www.kaggle.com/thoughtvector/customer-support-on-twitter,
+- airlines_raw.csv.bz2 and airlines_processed.csv.bz2 are derived from Kaggle 'Customer Support on Twitter' dataset,
+https://www.kaggle.com/thoughtvector/customer-support-on-twitter,
 which is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license,
 https://creativecommons.org/licenses/by-nc-sa/4.0/
 
-- askubuntu_raw_mergedb.csv.bz2 and askubuntu_raw_mergedb are derived from Stack Exchange Data Dump, https://archive.org/details/stackexchange, which is provided
-under a Creative Common Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license, https://creativecommons.org/licenses/by-sa/3.0/
+- askubuntu_raw.csv.bz2 and askubuntu_processed.csv.bz2 are derived from Stack Exchange Data Dump, https://archive.org/details/stackexchange,
+which is provided under a Creative Common Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license, https://creativecommons.org/licenses/by-sa/3.0/

data/README.md

Lines changed: 11 additions & 4 deletions
@@ -1,16 +1,23 @@
 Data
 ----
 
+Training vs test data
+---------------------
+
+We train on all data, without labels. We use the labels in order to evaluate the resulting clusters.
+
 Twitter Airlines Customer Support
 ---------------------------------
 
 The data is available in two version:
 
-- raw: minimal redaction (company and customer twitter id), no preprocessing: [airlines_raw_merged.csv.bz2](airlines_raw_merged.csv.bz2)
-- redacted, and preprocessed: [airlines_500_mergedb.csv.bz2](airlines_500_mergedb.csv.bz2)
+- raw: minimal redaction (company and customer twitter id), no preprocessing: [airlines_raw.csv.bz2](airlines_raw.csv.bz2)
+- redacted, and preprocessed: [airlines_processed.csv.bz2](airlines_processed.csv.bz2)
+
+492 examples are labeled using annotators. The remaining examples are labeled `UNK`.
 
 AskUbuntu
 ---------
 
-- raw, no preprocessing: [askubuntu_raw_mergedb.csv.bz2](askubuntu_raw_mergedb.csv.bz2)
-- preprocessed: [askubuntu_processed.csv.bz2](askubuntu_processed.csv.bz2)
+- raw, no preprocessing: [askubuntu_raw.csv.bz2](askubuntu_raw.csv.bz2)
+- preprocessed: [askubuntu_processed.csv.bz2](askubuntu_processed.csv.bz2)
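
The README text added in this commit describes training on all examples and using only the annotated ones to evaluate the clusters. A minimal sketch of that split is shown below. It is not code from this repository: the column name `label` and the helper names `read_rows` and `split_for_evaluation` are assumptions made for illustration, since the commit does not show the CSV schema. Only the `UNK` marker and the bz2-compressed CSV format come from the README.

```python
import bz2
import csv

# Marker the README says is assigned to examples without an annotator label.
UNLABELED = "UNK"


def read_rows(path):
    """Read a bz2-compressed CSV file into a list of dicts, one per row."""
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def split_for_evaluation(rows, label_column="label"):
    """Return (training_rows, evaluation_rows).

    Training uses every row (labels are ignored by the clustering step).
    Evaluation keeps only the rows whose label is not the UNK marker.
    The column name "label" is hypothetical; adjust it to the real schema.
    """
    training_rows = list(rows)
    evaluation_rows = [row for row in rows if row[label_column] != UNLABELED]
    return training_rows, evaluation_rows
```

With the renamed files from this commit, a call might look like `split_for_evaluation(read_rows("data/airlines_processed.csv.bz2"))`, again assuming a `label` column exists in that file.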
