Add support for config files that specify all the options for corpus creation. This would make it easier to define and switch between several sets of features. The format could be some sort of YAML, like:
    sampling:
      sample_size: 1000
      sample_random: True
      max_samples: 1000
    features:
      - type: words
        n: 1
        feat_list: functionwords.txt
      - type: affixes
        k: 500
      - type: pos
        n: 3
        k: 500

etc.
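
To make the proposal more concrete, here is a minimal sketch of how such a config could be read and validated in Python. It is not an existing implementation. The names `FeatureSpec`, `SamplingSpec`, `CorpusConfig`, `parse_config` and `load_config` are hypothetical, the default values are placeholders, and the meanings given to `n` and `k` in the comments are guesses. It assumes the features are given as a YAML list and that PyYAML is available for reading the file.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FeatureSpec:
    """One entry under `features:` (field meanings are assumptions)."""
    type: str                        # e.g. "words", "affixes", "pos"
    n: Optional[int] = None          # assumed: n-gram length
    k: Optional[int] = None          # assumed: number of features to keep
    feat_list: Optional[str] = None  # path to a predefined feature list


@dataclass
class SamplingSpec:
    """The `sampling:` block (defaults are placeholders)."""
    sample_size: int = 1000
    sample_random: bool = False
    max_samples: Optional[int] = None


@dataclass
class CorpusConfig:
    sampling: SamplingSpec
    features: list[FeatureSpec]


def parse_config(raw: dict[str, Any]) -> CorpusConfig:
    """Turn an already-parsed mapping into typed config objects.

    Unknown keys raise TypeError, so typos in the file are caught early.
    """
    sampling = SamplingSpec(**(raw.get("sampling") or {}))
    features = [FeatureSpec(**item) for item in (raw.get("features") or [])]
    return CorpusConfig(sampling=sampling, features=features)


def load_config(path: str) -> CorpusConfig:
    """Read a YAML file from disk; requires the third-party PyYAML package."""
    import yaml  # imported here so parse_config works without PyYAML

    with open(path, encoding="utf-8") as handle:
        return parse_config(yaml.safe_load(handle))


# The example config from this issue, as yaml.safe_load would return it.
example = {
    "sampling": {"sample_size": 1000, "sample_random": True, "max_samples": 1000},
    "features": [
        {"type": "words", "n": 1, "feat_list": "functionwords.txt"},
        {"type": "affixes", "k": 500},
        {"type": "pos", "n": 3, "k": 500},
    ],
}
config = parse_config(example)
```

Keeping each feature set as its own list entry means several feature sets can live in one file, and the corpus-creation code only has to loop over `config.features`.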