A set of tools for leveraging active learning and model explainability for efficient document classification
One component of my vision of FULLY AUTOMATED competitive debate case production. When I pull in massive numbers of articles from a news API, I need a way to sort these documents into various buckets, and I have to generate my own labeled data to do it. That is a problem. Most people don't realize that models built on transfer learning are so sample-efficient that AI-assisted data labeling is extremely useful: it can significantly shorten what is ordinarily a painful labeling process.
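The README does not specify which query strategy drives the AI-assisted labeling. As one common active-learning choice, here is a minimal sketch of margin-based uncertainty sampling: the documents whose top two class probabilities are closest together get sent to the human first. It assumes the classifier returns a list of class probabilities per unlabeled document; `margin` and `select_for_labeling` are hypothetical names, not this project's API.

```python
from typing import Dict, List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most likely classes; a small gap means the model is unsure."""
    top = sorted(probs, reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0.0)


def select_for_labeling(predictions: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Hypothetical helper: return the k document ids with the smallest margin."""
    return sorted(predictions, key=lambda doc_id: margin(predictions[doc_id]))[:k]


if __name__ == "__main__":
    preds = {
        "article-a": [0.90, 0.05, 0.05],  # confident, not worth a human's time
        "article-b": [0.40, 0.35, 0.25],  # fairly unsure
        "article-c": [0.50, 0.50, 0.00],  # maximally unsure between two classes
    }
    print(select_for_labeling(preds, k=2))
```

Each round, the human labels only the selected documents, the model is refit, and the remaining pool is re-scored, which is where the labeling savings come from.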
-
We need a way to quickly create word-embedding-powered document classifiers that learn with a human in the loop. For some classes, a handful of labeled examples may be all that is needed to get results a user would consider successful for their task.
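To illustrate why a few examples can be enough, here is a minimal sketch of a few-shot classifier on top of word embeddings: each document is the average of its word vectors, each class is the centroid of its labeled documents, and a new human label updates one centroid incrementally. The tiny hand-written vectors are a stand-in for real Flair embeddings, and `embed_document` and `CentroidClassifier` are hypothetical names; the project itself uses scikit-learn and PyTorch classifiers, which this sketch does not reproduce.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def embed_document(text: str, word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of known words (a toy stand-in for Flair document embeddings)."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CentroidClassifier:
    """Hypothetical nearest-centroid classifier that absorbs one human label at a time."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.sums: Dict[str, Vector] = defaultdict(lambda: [0.0] * dim)
        self.counts: Dict[str, int] = defaultdict(int)

    def add_example(self, vector: Vector, label: str) -> None:
        # Incremental update: no pass over previously labeled documents is needed.
        self.sums[label] = [s + v for s, v in zip(self.sums[label], vector)]
        self.counts[label] += 1

    def predict(self, vector: Vector) -> str:
        centroids = {lab: [s / self.counts[lab] for s in self.sums[lab]] for lab in self.counts}
        return max(centroids, key=lambda lab: cosine(vector, centroids[lab]))


if __name__ == "__main__":
    toy_vectors = {"tariff": [1.0, 0.0], "trade": [0.9, 0.1],
                   "vaccine": [0.0, 1.0], "virus": [0.1, 0.9]}
    clf = CentroidClassifier(dim=2)
    clf.add_example(embed_document("tariff", toy_vectors, 2), "economy")  # one label per class
    clf.add_example(embed_document("vaccine", toy_vectors, 2), "health")
    print(clf.predict(embed_document("trade talks", toy_vectors, 2)))
```

With good pretrained embeddings, similar documents already sit close together, so one or two labels per class can place a usable centroid; that is the sample-efficiency argument made above in miniature.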
-
I want to know what my model is learning, so I integrate the word embeddings available in Flair, combine them with classifiers from scikit-learn and PyTorch, and finish it off with the LIME algorithm for model interpretability (as implemented in the ELI5 library).
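To show what LIME does under the hood, here is a minimal, dependency-free sketch in its spirit. It is not ELI5's or the `lime` package's implementation. It randomly drops words from a document, queries the classifier on each perturbed copy, weights each sample with an exponential kernel on cosine distance to the original, and fits a weighted ridge regression whose coefficients serve as per-word importances. `predict_proba` is assumed to return the probability of one class for a raw string; `lime_text_weights` and its parameters are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lime_text_weights(text: str, predict_proba: Callable[[str], float],
                      num_samples: int = 500, kernel_width: float = 0.25,
                      ridge: float = 1e-3, seed: int = 0) -> Dict[str, float]:
    """Hypothetical LIME-style explainer: local weighted linear surrogate over word presence."""
    words = text.split()
    d = len(words)
    rng = random.Random(seed)
    rows: List[List[float]] = []
    targets: List[float] = []
    weights: List[float] = []
    for i in range(num_samples):
        # The first sample is the unperturbed document; the rest drop random words.
        mask = [1] * d if i == 0 else [rng.randint(0, 1) for _ in range(d)]
        perturbed = " ".join(w for w, keep in zip(words, mask) if keep)
        # Cosine similarity between a binary mask and the all-ones vector is sqrt(kept / d).
        cos_sim = math.sqrt(sum(mask) / d) if d else 0.0
        weights.append(math.exp(-((1.0 - cos_sim) ** 2) / kernel_width ** 2))
        rows.append([1.0] + [float(m) for m in mask])  # intercept column + word features
        targets.append(predict_proba(perturbed))
    # Weighted ridge regression via normal equations: (X^T W X + lambda*I) beta = X^T W y.
    n = d + 1
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for x, y, w in zip(rows, targets, weights):
        for r in range(n):
            b[r] += w * x[r] * y
            for c in range(n):
                A[r][c] += w * x[r] * x[c]
    for r in range(1, n):  # leave the intercept unpenalised
        A[r][r] += ridge
    beta = _solve(A, b)
    scores: Dict[str, float] = {}
    for word, coef in zip(words, beta[1:]):  # repeated words: sum their coefficients
        scores[word] = scores.get(word, 0.0) + coef
    return scores


if __name__ == "__main__":
    # Toy "classifier": confident about the economy class only when "tariffs" is present.
    toy_model = lambda s: 0.9 if "tariffs" in s.split() else 0.1
    print(lime_text_weights("new tariffs hit steel imports", toy_model))
```

In practice you would hand your real classifier to ELI5 rather than roll your own; the point of the sketch is that the explanation is just a small linear model fit around one document, so its weights are only trustworthy locally.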
TODO:
1. Finish README: cite relevant technologies and papers
2. Documentation/Examples/Installation Instructions
3. More examples

