jtmart/text-processing
This repository contains a series of scripts for batch-processing textual data for analysis in R, Python, TXM and IRaMuTeQ. It mainly consists of tools that extract text and its metadata from digital sources (PDF, HTML, SRT), clean it (layout and OCR corrections), and format it as CSV+TXT for analysis.
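As a rough illustration of the clean-and-format steps described above, here is a minimal Python sketch. It is not taken from the repository's scripts: the function names `clean_text` and `write_corpus`, the metadata fields (`id`, `title`, `date`), the cleaning rules, and the output layout (one `metadata.csv` plus one TXT file per document) are all assumptions made for the example.

```python
import csv
import re
from pathlib import Path


def clean_text(raw: str) -> str:
    """Apply a few simple layout and OCR corrections (illustrative rules only)."""
    text = raw.replace("\u00ad", "")              # drop soft hyphens left by PDF extraction
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # rejoin words hyphenated across line breaks
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)  # unwrap single line breaks, keep blank lines
    text = re.sub(r"[ \t]+", " ", text)           # collapse runs of spaces and tabs
    return text.strip()


def write_corpus(docs, out_dir):
    """Write one TXT file per document and a metadata CSV (assumed layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "title", "date"])
        writer.writeheader()
        for doc in docs:
            # The cleaned text goes to <id>.txt; the CSV row carries the metadata.
            (out / f"{doc['id']}.txt").write_text(clean_text(doc["text"]), encoding="utf-8")
            writer.writerow({key: doc[key] for key in ("id", "title", "date")})
```

Keeping the metadata in a single CSV and the text in separate files is one way to let tools such as R or Python join the two on the `id` column; the format expected by TXM or IRaMuTeQ may differ and would need its own export step.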