This repository provides guidance and tools for migrating and extracting document and receipt data from the Windows version of the obsolete Neat Digital Filing System. The documents are stored in the Neat database as JPEG blobs and are saved to the filesystem as searchable PDF files.
The NeatDesk Desktop Scanner and its bundled software were designed for both Windows and macOS, mainly for home users. The software provided an easy way to digitise paper documents with full-text search capabilities. On Windows the underlying data was stored in an SQL Server Compact 3.5 database. The process outlined here will guide you through:
- Migrating the documents to an SQLite database.
- Extracting the image data from the SQLite database in preparation for digitisation.
- Digitising using Tesseract to create searchable PDF files.
- Enriching using document and receipt information stored in the Neat database.
The Neat application software has multiple backing databases. Follow these steps to migrate the databases to a single SQLite database:
- Locate your Neat database. The path will typically be %USERPROFILE%\Documents\Neat Data.
- Copy the files with the extension .nwdb to another location. These are the SQL Server Compact databases.
- Rename the files to have the extension .sdf.
- Download and run SQLite & SQL Server Compact Toolbox for version 3.5.
- Open the first image database, which should now be named ImageDB1.sdf.
- Right click on the database and select "Script" > "Script Database Schema and Data for SQLite (beta)..." (see figure 1).
Figure 1: script the first image database
- Name the file dump.sql and save it in a new directory called Images_001. The process should create multiple files.
- Repeat this step for all image databases. Name the output folders sequentially (e.g. Images_002).
- Open the document database, which should now be named Neatworks.sdf.
- Right click on the database and select "Script" > "Script Database Schema and Data for SQLite (beta)...".
- Name the file dump.sql and save it in a new directory called Neatworks.
- Move the script files to a Linux machine.
- Run scripts/migrate.sh in a directory containing all of the output folders.
Upon completion you will have neat.db, an SQLite database containing the migrated Neat data. Move this file to /tmp/neatr/neat.db.
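Before moving on, it can be worth checking that the migration produced the tables and rows you expect. The following is a minimal sketch, not part of the repository's tooling, that lists every user table in an SQLite file together with its row count. The function name `summarise_tables` is hypothetical, and no assumption is made about the names of the tables inside neat.db.

```python
import sqlite3


def summarise_tables(db_path: str) -> dict[str, int]:
    """Return a mapping of table name to row count for an SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every object in the database file.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
            if not row[0].startswith("sqlite_")  # skip SQLite's internal tables
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `summarise_tables("/tmp/neatr/neat.db")` after the migration returns a dictionary you can print or compare against the row counts shown in the Toolbox. An empty dictionary suggests the dump scripts were not loaded.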
The first step in the process is to extract the image blobs from the database. The documents can be extracted by running the neatr module:
    python -m neatr -n /tmp/neatr/neat.db -p /tmp/documents -d
This will extract the image blobs and store them under /tmp/documents using the internal Neat image UUID as the file name. The page relationships are also stored in text files named using the document UUID.
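To illustrate the idea behind this step, here is a minimal sketch of writing JPEG blobs from an SQLite database to files named after an identifier column. It is not the neatr implementation. The function `extract_blobs` is hypothetical, and the caller supplies the query because the real Neat table and column names are not stated here.

```python
import sqlite3
from pathlib import Path


def extract_blobs(conn: sqlite3.Connection, query: str, out_dir: Path) -> list[Path]:
    """Write each (identifier, blob) row returned by `query` to <out_dir>/<identifier>.jpg."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for identifier, blob in conn.execute(query):
        if blob is None:
            continue  # skip rows that carry no image data
        target = out_dir / f"{identifier}.jpg"
        target.write_bytes(blob)
        written.append(target)
    return written
```

The query must return exactly two columns, the identifier first and the image data second. For example, a call might look like `extract_blobs(conn, "SELECT image_id, data FROM images", Path("/tmp/documents"))`, where `image_id`, `data` and `images` are placeholder names.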
The receipts can be extracted as follows:
    python -m neatr -n /tmp/neatr/neat.db -p /tmp/receipts -r
The two folders /tmp/documents and /tmp/receipts will contain images and page relationships. These are in a format which can be digitised by Tesseract. In both folders run:
    ./ocr.sh
This will generate text-searchable PDF files from the images listed in the page relationship files.
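For readers who prefer to drive Tesseract from Python, the sketch below shows roughly what such an OCR pass could look like. It is not a transcription of ocr.sh. It assumes that each page relationship file is a .txt file containing one image path per line, which is the list format the tesseract command accepts as input; check your own files before relying on that. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(list_file: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build a Tesseract command that turns a list of images into <output_base>.pdf."""
    # The trailing "pdf" selects Tesseract's searchable-PDF output.
    return ["tesseract", list_file, output_base, "-l", lang, "pdf"]


def ocr_folder(folder: Path) -> None:
    """Run Tesseract once per page relationship file found in `folder`."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    for list_file in sorted(folder.glob("*.txt")):
        # Run inside the folder so relative image paths in the list resolve.
        command = build_ocr_command(list_file.name, list_file.stem)
        subprocess.run(command, cwd=folder, check=True)
```

Under these assumptions, `ocr_folder(Path("/tmp/documents"))` would write one PDF per relationship file next to the images.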
VS Code is the recommended development environment.
The configuration I've found to work best uses the core Python extension with Pylance as the language server.
Python has included support for virtual environments since version 3.3. In the project root, execute the following to create a virtual environment:
    python3 -m venv .venv
Then activate the environment in any active shell, using the activation script that matches your shell (the fish script is shown here):
    source .venv/bin/activate.fish
Check that pip is working within the virtual environment:
    python -m pip --version
Once the environment has been created, ensure that VS Code is using it as its environment. From within VS Code, select a Python 3 interpreter by opening the Command Palette (Ctrl+Shift+P), typing Python: Select Interpreter to search, then selecting the command.
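If you are unsure whether a shell or VS Code is really using the virtual environment, a small check like the following can help. This is a sketch using only the standard library; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtualenv())
```

Run with the interpreter selected in VS Code, the printed path should point inside the .venv directory when the environment is active.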
