Implementation of TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images
This repo contains the code for our implementation and tests for the reproducibility of TableNet. The entire report of the Reproduction can be found here: TableNet: Reproduction Report
Read the Paper here: TableNet Paper
Model Architecture Overview (May change based on the Encoder used):

The Marmot Dataset as well as the Annotated Data can be found here: Marmot Dataset v1.0 ; Marmot Extended Dataset by the Authors
The Saved Model Checkpoints and the DenseNet121 Pre-trained Weights can be found here: TableNet Saved Weights
Information of the Trained parameters, accuracy and similar things for upto 300 epochs can be found here: Train Info Text
Unfortunately, due to how heavy the model is, the model has so far been trained only upto 1130 epochs. However, we will upload more rigorously trained checkpoints in the future, following a similarly trained model as the original paper.
We tested the model against ICDAR2019 Table dataset. ICDAR2019 dataset contains 2030 table images with PASCAL VOC format annotations. We picked 34 images and annotated XMLs and then extracted the data and predicted using the trained model weights as was done in the original paper. More info can be found in the Test Notebook The orignal ground truth labels and boundary boxes from the ICDAR2019 are stored in this CSV : Ground Truth CSV.
The model predicted boundary boxes on 1130 epochs are stored in this CSV : Model Prediction 1130 epochs CSV
Testing image on Marmot Dataset:
Testing image on ICDAR Tables:
Some keynotes regarding the changes we had to make from the original repo:
- Install required libraries following requirements.txt
- Some of the libraries no longer exist or are backdated and threw an error.
- I.e: Sklearn is now scikit-learn
- Changed requirements.txt file to accommodate the changes in the library name.
- efficientnet_pytorch - was not mentioned in the requirements.txt however, it is required to run the code.
- Running the Training:
- Needed to change all the directories within the path_constants.py file to match that of our local drive.
- Running the Tests:
- Needed to install tesseract OCR in local pc and specify the executable file path in the python code :
- pytesseract.pytesseract.tesseract_cmd = r'/opt/homebrew/bin/tesseract’
- PIL -> Image.ANTIALIAS method has been removed from the library after PIL 10.0 update, which is necessary for testing our model. We can now use PIL.Image.Resampling.LANCZOS
- Needed to install tesseract OCR in local pc and specify the executable file path in the python code :
- Added additional scripts for testing the model against ICDAR Table dataset.
Training Info (for 1130 Epochs):
Table:
Column:
This repo is inspired from two repos: TableNet - Pytorch And TableNet2DF







