
SPARK-UNI/IDcard-ocr-to-contract


IDcard OCR Contract Filler

A Streamlit application that extracts personal information from Vietnamese ID cards (CCCD/CMND) using OCR (Gemini API or EasyOCR), then automatically fills a telecom contract template (MobiFone) or generates a customizable .docx contract. Supports live preview on the web and export to PDF/PNG/DOCX.


🚀 Features

  • 📷 Upload an ID card image (.jpg, .jpeg, .png).
  • 🔎 OCR extraction using:
    • Google Gemini Vision API (cloud, high accuracy).
    • EasyOCR (local, offline).
  • ✏️ Auto-detect fields: Full name, ID number, Date of birth, Gender, Nationality, Place of origin, Residence, Issue date, Expiry date.
  • 📝 Editable form: review and adjust extracted fields directly in the web UI.
  • 📄 Contract generation:
    • Official MobiFone telecom contract template (with correct formatting and checkboxes).
    • Generic .docx contract for other use cases.
  • 👀 Live preview of contract on the web before downloading.
  • ⬇️ Export to PNG / PDF / DOCX.
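Automatic field detection can be done with labelled regular expressions over the OCR text. The following sketch illustrates the idea under stated assumptions. It is not the contents of services/field_extraction.py. It covers only a subset of the fields and assumes each bilingual label (for example "Họ và tên / Full name:") is followed by its value. FIELD_PATTERNS and extract_fields are hypothetical names.

```python
import re
from typing import Dict, Optional

# Assumption: the OCR text keeps each bilingual label followed by its value,
# e.g. "Ngày sinh / Date of birth: 01/02/1990". Real scans are messier, so
# production patterns must be more tolerant than these.
_DATE = r"(\d{2})[/.\-](\d{2})[/.\-](\d{4})"


def _label(*names: str) -> str:
    # Match any label spelling, skip the rest of the label (e.g. " / Full name"),
    # then the colon and any whitespace before the value.
    return r"(?:" + "|".join(map(re.escape, names)) + r")[^:\n]*:\s*"


# Hypothetical pattern table covering a subset of the fields.
FIELD_PATTERNS = {
    "id_number": re.compile(r"\b(\d{12}|\d{9})\b"),  # 12-digit CCCD or 9-digit CMND
    "full_name": re.compile(_label("Họ và tên", "Full name") + r"([^\n]+)", re.I),
    "date_of_birth": re.compile(_label("Ngày sinh", "Date of birth") + _DATE, re.I),
    "gender": re.compile(_label("Giới tính", "Sex") + r"(Nam|Nữ)", re.I),
    "nationality": re.compile(_label("Quốc tịch", "Nationality") + r"([^\n]+)", re.I),
    "expiry_date": re.compile(_label("Có giá trị đến", "Date of expiry") + _DATE, re.I),
}


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Map each field name to its extracted value, or None if not found."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match is None:
            fields[name] = None
        elif pattern.groups == 3:
            # Date patterns capture day, month and year: normalize to DD/MM/YYYY.
            fields[name] = "/".join(match.groups())
        else:
            fields[name] = match.group(1).strip()
    return fields
```

Returning None instead of raising keeps partially read cards usable. Missing fields appear empty in the editable form, where the user can fill them in by hand.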

📦 Installation

Clone the repository:

git clone https://github.com/SPARK-UNI/IDcard-ocr-to-contract.git
cd IDcard-ocr-to-contract

Install dependencies:

pip install -r requirements.txt

Dependencies include:

  • streamlit
  • pillow
  • numpy
  • python-docx
  • easyocr
  • google-generativeai (for Gemini API)
  • opencv-python
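Several of these packages are imported under a different name than the one passed to pip (pillow is PIL, python-docx is docx, opencv-python is cv2), so a failed import can be confusing. The following sketch checks each module with importlib.util.find_spec without importing the heavy libraries themselves. The helper name missing_packages is hypothetical.

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name used in import statements.
REQUIRED_MODULES: Dict[str, str] = {
    "streamlit": "streamlit",
    "pillow": "PIL",
    "numpy": "numpy",
    "python-docx": "docx",
    "easyocr": "easyocr",
    "google-generativeai": "google.generativeai",
    "opencv-python": "cv2",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip package names whose module cannot be located."""
    missing = []
    for package, module in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # find_spec("a.b") imports the parent "a" first and raises if it is absent.
            found = False
        if not found:
            missing.append(package)
    return missing
```

Run `print(missing_packages())` after `pip install -r requirements.txt`. It should print an empty list when every dependency is installed.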

⚙️ Configuration

1. Gemini API Key

Create an API key in Google AI Studio.

Set the environment variable:

Windows (PowerShell):

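setx GOOGLE_API_KEY "AIza..."

setx only applies to terminals opened afterwards. To use the key in the current PowerShell session as well, also run $env:GOOGLE_API_KEY = "AIza...".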

Linux / macOS (current shell session only; add the line to ~/.bashrc or ~/.zshrc to make it permanent):

export GOOGLE_API_KEY=AIza...
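Python code can then read the key through os.environ. The sketch below reads the key and picks an OCR backend. Falling back to EasyOCR when no key is set is an assumption made for illustration and may not match how app.py decides. With the google-generativeai package, a key obtained this way can be passed to genai.configure(api_key=...).

```python
import os
from typing import Mapping, Optional


def get_gemini_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the GOOGLE_API_KEY value, or None if it is unset or blank."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    return key or None


def choose_ocr_backend(env: Mapping[str, str] = os.environ) -> str:
    # Assumption: use Gemini when a key is configured, otherwise local EasyOCR.
    return "gemini" if get_gemini_api_key(env) else "easyocr"
```

Passing the environment as a parameter lets the function be tested with a plain dict instead of changing the real process environment.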

2. Fonts

Make sure you have a Unicode font that supports Vietnamese.
Recommended: Noto Serif.
Place the .ttf file in the assets/ folder, or use system fonts (Arial, Times New Roman).
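The following sketch shows a font lookup that prefers the bundled file and falls back to system fonts. The search order and system font paths are assumptions. They are typical default locations on Windows, on Debian/Ubuntu with Noto installed, and on recent macOS, but they may differ on a given machine. find_font is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical search order: bundled font first, then typical system locations.
DEFAULT_FONT_CANDIDATES = (
    Path("assets/NotoSerif-Regular.ttf"),
    Path("C:/Windows/Fonts/times.ttf"),
    Path("C:/Windows/Fonts/arial.ttf"),
    Path("/usr/share/fonts/truetype/noto/NotoSerif-Regular.ttf"),
    Path("/System/Library/Fonts/Supplemental/Arial.ttf"),
)


def find_font(candidates: Iterable[Path] = DEFAULT_FONT_CANDIDATES) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, or None."""
    for path in candidates:
        if path.is_file():
            return path
    return None
```

The returned path can be passed to Pillow with ImageFont.truetype(str(path), size=28). If the function returns None, raising an error early is usually clearer than rendering with a default font that may lack Vietnamese diacritics.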


▶️ Usage

Run the app with Streamlit:

streamlit run app.py

Steps:

  1. Upload an ID card image.
  2. Wait for OCR (Gemini or EasyOCR).
  3. Review & edit extracted fields.
  4. Preview the filled contract (MobiFone form).
  5. Download as DOCX / PDF / PNG.
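Step 4 comes down to substituting the reviewed fields into a template. The following sketch assumes a plain-text template with hypothetical $placeholder names. The real MobiFone form is rendered as an image or DOCX, which this sketch does not attempt. Python's string.Template fills in the values and renders the gender checkboxes.

```python
from string import Template
from typing import Mapping, Optional

# Hypothetical plain-text contract snippet; the placeholder names are
# illustrative and do not come from the real MobiFone template.
CONTRACT_TEMPLATE = Template(
    "Subscriber: $full_name\n"
    "ID number: $id_number, date of birth: $date_of_birth\n"
    "Gender: $gender_male Male  $gender_female Female\n"
)

CHECKED, UNCHECKED = "☒", "☐"


def fill_contract(fields: Mapping[str, Optional[str]]) -> str:
    """Fill the template with reviewed fields; unfilled placeholders stay visible."""
    # Drop empty values so safe_substitute leaves their "$name" placeholders in place.
    values = {key: value for key, value in fields.items() if value}
    gender = (fields.get("gender") or "").strip().lower()
    values["gender_male"] = CHECKED if gender in ("nam", "male") else UNCHECKED
    values["gender_female"] = CHECKED if gender in ("nữ", "female") else UNCHECKED
    return CONTRACT_TEMPLATE.safe_substitute(values)
```

safe_substitute leaves any unfilled placeholder, such as $date_of_birth, visible in the output. This makes fields that OCR missed easy to spot in the preview.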

📸 Demo

ID Card OCR → Field Extraction

[Image: OCR Fields Demo]

Auto-filled MobiFone Contract

[Image: Contract Demo]


📂 Project Structure

IDcard-ocr-to-contract/
│── app.py                        # Main Streamlit app
│── services/
│   ├── ocr_service.py            # OCR with Gemini/EasyOCR
│   ├── field_extraction.py       # Regex & normalization
│   ├── contract_form_mobifone.py # Template filling & rendering
│   └── contract_docx.py          # Generic DOCX generation
│── assets/
│   ├── mobifone_template.png     # MobiFone contract form
│   ├── Demo_UI.png               # Screenshot of the web UI
│   └── NotoSerif-Regular.ttf     # Vietnamese font (not included; add it yourself)
│── requirements.txt
│── README.md
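For the image-based MobiFone form, one way to structure contract_form_mobifone.py is to separate where each field goes from how it is drawn. The sketch below covers only the layout half. The pixel coordinates in FIELD_POSITIONS are invented and would have to be measured on assets/mobifone_template.png. layout_fields is a hypothetical name.

```python
from typing import Dict, List, Mapping, Optional, Tuple

Point = Tuple[int, int]
DrawOp = Tuple[Point, str]

# Hypothetical pixel positions on the template image; measure the real ones.
FIELD_POSITIONS: Dict[str, Point] = {
    "full_name": (210, 330),
    "id_number": (210, 370),
    "date_of_birth": (620, 330),
}


def layout_fields(fields: Mapping[str, Optional[str]],
                  positions: Mapping[str, Point] = FIELD_POSITIONS) -> List[DrawOp]:
    """Pair each non-empty field value with its position on the form."""
    ops: List[DrawOp] = []
    for name, xy in positions.items():
        value = fields.get(name)
        if value:  # skip missing or empty fields instead of drawing "None"
            ops.append((xy, value))
    return ops
```

The resulting (position, text) pairs can then be drawn with Pillow, for example ImageDraw.Draw(img).text(xy, text, font=font, fill="black") for each pair. Keeping the layout function pure means it can be tested without loading an image.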

📜 License

This project is licensed under the MIT License. You can freely use and modify it.


✨ Acknowledgements
