eDOCr2 engineering-drawing OCR converted to CoreML for Apple Silicon.
Three CoreML ML-Program packages (FP16) targeting the Apple Neural Engine on M-series Macs, covering the full eDOCr2 OCR cascade:
| Stage | Architecture | Input | Output | .mlpackage size |
|---|---|---|---|---|
| Detector | CRAFT (VGG backbone) | (1, 1280, 1280, 3) RGB, ImageNet-normalised | (1, 640, 640, 2) region + affinity heatmap | ~40 MB |
| Recogniser | CRNN + STN | (1, 31, 200, 1) grayscale | (1, 48, 39) CTC softmax (38-char dimension alphabet + blank) | ~17 MB |
| GD&T classifier | CRNN + STN | (1, 31, 200, 1) grayscale | (1, 48, 40) CTC softmax (39-char GD&T alphabet + blank) | ~17 MB |
The "GD&T classifier" is architecturally identical to the recogniser — it uses a different alphabet and weights trained on engineering GD&T symbols (∅⌖⌒⌓⏤⏥⏊⌭⫽◎↗⌰⌯ and datum letters ⒺⒻⓁⓂⓅⓈⓉⓊ).
Post-processing (heatmap → bounding boxes, CTC decoding) runs on the
Swift side; see `test_ane.swift` for a complete loader + greedy decoder.
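For reference, the greedy CTC decode can be sketched in a few lines of Python (a minimal version, assuming the blank token occupies the last class index, `len(alphabet)`, as the converted models do):

```python
import numpy as np

def ctc_greedy_decode(probs: np.ndarray, alphabet: str) -> str:
    """Greedy CTC decode of a (timesteps, classes) softmax matrix.

    Assumes the blank token is the last class, i.e. index
    len(alphabet), matching the converted models.
    """
    blank = len(alphabet)
    best = probs.argmax(axis=-1)           # most likely class per timestep
    decoded, prev = [], blank
    for idx in best:
        if idx != prev and idx != blank:   # collapse repeats, drop blanks
            decoded.append(alphabet[int(idx)])
        prev = idx
    return "".join(decoded)

# Toy check: the path a, a, blank, b collapses to "ab".
probs = np.eye(3)[[0, 0, 2, 1]]
print(ctc_greedy_decode(probs, "ab"))      # -> ab
```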
From `test_ane.swift`, 20 iterations after a 3-iteration warm-up:
| Stage | Mean | Min | Std |
|---|---|---|---|
| detector | 102.3 ms | 97.6 ms | 3.3 ms |
| recogniser | 4.0 ms | 3.9 ms | 0.07 ms |
| gdt_classifier | 4.1 ms | 4.0 ms | 0.06 ms |
Worst-case end-to-end for a single text-bearing crop (one detector pass + ~5 recogniser calls): ~125 ms, well under the 200 ms target in the conversion plan.
The detector exceeds its original 40 ms per-stage target because
1280×1280 is a large input for a 21 M-parameter CNN on ANE. On smaller
tile sizes (e.g. 640×640) it drops to under 40 ms — if you need that,
rerun `convert.py` with `DETECTOR_H = DETECTOR_W = 640`.
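The change is just the pair of constants named above (hypothetical excerpt; the exact location inside `convert.py` may differ):

```python
# convert.py (excerpt): fixed detector input size baked into the
# converted ML Program. 1280 is the shipped default; 640 brings the
# ANE latency under the 40 ms per-stage target at reduced coverage
# per tile.
DETECTOR_H = DETECTOR_W = 640
```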
Max-abs-difference between tf.keras (FP32, CPU) and CoreML (FP16, ANE):
| Stage | Max abs diff | Mean rel diff |
|---|---|---|
| detector | 5.2e-3 | 1.6e-3 |
| recogniser | 3.1e-2 | 5.8e-3 |
| gdt_classifier | 6.9e-3 | 2.1e-3 |
Recogniser error is higher because the CTC softmax amplifies small logit differences. Greedy-decoded output on real test crops is identical between Keras and CoreML in spot-checks.
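The two metrics in the table can be reproduced with a small helper (a sketch: `ref` would be the FP32 Keras output, `test` the FP16 CoreML output, and `eps` a guard against division by zero):

```python
import numpy as np

def parity_stats(ref: np.ndarray, test: np.ndarray, eps: float = 1e-7):
    """Max absolute and mean relative difference between two model outputs."""
    diff = np.abs(ref.astype(np.float64) - test.astype(np.float64))
    max_abs = float(diff.max())
    mean_rel = float((diff / (np.abs(ref) + eps)).mean())
    return max_abs, mean_rel

# Toy usage with made-up tensors:
ref = np.array([1.0, 2.0, 4.0])
test = np.array([1.0, 2.1, 4.0])
print(parity_stats(ref, test))
```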
```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

let detector = try MLModel(
    contentsOf: MLModel.compileModel(
        at: URL(fileURLWithPath: "artefacts/edocr2_detector.mlpackage")),
    configuration: config)
let recogniser = try MLModel(
    contentsOf: MLModel.compileModel(
        at: URL(fileURLWithPath: "artefacts/edocr2_recogniser.mlpackage")),
    configuration: config)

// feed: 1×1280×1280×3 float32, ImageNet-normalised
let det = try detector.prediction(from: ...)
// heatmap → bboxes in Swift/Python (see eDOCr2 upstream tools.getBoxes)

// for each detected bbox, crop to 31×200 grayscale, feed:
let rec = try recogniser.prediction(from: ...)
// rec output shape: [1, 48, 39]; run CTC greedy decode against
// the alphabet stored in mlmodel.metadata[.creatorDefinedKey]["alphabet"]
```

The full working example is in `test_ane.swift` — run it with
`swift test_ane.swift` from the repo root (macOS 15+, Xcode CLT
installed). It loads all three models on the ANE, benchmarks them,
and prints a CTC-decoded sample.
```sh
python3.12 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python convert.py
```

This downloads upstream weights (~220 MB total) into `weights/` and
writes three `.mlpackage` bundles into `artefacts/`. End-to-end
conversion takes ~2 minutes on an M4.
See `convert.py` for the full pipeline:

- Detector — upstream `build_keras_model` (VGG backbone) loaded with `craft_mlt_25k.h5` from the `faustomorales/keras-ocr` v0.8.4 release, then monkey-patched to a fixed `(1, 1280, 1280, 3)` input so Core ML / ANE can dispatch it. Converted via `coremltools.convert(source="tensorflow", compute_precision=FLOAT16)`.
- Recogniser / GD&T classifier — upstream `Recognizer` class loaded with the matching `.keras` weight file from the eDOCr2 v1.0.0 release, using the alphabet from the sibling `.txt` file. The converted model stops at the CTC softmax layer; the `prediction_model`'s built-in `CTCDecoder` `Lambda` is dropped because `tf.keras.backend.ctc_decode` has no Core ML equivalent.
Both recognisers include a Spatial Transformer Network (STN) sub-graph,
which converts cleanly via the TensorFlow frontend — no hand-rewriting
was needed. The two `Lambda(_transform, …)` and `Lambda` (image-flip)
layers trace as raw TF ops (`matmul`, `gather`, `reshape`, `slice`,
`tile`, `add_n`, `clip_by_value`) and pass through Core ML intact.
Every model's alphabet, expected input shape, and blank index are
stashed in `mlmodel.user_defined_metadata` so the Swift side can
inspect them rather than hard-coding values. The alphabet is stored
in each `.mlpackage` under `metadata[.creatorDefinedKey]["alphabet"]`.
- Dimensions (`edocr2_recogniser`): `0123456789AaBCDRGHhMmnxZt(),.+-±:/°"⌀=` (38 chars + blank)
- GD&T (`edocr2_gdt_classifier`): `0123456789,.⌀ABCD⏤⏥○⌭⌒⌓⏊∠⫽⌯⌖◎↗⌰ⒺⒻⓁⓂⓅⓈⓉⓊ` (39 chars + blank)

Both end with a CTC blank token at index `len(alphabet)`.
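That convention can be sanity-checked in plain Python (alphabets copied from the list above; the metadata shipped inside each `.mlpackage` remains the source of truth):

```python
def blank_index(alphabet: str) -> int:
    # The CTC blank occupies the class slot one past the last real character.
    return len(alphabet)

# Copied from the alphabet list in this README.
DIM_ALPHABET = '0123456789AaBCDRGHhMmnxZt(),.+-±:/°"⌀='
GDT_ALPHABET = '0123456789,.⌀ABCD⏤⏥○⌭⌒⌓⏊∠⫽⌯⌖◎↗⌰ⒺⒻⓁⓂⓅⓈⓉⓊ'

# 38 + 1 and 39 + 1 classes match the (1, 48, 39) / (1, 48, 40) outputs.
assert blank_index(DIM_ALPHABET) == 38
assert blank_index(GDT_ALPHABET) == 39
```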
This repository converts eDOCr2 to CoreML format.
- Original work: eDOCr2 by Javier Villena Toro
- Paper: "eDOCr2: Engineering Drawing OCR" (MDPI Machines, 2025), DOI 10.2139/ssrn.5045921
- CRAFT detector weights: faustomorales/keras-ocr v0.8.4, originally clovaai/CRAFT-pytorch
- License: MIT (see `LICENSE` file)
The `.mlpackage` files in `artefacts/` are derived from the upstream
Keras weights; the conversion script (`convert.py`) and test harness
(`test_ane.swift`) are MIT-licensed.