feat: Phase 2 close-out — threshold alignment, DEVLOG, docs update#16

Merged: AD2000X merged 1 commit into main from feature/phase2-doclaynet-layout on Jun 2, 2026
AD2000X (Owner) commented Jun 2, 2026

src/:

  • layout_detector.py, table_detection.py: default threshold aligned to 0.3
  • layout_parsing.py: DEFAULT_TABLE_SCORE/DEDUP_IOU named constants; shared defaults
  • config.py: minor cleanup
  • tatr_postprocess.py: empty row/col grid -> invalid; dedup_row_col_bands as default
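The band dedup mentioned for tatr_postprocess.py is described elsewhere in this PR as a 1-D NMS over row/column bands. A minimal sketch of that general technique follows. It is not the repository's `dedup_row_col_bands`: the function names, the `(start, end, score)` band layout, and the 0.5 overlap threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical band layout: (start, end, score) along one axis
# (y for rows, x for columns). The real data structure may differ.
Band = Tuple[float, float, float]


def interval_iou(a: Band, b: Band) -> float:
    """Intersection-over-union of two 1-D intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d_sketch(bands: List[Band], iou_threshold: float = 0.5) -> List[Band]:
    """Greedy 1-D NMS: keep the highest-scoring band, drop overlapping ones.

    The 0.5 threshold is an assumed value, not one taken from the PR.
    """
    kept: List[Band] = []
    # Visit bands from highest to lowest score.
    for band in sorted(bands, key=lambda b: b[2], reverse=True):
        if all(interval_iou(band, k) <= iou_threshold for k in kept):
            kept.append(band)
    # Return survivors in spatial order (top-to-bottom or left-to-right).
    return sorted(kept, key=lambda b: b[0])


rows = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
deduped = nms_1d_sketch(rows)
```

In this example the second band overlaps the first with IoU 9/11, so it is suppressed and two bands remain.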

scripts/:

  • run_layout_batch.py, eval_layout_iou.py: read shared defaults from layout_parsing
  • smoke_layout_detector.py: --allow-no-table flag; doc/output cleanup
  • smoke_structure.py: dedup now default (no --dedup-bands flag needed)
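The "shared defaults" idea above, where scripts read their default values from one module instead of hard-coding them, could look roughly like the sketch below. The constants are defined inline as stand-ins for an import from layout_parsing. The exact constant names, the CLI flag names, and `build_parser` are assumptions. The values 0.30 and 0.70 are the ones reported in this PR's summary.

```python
import argparse

# Stand-ins for the named constants in layout_parsing.py. In the scripts
# these would be imported rather than redefined; exact names are assumed.
DEFAULT_TABLE_SCORE = 0.30
DEFAULT_DEDUP_IOU = 0.70


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI whose defaults come from the shared constants.

    Flag names here are hypothetical, chosen only for illustration.
    """
    parser = argparse.ArgumentParser(description="Layout batch sketch")
    parser.add_argument("--table-threshold", type=float,
                        default=DEFAULT_TABLE_SCORE)
    parser.add_argument("--dedup-iou", type=float,
                        default=DEFAULT_DEDUP_IOU)
    return parser


# Parsing an empty argument list yields the shared defaults.
args = build_parser().parse_args([])
```

Keeping the numbers in one place means a later threshold change only has to be made once, which is the kind of drift this PR's "threshold alignment" addresses.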

notebooks/04_phase2_layout.ipynb:

  • Step 3a syntax fix; Step 5/6/7 text updated to reflect final results
  • Step 7d annotated: validator fix -> expect 285 OK / 1 WARN

docs/:

  • DEVLOG.md: Phase 2 MVP finding (2026-06-02); unblocked from .gitignore
  • PLAN.md: Phase 2 status updated; fixed DocLayNet subset eval framing
  • README.md: Phase 2 layout/crop metrics added
  • DESIGN_SPEC.md: fallback rule + dedup_row_col_bands documented
  • .gitignore: remove DEVLOG.md exclusion

AD2000X merged commit aae066f into main on Jun 2, 2026

AD2000X (Owner, Author) commented Jun 2, 2026

Summary

  • Layout detection: Aryn/deformable-detr-DocLayNet primary, table_threshold=0.30, dedup_iou=0.70; fallback fires only when primary finds ≥1 table but none above threshold
  • MVP gate (seed=42): recall@0.50=0.900, precision@0.50=0.916, FP crop rate 6.5%
  • Band dedup postprocess: dedup_row_col_bands (1-D NMS) now default in normalize_tatr_prediction; fixes TATR overlapping row/col bands on dense crops
  • Structure handoff smoke (seed=42, n=286): 285 OK / 1 WARN (0.35%) — both Phase 2 gates passed
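The fallback rule in the first bullet can be written as a small predicate. This is a sketch only: `should_run_fallback` is a hypothetical name, and it assumes a score equal to the threshold counts as passing, which the PR does not state.

```python
from typing import Sequence


def should_run_fallback(primary_table_scores: Sequence[float],
                        table_threshold: float = 0.30) -> bool:
    """Decide whether the fallback detector should fire.

    Fires only when the primary detector found at least one table
    candidate but none of them reaches the threshold. With no candidates
    at all, the page is treated as table-free and the fallback is skipped.
    """
    if not primary_table_scores:
        return False
    # Assumption: a score exactly equal to the threshold is accepted.
    return max(primary_table_scores) < table_threshold


no_candidates = should_run_fallback([])
all_weak = should_run_fallback([0.12, 0.25])
one_strong = should_run_fallback([0.12, 0.45])
```

Here `all_weak` is the only case that triggers the fallback; pages with no table candidates never reach it.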

Test plan

  • pytest — 219 passed
  • Step 6b/6c: IoU + FP gate passed on Colab T4
  • Step 7d: structure smoke 285/286 OK (0.35% WARN, well under 5% gate)
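The WARN-rate figure in Step 7d can be reproduced from the counts reported in this PR (285 OK and 1 WARN out of 286) against the 5% gate. The helper name and the strict less-than comparison are assumptions.

```python
def warn_rate(n_warn: int, n_total: int) -> float:
    """Fraction of samples that ended in WARN."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_warn / n_total


N_OK, N_WARN = 285, 1          # counts reported in the PR
WARN_GATE = 0.05               # 5% gate from the test plan

rate = warn_rate(N_WARN, N_OK + N_WARN)
gate_passed = rate < WARN_GATE  # assumed to be a strict comparison
print(f"WARN rate: {rate:.2%}, gate passed: {gate_passed}")
```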
