23 commits
2b77d8b
Use cache mount for genai docker (#4954)
Bobholamovic Jan 29, 2026
9d39bc9
Fix HPS order bug (#4955)
Bobholamovic Jan 29, 2026
966969f
Fix transformers version (#4956)
Bobholamovic Jan 29, 2026
306430a
Fix HPS and remove scipy from required deps (#4957)
Bobholamovic Jan 29, 2026
01f63a6
[Cherry-Pick]bugfix: unexpected change of the constant IMAGE_LABELS (…
changdazhou Jan 30, 2026
363b508
[METAX] add ppdoclayv3 to METAX_GPU_WHITELIST (#4959)
handsomecoderyang Jan 30, 2026
d59a344
vllm 0.10.2 needs transformers 4.x (#4963)
zhang-prog Jan 30, 2026
622b602
Bump version to 3.4.1
Bobholamovic Jan 30, 2026
c78fb95
Support setting PDF rendering scale factor (#4967)
Bobholamovic Feb 2, 2026
45989f0
Fix/doc vlm async cancellation (#4969) (#4971)
scyyh11 Feb 4, 2026
0a936ba
Fix typo (#4982)
Bobholamovic Feb 6, 2026
f790eff
add llama.cpp support (#4983)
zhang-prog Feb 9, 2026
a10d7c5
Add Intel GPU config (#4992)
Bobholamovic Feb 11, 2026
92a190e
Remove PaddleOCR-VL server page limit (#4991)
Bobholamovic Feb 11, 2026
04476cb
PaddleX Add ROCm 7.0 compatibility patches (#4990) (#4996)
M4jupitercannon Feb 12, 2026
edb4022
[Feat] Support setting expiration for BOS URLs (#4993)
Bobholamovic Feb 12, 2026
69e8d75
add \n for seal rec && bugfix for text in table && delete_pass by mod…
changdazhou Feb 13, 2026
f95d873
Fix auto batch size for PaddleOCR-VL-1.5-0.9B (#5003)
Bobholamovic Feb 13, 2026
c88d4c1
Bump version to 3.4.2
Bobholamovic Feb 13, 2026
e92d21f
Update HPS frozon deps (#5004)
Bobholamovic Feb 13, 2026
41b695b
update vlm batch_size (#5005)
zhang-prog Feb 13, 2026
901393a
support modular langchain as well
np-n Feb 25, 2026
b39f430
fix
np-n Feb 25, 2026
[Cherry-Pick]bugfix: unexpected change of the constant IMAGE_LABELS (#4961)

* bugfix: unexpected change of the constant IMAGE_LABELS

* update doc
changdazhou authored Jan 30, 2026
commit 01f63a6a22d28f238faa742ef4ce13a69c90d209
@@ -6,7 +6,7 @@ comments: true

PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. These strengths make it highly suitable for practical deployment in real-world scenarios.

-On January 29, 2026, we released PaddleOCR-VL-1.5. PaddleOCR-VL-1.5 not only significantly improved the accuracy on the OmniDocBench v1.5 evaluation set to 94.5%, but also innovatively supports irregular-shaped bounding box localization. As a result, PaddleOCR-VL-1.5 demonstrates outstanding performance in real-world scenarios such as Skew, Warping, Screen Photography, Illumination, and Scanning. In addition, the model has added new capabilities for seal (stamp) recognition and text detection and recognition, with key metrics continuing to lead the industry.
+**On January 29, 2026, we released PaddleOCR-VL-1.5. PaddleOCR-VL-1.5 not only significantly improved the accuracy on the OmniDocBench v1.5 evaluation set to 94.5%, but also innovatively supports irregular-shaped bounding box localization. As a result, PaddleOCR-VL-1.5 demonstrates outstanding performance in real-world scenarios such as Skew, Warping, Screen Photography, Illumination, and Scanning. In addition, the model has added new capabilities for seal (stamp) recognition and text detection and recognition, with key metrics continuing to lead the industry.**

<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr_vl_1_5/paddleocr-vl-1.5_metrics.png"/>

@@ -6,7 +6,7 @@ comments: true

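PaddleOCR-VL is an advanced, efficient document parsing model designed for recognizing elements in documents. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to achieve accurate element recognition. The model supports 109 languages and excels at recognizing complex elements (such as text, tables, formulas, and charts) while keeping resource consumption very low. In comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing pipeline-based solutions, multimodal document-parsing solutions, and advanced general-purpose multimodal large models, and it offers faster inference. These strengths make it well suited to deployment in real-world scenarios.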

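-On January 29, 2026, we released PaddleOCR-VL-1.5. PaddleOCR-VL-1.5 not only sets a new record on the OmniDocBench v1.5 evaluation set with 94.5% accuracy, but also introduces support for irregular-shaped bounding box localization, so that it performs well in real-world scenarios such as scanning, skew, warping, screen photography, and complex lighting. In addition, the model adds seal recognition and text detection and recognition capabilities, with key metrics continuing to lead.
+**On January 29, 2026, we released PaddleOCR-VL-1.5. PaddleOCR-VL-1.5 not only sets a new record on the OmniDocBench v1.5 evaluation set with 94.5% accuracy, but also introduces support for irregular-shaped bounding box localization, so that it performs well in real-world scenarios such as scanning, skew, warping, screen photography, and complex lighting. In addition, the model adds seal recognition and text detection and recognition capabilities, with key metrics continuing to lead.**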

<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr_vl_1_5/paddleocr-vl-1.5_metrics.png"/>

2 changes: 1 addition & 1 deletion paddlex/inference/pipelines/paddleocr_vl/pipeline.py
@@ -272,7 +272,7 @@ def get_layout_parsing_results(
         id2pixel_key_map = {}
         image_path_to_obj_map = {}
         vis_image_labels = IMAGE_LABELS + ["seal"]
-        image_labels = [] if use_ocr_for_image_block else IMAGE_LABELS
+        image_labels = [] if use_ocr_for_image_block else IMAGE_LABELS.copy()
         if not use_chart_recognition:
             image_labels += ["chart"]
             vis_image_labels += ["chart"]
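
The removed line binds `image_labels` to the `IMAGE_LABELS` list itself whenever `use_ocr_for_image_block` is false, and the following `image_labels += ["chart"]` then extends that same list in place. The sketch below reproduces the pattern in isolation. The contents of `IMAGE_LABELS` and the two helper functions are hypothetical stand-ins for illustration, not code from the repository.

```python
# Hypothetical stand-in for the module-level constant; the real contents of
# IMAGE_LABELS are not shown in this diff.
IMAGE_LABELS = ["image", "figure"]


def labels_aliased(use_chart_recognition: bool) -> list:
    """Mirrors the pre-fix pattern: bind the constant itself, then use +=."""
    image_labels = IMAGE_LABELS  # same list object as the constant
    if not use_chart_recognition:
        image_labels += ["chart"]  # list += extends the object in place
    return image_labels


def labels_copied(use_chart_recognition: bool) -> list:
    """Mirrors the post-fix pattern: work on a shallow copy."""
    image_labels = IMAGE_LABELS.copy()  # new list, constant stays untouched
    if not use_chart_recognition:
        image_labels += ["chart"]
    return image_labels


# Two calls with the copied variant leave the constant as it was.
labels_copied(False)
labels_copied(False)
constant_after_copied = list(IMAGE_LABELS)

# Two calls with the aliased variant append "chart" to the constant twice.
labels_aliased(False)
labels_aliased(False)
constant_after_aliased = list(IMAGE_LABELS)
```

Because the constant is module-level, the aliased variant keeps growing it on every call for the lifetime of the process, which matches the "unexpected change of the constant" named in the commit title.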
2 changes: 1 addition & 1 deletion paddlex/inference/pipelines/paddleocr_vl/result.py
@@ -268,7 +268,7 @@ def __init__(self, data) -> None:
             "markdown_ignore_labels", []
         )
         self.skip_order_labels = [
-            label for label in SKIP_ORDER_LABELS + markdown_ignore_labels
+            label for label in SKIP_ORDER_LABELS.copy() + markdown_ignore_labels
         ]

     def _to_img(self) -> dict[str, np.ndarray]:
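
In this hunk the constant is combined with `+` rather than `+=`. For Python lists, `+` returns a new list and the comprehension builds another one, so judging only from the lines shown here, the added `.copy()` looks like a defensive measure rather than a behavior change. A minimal sketch, with hypothetical label values standing in for the real ones:

```python
# Hypothetical values; the real SKIP_ORDER_LABELS is not shown in this diff.
SKIP_ORDER_LABELS = ["header", "footer"]
markdown_ignore_labels = ["number"]

# Same shape as the pre-fix expression: `+` allocates a fresh list,
# and the comprehension then builds another new list from it.
concatenated = SKIP_ORDER_LABELS + markdown_ignore_labels
skip_order_labels = [label for label in concatenated]

# Mutating the result does not reach back into the constant.
skip_order_labels.append("extra")
```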