Closed
Commits
51 commits
7f45ff5
Fix bugs (#4616)
Bobholamovic Oct 16, 2025
680a148
Do not use direct links (#4618)
Bobholamovic Oct 16, 2025
e1664fe
fix PaddleOCR-VL name - local (#4617)
zhang-prog Oct 16, 2025
1c2f286
Fix mkdocs.yml (#4619)
Bobholamovic Oct 16, 2025
682c15b
Fix typos (#4621)
Bobholamovic Oct 16, 2025
a2be29d
support concatenate_markdown_pages (#4622)
changdazhou Oct 16, 2025
0e27be0
Bump version to 3.3.1
Bobholamovic Oct 16, 2025
518d72c
PaddleOCR-VL, PP-DocLayoutV2 has been upload to models hosting platform
TingquanGao Oct 16, 2025
b661273
Bump version to 3.2.2
Bobholamovic Oct 16, 2025
f887a30
genai plugin: add wheel package (#4626)
zhang-prog Oct 16, 2025
599daa2
bugfix: map PaddleOCR-VL-0.9B to PaddleOCR-VL
TingquanGao Oct 16, 2025
d442a10
Bump version to 3.3.3
Bobholamovic Oct 16, 2025
5c8b02f
[cherry-pick] use FlashAttention 2.8.2 (#4631)
zhang-prog Oct 17, 2025
d82d091
Fix HPS bugs (#4633)
Bobholamovic Oct 17, 2025
2ffd6c7
[cherry-pick] fix typo (#4634)
zhang-prog Oct 17, 2025
c8d21e6
Cap langchain version
Bobholamovic Oct 20, 2025
0d397f5
[Cherry-Pick] #4643 #4645 #4648 (#4649)
Bobholamovic Oct 20, 2025
9824644
Merge branch 'develop' into release/3.3
Bobholamovic Oct 20, 2025
6622b3b
Bump version to 3.3.4
Bobholamovic Oct 20, 2025
677ea06
Fix assemble script (#4650)
Bobholamovic Oct 20, 2025
5955254
bugfix: fix PaddleOCR-VL downloading from AIStudio
TingquanGao Oct 23, 2025
c1ca660
fix: use cv2.imdecode to support reading files with Chinese character…
TingquanGao Oct 23, 2025
32fe2f7
support set max_new_tokens
changdazhou Oct 23, 2025
f6bb816
Remove broken quantization_config logic (#4654)
Bobholamovic Oct 23, 2025
803bdd1
PaddleOCR-VL supports FP32 (#4658)
Bobholamovic Oct 23, 2025
b2ebed2
Bump version to 3.3.5
Bobholamovic Oct 23, 2025
406d84d
PaddleOCR-VL supports CPU and CUDA 11 (#4666)
Bobholamovic Oct 24, 2025
7905c55
update docs
changdazhou Oct 24, 2025
61932c3
compatible with python3.9
changdazhou Oct 24, 2025
eaa32c1
support print parsing_res_list
changdazhou Oct 27, 2025
9579f20
update for new chat_template (#4672)
zhang-prog Oct 27, 2025
0af6510
[cherry-pick]mv crop formula from gen_ai_client to pipeline (#4679)
changdazhou Oct 28, 2025
e0c509e
use model cache files when network is unavailable (#4676)
TingquanGao Oct 28, 2025
1da53a1
[Feat] Add genai-vllm-server Dockerfile and build script (#4680)
Bobholamovic Oct 28, 2025
802629c
Bump version to 3.3.6
Bobholamovic Oct 28, 2025
89d37a2
Merge branch 'develop' into release/3.3
Bobholamovic Nov 5, 2025
2348ac0
Bump version to 3.3.7
Bobholamovic Nov 5, 2025
56078fe
Fix bugs (#4707)
Bobholamovic Nov 5, 2025
ddacf07
Bump version to 3.3.8
Bobholamovic Nov 5, 2025
f89f8c7
Fix bugs (#4708)
Bobholamovic Nov 5, 2025
8cb7434
Fix bug (#4709)
Bobholamovic Nov 5, 2025
a88b267
disable mkldnn by default for PP-DocLayoutV2
TingquanGao Nov 10, 2025
54baddb
[Feat] Support vLLM deployment on DCUs (#4710)
Bobholamovic Nov 10, 2025
24acf03
Bump FD version from 2.3.0rc0 to 2.3.0 (#4721)
Bobholamovic Nov 10, 2025
2526aad
Bump version to 3.3.9
Bobholamovic Nov 10, 2025
acab8aa
Replace naive eager attention with SDPA (#4725)
Bobholamovic Nov 13, 2025
dc0075e
HPI Supports paddle 3.2 (#4754)
Bobholamovic Nov 21, 2025
d8719aa
update fd config (#4760)
zhang-prog Nov 24, 2025
1bec5c2
Bump version to 3.3.10
Bobholamovic Nov 24, 2025
acdc053
Fix: Update imports to resolve ModuleNotFoundError for 'langchain.doc…
Yugsolanki Nov 26, 2025
d50bb5f
Refactor: Eliminate langchain_classic dependency using core langchain…
Yugsolanki Nov 26, 2025
5 changes: 3 additions & 2 deletions deploy/genai_vllm_server_docker/Dockerfile
@@ -13,9 +13,10 @@ RUN python -m pip install "paddlex${PADDLEX_VERSION}"

 ARG BUILD_FOR_SM120=false
 RUN if [ "${BUILD_FOR_SM120}" = 'true' ]; then \
-        python -m pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.3.14/flash_attn-2.8.3+cu128torch2.8-cp310-cp310-linux_x86_64.whl \
+        python -m pip install torch==2.8.0 https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.4.11/flash_attn-2.8.3%2Bcu128torch2.8-cp310-cp310-linux_x86_64.whl; \
     else \
-        python -m pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.3.14/flash_attn-2.8.2+cu128torch2.8-cp310-cp310-linux_x86_64.whl \
+        python -m pip install torch==2.8.0 https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.3.14/flash_attn-2.8.2+cu128torch2.8-cp310-cp310-linux_x86_64.whl; \
     fi \
     && paddlex --install genai-vllm-server

 EXPOSE 8080
4 changes: 2 additions & 2 deletions deploy/genai_vllm_server_docker/build.sh
@@ -21,8 +21,8 @@ while [[ $# -gt 0 ]]; do
            shift
            ;;
        *)
-           echo "Unknown option: $1"
-           exit 1
+           echo "Unknown option: $1" >&2
+           exit 2
            ;;
    esac
done
1 change: 1 addition & 0 deletions deploy/hps/server_env/Dockerfile
@@ -46,6 +46,7 @@ ENV PYTHONUNBUFFERED=1
 ENV PYTHONDONTWRITEBYTECODE=1
 ENV PIP_INDEX_URL=${PIP_INDEX_URL}

+RUN python -m pip install pip==25.2

 # Requirement collection
 FROM base AS rc
2 changes: 1 addition & 1 deletion deploy/hps/server_env/cpu_version.txt
@@ -1 +1 @@
-0.3.9
+0.3.10
2 changes: 1 addition & 1 deletion deploy/hps/server_env/gpu_version.txt
@@ -1 +1 @@
-0.3.10
+0.3.11
1 change: 1 addition & 0 deletions deploy/hps/server_env/requirements/app.in
@@ -4,4 +4,5 @@ numpy >= 1.24
 opencv-contrib-python == 4.10.0.84
 pycocotools >= 2
 pydantic >= 2
+safetensors @ https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl
 typing-extensions >= 4.11
2 changes: 1 addition & 1 deletion deploy/hps/server_env/requirements/cpu.in
@@ -1 +1 @@
-paddlepaddle @ https://paddle-whl.bj.bcebos.com/stable/cpu/paddlepaddle/paddlepaddle-3.1.1-cp310-cp310-linux_x86_64.whl
+paddlepaddle @ https://paddle-whl.bj.bcebos.com/stable/cpu/paddlepaddle/paddlepaddle-3.2.1-cp310-cp310-linux_x86_64.whl
13 changes: 10 additions & 3 deletions deploy/hps/server_env/requirements/cpu.txt
@@ -171,6 +171,7 @@ lxml==5.3.1
     # via
     #   paddlex (../../../setup.py)
     #   premailer
+    #   python-docx
 markupsafe==3.0.2
     # via jinja2
 marshmallow==3.26.1
@@ -238,7 +239,7 @@ packaging==24.2
     #   matplotlib
     #   paddlex (../../../setup.py)
     #   scikit-image
-paddlepaddle @ https://paddle-whl.bj.bcebos.com/stable/cpu/paddlepaddle/paddlepaddle-3.1.1-cp310-cp310-linux_x86_64.whl
+paddlepaddle @ https://paddle-whl.bj.bcebos.com/stable/cpu/paddlepaddle/paddlepaddle-3.2.1-cp310-cp310-linux_x86_64.whl
     # via -r requirements/cpu.in
 pandas==1.3.5
     # via paddlex (../../../setup.py)
@@ -295,6 +296,8 @@ python-dateutil==2.9.0.post0
     # via
     #   matplotlib
     #   pandas
+python-docx==1.2.0
+    # via paddlex (../../../setup.py)
 pytz==2025.1
     # via pandas
 pyyaml==6.0.2
@@ -326,8 +329,11 @@ ruamel-yaml==0.18.10
     # via paddlex (../../../setup.py)
 ruamel-yaml-clib==0.2.12
     # via ruamel-yaml
-safetensors==0.6.2
-    # via paddlex (../../../setup.py)
+safetensors @ https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl
+    # via
+    #   -r requirements/app.in
+    #   paddlepaddle
+    #   paddlex (../../../setup.py)
 scikit-image==0.24.0
     # via paddlex (../../../setup.py)
 scikit-learn==1.6.1
@@ -396,6 +402,7 @@ typing-extensions==4.12.2
     #   paddlex (../../../setup.py)
     #   pydantic
     #   pydantic-core
+    #   python-docx
     #   sqlalchemy
     #   typing-inspect
     #   uvicorn
2 changes: 1 addition & 1 deletion deploy/hps/server_env/requirements/gpu.in
@@ -1 +1 @@
-paddlepaddle-gpu @ https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/deps/paddlepaddle/paddlepaddle_gpu-3.1.1%2Bfc-cp310-cp310-linux_x86_64.whl
+paddlepaddle-gpu @ https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/deps/paddlepaddle/paddlepaddle_gpu-3.2.1%2Bfc-cp310-cp310-linux_x86_64.whl
13 changes: 10 additions & 3 deletions deploy/hps/server_env/requirements/gpu.txt
@@ -171,6 +171,7 @@ lxml==5.3.1
     # via
     #   paddlex (../../../setup.py)
     #   premailer
+    #   python-docx
 markupsafe==3.0.2
     # via jinja2
 marshmallow==3.26.1
@@ -238,7 +239,7 @@ packaging==24.2
     #   matplotlib
     #   paddlex (../../../setup.py)
     #   scikit-image
-paddlepaddle-gpu @ https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/deps/paddlepaddle/paddlepaddle_gpu-3.1.1%2Bfc-cp310-cp310-linux_x86_64.whl
+paddlepaddle-gpu @ https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/deps/paddlepaddle/paddlepaddle_gpu-3.2.1%2Bfc-cp310-cp310-linux_x86_64.whl
     # via -r requirements/gpu.in
 pandas==1.3.5
     # via paddlex (../../../setup.py)
@@ -295,6 +296,8 @@ python-dateutil==2.9.0.post0
     # via
     #   matplotlib
     #   pandas
+python-docx==1.2.0
+    # via paddlex (../../../setup.py)
 pytz==2025.1
     # via pandas
 pyyaml==6.0.2
@@ -326,8 +329,11 @@ ruamel-yaml==0.18.10
     # via paddlex (../../../setup.py)
 ruamel-yaml-clib==0.2.12
     # via ruamel-yaml
-safetensors==0.6.2
-    # via paddlex (../../../setup.py)
+safetensors @ https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl
+    # via
+    #   -r requirements/app.in
+    #   paddlepaddle-gpu
+    #   paddlex (../../../setup.py)
 scikit-image==0.24.0
     # via paddlex (../../../setup.py)
 scikit-learn==1.6.1
@@ -396,6 +402,7 @@ typing-extensions==4.12.2
     #   paddlex (../../../setup.py)
     #   pydantic
     #   pydantic-core
+    #   python-docx
     #   sqlalchemy
     #   starlette
     #   typing-inspect
4 changes: 2 additions & 2 deletions docs/pipeline_usage/tutorials/ocr_pipelines/PaddleOCR-VL.md
@@ -1038,8 +1038,8 @@ paddlex --get_pipeline_config PaddleOCR-VL
 VLRecognition:
   ...
   genai_config:
-    backend: vllm-server
-    server_url: http://127.0.0.1:8118/v1
+    backend: vllm
+    server_url: http://127.0.0.1:8118
 ```

 Afterwards, the pipeline can be invoked with the modified configuration file, for example via the CLI:
2 changes: 1 addition & 1 deletion paddlex/.version
@@ -1 +1 @@
-3.3.0
+3.3.10
11 changes: 11 additions & 0 deletions paddlex/inference/genai/backends/vllm.py
@@ -12,6 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+from ....utils import logging
 from ....utils.deps import is_genai_engine_plugin_available, require_genai_engine_plugin
 from ..configs.utils import (
     backend_config_to_args,
@@ -61,6 +62,16 @@ def run_vllm_server(host, port, model_name, model_dir, config, chat_template_pat
         },
     )

+    import torch
+
+    if torch.version.hip is not None and torch.version.cuda is None:
+        # For DCU
+        if "api-server-count" in config:
+            logging.warning(
+                "Key 'api-server-count' will be popped as it is not supported"
+            )
+            config.pop("api-server-count")
+
     args = backend_config_to_args(config)
     args = parser.parse_args(args)
     validate_parsed_serve_args(args)
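The DCU branch above strips an option the backend cannot handle before the config is turned into CLI args. The guard can be isolated as a small framework-free sketch; `drop_unsupported_keys` is a hypothetical helper name, not PaddleX API:

```python
import logging


def drop_unsupported_keys(config, unsupported=("api-server-count",)):
    """Remove config keys a backend cannot handle, warning about each drop.

    `config` is a plain dict of CLI-style option names, mirroring the backend
    config that the plugin later converts to argv.
    """
    config = dict(config)  # do not mutate the caller's dict
    for key in unsupported:
        if key in config:
            logging.warning("Key %r will be popped as it is not supported", key)
            config.pop(key)
    return config


cfg = drop_unsupported_keys({"api-server-count": 2, "max-model-len": 16384})
```

Copying the dict keeps the caller's config intact, which the in-tree code does not need because it owns the dict it mutates.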
2 changes: 1 addition & 1 deletion paddlex/inference/genai/configs/paddleocr_vl_09b.py
@@ -20,7 +20,7 @@ def get_config(backend):
             "max-model-len": 16384,
             "max-num-batched-tokens": 16384,
             "max-num-seqs": 256,
-            "workers": 2,
+            "workers": 4,
             "graph-optimization-config": '{"graph_opt_level":0, "use_cudagraph":true}',
         }
     elif backend == "vllm":
@@ -296,7 +296,7 @@ def forward(self, hidden_states):
         3. Scale by learned weight parameter
         - Maintains original dtype for numerical stability during computation
         """
-        if self.config.fuse_rms_norm:
+        if hidden_states.dtype != paddle.float16 and self.config.fuse_rms_norm:
             return fused_rms_norm_ext(
                 hidden_states, self.weight, self.variance_epsilon
             )[0].astype(self.weight.dtype)
@@ -854,8 +854,15 @@ def core_attn(
         v = tensor.transpose(x=v, perm=perm)

         replicate = self.config.num_attention_heads // self.config.num_key_value_heads
+        is_float16 = k.dtype == paddle.float16
+        if is_float16:
+            k = k.cast(paddle.float32)
+            v = v.cast(paddle.float32)
         k = paddle.repeat_interleave(k, replicate, axis=1)
         v = paddle.repeat_interleave(v, replicate, axis=1)
+        if is_float16:
+            k = k.cast(paddle.float16)
+            v = v.cast(paddle.float16)

         scale_qk_coeff = self.config.scale_qk_coeff * self.head_dim**0.5
         product = paddle.matmul(x=q.scale(1.0 / scale_qk_coeff), y=k, transpose_y=True)
70 changes: 46 additions & 24 deletions paddlex/inference/models/doc_vlm/modeling/paddleocr_vl/_siglip.py
@@ -42,6 +42,7 @@
 import paddle.nn as nn
 import paddle.nn.functional as F

+from ......utils.env import get_gpu_compute_capability
 from ....common.vlm.activations import ACT2FN
 from ....common.vlm.transformers import PretrainedModel
 from ....common.vlm.transformers.model_outputs import (
@@ -100,15 +101,22 @@ def eager_attention_forward(
     dropout: float = 0.0,
     **kwargs,
 ):
-    attn_weights = paddle.matmul(query, key.transpose((0, 1, 3, 2))) * scaling
+    origin_dtype = query.dtype
+
+    attn_weights = paddle.matmul(x=query.scale(scaling), y=key, transpose_y=True)
+    attn_weights = attn_weights.cast(paddle.float32)
+
     if attention_mask is not None:
+        attention_mask = attention_mask.cast(paddle.float32)
         attn_weights = attn_weights + attention_mask

-    attn_weights = F.softmax(attn_weights, axis=-1, dtype="float32").astype(query.dtype)
+    attn_weights = F.softmax(attn_weights, axis=-1)
+    attn_weights = attn_weights.cast(origin_dtype)
+
     attn_weights = F.dropout(attn_weights, p=dropout, training=module.training)

     attn_output = paddle.matmul(attn_weights, value)
-    attn_output = attn_output.transpose((0, 2, 1, 3)).contiguous()
+    attn_output = attn_output.transpose((0, 2, 1, 3))

     return attn_output, attn_weights

@@ -130,6 +138,9 @@ def __init__(self, config):
         self.q_proj = nn.Linear(self.embed_dim, self.embed_dim)
         self.out_proj = nn.Linear(self.embed_dim, self.embed_dim)

+        cap = get_gpu_compute_capability()
+        self._supports_sdpa = cap >= (8, 0) if cap is not None else False
+
     def forward(
         self,
         hidden_states: paddle.Tensor,  # [B, L, D]
@@ -138,44 +149,55 @@ def forward(
         cu_seqlens: Optional[List[paddle.Tensor]] = None,
         rope_emb: Optional[Tuple[paddle.Tensor, paddle.Tensor]] = None,  # (cos, sin)
     ):
+        if output_attentions:
+            raise NotImplementedError
+
         B, L, D = hidden_states.shape

         q = self.q_proj(hidden_states)
         k = self.k_proj(hidden_states)
         v = self.v_proj(hidden_states)

-        # [B, L, H, Dh]
+
         q = q.reshape([B, L, self.num_heads, self.head_dim])
         k = k.reshape([B, L, self.num_heads, self.head_dim])
         v = v.reshape([B, L, self.num_heads, self.head_dim])
         if rope_emb is not None:
             cos, sin = rope_emb
             q, k = apply_rotary_pos_emb_vision(q, k, cos, sin)

-        # → [B, H, L, Dh]
-        q = q.transpose([0, 2, 1, 3])
-        k = k.transpose([0, 2, 1, 3])
-        v = v.transpose([0, 2, 1, 3])
-
-        attn_output, attn_weights = eager_attention_forward(
-            self,
-            q,
-            k,
-            v,
-            attention_mask,
-            is_causal=self.is_causal,
-            scaling=self.scale,
-            dropout=0.0 if not self.training else self.dropout,
-        )
-        attn_output = attn_output.reshape([B, L, D]).contiguous()
+        if not self._supports_sdpa or q.dtype == paddle.float32:
+            # → [B, H, L, Dh]
+            q = q.transpose([0, 2, 1, 3])
+            k = k.transpose([0, 2, 1, 3])
+            v = v.transpose([0, 2, 1, 3])
+
+            attn_output, _ = eager_attention_forward(
+                self,
+                q,
+                k,
+                v,
+                attention_mask,
+                is_causal=self.is_causal,
+                scaling=self.scale,
+                dropout=0.0 if not self.training else self.dropout,
+            )
+            attn_output = attn_output.reshape([B, L, D])
+        else:
+            attn_output = paddle.nn.functional.scaled_dot_product_attention(
+                q,
+                k,
+                v,
+                attention_mask,
+                dropout_p=self.dropout,
+                is_causal=self.is_causal,
+                training=self.training,
+            )
+            attn_output = attn_output.reshape([B, L, D])

         attn_output = self.out_proj(attn_output)

-        if not output_attentions:
-            attn_weights = None
-
-        return attn_output, attn_weights
+        return attn_output, None


 class SiglipVisionEmbeddings(nn.Layer):
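The SDPA branch and the eager fallback above compute the same function, softmax(QK^T * scaling)V; only the kernel differs. A dependency-free sketch of that function for a single query vector (the helpers here are hypothetical, not PaddleX API):

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def eager_attention(q, k, v, scaling):
    """Toy single-query attention: softmax(q . K^T * scaling) . V.

    `q` is one query vector; `k` and `v` are lists of key/value vectors.
    Mirrors the eager path's math (scale, matmul, softmax, weighted sum),
    without the batching, heads, masking, or dropout of the real code.
    """
    scores = [scaling * sum(qi * ki for qi, ki in zip(q, kv)) for kv in k]
    weights = softmax(scores)
    dim = len(v[0])
    return [sum(w * vec[d] for w, vec in zip(weights, v)) for d in range(dim)]
```

When all scores are equal, the weights are uniform and the output is the mean of the value vectors, which is a handy sanity check for either branch.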
9 changes: 7 additions & 2 deletions paddlex/inference/models/doc_vlm/predictor.py
@@ -29,7 +29,7 @@
 from ....utils.deps import require_genai_client_plugin
 from ....utils.device import TemporaryDeviceChanger
 from ...common.batch_sampler import DocVLMBatchSampler
-from ...utils.misc import is_bfloat16_available
+from ...utils.misc import is_bfloat16_available, is_float16_available
 from ..base import BasePredictor
 from .result import DocVLMResult

@@ -54,7 +54,12 @@ def __init__(self, *args, **kwargs):

         if self._use_local_model:
             self.device = kwargs.get("device", None)
-            self.dtype = "bfloat16" if is_bfloat16_available(self.device) else "float32"
+            if is_bfloat16_available(self.device):
+                self.dtype = "bfloat16"
+            elif is_float16_available(self.device):
+                self.dtype = "float16"
+            else:
+                self.dtype = "float32"

         self.infer, self.processor = self._build(**kwargs)
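The predictor change replaces a two-way dtype choice with a three-way fallback chain. Pulled out as a pure function (hypothetical name; the real code queries device capabilities via `is_bfloat16_available`/`is_float16_available`), the selection logic is just:

```python
def select_dtype(bf16_available: bool, fp16_available: bool) -> str:
    """Pick the best-supported reduced-precision dtype.

    Mirrors the predictor's fallback order: prefer bfloat16, then float16,
    and only fall back to full float32 when neither is available.
    """
    if bf16_available:
        return "bfloat16"
    if fp16_available:
        return "float16"
    return "float32"
```

Together with the float16 guards added to the modeling code (RMSNorm and `repeat_interleave`), this is what lets PaddleOCR-VL run on devices without bfloat16 support.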
4 changes: 2 additions & 2 deletions paddlex/inference/pipelines/components/retriever/base.py
@@ -22,8 +22,8 @@
 from .....utils.subclass_register import AutoRegisterABCMetaClass

 if is_dep_available("langchain"):
-    from langchain.docstore.document import Document
-    from langchain.text_splitter import RecursiveCharacterTextSplitter
+    from langchain_core.documents.base import Document
+    from langchain_text_splitters.character import RecursiveCharacterTextSplitter
 if is_dep_available("langchain-community"):
     from langchain_community import vectorstores
     from langchain_community.vectorstores import FAISS
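This diff moves the imports from the deprecated `langchain.docstore`/`langchain.text_splitter` paths to their `langchain_core`/`langchain_text_splitters` homes, resolving the ModuleNotFoundError from the commit messages above. A generic sketch of the compatibility pattern, trying the new module path first and falling back to the old one (`import_attr` is a hypothetical helper; the example uses stdlib modules to stand in for the langchain paths):

```python
import importlib


def import_attr(module_candidates, attr):
    """Resolve `attr` from the first importable module in `module_candidates`."""
    last_exc = None
    for mod_name in module_candidates:
        try:
            module = importlib.import_module(mod_name)
        except ImportError as exc:
            last_exc = exc
            continue
        return getattr(module, attr)
    raise ImportError(
        f"none of {module_candidates!r} provide {attr!r}"
    ) from last_exc


# Stdlib stand-in: the first candidate fails to import, the second succeeds.
sqrt = import_attr(["definitely_not_a_module_xyz", "math"], "sqrt")
```

The PR itself pins to the new paths outright rather than probing, which is simpler when the minimum supported langchain version is known.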
9 changes: 9 additions & 0 deletions paddlex/inference/utils/hpi.py
@@ -24,6 +24,7 @@
 from pydantic import BaseModel, Field
 from typing_extensions import Annotated, TypeAlias

+from ...utils import logging
 from ...utils.deps import function_requires_deps, is_paddle2onnx_plugin_available
 from ...utils.env import get_paddle_cuda_version, get_paddle_version
 from ...utils.flags import USE_PIR_TRT
@@ -156,6 +157,14 @@ def suggest_inference_backend_and_config(
         return None, f"Inference backend {repr(hpi_config.backend)} is unavailable."

     paddle_version = get_paddle_version()
+
+    if paddle_version[:3] >= (3, 1, 0):
+        logging.debug(
+            "Paddle version %s is not supported yet. The prior knowledge of Paddle 3.1.1 will be used.",
+            paddle_version,
+        )
+        paddle_version = (3, 1, 1, None)
+
     if (3, 0) <= paddle_version[:2] <= (3, 1) and paddle_version[3] is None:
         if paddle_version[2] == 0:
             paddle_version = f"paddle{paddle_version[0]}{paddle_version[1]}"
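The hpi.py change clamps any Paddle version at or above 3.1.0 down to the 3.1.1 "prior knowledge" entry before backend selection, which is how HPI gains Paddle 3.2 support without a new knowledge table. The clamping step, isolated as a pure function (hypothetical name; the real code also emits the debug log shown in the diff):

```python
def clamp_paddle_version(paddle_version):
    """Map newer Paddle versions onto the newest version with known
    backend prior knowledge, as suggest_inference_backend_and_config does.

    `paddle_version` is a (major, minor, patch, tag) tuple, with `tag`
    set to None for official releases; tuple comparison gives the usual
    version ordering.
    """
    if paddle_version[:3] >= (3, 1, 0):
        return (3, 1, 1, None)
    return paddle_version
```

Note the threshold in the diff is `>= (3, 1, 0)`, so 3.1.0 itself is also remapped onto the 3.1.1 knowledge, not only versions newer than 3.1.1.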