Table Of Contents
- Description
- How does this sample work?
- Prerequisites
- Download and preprocess the ONNX model
- Running the sample
- Additional resources
- License
- Changelog
- Known issues
This sample, onnx_custom_plugin, demonstrates how to use plugins written in C++ with the TensorRT Python bindings and the ONNX parser. It uses the BiDAF model from the ONNX Model Zoo.
This sample implements a Hardmax layer using cuBLAS, wraps the implementation in a TensorRT plugin (with a corresponding plugin creator) and then generates a shared library module containing its code. The user then dynamically loads this library in Python, which causes the plugin to be registered in TensorRT's PluginRegistry and makes it available to the ONNX parser.
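The following minimal pure-Python sketch shows what the plugin computes. It assumes the ONNX definition of Hardmax: the output is 1 at the first maximum along the axis and 0 everywhere else. For simplicity it reduces over the last axis only, which is the default axis in ONNX opset 13 and later. The function name `hardmax_last_axis` is hypothetical. The sample's plugin performs this computation on the GPU with cuBLAS instead.

```python
def hardmax_last_axis(x):
    """Return a one-hot nested list marking the first maximum along the last axis."""
    if not x:
        return []
    # Recurse through the leading axes until we reach a 1-D row of numbers.
    if isinstance(x[0], list):
        return [hardmax_last_axis(row) for row in x]
    # list.index returns the first occurrence, so ties go to the lowest index.
    first_max = x.index(max(x))
    return [1.0 if i == first_max else 0.0 for i in range(len(x))]


print(hardmax_last_axis([[0.1, 0.7, 0.2], [3.0, 3.0, -1.0]]))
# [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```

A reference like this is the kind of baseline a plugin test can compare GPU results against. Note the tie in the second row: only the first maximum is set.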
This sample includes:
plugin/
This directory contains files for the Hardmax layer plugin.
customHardmaxPlugin.cpp
A custom TensorRT plugin implementation.
customHardmaxPlugin.h
The Hardmax Plugin headers.
model.py
This script downloads the BiDAF ONNX model and uses ONNX GraphSurgeon to replace layers that TensorRT does not support.
sample.py
This script loads the ONNX model and performs inference using TensorRT.
load_plugin_lib.py
This script contains a helper function to load the customHardmaxPlugin library in Python.
test_custom_hardmax_plugin.py
This script tests the Hardmax plugin against a reference NumPy implementation.
requirements.txt
This file specifies all the Python packages required to run this Python sample.
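A helper in the spirit of load_plugin_lib.py might look like the sketch below. Several details are assumptions, not confirmed by the sample:

- The file names `libcustomHardmaxPlugin.so` (Linux) and `customHardmaxPlugin.dll` (Windows).
- The library sitting directly inside the build directory.
- Both function names.

Check your own build output for the real path. Loading the library with `ctypes.CDLL` is the step that, according to the sample description, registers the plugin with TensorRT's PluginRegistry.

```python
import ctypes
import os
import sys

# Hypothetical library base name; the real one is set by the sample's CMake build.
PLUGIN_NAME = "customHardmaxPlugin"


def plugin_library_path(build_dir, platform=sys.platform):
    """Return the assumed path of the plugin shared library for a platform."""
    if platform.startswith("win"):
        filename = PLUGIN_NAME + ".dll"
    else:
        filename = "lib" + PLUGIN_NAME + ".so"
    return os.path.join(build_dir, filename)


def load_plugin_library(build_dir="build"):
    """Load the plugin library; per the sample, this registers the plugin with TensorRT."""
    path = plugin_library_path(build_dir)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path} not found; build the plugin first")
    return ctypes.CDLL(path)
```

Call `load_plugin_library` before parsing the ONNX model so that the parser can find the CustomHardmax plugin.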
For specific software versions, see the TensorRT Installation Guide.
- Install the dependencies for Python.
    pip3 install -r requirements.txt
- (For Windows builds) Visual Studio 2017 Community or Enterprise edition
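A quick preflight check can catch an unsupported interpreter or a missing package before you run the scripts. This sketch rests on two assumptions:

- The Python 3.10 minimum comes from the August 2025 changelog entry.
- The default module list (`tensorrt`, `numpy`) is only illustrative; requirements.txt is the authoritative list.

```python
import importlib.util
import sys

# Minimum Python version, taken from the August 2025 changelog entry.
MIN_PYTHON = (3, 10)


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(required=("tensorrt", "numpy")):
    """Collect human-readable problems; an empty list means the check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d or newer is required" % MIN_PYTHON)
    problems.extend(f"missing Python package: {name}" for name in missing_modules(required))
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```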
Run the model script to download the BiDAF model from the ONNX Model Zoo. The script replaces the Hardmax layer with an op called CustomHardmax to match the custom plugin name. It also replaces the unsupported Compress node with an equivalent operation and removes the CategoryMapper nodes, which perform a string-to-integer conversion of the model inputs.
    python3 model.py
- Build the plugin and its corresponding Python bindings.
  - On Linux, run:
        mkdir build && pushd build
        cmake .. && make -j
        popd
    NOTE: If any of the dependencies are not installed in their default locations, you can specify them manually. For example:
        cmake .. -DCMAKE_CUDA_COMPILER=/usr/local/cuda-x.x/bin/nvcc \
                 -DCUDA_INC_DIR=/usr/local/cuda-x.x/include/ \
                 -DTRT_LIB=/path/to/tensorrt/lib/ \
                 -DTRT_INCLUDE=/path/to/tensorrt/include/
    You have two alternatives to setting these flags:
    - Instead of setting CMAKE_CUDA_COMPILER, add the directory containing nvcc to $PATH.
    - Instead of setting CUDA_INC_DIR, add /path/to/cuda/include to $CPLUS_INCLUDE_PATH.
  - On Windows, run the following in PowerShell, replacing paths appropriately:
        mkdir build; pushd build
        cmake .. -G "Visual Studio 15 Win64" `
            -DTRT_LIB=C:\path\to\tensorrt\lib `
            -DTRT_INCLUDE=C:\path\to\tensorrt\include `
            -DCUDA_INC_DIR="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v<CUDA_VERSION>\include" `
            -DCUDA_LIB_DIR="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v<CUDA_VERSION>\lib\x64"
        # NOTE: msbuild is usually located under C:\Program Files (x86)\Microsoft Visual Studio\2017\<EDITION>\MSBuild\<VERSION>\Bin
        # Add this path to your PATH environment variable.
        msbuild ALL_BUILD.vcxproj
        popd
The command cmake .. displays a complete list of configurable variables. If a variable is set to VARIABLE_NAME-NOTFOUND, then you’ll need to specify it manually or set the variable it is derived from correctly.
- Run inference using TensorRT with the custom Hardmax plugin implementation:
    python3 sample.py
- Verify that the sample ran successfully. If it did, the output should look similar to the following:
=== Testing ===
Input context: Garry the lion is 5 years old. He lives in the savanna.
Input query: Where does the lion live?
Model prediction: savanna
Input context: A quick brown fox jumps over the lazy dog.
Input query: What color is the fox?
Model prediction: brown
The model can also be run interactively:
    python3 sample.py --interactive
The context and query can then be entered from the command line:
=== Testing ===
Enter context: Waldo wears a striped shirt. He also wears glasses.
Enter query: Who wears glasses?
Model prediction: waldo
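Because model.py removes the CategoryMapper nodes, the string-to-integer conversion they performed has to happen in Python before inference. The predicted answer span also has to be mapped back to words afterwards. The sketch below shows one way to do both, assuming the model takes word ids and returns start and end token positions into the context. It is not necessarily how sample.py works:

- The tokenizer, the vocabulary, and all function names are hypothetical.
- `run_model` stands in for the TensorRT inference call.
- The lookup mirrors the ONNX CategoryMapper attributes `cats_strings`, `cats_int64s`, and `default_int64`, whose default is -1.
- Only word tokens are handled.

```python
import re


def tokenize(text):
    """Hypothetical tokenizer: lower-cased words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def make_category_mapper(cats_strings, cats_int64s, default_int64=-1):
    """Mimic ONNX CategoryMapper (string -> int64) with a dictionary lookup."""
    if len(cats_strings) != len(cats_int64s):
        raise ValueError("cats_strings and cats_int64s must have the same length")
    table = dict(zip(cats_strings, cats_int64s))

    def to_ids(tokens):
        # Strings missing from the vocabulary map to default_int64.
        return [table.get(token, default_int64) for token in tokens]

    return to_ids


def extract_answer(context_tokens, start_pos, end_pos):
    """Join the context tokens from start_pos to end_pos, inclusive."""
    if not 0 <= start_pos <= end_pos < len(context_tokens):
        raise ValueError(f"invalid answer span ({start_pos}, {end_pos})")
    return " ".join(context_tokens[start_pos:end_pos + 1])


def answer_question(context, query, run_model, to_ids):
    """Tokenize, convert to ids, run the model, and decode the answer span."""
    context_tokens = tokenize(context)
    # run_model is a placeholder for TensorRT inference returning (start, end).
    start_pos, end_pos = run_model(to_ids(context_tokens), to_ids(tokenize(query)))
    return extract_answer(context_tokens, start_pos, end_pos)


def interactive(run_model, to_ids, read=input):
    """Prompt for a context and a query, similar to the interactive mode above."""
    context = read("Enter context: ")
    query = read("Enter query: ")
    print("Model prediction:", answer_question(context, query, run_model, to_ids))


# Toy usage: a tiny vocabulary and a fake model that always picks token 12.
context = "Garry the lion is 5 years old. He lives in the savanna."
vocab = sorted(set(tokenize(context)))
to_ids = make_category_mapper(vocab, list(range(len(vocab))))


def fake_model(context_ids, query_ids):
    return 12, 12


print(answer_question(context, "Where does the lion live?", fake_model, to_ids))  # savanna
```

Lower-casing in the tokenizer is one way to get lower-case predictions such as "waldo". Whatever tokenizer you use must match the vocabulary the model was trained with.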
The following resources provide a deeper understanding of getting started with TensorRT using Python:
Model
Documentation
- Introduction To NVIDIA’s TensorRT Samples
- Working With TensorRT Using The Python API
- NVIDIA’s TensorRT Documentation Library
For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.
October 2025:
- Migrated to strongly typed APIs.
August 2025:
- Removed support for Python versions < 3.10.
January 2024:
- Created the cuBLAS handle with cublasCreate instead of using the cublasContext argument from attachToContext. The cublasContext is still valid if TacticSource::kCUBLAS is enabled, but TacticSource::kCUBLAS is deprecated.
- Added the cuBLAS library as a prerequisite.
August 2023:
- Updated ONNX version support to 1.14.0.
- Removed support for Python versions < 3.8.
September 2022: This README.md file was created and reviewed.
There are no known issues in this sample.