Merged
28 changes: 18 additions & 10 deletions README.md
ChaiNNer is also cross-platform, meaning you can run it on Windows, MacOS, and Linux.

For help, suggestions, or just to hang out, you can join the [chaiNNer Discord server](https://discord.gg/pzvAKPKyHM)

ChaiNNer is under active development. If you're knowledgeable in TypeScript, React, or Python, feel free to contribute to this project and help us continue to improve it.

## Installation

Download the latest release from the [Github releases page](https://github.com/chaiNNer-org/chaiNNer/releases) and run the installer best suited for your system. Simple as that.

You don't even need to have Python installed, as chaiNNer will download an isolated integrated Python build on startup. From there, you can install all the other dependencies via the Dependency Manager.

If you still wish to use your system Python installation, you can enable the system Python setting, though integrated Python is strongly recommended. If you do use system Python, Python 3.10 or later is required (3.11+ recommended).

If you'd like to test the latest changes and tweaks, try out our [nightly builds](https://github.com/chaiNNer-org/chaiNNer-nightly).

<img src="docs/assets/simple_screenshot.png" width="480" />
</p>

Before you get to this point though, you'll need to install one of the neural network frameworks from the dependency manager. You can access this via the button in the upper-right-hand corner. ChaiNNer offers support for PyTorch (with select model architectures), NCNN, ONNX, and TensorRT. For Nvidia users, PyTorch or TensorRT will be the preferred way to upscale. For AMD users, NCNN will be the preferred way to upscale (or PyTorch with ROCm on Linux).

All the other Python dependencies are automatically installed, and chaiNNer even carries its own integrated Python support so that you do not have to modify your existing Python configuration.

You can right-click in the editor viewport to show an inline nodes list to select from.

- Windows versions 8.1 and below are also not supported.

- Apple Silicon Macs are supported with PyTorch MPS acceleration. ONNX only supports the CPU Execution Provider, and NCNN may not work properly on some configurations.

- Some NCNN users with non-Nvidia GPUs might get all-black outputs. I am not sure what to do to fix this as it appears to be due to the graphics driver crashing as a result of going out of memory. If this happens to you, try manually setting a tiling amount.

- To use the Clipboard nodes, Linux users need to have xclip or, for wayland users, wl-copy installed.
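To make the tiling note above concrete: tiling bounds per-inference VRAM use, since the GPU only ever holds one tile at a time. A toy calculation (illustrative only, not chaiNNer's actual memory estimator):

```python
import math

def tiles_needed(height: int, width: int, tile: int) -> int:
    # Number of tile x tile patches needed to cover a height x width image.
    return math.ceil(height / tile) * math.ceil(width / tile)

# A 4K frame: halving the tile size roughly quarters per-tile memory,
# at the cost of running more (smaller) inferences.
count_512 = tiles_needed(2160, 3840, 512)  # 5 * 8 = 40 tiles
count_256 = tiles_needed(2160, 3840, 256)  # 9 * 15 = 135 tiles
```

Smaller tiles trade throughput for a smaller peak allocation, which is why manually lowering the tiling amount works around driver out-of-memory crashes.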

## GPU Support

**Nvidia GPUs:** Full support via PyTorch (CUDA), ONNX, and TensorRT. TensorRT offers the best performance for supported models.

**AMD GPUs:**
- On Linux, AMD GPUs can use PyTorch via ROCm.
- NCNN is available on all platforms for AMD GPUs.

**Apple Silicon (M1/M2/M3):** PyTorch MPS acceleration is supported.

**Intel GPUs:** NCNN inference is supported for Intel GPUs.

**CPU:** All frameworks support CPU-only mode as a fallback.

For NCNN, make sure to select which GPU you want to use in the settings. It might be defaulting to your integrated graphics!

## Model Architecture Support

As of v0.21.0, chaiNNer uses our new package called Spandrel.
- [u2net](https://github.com/danielgatis/rembg) | [u2net](https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2net.onnx), [u2netp](https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2netp.onnx), [u2net_cloth_seg](https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2net_cloth_seg.onnx), [u2net_human_seg](https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2net_human_seg.onnx), [silueta](https://github.com/danielgatis/rembg/releases/download/v0.0.0/silueta.onnx)
- [isnet](https://github.com/xuebinqin/DIS) | [isnet](https://github.com/danielgatis/rembg/releases/download/v0.0.0/isnet-general-use.onnx)

### TensorRT

TensorRT provides optimized inference for Nvidia GPUs. Models must be converted to TensorRT engine format for use. This offers the best performance on supported hardware.
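For illustration, the conversion step can be sketched with the standard `tensorrt` Python API (TensorRT 8.x style). This is not chaiNNer's actual builder (this PR's `build_engine_from_onnx` helper handles conversion internally); the file paths are placeholders, and the guarded import allows for TensorRT not being installed:

```python
import importlib.util

def tensorrt_available() -> bool:
    # TensorRT ships as the `tensorrt` pip package on supported platforms.
    return importlib.util.find_spec("tensorrt") is not None

def build_engine(onnx_path: str, engine_path: str, fp16: bool = True) -> None:
    # Sketch of ONNX -> TensorRT engine conversion; requires an Nvidia GPU.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    if fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    engine_bytes = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine_bytes)
```

On a machine with TensorRT installed, `build_engine("model.onnx", "model.engine")` would write a serialized engine that inference code can then deserialize and run.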

## Troubleshooting

For troubleshooting information, view the [troubleshooting document](https://github.com/chaiNNer-org/chaiNNer/wiki/06--Troubleshooting).
37 changes: 37 additions & 0 deletions backend/src/nodes/impl/tensorrt/__init__.py
@@ -0,0 +1,37 @@
"""TensorRT implementation utilities."""

from .auto_split import tensorrt_auto_split
from .engine_builder import BuildConfig, build_engine_from_onnx
from .inference import (
TensorRTSession,
clear_session_cache,
get_tensorrt_session,
run_inference,
)
from .memory import (
CudaBuffer,
CudaMemoryManager,
check_cuda_available,
cuda_memory_context,
get_cuda_compute_capability,
get_cuda_device_name,
)
from .model import TensorRTEngine, TensorRTEngineInfo

__all__ = [
"BuildConfig",
"CudaBuffer",
"CudaMemoryManager",
"TensorRTEngine",
"TensorRTEngineInfo",
"TensorRTSession",
"build_engine_from_onnx",
"check_cuda_available",
"clear_session_cache",
"cuda_memory_context",
"get_cuda_compute_capability",
"get_cuda_device_name",
"get_tensorrt_session",
"run_inference",
"tensorrt_auto_split",
]
118 changes: 118 additions & 0 deletions backend/src/nodes/impl/tensorrt/auto_split.py
@@ -0,0 +1,118 @@
"""Auto-tiling support for TensorRT inference."""

from __future__ import annotations

import gc

import numpy as np

from ..upscale.auto_split import Tiler, auto_split
from .inference import get_tensorrt_session
from .model import TensorRTEngine


def _into_batched_form(img: np.ndarray) -> np.ndarray:
"""Convert image to NCHW batched format."""
shape_size = len(img.shape)
if shape_size == 3:
# (H, W, C) -> (1, C, H, W)
return img.transpose((2, 0, 1))[np.newaxis, :]
elif shape_size == 2:
# (H, W) -> (1, 1, H, W)
return img[np.newaxis, np.newaxis, :, :]
else:
raise ValueError("Unsupported input tensor shape")


def _into_standard_image_form(img: np.ndarray) -> np.ndarray:
"""Convert NCHW output back to HWC format."""
shape_size = len(img.shape)
if shape_size == 4:
# (1, C, H, W) -> (H, W, C)
return img.squeeze(0).transpose(1, 2, 0)
elif shape_size == 3:
# (C, H, W) -> (H, W, C)
return img.transpose(1, 2, 0)
elif shape_size == 2:
# (H, W)
return img
else:
raise ValueError("Unsupported output tensor shape")


def _flip_r_b_channels(img: np.ndarray) -> np.ndarray:
"""Flip R and B channels (RGB <-> BGR conversion)."""
shape_size = len(img.shape)
if shape_size != 3:
return img
if img.shape[2] == 3:
# (H, W, C) RGB -> BGR - use ascontiguousarray to avoid stride issues
return np.ascontiguousarray(np.flip(img, 2))
elif img.shape[2] == 4:
# (H, W, C) RGBA -> BGRA
return np.dstack((img[:, :, 2], img[:, :, 1], img[:, :, 0], img[:, :, 3]))
return img


def tensorrt_auto_split(
img: np.ndarray,
engine: TensorRTEngine,
tiler: Tiler,
gpu_index: int = 0,
) -> np.ndarray:
"""
Run TensorRT inference with automatic tiling for large images.

Args:
img: Input image in HWC format (float32, 0-1 range)
engine: TensorRT engine
tiler: Tiler configuration for splitting
gpu_index: GPU device index

Returns:
Upscaled image in HWC format
"""
session = get_tensorrt_session(engine, gpu_index)
is_fp16 = engine.precision == "fp16"

def upscale(img: np.ndarray, _: object):
try:
# Convert to appropriate precision
lr_img = img.astype(np.float16) if is_fp16 else img.astype(np.float32)

# Convert RGB to BGR (most models expect BGR)
lr_img = _flip_r_b_channels(lr_img)

# Convert to NCHW batched format
lr_img = _into_batched_form(lr_img)

# Run inference
output = session.infer(lr_img)

# Convert back to HWC format
output = _into_standard_image_form(output)

# Convert BGR back to RGB
output = _flip_r_b_channels(output)

return output.astype(np.float32)

except Exception as e:
error_str = str(e).lower()
# Check for CUDA OOM errors
if (
"out of memory" in error_str
or ("cuda" in error_str and "memory" in error_str)
or "allocation" in error_str
):
raise RuntimeError( # noqa: B904
"A VRAM out-of-memory error has occurred. Please try using a smaller tile size."
)
else:
# Re-raise the exception if not an OOM error
raise

try:
return auto_split(img, upscale, tiler)
finally:
gc.collect()
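The layout helpers above are pure array shuffling and can be sanity-checked without a GPU; a numpy-only round-trip mirroring `_into_batched_form`, `_into_standard_image_form`, and `_flip_r_b_channels`:

```python
import numpy as np

# HWC float32 image in the 0-1 range, as tensorrt_auto_split expects.
img = np.random.rand(8, 6, 3).astype(np.float32)

# (H, W, C) -> (1, C, H, W), as in _into_batched_form.
batched = img.transpose(2, 0, 1)[np.newaxis, :]

# (1, C, H, W) -> (H, W, C), as in _into_standard_image_form.
restored = batched.squeeze(0).transpose(1, 2, 0)

# _flip_r_b_channels applied twice is the identity (RGB -> BGR -> RGB).
flipped = np.ascontiguousarray(np.flip(img, 2))
unflipped = np.flip(flipped, 2)

assert batched.shape == (1, 3, 8, 6)
assert np.array_equal(restored, img)
assert np.array_equal(unflipped, img)
```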