Merged
28 changes: 18 additions & 10 deletions README.md
ChaiNNer is also cross-platform, meaning you can run it on Windows, MacOS, and Linux.

For help, suggestions, or just to hang out, you can join the [chaiNNer Discord server](https://discord.gg/pzvAKPKyHM)

ChaiNNer is under active development. If you're knowledgeable in TypeScript, React, or Python, feel free to contribute to this project and help us continue to improve it.

## Installation

Download the latest release from the [Github releases page](https://github.com/chaiNNer-org/chaiNNer/releases) and run the installer best suited for your system. Simple as that.

You don't even need to have Python installed, as chaiNNer will download an isolated integrated Python build on startup. From there, you can install all the other dependencies via the Dependency Manager.

If you still wish to use your system Python installation, you can enable the system Python setting, though integrated Python is strongly recommended. If you do use system Python, Python 3.10 or later is required (3.11+ recommended).

If you'd like to test the latest changes and tweaks, try out our [nightly builds](https://github.com/chaiNNer-org/chaiNNer-nightly).

<img src="docs/assets/simple_screenshot.png" width="480" />
</p>

Before you get to this point though, you'll need to install one of the neural network frameworks from the dependency manager. You can access this via the button in the upper-right-hand corner. ChaiNNer offers support for PyTorch (with select model architectures), NCNN, ONNX, and TensorRT. For Nvidia users, PyTorch or TensorRT will be the preferred way to upscale. For AMD users, NCNN will be the preferred way to upscale (or PyTorch with ROCm on Linux).

All the other Python dependencies are automatically installed, and chaiNNer even carries its own integrated Python support so that you do not have to modify your existing Python configuration.

You can right-click in the editor viewport to show an inline nodes list to select from.

- Windows versions 8.1 and below are also not supported.

- Apple Silicon Macs are supported with PyTorch MPS acceleration. ONNX only supports the CPU Execution Provider, and NCNN may not work properly on some configurations.

- Some NCNN users with non-Nvidia GPUs might get all-black outputs. I am not sure what to do to fix this as it appears to be due to the graphics driver crashing as a result of going out of memory. If this happens to you, try manually setting a tiling amount.

- To use the Clipboard nodes, Linux users need to have xclip or, for wayland users, wl-copy installed.
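To make the tiling note above concrete: tiling bounds per-inference VRAM use, since the GPU only ever holds one tile at a time. A toy calculation (illustrative only, not chaiNNer's actual memory estimator):

```python
import math

def tiles_needed(height: int, width: int, tile: int) -> int:
    # Number of tile x tile patches needed to cover a height x width image.
    return math.ceil(height / tile) * math.ceil(width / tile)

# A 4K frame: halving the tile size roughly quarters per-tile memory,
# at the cost of running more (smaller) inferences.
count_512 = tiles_needed(2160, 3840, 512)  # 5 * 8 = 40 tiles
count_256 = tiles_needed(2160, 3840, 256)  # 9 * 15 = 135 tiles
```

Smaller tiles trade throughput for a smaller peak allocation, which is why manually lowering the tiling amount works around driver out-of-memory crashes.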

## GPU Support

**Nvidia GPUs:** Full support via PyTorch (CUDA), ONNX, and TensorRT. TensorRT offers the best performance for supported models.

**AMD GPUs:**
- On Linux, AMD GPUs can use PyTorch via ROCm.
- NCNN is available on all platforms for AMD GPUs.

**Apple Silicon (M1/M2/M3):** PyTorch MPS acceleration is supported.

**Intel GPUs:** NCNN inference is supported for Intel GPUs.

**CPU:** All frameworks support CPU-only mode as a fallback.

For NCNN, make sure to select which GPU you want to use in the settings. It might be defaulting to your integrated graphics!

## Model Architecture Support

As of v0.21.0, chaiNNer uses our new package called Spandrel.
- [u2net](https://github.com/danielgatis/rembg) | [u2net](https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2net.onnx), [u2netp](https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2netp.onnx), [u2net_cloth_seg](https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2net_cloth_seg.onnx), [u2net_human_seg](https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2net_human_seg.onnx), [silueta](https://github.com/danielgatis/rembg/releases/download/v0.0.0/silueta.onnx)
- [isnet](https://github.com/xuebinqin/DIS) | [isnet](https://github.com/danielgatis/rembg/releases/download/v0.0.0/isnet-general-use.onnx)

### TensorRT

TensorRT provides optimized inference for Nvidia GPUs. Models must be converted to TensorRT engine format for use. This offers the best performance on supported hardware.
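For illustration, the conversion step can be sketched with the standard `tensorrt` Python API (TensorRT 8.x style). This is not chaiNNer's actual builder (this PR's `build_engine_from_onnx` helper handles conversion internally); the file paths are placeholders, and the guarded import allows for TensorRT not being installed:

```python
import importlib.util

def tensorrt_available() -> bool:
    # TensorRT ships as the `tensorrt` pip package on supported platforms.
    return importlib.util.find_spec("tensorrt") is not None

def build_engine(onnx_path: str, engine_path: str, fp16: bool = True) -> None:
    # Sketch of ONNX -> TensorRT engine conversion; requires an Nvidia GPU.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    if fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    engine_bytes = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine_bytes)
```

On a machine with TensorRT installed, `build_engine("model.onnx", "model.engine")` would write a serialized engine that inference code can then deserialize and run.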

## Troubleshooting

For troubleshooting information, view the [troubleshooting document](https://github.com/chaiNNer-org/chaiNNer/wiki/06--Troubleshooting).
37 changes: 37 additions & 0 deletions backend/src/nodes/impl/tensorrt/__init__.py
@@ -0,0 +1,37 @@
"""TensorRT implementation utilities."""

from .auto_split import tensorrt_auto_split
from .engine_builder import BuildConfig, build_engine_from_onnx
from .inference import (
TensorRTSession,
clear_session_cache,
get_tensorrt_session,
run_inference,
)
from .memory import (
CudaBuffer,
CudaMemoryManager,
check_cuda_available,
cuda_memory_context,
get_cuda_compute_capability,
get_cuda_device_name,
)
from .model import TensorRTEngine, TensorRTEngineInfo

__all__ = [
"BuildConfig",
"CudaBuffer",
"CudaMemoryManager",
"TensorRTEngine",
"TensorRTEngineInfo",
"TensorRTSession",
"build_engine_from_onnx",
"check_cuda_available",
"clear_session_cache",
"cuda_memory_context",
"get_cuda_compute_capability",
"get_cuda_device_name",
"get_tensorrt_session",
"run_inference",
"tensorrt_auto_split",
]
118 changes: 118 additions & 0 deletions backend/src/nodes/impl/tensorrt/auto_split.py
@@ -0,0 +1,118 @@
"""Auto-tiling support for TensorRT inference."""

from __future__ import annotations

import gc

import numpy as np

from ..upscale.auto_split import Tiler, auto_split
from .inference import get_tensorrt_session
from .model import TensorRTEngine


def _into_batched_form(img: np.ndarray) -> np.ndarray:
"""Convert image to NCHW batched format."""
shape_size = len(img.shape)
if shape_size == 3:
# (H, W, C) -> (1, C, H, W)
return img.transpose((2, 0, 1))[np.newaxis, :]
elif shape_size == 2:
# (H, W) -> (1, 1, H, W)
return img[np.newaxis, np.newaxis, :, :]
else:
raise ValueError("Unsupported input tensor shape")


def _into_standard_image_form(img: np.ndarray) -> np.ndarray:
"""Convert NCHW output back to HWC format."""
shape_size = len(img.shape)
if shape_size == 4:
# (1, C, H, W) -> (H, W, C)
return img.squeeze(0).transpose(1, 2, 0)
elif shape_size == 3:
# (C, H, W) -> (H, W, C)
return img.transpose(1, 2, 0)
elif shape_size == 2:
# (H, W)
return img
else:
raise ValueError("Unsupported output tensor shape")


def _flip_r_b_channels(img: np.ndarray) -> np.ndarray:
"""Flip R and B channels (RGB <-> BGR conversion)."""
shape_size = len(img.shape)
if shape_size != 3:
return img
if img.shape[2] == 3:
# (H, W, C) RGB -> BGR - use ascontiguousarray to avoid stride issues
return np.ascontiguousarray(np.flip(img, 2))
elif img.shape[2] == 4:
# (H, W, C) RGBA -> BGRA
return np.dstack((img[:, :, 2], img[:, :, 1], img[:, :, 0], img[:, :, 3]))
return img


def tensorrt_auto_split(
img: np.ndarray,
engine: TensorRTEngine,
tiler: Tiler,
gpu_index: int = 0,
) -> np.ndarray:
"""
Run TensorRT inference with automatic tiling for large images.

Args:
img: Input image in HWC format (float32, 0-1 range)
engine: TensorRT engine
tiler: Tiler configuration for splitting
gpu_index: GPU device index

Returns:
Upscaled image in HWC format
"""
session = get_tensorrt_session(engine, gpu_index)
is_fp16 = engine.precision == "fp16"

def upscale(img: np.ndarray, _: object):
try:
# Convert to appropriate precision
lr_img = img.astype(np.float16) if is_fp16 else img.astype(np.float32)

# Convert RGB to BGR (most models expect BGR)
lr_img = _flip_r_b_channels(lr_img)

# Convert to NCHW batched format
lr_img = _into_batched_form(lr_img)

# Run inference
output = session.infer(lr_img)

# Convert back to HWC format
output = _into_standard_image_form(output)

# Convert BGR back to RGB
output = _flip_r_b_channels(output)

return output.astype(np.float32)

except Exception as e:
error_str = str(e).lower()
# Check for CUDA OOM errors
if (
"out of memory" in error_str
or ("cuda" in error_str and "memory" in error_str)
or "allocation" in error_str
):
raise RuntimeError( # noqa: B904
"A VRAM out-of-memory error has occurred. Please try using a smaller tile size."
)
else:
# Re-raise the exception if not an OOM error
raise

try:
return auto_split(img, upscale, tiler)
finally:
gc.collect()
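The layout helpers above are pure array shuffling and can be sanity-checked without a GPU; a numpy-only round-trip mirroring `_into_batched_form`, `_into_standard_image_form`, and `_flip_r_b_channels`:

```python
import numpy as np

# HWC float32 image in the 0-1 range, as tensorrt_auto_split expects.
img = np.random.rand(8, 6, 3).astype(np.float32)

# (H, W, C) -> (1, C, H, W), as in _into_batched_form.
batched = img.transpose(2, 0, 1)[np.newaxis, :]

# (1, C, H, W) -> (H, W, C), as in _into_standard_image_form.
restored = batched.squeeze(0).transpose(1, 2, 0)

# _flip_r_b_channels applied twice is the identity (RGB -> BGR -> RGB).
flipped = np.ascontiguousarray(np.flip(img, 2))
unflipped = np.flip(flipped, 2)

assert batched.shape == (1, 3, 8, 6)
assert np.array_equal(restored, img)
assert np.array_equal(unflipped, img)
```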