The NVIDIA Fabric Manager Service VM appliance is a specialized OpenNebula tool designed to implement the NVIDIA NVSwitch Virtualization Model. This model is essential for virtualizing systems with multiple GPUs interconnected by NVSwitches (such as HGX or DGX platforms), allowing for the creation of hardware partitions for diverse workloads.
This appliance acts as the necessary Service VM on each compute node, taking control of the NVSwitch devices via PCI Passthrough and running the NVIDIA management software to partition the high-speed fabric interconnect.
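Partitioning the fabric means grouping GPUs so that each workload gets an isolated set connected through the NVSwitches. The Python sketch below models one constraint of this kind of partitioning: a partition can be activated only if it shares no GPU with a partition that is already active. The `PARTITIONS` layout for an 8-GPU baseboard is hypothetical and illustrative; on a real system the supported partitions are reported by Fabric Manager. `can_activate` and `pick_partition` are hypothetical helpers. They are not part of nv-partitioner or the Fabric Manager SDK.

```python
"""Sketch: the GPU-overlap rule for NVSwitch fabric partitions.

PARTITIONS is a hypothetical layout for an 8-GPU baseboard; on a real
system the supported partitions are reported by Fabric Manager.
"""
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical table: partition ID -> GPU module IDs it contains.
PARTITIONS: Dict[int, FrozenSet[int]] = {0: frozenset(range(8))}  # 1 x 8 GPUs
PARTITIONS.update({1 + i: frozenset(range(4 * i, 4 * i + 4)) for i in range(2)})  # 2 x 4 GPUs
PARTITIONS.update({3 + i: frozenset({2 * i, 2 * i + 1}) for i in range(4)})  # 4 x 2 GPUs
PARTITIONS.update({7 + i: frozenset({i}) for i in range(8)})  # 8 x 1 GPU


def can_activate(candidate: int, active: Iterable[int]) -> bool:
    """True if `candidate` is not active and shares no GPU with an active partition."""
    active_ids = set(active)
    if candidate in active_ids:
        return False
    wanted = PARTITIONS[candidate]
    return all(wanted.isdisjoint(PARTITIONS[pid]) for pid in active_ids)


def pick_partition(num_gpus: int, active: Iterable[int]) -> Optional[int]:
    """Return the lowest-numbered activatable partition with exactly `num_gpus` GPUs."""
    active_ids = set(active)
    for pid in sorted(PARTITIONS):
        if len(PARTITIONS[pid]) == num_gpus and can_activate(pid, active_ids):
            return pid
    return None  # no free partition of that size
```

For example, with partition 1 (GPUs 0 to 3) active, `pick_partition(2, [1])` returns 5 (GPUs 4 and 5). `can_activate(3, [1])` is `False` because partition 3 reuses GPUs 0 and 1.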
The appliance is pre-configured with all components required to deploy the NVSwitch virtualization model:
| Component | Description |
|---|---|
| NVIDIA Drivers | Proprietary drivers for hardware detection and management. |
| Fabric Manager Service | The core NVIDIA service for managing the NVSwitch fabric. |
| Fabric Manager SDK & Dev | Libraries for custom tool development. |
| nv-partitioner | A custom C++ tool built on the Fabric Manager SDK for logical NVSwitch partitioning. |
The appliance is available in the OpenNebula Marketplace.

Deploying the appliance has the following requirements:
| Requirement | Description |
|---|---|
| Physical Host | Server with NVIDIA GPUs and NVSwitches (e.g., NVIDIA HGX). |
| VM Resources | 2 vCPUs, 4 GB RAM. |
| PCI Assignment | CRITICAL: All NVSwitch devices in the server must be assigned to the VM using PCI passthrough. |
| Host Driver | The NVSwitches on the host must be bound to the vfio-pci driver before instantiation. |
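Before instantiating the VM, it can help to check on the host which driver each NVSwitch is bound to and to collect the PCI addresses to assign. The Python sketch below reads Linux sysfs for this. It assumes that NVSwitch devices report NVIDIA's vendor ID `0x10de` and the PCI class "bridge, other" (`0x0680`). It also assumes an OpenNebula release whose `PCI` template attribute accepts `SHORT_ADDRESS`. Both assumptions should be verified against your hardware and OpenNebula version. `nvswitch_drivers` and `pci_template` are hypothetical helpers and are not shipped with the appliance.

```python
"""Sketch: find NVSwitch PCI devices, report their bound driver, and
render OpenNebula PCI passthrough attributes for them.

Assumptions: NVSwitches use vendor 0x10de and class prefix 0x0680, and
the OpenNebula PCI attribute accepts SHORT_ADDRESS (bus:slot.func).
"""
import os
from typing import Dict, List, Optional

NVIDIA_VENDOR = "0x10de"
BRIDGE_OTHER_CLASS = "0x0680"  # class prefix; sysfs shows e.g. 0x068000


def _read(path: str) -> str:
    with open(path) as f:
        return f.read().strip()


def nvswitch_drivers(sysfs_root: str = "/sys/bus/pci/devices") -> Dict[str, Optional[str]]:
    """Map each NVSwitch PCI address to its bound driver name (None if unbound)."""
    result: Dict[str, Optional[str]] = {}
    for addr in sorted(os.listdir(sysfs_root)):
        dev = os.path.join(sysfs_root, addr)
        if _read(os.path.join(dev, "vendor")) != NVIDIA_VENDOR:
            continue
        if not _read(os.path.join(dev, "class")).startswith(BRIDGE_OTHER_CLASS):
            continue
        driver_link = os.path.join(dev, "driver")
        # The "driver" symlink exists only while a driver is bound to the device.
        result[addr] = (os.path.basename(os.readlink(driver_link))
                        if os.path.islink(driver_link) else None)
    return result


def pci_template(addresses: List[str]) -> str:
    """Render one OpenNebula PCI attribute per device address."""
    lines = []
    for addr in addresses:
        short = addr.split(":", 1)[1]  # drop the PCI domain, e.g. "0000:"
        lines.append(f'PCI = [ SHORT_ADDRESS = "{short}" ]')
    return "\n".join(lines)
```

Run on the host, every value returned by `nvswitch_drivers()` should be `vfio-pci` before the VM is instantiated. Any address mapped to another driver or to `None` still needs to be rebound.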
The appliance is based on a stable Linux distribution and ships the following component versions:
| Component | Version |
|---|---|
| Base OS | Ubuntu 22.04 LTS (x86-64) |
| NVIDIA Driver | 570 |
| Fabric Manager | 570 |
| nv-partitioner | 1.0.0 (Custom Partitioning Tool) |