support different types of computing hardware #5138
## Description

### Motivation
Currently, OpenPAI supports the most widely used computing devices: Nvidia GPU, AMD GPU, and CPU. In addition, it has the potential to support other types of devices, e.g. AI computing chips (NPUs).
### Goal
Decouple OpenPAI services from specific hardware types. One OpenPAI service container can support a list of hardware types.
### Requirements
For every type of computing device, the vendor should guarantee that:
- each machine has only one type of computing device
- the driver and the k8s device plugin are successfully deployed on each machine
- devices work correctly with docker and k8s
- compatible frameworks and docker images are available
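The first requirement can be enforced with a simple check against a node's allocatable resources. The sketch below is hypothetical (not actual OpenPAI code), and the list of known device resource names is an assumption for illustration:

```javascript
// Hypothetical validation sketch: check that a node advertises at most one
// known compute-device resource type. The resource-name list is an
// assumption for illustration, not an exhaustive registry.
const knownDeviceResources = ['nvidia.com/gpu', 'amd.com/gpu'];

function validateSingleDeviceType(nodeAllocatable) {
  const found = knownDeviceResources.filter((r) => r in nodeAllocatable);
  if (found.length > 1) {
    throw new Error(`Node exposes multiple device types: ${found.join(', ')}`);
  }
  return found[0] || null; // null means a CPU-only node
}
```

Such a check could run in the quick-start pre-checks, since the node's allocatable resources already reflect whether the device plugin was deployed successfully.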
### MVP with default scheduler

Assuming there is only one type of computing device in the cluster, we can build a minimum viable solution with the default scheduler by:
- configure `ComputeDevice` (default is `nvidia.com/gpu`) in deployment and record it in configmap
- add an option to turn off the HiveD scheduler in quick start
- bypass (or adjust) pre-checks according to `ComputeDevice` in quick start
- change `nvidia.com/gpu` to `ComputeDevice` in rest server
- change vc resource information when using the default scheduler
`pai/src/rest-server/src/models/v2/job/k8s.js`, lines 483 to 487 in 2fb370a:

```javascript
memory: `${config.taskRoles[taskRole].resourcePerInstance.memoryMB}Mi`,
'github.com/fuse': 1,
'nvidia.com/gpu':
  config.taskRoles[taskRole].resourcePerInstance.gpu,
...(infinibandDevice && { 'rdma/hca': 1 }),
```
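One way to decouple this snippet from Nvidia is to key the device count by the configured `ComputeDevice` resource name instead of the hard-coded `'nvidia.com/gpu'`. The sketch below is hypothetical (function and parameter names are illustrative, not the actual patch):

```javascript
// Hypothetical sketch: parameterize the hard-coded 'nvidia.com/gpu' key with
// the configured ComputeDevice resource name. Names are assumptions.
function buildResourceLimits(resourcePerInstance, computeDevice, infinibandDevice) {
  return {
    cpu: resourcePerInstance.cpu,
    memory: `${resourcePerInstance.memoryMB}Mi`,
    'github.com/fuse': 1,
    // device count keyed by the configured resource name, e.g. 'amd.com/gpu'
    [computeDevice]: resourcePerInstance.gpu,
    ...(infinibandDevice && { 'rdma/hca': 1 }),
  };
}
```

With this shape, supporting a new device type in the rest server reduces to changing the `ComputeDevice` value recorded in the configmap.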
Besides the necessary work, we (the pai-dev team and device vendors) could provide better support by:
- refactoring and organizing device-related code in `devices` subfolders. The basic idea is to quickly locate device-related code and to isolate code for different devices (e.g. different device vendors should avoid editing the same file). If a component must support diverse types of computing devices, it will contain a `devices` folder. PAI services should take these files into consideration at build time, so that one container can support a list of different machine models. Other components, like the deploy script, should check these files at runtime.
- providing a monitoring tool like `nvidia-smi` and a prometheus exporter
- updating webportal terms
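The `devices` subfolder idea can be sketched as a small registry that maps the configured `ComputeDevice` to a vendor-specific module. Everything below (mapping, paths, function name) is an assumption for illustration:

```javascript
// Hypothetical sketch of the `devices` subfolder layout: resolve which
// device-specific module a service should load from the configured
// ComputeDevice. The mapping and paths are assumptions for illustration.
const deviceModuleByResource = {
  'nvidia.com/gpu': 'devices/nvidia',
  'amd.com/gpu': 'devices/amd',
};

function resolveDeviceModule(computeDevice) {
  const modulePath = deviceModuleByResource[computeDevice];
  if (!modulePath) {
    throw new Error(`No device module registered for ${computeDevice}`);
  }
  return modulePath; // a real service would require() this path
}
```

Because each vendor owns only its own entry and folder, different device vendors never need to edit the same file.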
### Perfect support with HiveD

By enabling HiveD, we could get better support:
- allow multiple device types in a cluster
- support virtual clusters
- topology-aware scheduling to guarantee sharing safety in DL scenarios
Some extra effort is required to achieve this:
- offer a container runtime for every device type. A container runtime is a modified version of runc that adds a custom pre-start hook to all containers. Two examples are nvidia-container-runtime and the runtime for AMD Radeon Open Compute.
- describe machines and devices in `layout.yaml` (replace master.csv / worker.csv by layout.yaml #5151)
- make sure HiveD config generation is independent of computing devices
- add appropriate environment variables in rest-server when generating the pod spec, in addition to `NVIDIA_VISIBLE_DEVICES` and `PAI_AMD_VISIBLE_DEVICES`.
`pai/src/rest-server/src/models/v2/job/k8s.js`, lines 656 to 676 in 2fb370a:

```javascript
if (config.taskRoles[taskRole].resourcePerInstance.gpu > 0) {
  frameworkTaskRole.task.pod.spec.containers[0].env.push(
    {
      name: 'NVIDIA_VISIBLE_DEVICES',
      valueFrom: {
        fieldRef: {
          fieldPath: `metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']`,
        },
      },
    },
    {
      name: 'PAI_AMD_VISIBLE_DEVICES',
      valueFrom: {
        fieldRef: {
          fieldPath: `metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']`,
        },
      },
    },
  );
}
```
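Instead of always pushing both variables, the isolation env-variable name could be looked up per device type. The sketch below is a hypothetical generalization, not the actual fix; the mapping is an assumption:

```javascript
// Hypothetical generalization: derive the isolation env-variable name per
// device type instead of always pushing both NVIDIA_VISIBLE_DEVICES and
// PAI_AMD_VISIBLE_DEVICES. The mapping is an assumption for illustration.
const isolationEnvNameByDevice = {
  'nvidia.com/gpu': 'NVIDIA_VISIBLE_DEVICES',
  'amd.com/gpu': 'PAI_AMD_VISIBLE_DEVICES',
};

function isolationEnv(computeDevice) {
  const name = isolationEnvNameByDevice[computeDevice];
  if (!name) return []; // device type without an isolation variable
  return [
    {
      name,
      valueFrom: {
        fieldRef: {
          // HiveD writes the assigned leaf cells into this pod annotation
          fieldPath: `metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']`,
        },
      },
    },
  ];
}
```

A new vendor would then only register its variable name in the mapping (ideally from its `devices` folder) rather than editing the pod-spec generation code.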
Some optional work items include:
- clarify and unify the machine sku description in `layout.yaml` and HiveD skus
- make the `sku`-to-(cpu, gpu, mem) conversion simple, predictable, and decoupled from devices (CPU/GPU/Memory information to SKU definition API #5148)
- health report for computing devices. This is not mandatory since a node-level health check is already provided by k8s.
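A device-agnostic version of the sku-to-(cpu, gpu, mem) conversion could simply split a machine SKU evenly across its compute devices, whatever their type. The sketch below uses assumed field names, not the actual #5148 API:

```javascript
// Hypothetical sketch: split a machine SKU evenly across its compute
// devices, independent of the device type. Field names are assumptions,
// not the actual #5148 API.
function skuToPerDeviceShare(sku) {
  const n = sku.computeDeviceCount;
  return {
    cpu: Math.floor(sku.cpuCores / n),
    memoryMB: Math.floor(sku.memoryMB / n),
    device: 1, // one compute device of the SKU's type
  };
}
```

Because the function never inspects the device type, the same conversion works for Nvidia GPUs, AMD GPUs, or future NPUs.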