This section covers the extension implementation pattern: how to set up a user extension implementation.
- Create a subclass of Extension and override the desired methods.
- Register the subclass through entry points in `setup.py`: `entry_points={'dlrover.unified.extension': ['my_extension = my_module:MyExtension']}`.
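
Once a package that declares such an entry point is installed, the registration can be inspected with the standard library. The following is a minimal sketch, not DLRover's own loading code: the helper name `list_extension_entry_points` is hypothetical, and only the group name `dlrover.unified.extension` comes from the steps above.

```python
import sys
from importlib import metadata
from typing import List

# Entry point group named in the registration step above.
GROUP = "dlrover.unified.extension"


def list_extension_entry_points(group: str = GROUP) -> List[str]:
    """Return 'name = module:attr' strings for every entry point in the group.

    Hypothetical helper for checking a registration; it does not load
    or instantiate the extension classes.
    """
    if sys.version_info >= (3, 10):
        eps = metadata.entry_points(group=group)
    else:
        # Before Python 3.10, entry_points() returns a dict keyed by group.
        eps = metadata.entry_points().get(group, [])
    return [f"{ep.name} = {ep.value}" for ep in eps]


if __name__ == "__main__":
    for line in list_extension_entry_points():
        print(line)
```

An empty result usually means the package was not installed (or was not reinstalled) after `setup.py` was edited, since entry points are read from installed package metadata.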

```python
# step1: create a user extension
# test/a.py
from dlrover.python.unified.controller.extension import ManagerExtension


class XXXExtension(ManagerExtension):
    def xxx(self):
        return "xxx"


# step2: set into entrypoints (setup.py)
...
entry_points={
    'dlrover.unified.extension': [
        'xxx_extension = test.a:XXXExtension',
    ],
}
...
```

The currently supported extension points are shown in the table below:

| Extension | Method (extension point) | Description | Default implementation |
|---|---|---|---|
| ManagerExtension | relaunch_nodes_impl | Logic for replacing Ray nodes. It is an extension point because Ray can be deployed on Kubernetes or on physical machines, and can run in either job mode or cluster mode. | No Ray nodes are replaced. Node replacement requests raised during fault tolerance are ignored. |
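
The override pattern for `relaunch_nodes_impl` can be sketched as below. This is an illustration under stated assumptions: the real signature of `relaunch_nodes_impl` is not given in this section, so the stand-in `ManagerExtension` class, the `nodes` parameter (a list of node names), and the returned list are all hypothetical. In real code the base class would be imported from `dlrover.python.unified.controller.extension` and its actual signature followed.

```python
from typing import List


class ManagerExtension:
    """Stand-in for DLRover's ManagerExtension (assumed shape, not the real class)."""

    def relaunch_nodes_impl(self, nodes: List[str]) -> List[str]:
        # Mirrors the documented default: replace nothing, ignore the request.
        return []


class RecordingRelaunchExtension(ManagerExtension):
    """Hypothetical extension that records which nodes it was asked to replace."""

    def __init__(self) -> None:
        self.requested: List[str] = []

    def relaunch_nodes_impl(self, nodes: List[str]) -> List[str]:
        # A real implementation would call the platform API here
        # (for example Kubernetes or a cluster manager) to replace each node.
        self.requested.extend(nodes)
        return [f"{name}-replacement" for name in nodes]


if __name__ == "__main__":
    ext = RecordingRelaunchExtension()
    print(ext.relaunch_nodes_impl(["node-a", "node-b"]))
```

Keeping the platform-specific call inside the override is what lets the same manager logic run unchanged across Kubernetes and physical-machine deployments.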