-
Job config file
Prepare a job config file as described in examples/README.md, for example, exampleJob.json.
-
Authentication
HTTP POST your username and password to get an access token from:
    http://restserver/api/v1/token
For example, with curl, you can run the following command:
    curl -H "Content-Type: application/x-www-form-urlencoded" \
         -X POST http://restserver/api/v1/token \
         -d "username=YOUR_USERNAME" -d "password=YOUR_PASSWORD"
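The same token request can be sketched in Python with only the standard library. This is a minimal illustration rather than an official client: `REST_SERVER` reuses the placeholder host from this document, and `build_token_request` is a hypothetical helper name. The function only builds the request; sending it is left to the caller.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: "restserver" is the placeholder host used throughout this document.
REST_SERVER = "http://restserver"


def build_token_request(username, password):
    """Build (but do not send) the form-encoded POST for /api/v1/token."""
    body = urlencode({"username": username, "password": password}).encode("utf-8")
    return Request(
        url=REST_SERVER + "/api/v1/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; on success the response body is expected to be JSON containing the `token` field.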
-
Submit a job
HTTP POST the config file as JSON, with the access token in the header, to:
    http://restserver/api/v1/user/:username/jobs
For example, you can run the following command:
    curl -H "Content-Type: application/json" \
         -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
         -X POST http://restserver/api/v1/user/:username/jobs \
         -d @exampleJob.json
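As a rough Python counterpart to the curl command, the sketch below builds the job submission request from an already parsed job config. `REST_SERVER` is the placeholder host from this document and `build_submit_request` is a hypothetical helper; the request is constructed but not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumption: "restserver" is the placeholder host used throughout this document.
REST_SERVER = "http://restserver"


def build_submit_request(username, token, job_config):
    """Build (but do not send) the POST that submits a job config as JSON."""
    return Request(
        url=REST_SERVER + "/api/v1/user/" + quote(username, safe="") + "/jobs",
        data=json.dumps(job_config).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )
```

The job config would typically be loaded with `json.load` from a file such as exampleJob.json before being passed in.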
-
Monitor the job
Check the list of jobs at:
    http://restserver/api/v1/jobs
or
    http://restserver/api/v1/user/:username/jobs
Check your exampleJob status at:
    http://restserver/api/v1/user/:username/jobs/exampleJob
Get the job config JSON content:
    http://restserver/api/v1/user/:username/jobs/exampleJob/config
Get the job's SSH info:
    http://restserver/api/v1/user/:username/jobs/exampleJob/ssh
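The monitoring URLs above all follow one pattern, which the following sketch makes explicit. `job_urls` is a hypothetical helper and `REST_SERVER` is the placeholder host from this document; user and job names are percent-encoded in case they contain characters that are not URL-safe.

```python
from urllib.parse import quote

# Assumption: "restserver" is the placeholder host used throughout this document.
REST_SERVER = "http://restserver"


def job_urls(username, job_name):
    """Return the monitoring URLs for one job of one user."""
    user_jobs = REST_SERVER + "/api/v1/user/" + quote(username, safe="") + "/jobs"
    job = user_jobs + "/" + quote(job_name, safe="")
    return {
        "all_jobs": REST_SERVER + "/api/v1/jobs",
        "user_jobs": user_jobs,
        "status": job,
        "config": job + "/config",
        "ssh": job + "/ssh",
    }
```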
Configure the rest server port in services-configuration.yaml.
-
POST token
Authenticate and get an access token from the system.
Request
    POST /api/v1/token
Parameters
    {
      "username": "your username",
      "password": "your password",
      "expiration": "expiration time in seconds"
    }
Response if succeeded
    Status: 200
    {
      "token": "your access token",
      "user": "username",
      "admin": true if user is admin
    }
Response if user does not exist
    Status: 400
    { "code": "NoUserError", "message": "User $username is not found." }
Response if password is incorrect
    Status: 400
    { "code": "IncorrectPasswordError", "message": "Password is incorrect." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
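One way a client might handle the token responses listed above is to turn the success body into a plain dictionary and every error body into an exception that carries the documented `code` and `message`. The names `TokenError` and `parse_token_response` are hypothetical, and the sketch assumes the caller has already obtained the HTTP status and the raw JSON body.

```python
import json


class TokenError(Exception):
    """Error response from POST /api/v1/token (hypothetical client-side type)."""

    def __init__(self, status, code, message):
        super().__init__("%s (%d): %s" % (code, status, message))
        self.status = status
        self.code = code
        self.message = message


def parse_token_response(status, body):
    """Return the token info on status 200, otherwise raise TokenError."""
    payload = json.loads(body)
    if status == 200:
        return {
            "token": payload["token"],
            "user": payload["user"],
            "admin": bool(payload.get("admin", False)),
        }
    raise TokenError(
        status,
        payload.get("code", "UnknownError"),
        payload.get("message", ""),
    )
```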
PUT user
Update a user in the system. Administrators can add a user or change another user's password; a user can change their own password.
Request
    PUT /api/v1/user
    Authorization: Bearer <ACCESS_TOKEN>
Parameters
    {
      "username": "username in [_A-Za-z0-9]+ format",
      "password": "password at least 6 characters",
      "admin": true | false,
      "modify": true | false
    }
Response if succeeded
    Status: 201
    { "message": "update successfully" }
Response if not authorized
    Status: 401
    { "code": "UnauthorizedUserError", "message": "Guest is not allowed to do this operation." }
Response if current user has no permission
    Status: 403
    { "code": "ForbiddenUserError", "message": "Non-admin is not allow to do this operation." }
Response if updated user does not exist
    Status: 404
    { "code": "NoUserError", "message": "User $username is not found." }
Response if created user has a duplicate name
    Status: 409
    { "code": "ConflictUserError", "message": "User name $username already exists." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
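The parameter constraints for PUT user (username format, minimum password length) can be checked on the client before the request is sent. The sketch below is one possible pre-flight check, not the server's own validation; `build_user_payload` is a hypothetical helper. It also assumes, based on the 404 and 409 responses, that `modify: true` updates an existing user and `modify: false` creates a new one.

```python
import re

# Constraints taken from the parameter descriptions of PUT /api/v1/user.
USERNAME_PATTERN = re.compile(r"[_A-Za-z0-9]+")
MIN_PASSWORD_LENGTH = 6


def build_user_payload(username, password, admin=False, modify=False):
    """Validate the inputs locally and return the JSON body as a dict."""
    if not USERNAME_PATTERN.fullmatch(username):
        raise ValueError("username must match [_A-Za-z0-9]+")
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError("password must have at least 6 characters")
    return {
        "username": username,
        "password": password,
        "admin": bool(admin),
        # Assumption: True updates an existing user, False creates a new one.
        "modify": bool(modify),
    }
```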
DELETE user (administrator only)
Remove a user from the system.
Request
    DELETE /api/v1/user
    Authorization: Bearer <ACCESS_TOKEN>
Parameters
    { "username": "username to be removed" }
Response if succeeded
    Status: 200
    { "message": "remove successfully" }
Response if not authorized
    Status: 401
    { "code": "UnauthorizedUserError", "message": "Guest is not allowed to do this operation." }
Response if user has no permission
    Status: 403
    { "code": "ForbiddenUserError", "message": "Non-admin is not allow to do this operation." }
Response if the user to be removed is an admin
    Status: 403
    { "code": "RemoveAdminError", "message": "Admin $username is not allowed to remove." }
Response if the user to be removed does not exist
    Status: 404
    { "code": "NoUserError", "message": "User $username is not found." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
PUT user/:username/virtualClusters (administrator only)
Administrators can update a user's virtual clusters. Administrators can access all virtual clusters; all users can access the default virtual cluster.
Request
    PUT /api/v1/user/:username/virtualClusters
    Authorization: Bearer <ACCESS_TOKEN>
Parameters
    { "virtualClusters": "virtual cluster list separated by commas (e.g. vc1,vc2)" }
Response if succeeded
    Status: 201
    { "message": "update user virtual clusters successfully" }
Response if the virtual cluster does not exist
    Status: 400
    { "code": "NoVirtualClusterError", "message": "Virtual cluster $vcname is not found." }
Response if not authorized
    Status: 401
    { "code": "UnauthorizedUserError", "message": "Guest is not allowed to do this operation." }
Response if user has no permission
    Status: 403
    { "code": "ForbiddenUserError", "message": "Non-admin is not allow to do this operation." }
Response if user does not exist
    Status: 404
    { "code": "NoUserError", "message": "User $username is not found." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
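Because `virtualClusters` is a single comma-separated string rather than a JSON array, a client that keeps virtual cluster names in a list has to join them first. The following sketch shows one way to do that; `build_virtual_clusters_payload` is a hypothetical helper, and rejecting names that contain a comma is a client-side precaution, not a documented server rule.

```python
def build_virtual_clusters_payload(names):
    """Join virtual cluster names into the comma-separated form the API expects."""
    cleaned = [name.strip() for name in names if name.strip()]
    for name in cleaned:
        if "," in name:
            # A comma inside a name would be read as a separator.
            raise ValueError("virtual cluster name must not contain a comma: %r" % name)
    return {"virtualClusters": ",".join(cleaned)}
```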
GET jobs
Get the list of jobs.
Request
    GET /api/v1/jobs
Parameters
    { "username": "filter jobs with username" }
Response if succeeded
    Status: 200
    { [ ... ] }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
GET user/:username/jobs
Get the list of jobs of a user.
Request
    GET /api/v1/user/:username/jobs
Response if succeeded
    Status: 200
    { [ ... ] }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
GET jobs/:jobName
Get legacy job status in the system.
Request
    GET /api/v1/jobs/:jobName
Response if succeeded
    Status: 200
    {
      name: "jobName",
      jobStatus: {
        username: "username",
        virtualCluster: "virtualCluster",
        state: "jobState",
        // raw frameworkState from frameworklauncher
        subState: "frameworkState",
        createdTime: "createdTimestamp",
        completedTime: "completedTimestamp",
        executionType: "executionType",
        // Sum of succeededRetriedCount, transientNormalRetriedCount,
        // transientConflictRetriedCount, nonTransientRetriedCount,
        // and unKnownRetriedCount
        retries: retriedCount,
        appId: "applicationId",
        appProgress: "applicationProgress",
        appTrackingUrl: "applicationTrackingUrl",
        appLaunchedTime: "applicationLaunchedTimestamp",
        appCompletedTime: "applicationCompletedTimestamp",
        appExitCode: applicationExitCode,
        appExitDiagnostics: "applicationExitDiagnostics",
        appExitType: "applicationExitType"
      },
      taskRoles: {
        // Name-details map
        "taskRoleName": {
          taskRoleStatus: {
            name: "taskRoleName"
          },
          taskStatuses: {
            taskIndex: taskIndex,
            containerId: "containerId",
            containerIp: "containerIp",
            containerPorts: {
              // Protocol-port map
              "protocol": "portNumber"
            },
            containerGpus: containerGpus,
            containerLog: containerLogHttpAddress
          }
        },
        ...
      }
    }
Response if the job does not exist
    Status: 404
    { "code": "NoJobError", "message": "Job $jobname is not found." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
GET user/:username/jobs/:jobName
Get job status in the system.
Request
    GET /api/v1/user/:username/jobs/:jobName
Response if succeeded
    Status: 200
    {
      name: "jobName",
      jobStatus: {
        username: "username",
        virtualCluster: "virtualCluster",
        state: "jobState",
        // raw frameworkState from frameworklauncher
        subState: "frameworkState",
        createdTime: "createdTimestamp",
        completedTime: "completedTimestamp",
        executionType: "executionType",
        // Sum of succeededRetriedCount, transientNormalRetriedCount,
        // transientConflictRetriedCount, nonTransientRetriedCount,
        // and unKnownRetriedCount
        retries: retriedCount,
        appId: "applicationId",
        appProgress: "applicationProgress",
        appTrackingUrl: "applicationTrackingUrl",
        appLaunchedTime: "applicationLaunchedTimestamp",
        appCompletedTime: "applicationCompletedTimestamp",
        appExitCode: applicationExitCode,
        appExitDiagnostics: "applicationExitDiagnostics",
        appExitType: "applicationExitType"
      },
      taskRoles: {
        // Name-details map
        "taskRoleName": {
          taskRoleStatus: {
            name: "taskRoleName"
          },
          taskStatuses: {
            taskIndex: taskIndex,
            containerId: "containerId",
            containerIp: "containerIp",
            containerPorts: {
              // Protocol-port map
              "protocol": "portNumber"
            },
            containerGpus: containerGpus,
            containerLog: containerLogHttpAddress
          }
        },
        ...
      }
    }
Response if the job does not exist
    Status: 404
    { "code": "NoJobError", "message": "Job $jobname is not found." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
POST user/:username/jobs
Submit a job to the system.
Request
    POST /api/v1/user/:username/jobs
    Authorization: Bearer <ACCESS_TOKEN>
Parameters
    The job config JSON, as described in examples/README.md.
Response if succeeded
    Status: 202
    { "message": "update job $jobName successfully" }
Response if the virtual cluster does not exist
    Status: 400
    { "code": "NoVirtualClusterError", "message": "Virtual cluster $vcname is not found." }
Response if user has no permission
    Status: 403
    { "code": "ForbiddenUserError", "message": "User $username is not allowed to add job to $vcname" }
Response if there is a duplicated job submission
    Status: 409
    { "code": "ConflictJobError", "message": "Job name $jobname already exists." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
GET jobs/:jobName/config
Get legacy job config JSON content.
Request
    GET /api/v1/jobs/:jobName/config
Response if succeeded
    Status: 200
    {
      "jobName": "test",
      "image": "pai.run.tensorflow",
      ...
    }
Response if the job does not exist
    Status: 404
    { "code": "NoJobError", "message": "Job $jobname is not found." }
Response if the job config does not exist
    Status: 404
    { "code": "NoJobConfigError", "message": "Config of job $jobname is not found." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
GET user/:username/jobs/:jobName/config
Get job config JSON content.
Request
    GET /api/v1/user/:username/jobs/:jobName/config
Response if succeeded
    Status: 200
    {
      "jobName": "test",
      "image": "pai.run.tensorflow",
      ...
    }
Response if the job does not exist
    Status: 404
    { "code": "NoJobError", "message": "Job $jobname is not found." }
Response if the job config does not exist
    Status: 404
    { "code": "NoJobConfigError", "message": "Config of job $jobname is not found." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
GET jobs/:jobName/ssh
Get legacy job SSH info.
Request
    GET /api/v1/jobs/:jobName/ssh
Response if succeeded
    Status: 200
    {
      "containers": [
        {
          "id": "<container id>",
          "sshIp": "<ip to access the container's ssh service>",
          "sshPort": "<port to access the container's ssh service>"
        },
        ...
      ],
      "keyPair": {
        "folderPath": "HDFS path to the job's ssh folder",
        "publicKeyFileName": "file name of the public key file",
        "privateKeyFileName": "file name of the private key file",
        "privateKeyDirectDownloadLink": "HTTP URL to download the private key file"
      }
    }
Response if the job does not exist
    Status: 404
    { "code": "NoJobError", "message": "Job $jobname is not found." }
Response if the job SSH info does not exist
    Status: 404
    { "code": "NoJobSshInfoError", "message": "SSH info of job $jobname is not found." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
GET user/:username/jobs/:jobName/ssh
Get job SSH info.
Request
    GET /api/v1/user/:username/jobs/:jobName/ssh
Response if succeeded
    Status: 200
    {
      "containers": [
        {
          "id": "<container id>",
          "sshIp": "<ip to access the container's ssh service>",
          "sshPort": "<port to access the container's ssh service>"
        },
        ...
      ],
      "keyPair": {
        "folderPath": "HDFS path to the job's ssh folder",
        "publicKeyFileName": "file name of the public key file",
        "privateKeyFileName": "file name of the private key file",
        "privateKeyDirectDownloadLink": "HTTP URL to download the private key file"
      }
    }
Response if the job does not exist
    Status: 404
    { "code": "NoJobError", "message": "Job $jobname is not found." }
Response if the job SSH info does not exist
    Status: 404
    { "code": "NoJobSshInfoError", "message": "SSH info of job $jobname is not found." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
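Given an SSH info response like the one above, a client could derive one `ssh` command line per container. The sketch below is only an illustration: `ssh_commands` is a hypothetical helper, the login user `root` is an assumption that this document does not confirm, and the private key is assumed to have been downloaded beforehand (for example from `privateKeyDirectDownloadLink`) to a local path.

```python
def ssh_commands(ssh_info, private_key_path, user="root"):
    """Return one ssh argument list per container in the SSH info response.

    Assumption: `user` defaults to "root"; adjust it to the image's actual user.
    """
    commands = []
    for container in ssh_info["containers"]:
        commands.append([
            "ssh",
            "-i", private_key_path,            # identity (private key) file
            "-p", str(container["sshPort"]),   # port of the container's ssh service
            "%s@%s" % (user, container["sshIp"]),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run` as is, which avoids shell quoting problems.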
PUT jobs/:jobName/executionType
Start or stop a legacy job.
Request
    PUT /api/v1/jobs/:jobName/executionType
    Authorization: Bearer <ACCESS_TOKEN>
Parameters
    { "value": "START" | "STOP" }
Response if succeeded
    Status: 200
    { "message": "execute job $jobName successfully" }
Response if the job does not exist
    Status: 404
    { "code": "NoJobError", "message": "Job $jobname is not found." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
PUT user/:username/jobs/:jobName/executionType
Start or stop a job.
Request
    PUT /api/v1/user/:username/jobs/:jobName/executionType
    Authorization: Bearer <ACCESS_TOKEN>
Parameters
    { "value": "START" | "STOP" }
Response if succeeded
    Status: 200
    { "message": "execute job $jobName successfully" }
Response if the job does not exist
    Status: 404
    { "code": "NoJobError", "message": "Job $jobname is not found." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
GET virtual-clusters
Get the list of virtual clusters.
Request
    GET /api/v1/virtual-clusters
Response if succeeded
    Status: 200
    {
      "vc1": { }
      ...
    }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
-
GET virtual-clusters/:vcName
Get virtual cluster status in the system.
Request
    GET /api/v1/virtual-clusters/:vcName
Response if succeeded
    Status: 200
    {
      // capacity percentage this virtual cluster can use of entire cluster
      "capacity": 50,
      // max capacity percentage this virtual cluster can use of entire cluster
      "maxCapacity": 100,
      // capacity percentage this virtual cluster is currently using
      "usedCapacity": 0,
      "numActiveJobs": 0,
      "numJobs": 0,
      "numPendingJobs": 0,
      "resourcesUsed": {
        "memory": 0,
        "vCores": 0,
        "GPUs": 0
      }
    }
Response if the virtual cluster does not exist
    Status: 404
    { "code": "NoVirtualClusterError", "message": "Virtual cluster $vcname is not found." }
Response if a server error occurred
    Status: 500
    { "code": "UnknownError", "message": "*Upstream error messages*" }
Framework ACL is enabled as of this version, so every new job belongs to a namespace named after the username of its creator. Jobs created before the version upgrade have no namespace. These are called "legacy jobs": they can be retrieved and stopped, but not created. To identify them, look for the "legacy: true" field in the responses of the list APIs.
In future versions, all operations on legacy jobs may be disabled, so please re-create them as namespaced jobs as soon as possible.
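A client that works with both kinds of jobs has to pick the right path for each one. The sketch below selects between the legacy and the namespaced endpoints based on the "legacy" field. The helper names are hypothetical, and the `name` and `username` keys of a list entry are assumptions, since this document elides the exact shape of the list response.

```python
from urllib.parse import quote


def job_api_path(job):
    """Return the status path for a job entry taken from a list API response.

    Assumption: each entry has "name" and "username" keys; only "legacy"
    is confirmed by this document.
    """
    name = quote(job["name"], safe="")
    if job.get("legacy", False):
        # Legacy jobs have no namespace and use the un-namespaced endpoints.
        return "/api/v1/jobs/" + name
    return "/api/v1/user/" + quote(job["username"], safe="") + "/jobs/" + name


def legacy_jobs(jobs):
    """Return only the entries marked as legacy, e.g. to plan their re-creation."""
    return [job for job in jobs if job.get("legacy", False)]
```

Appending `/config`, `/ssh`, or `/executionType` to the returned path gives the corresponding endpoints described above.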