Question: Custom Check, How to exit without any changes, i.e. leave node in current state? #139

@flakrat

Howdy, we have a custom check that retrieves a metric value from Prometheus using curl.

Edit: we are using Slurm as our resource manager.

The check works great. However, I need to add code to it that prevents NHC from changing the state of the node (draining or un-draining it) when the curl command fails, for example:

  • The Prometheus server is not responding
  • The query doesn't return any metric (could happen if node_exporter died on the node)

Is there a way to return from the function so that NHC makes no changes to the node?

  • return 0 indicates no failure and triggers an un-drain if the node is already drained, so I can't use that.
  • return 1 (or any other non-zero number) indicates failure and drains the node.
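
For context, here is a minimal sketch of the kind of check described above, showing the three outcomes it has to tell apart. It is not the actual check: the function names prom_sample_value and check_prom_metric, the URL, query and threshold arguments, and the return code 2 are all hypothetical. It assumes the standard Prometheus instant-query endpoint (/api/v1/query) and a query that yields at most one series.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the sample value from a Prometheus
# instant-query JSON body, or return non-zero if the body has no sample.
# Assumes at most one series in the result.
prom_sample_value() {
    local body=$1 value
    # A sample looks like: "value":[<unix_timestamp>,"<value_as_string>"]
    value=$(printf '%s' "$body" |
        sed -n 's/.*"value":\[[^,]*,"\([^"]*\)"].*/\1/p')
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}

# Hypothetical check: passes when the metric is at or below a maximum.
# Arguments (all placeholders): base URL, PromQL query, maximum value.
check_prom_metric() {
    local url=$1 query=$2 max=$3 body value

    # --fail makes curl exit non-zero on HTTP errors; --max-time bounds the wait.
    if ! body=$(curl --silent --fail --max-time 5 --get \
            --data-urlencode "query=${query}" "${url}/api/v1/query"); then
        echo "UNKNOWN: Prometheus server is not responding" >&2
        return 2   # placeholder: as described above, non-zero still drains the node
    fi

    if ! value=$(prom_sample_value "$body"); then
        echo "UNKNOWN: query returned no metric" >&2
        return 2   # placeholder: same open question as above
    fi

    # Floating-point comparison via awk: exit status 0 means value <= max.
    awk -v v="$value" -v m="$max" 'BEGIN { exit !(v + 0 <= m + 0) }'
}
```

The two "UNKNOWN" branches are where the node should be left in its current state. With only the return codes listed above, both of them still end up draining the node.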

Thanks,

Mike Hanby
UAB IT Research Computing
