[display event] add event watcher in database controller #4939
}
};

async function assertDiskUsageHealthy() {
It would be better to limit the quota both per job and globally, so that it does not impact the critical path.
Recorded in #4953. We can solve this problem in the future.
# Max connection number to database in cluster event watcher.
cluster-event-max-db-connection: 40
# Max disk usage in internal storage for cluster event watcher
cluster-event-watcher-max-disk-usage-percent: 80
Should there also be a limit for history? Why not move non-critical things to another DB server?
Recorded in #4954. We can solve this problem in the future.
Event Watcher Test Cases:
1. Submit a job that will always stay in waiting status (e.g. one that requests a lot of resources). After a few minutes, check whether there is any event about "failed scheduling" on the job event page.
2. Submit a job with 2000+ tasks. After a few minutes, check that the event page works properly.
3. Go to internal storage to see the existing usage: … Please note the usage of the loop device. Create a big file under … After a few minutes, confirm that:
   1. there is a NodeFilesystemUsage alert shown on webportal;
   2. the event watcher exits automatically.

   Remove the big file. After a few minutes, confirm that:
   1. there is no more NodeFilesystemUsage alert;
   2. the event watcher works properly, and we can see events of new jobs on webportal.
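The last test case exercises the disk usage limit (`cluster-event-watcher-max-disk-usage-percent`). As a rough illustration of the kind of check involved, here is a minimal sketch. It is not the code in this PR: `diskUsagePercent`, `assertUsageBelowLimit`, `startDiskUsageCheck`, and the injected `getUsage` / `onUnhealthy` callbacks are all hypothetical names, and treating "above the limit" as strictly greater than the configured percent is an assumption.

```javascript
// Hypothetical sketch only; names and structure are assumptions, not the PR's code.

// Returns the used percentage of a filesystem, rounded down to an integer.
function diskUsagePercent(usedBytes, totalBytes) {
  if (!(totalBytes > 0) || usedBytes < 0) {
    throw new RangeError('totalBytes must be positive and usedBytes non-negative');
  }
  return Math.floor((usedBytes * 100) / totalBytes);
}

// Throws if usage is strictly above maxPercent (assumed semantics of
// cluster-event-watcher-max-disk-usage-percent); otherwise returns the percent.
function assertUsageBelowLimit(usedBytes, totalBytes, maxPercent) {
  const percent = diskUsagePercent(usedBytes, totalBytes);
  if (percent > maxPercent) {
    throw new Error(`Disk usage ${percent}% exceeds limit ${maxPercent}%`);
  }
  return percent;
}

// Runs the check every intervalMs. `getUsage` and `onUnhealthy` are injected so
// the sketch does not assume how usage is measured or how the watcher exits.
function startDiskUsageCheck({ getUsage, maxPercent, intervalMs, onUnhealthy }) {
  const timer = setInterval(async () => {
    try {
      const { usedBytes, totalBytes } = await getUsage();
      assertUsageBelowLimit(usedBytes, totalBytes, maxPercent);
    } catch (err) {
      // Stop checking and let the caller decide what to do (e.g. exit the process).
      clearInterval(timer);
      onUnhealthy(err);
    }
  }, intervalMs);
  return timer;
}
```

In this sketch, a watcher would call `startDiskUsageCheck` once at startup with `maxPercent: 80` and an `onUnhealthy` callback that logs the error and exits; a supervisor restarting the process after the big file is removed would then match the "works properly again" half of the test case. The interval value is left to the caller.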
Database size control strategy:
60s and 80% are configurable.
Problem found: