Conversation
We have realized that force compacting an archive node with more than 600GB on disk is not such a great idea :) It takes up to 1h 30min for the node to finish this task, and subsequent starts are not any faster.
```rust
// After opening the DB, we want to compact it.
//
// This is just in case the node crashed before, to ensure the db stays fast.
db.force_compaction()?;
```
dq: This is not limited to archive nodes, but we have observed this while running archives?
I had tested this before mainly with warp-synced nodes, aka not a node that has a 600GB disk 🙈
That it takes ages was noticed while trying to switch our westend nodes to the stable25rc1 release.
Makes sense, this must be the fix for the westend nodes that were killed by the keep-alive/health-check services 🙏
Yes this is the fix for them.
I tested it locally with the archive db. Before, my PC was also not able to compact it in any reasonable amount of time.
Hmm, but we will have to remove the compaction on large writes too, right? Because if a large write happens we also don't want to wait for hours.
We would need to write more than 64MiB, which should only happen at warp sync. I can increase this to 128MiB if you like. We don't warp sync archive nodes.
Hmm okay, I was not sure we could make this general assumption.
Blocks have a max size of ~10MiB, so writing 128MiB sounds hard. Let me set it to 256MiB to be safer.
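The threshold logic discussed above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the PR's actual code: the `Database` struct, its fields, and the `write`/`force_compaction` methods are stand-ins for the real DB handle; only the idea (count bytes written since the last compaction and compact only past a large threshold, so ordinary block imports of ~10MiB never trigger it) comes from the thread.

```rust
/// Threshold from the discussion: 256 MiB of accumulated writes before a
/// manual compaction is considered worthwhile (well above the ~10 MiB
/// maximum block size, so only warp-sync-scale bursts can reach it).
const COMPACTION_THRESHOLD: u64 = 256 * 1024 * 1024;

/// Hypothetical stand-in for the real database handle.
struct Database {
    bytes_written_since_compaction: u64,
    compactions: u32,
}

impl Database {
    fn new() -> Self {
        Self { bytes_written_since_compaction: 0, compactions: 0 }
    }

    /// Record a write and compact only once the threshold is crossed.
    fn write(&mut self, bytes: u64) {
        self.bytes_written_since_compaction += bytes;
        if self.bytes_written_since_compaction > COMPACTION_THRESHOLD {
            self.force_compaction();
        }
    }

    /// Stand-in for the expensive compaction call; resets the counter.
    fn force_compaction(&mut self) {
        self.compactions += 1;
        self.bytes_written_since_compaction = 0;
    }
}

fn main() {
    let mut db = Database::new();
    // 25 block-sized writes of 10 MiB total 250 MiB: still no compaction.
    for _ in 0..25 {
        db.write(10 * 1024 * 1024);
    }
    assert_eq!(db.compactions, 0);
    // A warp-sync-sized burst pushes the counter past 256 MiB: one compaction.
    db.write(300 * 1024 * 1024);
    assert_eq!(db.compactions, 1);
    println!("compactions: {}", db.compactions);
}
```

With a threshold this far above the per-block write size, normal operation never pays the compaction cost; only a genuinely huge batch (like a warp sync) does.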
lrubasze
left a comment
LGTM.
I presume DB compaction is performed automatically anyway in the background?
DQ: What would be the "expected" write size if an archive node goes offline for some time? Can it happen that we somehow exceed this anyway?