[Chore] Better handle open file descriptors #6916

@wileyj

Description

Running stacks-node will eventually open a large number of file descriptors. Some process supervisors override the system-wide limits (unless overridden again, as in the case of systemd), which can lead to a node panic when it runs out of file descriptors.

As an example, on a host where the stacks-node binary was running via systemd as the user stacks, the limits were updated to:

stacks	soft	nofile	unlimited
stacks	hard	nofile	unlimited

The node still panicked (I don't have hard data on precisely how many files it had open when the panic happened).
systemd in this case was overriding the updated limits, setting:

DefaultLimitNOFILE=524288
DefaultLimitNOFILESoft=1024
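When systemd is the supervisor, a per-service drop-in can lift the limit above those defaults for just this unit. A sketch, assuming the unit is named stacks-node.service:

```ini
# /etc/systemd/system/stacks-node.service.d/limits.conf
[Service]
# soft:hard pair; "infinity" is also accepted
LimitNOFILE=524288:524288
```

Reload with `systemctl daemon-reload` and restart the service for this to take effect.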

I can only presume that other supervisors (Docker, runit, etc.) also set a default limit of their own.
When running the binary directly on hardware or a VM (as long as the limits are set sufficiently high), I haven't seen the same issue. It also seems to be more prevalent if a node is public, and more so if the node is a bootstrap node.

@francesco-stacks noted in #6903 that there is an existing crate that might help us resolve this better: #6903 (comment)

From what I can tell, a possible alternative would be to handle this in the application code. We could use something like the rlimit crate in the stacks-node binary to read the OS hard limit for RLIMIT_NOFILE and dynamically raise our soft limit to match it. That way, the node safely gets the file descriptors it needs regardless of how the user launches it. What do you think?
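A minimal sketch of that approach, written with hand-rolled FFI declarations in place of the crate so the snippet stands alone (Linux constants assumed; none of this is actual stacks-node code):

```rust
// Sketch only. The rlimit crate wraps the same POSIX getrlimit/setrlimit
// calls declared here by hand; RLIMIT_NOFILE = 7 is the Linux value
// (it is 8 on macOS, which rlimit would handle for us).
#[repr(C)]
struct Rlimit {
    rlim_cur: u64, // soft limit
    rlim_max: u64, // hard limit
}

const RLIMIT_NOFILE: i32 = 7;

extern "C" {
    fn getrlimit(resource: i32, rlim: *mut Rlimit) -> i32;
    fn setrlimit(resource: i32, rlim: *const Rlimit) -> i32;
}

/// Raise the soft RLIMIT_NOFILE to the hard limit; return the hard limit.
fn raise_nofile_limit() -> std::io::Result<u64> {
    let mut rl = Rlimit { rlim_cur: 0, rlim_max: 0 };
    if unsafe { getrlimit(RLIMIT_NOFILE, &mut rl) } != 0 {
        return Err(std::io::Error::last_os_error());
    }
    if rl.rlim_cur < rl.rlim_max {
        // Only the hard limit needs privileges to raise; bumping the soft
        // limit up to the existing hard limit is always permitted.
        let raised = Rlimit { rlim_cur: rl.rlim_max, rlim_max: rl.rlim_max };
        if unsafe { setrlimit(RLIMIT_NOFILE, &raised) } != 0 {
            return Err(std::io::Error::last_os_error());
        }
    }
    Ok(rl.rlim_max)
}
```

With the crate itself this collapses to a couple of lines via its safe Resource API, and it also ships a helper aimed at exactly this NOFILE case, so we wouldn't carry any unsafe code ourselves.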

Metadata

Assignees

No one assigned

    Labels

    chore — Necessary but less impactful tasks such as cleanup or reorg
