Channel ERROR: "failing link: unable to resolve fwd pkgs: bucket not found with error: internal error" #6593

@zerofeerouting

Background

I run a CLN node and have seen several instances where my node force-closed a channel because the LND peer sent an "internal error" message.

I finally hit this error with a peer (@ZoltanAB) who was able to provide the relevant logs.

LND environment

  • LND: 0.14.2-beta
  • OS: Linux ipayblue-1 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
  • Using @C-Otto's rebalance-lnd script (if that's relevant)

Steps to reproduce

Run a channel between an LND node and a CLN node that forwards HTLCs.

Expected behaviour

LND should not send an error.

Actual behaviour

LND sends an "internal error" message, and the CLN peer then force-closes the channel.

Logs

LND Logs (peer A)

2022-05-29 21:46:19.294 [ERR] HSWC: ChannelLink(297f43e0ac9a7307f334dc2a38eac05a86943f77e912dba679bc9cda52284a55:0): unable to remove fwd pkg for height=421027: bucket not found
2022-05-29 21:46:19.294 [ERR] HSWC: ChannelLink(297f43e0ac9a7307f334dc2a38eac05a86943f77e912dba679bc9cda52284a55:0): failing link: unable to resolve fwd pkgs: bucket not found with error: internal error

CLN logs (peer B)

2022-05-29T21:46:12.082Z UNUSUAL 032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-channeld-chan#7100: Adding HTLC 2358 too slow: killing connection
2022-05-29T21:46:12.084Z INFO    032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-chan#7100: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (9)
2022-05-29T21:46:20.650Z UNUSUAL 032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-chan#7100: Peer permanent failure in CHANNELD_NORMAL: channeld: received ERROR error channel 554a2852da9cbc79a6db12e9773f94865ac0ea382adc34f307739aace0437f29: internal error
2022-05-29T21:46:20.651Z INFO    032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-chan#7100: State changed from CHANNELD_NORMAL to AWAITING_UNILATERAL
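
The channel point in the LND log and the channel ID in the CLN log look unrelated at first glance. A minimal sketch to confirm that both excerpts refer to the same channel, assuming the BOLT #2 rule that the channel ID is the funding txid (in wire byte order, i.e. reversed relative to how txids are displayed) with the big-endian 16-bit output index XORed into the last two bytes. The function name `channel_id_from_outpoint` is made up for this illustration and is not part of LND or CLN.

```python
def channel_id_from_outpoint(txid_hex: str, output_index: int) -> str:
    """Derive a BOLT #2 channel_id (hex) from a funding outpoint.

    Hypothetical helper for illustration only.
    """
    if not 0 <= output_index <= 0xFFFF:
        raise ValueError("output index must fit in 16 bits")

    # Txids are displayed byte-reversed relative to their wire order.
    raw = bytearray(bytes.fromhex(txid_hex)[::-1])
    if len(raw) != 32:
        raise ValueError("txid must be 32 bytes")

    # XOR the big-endian output index into the last two bytes.
    raw[30] ^= output_index >> 8
    raw[31] ^= output_index & 0xFF
    return raw.hex()


# Values copied from the log excerpts above.
lnd_channel_point = "297f43e0ac9a7307f334dc2a38eac05a86943f77e912dba679bc9cda52284a55:0"
cln_channel_id = "554a2852da9cbc79a6db12e9773f94865ac0ea382adc34f307739aace0437f29"

txid, index = lnd_channel_point.split(":")
derived = channel_id_from_outpoint(txid, int(index))
print(derived == cln_channel_id)  # True
```

Since the output index is 0 here, the channel ID is simply the byte-reversed txid, so the LND error and the CLN force-close concern the same channel.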

Additional info

The LND node was rebalancing heavily and ran into memory issues about 7 minutes before the event (no log entries up to 2022-05-29 21:40:02.124).

[Image: index]

As you can tell from the graph, they stopped their rebalancing script a couple of hours after the crash.

Metadata

Labels

  • P1: MUST be fixed or reviewed
  • bug: Unintended code behaviour
  • database: Related to the database/storage of LND
  • htlcswitch
