The devnet device telemetry agent stopped submitting samples for ~45 minutes after running out of funds. Resolved via manual intervention.
Initially the errors in the logs were `insufficient lamports`, but they soon became `VersionedTransaction too large` as more unsubmitted samples accumulated.
The `insufficient lamports` error should have been recoverable once the agent was funded again, but it seems our transaction batching is still constructing transactions that are too large. This needs to be fixed.
We have a Rust-side test that asserts the maximum transaction size, and we use that value for batching on the Go side, but it doesn't seem to be completely accurate for Go. We should also add a Go-side integration test that sends a transaction at the maximum size we use for batches. The output `VersionedTransaction too large: 2864 bytes (max: encoded/raw 1644/1232)` suggests that the real limit is about 40% lower than the 474 samples sent in the transaction that triggered it (see the second log entry below), which is significantly lower than the max batch size of 2560.
- The buffered samples were also lost on restart of the agents because they're only buffered in memory and were not able to flush to the ledger. We should consider if buffering to disk is worth implementing.
- We should consider storing timestamps with samples. The gap in the data is not visible at all on the time series in Grafana because of how we interpolate/estimate the timestamps. As a result, it currently looks like no recent data is being submitted and samples are just being appended as past data.
Initial insufficient lamports error:
samples: failed to execute instruction: failed to send transaction: (*jsonrpc.RPCError)(0xc001580d50)({\n Code: (int) -32002,\n Message: (string) (len=88) "Transaction simulation failed: Error processing Instruction 0: custom program error: 0x1",\n Data: (map[string]interface {}) (len=7) {\n (string) (len=8) "accounts": (interface {}) ,\n (string) (len=3) "err": (map[string]interface {}) (len=1) {\n (string) (len=16) "InstructionError": ([]interface {}) (len=2 cap=2) {\n (json.Number) (len=1) "0",\n (map[string]interface {}) (len=1) {\n (string) (len=6) "Custom": (json.Number) (len=1) "1"\n }\n }\n },\n (string) (len=17) "innerInstructions": (interface {}) ,\n (string) (len=4) "logs": ([]interface {}) (len=10 cap=16) {\n (string) (len=63) "Program C9xqH76NSm11pBS6maNnY163tWHT8Govww47uyEmSnoG invoke [1]",\n (string) (len=113) "Program log: Instruction: WriteDeviceLatencySamples(start_timestamp_microseconds: 1753046460911670, samples: 234)",\n (string) (len=111) "Program log: Processing WriteDeviceLatencySamples: start_timestamp_microseconds: 1753046460911670, samples: 234",\n (string) (len=57) "Program log: Updating existing DZ latency samples account",\n (string) (len=56) "Program log: Rent required: 464357280, actual: 457842720",\n (string) (len=51) "Program 11111111111111111111111111111111 invoke [2]",\n (string) (len=53) "Transfer: insufficient lamports 1055800, need 6514560",\n (string) (len=74) "Program 11111111111111111111111111111111 failed: custom program error: 0x1",\n (string) (len=91) "Program C9xqH76NSm11pBS6maNnY163tWHT8Govww47uyEmSnoG consumed 15460 of 200000 compute units",\n (string) (len=86) "Program C9xqH76NSm11pBS6maNnY163tWHT8Govww47uyEmSnoG failed: custom program error: 0x1"\n },\n (string) (len=20) "replacementBlockhash": (interface {}) ,\n (string) (len=10) "returnData": (interface {}) ,\n (string) (len=13) "unitsConsumed": (json.Number) (len=5) "15460"\n }\n})\n"
Subsequent VersionedTransaction too large error:
time=2025-07-20T22:07:08.578Z level=ERROR source=/home/runner/work/doublezero/doublezero/controlplane/telemetry/internal/telemetry/submitter.go:182 msg="Submission failed after all retries" account=B9xjyQCvhVJSAZW9xN2gnE9s7tyQ1ExxJ3jURnWTfFTX-5JcwAoBnsuwng78a21LQXh9LaQ6CLRapEkjXUxQw3chd-4CEJN5dMT2fBf5bfWv6muqqGbJiLRQYBjMYwqgbGs3Xb-10144 attempt=5 samplesCount=474 error="failed to write device latency samples: failed to execute instruction: failed to send transaction: (*jsonrpc.RPCError)(0xc001593d40)({\n Code: (int) -32602,\n Message: (string) (len=117) "base64 encoded solana_transaction::versioned::VersionedTransaction too large: 2864 bytes (max: encoded/raw 1644/1232)",\n Data: (interface {}) \n})\n"