Skip to content

mgmt-gateway: Refine serial console messages#737

Merged
jgallagher merged 6 commits into
masterfrom
mgmt-gateway-serial-console-todos
Aug 31, 2022
Merged

mgmt-gateway: Refine serial console messages#737
jgallagher merged 6 commits into
masterfrom
mgmt-gateway-serial-console-todos

Conversation

@jgallagher

Copy link
Copy Markdown
Contributor

This is the hubris half to oxidecomputer/omicron#1651 that lets us address a few lingering TODOs:

  1. Attaching and detaching an MGS instance to a particular SP's proxied serial console are now explicit messages; previously attaching was implicit when MGS sent data, and detaching was implicit when a new MGS instance attached, neither of which was ideal.
  2. MGS -> SP serial console data packets now include an offset, allowing the SP to detect and ignore any resent UDP packets. Previously if MGS resent a serial console data packet because it missed an ack, the SP had no way to know it had already seen it, and it would forward it along the uart a second time.
  3. SP -> MGS serial console data packets now include an offset. We do not ack SP -> MGS messages, both because the logic to do that on the SP would be tricky and because it has very little recourse for unacked messages (it can't apply backpressure on a uart, and it can't buffer data arbitrarily long). As of this PR MGS can at least know and log when data has been lost due to a networking or SP buffering issue.

1. We can now accept partial messages
2. We no longer erroneously duplicate output on the uart if we receive a
   resent packet
Add an offset to detect lost data; continue reading from usart even if
that means discarding buffered data.
Comment thread task/mgmt-gateway/src/main.rs Outdated

// Ensure we always have space for at least 16 bytes (the stm32h7 usart
// FIFO depth), even if that means discarding old data.
if available_rx_space < 16 {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we log an error here, even though MGS will also notice it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "log" do you mean a ringbuf entry or something else?

Comment thread task/mgmt-gateway/src/main.rs Outdated
}

fn drain_flushed_data(&mut self, n: usize) {
self.from_rx.drain(..n);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the hood, this is going to ptr::copy the remaining data to shift it over by n (source), so it could do a bunch of extra work!

It seems like the "correct" data structure here is a circular buffer, although that makes serialize_with_trailing_data a little trickier, because the trailing data could be split over two slices.

Do you think it's worth fixing this now, or leaving a TODO?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heapless::Deque looks promising (and already has an as_slices method!)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to heapless::Deque in 215a1a0, replacing all the drains with loops over pop_front()s and rewriting the bit around available_rx_space. It's a big enough change to probably be worth a second look, if you don't mind.

Comment thread task/mgmt-gateway/src/main.rs Outdated
// Ensure we always have space for at least 16 bytes (the stm32h7 usart
// FIFO depth), even if that means discarding old data.
if available_rx_space < 16 {
self.from_rx.drain(..(16 - available_rx_space));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same concern about drain being expensive here)

// Have we already seen _all_ of this data? If so, just reply that
// we're ready for the data that comes after it.
if skip >= data.len() as u64 {
return Ok(offset + data.len() as u64);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return self.serial_console_write_offset here instead?

e.g. if

self.serial_console_write_offset = 128
offset = 64
data.len() = 32

we want the host to start sending us data at offset 128, not 96.

(or will the host get confused that we've managed to write more data than it thinks it's sent us?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or will the host get confused that we've managed to write more data than it thinks it's sent us?)

Exactly this, at least as currently implemented. In practice we should never see skip > data.len() here, unless something has gone horribly wrong, because the host should never go backwards in time, and only moves forward when we ack its messages. We may see skip == data.len() if our ack gets lost and the host resends the same packet it most recently sent, though.

This avoids potentially-expensive drains and lets us rewrite the logic
to pull data out of the USART FIFO more directly.
@jgallagher jgallagher merged commit 7f81c22 into master Aug 31, 2022
@jgallagher jgallagher deleted the mgmt-gateway-serial-console-todos branch August 31, 2022 18:51
FawazTirmizi pushed a commit to rivosinc/hubris that referenced this pull request Oct 3, 2022
* Refine receiving serial console messages from MGS

1. We can now accept partial messages
2. We no longer erroneously duplicate output on the uart if we receive a
   resent packet

* Refine sending serial console messages to MGS

Add an offset to detect lost data; continue reading from usart even if
that means discarding buffered data.

* Switch from ArrayVec to Deque for USART buffers

This avoids potentially-expensive drains and lets us rewrite the logic
to pull data out of the USART FIFO more directly.
FawazTirmizi pushed a commit to rivosinc/hubris that referenced this pull request Oct 3, 2022
* Refine receiving serial console messages from MGS

1. We can now accept partial messages
2. We no longer erroneously duplicate output on the uart if we receive a
   resent packet

* Refine sending serial console messages to MGS

Add an offset to detect lost data; continue reading from usart even if
that means discarding buffered data.

* Switch from ArrayVec to Deque for USART buffers

This avoids potentially-expensive drains and lets us rewrite the logic
to pull data out of the USART FIFO more directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants