Add net device support#49
Conversation
| pub fn cmdline_string(&self) -> String { | ||
| let mut s = String::new(); | ||
| if self.root_device { | ||
| s.push_str("root=/dev/vda"); |
There was a problem hiding this comment.
Shouldn't vda be configurable? We should keep track somewhere of the number of block devices inserted, and compose the vd* part, I think.
There was a problem hiding this comment.
(also nit - for key=val params you can use Cmdline::insert)
There was a problem hiding this comment.
Thats good to know; for now this is used as a String here, and I think there is still an open discussion/todo regarding the most seamless way to leverage Cmdline everywhere.
There was a problem hiding this comment.
Looks like currently we can only have one block device in the vmm config. Created #97 to not forget about this.
|
|
||
| let mut chain = match self.rxq.iter()?.next() { | ||
| Some(c) => c, | ||
| _ => return Ok(false), |
There was a problem hiding this comment.
I think this would better be expressed as an error variant, since we've exhausted the descriptors.
There was a problem hiding this comment.
It's not necessarily an error, because it simply means there's no more room to write frames, which can appear frequently during normal operation. Most errors are treated as unrecoverable in this call chain for now. This could definitely use more polish/clarity; was just waiting for some of the ongoing vm-virtio things to settle down before changing things here as well.
| let rxq = self.cfg.virtio.queues[0].clone(); | ||
| let txq = self.cfg.virtio.queues[1].clone(); | ||
|
|
||
| let tap = Tap::open_named(self.tap_name.as_str()).map_err(Error::Tap)?; |
There was a problem hiding this comment.
Doesn't this also creates the device if it isn't already present? The PR description mentions how to create tap devices in order for the tests to run, but shouldn't they be created anyway assuming that the user has CAP_NET_ADMIN?
There was a problem hiding this comment.
I think that's the API yes, but the vmm should be run as deprivileged as possible in general.
| ops.remove(Events::empty(&self.rx_ioevent)) | ||
| .expect("Failed to remove rx ioevent"); | ||
| ops.remove(Events::empty(&self.tx_ioevent)) | ||
| .expect("Failed to remove rx ioevent"); |
There was a problem hiding this comment.
| .expect("Failed to remove rx ioevent"); | |
| .expect("Failed to remove tx ioevent"); |
There was a problem hiding this comment.
I think it would be better to ops.remove them in a row, keep track of the results, and panic after all the removes have been attempted so that events don't incorrectly remain in the event manager in case of a quick death.
That being said, I'm not sure about panicking in library code. There's not much we can do about it yet but if you go ahead with the expects, please add an issue to track removing them once we have metrics.
There was a problem hiding this comment.
Yeah this is something to think about. After this PR is merged we want to consolidate common/related device logic even more, so at least we can focus on one place (there's something similar for block right now as well).
lauralt
left a comment
There was a problem hiding this comment.
Looks nice, left a few comments, most of them related to rx/tx processing.
| bindings::TUN_F_CSUM | ||
| | bindings::TUN_F_UFO | ||
| | bindings::TUN_F_TSO4 | ||
| | bindings::TUN_F_TSO6, |
There was a problem hiding this comment.
Is it necessary/required to add TUN_F_TSO6 if VIRTIO_NET_F_GUEST_TSO6 wasn't negotiated?
There was a problem hiding this comment.
You're right, it was missing. Added!
There was a problem hiding this comment.
As you recently pointed out, I forgot to add VIRTIO_NET_F_GUEST_TSO6 to the feature list as well. I've done so, and will update the next time I push changes to the PR.
There was a problem hiding this comment.
I'll approve once this comment is addressed. Thanks!
| Tap(io::Error), | ||
| } | ||
|
|
||
| impl From<vm_virtio::Error> for Error { |
There was a problem hiding this comment.
Should we implement From for the other variants as well?
There was a problem hiding this comment.
I didn't because it was easier/cleaner to use map_err where the other variants are used. We can add more implementations of From if we get to a point where map_err is unwieldy. Otherwise, I like using it because it's a lot more explicit in terms of what actual error gets returned from a method.
| pub fn process_tap(&mut self) -> result::Result<(), Error> { | ||
| loop { | ||
| if self.rxbuf_current == 0 { | ||
| match self.tap.read(&mut self.rxbuf) { |
There was a problem hiding this comment.
I'm not quite sure what happens if we read some data from tap here, but don't have a descriptor chain available for writing to its buffer. Is that data lost? Should we do this read only if we're sure there is at least one descriptor chain?
There was a problem hiding this comment.
The tap event is registered as edged triggered, which means that it'll fire when something causes more data to be available on the tap, but only once for each such occurrence (so it will not fire continuously while any data is available). Moreover, we read the frame first in self.rxbuf (if it's empty), otherwise we attempt to write the frame that was already present in self.rxbuf to the guest instead of reading from the tap.
If we cannot write to the guest, nothing is lost, because the current frame remains in self.rxbuf. In time, the tap itself can start losing frames that arrive if we cannot read the rest quick enough. Also, it's worth noting that losing frames is acceptable for network devices (whether or not it's preferable to buffering is a bigger discussion).
| break; | ||
| } | ||
|
|
||
| let len = cmp::min(left, desc.len() as usize); |
There was a problem hiding this comment.
What happens if len field from the descriptor is always 0 and we don't write any data to memory (count = 0 at the end of this loop)? Does it still make sense to add an entry in the used ring when no data was actually transferred?
There was a problem hiding this comment.
It does, because even if we didn't write anything, we still consumed a descriptor chain from the available ring, and now we have to return it to the used ring. The driver is expected to handle stranger cases like the one you mentioned (and also the standard does not prohibit this kind of situation).
| while let Some(mut chain) = self.txq.iter()?.next() { | ||
| let len = self.send_frame_from_chain(&mut chain)?; | ||
|
|
||
| self.txq.add_used(chain.head_index(), len)?; |
There was a problem hiding this comment.
len should be replaced with 0 I think.
| count += len; | ||
| } | ||
|
|
||
| self.tap.write(&self.txbuf[..count]).map_err(Error::Tap)?; |
There was a problem hiding this comment.
How should we treat the case when less than count bytes could've been written to the tap device? I guess we can add at least a (TODO) comment about this :-?.
There was a problem hiding this comment.
As far as I know, the implementation of a tap interface will never cause partial reads/writes, so (in this context) every write is equivalent to a write_all. This is something to keep in mind when we start defining a more generic abstraction.
| // We transiently define a mut `Cmdline` object here to pass for device creation | ||
| // (devices expect a `&mut Cmdline` to leverage the newly added virtio_mmio | ||
| // functionality), until we figure out how this fits with the rest of the vmm, which | ||
| // apparently does not explicitly use `Cmdline` structs. Will discuss and fix this |
There was a problem hiding this comment.
It would be nice if we could use Cmdline everywhere (hopefully, we will avoid this way adding spaces in strings like here, but didn't look very much into how is Cmdline implemented).
There was a problem hiding this comment.
Yes that's def a thread we should continue. Created #98 to track it.
alexandruag
left a comment
There was a problem hiding this comment.
Hi all, I've updated the PR, answered the previous comments, and opened some new issues. Would be useful to merge the current iteration, so we can add some more consolidation logic and device simplification on top of these changes. Here are some of the current concerns:
-
The unit tests for the
Tapstruct taken from Firecracker/crosvm require elevated privileges to run, as you have already noticed. Given that we'd likeTapto be a separate abstraction (#94), should we just consider the current implementation essentially immutable after we've brought it over and not include those tests as well, or should we temporarily run the tests with more privileges (if they are not so already)? -
Not sure how creating and tearing down TAP interfaces should exist within the current integration testing framework (as it's required for integration testing). Does anyone have a good starting idea here? Should we create an issue and solve that separately going forward?
| pub fn cmdline_string(&self) -> String { | ||
| let mut s = String::new(); | ||
| if self.root_device { | ||
| s.push_str("root=/dev/vda"); |
There was a problem hiding this comment.
Thats good to know; for now this is used as a String here, and I think there is still an open discussion/todo regarding the most seamless way to leverage Cmdline everywhere.
| pub fn cmdline_string(&self) -> String { | ||
| let mut s = String::new(); | ||
| if self.root_device { | ||
| s.push_str("root=/dev/vda"); |
There was a problem hiding this comment.
Looks like currently we can only have one block device in the vmm config. Created #97 to not forget about this.
| pub fn process_tap(&mut self) -> result::Result<(), Error> { | ||
| loop { | ||
| if self.rxbuf_current == 0 { | ||
| match self.tap.read(&mut self.rxbuf) { |
There was a problem hiding this comment.
The tap event is registered as edged triggered, which means that it'll fire when something causes more data to be available on the tap, but only once for each such occurrence (so it will not fire continuously while any data is available). Moreover, we read the frame first in self.rxbuf (if it's empty), otherwise we attempt to write the frame that was already present in self.rxbuf to the guest instead of reading from the tap.
If we cannot write to the guest, nothing is lost, because the current frame remains in self.rxbuf. In time, the tap itself can start losing frames that arrive if we cannot read the rest quick enough. Also, it's worth noting that losing frames is acceptable for network devices (whether or not it's preferable to buffering is a bigger discussion).
| count += len; | ||
| } | ||
|
|
||
| self.tap.write(&self.txbuf[..count]).map_err(Error::Tap)?; |
There was a problem hiding this comment.
As far as I know, the implementation of a tap interface will never cause partial reads/writes, so (in this context) every write is equivalent to a write_all. This is something to keep in mind when we start defining a more generic abstraction.
| while let Some(mut chain) = self.txq.iter()?.next() { | ||
| let len = self.send_frame_from_chain(&mut chain)?; | ||
|
|
||
| self.txq.add_used(chain.head_index(), len)?; |
| // We transiently define a mut `Cmdline` object here to pass for device creation | ||
| // (devices expect a `&mut Cmdline` to leverage the newly added virtio_mmio | ||
| // functionality), until we figure out how this fits with the rest of the vmm, which | ||
| // apparently does not explicitly use `Cmdline` structs. Will discuss and fix this |
There was a problem hiding this comment.
Yes that's def a thread we should continue. Created #98 to track it.
457a08f to
ee66a26
Compare
Signed-off-by: Alexandru Agache <aagch@amazon.com>
85659d8 to
d0258ce
Compare
|
Looks like the tap tests run ok on the CI system; in that case the current disadvantage is that ppl running locally have to explicitly provide the required privileges so they can pass. I lowered the coverage due to not having sufficient unit tests as part of the net functionality (the queue handlers in particular). Are you ok with tracking that as a separate issue while things start falling into place more within |
I think we are more likely to write tests for uncover lines than to come back and re-write the tests such that they do not require sudo (at least this seems to be the case for past experiences). I would prefer us to be in a place where tests can run without polluting the environment.
We should add an issue for this. Ideally tests will not leave tap interfaces even when the run fails. |
alxiord
left a comment
There was a problem hiding this comment.
LGTM. The tests pass on CI because they already run privileged there; I think we shouldn't invest more in fixing the privilege problem until #94 is done, and I prefer missing tests to overly complex ones - at least until some of the complexity goes away.
aae0019 to
cae0937
Compare
Signed-off-by: Alexandru Agache <aagch@amazon.com>
|
I removed the |
This PR refactors some of the common virtio device code to reduce the amount of required boilerplate, and then implements virtio net device support. The
tapfunctionality and associated logic is taken from Firecracker for now, until we'll have a similar component available in rust-vmm or someplace else. More tests have to be added, and we're going to circle back to that after the initial version of the vsock support PR is ready as well.Helpful steps for testing net device support:
tapdevice is needed on the host. One can be created and configured with something like (please note the example might conflict with the existing host configuration if there's overlap in terms of IP addresses/subnets):Start the reference vmm with the
--netparameter, i.e.cargo run ... -- ... --net tap=vmtap100A network interface should be present inside the guest (usually
eth0, but the exact naming may vary). We need to configure this interface as well, for example by running (inside the guest):Please note the interface in the guest and the
tapdevice from the host must be configured as part of the same subnet (192.168.241.0/24in the previous examples) to communicate directly.