Add net device support by alexandruag · Pull Request #49 · rust-vmm/vmm-reference

alexandruag · 2020-12-15T11:24:42Z

This PR refactors some of the common virtio device code to reduce the amount of required boilerplate, and then implements virtio net device support. The tap functionality and associated logic is taken from Firecracker for now, until we'll have a similar component available in rust-vmm or someplace else. More tests have to be added, and we're going to circle back to that after the initial version of the vsock support PR is ready as well.

Helpful steps for testing net device support:

A tap device is needed on the host. One can be created and configured with something like (please note the example might conflict with the existing host configuration if there's overlap in terms of IP addresses/subnets):

sudo ip tuntap add dev vmtap100 mode tap
sudo ip addr add 192.168.241.1/24 dev vmtap100
sudo ip link set vmtap100 up

Start the reference vmm with the --net parameter, i.e. cargo run ... -- ... --net tap=vmtap100
A network interface should be present inside the guest (usually eth0, but the exact naming may vary). We need to configure this interface as well, for example by running (inside the guest):

ip a add 192.168.241.2/24 dev eth0
ip link set eth0 up

Please note the interface in the guest and the tap device from the host must be configured as part of the same subnet ( 192.168.241.0/24 in the previous examples) to communicate directly.

alxiord · 2020-12-15T15:30:13Z

+    pub fn cmdline_string(&self) -> String {
+        let mut s = String::new();
+        if self.root_device {
+            s.push_str("root=/dev/vda");


Shouldn't vda be configurable? We should keep track somewhere of the number of block devices inserted, and compose the vd* part, I think.

(also nit - for key=val params you can use Cmdline::insert)

Thats good to know; for now this is used as a String here, and I think there is still an open discussion/todo regarding the most seamless way to leverage Cmdline everywhere.

Looks like currently we can only have one block device in the vmm config. Created #97 to not forget about this.

alxiord · 2020-12-15T15:57:51Z

+
+        let mut chain = match self.rxq.iter()?.next() {
+            Some(c) => c,
+            _ => return Ok(false),


I think this would better be expressed as an error variant, since we've exhausted the descriptors.

It's not necessarily an error, because it simply means there's no more room to write frames, which can appear frequently during normal operation. Most errors are treated as unrecoverable in this call chain for now. This could definitely use more polish/clarity; was just waiting for some of the ongoing vm-virtio things to settle down before changing things here as well.

alxiord · 2020-12-15T16:05:30Z

+        let rxq = self.cfg.virtio.queues[0].clone();
+        let txq = self.cfg.virtio.queues[1].clone();
+
+        let tap = Tap::open_named(self.tap_name.as_str()).map_err(Error::Tap)?;


Doesn't this also creates the device if it isn't already present? The PR description mentions how to create tap devices in order for the tests to run, but shouldn't they be created anyway assuming that the user has CAP_NET_ADMIN?

I think that's the API yes, but the vmm should be run as deprivileged as possible in general.

alxiord · 2020-12-15T16:06:05Z

+        ops.remove(Events::empty(&self.rx_ioevent))
+            .expect("Failed to remove rx ioevent");
+        ops.remove(Events::empty(&self.tx_ioevent))
+            .expect("Failed to remove rx ioevent");


Suggested change

.expect("Failed to remove rx ioevent");

.expect("Failed to remove tx ioevent");

I think it would be better to ops.remove them in a row, keep track of the results, and panic after all the removes have been attempted so that events don't incorrectly remain in the event manager in case of a quick death.

That being said, I'm not sure about panicking in library code. There's not much we can do about it yet but if you go ahead with the expects, please add an issue to track removing them once we have metrics.

Yeah this is something to think about. After this PR is merged we want to consolidate common/related device logic even more, so at least we can focus on one place (there's something similar for block right now as well).

lauralt

Looks nice, left a few comments, most of them related to rx/tx processing.

lauralt · 2020-12-21T07:39:40Z

+            bindings::TUN_F_CSUM
+                | bindings::TUN_F_UFO
+                | bindings::TUN_F_TSO4
+                | bindings::TUN_F_TSO6,


Is it necessary/required to add TUN_F_TSO6 if VIRTIO_NET_F_GUEST_TSO6 wasn't negotiated?

You're right, it was missing. Added!

As you recently pointed out, I forgot to add VIRTIO_NET_F_GUEST_TSO6 to the feature list as well. I've done so, and will update the next time I push changes to the PR.

I'll approve once this comment is addressed. Thanks!

lauralt · 2020-12-21T07:53:03Z

+    Tap(io::Error),
+}
+
+impl From<vm_virtio::Error> for Error {


Should we implement From for the other variants as well?

I didn't because it was easier/cleaner to use map_err where the other variants are used. We can add more implementations of From if we get to a point where map_err is unwieldy. Otherwise, I like using it because it's a lot more explicit in terms of what actual error gets returned from a method.

lauralt · 2020-12-21T08:00:16Z

+    pub fn process_tap(&mut self) -> result::Result<(), Error> {
+        loop {
+            if self.rxbuf_current == 0 {
+                match self.tap.read(&mut self.rxbuf) {


I'm not quite sure what happens if we read some data from tap here, but don't have a descriptor chain available for writing to its buffer. Is that data lost? Should we do this read only if we're sure there is at least one descriptor chain?

The tap event is registered as edged triggered, which means that it'll fire when something causes more data to be available on the tap, but only once for each such occurrence (so it will not fire continuously while any data is available). Moreover, we read the frame first in self.rxbuf (if it's empty), otherwise we attempt to write the frame that was already present in self.rxbuf to the guest instead of reading from the tap.

If we cannot write to the guest, nothing is lost, because the current frame remains in self.rxbuf. In time, the tap itself can start losing frames that arrive if we cannot read the rest quick enough. Also, it's worth noting that losing frames is acceptable for network devices (whether or not it's preferable to buffering is a bigger discussion).

lauralt · 2020-12-21T08:02:45Z

+                break;
+            }
+
+            let len = cmp::min(left, desc.len() as usize);


What happens if len field from the descriptor is always 0 and we don't write any data to memory (count = 0 at the end of this loop)? Does it still make sense to add an entry in the used ring when no data was actually transferred?

It does, because even if we didn't write anything, we still consumed a descriptor chain from the available ring, and now we have to return it to the used ring. The driver is expected to handle stranger cases like the one you mentioned (and also the standard does not prohibit this kind of situation).

lauralt · 2020-12-21T08:08:20Z

+            while let Some(mut chain) = self.txq.iter()?.next() {
+                let len = self.send_frame_from_chain(&mut chain)?;
+
+                self.txq.add_used(chain.head_index(), len)?;


len should be replaced with 0 I think.

That's right!

lauralt · 2020-12-21T08:08:52Z

+            count += len;
+        }
+
+        self.tap.write(&self.txbuf[..count]).map_err(Error::Tap)?;


How should we treat the case when less than count bytes could've been written to the tap device? I guess we can add at least a (TODO) comment about this :-?.

As far as I know, the implementation of a tap interface will never cause partial reads/writes, so (in this context) every write is equivalent to a write_all. This is something to keep in mind when we start defining a more generic abstraction.

lauralt · 2020-12-21T08:18:37Z

        // We transiently define a mut `Cmdline` object here to pass for device creation
        // (devices expect a `&mut Cmdline` to leverage the newly added virtio_mmio
        // functionality), until we figure out how this fits with the rest of the vmm, which
        // apparently does not explicitly use `Cmdline` structs. Will discuss and fix this


It would be nice if we could use Cmdline everywhere (hopefully, we will avoid this way adding spaces in strings like here, but didn't look very much into how is Cmdline implemented).

Yes that's def a thread we should continue. Created #98 to track it.

alexandruag

Hi all, I've updated the PR, answered the previous comments, and opened some new issues. Would be useful to merge the current iteration, so we can add some more consolidation logic and device simplification on top of these changes. Here are some of the current concerns:

The unit tests for the Tap struct taken from Firecracker/crosvm require elevated privileges to run, as you have already noticed. Given that we'd like Tap to be a separate abstraction (#94), should we just consider the current implementation essentially immutable after we've brought it over and not include those tests as well, or should we temporarily run the tests with more privileges (if they are not so already)?
Not sure how creating and tearing down TAP interfaces should exist within the current integration testing framework (as it's required for integration testing). Does anyone have a good starting idea here? Should we create an issue and solve that separately going forward?

alexandruag · 2021-03-03T10:23:54Z

+    pub fn cmdline_string(&self) -> String {
+        let mut s = String::new();
+        if self.root_device {
+            s.push_str("root=/dev/vda");


Thats good to know; for now this is used as a String here, and I think there is still an open discussion/todo regarding the most seamless way to leverage Cmdline everywhere.

alexandruag · 2021-03-03T10:34:16Z

+    pub fn cmdline_string(&self) -> String {
+        let mut s = String::new();
+        if self.root_device {
+            s.push_str("root=/dev/vda");


Looks like currently we can only have one block device in the vmm config. Created #97 to not forget about this.

alexandruag · 2021-03-03T12:02:15Z

+    pub fn process_tap(&mut self) -> result::Result<(), Error> {
+        loop {
+            if self.rxbuf_current == 0 {
+                match self.tap.read(&mut self.rxbuf) {


The tap event is registered as edged triggered, which means that it'll fire when something causes more data to be available on the tap, but only once for each such occurrence (so it will not fire continuously while any data is available). Moreover, we read the frame first in self.rxbuf (if it's empty), otherwise we attempt to write the frame that was already present in self.rxbuf to the guest instead of reading from the tap.

If we cannot write to the guest, nothing is lost, because the current frame remains in self.rxbuf. In time, the tap itself can start losing frames that arrive if we cannot read the rest quick enough. Also, it's worth noting that losing frames is acceptable for network devices (whether or not it's preferable to buffering is a bigger discussion).

alexandruag · 2021-03-03T12:10:31Z

+            count += len;
+        }
+
+        self.tap.write(&self.txbuf[..count]).map_err(Error::Tap)?;


As far as I know, the implementation of a tap interface will never cause partial reads/writes, so (in this context) every write is equivalent to a write_all. This is something to keep in mind when we start defining a more generic abstraction.

alexandruag · 2021-03-03T12:11:10Z

+            while let Some(mut chain) = self.txq.iter()?.next() {
+                let len = self.send_frame_from_chain(&mut chain)?;
+
+                self.txq.add_used(chain.head_index(), len)?;


That's right!

alexandruag · 2021-03-03T12:17:35Z

        // We transiently define a mut `Cmdline` object here to pass for device creation
        // (devices expect a `&mut Cmdline` to leverage the newly added virtio_mmio
        // functionality), until we figure out how this fits with the rest of the vmm, which
        // apparently does not explicitly use `Cmdline` structs. Will discuss and fix this


Yes that's def a thread we should continue. Created #98 to track it.

Signed-off-by: Alexandru Agache <aagch@amazon.com>

alexandruag · 2021-03-03T18:16:55Z

Looks like the tap tests run ok on the CI system; in that case the current disadvantage is that ppl running locally have to explicitly provide the required privileges so they can pass. I lowered the coverage due to not having sufficient unit tests as part of the net functionality (the queue handlers in particular). Are you ok with tracking that as a separate issue while things start falling into place more within vm-virtio (and some of the device logic gets consolidated)?

andreeaflorescu · 2021-03-04T10:37:25Z

Hi all, I've updated the PR, answered the previous comments, and opened some new issues. Would be useful to merge the current iteration, so we can add some more consolidation logic and device simplification on top of these changes. Here are some of the current concerns:

The unit tests for the Tap struct taken from Firecracker/crosvm require elevated privileges to run, as you have already noticed. Given that we'd like Tap to be a separate abstraction (Tap abstractions #94), should we just consider the current implementation essentially immutable after we've brought it over and not include those tests as well, or should we temporarily run the tests with more privileges (if they are not so already)?

I think we are more likely to write tests for uncover lines than to come back and re-write the tests such that they do not require sudo (at least this seems to be the case for past experiences). I would prefer us to be in a place where tests can run without polluting the environment.

Not sure how creating and tearing down TAP interfaces should exist within the current integration testing framework (as it's required for integration testing). Does anyone have a good starting idea here? Should we create an issue and solve that separately going forward?

We should add an issue for this. Ideally tests will not leave tap interfaces even when the run fails.

alxiord

LGTM. The tests pass on CI because they already run privileged there; I think we shouldn't invest more in fixing the privilege problem until #94 is done, and I prefer missing tests to overly complex ones - at least until some of the complexity goes away.

Signed-off-by: Alexandru Agache <aagch@amazon.com>

alexandruag · 2021-03-05T14:41:13Z

I removed the Tap tests that required special permissions, and I'll add any missing issues for outstanding work items later today. Please have another look, and thank you!

alexandruag mentioned this pull request Dec 15, 2020

[RFC] Net device proposal #33

Closed

alxiord reviewed Dec 15, 2020

View reviewed changes

lauralt reviewed Dec 21, 2020

View reviewed changes

alexandruag mentioned this pull request Dec 30, 2020

Add vsock device support #52

Closed

lauralt mentioned this pull request Feb 15, 2021

Improve block device CI test #47

Closed

alexandruag commented Mar 3, 2021

View reviewed changes

alexandruag force-pushed the net_pr branch 4 times, most recently from 457a08f to ee66a26 Compare March 3, 2021 17:12

refactor common code

4f949bb

Signed-off-by: Alexandru Agache <aagch@amazon.com>

alexandruag force-pushed the net_pr branch 3 times, most recently from 85659d8 to d0258ce Compare March 3, 2021 17:35

alxiord previously approved these changes Mar 5, 2021

View reviewed changes

alexandruag dismissed alxiord’s stale review via aae0019 March 5, 2021 13:46

alexandruag force-pushed the net_pr branch 2 times, most recently from aae0019 to cae0937 Compare March 5, 2021 14:26

add net support

4f8c596

Signed-off-by: Alexandru Agache <aagch@amazon.com>

alexandruag force-pushed the net_pr branch from cae0937 to 4f8c596 Compare March 5, 2021 14:32

lauralt approved these changes Mar 5, 2021

View reviewed changes

alxiord approved these changes Mar 5, 2021

View reviewed changes

alxiord merged commit 4a03e4f into rust-vmm:master Mar 5, 2021

alexandruag mentioned this pull request Mar 11, 2021

Add CI support for testing VMs with net devices #107

Open

This was referenced Sep 28, 2024

virtio-device: extend VirtioDeviceActions Trait rust-vmm/vm-virtio#291

Open

feat(core/remio): add Remote I/O infrastructure bao-project/bao-hypervisor#176

Merged

	.expect("Failed to remove rx ioevent");
	.expect("Failed to remove tx ioevent");

Conversation

alexandruag commented Dec 15, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lauralt left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexandruag left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!