Skip to content

Add support for virtio device reset#5891

Open
ilstam wants to merge 25 commits into
firecracker-microvm:mainfrom
ilstam:device-reset
Open

Add support for virtio device reset#5891
ilstam wants to merge 25 commits into
firecracker-microvm:mainfrom
ilstam:device-reset

Conversation

@ilstam
Copy link
Copy Markdown
Contributor

@ilstam ilstam commented May 15, 2026

Add support for virtio device reset

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

ilstam added 25 commits May 15, 2026 17:35
The reset() trait method returns Option<(Arc<dyn VirtioInterrupt>,
Vec<EventFd>)> for no apparent reason and the values are never used by
any caller. Change the return value to a bool to signify success or
failure.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Simplify the MMIO transport reset logic by always performing the MMIO
transport reset unconditionally and calling device.reset() regardless of
activation state.

This requires updating the affected unit tests and implementing reset()
for the DummyDevice. The unit test assumed that reset() is never
supported (i.e. always returns false), but that is going to change soon
so we want to make sure that resetting moves the device to the
deactivated state.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
The reset success path locked self.device, then tried to lock it again
via self.virtio_device().lock() to reset queues. This was a deadlock
that was never triggered because no device previously implemented
reset() (all returned false). Use the already-held guard instead.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Don't clear virtio_interrupt on reset. The interrupt object holds Arc
references to the PCI device's MSI-X configuration which remains valid
across reset. Clearing it will cause a panic at the unwrap() in the
activation path when the guest re-probes the device after reset.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reset device_feature_select and driver_feature_select to 0 on device
reset, matching what the MMIO transport does with features_select and
acked_features_select.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add test_bus_device_reset_failure to verify that when device.reset()
returns false, the FAILED bit is set in device_status.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a deactivate() method that sets the device state to Inactive.
This will be used by the generic reset implementation in a subsequent
patch.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Move the queue reset logic from the MMIO and PCI transport code into
the default reset() implementation in the VirtioDevice trait. This is
generic virtio state that should be reset for all devices, regardless
of transport.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Complete the reset() implementation by deactivating the device and
setting acked_features to 0. This must happen for all virtio backends.

Introduce a _reset() method that is called at the end of reset() and can
be overridden by virtio backends with backend specific reset code. The
default _reset() returns false, indicating the backend does not support
reset.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a clear() method to RxBuffers that resets all fields to their
initial state. This will be used by the virtio-net reset implementation.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Implement the reset() method for the virtio-net device, resetting the
net specific state in-place.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-net device reset works end-to-end by
unbinding and rebinding the guest driver.

Suggested-by: Adam Jensen <adam@acj.sh>
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Implement the reset() method for the virtio-block device, resetting the
device-specific state in-place. This only adds support for the Virtio
backend for now and not VhostUser.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-block device reset works end-to-end by
unbinding and rebinding the guest driver for a scratch block device.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Pmem does not have any backend specific state that needs resetting so
implement the method by simply returning true. The rest of the state is
handled by the generic reset().

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-pmem device reset works end-to-end by
unbinding and rebinding the guest driver.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Implement the reset() method for the virtio-balloon device, resetting
the balloon specific state in-place. Do not deflate the balloon on
reset.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-balloon device reset works end-to-end
by unbinding and rebinding the guest driver.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
The entropy device does not have any backend specific state that needs
resetting so implement the method by simply returning true.

The test_failed_reset_blocks_reinitialization() test used an entropy
device assuming it doesn't support reset. Since it now does, adjust the
test to check that the device is first deactivated after a reset and
then the driver can perform the initialization sequence again.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-rng device reset works end-to-end by
unbinding and rebinding the guest driver.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Implement the reset() method for the virtio-vsock device, resetting the
vsock specific state in-place.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-vsock device reset works end-to-end by
unbinding and rebinding the guest driver.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
The virtio-mem device does not have any backend specific state that
needs resetting so implement the method by simply returning true.
Plugged memory regions remain intact.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-mem device reset works end-to-end by
unbinding and rebinding the guest driver.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5891

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
@ilstam ilstam requested review from Manciukic and micz010 as code owners May 15, 2026 16:40
@codecov
Copy link
Copy Markdown

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 41.23711% with 57 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.77%. Comparing base (9b53e62) to head (e3917dd).

Files with missing lines Patch % Lines
src/vmm/src/devices/virtio/block/device.rs 0.00% 12 Missing ⚠️
src/vmm/src/devices/virtio/balloon/device.rs 0.00% 8 Missing ⚠️
src/vmm/src/devices/virtio/vsock/device.rs 0.00% 8 Missing ⚠️
src/vmm/src/devices/virtio/vsock/packet.rs 0.00% 8 Missing ⚠️
src/vmm/src/devices/virtio/mem/device.rs 0.00% 6 Missing ⚠️
src/vmm/src/devices/virtio/pmem/device.rs 0.00% 6 Missing ⚠️
.../vmm/src/devices/virtio/block/vhost_user/device.rs 0.00% 3 Missing ⚠️
src/vmm/src/devices/virtio/block/virtio/device.rs 0.00% 3 Missing ⚠️
src/vmm/src/devices/virtio/device.rs 80.00% 2 Missing ⚠️
src/vmm/src/devices/virtio/transport/pci/device.rs 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5891      +/-   ##
==========================================
- Coverage   82.90%   82.77%   -0.13%     
==========================================
  Files         277      277              
  Lines       30053    30115      +62     
==========================================
+ Hits        24914    24929      +15     
- Misses       5139     5186      +47     
Flag Coverage Δ
5.10-m5n.metal 83.07% <41.23%> (-0.13%) ⬇️
5.10-m6a.metal 82.40% <41.23%> (-0.14%) ⬇️
5.10-m6g.metal 79.75% <41.23%> (-0.09%) ⬇️
5.10-m6i.metal 83.06% <41.23%> (-0.14%) ⬇️
5.10-m7a.metal-48xl 82.39% <41.23%> (-0.14%) ⬇️
5.10-m7g.metal 79.75% <41.23%> (-0.09%) ⬇️
5.10-m7i.metal-24xl 83.03% <41.23%> (-0.14%) ⬇️
5.10-m7i.metal-48xl 83.04% <41.23%> (-0.14%) ⬇️
5.10-m8g.metal-24xl 79.75% <41.23%> (-0.08%) ⬇️
5.10-m8g.metal-48xl 79.75% <41.23%> (-0.09%) ⬇️
5.10-m8i.metal-48xl 83.04% <41.23%> (-0.14%) ⬇️
5.10-m8i.metal-96xl 83.04% <41.23%> (-0.14%) ⬇️
6.1-m5n.metal 83.09% <41.23%> (-0.14%) ⬇️
6.1-m6a.metal 82.42% <41.23%> (-0.14%) ⬇️
6.1-m6g.metal 79.75% <41.23%> (-0.09%) ⬇️
6.1-m6i.metal 83.08% <41.23%> (-0.14%) ⬇️
6.1-m7a.metal-48xl 82.41% <41.23%> (-0.14%) ⬇️
6.1-m7g.metal 79.75% <41.23%> (-0.09%) ⬇️
6.1-m7i.metal-24xl 83.10% <41.23%> (-0.13%) ⬇️
6.1-m7i.metal-48xl 83.10% <41.23%> (-0.14%) ⬇️
6.1-m8g.metal-24xl 79.75% <41.23%> (-0.09%) ⬇️
6.1-m8g.metal-48xl 79.75% <41.23%> (-0.08%) ⬇️
6.1-m8i.metal-48xl 83.10% <41.23%> (-0.14%) ⬇️
6.1-m8i.metal-96xl 83.10% <41.23%> (-0.14%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread src/vmm/src/devices/virtio/transport/pci/device.rs Outdated
b.is_io_engine_throttled = false;
true
}
Self::VhostUser(_) => false,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could leave a comment about this being unsupported

Comment on lines +387 to +391
fn _reset(&mut self) -> bool {
self.rx_packet.clear();
self.tx_packet.clear();
true
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there some backend state to reset? what about the conn tracking etc? We should probably drop connections, etc.

Also, cc @JackThomson2 as this may be a subtle conflict with your PR (depending on who merges first haha)

Comment on lines +397 to +401
# Rebind and verify it's back.
vm.ssh.check_output(
f"echo {virtio_dev} > /sys/bus/virtio/drivers/vmw_vsock_virtio_transport/bind"
)
vm.ssh.check_output("ls /sys/bus/virtio/drivers/vmw_vsock_virtio_transport/virtio*")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's check the device still woks

Comment on lines +663 to +666
fn _reset(&mut self) -> bool {
true
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may be confusing. We could point to the spec:

The device MUST NOT change the state of memory blocks during device reset
The device MUST NOT modify memory or memory properties of plugged memory blocks during device reset. 

for queue in self.queues_mut() {
*queue = Queue::new(queue.max_size);
}
self._reset()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be called before the rest? This is a semantic change from the old code where a failed reset left the device activated.

self.device_state = DeviceState::Inactive;
}

fn _reset(&mut self) -> bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we disarm the stats timer?

self.device_state.is_activated()
}

fn deactivate(&mut self) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the rate limiters? are we assuming we want to preserve the state across reset?

vm.ssh.check_output(
f"echo {virtio_dev} > /sys/bus/virtio/drivers/virtio_balloon/bind"
)
vm.ssh.check_output("ls /sys/bus/virtio/drivers/virtio_balloon/virtio*")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check device is functional

Comment thread tests/integration_tests/functional/test_pmem.py
Comment thread src/vmm/src/devices/virtio/device.rs
Comment thread src/vmm/src/devices/virtio/block/device.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants