Skip to content

ide: ignore non-DMA commands in enlightened INT13 path#3040

Open
juantian8seattle wants to merge 2 commits intomicrosoft:mainfrom
juantian8seattle:user/juantian/ide-enlightened-dma-fix
Open

ide: ignore non-DMA commands in enlightened INT13 path#3040
juantian8seattle wants to merge 2 commits intomicrosoft:mainfrom
juantian8seattle:user/juantian/ide-enlightened-dma-fix

Conversation

@juantian8seattle
Copy link
Copy Markdown
Contributor

Fixes #3039

enlightened_hdd_command() does not validate the command byte from guest memory. Non-DMA commands (like READ_SECTORS) produce a PIO buffer that DMA can not drain, so the deferred write completion check never passes and we poll forever.

Only allow DMA commands through this path. Anything else gets ignored with a warning, same as the other early-exit cases (drive error, command pending, bad GPA, etc).

Includes a regression test that reproduces the hang by sending READ_SECTORS via the enlightened port.

@juantian8seattle juantian8seattle marked this pull request as ready for review March 18, 2026 06:29
@juantian8seattle juantian8seattle requested review from a team as code owners March 18, 2026 06:29
Copilot AI review requested due to automatic review settings March 18, 2026 06:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a hang in the IDE device’s enlightened INT13 fast path by validating the guest-provided IDE command and rejecting non-DMA commands that would otherwise leave the device in a state where the deferred write never completes.

Changes:

  • Add a DMA-command allowlist in enlightened_hdd_command() and ignore non-DMA commands with a rate-limited warning.
  • Adjust command handling to use the validated command value throughout the enlightened HDD path.
  • Add a regression test that reproduces the prior hang by issuing a PIO command (READ_SECTORS) via the enlightened port.
Comments suppressed due to low confidence (1)

vm/devices/storage/ide/src/lib.rs:471

  • WRITE_DMA_FUA_EXT is included in the DMA allowlist, but the 48-bit LBA setup only runs for READ_DMA_EXT/WRITE_DMA_EXT. WRITE_DMA_FUA_EXT is also a 48-bit command, so skipping the high LBA/sector-count register writes can make the device use stale high bytes and access the wrong LBA. Either handle it in the 48-bit setup path (same register sequence as other *_EXT commands) or remove it from the allowlist if it isn’t supported here.
        if cmd == IdeCommand::READ_DMA_EXT || cmd == IdeCommand::WRITE_DMA_EXT {
            // 48-bit LBA, high 24 bits of logical block address
            self.write_drive_register(
                DriveRegister::LbaLow,
                (eint13_cmd.lba_low >> 24) as u8,
                bus_master_state,
            );

            self.write_drive_register(
                DriveRegister::LbaMid,
                eint13_cmd.lba_high as u8,
                bus_master_state,
            );

            self.write_drive_register(
                DriveRegister::LbaHigh,
                (eint13_cmd.lba_high >> 8) as u8,
                bus_master_state,
            );

            // 48-bit LBA, high 8 bits of sector count
            // Write the low-byte of the sector count
            self.write_drive_register(
                DriveRegister::SectorCount,
                (eint13_cmd.block_count >> 8) as u8,
                bus_master_state,
            );

You can also share your feedback on Copilot code review. Take the survey.

Comment thread vm/devices/storage/ide/src/lib.rs Outdated
Comment on lines +3018 to +3019
for _ in 0..MAX_POLLS {
let mut cx = std::task::Context::from_waker(std::task::Waker::noop());
Comment thread vm/devices/storage/ide/src/lib.rs Outdated
Comment on lines +3012 to +3028
// After fix: non-DMA commands are rejected early
}
IoResult::Defer(mut deferred) => {
// Poll a bounded number of times. With the bug present,
// this never completes (DRQ stays set after IO finishes).
const MAX_POLLS: usize = 1024;
for _ in 0..MAX_POLLS {
let mut cx = std::task::Context::from_waker(std::task::Waker::noop());
ide_device.poll_device(&mut cx);
if let Poll::Ready(result) = deferred.poll_write(&mut cx) {
result.unwrap();
return;
}
}
panic!(
"non-DMA command (READ_SECTORS) via enlightened path \
didn't complete after {MAX_POLLS} polls -- deferred write stuck"
@github-actions
Copy link
Copy Markdown

chris-oo
chris-oo previously approved these changes Mar 18, 2026
Copy link
Copy Markdown
Member

@chris-oo chris-oo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but not sure if you want someone else on the storage side to review.

Was this a real issue we saw in production?

@smalis-msft
Copy link
Copy Markdown
Contributor

smalis-msft commented Mar 18, 2026

Additionally is this something the fuzzer could've caught?

Copilot AI review requested due to automatic review settings March 18, 2026 21:44
@juantian8seattle juantian8seattle force-pushed the user/juantian/ide-enlightened-dma-fix branch from 277c096 to 2803cc2 Compare March 18, 2026 21:44
@mattkur
Copy link
Copy Markdown
Contributor

mattkur commented Mar 18, 2026

@SrinivasShekar, would you take a look? He's the current IDE device expert, so let's get his review before signing off. Thanks!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a hang in the IDE device’s enlightened INT13 fast path by validating the guest-supplied IDE command and rejecting non-DMA commands that would otherwise leave the device stuck with DRQ asserted.

Changes:

  • Add a DMA-command allowlist to enlightened_hdd_command() and early-exit (with a rate-limited warning) for non-DMA commands.
  • Add a regression test that reproduces the historical hang by issuing READ_SECTORS via the enlightened port.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread vm/devices/storage/ide/src/lib.rs
Comment thread vm/devices/storage/ide/src/lib.rs Outdated
Comment on lines +3012 to +3031
// After fix: non-DMA commands are rejected early
}
IoResult::Defer(mut deferred) => {
// Poll a bounded number of times. With the bug present,
// this never completes (DRQ stays set after IO finishes).
const MAX_POLLS: usize = 1024;
for _ in 0..MAX_POLLS {
let mut cx = std::task::Context::from_waker(std::task::Waker::noop());
ide_device.poll_device(&mut cx);
if let Poll::Ready(result) = deferred.poll_write(&mut cx) {
result.unwrap();
return;
}
}
panic!(
"non-DMA command (READ_SECTORS) via enlightened path \
didn't complete after {MAX_POLLS} polls -- deferred write stuck"
);
}
IoResult::Err(e) => panic!("unexpected error: {e:?}"),
@juantian8seattle
Copy link
Copy Markdown
Contributor Author

Addressing review comments:

@chris-oo: Found via OneFuzzer (fuzz_ide target), not a production incident. Defense-in-depth fix for a guest-triggerable emulator thread hang.

@smalis-msft: Yep, the fuzzer did catch it -- there are 4 OneFuzzer bugs for exactly this crash.

WRITE_DMA_FUA_EXT 48-bit path gap (Copilot review): Good catch, but this is a pre-existing bug on main, not introduced by this PR. Filed as #3061 with a separate fix incoming.

Test tightened per review feedback: the test now asserts IoResult::Ok specifically instead of accepting both Ok and Defer. Pushed as a new commit (f3eda27).

@SrinivasShekar
Copy link
Copy Markdown
Contributor

SrinivasShekar commented Mar 19, 2026

LGTM @juantian8seattle . Thanks for the fix.

@juantian8seattle juantian8seattle requested a review from a team March 19, 2026 16:18
@juantian8seattle
Copy link
Copy Markdown
Contributor Author

@mattkur -- Srinivas reviewed and LGTM'd. Mind taking another look so we can get this merged? Thanks!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a hang in the IDE “enlightened INT13” HDD fast-path by rejecting guest-provided non-DMA commands that would otherwise leave DRQ set and prevent deferred completion.

Changes:

  • Add a DMA-command allowlist to enlightened_hdd_command() and early-return IoResult::Ok with a rate-limited warning for non-DMA commands.
  • Add an async regression test that sends READ_SECTORS via the enlightened port and asserts it returns Ok (not Defer).

Comment on lines +2967 to +2968
// Set up a PRD table (the enlightened path always writes it,
// even though READ_SECTORS won't use it)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says the enlightened path “always writes” the PRD table, but after the new early-return allowlist the function returns before programming the bus-master PRD for non-DMA commands. Please update the comment to reflect that the PRD table is only used/programmed for allowed DMA commands (or remove the claim).

Suggested change
// Set up a PRD table (the enlightened path always writes it,
// even though READ_SECTORS won't use it)
// Set up a PRD table as a guest would for DMA commands. The enlightened
// path only uses/programs the PRD table for allowed DMA commands, so
// this READ_SECTORS request will be rejected early and won't use it.

Copilot uses AI. Check for mistakes.
.unwrap();

// READ_SECTORS (0x20) is a PIO read command. The enlightened path
// is designed for DMA commands only (READ_DMA_EXT, WRITE_DMA_EXT).
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test comment says the enlightened path is designed for DMA commands only “(READ_DMA_EXT, WRITE_DMA_EXT)”, but the allowlist also permits 28-bit DMA and ALT variants (READ_DMA/READ_DMA_ALT/WRITE_DMA/WRITE_DMA_ALT). Consider updating the comment to avoid implying only the EXT commands are supported.

Suggested change
// is designed for DMA commands only (READ_DMA_EXT, WRITE_DMA_EXT).
// is intended for DMA commands (for example, READ_DMA_EXT, WRITE_DMA_EXT,
// and other DMA/ALT variants), not PIO commands like READ_SECTORS.

Copilot uses AI. Check for mistakes.
@mattkur
Copy link
Copy Markdown
Contributor

mattkur commented Mar 25, 2026

@mattkur -- Srinivas reviewed and LGTM'd. Mind taking another look so we can get this merged? Thanks!

Apologies for the slow reply here. I took a look and I think it looks good. Mind doing two things?

  1. Please take a look at the latest copilot review feedback. Do you agree?
  2. I spent some time convincing myself that this change is not needed on the enlightened CD path. Can you confirm the same?

@juantian8seattle
Copy link
Copy Markdown
Contributor Author

@mattkur — Thanks for the review. Two items addressed:

1. Copilot feedback: Agreed with both comment suggestions. Updated the PRD table comment and broadened the DMA commands list to include 28-bit/ALT variants. Pushed as a new commit.

2. CD path: Confirmed — this fix is not needed on the CD path. \�nlightened_cd_command()\ hard-codes \PACKET_COMMAND\ and constructs the SCSI READ CDB internally, so the bug pattern from the HDD path cannot occur there.

mattkur
mattkur previously approved these changes Mar 25, 2026
juantian added 2 commits April 3, 2026 21:47
enlightened_hdd_command() does not validate the command byte from
guest memory. Non-DMA commands (like READ_SECTORS) produce a PIO
buffer that DMA can not drain, so the deferred write completion
check never passes and we poll forever.

Only allow DMA commands through this path. Anything else gets
ignored with a warning, same as the other early-exit cases
(drive error, command pending, bad GPA, etc).

Add a regression test that reproduces the hang by sending
READ_SECTORS via the enlightened port.
The test previously accepted both IoResult::Ok and IoResult::Defer.
Since the fix rejects non-DMA commands early, assert Ok specifically
so a future regression that re-accepts non-DMA commands gets caught.

Also removes unused Poll import (clippy unused-qualifications).
Copilot AI review requested due to automatic review settings April 3, 2026 22:35
@juantian8seattle juantian8seattle force-pushed the user/juantian/ide-enlightened-dma-fix branch from bce2535 to fdaa820 Compare April 3, 2026 22:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment thread vm/devices/storage/ide/src/lib.rs
Comment thread vm/devices/storage/ide/src/lib.rs
Comment thread vm/devices/storage/ide/src/lib.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ide: enlightened INT13 path hangs forever on non-DMA commands

7 participants