Skip to content

fix(riscv): set FP state before restore or clear#28

Merged
AsakuraMizu merged 3 commits intoarceos-org:devfrom
LetsWalkInLine:fix/rustsbi-boot-failure
Mar 23, 2026
Merged

fix(riscv): set FP state before restore or clear#28
AsakuraMizu merged 3 commits intoarceos-org:devfrom
LetsWalkInLine:fix/rustsbi-boot-failure

Conversation

@LetsWalkInLine
Copy link
Copy Markdown
Contributor

Background

When booting downstream StarryOS on riscv64 with RustSBI, the kernel could panic with:

Root Cause

In axcpu RISC-V fp-simd context switching, FP instructions can be executed in restore/clear paths while sstatus.FS is still Off on some boot paths/harts.
When FS=Off, executing FP instructions (e.g. fld / fmv.d.x) traps as IllegalInstruction.
So the issue is not OS entry order itself, but a missing local precondition in the axcpu FP context switch path.

A common trigger is switching from a task that does not use FP (FS=Off) to one that needs FP state restore/initialization, where FP instructions in the switch path can trap before the next task actually runs.

Changes

This PR fixes the issue at the root-cause location in axcpu:

  1. src/riscv/context.rs
    • Add a small helper:
      • FpState::enable_fp_instructions()
    • In FpState::switch_to():
      • Before next_fp_state.restore() (FS::Clean branch), explicitly enable FP execution.
      • Before FpState::clear() (FS::Initial branch), explicitly enable FP execution.
    • Keep existing post-switch behavior unchanged:
      • still sets sstatus.FS to next_fp_state.fs at the end.

Validation Performed

  • Local:
    • cargo fmt --all -- --check passes.
  • Behavioral/functional validation is from downstream reproduction:
    • With this fix concept applied, RustSBI boot no longer hits the IllegalInstruction trap in fp-simd switching path.

Impact

  • Makes RISC-V fp-simd context switching robust when firmware leaves sstatus.FS=Off.

Files Included In This PR

  • src/riscv/context.rs

I’m still learning kernel development, so feedback is very welcome.
If any part of this fix should be done differently, I’m happy to revise.

On some boot paths/firmware, sstatus.FS may remain Off before task
switch.
In that state, fp-simd restore/clear can execute FP instructions and
trap
with IllegalInstruction.
Fix this in the fp context switch path by explicitly enabling FP
execution
before restore/clear, then keep existing behavior of setting FS to
next_fp_state.fs after the switch.
This keeps the fix in axcpu (root cause location) and avoids relying on
OS-specific early init ordering.
Copilot AI review requested due to automatic review settings March 21, 2026 06:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens RISC-V FP/SIMD context switching by ensuring FP instructions won’t trap when restoring/clearing FP state during FpState::switch_to(), addressing firmware-dependent cases where sstatus.FS can be Off.

Changes:

  • Add FpState::enable_fp_instructions() helper to ensure FP instructions are enabled before FP register operations.
  • Call the helper before restore() (when FS::Clean) and before clear() (when FS::Initial) in FpState::switch_to().
  • Preserve existing behavior of setting sstatus.FS to next_fp_state.fs at the end of the switch.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/riscv/context.rs Outdated
/// `Off`, where FP instructions trap as `IllegalInstruction`
#[inline]
fn enable_fp_instructions() {
unsafe { sstatus::set_fs(FS::Dirty) };
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enable_fp_instructions() currently sets sstatus.FS to Dirty. Since the goal is only to permit executing FP instructions during restore/clear, setting FS::Initial (or setting FS to the target state you’re about to restore, e.g. Clean/Initial) is sufficient and avoids briefly marking the hart’s FP state as dirty. This matches existing usage (e.g. UserContext::new sets FS::Initial in src/riscv/uspace.rs).

Suggested change
unsafe { sstatus::set_fs(FS::Dirty) };
unsafe { sstatus::set_fs(FS::Initial) };

Copilot uses AI. Check for mistakes.
Comment thread src/riscv/context.rs Outdated
/// Enables FP/SIMD instructions before issuing FP register operations.
///
/// On some boot paths (e.g., with different firmware), `sstatus.FS` may be
/// `Off`, where FP instructions trap as `IllegalInstruction`
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc comment: the sentence ending with IllegalInstruction is missing terminal punctuation; consider ending it with a period for consistency with surrounding docs.

Suggested change
/// `Off`, where FP instructions trap as `IllegalInstruction`
/// `Off`, where FP instructions trap as `IllegalInstruction`.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/riscv/context.rs Outdated
Comment on lines +114 to +117
FS::Initial => {
FpState::enable_fp_instructions();
FpState::clear(); // restore the FP state as constant values(all 0)
}
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the FS::Initial path you clear FP registers but do not reset fcsr. That means a task with next_fp_state.fs == FS::Initial can inherit the previous task’s rounding mode/exception flags, so its first FP use may observe non-default fcsr despite the state being treated as “initial”. Consider explicitly resetting fcsr to 0 alongside the register clear (e.g., write fcsr/fscsr after clear() or make clear_fp_registers also clear fcsr).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

@equation314 equation314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all we need to do is move sstatus::set_fs (line 106) before the match block (line 100).

Move sstatus::set_fs(next_fp_state.fs) before the restore/clear match
so FP instructions are executable when firmware leaves FS=Off.
@LetsWalkInLine
Copy link
Copy Markdown
Contributor Author

I think all we need to do is move sstatus::set_fs (line 106) before the match block (line 100).

Applied, thanks for the suggestion, this is indeed the clearest way.,I moved sstatus::set_fs(next_fp_state.fs) before the match block and removed the helper, so restore/clear no longer run with FS=Off.

@AsakuraMizu
Copy link
Copy Markdown
Contributor

I think all we need to do is move sstatus::set_fs (line 106) before the match block (line 100).

Applied, thanks for the suggestion, this is indeed the clearest way.,I moved sstatus::set_fs(next_fp_state.fs) before the match block and removed the helper, so restore/clear no longer run with FS=Off.

Do you test it with RustSBI?

@LetsWalkInLine
Copy link
Copy Markdown
Contributor Author

I think all we need to do is move sstatus::set_fs (line 106) before the match block (line 100).

Applied, thanks for the suggestion, this is indeed the clearest way.,I moved sstatus::set_fs(next_fp_state.fs) before the match block and removed the helper, so restore/clear no longer run with FS=Off.

Do you test it with RustSBI?

Yep, I tested with RustSBI after applying this fix.
For validation, I placed my patched axcpu repository under the StarryOS workspace and configured StarryOS to use it via Cargo patching in Cargo.toml. I also kept the QEMU BIOS configurability changes (QEMU_BIOS in make/Makefile and make/qemu.mk) so I could switch firmware cleanly between OpenSBI/default BIOS and RustSBI.
Then I built and ran StarryOS in Docker, and successfully booted into the shell in both cases:

  • OpenSBI/default BIOS
  • RustSBI (rustsbi-prototyper-dynamic.bin)
    Current limitation: I have not validated this fix on real hardware/dev boards yet.

here is the log:

[RustSBI] INFO  - Hello RustSBI!
[RustSBI] INFO  - RustSBI version 0.4.0
[RustSBI] INFO  - .______       __    __      _______.___________.  _______..______   __
[RustSBI] INFO  - |   _  \     |  |  |  |    /       |           | /       ||   _  \ |  |
[RustSBI] INFO  - |  |_)  |    |  |  |  |   |   (----`---|  |----`|   (----`|  |_)  ||  |
[RustSBI] INFO  - |      /     |  |  |  |    \   \       |  |      \   \    |   _  < |  |
[RustSBI] INFO  - |  |\  \----.|  `--'  |.----)   |      |  |  .----)   |   |  |_)  ||  |
[RustSBI] INFO  - | _| `._____| \______/ |_______/       |__|  |_______/    |______/ |__|
[RustSBI] INFO  - Initializing RustSBI machine-mode environment.
[RustSBI] INFO  - Platform Name                 : riscv-virtio,qemu
[RustSBI] INFO  - Platform HART Count           : 1
[RustSBI] INFO  - Enabled HARTs                 : [0]
[RustSBI] INFO  - Platform IPI Extension        : SiFiveClint (Base Address: 0x2000000)
[RustSBI] INFO  - Platform Console Extension    : Uart16550U8 (Base Address: 0x10000000)
[RustSBI] INFO  - Platform Reset Extension      : Available (Base Address: 0x100000)
[RustSBI] INFO  - Platform HSM Extension        : Available
[RustSBI] INFO  - Platform RFence Extension     : Available
[RustSBI] INFO  - Platform SUSP Extension       : Available
[RustSBI] INFO  - Platform PMU Extension        : Available
[RustSBI] INFO  - Memory range                  : 0x80000000 - 0xc0000000
[RustSBI] INFO  - Platform Status               : Platform initialization complete and ready.
[RustSBI] INFO  - PMP Configuration
[RustSBI] INFO  - PMP   Range      Permission      Address
[RustSBI] INFO  - 0     OFF        NONE            0x0000000000000000
[RustSBI] INFO  - 1     TOR        RWX             0x0000000080000000
[RustSBI] INFO  - 2     TOR        RWX             0x0000000080000000
[RustSBI] INFO  - 3     TOR        R               0x0000000080026000
[RustSBI] INFO  - 4     TOR        NONE            0x0000000080035000
[RustSBI] INFO  - 5     TOR        RW              0x0000000080071000
[RustSBI] INFO  - 6     TOR        RWX             0x00000000c0000000
[RustSBI] INFO  - 7     TOR        RWX             0xfffffffffffffffc
[RustSBI] INFO  - Boot HART ID                  : 0
[RustSBI] INFO  - Boot HART Privileged Version: : Version1_12
[RustSBI] INFO  - Boot HART MHPM Mask:          : 0x07ffff
[RustSBI] INFO  - The patched dtb is located at 0x8005c000 with length 0x15b0.
[RustSBI] INFO  - Redirecting hart 0 to 0x00000080200000 in Supervisor mode.

       d8888                            .d88888b.   .d8888b.
      d88888                           d88P" "Y88b d88P  Y88b
     d88P888                           888     888 Y88b.
    d88P 888 888d888  .d8888b  .d88b.  888     888  "Y888b.
   d88P  888 888P"   d88P"    d8P  Y8b 888     888     "Y88b.
  d88P   888 888     888      88888888 888     888       "888
 d8888888888 888     Y88b.    Y8b.     Y88b. .d88P Y88b  d88P
d88P     888 888      "Y8888P  "Y8888   "Y88888P"   "Y8888P"

arch = riscv64
platform = riscv64-qemu-virt
target = riscv64gc-unknown-none-elf
build_mode = release
log_level = warn
backtrace = true
smp = 1

Boot at 2026-03-23 06:38:57.011093200 UTC

[  0.904536 0 axnet_ng:139]   No vsock device found!
[  0.907012 0 axdisplay:26]   No display device found!
Welcome to Starry OS!
SHLVL=1
HOME=/root
PWD=/

Use apk to install packages.

starry:~#

@equation314
Copy link
Copy Markdown
Member

@LetsWalkInLine Could you resubmit this PR to the main branch, which will be included in the v0.3.0 release?

@LetsWalkInLine
Copy link
Copy Markdown
Contributor Author

@LetsWalkInLine Could you resubmit this PR to the main branch, which will be included in the v0.3.0 release?

Sure, thanks for the guidance
I’ll resubmit this fix as a new PR targeting main and share the new PR link here once it’s up.

@AsakuraMizu AsakuraMizu changed the title fix(riscv): enable FP before fp-simd restore/clear in switch_to fix(riscv): set FP state before restore or clear Mar 23, 2026
@AsakuraMizu AsakuraMizu merged commit 79ea310 into arceos-org:dev Mar 23, 2026
6 checks passed
@LetsWalkInLine LetsWalkInLine deleted the fix/rustsbi-boot-failure branch March 23, 2026 13:08
@LetsWalkInLine LetsWalkInLine restored the fix/rustsbi-boot-failure branch March 23, 2026 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants