Test for-next (regular, SELF kvm) by kdave · Pull Request #1624 · btrfs/linux

kdave · 2026-03-05T18:37:18Z

Keep this open, the build tests are on self-hosted workers.

kdave · 2026-03-05T18:48:59Z

While debugging a relocation issue I hit an assertion in backref.c but it was not super useful, since it could not tell what was the unexpected value that triggered the assertion. The stack trace was this: [583246.338097] assertion failed: !cache->nr_nodes, in fs/btrfs/backref.c:3158 [583246.339588] ------------[ cut here ]------------ [583246.340573] kernel BUG at fs/btrfs/backref.c:3158! [583246.342075] Oops: invalid opcode: 0000 [#1] SMP PTI [583246.343294] CPU: 5 UID: 0 PID: 677957 Comm: btrfs Not tainted 7.1.0-rc4-btrfs-next-234+ #1 PREEMPT(full) [583246.345715] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [583246.348694] RIP: 0010:btrfs_backref_release_cache.cold+0x61/0x84 [btrfs] [583246.350759] Code: 90 d5 7c (...) [583246.354923] RSP: 0018:ffffd4fc88c93ad8 EFLAGS: 00010246 [583246.355982] RAX: 000000000000003e RBX: ffff8dec90d97020 RCX: 0000000000000000 [583246.357459] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000ffffffff [583246.359517] RBP: ffff8dec8eeb78c0 R08: 0000000000000000 R09: 3fffffffffefffff [583246.361180] R10: ffffd4fc88c93970 R11: 0000000000000003 R12: ffff8decd21f3470 [583246.363184] R13: 00000000fffffffe R14: ffff8decd21f3000 R15: ffff8decd21f3000 [583246.364666] FS: 00007f9a51751400(0000) GS:ffff8df3f4255000(0000) knlGS:0000000000000000 [583246.366287] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [583246.367443] CR2: 00007f9a518ed8f5 CR3: 00000004467c8002 CR4: 0000000000370ef0 [583246.368969] Call Trace: [583246.369541] <TASK> [583246.370040] relocate_block_group+0xf2/0x520 [btrfs] [583246.371243] btrfs_relocate_block_group+0x9a9/0x22e0 [btrfs] [583246.372443] ? preempt_count_add+0x47/0xa0 [583247.532978] ? btrfs_tree_read_lock_nested+0x19/0x90 [btrfs] [583247.534520] ? mutex_lock+0x1a/0x40 [583247.602233] ? 
btrfs_scrub_pause+0x2e/0x120 [btrfs] [583247.603543] btrfs_relocate_chunk+0x3b/0x1a0 [btrfs] [583247.604893] btrfs_balance+0x9d5/0x1920 [btrfs] [583247.606189] ? preempt_count_add+0x69/0xa0 [583247.607030] btrfs_ioctl+0x260c/0x2a20 [btrfs] [583247.608015] ? __memcg_slab_free_hook+0x156/0x1a0 [583247.636971] __x64_sys_ioctl+0x92/0xe0 [583247.679247] do_syscall_64+0x60/0xf20 [583247.753297] ? clear_bhb_loop+0x60/0xb0 [583247.756321] entry_SYSCALL_64_after_hwframe+0x76/0x7e [583247.787018] RIP: 0033:0x7f9a5186a8db [583247.787787] Code: 00 48 89 (...) [583247.791410] RSP: 002b:00007fff2ffa6ac0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [583247.792897] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f9a5186a8db [583247.794319] RDX: 00007fff2ffa6bb0 RSI: 00000000c4009420 RDI: 0000000000000003 [583247.795714] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [583247.797149] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff2ffa903f [583247.798685] R13: 00007fff2ffa6bb0 R14: 0000000000000002 R15: 0000000000000002 [583247.800136] </TASK> So update all simple assertions in backref.c to print out the values when they aren't testing simple boolean conditions. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
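The idea of printing the offending value together with the failed condition can be sketched in userspace C as below. This is only a model under stated assumptions: format_assert_failure() and its message layout are hypothetical names made up for illustration, not the kernel's ASSERT() implementation.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of a value-printing assertion message.
 * It formats "assertion failed: <expr>, in <file>:<line> (<values>)"
 * into buf and returns the message length, or -1 if it does not fit.
 */
static int format_assert_failure(char *buf, size_t len, const char *expr,
                                 const char *file, int line,
                                 const char *fmt, ...)
{
    va_list args;
    int n, m;

    n = snprintf(buf, len, "assertion failed: %s, in %s:%d (",
                 expr, file, line);
    if (n < 0 || (size_t)n >= len)
        return -1;

    /* Append the caller supplied values, e.g. "nr_nodes=%d". */
    va_start(args, fmt);
    m = vsnprintf(buf + n, len - n, fmt, args);
    va_end(args);
    if (m < 0 || (size_t)m >= len - n)
        return -1;
    n += m;

    m = snprintf(buf + n, len - n, ")");
    if (m < 0 || (size_t)m >= len - n)
        return -1;
    return n + m;
}
```

With such a helper the report for the assertion above would carry the unexpected nr_nodes value instead of only the expression text.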

@ret

[BUG]
The test case generic/362 will fail with "nodatasum" mount option (*):

  MOUNT_OPTIONS -- -o nodatasum /dev/mapper/test-scratch1 /mnt/scratch

  generic/362 0s ... - output mismatch (see /home/adam/xfstests/results//generic/362.out.bad)
      --- tests/generic/362.out 2024-08-24 15:31:37.200000000 +0930
      +++ /home/adam/xfstests/results//generic/362.out.bad 2026-05-27 10:21:17.574771567 +0930
      @@ -1,2 +1,3 @@
       QA output created by 362
      +First write failed: Input/output error
       Silence is golden
      ...

*: If the test case has been executed before with default data checksum, the failure will not reproduce. Need the following fix to make it reliably reproducible:

   https://lore.kernel.org/linux-btrfs/20260528111659.87113-1-wqu@suse.com/

[CAUSE]
Inside __iomap_dio_rw(), the -EFAULT/-ENOTBLK error is not directly returned. Thus we never get an error pointer from __iomap_dio_rw().

The call chain looks like this:

 btrfs_direct_write()
 |- btrfs_dio_write()
 |- __iomap_dio_rw()
 |  |- iomap_iter()
 |  |  |- btrfs_dio_iomap_begin()
 |  |     Now an ordered extent is allocated for the 4K write.
 |  |
 |  |- iomi.status = iomap_dio_iter()
 |  |  Where iomap_dio_iter() returned -EFAULT.
 |  |
 |  |- ret = iomap_iter()
 |  |  |- btrfs_dio_iomap_end()
 |  |  |  |- btrfs_finish_ordered_extent(uptodate = false)
 |  |  |  |  |- can_finish_ordered_extent()
 |  |  |  |     |- btrfs_mark_ordered_extent_error()
 |  |  |  |        |- mapping_set_error()
 |  |  |  |           Now the address space is marked error.
 |  |  |  |  return -ENOTBLK
 |  |  |- return -ENOTBLK
 |  |- if (ret == -ENOTBLK) { ret = 0; }
 |     Now the return value is reset to 0.
 |     Thus no error pointer will be returned.
 |
 |- ret = iomap_dio_complete()
 |  Since no byte is submitted, @ret is 0.
 |
 |- Fallback to buffered IO
 |  And the buffered write finished without error
 |
 |- filemap_fdatawait_range()
    |- filemap_check_errors()
       The previous error is recorded, thus an error is returned

However, the buffered write is properly submitted and finished; the error comes from the btrfs_finish_ordered_extent() call with @uptodate = false.

[FIX]
When a short dio write happens, btrfs_extract_ordered_extent() is called for every range that is submitted, so the submitted range always has an OE covering exactly that range.

The remaining OE range is never submitted, so it should be treated as truncated, not as an error. That way we can properly reclaim it without inserting an unnecessary file extent item and without marking the mapping as error.

Extract a helper, btrfs_mark_ordered_extent_truncated(), and use it to mark the direct IO ordered extent as truncated, so it won't cause a failure for the later buffered fallback.

[REASON FOR NO FIXES TAG]
The bug itself is pretty old: at commit f85781f ("btrfs: switch to iomap for direct IO") we were already passing @uptodate=false when finishing the OE. But at that time an OE with IOERR would not call mapping_set_error(), so the bug was not exposed.

Later commit d61bec0 ("btrfs: mark ordered extent and inode with error if we fail to finish") finally exposed the bug, but that commit is doing a correct job and is not the root cause.

Anyway the bug is very old, dating back to 5.1x days, thus only CC to stable.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
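The difference between finishing the unsubmitted part of an ordered extent as an error and finishing it as truncated can be modelled in a few lines of userspace C. This is a rough sketch only: the model_* structures and functions are hypothetical stand-ins, not the kernel's ordered extent code or the real btrfs_mark_ordered_extent_truncated() helper.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_mapping {
    int error;              /* sticky error later seen by fdatawait */
};

struct model_ordered_extent {
    uint64_t num_bytes;     /* length reserved for the write */
    uint64_t truncated_len; /* bytes that are really backed by IO */
    bool truncated;
    bool ioerr;
};

/*
 * Finish the part of an ordered extent that was never submitted.
 * treat_as_truncated == false models the old behaviour (mark an IO error,
 * which also poisons the mapping); true models the behaviour described in
 * the fix (record a truncation and leave the mapping alone).
 */
static void model_finish_unsubmitted_range(struct model_ordered_extent *oe,
                                           struct model_mapping *mapping,
                                           uint64_t submitted,
                                           bool treat_as_truncated)
{
    if (treat_as_truncated) {
        oe->truncated = true;
        oe->truncated_len = submitted;
        return;
    }
    oe->ioerr = true;
    mapping->error = -EIO;
}
```

In this model a later buffered write followed by an error check on the mapping only fails in the old variant, which matches the "First write failed: Input/output error" symptom described above.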

[BUG]
With the previous bug of short direct writes fixed, test case generic/362 (*) still fails with the following error with nodatasum mount option:

  generic/362 0s ... - output mismatch (see /home/adam/xfstests/results//generic/362.out.bad)
      --- tests/generic/362.out 2024-08-24 15:31:37.200000000 +0930
      +++ /home/adam/xfstests/results//generic/362.out.bad 2026-05-27 10:13:09.072485767 +0930
      @@ -1,2 +1,3 @@
       QA output created by 362
      +Wrong file size after first write, got 8192 expected 4096
       Silence is golden
      ...

*: If the test case has been executed before with default data checksum, the failure will not reproduce. Need the following fix to make it reliably reproducible:

   https://lore.kernel.org/linux-btrfs/20260528111659.87113-1-wqu@suse.com/

[CAUSE]
Inside btrfs_dio_iomap_begin() for a direct write, we increase the isize if the write ends beyond the current isize. But if the direct IO finishes short, we revert the isize neither to the previous value nor to the short write end.

Then if we need to fall back to buffered writes, and the write has the IOCB_APPEND flag, the buffered write will be positioned at the incorrect isize.

The call chain looks like this:

 btrfs_direct_write(pos=0, length=4K)
 |- __iomap_dio_rw()
 |  |- iomap_iter()
 |  |  |- btrfs_dio_iomap_begin()
 |  |     |- btrfs_get_blocks_direct_write()
 |  |        |- i_size_write()
 |  |           Which updates the isize to the write end (4K).
 |  |
 |  |- iomap_dio_iter()
 |  |  Failed with -EFAULT on the first page.
 |  |
 |  |- iomap_iter()
 |  |  |- btrfs_dio_iomap_end()
 |  |     Detects a short write, return -ENOTBLK
 |  |- if (ret == -ENOTBLK) { ret = 0; }
 |     Which resets the return value.
 |
 |- ret = iomap_dio_complete()
 |  Which returns 0.
 |
 |- btrfs_buffered_write(iocb, from);
    |- generic_write_checks()
       |- iocb->ki_pos = i_size_read()
          Which is still the new size (4K), rather than the original isize 0.

[FIX]
Introduce the following btrfs_dio_data members:

- old_isize
- updated_isize
  Set if the direct write has enlarged the isize.

Then if we get a short write and btrfs_dio_data::updated_isize is set, revert to the correct isize based on old_isize and the current file position.

Here we call i_size_write() without holding an extent lock, which is a very special case that is safe because:

- Only a single writer can be enlarging isize
  Enlarging isize will take the exclusive inode lock.

- Buffered readers need to wait for the OE we're holding
  Buffered readers will lock the extent and wait for the OE of the folio range. Sometimes we can skip the OE wait, but since all page cache is invalidated, the OE wait can not be skipped.

But I do not think this is the most elegant solution, nor does it cover all cases. E.g. if the bio is submitted but the IO failed, we are unable to do the revert.

I believe the more elegant solution would be to extend the EXTENT_DIO_LOCKED lifespan for direct writes, so that we can update the isize when a write beyond EOF finished successfully. However that change is too large for a small bug fix, so only implement the minimal partial fix for now.

[REASON FOR NO FIXES TAG]
The bug is again very old: before commit f85781f ("btrfs: switch to iomap for direct IO") we were already increasing isize without a proper rollback for short writes. Thus only a CC to stable.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
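A minimal userspace sketch of the revert rule, assuming that "the correct isize" means the larger of old_isize and the end of the bytes actually written. The struct mirrors the two member names from the commit message, but the function and the exact rule are illustrative assumptions, not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the two new btrfs_dio_data members named in the commit message. */
struct model_dio_data {
    uint64_t old_isize;   /* isize before the direct write enlarged it */
    bool updated_isize;   /* true if this write enlarged the isize */
};

/*
 * Return the isize to use after a short direct write.
 * Assumption: the file must still cover every byte that was written, and
 * must never shrink below the size it had before the write started.
 */
static uint64_t model_isize_after_short_write(const struct model_dio_data *d,
                                              uint64_t cur_isize,
                                              uint64_t pos, uint64_t written)
{
    uint64_t write_end = pos + written;

    if (!d->updated_isize)
        return cur_isize;   /* this write never touched the isize */
    return write_end > d->old_isize ? write_end : d->old_isize;
}
```

For the generic/362 case (old isize 0, 4K write at offset 0, zero bytes written) the model returns 0, so an IOCB_APPEND buffered fallback would start at offset 0 and produce the expected 4096 byte file.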

@ret

Currently btrfs_direct_write() will not try to fault in the pages, but directly falls back to buffered writes, if the first page of the buffer can not be faulted in.

For example, during generic/362 with nodatasum mount option, there is a write at file offset 0, length PAGE_SIZE, and the page is not faulted in.

Then we go through the following call chain and directly fall back to buffered IO:

 btrfs_direct_write()
 |- btrfs_dio_write()
 |- __iomap_dio_rw()
 |  |- iomap_iter()
 |  |  |- btrfs_dio_iomap_begin()
 |  |     Now an ordered extent is allocated for the 4K write.
 |  |
 |  |- iomi.status = iomap_dio_iter()
 |  |  Where iomap_dio_iter() returned -EFAULT.
 |  |
 |  |- ret = iomap_iter()
 |  |  |- btrfs_dio_iomap_end()
 |  |  |  |  return -ENOTBLK
 |  |  |- return -ENOTBLK
 |  |- if (ret == -ENOTBLK) { ret = 0; }
 |     Now the return value is reset to 0.
 |
 |- ret = iomap_dio_complete()
 |  Since no byte is submitted, @ret is now zero.
 |
 |- if (iov_iter_count() > 0 && (ret == -EFAULT || ret > 0))
 |  @ret is zero, thus not meeting the above retry condition
 |
 |- Fallback to buffered

Just slightly loosen the condition to allow retrying to fault in pages after a zero sized short write.

Unlike the previous two bug fixes, this issue does not cause any real bug; it only reduces the chance of doing zero-copy direct IO. Thus it doesn't really require a stable CC nor a Fixes tag.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
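The retry condition quoted in the call chain, and one way it could be loosened, can be compared side by side. The "before" predicate is taken from the text above; the "after" predicate (accepting ret == 0) is an assumption about what "slightly loosen" means, and both function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Retry condition as quoted above: a zero sized short write is rejected. */
static bool model_should_retry_before(size_t bytes_left, long ret)
{
    return bytes_left > 0 && (ret == -EFAULT || ret > 0);
}

/*
 * Assumed loosened condition: also retry after a zero sized short write,
 * while still refusing to retry on other errors.
 */
static bool model_should_retry_after(size_t bytes_left, long ret)
{
    return bytes_left > 0 && (ret == -EFAULT || ret >= 0);
}
```

In the generic/362 scenario bytes_left is 4096 and ret is 0, so only the second predicate lets the direct write attempt to fault in the buffer and retry instead of falling back to buffered IO.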

…nput lzo_decompress_bio() validates each on-disk segment length seg_len only against the workspace cbuf size, not against the compressed input size (compressed_len, the total folio bytes of the bio). A crafted extent can carry a segment whose seg_len passes the cbuf check but runs past the end of the bio, so copy_compressed_segment() walks off the last folio: get_current_folio() then returns the NULL folio from bio_next_folio(), and with CONFIG_BTRFS_ASSERT disabled (default) folio_size(NULL) faults. BUG: KASAN: null-ptr-deref in lzo_decompress_bio (fs/btrfs/lzo.c:383) Read of size 8 at addr 0000000000000000 by task kworker/u8:1/29 Workqueue: btrfs-endio simple_end_io_work kasan_report (mm/kasan/report.c:590) lzo_decompress_bio (fs/btrfs/lzo.c:383) end_bbio_compressed_read (fs/btrfs/compression.c:1065) btrfs_bio_end_io (fs/btrfs/bio.c:135) btrfs_check_read_bio (fs/btrfs/bio.c:180 fs/btrfs/bio.c:285) simple_end_io_work process_one_work worker_thread Reject any segment whose payload would extend beyond compressed_len before copying it, treating it as corruption like the other on-disk validation failures in this function. Reported-by: Xiang Mei <xmei5@asu.edu> Fixes: a6e66e6 ("btrfs: rework lzo_decompress_bio() to make it subpage compatible") Assisted-by: Claude:claude-opus-4-8 Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
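The missing bound can be expressed as a small predicate. This is a sketch under assumptions: model_segment_fits() and the cur_in parameter (the input offset at which the segment payload starts) are hypothetical names chosen for illustration and are not taken from fs/btrfs/lzo.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a segment payload of seg_len bytes starting at input
 * offset cur_in lies entirely inside the compressed input of
 * compressed_len bytes. Written so that the check itself can not overflow.
 */
static bool model_segment_fits(uint32_t cur_in, uint32_t seg_len,
                               uint32_t compressed_len)
{
    if (cur_in > compressed_len)
        return false;
    return seg_len <= compressed_len - cur_in;
}
```

A caller would treat a false result as on-disk corruption and stop before copying, rather than walking past the last folio.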

It does not make sense for the single caller to have the responsibility to lock the relocation mutex before calling the function and then have the function assert that the lock is held. As this is a function in relocation.c, move the locking details into it. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

There's no point in having the WARN_ON(1) inside the if statement for the unexpected error. Move it into the if statement's condition, which brings a couple of benefits: 1) It marks the branch as unlikely, hinting the compiler to generate better code; 2) With the WARN_ON() inside the branch, its stack trace comes after the dumped leaf and error message, which can hide that more important information in case we get a truncated dmesg/syslog. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

If we get a failure during relocation, before we update all the extent buffers that have file extent items pointing to extents from the block group being relocated, we can trigger a use-after-free on the reloc control structure (fs_info->reloc_ctl) if we have a concurrent task that is COWing a subvolume leaf.

This happens like this:

1) Relocation of data block group X starts;

2) Relocation changes its state to UPDATE_DATA_PTRS;

3) A task doing a rename for example, COWs leaf A from a subvolume tree and ends up at btrfs_reloc_cow_block() and extracts fs_info->reloc_ctl into a local variable, which it then passes to replace_file_extents();

4) The relocation task gets an error and under the label 'out_put_bg' in btrfs_relocate_block_group() calls free_reloc_control(), which frees the reloc control structure that the rename task is using;

5) The rename task triggers a use-after-free on the reloc control structure that was just freed.

Syzbot reported this recently, with the following stack trace:

[ 88.389822][ T5325] BTRFS error (device loop0 state A): Transaction aborted (error -5) [ 88.389842][ T5325] BTRFS: error (device loop0 state A) in cleanup_transaction:2067: errno=-5 IO failure [ 88.389864][ T5325] BTRFS info (device loop0 state EA): forced readonly [ 88.392277][ T5324] BTRFS: error (device loop0 state EA) in btrfs_sync_log:3572: errno=-5 IO failure [ 88.396630][ T5325] BTRFS info (device loop0 state EA): balance: ended with status: -5 [ 88.400135][ T5346] ================================================================== [ 88.400148][ T5346] BUG: KASAN: slab-use-after-free in replace_file_extents+0x85f/0x1590 [ 88.400288][ T5346] Read of size 8 at addr ffff888012312010 by task syz.0.0/5346 [ 88.400299][ T5346] [ 88.400306][ T5346] CPU: 0 UID: 0 PID: 5346 Comm: syz.0.0 Not tainted syzkaller #0 PREEMPT(full) [ 88.400319][ T5346] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 88.400325][ T5346] Call Trace: [ 
88.400331][ T5346] <TASK> [ 88.400336][ T5346] dump_stack_lvl+0xe8/0x150 [ 88.400351][ T5346] print_address_description+0x55/0x1e0 [ 88.400364][ T5346] ? replace_file_extents+0x85f/0x1590 [ 88.400378][ T5346] print_report+0x58/0x70 [ 88.400389][ T5346] kasan_report+0x117/0x150 [ 88.400405][ T5346] ? replace_file_extents+0x85f/0x1590 [ 88.400420][ T5346] replace_file_extents+0x85f/0x1590 [ 88.400440][ T5346] ? __pfx_replace_file_extents+0x10/0x10 [ 88.400452][ T5346] ? update_ref_for_cow+0xa71/0x1270 [ 88.400473][ T5346] btrfs_force_cow_block+0xa4d/0x2450 [ 88.400492][ T5346] ? __pfx_btrfs_force_cow_block+0x10/0x10 [ 88.400508][ T5346] ? __pfx_btrfs_get_32+0x10/0x10 [ 88.400523][ T5346] btrfs_cow_block+0x3c4/0xa90 [ 88.400542][ T5346] push_leaf_left+0x2ac/0x4a0 [ 88.400561][ T5346] split_leaf+0xd16/0x12e0 [ 88.400574][ T5346] ? btrfs_bin_search+0x924/0xc70 [ 88.400592][ T5346] ? __pfx_split_leaf+0x10/0x10 [ 88.400602][ T5346] ? leaf_space_used+0x177/0x1e0 [ 88.400618][ T5346] ? btrfs_leaf_free_space+0x14a/0x2f0 [ 88.400634][ T5346] btrfs_search_slot+0x2641/0x2d20 [ 88.400654][ T5346] ? __pfx_btrfs_search_slot+0x10/0x10 [ 88.400669][ T5346] ? rcu_is_watching+0x15/0xb0 [ 88.400681][ T5346] ? trace_kmem_cache_alloc+0x29/0xe0 [ 88.400694][ T5346] btrfs_insert_empty_items+0x9c/0x190 [ 88.400711][ T5346] btrfs_insert_inode_ref+0x229/0xcb0 [ 88.400724][ T5346] ? __pfx_btrfs_insert_inode_ref+0x10/0x10 [ 88.400736][ T5346] ? __pfx_btrfs_qgroup_convert_reserved_meta+0x10/0x10 [ 88.400751][ T5346] ? btrfs_record_root_in_trans+0x124/0x180 [ 88.400767][ T5346] ? start_transaction+0x8a0/0x1820 [ 88.400778][ T5346] ? btrfs_set_inode_index+0x5e/0x100 [ 88.400787][ T5346] btrfs_rename2+0x17bb/0x40d0 [ 88.400800][ T5346] ? check_noncircular+0xda/0x150 [ 88.400814][ T5346] ? add_lock_to_list+0xc7/0x100 [ 88.400828][ T5346] ? __pfx_btrfs_rename2+0x10/0x10 [ 88.400842][ T5346] ? lockdep_hardirqs_on+0x7a/0x110 [ 88.400901][ T5346] ? lock_acquire+0x221/0x350 [ 88.400915][ T5346] ? 
down_write_nested+0x174/0x210 [ 88.400931][ T5346] ? __pfx_down_write_nested+0x10/0x10 [ 88.400941][ T5346] ? do_raw_spin_unlock+0x4d/0x210 [ 88.400952][ T5346] ? try_break_deleg+0x5b/0x180 [ 88.400963][ T5346] ? __pfx_btrfs_rename2+0x10/0x10 [ 88.400973][ T5346] vfs_rename+0xa96/0xeb0 [ 88.400992][ T5346] ? __pfx_vfs_rename+0x10/0x10 [ 88.401010][ T5346] ovl_fill_super+0x46b7/0x5e20 [ 88.401030][ T5346] ? __pfx_ovl_fill_super+0x10/0x10 [ 88.401042][ T5346] ? xas_create+0x1902/0x1b90 [ 88.401060][ T5346] ? __pfx___mutex_trylock_common+0x10/0x10 [ 88.401076][ T5346] ? trace_contention_end+0x3d/0x140 [ 88.401094][ T5346] ? shrinker_register+0x124/0x230 [ 88.401111][ T5346] ? __mutex_unlock_slowpath+0x1be/0x6f0 [ 88.401127][ T5346] ? shrinker_register+0x61/0x230 [ 88.401143][ T5346] ? __pfx___mutex_lock+0x10/0x10 [ 88.401158][ T5346] ? __pfx___mutex_unlock_slowpath+0x10/0x10 [ 88.401177][ T5346] ? __raw_spin_lock_init+0x45/0x100 [ 88.401196][ T5346] ? sget_fc+0x962/0xa40 [ 88.401208][ T5346] ? __pfx_set_anon_super_fc+0x10/0x10 [ 88.401222][ T5346] ? __pfx_ovl_fill_super+0x10/0x10 [ 88.401241][ T5346] get_tree_nodev+0xbb/0x150 [ 88.401257][ T5346] vfs_get_tree+0x92/0x2a0 [ 88.401272][ T5346] do_new_mount+0x341/0xd30 [ 88.401283][ T5346] ? apparmor_capable+0x126/0x170 [ 88.401301][ T5346] ? __pfx_do_new_mount+0x10/0x10 [ 88.401311][ T5346] ? ns_capable+0x89/0xe0 [ 88.401322][ T5346] ? path_mount+0x690/0x10e0 [ 88.401333][ T5346] ? user_path_at+0xd4/0x160 [ 88.401346][ T5346] __se_sys_mount+0x31d/0x420 [ 88.401358][ T5346] ? __pfx___se_sys_mount+0x10/0x10 [ 88.401370][ T5346] ? __x64_sys_mount+0x20/0xc0 [ 88.401381][ T5346] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 88.401391][ T5346] do_syscall_64+0x15f/0xf80 [ 88.401403][ T5346] ? trace_irq_disable+0x3b/0x140 [ 88.401413][ T5346] ? clear_bhb_loop+0x40/0x90 [ 88.401421][ T5346] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 88.401429][ T5346] RIP: 0033:0x7fa1ff79ce59 [ 88.401436][ T5346] Code: ff c3 66 (...) 
[ 88.401443][ T5346] RSP: 002b:00007fa2005affe8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 [ 88.401456][ T5346] RAX: ffffffffffffffda RBX: 00007fa1ffa16180 RCX: 00007fa1ff79ce59 [ 88.401464][ T5346] RDX: 0000200000000100 RSI: 0000200000002240 RDI: 0000000000000000 [ 88.401474][ T5346] RBP: 00007fa1ff832d6f R08: 0000200000000440 R09: 0000000000000000 [ 88.401481][ T5346] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 88.401488][ T5346] R13: 00007fa1ffa16218 R14: 00007fa1ffa16180 R15: 00007ffc734fba78 [ 88.401500][ T5346] </TASK> [ 88.401506][ T5346] [ 88.401510][ T5346] Allocated by task 5325: [ 88.401516][ T5346] kasan_save_track+0x3e/0x80 [ 88.401529][ T5346] __kasan_kmalloc+0x93/0xb0 [ 88.401542][ T5346] __kmalloc_cache_noprof+0x31c/0x660 [ 88.401554][ T5346] btrfs_relocate_block_group+0x217/0xc40 [ 88.401568][ T5346] btrfs_relocate_chunk+0x115/0x820 [ 88.401577][ T5346] __btrfs_balance+0x1db0/0x2ae0 [ 88.401587][ T5346] btrfs_balance+0xaf3/0x11b0 [ 88.401596][ T5346] btrfs_ioctl_balance+0x3d3/0x610 [ 88.401612][ T5346] __se_sys_ioctl+0xfc/0x170 [ 88.401626][ T5346] do_syscall_64+0x15f/0xf80 [ 88.401640][ T5346] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 88.401650][ T5346] [ 88.401653][ T5346] Freed by task 5325: [ 88.401659][ T5346] kasan_save_track+0x3e/0x80 [ 88.401671][ T5346] kasan_save_free_info+0x46/0x50 [ 88.401680][ T5346] __kasan_slab_free+0x5c/0x80 [ 88.401692][ T5346] kfree+0x1c5/0x640 [ 88.401703][ T5346] btrfs_relocate_block_group+0x95d/0xc40 [ 88.401715][ T5346] btrfs_relocate_chunk+0x115/0x820 [ 88.401724][ T5346] __btrfs_balance+0x1db0/0x2ae0 [ 88.401733][ T5346] btrfs_balance+0xaf3/0x11b0 [ 88.401742][ T5346] btrfs_ioctl_balance+0x3d3/0x610 [ 88.401757][ T5346] __se_sys_ioctl+0xfc/0x170 [ 88.401770][ T5346] do_syscall_64+0x15f/0xf80 [ 88.401785][ T5346] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 88.401795][ T5346] [ 88.401798][ T5346] The buggy address belongs to the object at ffff888012312000 [ 88.401798][ T5346] which 
belongs to the cache kmalloc-2k of size 2048 [ 88.401807][ T5346] The buggy address is located 16 bytes inside of [ 88.401807][ T5346] freed 2048-byte region [ffff888012312000, ffff888012312800) [ 88.401819][ T5346] [ 88.401822][ T5346] The buggy address belongs to the physical page: [ 88.401829][ T5346] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12310 [ 88.401840][ T5346] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 88.401849][ T5346] flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff) [ 88.401860][ T5346] page_type: f5(slab) [ 88.401871][ T5346] raw: 00fff00000000040 ffff88801ac42000 dead000000000100 dead000000000122 [ 88.401881][ T5346] raw: 0000000000000000 0000000800080008 00000000f5000000 0000000000000000 [ 88.401892][ T5346] head: 00fff00000000040 ffff88801ac42000 dead000000000100 dead000000000122 [ 88.401902][ T5346] head: 0000000000000000 0000000800080008 00000000f5000000 0000000000000000 [ 88.401913][ T5346] head: 00fff00000000003 fffffffffffffe01 00000000ffffffff 00000000ffffffff [ 88.401923][ T5346] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008 [ 88.401929][ T5346] page dumped because: kasan: bad access detected [ 88.401935][ T5346] page_owner tracks the page as allocated [ 88.401941][ T5346] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 9, tgid 9 (kworker/0:0), ts 83905464494, free_ts 83674944822 [ 88.401961][ T5346] post_alloc_hook+0x231/0x280 [ 88.401975][ T5346] get_page_from_freelist+0x24ba/0x2540 [ 88.401990][ T5346] __alloc_frozen_pages_noprof+0x18d/0x380 [ 88.402004][ T5346] allocate_slab+0x77/0x660 [ 88.402019][ T5346] refill_objects+0x339/0x3d0 [ 88.402033][ T5346] __pcs_replace_empty_main+0x321/0x720 [ 88.402043][ T5346] __kmalloc_node_track_caller_noprof+0x572/0x7b0 [ 88.402055][ T5346] __alloc_skb+0x2c1/0x7d0 [ 88.402067][ T5346] 
mld_newpack+0x14c/0xc90 [ 88.402080][ T5346] add_grhead+0x5a/0x2a0 [ 88.402093][ T5346] add_grec+0x1452/0x1740 [ 88.402105][ T5346] mld_ifc_work+0x6e6/0xe70 [ 88.402116][ T5346] process_scheduled_works+0xb5d/0x1860 [ 88.402127][ T5346] worker_thread+0xa53/0xfc0 [ 88.402138][ T5346] kthread+0x389/0x470 [ 88.402150][ T5346] ret_from_fork+0x514/0xb70 [ 88.402161][ T5346] page last free pid 5282 tgid 5282 stack trace: [ 88.402168][ T5346] __free_frozen_pages+0xbc7/0xd30 [ 88.402180][ T5346] __slab_free+0x274/0x2c0 [ 88.402191][ T5346] qlist_free_all+0x99/0x100 [ 88.402201][ T5346] kasan_quarantine_reduce+0x148/0x160 [ 88.402211][ T5346] __kasan_slab_alloc+0x22/0x80 [ 88.402221][ T5346] __kmalloc_cache_noprof+0x2ba/0x660 [ 88.402231][ T5346] kernfs_fop_open+0x3f0/0xda0 [ 88.402253][ T5346] do_dentry_open+0x785/0x14e0 [ 88.402262][ T5346] vfs_open+0x3b/0x340 [ 88.402270][ T5346] path_openat+0x2e08/0x3860 [ 88.402281][ T5346] do_file_open+0x23e/0x4a0 [ 88.402292][ T5346] do_sys_openat2+0x113/0x200 [ 88.402300][ T5346] __x64_sys_openat+0x138/0x170 [ 88.402309][ T5346] do_syscall_64+0x15f/0xf80 [ 88.402326][ T5346] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 88.402336][ T5346] [ 88.402339][ T5346] Memory state around the buggy address: [ 88.402345][ T5346] ffff888012311f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 88.402352][ T5346] ffff888012311f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 88.402359][ T5346] >ffff888012312000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 88.402365][ T5346] ^ [ 88.402370][ T5346] ffff888012312080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 88.402380][ T5346] ffff888012312100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 88.402385][ T5346] ==================================================================

Fix this by:

1) Making the reloc control structure ref counted;

2) Making every place that accesses fs_info->reloc_ctl outside the relocation code, which at the moment is only replace_file_extents() and btrfs_init_reloc_root(), get a reference count on the structure. There's also btrfs_update_reloc_root() that is called outside the relocation code, but this case is safe because it's only called in the transaction commit path while under the fs_info->reloc_mutex protection. Nevertheless grab a reference there too, to make the code more consistent and avoid false alerts from AI reviews;

3) Adding a spinlock to protect fs_info->reloc_ctl, since we can not take the fs_info->reloc_mutex, as that would cause a deadlock because that lock is taken in the transaction commit path. The spinlock is taken before setting fs_info->reloc_ctl to an allocated structure, before setting it to NULL and before reading fs_info->reloc_ctl;

4) Making sure the structure is freed only when its reference count drops to zero.

Reported-by: syzbot+0eea49bba18051dea35e@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/6a1df323.bb0696ed.125a22.000a.GAE@google.com/
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
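The four steps of the fix follow a common pattern: a lock protects the shared pointer, readers take a reference while holding the lock, and the object is freed only by whoever drops the last reference. Below is a userspace model of that pattern using C11 atomics. All model_* names are hypothetical, and the tiny atomic_flag spinlock only stands in for the kernel spinlock mentioned in step 3.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_reloc_control {
    atomic_int refs;
};

struct model_fs_info {
    atomic_flag lock;                       /* protects reloc_ctl */
    struct model_reloc_control *reloc_ctl;
};

static int model_freed_count;               /* lets callers observe frees */

static void model_lock(struct model_fs_info *fs)
{
    while (atomic_flag_test_and_set(&fs->lock))
        ;                                   /* spin until the flag is clear */
}

static void model_unlock(struct model_fs_info *fs)
{
    atomic_flag_clear(&fs->lock);
}

/* Allocate a control structure holding one reference for its creator. */
static struct model_reloc_control *model_alloc_reloc_control(void)
{
    struct model_reloc_control *rc = malloc(sizeof(*rc));

    if (rc)
        atomic_init(&rc->refs, 1);
    return rc;
}

/* Publish (or clear, with rc == NULL) the shared pointer under the lock. */
static void model_set_reloc_control(struct model_fs_info *fs,
                                    struct model_reloc_control *rc)
{
    model_lock(fs);
    fs->reloc_ctl = rc;
    model_unlock(fs);
}

/* Read the shared pointer and take a reference while holding the lock. */
static struct model_reloc_control *model_get_reloc_control(struct model_fs_info *fs)
{
    struct model_reloc_control *rc;

    model_lock(fs);
    rc = fs->reloc_ctl;
    if (rc)
        atomic_fetch_add(&rc->refs, 1);
    model_unlock(fs);
    return rc;
}

/* Drop a reference; free only when the last one goes away. */
static void model_put_reloc_control(struct model_reloc_control *rc)
{
    if (rc && atomic_fetch_sub(&rc->refs, 1) == 1) {
        free(rc);
        model_freed_count++;
    }
}
```

In this model the failing relocation task clears the pointer and drops its own reference, but a concurrent user that already obtained a reference keeps the structure alive until it calls the put function itself.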

@written

That member is to record how many bytes are submitted for a direct read/write, utilized by iomap_end() callback to handle short IO cases. However iomap_end() callback is already providing an internally tracked @written member, which is doing the same accounting and providing the same value as btrfs_dio_data::submitted. There is no need to duplicate the work, just remove btrfs_dio_data::submitted. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com>

@pos

That function has the following problems:

- Read/write handling scattered across different locations
  E.g. at the beginning there is dedicated hole read handling, but later short read handling is in an if () branch.

- Modifying the @pos and @length parameters for short reads
  It is completely fine to modify those parameters as they are passed by value, but it can still be confusing to read, as normally we would assume @pos and @length to be the original range. But for short IO handling we modify @pos/@length and completely ignore @written.

- Unnecessary split for ordered extent and changeset handling
  Both OE and changeset are only for writes, but they are handled in two different if (write) {} blocks.

Refactor the function so that:

- Handling of reads and writes is concentrated in their own code blocks
  Now the handling of reads is in its own small if () branch, leaving the more complex write handling to take the rest of the function, which reduces the indent level. This also removes all unnecessary "if (write)" checks.

- @pos and @length are not modified
  Let short IO handling manually calculate the remaining range.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
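Calculating the remaining range from the untouched parameters can be sketched as follows. The helper and struct names are hypothetical; the only thing taken from the text is that @pos, @length and @written describe the original range and the bytes already handled.

```c
#include <stdint.h>

/* Hypothetical helper type: a byte range [start, start + len). */
struct model_range {
    uint64_t start;
    uint64_t len;
};

/*
 * Compute the range left over after a short IO, without modifying the
 * caller's pos/length. written is clamped so len can never underflow.
 */
static struct model_range model_remaining_range(uint64_t pos, uint64_t length,
                                                uint64_t written)
{
    struct model_range r;

    if (written > length)
        written = length;
    r.start = pos + written;
    r.len = length - written;
    return r;
}
```

Keeping the original values intact means later code in the same function can still refer to the full range as well as to the unsubmitted tail.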

The qgroup ioctls update the quota tree, but they currently start their transactions using the root of the inode passed to the ioctl. This makes the transaction reservation depend on the path used for the ioctl instead of the tree being modified. Start qgroup ioctl transactions on the quota root instead. Take a reference to fs_info->quota_root under qgroup_ioctl_lock before starting the transaction, because quota disable can clear and put fs_info->quota_root after the early quota-enabled check. Keep the reference until the transaction handle is ended. Suggested-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Dongjiang Zhu <zhudongjiang@fnnas.com> Reviewed-by: Qu Wenruo <wqu@suse.com>

We set the xattr and then attempt to apply the property. If the apply fails we then attempt to delete the xattr to avoid an inconsistency. However we don't verify if the deletion succeeded, so if it fails we leave an inconsistency between the state in the btree and the in-memory inode. Address this by first validating that we can apply the property, then setting the xattr, then applying the property. This last step should not fail since the validation succeeded before: assert that it does not fail, but leave code to attempt to delete the xattr if it happens, and then abort the transaction only if the xattr delete failed. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com>

…tr_set() We are using 2 units for properties but we only set one property. Fix this by using the correct amount: 1 unit. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com>

There's no need to abort the transaction if we failed to set or delete a property, as we haven't done any change. However we need to abort if we set a property or delete a property and then fail to update the inode item, as that would leave the inode's state in subvolume tree inconsistent. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com>

[TEST FAILURE]
The test case generic/628 will fail if MOUNT_OPTIONS is set to "-o nodatasum":

  FSTYP -- btrfs
  PLATFORM -- Linux/x86_64 btrfs-vm 7.1.0-rc4-custom+ #383 SMP PREEMPT_DYNAMIC Sat May 30 07:35:42 ACST 2026
  MKFS_OPTIONS -- -O bgt -K /dev/mapper/test-scratch1
  MOUNT_OPTIONS -- -o nodatasum /dev/mapper/test-scratch1 /mnt/scratch

  generic/628 1s ... - output mismatch (see /home/adam/xfstests/results//generic/628.out.bad)
      --- tests/generic/628.out 2022-05-11 11:25:30.816666664 +0930
      +++ /home/adam/xfstests/results//generic/628.out.bad 2026-06-08 18:56:49.878542927 +0930
      @@ -8,8 +8,9 @@
       310f146ce52077fcd3308dcbe7632bb2 SCRATCH_MNT/a
       310f146ce52077fcd3308dcbe7632bb2 SCRATCH_MNT/d
       test reflink flag not set iflag
      +XFS_IOC_CLONE: Invalid argument
       310f146ce52077fcd3308dcbe7632bb2 SCRATCH_MNT/a
      -310f146ce52077fcd3308dcbe7632bb2 SCRATCH_MNT/b
      +d41d8cd98f00b204e9800998ecf8427e SCRATCH_MNT/b
      ...

[CAUSE]
The direct cause is that after "chattr +S", the btrfs inode will lose its NODATASUM flag inherited from the mount option. E.g:

  # mkfs.btrfs -f $dev
  # mount $dev $mnt -o nodatasum
  # touch $mnt/foobar
  # sync
  # btrfs ins dump-tree -t 5 $dev | grep "(257 INODE_ITEM 0) itemoff" -A 3
    item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
      generation 9 transid 9 size 0 nbytes 0
      block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
      sequence 1 flags 0x1(NODATASUM)
                          ^^^^^^^^^ Proper NODATASUM flag
  # chattr +S $mnt/foobar
  # sync
  # btrfs ins dump-tree -t 5 $dev | grep "(257 INODE_ITEM 0) itemoff" -A 3
    item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
      generation 9 transid 10 size 0 nbytes 0
      block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
      sequence 2 flags 0x20(SYNC)
                           ^^^^ Only the new SYNC flag

This makes the inode drop the old NODATASUM flag, while the new reflink destination will still inherit the NODATASUM flag. The mismatching NODATASUM flags will cause the reflink to fail.

The root cause is that, inside btrfs_fileattr_set(), if FS_NOCOW_FL is not set we remove both the NODATASUM and NODATACOW flags. However we should not touch the NODATASUM flag, as data COW doesn't require checksums. Only NODATACOW implies NODATASUM; DATACOW doesn't imply DATASUM.

The deeper problems are:

- Fileattr API is too binary
  It either clears or sets a flag, there is no "do not change" option. That is why "chattr +S" implies "chattr -C", and forces us to change NODATACOW along with the NODATASUM flag.

- No way to change NODATASUM through fileattr API
  In fact NODATASUM can only be modified through a mount option.

The deeper problems are much harder to attack.

[FIX]
Remove the NODATACOW flag when FS_NOCOW_FL is not set, but only remove NODATASUM if the "nodatasum" mount option is not set.

This allows the existing "chattr +C" then "chattr -C" sequence to remove both the NODATACOW and NODATASUM flags on a default mount. But for a mount with the "nodatasum" option, the NODATASUM inode flag will persist through both "chattr +C" and "chattr -C".

Fixes: 7e97b8d ("btrfs: allow setting NOCOW for a zero sized file via ioctl")
Cc: stable@vger.kernel.org
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
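The [FIX] rule above can be sketched as a small standalone function. This is only an illustration of the described flag logic, not the actual btrfs_fileattr_set() code: the function name, the SKETCH_* constants and the boolean parameters are hypothetical, and the real code has further conditions (file type, file size) that are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits only; 0x1 and 0x20 follow the dump-tree output above. */
#define SKETCH_INODE_NODATASUM 0x1u
#define SKETCH_INODE_NODATACOW 0x2u   /* assumed value for this sketch */
#define SKETCH_INODE_SYNC      0x20u

/*
 * Hypothetical helper: compute the new inode flags for a fileattr update.
 * want_nocow      - whether FS_NOCOW_FL is set in the request
 * mount_nodatasum - whether the fs is mounted with "nodatasum"
 */
uint32_t apply_nocow_attr(uint32_t inode_flags, bool want_nocow,
                          bool mount_nodatasum)
{
	if (want_nocow) {
		/* NODATACOW implies NODATASUM. */
		inode_flags |= SKETCH_INODE_NODATACOW | SKETCH_INODE_NODATASUM;
	} else {
		/* Always drop NODATACOW when FS_NOCOW_FL is not requested... */
		inode_flags &= ~SKETCH_INODE_NODATACOW;
		/* ...but keep NODATASUM if it comes from the mount option. */
		if (!mount_nodatasum)
			inode_flags &= ~SKETCH_INODE_NODATASUM;
	}
	return inode_flags;
}
```

With this rule, a request without FS_NOCOW_FL (such as the one implied by "chattr +S") leaves NODATASUM alone on a "nodatasum" mount, so source and destination of a later reflink keep matching flags.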

When loading a v1 free space cache, __load_free_space_cache() takes num_entries and num_bitmaps straight from the on-disk btrfs_free_space_header. That header is stored in the tree_root under a key with type 0, which the tree-checker has no case for, so neither count is validated before the load trusts it.

The load loops num_entries times and maps the next page whenever the current one runs out, going through io_ctl_check_crc() -> io_ctl_map_page(), which does io_ctl->pages[io_ctl->index++]. But pages[] is allocated in io_ctl_init() from the cache inode's i_size, not from num_entries:

  num_pages = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
  io_ctl->pages = kcalloc(num_pages, sizeof(struct page *), GFP_NOFS);

So if num_entries claims more records than the pages can hold, io_ctl->index runs off the end of pages[]. The write side never hits this because io_ctl_add_entry() and io_ctl_add_bitmap() both stop once io_ctl->index >= io_ctl->num_pages; the read side just never had the same check.

To trigger it, take a clean cache (num_entries = <N> here), set num_entries in the header to 0x10000, and fix up the leaf checksum so it still passes the tree-checker. The cache inode has i_size = 65536, so num_pages is 16 and pages[] is a 16-pointer (kmalloc-128) array. The load now tries to read 65536 entries, io_ctl->index walks up to 16, and pages[16] is read past the array:

  BUG: KASAN: slab-out-of-bounds in io_ctl_check_crc (fs/btrfs/free-space-cache.c:420 fs/btrfs/free-space-cache.c:565)
  Read of size 8 at addr ffff88800c833a80 by task kworker/u8:3/58

  io_ctl_check_crc (fs/btrfs/free-space-cache.c:420 fs/btrfs/free-space-cache.c:565)
  __load_free_space_cache (fs/btrfs/free-space-cache.c:655 fs/btrfs/free-space-cache.c:820)
  load_free_space_cache (fs/btrfs/free-space-cache.c:1017)
  caching_thread (fs/btrfs/block-group.c:880)
  btrfs_work_helper (fs/btrfs/async-thread.c:312)
  process_one_work
  worker_thread
  kthread
  ret_from_fork

free-space-cache.c:420 is io_ctl_map_page(), inlined into io_ctl_check_crc() at line 565, which is why that is the frame KASAN names. The out-of-bounds slot is then treated as a struct page and handed to crc32c(), so the bad read turns into a GP fault.

Add the missing check to io_ctl_check_crc(), which is where both the entry loop and the bitmap loop end up. When num_entries is too large the load now fails like any corrupt cache: __load_free_space_cache() drops it and rebuilds the free space from the extent tree, so a valid cache is never rejected.

Fixes: 5b0e95b ("Btrfs: inline checksums into the disk free space cache")
Link: https://lore.kernel.org/linux-btrfs/CAPpSM+RMPByMCKXvM5QFKToxsyNccfuFLWMdD0mfd0wh2Ja62w@mail.gmail.com/
Reported-by: Weiming Shi <bestswngs@gmail.com>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Qu Wenruo <wqu@suse.com>
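The shape of the missing read-side check can be shown in a userspace sketch. This is an assumption-laden illustration, not the kernel patch: struct io_ctl_sketch, map_next_page() and pages_for_size() are hypothetical names, the 4096-byte page size is assumed, and returning -EIO is a guess at the error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the kernel's io_ctl state. */
struct io_ctl_sketch {
	void **pages;    /* array sized from the cache inode's i_size */
	int index;       /* next page slot to map */
	int num_pages;   /* number of valid slots in pages[] */
	void *cur;       /* currently mapped page */
};

/* Same rounding as DIV_ROUND_UP(i_size, PAGE_SIZE). */
int pages_for_size(size_t i_size, size_t page_size)
{
	return (int)((i_size + page_size - 1) / page_size);
}

/*
 * Map the next page, refusing to step past the end of pages[].
 * The error value is an assumption made for this sketch.
 */
int map_next_page(struct io_ctl_sketch *io)
{
	if (io->index >= io->num_pages)
		return -EIO;
	io->cur = io->pages[io->index++];
	return 0;
}
```

With i_size = 65536 and 4096-byte pages this gives 16 slots, so the 17th call fails instead of reading pages[16], whatever num_entries claims.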

…ubvol()

If during relocation we fail in insert_dirty_subvol() because btrfs_update_reloc_root() returned an error, we will leave a root's reloc_root field pointing to a reloc root that was freed instead of NULL, resulting later in a use-after-free, or double free attempt during unmount.

The sequence of steps is this:

1) During relocation the call to btrfs_update_reloc_root() in insert_dirty_subvol() fails, so insert_dirty_subvol() returns the error to merge_reloc_root() without adding the root to the list rc->dirty_subvol_roots;

2) Then merge_reloc_root() aborts the current transaction because insert_dirty_subvol() returned an error;

3) Up the call chain, merge_reloc_roots() gets the error, adds the reloc root for root X to the local reloc_roots list and jumps to the 'out' label, where it calls free_reloc_roots() to free all the reloc roots in the local reloc_roots list. This frees the reloc root for root X;

4) We go up the call chain to relocate_block_group() which calls clean_dirty_subvols() to go over dirty roots and set their ->reloc_root field to NULL, but root X is not in the dirty_subvol_roots list, so its ->reloc_root still points to a reloc root;

5) Relocation finishes, with an error and a transaction abort, but the ->reloc_root field for root X still points to the reloc root that was freed in step 3;

6) When unmounting the fs we end up calling:

   btrfs_free_fs_roots()
     btrfs_drop_and_free_fs_root()
       --> calls btrfs_put_root() against root X's ->reloc_root which is not NULL and points to the reloc root already freed in step 3 above

Resulting in a use-after-free or a double free attempt.
Syzbot reported this with the following dmesg/syslog: [ 106.004389][ T5339] BTRFS error (device loop0 state A): Transaction aborted (error -5) [ 106.014266][ T5339] BTRFS: error (device loop0 state A) in merge_reloc_root:1655: errno=-5 IO failure [ 106.021891][ T1061] BTRFS error (device loop0 state A): error while writing out transaction: -5 [ 106.026964][ T1061] BTRFS warning (device loop0 state A): Skipping commit of aborted transaction. [ 106.033807][ T5340] BTRFS error (device loop0 state A): bdev /dev/loop0 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0 [ 106.039265][ T1061] BTRFS: error (device loop0 state A) in cleanup_transaction:2067: errno=-5 IO failure [ 106.044382][ T5339] BTRFS info (device loop0 state EA): forced readonly [ 106.074329][ T5339] BTRFS: error (device loop0 state EA) in merge_reloc_roots:1887: errno=-5 IO failure [ 106.081004][ T5356] BTRFS info (device loop0 state EA): scrub: started on devid 1 [ 106.085611][ T5339] BTRFS info (device loop0 state EA): balance: ended with status: -30 [ 106.089517][ T5356] BTRFS info (device loop0 state EA): scrub: not finished on devid 1 with status: -30 [ 106.662365][ T5338] BTRFS info (device loop0 state EA): last unmount of filesystem 3a375e4e-b156-4d76-a2ad-16e198ce1409 [ 106.682946][ T5338] ================================================================== [ 106.686574][ T5338] BUG: KASAN: slab-use-after-free in btrfs_put_root+0x2f/0x250 [ 106.690090][ T5338] Write of size 4 at addr ffff88803f978630 by task syz.0.0/5338 [ 106.693173][ T5338] [ 106.694279][ T5338] CPU: 0 UID: 0 PID: 5338 Comm: syz.0.0 Not tainted syzkaller #0 PREEMPT(full) [ 106.694293][ T5338] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 106.694300][ T5338] Call Trace: [ 106.694308][ T5338] <TASK> [ 106.694314][ T5338] dump_stack_lvl+0xe8/0x150 [ 106.694331][ T5338] print_address_description+0x55/0x1e0 [ 106.694343][ T5338] ? 
btrfs_put_root+0x2f/0x250 [ 106.694358][ T5338] print_report+0x58/0x70 [ 106.694368][ T5338] kasan_report+0x117/0x150 [ 106.694384][ T5338] ? btrfs_put_root+0x2f/0x250 [ 106.694399][ T5338] kasan_check_range+0x264/0x2c0 [ 106.694416][ T5338] btrfs_put_root+0x2f/0x250 [ 106.694430][ T5338] btrfs_drop_and_free_fs_root+0x160/0x210 [ 106.694447][ T5338] btrfs_free_fs_roots+0x2f9/0x3c0 [ 106.694464][ T5338] ? __pfx_btrfs_free_fs_roots+0x10/0x10 [ 106.694479][ T5338] ? free_root_pointers+0x5bf/0x5f0 [ 106.694494][ T5338] close_ctree+0x798/0x12d0 [ 106.694511][ T5338] ? __pfx_close_ctree+0x10/0x10 [ 106.694526][ T5338] ? _raw_spin_unlock_irqrestore+0x74/0x80 [ 106.694599][ T5338] ? rcu_preempt_deferred_qs_irqrestore+0x906/0xbc0 [ 106.694620][ T5338] ? __rcu_read_unlock+0x83/0xe0 [ 106.694636][ T5338] ? btrfs_put_super+0x48/0x1c0 [ 106.694652][ T5338] ? __pfx_btrfs_put_super+0x10/0x10 [ 106.694667][ T5338] generic_shutdown_super+0x13d/0x2d0 [ 106.694682][ T5338] kill_anon_super+0x3b/0x70 [ 106.694695][ T5338] btrfs_kill_super+0x41/0x50 [ 106.694710][ T5338] deactivate_locked_super+0xbc/0x130 [ 106.694722][ T5338] cleanup_mnt+0x437/0x4d0 [ 106.694736][ T5338] ? _raw_spin_unlock_irq+0x23/0x50 [ 106.694752][ T5338] task_work_run+0x1d9/0x270 [ 106.694769][ T5338] ? __pfx_task_work_run+0x10/0x10 [ 106.694784][ T5338] ? do_raw_spin_unlock+0x4d/0x210 [ 106.694802][ T5338] do_exit+0x70f/0x22c0 [ 106.694817][ T5338] ? trace_irq_disable+0x3b/0x140 [ 106.694835][ T5338] ? __pfx_do_exit+0x10/0x10 [ 106.694848][ T5338] ? preempt_schedule_thunk+0x16/0x30 [ 106.694863][ T5338] ? preempt_schedule_common+0x82/0xd0 [ 106.694878][ T5338] ? preempt_schedule_thunk+0x16/0x30 [ 106.694892][ T5338] do_group_exit+0x21b/0x2d0 [ 106.694906][ T5338] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 106.694918][ T5338] __x64_sys_exit_group+0x3f/0x40 [ 106.694932][ T5338] x64_sys_call+0x221a/0x2240 [ 106.694944][ T5338] do_syscall_64+0x174/0x580 [ 106.694954][ T5338] ? 
clear_bhb_loop+0x40/0x90 [ 106.694967][ T5338] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 106.694978][ T5338] RIP: 0033:0x7f958ef9ce59 [ 106.694988][ T5338] Code: Unable to access opcode bytes at 0x7f958ef9ce2f. [ 106.694994][ T5338] RSP: 002b:00007fffd4058318 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 106.695008][ T5338] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f958ef9ce59 [ 106.695015][ T5338] RDX: 00007f958c3f8000 RSI: 0000000000000000 RDI: 0000000000000000 [ 106.695022][ T5338] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007f958f1e73e0 [ 106.695028][ T5338] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 106.695034][ T5338] R13: 00007f958f1e73e0 R14: 0000000000000003 R15: 00007fffd40583d0 [ 106.695046][ T5338] </TASK> [ 106.695050][ T5338] [ 106.821635][ T5338] Allocated by task 1061: [ 106.823446][ T5338] kasan_save_track+0x3e/0x80 [ 106.825498][ T5338] __kasan_kmalloc+0x93/0xb0 [ 106.827381][ T5338] __kmalloc_cache_noprof+0x31c/0x660 [ 106.829525][ T5338] btrfs_alloc_root+0x75/0x930 [ 106.831458][ T5338] read_tree_root_path+0x127/0xb00 [ 106.833556][ T5338] btrfs_read_tree_root+0x34/0x60 [ 106.835553][ T5338] create_reloc_root+0x6b3/0xcb0 [ 106.837556][ T5338] btrfs_init_reloc_root+0x2ec/0x4b0 [ 106.839557][ T5338] record_root_in_trans+0x2ab/0x350 [ 106.841685][ T5338] btrfs_record_root_in_trans+0x15c/0x180 [ 106.844237][ T5338] start_transaction+0x39c/0x1820 [ 106.846638][ T5338] btrfs_finish_one_ordered+0x88e/0x2680 [ 106.849436][ T5338] btrfs_work_helper+0x37b/0xc20 [ 106.851549][ T5338] process_scheduled_works+0xb5d/0x1860 [ 106.853807][ T5338] worker_thread+0xa53/0xfc0 [ 106.855773][ T5338] kthread+0x389/0x470 [ 106.857548][ T5338] ret_from_fork+0x514/0xb70 [ 106.859493][ T5338] ret_from_fork_asm+0x1a/0x30 [ 106.861504][ T5338] [ 106.862527][ T5338] Freed by task 5339: [ 106.864224][ T5338] kasan_save_track+0x3e/0x80 [ 106.866180][ T5338] kasan_save_free_info+0x46/0x50 [ 106.868371][ T5338] 
__kasan_slab_free+0x5c/0x80 [ 106.870462][ T5338] kfree+0x1c5/0x640 [ 106.872180][ T5338] __del_reloc_root+0x341/0x3b0 [ 106.874290][ T5338] free_reloc_roots+0x5f/0x90 [ 106.876282][ T5338] merge_reloc_roots+0x73f/0x8a0 [ 106.878489][ T5338] relocate_block_group+0xbcc/0xe70 [ 106.880742][ T5338] do_nonremap_reloc+0xa8/0x5b0 [ 106.882885][ T5338] btrfs_relocate_block_group+0x7e6/0xc40 [ 106.885336][ T5338] btrfs_relocate_chunk+0x115/0x820 [ 106.887502][ T5338] __btrfs_balance+0x1db0/0x2ae0 [ 106.889543][ T5338] btrfs_balance+0xaf3/0x11b0 [ 106.891456][ T5338] btrfs_ioctl_balance+0x3d3/0x610 [ 106.893672][ T5338] __se_sys_ioctl+0xfc/0x170 [ 106.895530][ T5338] do_syscall_64+0x174/0x580 [ 106.897518][ T5338] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 106.900101][ T5338] [ 106.901123][ T5338] The buggy address belongs to the object at ffff88803f978000 [ 106.901123][ T5338] which belongs to the cache kmalloc-4k of size 4096 [ 106.906907][ T5338] The buggy address is located 1584 bytes inside of [ 106.906907][ T5338] freed 4096-byte region [ffff88803f978000, ffff88803f979000) [ 106.912980][ T5338] [ 106.914022][ T5338] The buggy address belongs to the physical page: [ 106.916716][ T5338] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3f978 [ 106.920390][ T5338] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 106.923834][ T5338] flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff) [ 106.927104][ T5338] page_type: f5(slab) [ 106.928898][ T5338] raw: 04fff00000000040 ffff88801ac42140 dead000000000122 0000000000000000 [ 106.932507][ T5338] raw: 0000000000000000 0000000800040004 00000000f5000000 0000000000000000 [ 106.936193][ T5338] head: 04fff00000000040 ffff88801ac42140 dead000000000122 0000000000000000 [ 106.939856][ T5338] head: 0000000000000000 0000000800040004 00000000f5000000 0000000000000000 [ 106.943601][ T5338] head: 04fff00000000003 fffffffffffffe01 00000000ffffffff 00000000ffffffff [ 106.947268][ T5338] 
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008 [ 106.950988][ T5338] page dumped because: kasan: bad access detected [ 106.953710][ T5338] page_owner tracks the page as allocated [ 106.956198][ T5338] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2820(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 24, tgid 24 (kworker/u4:2), ts 105728970387, free_ts 29540875453 [ 106.964984][ T5338] post_alloc_hook+0x22d/0x280 [ 106.966956][ T5338] get_page_from_freelist+0x2593/0x2610 [ 106.969307][ T5338] __alloc_frozen_pages_noprof+0x18d/0x380 [ 106.971839][ T5338] allocate_slab+0x77/0x660 [ 106.973709][ T5338] refill_objects+0x339/0x3d0 [ 106.975696][ T5338] __pcs_replace_empty_main+0x321/0x720 [ 106.978136][ T5338] __kmalloc_node_track_caller_noprof+0x572/0x7b0 [ 106.981009][ T5338] __alloc_skb+0x2c1/0x7d0 [ 106.982983][ T5338] nsim_dev_trap_report_work+0x29a/0xb90 [ 106.985356][ T5338] process_scheduled_works+0xb5d/0x1860 [ 106.987710][ T5338] worker_thread+0xa53/0xfc0 [ 106.989847][ T5338] kthread+0x389/0x470 [ 106.991727][ T5338] ret_from_fork+0x514/0xb70 [ 106.993722][ T5338] ret_from_fork_asm+0x1a/0x30 [ 106.995900][ T5338] page last free pid 77 tgid 77 stack trace: [ 106.998479][ T5338] __free_frozen_pages+0xc1c/0xd30 [ 107.000819][ T5338] vfree+0x1d1/0x2f0 [ 107.002631][ T5338] delayed_vfree_work+0x55/0x80 [ 107.004848][ T5338] process_scheduled_works+0xb5d/0x1860 [ 107.007366][ T5338] worker_thread+0xa53/0xfc0 [ 107.009388][ T5338] kthread+0x389/0x470 [ 107.011177][ T5338] ret_from_fork+0x514/0xb70 [ 107.013313][ T5338] ret_from_fork_asm+0x1a/0x30 [ 107.015454][ T5338] [ 107.016460][ T5338] Memory state around the buggy address: [ 107.019052][ T5338] ffff88803f978500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 107.022691][ T5338] ffff88803f978580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 107.026264][ T5338] >ffff88803f978600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 
107.029721][ T5338] ^ [ 107.032062][ T5338] ffff88803f978680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 107.035547][ T5338] ffff88803f978700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 107.038865][ T5338] ================================================================== Fix this by resetting a root's ->reloc_root if we get an error while trying to merge a reloc root. Reported-by: syzbot+b3d472d13f9d7bf20669@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/6a1ebde9.c1435f33.112120.0176.GAE@google.com/ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com>

…oots() If we have an unexpected reloc_root for our root, we jump to the out label but never drop the reference we obtained for root, resulting in a leak. Add a missing btrfs_put_root() call. Fixes: 24213fa ("btrfs: do proper error handling in merge_reloc_roots") Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com>

We have been running with commit root csums enabled for some time and have noticed a slight uptick in zero csum errors. Investigating those revealed that they were same transaction reads of extents that were just relocated, but the extent map generation was long ago. It turns out that relocation intentionally does not update the extent generation (replace_file_extents()), but must write a new csum since the data has moved, so we must account for this with commit root csum reading. Luckily this is a short lived condition: after the relocation transaction the commit root will once again have the csum. So we can add a generic fallback to the lookup to try again with the transaction csum root. Fixes: f07b855 ("btrfs: try to search for data csums in commit root") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io>

If the root we got has zero root refs in its root item, we are resetting the root's ->reloc_root without using barriers like we do everywhere else. Sashiko complained about this while reviewing another patch, and it's correct (see the Link tag below). Also, we should not clear BTRFS_ROOT_DEAD_RELOC_TREE from the root unless the root points to the reloc root we have. Fix this by using clear_reloc_root(), which issues the memory barrier after setting the root's ->reloc_root to NULL and before clearing the bit BTRFS_ROOT_DEAD_RELOC_TREE from the root. Link: https://sashiko.dev/#/patchset/cf84f1a217c719e25b6b69e4298dd7afd36c9427.1781194426.git.fdmanana%40suse.com Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com>

When we set a root's reloc_root to NULL, we do it like this:

  static void clear_reloc_root(struct btrfs_root *root)
  {
          root->reloc_root = NULL;
          /*
           * Need barrier to ensure clear_bit() only happens after
           * root->reloc_root = NULL. Pairs with have_reloc_root().
           */
          smp_wmb();
          clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
  }

So that a NULL reloc_root is always seen before seeing that the bit BTRFS_ROOT_DEAD_RELOC_TREE was cleared.

But on the read side we have:

  static bool reloc_root_is_dead(const struct btrfs_root *root)
  {
          smp_rmb();
          if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
                  return true;
          return false;
  }

And then callers of reloc_root_is_dead() access root->reloc_root.

Because the read memory barrier is placed before testing the bit, the CPU is completely free to speculatively reorder those two loads. It can read root->reloc_root before it actually checks the dead tree bit.

Sashiko reported this as an existing problem in another patch review, see the link in the Link tag below.

Fix this by moving the read memory barrier to happen after testing the bit and update the comment to reflect current reality.

Link: https://sashiko.dev/#/patchset/cf84f1a217c719e25b6b69e4298dd7afd36c9427.1781194426.git.fdmanana%40suse.com
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
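The intended ordering can be modelled in userspace with C11 atomics. This is a rough analogue under stated assumptions, not kernel code: struct root_sketch and both function names are hypothetical, and release/acquire fences are used as stand-ins for smp_wmb()/smp_rmb(), which they are not identical to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields of struct btrfs_root involved. */
struct root_sketch {
	_Atomic(void *) reloc_root;
	atomic_bool dead_reloc_tree;   /* models BTRFS_ROOT_DEAD_RELOC_TREE */
};

/* Writer: publish reloc_root = NULL before the bit is seen as cleared. */
void clear_reloc_root_sketch(struct root_sketch *root)
{
	atomic_store_explicit(&root->reloc_root, NULL, memory_order_relaxed);
	/* Stand-in for smp_wmb(). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&root->dead_reloc_tree, false,
			      memory_order_relaxed);
}

/* Reader: test the bit first, then fence, then load the pointer. */
void *live_reloc_root_sketch(struct root_sketch *root)
{
	if (atomic_load_explicit(&root->dead_reloc_tree, memory_order_relaxed))
		return NULL;
	/* Stand-in for smp_rmb(), placed after the bit test. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&root->reloc_root, memory_order_relaxed);
}
```

With the fence after the bit test, a reader that observes the cleared bit is guaranteed to also observe the NULL pointer stored before it. With the fence before the test, nothing orders the two loads relative to each other.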

…nded The loop intends to copy the data in chunks of up to 1M, but we allocate the pages array for the entire length and don't cap it to 1M. Fix this by computing 'nr_pages' using 'copy_len' instead of 'length'. While at it, also make 'nr_pages' and 'copy_len' const, as they never change, to make the code clearer. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com>
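The arithmetic behind the fix is small enough to show standalone. This is a sketch only: the function names and the SKETCH_* constants are hypothetical, and a 4096-byte page size is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u               /* assumed page size */
#define SKETCH_MAX_COPY  (1024u * 1024u)     /* the 1M chunk limit */

/* Length of one copy chunk: the full length, capped at 1M. */
size_t copy_chunk_len(size_t length)
{
	return length < SKETCH_MAX_COPY ? length : SKETCH_MAX_COPY;
}

/* Pages needed for one chunk, derived from copy_len rather than length. */
size_t pages_for_chunk(size_t length)
{
	const size_t copy_len = copy_chunk_len(length);

	return (copy_len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

For a 64M length this yields 256 page pointers per chunk, where sizing from the full length would have needed 16384.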

In Meta production, we have observed a large number of hosts running kernels newer than 6.13 which hit hung tasks on btrfs_read_folio()->lock_extents_for_read().

The history of this codepath is interesting. In 6.12, we merged commit ac325fc ("btrfs: do not hold the extent lock for entire read") which holds the extent lock very narrowly while looking up the extent_map. However, this proved to introduce a serious race with DIO writes which was fixed in 6.14 with commit acc18e1 ("btrfs: fix stale page cache after race between readahead and direct IO write").

That latter fix subtly changed the extent unlock point from the pre-6.12 regime. In 6.11, each read endio unlocked the extent it finished reading, but in 6.14, the extent is locked/unlocked as a unit around the entire readahead loop, while the individual folios are still unlocked as the endios finish.

This is mostly the same behavior, as all successful reads will populate the page cache, so subsequent reads won't enter btrfs and hit the extent lock. But in the case where the readahead fails, perhaps because of a memory allocation failure doing compressed reads, the page will not be brought up to date and a later read of an overlapping range *will* block on the extent lock.

Why is this a problem? On sufficiently large, loaded systems, I have observed that direct reclaim can run for minutes. Given that, consider two tasks on such a system reading an overlapping range of a compressed file:

Task 1 locks the whole range and starts to read. Some allocation for the compressed read for folio F fails and we carry on while holding the extent lock for the full range.
Task 2 wants to read F, which is not uptodate and in page cache, so it blocks on the extent lock held by Task 1.
Task 1 keeps getting stuck in direct reclaim (likely, we already supposed an allocation failure above).
Task 2 stays blocked on the extent lock the whole time.
If you consider the effects of readahead_expand and imagine a file with a 128k compressed extent followed by many smaller compressed extents, the expanded window will result in subsequent reads hitting many extents (128k/4k = 32) per lock window in the worst case.

The system likely wouldn't be all that healthy anyway, so this is likely not a critical improvement, but it does alleviate this one source of stress and keeps one thread's slowdown from escalating to others.

To bring this behavior back to the old model, we should unlock the extent at each iteration of the readahead loop rather than in one shot at the end. This allows such overlapping reads to proceed as they should. Writes are fine because either the page has already been read and has an appropriate state in the page cache to be invalidated (or not uptodate) or it is still-to-be-read and the extent lock is still held protecting it.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>

…ck groups

A swap file on btrfs will pin down the block groups that cover the swap file extents. Pinned down block groups will be skipped by scrub and relocation.

This degradation of critical btrfs maintenance operations has never been properly communicated to end users, and has already caused problems including:

- Scrub finished too quickly
  Because the enabled swap file has pinned down most of the block groups. Thus any file extents in those block groups, even ones not used by the swap file, will be skipped by scrub.

- Unbalanced data and metadata usage, while relocation won't help
  For the same reason, pinned down block groups will not be considered as relocation targets, thus data extents that are not used by the swap file can still be skipped by relocation.

Although we already have kernel messages for both scrub and balance, the balance one is still info level.

To better communicate those potential long term problems, add the following output into dmesg:

- Change the message level to warn for __btrfs_balance()

- Total pinned down block group number and size during swapfile activation
- Total released block group number and size during swapfile deactivation
  The above messages have info level.

- The fact that pinned down block groups will not be scrubbed nor balanced
  The above message has warning level.

The example output would look like the following, for enabling a 1.2G swapfile, which pinned down 2G of block groups:

  BTRFS info (device dm-3): swapfile activated on root 5 ino 257, pinned down 2147483648 bytes from 2 block group(s)
  BTRFS warning (device dm-3): block groups with swapfile extents will not be scrubbed or balanced
  Adding 1257468k swap on /mnt/btrfs/foobar. Priority:-1 extents:1 across:1257468k
  BTRFS info (device dm-3): swapfile deactivated on root 5 ino 257, released 2147483648 bytes from 2 block group(s)

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>

The variable-sized buffer buf in struct btrfs_ioctl_search_args_v2 is declared as __u64[], but it holds a packed byte stream of search results, where all offsets into the buffer are in bytes. Declaring buf as __u64[] makes it easy for user space to write incorrect pointer arithmetic: adding a byte offset directly to a __u64 pointer scales the offset by 8, landing at byte position offset*8 instead of offset. This recently caused an infinite loop in btrfs-progs: the accessor read all-zero data from misaddressed items, which fed zeroed search keys back into the ioctl loop and spun forever. The issue was worked around at the time by disabling TREE_SEARCH_V2 entirely in btrfs-progs (d73e69824854: "btrfs-progs: temporarily disable usage of v2 of search tree ioctl"). The kernel side already treats buf as a byte buffer, so change the declaration to __u8[] to match the actual semantics and prevent similar misuse in user space. The change is ABI compatible: both the structure size and alignment are unchanged. Suggested-by: Qu Wenruo <wqu@suse.com> Signed-off-by: You-Kai Zheng <ykzheng@synology.com> Fixes: cc68a8a ("btrfs: new ioctl TREE_SEARCH_V2") Reviewed-by: David Sterba <dsterba@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com>
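The pointer arithmetic pitfall described above can be reproduced in a few lines of plain C. The helper names are hypothetical and the buffer is an ordinary array, not the real ioctl structure; the sketch only shows how the element type changes where "buf + offset" lands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte position reached by adding a byte offset to a 64-bit element pointer. */
size_t byte_pos_via_u64(const uint64_t *buf, size_t offset)
{
	/* buf + offset advances by offset * sizeof(uint64_t) bytes. */
	return (size_t)((const uint8_t *)(buf + offset) - (const uint8_t *)buf);
}

/* Byte position reached by adding the same offset to a byte pointer. */
size_t byte_pos_via_u8(const uint8_t *buf, size_t offset)
{
	/* buf + offset advances by exactly offset bytes. */
	return (size_t)((buf + offset) - buf);
}
```

For an offset of 32 into a 512-byte buffer, the first helper lands at byte 256 and the second at byte 32, which is the offset*8 versus offset difference the commit message describes.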

Inside btrfs we always pair -EUCLEAN error with an error message to indicate which data is corrupted. However there are 3 cases inside lzo decompression where there is no error message for corrupted headers. Add those missing error messages to show exactly where the corruption is. Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com>

@name

[BUG] A crafted btrfs image can trigger the following crash: BUG: unable to handle page fault for address: ffffd1dc42884000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page CPU: 9 UID: 0 PID: 1034 Comm: poc Not tainted 7.1.0-rc4-custom+ #383 PREEMPT(full) 46af0a92938a63be7132e0dfd71e62327c51d5c2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 RIP: 0010:memcpy+0xc/0x10 Call Trace: <TASK> read_extent_buffer+0xe4/0x100 [btrfs 3cf0785dd58fec8c5ff84633b772f17ce1f92a8f] btrfs_get_name+0x15e/0x1e0 [btrfs 3cf0785dd58fec8c5ff84633b772f17ce1f92a8f] reconnect_path+0x165/0x390 exportfs_decode_fh_raw+0x337/0x400 ? drop_caches_sysctl_handler+0xb0/0xb0 </TASK> ---[ end trace 0000000000000000 ]--- RIP: 0010:memcpy+0xc/0x10 Kernel panic - not syncing: Fatal exception [CAUSE] The crafted image has the following corrupted INODE_REF item: item 9 key (258 INODE_REF 257) itemoff 11544 itemsize 4106 index 2 namelen 4096 name: d\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
The itemsize matches the namelen, but the namelen is 4096, far larger than the normal name length limit (BTRFS_NAME_LEN, 255). Meanwhile the memory for @name is only 255 bytes, so this causes an out-of-bounds access and the above crash.

[FIX]
Add extra namelen verification for INODE_REF, just like what we have done in ROOT_REF checks.

Now the crafted image can be rejected gracefully:

  BTRFS critical (device dm-2): corrupt leaf: root=5 block=30572544 slot=14 ino=259, invalid inode ref name length, has 4096 expect [1, 255]
  BTRFS error (device dm-2): read time tree block corruption detected on logical 30572544 mirror 2

Link: https://lore.kernel.org/linux-btrfs/aik0hEV6ehKx6Ldv@Air.local/
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
[ Rebase, add a Link: tag, add a simple cause analysis ]
Acked-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
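The range check implied by the "expect [1, 255]" message can be written as a tiny predicate. This is a sketch, not the tree-checker code: the function name and SKETCH_NAME_LEN are hypothetical, with 255 taken from the BTRFS_NAME_LEN value quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NAME_LEN 255u   /* mirrors BTRFS_NAME_LEN as quoted above */

/* Accept only name lengths in [1, 255], matching the error message range. */
bool inode_ref_namelen_valid(uint32_t namelen)
{
	return namelen >= 1 && namelen <= SKETCH_NAME_LEN;
}
```

A namelen of 4096, as in the crafted item, fails this check before any copy into a 255-byte name buffer is attempted.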

V2 space cache has been the default mkfs option since btrfs-progs v5.15, and commit 1e7bec1 ("btrfs: emit a warning about space cache v1 being deprecated") has already added a warning that v1 space cache is deprecated.

It has been long enough that we should remove v1 space cache completely.

As the first step, disable v1 space cache by:

- Making the "space_cache" mount option fall back to "nospace_cache"
- Making "space_cache=v1" fall back to "nospace_cache"

This is safer than forcing "space_cache=v2", as forcing v2 cache requires removing the v1 cache and regenerating the v2 cache. Such an operation can be slow and takes extra metadata space, thus it is not always safe for existing filesystems.

With this done, a v1 cache mount will always fall back to no space cache, and mount options will not be able to force v1 space cache usage.

For example, even for a fs with v1 cache:

  # btrfs ins dump-super test.img
  superblock: bytenr=65536, device=test.img
  ---------------------------------------------------------
  csum_type             0 (crc32c)
  csum_size             4
  csum                  0xdce44b2c [match]
  bytenr                65536
  flags                 0x1 ( WRITTEN )
  magic                 _BHRfS_M [match]
  fsid                  7d7c3bba-8211-4206-868d-10eedd5703f8
  metadata_uuid         00000000-0000-0000-0000-000000000000
  label
  generation            9
  root                  30605312
  [...]
  compat_ro_flags       0x0 << No FST feature
  incompat_flags        0x361 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA | NO_HOLES )
  cache_generation      9 <<< Matches generation
  uuid_tree_generation  9

Mounting it will use no space cache rather than v1 space cache:

  # mount test.img /mnt/btrfs
  # dmesg -t | tail -n 5
  BTRFS: device fsid 7d7c3bba-8211-4206-868d-10eedd5703f8 devid 1 transid 9 /dev/loop0 (7:0) scanned by mount (1264)
  BTRFS info (device loop0): first mount of filesystem 7d7c3bba-8211-4206-868d-10eedd5703f8
  BTRFS info (device loop0): using crc32c checksum algorithm
  BTRFS info (device loop0): turning on async discard
  BTRFS info (device loop0): last unmount of filesystem 7d7c3bba-8211-4206-868d-10eedd5703f8

Forcing v1 cache does not work either; it falls back to the usual nospace_cache:

  # mount test.img -o space_cache=v1 /mnt/btrfs
  # dmesg -t | tail -n 6
  BTRFS warning: v1 space cache is deprecated, fallback to no space cache
  BTRFS: device fsid 7d7c3bba-8211-4206-868d-10eedd5703f8 devid 1 transid 9 /dev/loop0 (7:0) scanned by mount (1264)
  BTRFS info (device loop0): first mount of filesystem 7d7c3bba-8211-4206-868d-10eedd5703f8
  BTRFS info (device loop0): using crc32c checksum algorithm
  BTRFS info (device loop0): turning on async discard
  BTRFS info (device loop0): last unmount of filesystem 7d7c3bba-8211-4206-868d-10eedd5703f8

There is also no way to force converting a v2 cache back to v1; such an attempt will only clear the free space tree and fall back to no space cache.

  # mkfs.btrfs -f -O fst,^bgt test.img
  # mount -o clear_cache,space_cache=v1 test.img /mnt/btrfs
  # dmesg -t | tail -n 11
  BTRFS warning: v1 space cache is deprecated, fallback to no space cache
  BTRFS: device fsid f59daad2-3ab5-4f33-b752-a36cfb09b674 devid 1 transid 8 /dev/loop0 (7:0) scanned by mount (1419)
  BTRFS info (device loop0): first mount of filesystem f59daad2-3ab5-4f33-b752-a36cfb09b674
  BTRFS info (device loop0): using crc32c checksum algorithm
  BTRFS info (device loop0): rebuilding free space tree
  BTRFS info (device loop0): disabling free space tree
  BTRFS info (device loop0): clearing compat-ro feature flag for FREE_SPACE_TREE (0x1)
  BTRFS info (device loop0): clearing compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2)
  BTRFS info (device loop0): checking UUID tree
  BTRFS info (device loop0): turning on async discard
  BTRFS info (device loop0): force clearing of disk cache
  # mount | grep /mnt/btrfs
  /home/adam/test.img on /mnt/btrfs type btrfs (rw,relatime,discard=async,nospace_cache,subvolid=5,subvol=/)

Signed-off-by: Qu Wenruo <wqu@suse.com>
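
The fallback described in this commit message can be pictured as a small lookup from the requested mount option to the mode that ends up in use. The following is only an illustrative userspace sketch, not the kernel's mount option parser: `enum cache_mode` and `resolve_space_cache()` are hypothetical names, and the deprecation warning seen in the dmesg output is not modeled.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names for illustration; not the kernel's identifiers. */
enum cache_mode {
	CACHE_NONE,	/* behaves like nospace_cache */
	CACHE_V2,	/* free space tree */
};

/*
 * Map a requested space cache mount option to the mode that ends up in
 * use. Returns true and stores the result in *mode when the option is
 * recognized, false otherwise.
 */
static bool resolve_space_cache(const char *opt, enum cache_mode *mode)
{
	if (strcmp(opt, "space_cache=v2") == 0) {
		*mode = CACHE_V2;
		return true;
	}

	/* Both v1 spellings fall back to no space cache. */
	if (strcmp(opt, "space_cache") == 0 ||
	    strcmp(opt, "space_cache=v1") == 0 ||
	    strcmp(opt, "nospace_cache") == 0) {
		*mode = CACHE_NONE;
		return true;
	}

	return false;
}
```

Calling `resolve_space_cache("space_cache=v1", &mode)` in this sketch yields `CACHE_NONE`, matching the `nospace_cache` entry in the final `mount | grep` output above.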

Since commit bac3c29 ("btrfs: remove 2K block size support") there is no 2K block size support inside btrfs anymore. Remove the stale comments of btrfs_supported_blocksize(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com>

Since v5.15 btrfs has supported block size < page size, but only with a 4K block size. There is no special reason that we cannot support 8K/16K/32K block sizes for a 64K page size.

That 4K limit is completely artificial, and mostly exists to reduce test runtime so we do not need to test all the extra block size combinations.

However that also limits user choices. Some users may understand what they are doing and want larger block sizes. In that case, the fixed 4K block size for the subpage routine is blocking our way.

Just remove that fixed 4K requirement for block size < page size.

This should not affect regular end users, since mkfs has been using 4K block size as the default for quite a while, and the existing bs == ps support is always there.

But for power users, this allows extra block size support, and may provide extra test coverage.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
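
As a rough illustration of the relaxed rule, the check below accepts any power-of-two block size from 4K up to the page size. It is a hypothetical userspace sketch, not the body of btrfs_supported_blocksize(): the name `blocksize_supported()`, the `MIN_BLOCKSIZE` constant, and the decision to ignore block sizes larger than the page size are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4K is the lower bound, since 2K support was removed. */
#define MIN_BLOCKSIZE 4096u

/*
 * Hypothetical check: a block size is accepted when it is a power of two
 * in the range [4K, page_size]. Block sizes above the page size are out
 * of scope for this sketch and are rejected.
 */
static bool blocksize_supported(uint32_t blocksize, uint32_t page_size)
{
	bool pow2 = blocksize != 0 && (blocksize & (blocksize - 1)) == 0;

	return pow2 && blocksize >= MIN_BLOCKSIZE && blocksize <= page_size;
}
```

With a 64K page size this sketch accepts 4K, 8K, 16K, 32K and 64K, and rejects 2K as well as non-power-of-two values such as 12K.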

kdave changed the title ~~Test for-next (regular, GH kvm) 2~~ Test for-next (regular, GH kvm) Mar 5, 2026

kdave closed this Mar 5, 2026

kdave reopened this Mar 5, 2026

kdave force-pushed the ci-kvm branch 2 times, most recently from 69fc6c9 to 98bf7e7 Compare March 5, 2026 23:30

adam900710 force-pushed the for-next branch from 3810750 to 5e325f7 Compare March 11, 2026 21:11

kdave force-pushed the for-next branch 3 times, most recently from 934d926 to 0cb5a8a Compare March 13, 2026 11:46

kdave force-pushed the ci-kvm branch 3 times, most recently from 2cd3911 to c09d7cb Compare March 13, 2026 19:11

kdave force-pushed the for-next branch from f82792c to 8c22e39 Compare March 13, 2026 19:54

kdave force-pushed the ci-kvm branch from c09d7cb to 789ae6c Compare March 13, 2026 20:06

kdave force-pushed the for-next branch 8 times, most recently from c61e262 to daed989 Compare March 17, 2026 16:00

adam900710 force-pushed the for-next branch from 02c8fe4 to 2f7bc14 Compare March 17, 2026 21:35

kdave force-pushed the for-next branch from 2f7bc14 to d390292 Compare March 18, 2026 10:15

kdave force-pushed the ci-kvm branch from 789ae6c to 1dd22c8 Compare March 18, 2026 16:11

kdave force-pushed the for-next branch 4 times, most recently from 9dc51d6 to d76ed94 Compare March 19, 2026 13:21

fdmanana and others added 30 commits June 9, 2026 18:22

btrfs: don't over reserve metadata space for property in btrfs_fileattr_set()

c461f4a

We are using 2 units for properties but we only set one property. Fix this by using the correct amount: 1 unit. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Test for-next (regular, SELF kvm)#1624

kdave wants to merge 10000 commits into ci-kvm from for-next

20 participants
