Message ID | 2a19a500ccb297397018dac23d30106977153d62.1707714970.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: reject zoned RW mount if sectorsize is smaller than page size | expand |
On 12.02.24 06:16, Qu Wenruo wrote: > Reported-by: HAN Yuwei <hrx@bupt.moe> > Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/ > CC: stable@vger.kernel.org # 5.10+ > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/disk-io.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index c3ab268533ca..85cd23aebdd6 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) > * part of @locked_page. > * That's also why compression for subpage only work for page aligned ranges. > */ > - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) { > + if (fs_info->sectorsize < PAGE_SIZE && > + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { > btrfs_warn(fs_info, > "no zoned read-write support for page size %lu with sectorsize %u", > PAGE_SIZE, fs_info->sectorsize); Please keep btrfs_is_zoned(fs_info) instead of using btrfs_fs_incompat(fs_info, ZONED). Otherwise, Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
On 2024/2/12 20:15, Johannes Thumshirn wrote: > On 12.02.24 06:16, Qu Wenruo wrote: >> Reported-by: HAN Yuwei <hrx@bupt.moe> >> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/ >> CC: stable@vger.kernel.org # 5.10+ >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/disk-io.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index c3ab268533ca..85cd23aebdd6 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) >> * part of @locked_page. >> * That's also why compression for subpage only work for page aligned ranges. >> */ >> - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) { >> + if (fs_info->sectorsize < PAGE_SIZE && >> + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { >> btrfs_warn(fs_info, >> "no zoned read-write support for page size %lu with sectorsize %u", >> PAGE_SIZE, fs_info->sectorsize); > > Please keep btrfs_is_zoned(fs_info) instead of using > btrfs_fs_incompat(fs_info, ZONED). At the time of calling, we haven't yet populate fs_info->zone_size, thus we have to use super flags to verify if it's zoned. If needed, I can add a comment for it. Thanks, Qu > > Otherwise, > Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
On 12.02.24 10:50, Qu Wenruo wrote: > > > On 2024/2/12 20:15, Johannes Thumshirn wrote: >> On 12.02.24 06:16, Qu Wenruo wrote: >>> Reported-by: HAN Yuwei <hrx@bupt.moe> >>> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/ >>> CC: stable@vger.kernel.org # 5.10+ >>> Signed-off-by: Qu Wenruo <wqu@suse.com> >>> --- >>> fs/btrfs/disk-io.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>> index c3ab268533ca..85cd23aebdd6 100644 >>> --- a/fs/btrfs/disk-io.c >>> +++ b/fs/btrfs/disk-io.c >>> @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) >>> * part of @locked_page. >>> * That's also why compression for subpage only work for page aligned ranges. >>> */ >>> - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) { >>> + if (fs_info->sectorsize < PAGE_SIZE && >>> + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { >>> btrfs_warn(fs_info, >>> "no zoned read-write support for page size %lu with sectorsize %u", >>> PAGE_SIZE, fs_info->sectorsize); >> >> Please keep btrfs_is_zoned(fs_info) instead of using >> btrfs_fs_incompat(fs_info, ZONED). > > At the time of calling, we haven't yet populate fs_info->zone_size, thus > we have to use super flags to verify if it's zoned. > > If needed, I can add a comment for it. All good, I didn't have enough coffee.
On Mon, Feb 12, 2024 at 03:46:15PM +1030, Qu Wenruo wrote: > [BUG] > There is a bug report that with zoned device and sectorsize is smaller > than page size (aka, subpage), btrfs would crash with a very basic > workload: > > # getconfig PAGESIZE > 16384 > # mkfs.btrfs -f $dev -s 4k > # mount $dev $mnt > # $fsstress -w -n 8 -s 1707820327 -v -d $mnt > # umount $mnt > > The crash would look like this (with CONFIG_BTRFS_ASSERT enabled): > > assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384 This is the same as what Josef fixed in https://github.com/btrfs/linux/commit/400bb013912dac637c7d6826407be580ea8ef9cc "btrfs: don't drop extent_map for free space inode on write error" but not on zoned+subpage, is it really a different error you're fixing? [...] > [WORKAROUND] > A proper fix requires some big changes to delalloc workload, to allow > extent_write_locked_range() to handle multiple different entries with > the same @locked_page. > > So for now, disable read-write support for subpage zoned btrfs. > > The problem can only be solved if subpage btrfs can handle subpage > compression, which need quite some work on the delalloc procedure for > the @locked_page handling. Ok, this on itself is a valid reason to disable the support.
On 2024/2/12 21:44, David Sterba wrote: > On Mon, Feb 12, 2024 at 03:46:15PM +1030, Qu Wenruo wrote: >> [BUG] >> There is a bug report that with zoned device and sectorsize is smaller >> than page size (aka, subpage), btrfs would crash with a very basic >> workload: >> >> # getconfig PAGESIZE >> 16384 >> # mkfs.btrfs -f $dev -s 4k >> # mount $dev $mnt >> # $fsstress -w -n 8 -s 1707820327 -v -d $mnt >> # umount $mnt >> >> The crash would look like this (with CONFIG_BTRFS_ASSERT enabled): >> >> assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384 > > This is the same as what Josef fixed in > https://github.com/btrfs/linux/commit/400bb013912dac637c7d6826407be580ea8ef9cc > "btrfs: don't drop extent_map for free space inode on write error" > but not on zoned+subpage, is it really a different error you're fixing? Yes, it's really a different bug. Although the root cause is a little more complex, it has to be explained through the whole CAUSE section. > > [...] >> [WORKAROUND] >> A proper fix requires some big changes to delalloc workload, to allow >> extent_write_locked_range() to handle multiple different entries with >> the same @locked_page. >> >> So for now, disable read-write support for subpage zoned btrfs. >> >> The problem can only be solved if subpage btrfs can handle subpage >> compression, which need quite some work on the delalloc procedure for >> the @locked_page handling. > > Ok, this on itself is a valid reason to disable the support. But this reminds me that, I have to work on the proper delalloc support for subpage. My current plan is: 1. Introduce subpage LOCK bitmap Not just only an atomic counter. 2. Make find_lock_delalloc_range() to populate that lock bitmap 3. Make find_lock_delalloc_range() to find all delalloc range of a page and properly lock them This would not change the behavior for page sized sectorsize. 4. Make find_lock_delalloc_range() to skip the page if it's already being locked before. This is also a subpage specific behavior With above planned work, btrfs should be able to handle subpage unlock, so that we won't need weird @locked_page handling. Hopefully this would solve both the subpage+zoned and full sector aligned subpage compression support. Thanks, Qu
On Mon, Feb 12, 2024 at 03:46:15PM +1030, Qu Wenruo wrote: > [BUG] > There is a bug report that with zoned device and sectorsize is smaller > than page size (aka, subpage), btrfs would crash with a very basic > workload: > > # getconfig PAGESIZE > 16384 > # mkfs.btrfs -f $dev -s 4k > # mount $dev $mnt > # $fsstress -w -n 8 -s 1707820327 -v -d $mnt > # umount $mnt > > The crash would look like this (with CONFIG_BTRFS_ASSERT enabled): > > assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384 > ------------[ cut here ]------------ > kernel BUG at fs/btrfs/extent_io.c:1384! > CPU: 0 PID: 872 Comm: kworker/u9:2 Tainted: G OE 6.8.0-rc3-custom+ #7 > Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20231122-12.fc39 11/22/2023 > Workqueue: writeback wb_workfn (flush-btrfs-8) > pc : __extent_writepage_io+0x404/0x460 [btrfs] > lr : __extent_writepage_io+0x404/0x460 [btrfs] > Call trace: > __extent_writepage_io+0x404/0x460 [btrfs] > extent_write_locked_range+0x16c/0x460 [btrfs] > run_delalloc_cow+0x88/0x118 [btrfs] > btrfs_run_delalloc_range+0x128/0x228 [btrfs] > writepage_delalloc+0xb8/0x178 [btrfs] > __extent_writepage+0xc8/0x3a0 [btrfs] > extent_write_cache_pages+0x1cc/0x460 [btrfs] > extent_writepages+0x8c/0x120 [btrfs] > btrfs_writepages+0x18/0x30 [btrfs] > do_writepages+0x94/0x1f8 > __writeback_single_inode+0x4c/0x388 > writeback_sb_inodes+0x208/0x4b0 > wb_writeback+0x118/0x3c0 > wb_do_writeback+0xbc/0x388 > wb_workfn+0x80/0x240 > process_one_work+0x154/0x3c8 > worker_thread+0x2bc/0x3e0 > kthread+0xf4/0x108 > ret_from_fork+0x10/0x20 > Code: 9102c021 90000be0 91378000 9402bf53 (d4210000) > ---[ end trace 0000000000000000 ]--- > > [CAUSE] > There are several factors causing the problem: > > 1. __extent_writepage_io() requires all dirty ranges to have delalloc > executed > This can be solved by adding @start and @len parameter to only submit > IO for a subset of the page, and update several involved helpers to > do subpage checks. > > So this is not a big deal. > > 2. Subpage only accepts for full page aligned ranges for > extent_write_locked_range() > For zoned device, regular COW is switched to utilize > extent_write_locked_range() to submit the IO. > > But the caller, run_delalloc_cow() can be called to run on a subpage > range, e.g. > > 0 4K 8K 12K 16K > |/////| |/////| > > Where |///| is the dirtied range. > > In that case, btrfs_run_delalloc_range() would call run_delalloc_cow(), > which would call extent_write_locked_range() for [0, 4K), and unlock > the whole [0, 16K) page. > > But btrfs_run_delalloc_range() would again be called for range [8K, > 12K), as there are still dirty range left. > In that case, since the whole page is already unlocked by previous > iteration, and would cause different ASSERT()s inside > extent_write_locked_range(). > > That's also why compression for subpage cases require fully page > aligned range. > > [WORKAROUND] > A proper fix requires some big changes to delalloc workload, to allow > extent_write_locked_range() to handle multiple different entries with > the same @locked_page. > > So for now, disable read-write support for subpage zoned btrfs. > > The problem can only be solved if subpage btrfs can handle subpage > compression, which need quite some work on the delalloc procedure for > the @locked_page handling. > > Reported-by: HAN Yuwei <hrx@bupt.moe> > Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/ > CC: stable@vger.kernel.org # 5.10+ > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/disk-io.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index c3ab268533ca..85cd23aebdd6 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) > * part of @locked_page. > * That's also why compression for subpage only work for page aligned ranges. > */ > - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) { > + if (fs_info->sectorsize < PAGE_SIZE && > + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { > btrfs_warn(fs_info, > "no zoned read-write support for page size %lu with sectorsize %u", > PAGE_SIZE, fs_info->sectorsize); This does not match any code I see in usual branches and it also does not apply so it can't be picked as a fix.
On Mon, Feb 12, 2024 at 08:20:21PM +1030, Qu Wenruo wrote: > > > On 2024/2/12 20:15, Johannes Thumshirn wrote: > > On 12.02.24 06:16, Qu Wenruo wrote: > >> Reported-by: HAN Yuwei <hrx@bupt.moe> > >> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/ > >> CC: stable@vger.kernel.org # 5.10+ > >> Signed-off-by: Qu Wenruo <wqu@suse.com> > >> --- > >> fs/btrfs/disk-io.c | 3 ++- > >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > >> index c3ab268533ca..85cd23aebdd6 100644 > >> --- a/fs/btrfs/disk-io.c > >> +++ b/fs/btrfs/disk-io.c > >> @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) > >> * part of @locked_page. > >> * That's also why compression for subpage only work for page aligned ranges. > >> */ > >> - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) { > >> + if (fs_info->sectorsize < PAGE_SIZE && > >> + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { > >> btrfs_warn(fs_info, > >> "no zoned read-write support for page size %lu with sectorsize %u", > >> PAGE_SIZE, fs_info->sectorsize); > > > > Please keep btrfs_is_zoned(fs_info) instead of using > > btrfs_fs_incompat(fs_info, ZONED). > > At the time of calling, we haven't yet populate fs_info->zone_size, thus > we have to use super flags to verify if it's zoned. > > If needed, I can add a comment for it. Yes please add a comment the difference is quite subtle.
在 2024/2/14 17:59, David Sterba 写道: > On Mon, Feb 12, 2024 at 08:20:21PM +1030, Qu Wenruo wrote: >> >> >> On 2024/2/12 20:15, Johannes Thumshirn wrote: >>> On 12.02.24 06:16, Qu Wenruo wrote: >>>> Reported-by: HAN Yuwei <hrx@bupt.moe> >>>> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/ >>>> CC: stable@vger.kernel.org # 5.10+ >>>> Signed-off-by: Qu Wenruo <wqu@suse.com> >>>> --- >>>> fs/btrfs/disk-io.c | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>>> index c3ab268533ca..85cd23aebdd6 100644 >>>> --- a/fs/btrfs/disk-io.c >>>> +++ b/fs/btrfs/disk-io.c >>>> @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) >>>> * part of @locked_page. >>>> * That's also why compression for subpage only work for page aligned ranges. >>>> */ >>>> - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) { >>>> + if (fs_info->sectorsize < PAGE_SIZE && >>>> + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { >>>> btrfs_warn(fs_info, >>>> "no zoned read-write support for page size %lu with sectorsize %u", >>>> PAGE_SIZE, fs_info->sectorsize); >>> >>> Please keep btrfs_is_zoned(fs_info) instead of using >>> btrfs_fs_incompat(fs_info, ZONED). >> >> At the time of calling, we haven't yet populate fs_info->zone_size, thus >> we have to use super flags to verify if it's zoned. >> >> If needed, I can add a comment for it. > > Yes please add a comment the difference is quite subtle. > I'd say this patch can be dropped. The reason is: - I'm already working on the proper subpage handling for the @locked_page of a delalloc range The patchset is under testing now, the results looks fine for both regular and subpage cases. Will queue extra testing for subpage zoned. - The rejection would cause future problems for detecting whether we have proper subpage + zoned support. Either we do not detect, or introduce a complex mechanism only for this one edge case. Thus I prefer not to detect. - Subage + zoned is too niche for now The most common subpage usage would be aarch64 (especially for Apple M1/2 chips). For those Apple based ones, they have no ability to expand, thus won't hit any real zoned devices. For other aarch64 servers, they should have the ability to choose a 4K page size kernel if they really want to go zoned devices. This bug is only exposed with a helpful reporter using 16K page sized loongson board with SATA zoned disk. Due to above reasons, I really prefer to keep the current situation, and focus on the proper fix instead. Thanks, Qu
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c3ab268533ca..85cd23aebdd6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) * part of @locked_page. * That's also why compression for subpage only work for page aligned ranges. */ - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) { + if (fs_info->sectorsize < PAGE_SIZE && + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { btrfs_warn(fs_info, "no zoned read-write support for page size %lu with sectorsize %u", PAGE_SIZE, fs_info->sectorsize);
[BUG] There is a bug report that with zoned device and sectorsize is smaller than page size (aka, subpage), btrfs would crash with a very basic workload: # getconfig PAGESIZE 16384 # mkfs.btrfs -f $dev -s 4k # mount $dev $mnt # $fsstress -w -n 8 -s 1707820327 -v -d $mnt # umount $mnt The crash would look like this (with CONFIG_BTRFS_ASSERT enabled): assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:1384! CPU: 0 PID: 872 Comm: kworker/u9:2 Tainted: G OE 6.8.0-rc3-custom+ #7 Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20231122-12.fc39 11/22/2023 Workqueue: writeback wb_workfn (flush-btrfs-8) pc : __extent_writepage_io+0x404/0x460 [btrfs] lr : __extent_writepage_io+0x404/0x460 [btrfs] Call trace: __extent_writepage_io+0x404/0x460 [btrfs] extent_write_locked_range+0x16c/0x460 [btrfs] run_delalloc_cow+0x88/0x118 [btrfs] btrfs_run_delalloc_range+0x128/0x228 [btrfs] writepage_delalloc+0xb8/0x178 [btrfs] __extent_writepage+0xc8/0x3a0 [btrfs] extent_write_cache_pages+0x1cc/0x460 [btrfs] extent_writepages+0x8c/0x120 [btrfs] btrfs_writepages+0x18/0x30 [btrfs] do_writepages+0x94/0x1f8 __writeback_single_inode+0x4c/0x388 writeback_sb_inodes+0x208/0x4b0 wb_writeback+0x118/0x3c0 wb_do_writeback+0xbc/0x388 wb_workfn+0x80/0x240 process_one_work+0x154/0x3c8 worker_thread+0x2bc/0x3e0 kthread+0xf4/0x108 ret_from_fork+0x10/0x20 Code: 9102c021 90000be0 91378000 9402bf53 (d4210000) ---[ end trace 0000000000000000 ]--- [CAUSE] There are several factors causing the problem: 1. __extent_writepage_io() requires all dirty ranges to have delalloc executed This can be solved by adding @start and @len parameter to only submit IO for a subset of the page, and update several involved helpers to do subpage checks. So this is not a big deal. 2. Subpage only accepts for full page aligned ranges for extent_write_locked_range() For zoned device, regular COW is switched to utilize extent_write_locked_range() to submit the IO. But the caller, run_delalloc_cow() can be called to run on a subpage range, e.g. 0 4K 8K 12K 16K |/////| |/////| Where |///| is the dirtied range. In that case, btrfs_run_delalloc_range() would call run_delalloc_cow(), which would call extent_write_locked_range() for [0, 4K), and unlock the whole [0, 16K) page. But btrfs_run_delalloc_range() would again be called for range [8K, 12K), as there are still dirty range left. In that case, since the whole page is already unlocked by previous iteration, and would cause different ASSERT()s inside extent_write_locked_range(). That's also why compression for subpage cases require fully page aligned range. [WORKAROUND] A proper fix requires some big changes to delalloc workload, to allow extent_write_locked_range() to handle multiple different entries with the same @locked_page. So for now, disable read-write support for subpage zoned btrfs. The problem can only be solved if subpage btrfs can handle subpage compression, which need quite some work on the delalloc procedure for the @locked_page handling. Reported-by: HAN Yuwei <hrx@bupt.moe> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/ CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)