From patchwork Mon Aug 8 02:01:59 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12938430
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id F205CC3F6B0
 for ; Mon, 8 Aug 2022 02:24:46 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S242630AbiHHCYp (ORCPT ); Sun, 7 Aug 2022 22:24:45 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49772 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S231560AbiHHCYc (ORCPT ); Sun, 7 Aug 2022 22:24:32 -0400
Received: from esa1.hgst.iphmx.com (esa1.hgst.iphmx.com [68.232.141.245])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65F0011800;
 Sun, 7 Aug 2022 19:02:28 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple;
 d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com;
 t=1659924148; x=1691460148;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=Li3AcQqvWPFS5yhQcZu3nuMYh+jjP936Zdb1cpbCddU=;
 b=IokzWH3+nCdjA0WwIhz1ikuRT+jnHkirtO84GQum/Ttuh4FWLlQdr4Yu
 wn+0tWQ19wIl5nUO2Bw9tHi6ZIMVXxpq0mxhW8jmueTbKLA9oADGL6Xzt
 B8IacpWpt2Dh7bZlL4pdS8SKXGudBC+LB2UXuRO19UG74P/YxWryS3Dym
 vIalGNsoaa9KisF2jLUpCiBsTc543p2eBBMO2sYNlvxSdn8ncWR4VTurj
 7DstRUtDqReKBAmH+NrBIMRUhuYQL5qyyAe8QFtnWZcRt0JUwesAz9YHB
 MbdJUydK9pMSoU8vzaMNr3JPpxUVw8hL/DlU+dGLH49DXc7xR51DZ/XgW
 A==;
X-IronPort-AV: E=Sophos;i="5.93,221,1654531200";
 d="scan'208";a="320182411"
Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com)
 ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP;
 08 Aug 2022 10:02:27 +0800
IronPort-SDR: 
 mU+cBe+JEz/ZFc98d+yBDYcjEca48WfElaCwJ4WsFym9JnLjlG8KnM2nSq8UjEKSY98IjOCidw
 7dzDFt0/caFnt8eAkFx9kF6Ya/gUIP/Ur+A3/7BiPZ8T46uMKDxqcx5dm4Tj6q8XF5OR92Iqit
 QAzreilp0WA7YtxPVnvc9Em5B6s0OQCYtI7JuJvlEk9sNX3n7A87QmMBsl9taA+uvXJYNp+LhU
 V64bg8gdCfykkAuf0goAd2odpx8poPUfWRJaPTQ+07srHslMAg0u8sVUDUI/dru1khOor+f1BR
 Gwq76V390PkAWtvG2XM8yn7X
Received: from uls-op-cesaip02.wdc.com ([10.248.3.37])
 by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256;
 07 Aug 2022 18:18:09 -0700
IronPort-SDR: 
 7Nx3DXhpblBv36KEZE941Kda2FXhUePe2Sw2j1hEQtMZ0NO1LC/bGMIiRp/FmawefSMOmUTEkf
 RYSTuSjelJMj5vzT/ip3wMi2UGAs5p+R1Od0EZD20U2eUue33muzTy4wUTSXjNz9QS80PM0Ivb
 OXvK4B8nd618dmKWsOyVEP4pIgWC4HuOuQlZuWVQaiugygnPJ6H7BTsWDbgWVK1xOkhveH6RC+
 FOlh2LbDAAnGbj+QDQWatuI79E/7bwnCVYMo+/60Zy7m486yd6zhXkhMWb3YMZGHLCbiefZWFA
 3Ts=
WDCIronportException: Internal
Received: from ctl002.ad.shared (HELO naota-xeon.wdc.com) ([10.225.53.129])
 by uls-op-cesaip02.wdc.com with ESMTP; 07 Aug 2022 19:02:27 -0700
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, stable@vger.kernel.org
Cc: Naohiro Aota , David Sterba
Subject: [PATCH STABLE 5.18 v2 1/3] btrfs: zoned: prevent allocation from
 previous data relocation BG
Date: Mon, 8 Aug 2022 11:01:59 +0900
Message-Id: <20220808020201.712924-2-naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220808020201.712924-1-naohiro.aota@wdc.com>
References: <20220808020201.712924-1-naohiro.aota@wdc.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

commit 343d8a30851c48a4ef0f5ef61d5e9fbd847a6883 upstream

After commit 5f0addf7b890 ("btrfs: zoned: use dedicated lock for data
relocation"), we observe IO errors on, e.g., btrfs/232 like below.

  [09.0][T4038707] WARNING: CPU: 3 PID: 4038707 at fs/btrfs/extent-tree.c:2381 btrfs_cross_ref_exist+0xfc/0x120 [btrfs]
  [09.9][T4038707] Call Trace:
  [09.5][T4038707]
  [09.3][T4038707] run_delalloc_nocow+0x7f1/0x11a0 [btrfs]
  [09.6][T4038707] ? test_range_bit+0x174/0x320 [btrfs]
  [09.2][T4038707] ? fallback_to_cow+0x980/0x980 [btrfs]
  [09.3][T4038707] ? find_lock_delalloc_range+0x33e/0x3e0 [btrfs]
  [09.5][T4038707] btrfs_run_delalloc_range+0x445/0x1320 [btrfs]
  [09.2][T4038707] ? test_range_bit+0x320/0x320 [btrfs]
  [09.4][T4038707] ? lock_downgrade+0x6a0/0x6a0
  [09.2][T4038707] ? orc_find.part.0+0x1ed/0x300
  [09.5][T4038707] ? __module_address.part.0+0x25/0x300
  [09.0][T4038707] writepage_delalloc+0x159/0x310 [btrfs]

  [09.4][ C3] sd 10:0:1:0: [sde] tag#2620 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
  [09.5][ C3] sd 10:0:1:0: [sde] tag#2620 Sense Key : Illegal Request [current]
  [09.9][ C3] sd 10:0:1:0: [sde] tag#2620 Add. Sense: Unaligned write command
  [09.5][ C3] sd 10:0:1:0: [sde] tag#2620 CDB: Write(16) 8a 00 00 00 00 00 02 f3 63 87 00 00 00 2c 00 00
  [09.4][ C3] critical target error, dev sde, sector 396041272 op 0x1:(WRITE) flags 0x800 phys_seg 3 prio class 0
  [09.9][ C3] BTRFS error (device dm-1): bdev /dev/mapper/dml_102_2 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0

The IO errors occur when we allocate a regular extent in the previous
data relocation block group.

On zoned btrfs, we use a dedicated block group to relocate a data
extent. Thus, we allocate relocating data extents (pre-alloc) only from
the dedicated block group and vice versa. Once the free space in the
dedicated block group gets tight, a relocating extent may not fit into
the block group. In that case, we need to switch the dedicated block
group to the next one. Then, the previous one is freed up for
allocating a regular extent. The BG no longer has enough room for the
relocating extent, but it can still hold a smaller extent. This is
where the problem occurs. By allocating a regular extent while nocow
IOs for the relocation are still ongoing, we will issue WRITE IOs (for
relocation) and ZONE APPEND IOs (for the regular writes) at the same
time. This IO mix confuses the write pointer and causes the unaligned
write errors.

This commit introduces a new bit 'zoned_data_reloc_ongoing' to the
btrfs_block_group. We set this bit before releasing the dedicated block
group, and no extents are allocated from a block group having this bit
set. This bit is similar to setting block_group->ro, but differs in
that it still allows nocow writes to start.

Once all the nocow IO for relocation is done (hooked from
btrfs_finish_ordered_io), we reset the bit to release the block group
for further allocation.

Fixes: c2707a255623 ("btrfs: zoned: add a dedicated data relocation block group")
CC: stable@vger.kernel.org # 5.16+
Signed-off-by: Naohiro Aota
Reviewed-by: David Sterba
Signed-off-by: David Sterba
---
 fs/btrfs/block-group.h |  1 +
 fs/btrfs/extent-tree.c | 20 ++++++++++++++++++--
 fs/btrfs/inode.c       |  2 ++
 fs/btrfs/zoned.c       | 27 +++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  5 +++++
 5 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 19db5693175f..2a0ead57db71 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -104,6 +104,7 @@ struct btrfs_block_group {
 	unsigned int relocating_repair:1;
 	unsigned int chunk_item_inserted:1;
 	unsigned int zone_is_active:1;
+	unsigned int zoned_data_reloc_ongoing:1;
 
 	int disk_cache_state;
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6aa92f84f465..f45ecd939a2c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3836,7 +3836,7 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 	       block_group->start == fs_info->data_reloc_bg ||
 	       fs_info->data_reloc_bg == 0);
 
-	if (block_group->ro) {
+	if (block_group->ro || block_group->zoned_data_reloc_ongoing) {
 		ret = 1;
 		goto out;
 	}
@@ -3898,8 +3898,24 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
 out:
 	if (ret && ffe_ctl->for_treelog)
 		fs_info->treelog_bg = 0;
-	if (ret && ffe_ctl->for_data_reloc)
+	if (ret && ffe_ctl->for_data_reloc &&
+	    fs_info->data_reloc_bg == block_group->start) {
+		/*
+		 * Do not allow further allocations from this block group.
+		 * Compared to increasing the ->ro, setting the
+		 * ->zoned_data_reloc_ongoing flag still allows nocow
+		 * writers to come in. See btrfs_inc_nocow_writers().
+		 *
+		 * We need to disable an allocation to avoid an allocation of
+		 * regular (non-relocation data) extent. With mix of relocation
+		 * extents and regular extents, we can dispatch WRITE commands
+		 * (for relocation extents) and ZONE APPEND commands (for
+		 * regular extents) at the same time to the same zone, which
+		 * easily break the write pointer.
+		 */
+		block_group->zoned_data_reloc_ongoing = 1;
 		fs_info->data_reloc_bg = 0;
+	}
 	spin_unlock(&fs_info->relocation_bg_lock);
 	spin_unlock(&fs_info->treelog_bg_lock);
 	spin_unlock(&block_group->lock);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9ae79342631a..5d15e374d032 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3102,6 +3102,8 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 						ordered_extent->file_offset,
 						ordered_extent->file_offset +
 						logical_len);
+		btrfs_zoned_release_data_reloc_bg(fs_info, ordered_extent->disk_bytenr,
+						  ordered_extent->disk_num_bytes);
 	} else {
 		BUG_ON(root == fs_info->tree_root);
 		ret = insert_ordered_extent_file_extent(trans, ordered_extent);
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 5091d679a602..2c0851d94eff 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2116,3 +2116,30 @@ void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info)
 	}
 	mutex_unlock(&fs_devices->device_list_mutex);
 }
+
+void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logical,
+				       u64 length)
+{
+	struct btrfs_block_group *block_group;
+
+	if (!btrfs_is_zoned(fs_info))
+		return;
+
+	block_group = btrfs_lookup_block_group(fs_info, logical);
+	/* It should be called on a previous data relocation block group. */
+	ASSERT(block_group && (block_group->flags & BTRFS_BLOCK_GROUP_DATA));
+
+	spin_lock(&block_group->lock);
+	if (!block_group->zoned_data_reloc_ongoing)
+		goto out;
+
+	/* All relocation extents are written. */
+	if (block_group->start + block_group->alloc_offset == logical + length) {
+		/* Now, release this block group for further allocations. */
+		block_group->zoned_data_reloc_ongoing = 0;
+	}
+
+out:
+	spin_unlock(&block_group->lock);
+	btrfs_put_block_group(block_group);
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 2d898970aec5..cf6320feef46 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -80,6 +80,8 @@ void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg,
 			     struct extent_buffer *eb);
 void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg);
 void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info);
+void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info, u64 logical,
+				       u64 length);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -241,6 +243,9 @@ static inline void btrfs_schedule_zone_finish_bg(struct btrfs_block_group *bg,
 static inline void btrfs_clear_data_reloc_bg(struct btrfs_block_group *bg) { }
 
 static inline void btrfs_free_zone_cache(struct btrfs_fs_info *fs_info) { }
+
+static inline void btrfs_zoned_release_data_reloc_bg(struct btrfs_fs_info *fs_info,
+						     u64 logical, u64 length) { }
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Mon Aug 8 02:02:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12938431
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 9CEF7C25B0C
 for ; Mon, 8 Aug 2022
 02:24:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S242009AbiHHCYq (ORCPT ); Sun, 7 Aug 2022 22:24:46 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51496 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S231852AbiHHCYc (ORCPT ); Sun, 7 Aug 2022 22:24:32 -0400
Received: from esa1.hgst.iphmx.com (esa1.hgst.iphmx.com [68.232.141.245])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6DEBF1180D;
 Sun, 7 Aug 2022 19:02:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple;
 d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com;
 t=1659924149; x=1691460149;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=4iRH2Heg/YztVNHuPLG+Pi5cN1PMuYAJr1dtblZ5aQE=;
 b=Gi48c5uxcBUeqhVpSVDlnPb8IPojweKkTk81KPUHSoKTM1ru0XorPnNL
 FUr7l5OPSZPbv5621PsHRHbF7/wRhstvmBzXt0ME5TliFavums8DPdVwv
 H7t6WTtAlGzsBiIShzuXdx+KypZFEKtDrQeSc8rnaoC/k6Y2lTAkY2SKk
 emUeFFcSiwMEpWeHCKVeH1mIIX8RJfTAn7fQr6+KJfuOVOKy7DPb9o6je
 DSZhU853H9HUSHerW9K9miS99G32uu3iY+N8CZDZ5Snz8nxm/ZK7qO/2R
 WK9D21MK9LGPI+ODvy4O3XEkuCgTUESV0kzgwbKBU8KhOlnqkh3tAwjMO
 Q==;
X-IronPort-AV: E=Sophos;i="5.93,221,1654531200";
 d="scan'208";a="320182415"
Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com)
 ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP;
 08 Aug 2022 10:02:29 +0800
IronPort-SDR: 
 tten+aKbwb0AOlaSSngNNJivrEvC1uoFMKCsmEYk57MUl43YsSPx+jYKkOfFEFAqIh7z/fmrkn
 h7zbct9z/gCFptZxdSzQ6zDuQmyFw3LhNkS1veDoR7tDUqoMin2DKZNWQ0f6UyD1pRtNPPl9Fq
 FMKu16GY/TF8f3aLxbmF5lRtFX8IQ5ypEVX0pqIknJxesb24m/z+7hI30Ygyjz1UYSYWkQxHE6
 Aj1Qb943cg8P1QkXTwToECPd0VC/WRDrT2fbgHmkwcXgU1qvexBZ/SibPEnp1ILR9bHPVYU6B9
 VMF1zKcx4sobuVKxqk4lJPIq
Received: from uls-op-cesaip02.wdc.com ([10.248.3.37])
 by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256;
 07 Aug 2022 18:18:10 -0700
IronPort-SDR: 
 dzSYYrgIwI4FConoGw3KqQutevGoEBmGDojm/pWs16tzUvkBo4wJymvJhGobkX1zttH+Sc+brj
 0I7xqUuK5uBWqyyv9HkNgxJCfdCFMmOMRS9OI0RtBD3X0r2k9MdrYvgauZleSpB3XbtlVnVmBX
 bMCpwHlSG9qr/He4rROvIe6Q63XG6NifkbmTR/yvXrodD/1F51az8poU7gYDZGCER/1kT6TsgE
 sz6tnwj4S26x2Cy3uqNW8KFalQ71NdWqfNvDhUNqMyonhI5xq4P4//59rn4e28TNjAZYXpw0Bm
 mBo=
WDCIronportException: Internal
Received: from ctl002.ad.shared (HELO naota-xeon.wdc.com) ([10.225.53.129])
 by uls-op-cesaip02.wdc.com with ESMTP; 07 Aug 2022 19:02:29 -0700
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, stable@vger.kernel.org
Cc: Naohiro Aota , Johannes Thumshirn , David Sterba
Subject: [PATCH STABLE 5.18 v2 2/3] btrfs: zoned: fix critical section of
 relocation inode writeback
Date: Mon, 8 Aug 2022 11:02:00 +0900
Message-Id: <20220808020201.712924-3-naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220808020201.712924-1-naohiro.aota@wdc.com>
References: <20220808020201.712924-1-naohiro.aota@wdc.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

commit 19ab78ca86981e0e1e73036fb73a508731a7c078 upstream

We use btrfs_zoned_data_reloc_{lock,unlock} to allow only one process to
write out to the relocation inode. That critical section must include all
the IO submission for the inode. However, flush_write_bio() in
extent_writepages() is outside the critical section, so IO can be
submitted without holding the lock. This leads to out-of-order IO
submission and fails the relocation process.

Fix it by extending the critical section.
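
For illustration only (not part of the patch): a minimal user-space C
sketch of the ordering problem described above. The struct and the
model_* helpers are hypothetical stand-ins and not btrfs functions; they
only record whether the lock is held at the moment the queued bio is
submitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of this is the btrfs API. */
struct reloc_model {
	bool locked;            /* stands in for the data reloc lock being held */
	int pending;            /* bios queued but not yet submitted */
	int submitted_unlocked; /* bios submitted while the lock was not held */
};

static void model_lock(struct reloc_model *m)   { m->locked = true; }
static void model_unlock(struct reloc_model *m) { m->locked = false; }

/* Stands in for extent_write_cache_pages(): queues one bio, leaves it pending. */
static void model_write_cache_pages(struct reloc_model *m)
{
	m->pending++;
}

/* Stands in for flush_write_bio(): submits whatever is still pending. */
static void model_flush(struct reloc_model *m)
{
	if (!m->locked)
		m->submitted_unlocked += m->pending;
	m->pending = 0;
}

/* Ordering before the fix: unlock first, then flush. */
static int writepages_before(struct reloc_model *m)
{
	model_lock(m);
	model_write_cache_pages(m);
	model_unlock(m);
	model_flush(m);
	return m->submitted_unlocked;
}

/* Ordering after the fix: flush first, then unlock. */
static int writepages_after(struct reloc_model *m)
{
	model_lock(m);
	model_write_cache_pages(m);
	model_flush(m);
	model_unlock(m);
	return m->submitted_unlocked;
}

/* Small self-check comparing the two orderings. */
static void model_selfcheck(void)
{
	struct reloc_model before = { 0 }, after = { 0 };

	assert(writepages_before(&before) == 1);
	assert(writepages_after(&after) == 0);
}
```

In this model the old ordering submits one bio without the lock, while
the new ordering submits none, which is the property the extended
critical section is meant to restore.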

Fixes: 35156d852762 ("btrfs: zoned: only allow one process to add pages to a relocation inode")
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
Signed-off-by: David Sterba
---
 fs/btrfs/extent_io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a23a42ba88ca..68ddd90685d9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5214,13 +5214,14 @@ int extent_writepages(struct address_space *mapping,
 	 */
 	btrfs_zoned_data_reloc_lock(BTRFS_I(inode));
 	ret = extent_write_cache_pages(mapping, wbc, &epd);
-	btrfs_zoned_data_reloc_unlock(BTRFS_I(inode));
 	ASSERT(ret <= 0);
 	if (ret < 0) {
+		btrfs_zoned_data_reloc_unlock(BTRFS_I(inode));
 		end_write_bio(&epd, ret);
 		return ret;
 	}
 	ret = flush_write_bio(&epd);
+	btrfs_zoned_data_reloc_unlock(BTRFS_I(inode));
 	return ret;
 }
 

From patchwork Mon Aug 8 02:02:01 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12938432
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id B17B9C25B0E
 for ; Mon, 8 Aug 2022 02:24:49 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S242709AbiHHCYs (ORCPT ); Sun, 7 Aug 2022 22:24:48 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51504 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S241838AbiHHCYc (ORCPT ); Sun, 7 Aug 2022 22:24:32 -0400
Received: from esa1.hgst.iphmx.com (esa1.hgst.iphmx.com [68.232.141.245])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F1E1B11811;
 Sun, 7 Aug 2022 19:02:30 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple;
 d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com;
 t=1659924150; x=1691460150;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=464mKJOcENT8/jWX3CKvvBL8fIiP88Nj5Nh21OYdmbc=;
 b=LMGIt4alZUtAmNFr+B9MXt2LRAviRuIGABdv4m/yB8mSq9g+NSch+2AR
 MQcsrZIvkCL8XXoP0RnosRUWh+wJhSK5WN0y4JTanVabqhBD1BrLny2gE
 5v8FQfyBxl9dY1OwmS0lJj0ocV9cN5lbDjkFPK27DdLvWZL20JTw3gOmE
 uS6qfDRz6bkahXCG/bQPTxeLF4kvIyqKcjYPpY1cMYb1bdBONXIZIKCpR
 KZ6+cyTCY20cXz4+20Q5mKb5OoaoxaMPmMyYjzO7DfMU9ibL3Bfa1MBLt
 jd/IFpl3O7RES4d3XwhbWFnPCpiyRsnan4906DoR1rdb7ukAxQUGsVTyG
 A==;
X-IronPort-AV: E=Sophos;i="5.93,221,1654531200";
 d="scan'208";a="320182420"
Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com)
 ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP;
 08 Aug 2022 10:02:30 +0800
IronPort-SDR: 
 04mWyEz27TtVNEckA0vUBt3mX58qxf78kNSn6OkcReA+iwWZ+voa0S0qzjGg8ilYfgnqSbe2aj
 OTsJaU744D7QFwgD/VKY0b+a605SgABsrTfz7gxfk/e4jdglI+qgly/WDJkSckvtxqJgqx48Wg
 AFv3USglZItFCQUKuwFPI+J8SQG7RC0/aYuAIWls/AwSYXnHKRlg1locFxi5reJzJ32mrIAm76
 c3R62MJ/NVaBVdkiPLRCWZu4gSI+2FzRlnVETbILyxA6swF3P5T1VAYx4ok1D7oHHvDTN9gszO
 JSzAx8Bzf4PTzf2aSOgkAxfW
Received: from uls-op-cesaip02.wdc.com ([10.248.3.37])
 by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES128-GCM-SHA256;
 07 Aug 2022 18:18:12 -0700
IronPort-SDR: 
 Scd2HmbfaV1r1sYZd7ijC8AJuOSfloMjneI3PPFMm2SXddwXTHKDJXDfd0EX596goVzdT/A/3O
 8Ibq1QHpcVdmf04o6gXMZ/3oQvCrdfL2ty4ewqegXpOUFQbCw6Gr2Tg5IERSrBcAfom0Geipgc
 58WK2JYMsn0jxtCVzcEibWKgDR12sEQsVDLsLvdWUvzMIIng5xXJ62aDKscG2FZIVrrjEUBWzH
 a+hHXptbd/wEfMM8GpwSm5yYr0qDuXvoVZZo8K5RhxEYxbK5R2Qqye2edM+ow+Ei0U5RO/dlh7
 Gu8=
WDCIronportException: Internal
Received: from ctl002.ad.shared (HELO naota-xeon.wdc.com) ([10.225.53.129])
 by uls-op-cesaip02.wdc.com with ESMTP; 07 Aug 2022 19:02:30 -0700
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, stable@vger.kernel.org
Cc: Naohiro Aota , Johannes Thumshirn , David Sterba
Subject: [PATCH STABLE 5.18 v2 3/3] btrfs: zoned: drop optimization of zone
 finish
Date: Mon, 8 Aug 2022 11:02:01 +0900
Message-Id: <20220808020201.712924-4-naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220808020201.712924-1-naohiro.aota@wdc.com>
References: <20220808020201.712924-1-naohiro.aota@wdc.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

commit b3a3b0255797e1d395253366ba24a4cc6c8bdf9c upstream

We have an optimization in do_zone_finish() to send REQ_OP_ZONE_FINISH
only when necessary, i.e. we don't send REQ_OP_ZONE_FINISH when we
assume we wrote fully into the zone.

The assumption is determined by "alloc_offset == capacity". This
condition won't work if the last ordered extent is canceled due to some
errors. In that case, we consider the zone deactivated without sending
the finish command while it's still active.

This inconsistency results in activating another block group while the
underlying zone cannot actually be activated, which causes active zones
exceeded errors like below.

  BTRFS error (device nvme3n2): allocation failed flags 1, wanted 520192 tree-log 0, relocation: 0
  nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
  active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0
  nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
  active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0

Fix the issue by removing the optimization for now.
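
For illustration only (not part of the patch): a minimal C sketch of why
"alloc_offset == capacity" is an unreliable proxy for "zone is full".
The struct, its fields and the helpers are hypothetical and do not
mirror kernel structures. The model assumes that the device stops
counting a zone as active only once its write pointer reaches the zone
capacity or an explicit finish is sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one zone; not a kernel structure. */
struct zone_model {
	uint64_t capacity;      /* usable bytes in the zone */
	uint64_t alloc_offset;  /* bytes handed out by the allocator */
	uint64_t write_pointer; /* bytes the device has actually written */
};

/* The dropped optimization: skip the finish command if the zone looks full. */
static bool skip_finish_optimized(const struct zone_model *z)
{
	return z->alloc_offset == z->capacity;
}

/* Device-side view under the stated assumption. */
static bool device_zone_still_active(const struct zone_model *z)
{
	return z->write_pointer < z->capacity;
}

/* True when the filesystem would drop the zone from its active set
 * although the device still counts it as active. */
static bool zone_leaked_active(const struct zone_model *z)
{
	return skip_finish_optimized(z) && device_zone_still_active(z);
}

/* Small self-check: a canceled last extent leaves a gap below capacity. */
static void zone_model_selfcheck(void)
{
	struct zone_model full = { 1024, 1024, 1024 };
	struct zone_model canceled = { 1024, 1024, 896 };

	assert(!zone_leaked_active(&full));
	assert(zone_leaked_active(&canceled));
}
```

In the "canceled" case the allocator has handed out the whole zone but
the last write never reached the device, so skipping the finish command
leaves the zone active on the device side.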

Fixes: 8376d9e1ed8f ("btrfs: zoned: finish superblock zone once no space left for new SB")
Reviewed-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
Signed-off-by: David Sterba
---
 fs/btrfs/zoned.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 2c0851d94eff..84b6d39509bd 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2005,6 +2005,7 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 len
 	struct btrfs_device *device;
 	u64 min_alloc_bytes;
 	u64 physical;
+	int i;
 
 	if (!btrfs_is_zoned(fs_info))
 		return;
@@ -2039,13 +2040,25 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 len
 	spin_unlock(&block_group->lock);
 
 	map = block_group->physical_map;
-	device = map->stripes[0].dev;
-	physical = map->stripes[0].physical;
+	for (i = 0; i < map->num_stripes; i++) {
+		int ret;
 
-	if (!device->zone_info->max_active_zones)
-		goto out;
+		device = map->stripes[i].dev;
+		physical = map->stripes[i].physical;
+
+		if (device->zone_info->max_active_zones == 0)
+			continue;
 
-	btrfs_dev_clear_active_zone(device, physical);
+		ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
+				       physical >> SECTOR_SHIFT,
+				       device->zone_info->zone_size >> SECTOR_SHIFT,
+				       GFP_NOFS);
+
+		if (ret)
+			return;
+
+		btrfs_dev_clear_active_zone(device, physical);
+	}
 
 	spin_lock(&fs_info->zone_active_bgs_lock);
 	ASSERT(!list_empty(&block_group->active_bg_list));