From patchwork Thu Jul 7 05:32:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909060 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A4EF9C43334 for ; Thu, 7 Jul 2022 05:33:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234891AbiGGFdA (ORCPT ); Thu, 7 Jul 2022 01:33:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53662 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234716AbiGGFc7 (ORCPT ); Thu, 7 Jul 2022 01:32:59 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 43DCD3134C for ; Wed, 6 Jul 2022 22:32:58 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id DC40621E3B for ; Thu, 7 Jul 2022 05:32:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171976; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Bbuz35aYGGlIOmXJlPgihigDXD7+a/SPH0Ikalhnhtw=; b=FWR/YdjjMy4PdMSnZqKtE+mewq0RRwW+loJ05pkjYUVmXKnuCjs+dIB6lGn+Md059RBhSV bHBiyrjBIuVF0QTR+W6Xap2R8PSm+lQWMo4PR3q1OwgxP1g0KNpoodWBTpGzSRWxCxvp50 EN+ADWTzF/GasP0VO/7h/eUTD+TZvNI= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 3F46C13488 for ; Thu, 7 Jul 2022 05:32:56 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id +DoVAwhwxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:32:56 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 01/12] btrfs: introduce new compat RO flag, EXTRA_SUPER_RESERVED Date: Thu, 7 Jul 2022 13:32:26 +0800 Message-Id: <67f49ace41bb92c4ed0ad663c553e96d56acf5af.1657171615.git.wqu@suse.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From the beginning of btrfs kernel module, kernel will avoid allocating dev extents into the first 1MiB of each device, not only for the superblock at 64KiB, but also for legacy bootloaders to store their data. Here for later expansion, we introduce a new compat RO flag, EXTRA_SUPER_RESERVED, this allows btrfs to have extra reserved space beyond the default 1MiB. The extra reserved space can be utilized by things like write-intent log. Several new checks are introduced: - No super_reserved_bytes should be smaller than the old 1MiB limit - No zoned device support Such reserved space will be utilized in a no COW way, thus it can not be supported by zoned device. We still allow dev extents to exist in the new reserved_bytes range, this is to allow btrfstune to set reserved_bytes, then rely on kernel balance to reserve the new space. But later, if there is any feature relying on the reserved_bytes, we will disable those features automatically. There is a special catch, the new member @reserved_bytes is located at the end of the superblock, to utilize the padding bytes. The reason is, due to some undetermined extent tree v2 on-disk format mess, we don't have reliable location inside the reserved8[] array. I just want a stable location undisturbed. Signed-off-by: Qu Wenruo --- fs/btrfs/ctree.h | 14 +++++++++++--- fs/btrfs/disk-io.c | 9 +++++++++ fs/btrfs/sysfs.c | 2 ++ fs/btrfs/volumes.c | 22 ++++++++++++++++++---- fs/btrfs/zoned.c | 8 ++++++++ include/uapi/linux/btrfs.h | 10 ++++++++++ 6 files changed, 58 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 4e2569f84aab..12019904f1cf 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -293,14 +293,19 @@ struct btrfs_super_block { __le64 block_group_root_generation; u8 block_group_root_level; - /* future expansion */ u8 reserved8[7]; __le64 reserved[25]; u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE]; struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS]; + /* + * How many bytes are reserved at the beginning of a device. + * Should be >= BTRFS_DEFAULT_RESERVED. + */ + __le32 reserved_bytes; + /* Padded to 4096 bytes */ - u8 padding[565]; + u8 padding[561]; } __attribute__ ((__packed__)); static_assert(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE); @@ -315,7 +320,8 @@ static_assert(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE); #define BTRFS_FEATURE_COMPAT_RO_SUPP \ (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE | \ BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID | \ - BTRFS_FEATURE_COMPAT_RO_VERITY) + BTRFS_FEATURE_COMPAT_RO_VERITY | \ + BTRFS_FEATURE_COMPAT_RO_EXTRA_SUPER_RESERVED) #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET 0ULL #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR 0ULL @@ -2539,6 +2545,8 @@ BTRFS_SETGET_STACK_FUNCS(super_block_group_root_generation, block_group_root_generation, 64); BTRFS_SETGET_STACK_FUNCS(super_block_group_root_level, struct btrfs_super_block, block_group_root_level, 8); +BTRFS_SETGET_STACK_FUNCS(super_reserved_bytes, struct btrfs_super_block, + reserved_bytes, 32); int btrfs_super_csum_size(const struct btrfs_super_block *s); const char *btrfs_super_csum_name(u16 csum_type); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 70b388de4d66..1df2da2509ca 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2822,6 +2822,15 @@ static int validate_super(struct btrfs_fs_info *fs_info, ret = -EINVAL; } + if (btrfs_super_compat_ro_flags(sb) & + BTRFS_FEATURE_COMPAT_RO_EXTRA_SUPER_RESERVED && + btrfs_super_reserved_bytes(sb) < BTRFS_DEVICE_RANGE_RESERVED) { + btrfs_err(fs_info, +"EXTRA_SUPER_RESERVED feature enabled, but reserved space is smaller than default (%u)", + BTRFS_DEVICE_RANGE_RESERVED); + ret = -EINVAL; + } + /* * The generation is a global counter, we'll trust it more than the others * but it's still possible that it's the one that's wrong. diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index d5d0717fd09a..7fd38539315f 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -287,6 +287,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES); BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID); BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE); BTRFS_FEAT_ATTR_INCOMPAT(raid1c34, RAID1C34); +BTRFS_FEAT_ATTR_COMPAT_RO(extra_super_reserved, EXTRA_SUPER_RESERVED); #ifdef CONFIG_BLK_DEV_ZONED BTRFS_FEAT_ATTR_INCOMPAT(zoned, ZONED); #endif @@ -317,6 +318,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = { BTRFS_FEAT_ATTR_PTR(metadata_uuid), BTRFS_FEAT_ATTR_PTR(free_space_tree), BTRFS_FEAT_ATTR_PTR(raid1c34), + BTRFS_FEAT_ATTR_PTR(extra_super_reserved), #ifdef CONFIG_BLK_DEV_ZONED BTRFS_FEAT_ATTR_PTR(zoned), #endif diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 2d788a351c1f..2a4ac905e39f 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1401,8 +1401,13 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start, static u64 dev_extent_search_start(struct btrfs_device *device, u64 start) { + struct btrfs_fs_info *fs_info = device->fs_devices->fs_info; + switch (device->fs_devices->chunk_alloc_policy) { case BTRFS_CHUNK_ALLOC_REGULAR: + if (btrfs_fs_compat_ro(fs_info, EXTRA_SUPER_RESERVED)) + return max_t(u64, start, + btrfs_super_reserved_bytes(fs_info->super_copy)); return max_t(u64, start, BTRFS_DEVICE_RANGE_RESERVED); case BTRFS_CHUNK_ALLOC_ZONED: /* @@ -7969,11 +7974,14 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, struct extent_map *em; struct map_lookup *map; struct btrfs_device *dev; + u32 super_reserved = BTRFS_DEVICE_RANGE_RESERVED; u64 stripe_len; bool found = false; int ret = 0; int i; + if (btrfs_fs_compat_ro(fs_info, EXTRA_SUPER_RESERVED)) + super_reserved = btrfs_super_reserved_bytes(fs_info->super_copy); read_lock(&em_tree->lock); em = lookup_extent_mapping(em_tree, chunk_offset, 1); read_unlock(&em_tree->lock); @@ -7998,11 +8006,17 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, } /* - * Very old mkfs.btrfs (before v4.1) will not respect the reserved - * space. Although kernel can handle it without problem, better to warn - * the users. + * This can be caused by two cases: + * - Very old mkfs.btrfs (before v4.1) and no balance at all + * This should be pretty rare now, as balance will relocate + * those dev extents in reserved range. + * + * - Newly set btrfs_super_block::reserved_bytes + * We are rely on this mount to relocate those dev extents. + * + * So here, we just give a warning and continue the mount. */ - if (physical_offset < BTRFS_DEVICE_RANGE_RESERVED) + if (physical_offset < super_reserved) btrfs_warn(fs_info, "devid %llu physical %llu len %llu inside the reserved space", devid, physical_offset, physical_len); diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 7a0f8fa44800..ec16c0a6fb22 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -703,6 +703,14 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) goto out; } + if (incompat_zoned && btrfs_fs_compat_ro(fs_info, EXTRA_SUPER_RESERVED)) { + btrfs_err(fs_info, + "zoned: incompatible optional feature detected: 0x%llx", + BTRFS_FEATURE_COMPAT_RO_EXTRA_SUPER_RESERVED); + ret = -EINVAL; + goto out; + } + /* * stripe_size is always aligned to BTRFS_STRIPE_LEN in * btrfs_create_chunk(). Since we want stripe_len == zone_size, diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index f54dc91e4025..4a0c9f4f55d1 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -290,6 +290,16 @@ struct btrfs_ioctl_fs_info_args { #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID (1ULL << 1) #define BTRFS_FEATURE_COMPAT_RO_VERITY (1ULL << 2) +/* + * Allow btrfs to have extra reserved space (other than the default 1MiB) at + * the beginning of each device. + * + * This feature will enable the usage of btrfs_super_block::reserved_bytes. + * + * This feature would only be available for non-zoned filesystems. + */ +#define BTRFS_FEATURE_COMPAT_RO_EXTRA_SUPER_RESERVED (1ULL << 3) + #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF (1ULL << 0) #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1) #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS (1ULL << 2) From patchwork Thu Jul 7 05:32:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909061 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97F1FCCA47F for ; Thu, 7 Jul 2022 05:33:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234896AbiGGFdB (ORCPT ); Thu, 7 Jul 2022 01:33:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53670 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229538AbiGGFc7 (ORCPT ); Thu, 7 Jul 2022 01:32:59 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 30A9E31345 for ; Wed, 6 Jul 2022 22:32:59 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id E34B121EDA for ; Thu, 7 Jul 2022 05:32:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171977; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FmFAgy4r6e5updNR2VO+sRMQhBE1hm9IdlGZ2cy6hgc=; b=hmgiU01UlkIQ5ceBGyzpdIduxHrKX/a0t7F2Yh0YkmJv9PzB+fr8Tyyu2YjLI8/IktZGWz 6e14SPSaV75EZWWFUt08ctnPgsEVE/xpcjhs0agE+N/XVpQXjtxDqUYwcQIkfwXgRZpZZS jF2vo5KnnPJPhSUM8/gatafTqldJr1E= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 466A313488 for ; Thu, 7 Jul 2022 05:32:57 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id UBbqBAlwxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:32:57 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 02/12] btrfs: introduce a new experimental compat RO flag, WRITE_INTENT_BITMAP Date: Thu, 7 Jul 2022 13:32:27 +0800 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The new flag is for the incoming write intent bitmap, mostly to address the RAID56 write-hole, by doing a mandatory scrub for partial written stripes at mount time. Currently the feature is still under development, this patch is mostly a placeholder for the extra reserved bytes for write intent bitmap. We will utilize the newly introduce EXTRA_SUPER_RESERVED compat RO flags to enlarge the reserved bytes to at least (1MiB + 64KiB), and use that 64KiB (exact value is not yet fully determined) for write-intent bitmap. Only one extra check is introduced, to ensure we have enough space to place the write-intent bitmap at 1MiB physical offset. This patch is only a place holder for the incoming on-disk format change, no real write-intent functionality is implemented yet. Signed-off-by: Qu Wenruo --- fs/btrfs/ctree.h | 9 +++++++++ fs/btrfs/disk-io.c | 20 ++++++++++++++++++++ fs/btrfs/volumes.c | 14 +++++++++++++- include/uapi/linux/btrfs.h | 7 +++++++ 4 files changed, 49 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 12019904f1cf..908a735a66cf 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -317,11 +317,20 @@ static_assert(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE); #define BTRFS_FEATURE_COMPAT_SAFE_SET 0ULL #define BTRFS_FEATURE_COMPAT_SAFE_CLEAR 0ULL +#ifdef CONFIG_BTRFS_DEBUG +#define BTRFS_FEATURE_COMPAT_RO_SUPP \ + (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE | \ + BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID | \ + BTRFS_FEATURE_COMPAT_RO_VERITY | \ + BTRFS_FEATURE_COMPAT_RO_EXTRA_SUPER_RESERVED | \ + BTRFS_FEATURE_COMPAT_RO_WRITE_INTENT_BITMAP) +#else #define BTRFS_FEATURE_COMPAT_RO_SUPP \ (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE | \ BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID | \ BTRFS_FEATURE_COMPAT_RO_VERITY | \ BTRFS_FEATURE_COMPAT_RO_EXTRA_SUPER_RESERVED) +#endif #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET 0ULL #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR 0ULL diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 1df2da2509ca..967c020c380a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2830,6 +2830,26 @@ static int validate_super(struct btrfs_fs_info *fs_info, BTRFS_DEVICE_RANGE_RESERVED); ret = -EINVAL; } + if (btrfs_super_compat_ro_flags(sb) & + BTRFS_FEATURE_COMPAT_RO_WRITE_INTENT_BITMAP) { + /* Write intent bitmap requires extra reserve. */ + if (!(btrfs_super_compat_ro_flags(sb) & + BTRFS_FEATURE_COMPAT_RO_EXTRA_SUPER_RESERVED)) { + btrfs_err(fs_info, +"WRITE_INTENT_BITMAP feature enabled, but missing EXTRA_SUPER_RESERVED feature"); + ret = -EINVAL; + } + /* + * Write intent bitmap is always located at 1MiB. + * Extra check like the length check against the reserved space + * will happen at bitmap load time. + */ + if (btrfs_super_reserved_bytes(sb) < BTRFS_DEVICE_RANGE_RESERVED) { + btrfs_err(fs_info, + "not enough reserved space for write intent bitmap"); + ret = -EINVAL; + } + } /* * The generation is a global counter, we'll trust it more than the others diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 2a4ac905e39f..4882c616768c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -8016,11 +8016,23 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, * * So here, we just give a warning and continue the mount. */ - if (physical_offset < super_reserved) + if (physical_offset < super_reserved) { btrfs_warn(fs_info, "devid %llu physical %llu len %llu inside the reserved space", devid, physical_offset, physical_len); + /* Disable any feature relying on the new reserved_bytes. */ + if (btrfs_fs_compat_ro(fs_info, WRITE_INTENT_BITMAP)) { + struct btrfs_super_block *sb = fs_info->super_copy; + + btrfs_warn(fs_info, + "disabling write intent bitmap due to the lack of reserved space."); + btrfs_set_super_compat_ro_flags(sb, + btrfs_super_compat_ro_flags(sb) | + ~BTRFS_FEATURE_COMPAT_RO_WRITE_INTENT_BITMAP); + } + } + for (i = 0; i < map->num_stripes; i++) { if (map->stripes[i].dev->devid == devid && map->stripes[i].physical == physical_offset) { diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index 4a0c9f4f55d1..38c74a50323e 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -300,6 +300,13 @@ struct btrfs_ioctl_fs_info_args { */ #define BTRFS_FEATURE_COMPAT_RO_EXTRA_SUPER_RESERVED (1ULL << 3) +/* + * Allow btrfs to have per-device write-intent bitmap. + * Will be utilized to close the RAID56 write-hole (by forced scrub for dirty + * partial written stripes at mount time). + */ +#define BTRFS_FEATURE_COMPAT_RO_WRITE_INTENT_BITMAP (1ULL << 4) + #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF (1ULL << 0) #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1) #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS (1ULL << 2) From patchwork Thu Jul 7 05:32:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909062 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 677A8C43334 for ; Thu, 7 Jul 2022 05:33:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234900AbiGGFdD (ORCPT ); Thu, 7 Jul 2022 01:33:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53688 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229538AbiGGFdC (ORCPT ); Thu, 7 Jul 2022 01:33:02 -0400 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 795F031345 for ; Wed, 6 Jul 2022 22:33:01 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 373991F8D3 for ; Thu, 7 Jul 2022 05:33:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171980; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QCVTq+nLUTbwjNpOFA353mI3mKirIJBPM6fhcatjdWk=; b=ZvdZhZOnGRrbHTM/6sHf/ZDsgn4d+vMwGHiLAM3/0x/kEcEc+HjEN6PwGLhJFJo/rQaHdt WuY097MEQAskrTv/eW7EFz0+SCgyBP1rEX0NivoeY/1Sf8kMzZUunJCjwR3sfy7gFoMgSZ Kt5PRZ8NXhdWKtJaWZRCHq2zhalbmuM= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 295DB13488 for ; Thu, 7 Jul 2022 05:32:58 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id CDPwBgpwxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:32:58 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 03/12] btrfs: introduce the on-disk format of btrfs write intent bitmaps Date: Thu, 7 Jul 2022 13:32:28 +0800 Message-Id: <20e6fdb9e8fa3ba80a5b2c1988a377e7f5733abb.1657171615.git.wqu@suse.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org With extra comments explaining the on-disk format and the basic workflow. Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 4 +- fs/btrfs/raid56.c | 1 + fs/btrfs/write-intent.h | 199 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 203 insertions(+), 1 deletion(-) create mode 100644 fs/btrfs/write-intent.h diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 967c020c380a..963a89cd4bfb 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -42,6 +42,7 @@ #include "space-info.h" #include "zoned.h" #include "subpage.h" +#include "write-intent.h" #define BTRFS_SUPER_FLAG_SUPP (BTRFS_HEADER_FLAG_WRITTEN |\ BTRFS_HEADER_FLAG_RELOC |\ @@ -2844,7 +2845,8 @@ static int validate_super(struct btrfs_fs_info *fs_info, * Extra check like the length check against the reserved space * will happen at bitmap load time. */ - if (btrfs_super_reserved_bytes(sb) < BTRFS_DEVICE_RANGE_RESERVED) { + if (btrfs_super_reserved_bytes(sb) < + BTRFS_DEVICE_RANGE_RESERVED + WRITE_INTENT_BITMAPS_SIZE) { btrfs_err(fs_info, "not enough reserved space for write intent bitmap"); ret = -EINVAL; diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index c6411c849fea..7b2d2b6c8c61 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -19,6 +19,7 @@ #include "volumes.h" #include "raid56.h" #include "async-thread.h" +#include "write-intent.h" /* set when additional merges to this rbio are not allowed */ #define RBIO_RMW_LOCKED_BIT 1 diff --git a/fs/btrfs/write-intent.h b/fs/btrfs/write-intent.h new file mode 100644 index 000000000000..b851917bb0b6 --- /dev/null +++ b/fs/btrfs/write-intent.h @@ -0,0 +1,199 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +/* Btrfs specific write-intent bitmaps. */ + +#ifndef BTRFS_WRITE_INTENT_H +#define BTRFS_WRITE_INTENT_H + +#include +#include +/* For BTRFS_STRIPE_LEN. */ +#include "volumes.h" + +#define WRITE_INTENT_SUPER_MAGIC 0x515F62536249775FULL /* ascii _wIbSb_Q */ + +/* This write-intent bitmap records writes for RAID56 writes. */ +#define WRITE_INTENT_FLAGS_TARGET_RAID56 (1ULL << 0) + +/* This write-intent bitmap is internal, aka locates at 1MiB of each device. */ +#define WRITE_INTENT_FLAGS_INTERNAL (1ULL << 1) + +/* This write-intent bitmap uses logical bytenr. */ +#define WRITE_INTENT_FLAGS_BYTENR_LOGICAL (1ULL << 2) + +#define WRITE_INTENT_FLAGS_SUPPORTED \ + (WRITE_INTENT_FLAGS_TARGET_RAID56 | \ + WRITE_INTENT_FLAGS_INTERNAL | \ + WRITE_INTENT_FLAGS_BYTENR_LOGICAL) + +/* + * We use BTRFS_STRIPE_LEN as blocksize. + * This makes a RAID56 full stripe to @nr_data bits, and greatly + * enlarge how many bytes we can represent just using 4KiB. + */ +#define WRITE_INTENT_BLOCKSIZE (BTRFS_STRIPE_LEN) + +/* + * For now, 4KiB is enough, as using 64KiB blocksize we can save bitmaps + * for 896MiB (224 entries, each entri can cache 64KiB * 64) sized logical + * range. + */ +#define WRITE_INTENT_BITMAPS_SIZE (SZ_4K) + + +/* + * The super block of write intent bitmaps, should be at physical offset 1MiB of + * every writeable device. + */ +struct write_intent_super { + /* Csum for the super block and all the internal entries. */ + __u8 csum[BTRFS_CSUM_SIZE]; + __u8 fsid[BTRFS_FSID_SIZE]; + + __le64 magic; + + /* Important behavior flags would be set here. */ + __le64 flags; + + /* + * Event counter for the bitmap, every time the bitmaps get written + * this value increases. + */ + __le64 events; + + /* + * Total size of the bitmaps, including this super block and all the + * entries. + * + * U32 should be enough for internal bitmaps, but just in case we want + * to support external device as dedicated journal/bitmap device. + */ + __le64 size; + + /* How many entries we have utilized. */ + __le64 nr_entries; + + /* How many bytes one bit represents. */ + __le32 blocksize; + /* + * This should be the same as btrfs_super_block::csum_type. + * Cache csum type here so we read the write intent superblock without + * a fully opened btrfs (for dump purpose). + */ + __le16 csum_type; + + /* For future expansion, padding to 512 bytes. */ + __u8 reserved1[418]; +} __attribute__ ((__packed__)); + +static_assert(sizeof(struct write_intent_super) == 512); + +struct write_intent_entry { + /* + * Bytenr 0 is special, means this entry is empty, and also means the + * end of the bitmaps. + */ + __le64 bytenr; + __le64 bitmap; +}; + +/* How many entries we can have in the bitmaps. */ +#define WRITE_INTENT_INTERNAL_BITMAPS_MAX_ENTRIES \ + ((WRITE_INTENT_BITMAPS_SIZE - \ + sizeof(struct write_intent_super)) / \ + sizeof(struct write_intent_entry)) + +/* The number of bits we can have in one entry. */ +#define WRITE_INTENT_BITS_PER_ENTRY (64) + +/* + * ON-DISK FORMAT + * ============== + * + * [ super ][ entry 1 ][ entry 2 ] ... [entry N] + * |<------------ super::size --------------->| + * + * Normally it's 4KiB in size for internal bitmap. + * + * Currently the write-intent bitmaps is only for RAID56 writes, thus its + * blocksize is always 64KiB. + * Thus one entry can represent at most 4MiB (64 * 64 KiB) of logical range. + * + * When one raid56 full stripe needs partial writeback, the full stripe logical + * bytenr range will be included into at least one entry. + * + * After the last used entry, the remaining entries will all be filled with 0. + * + * WORKFLOW + * ======== + * + * 1) Write bio arrive + * Every write meets the requirement (so far, only RAID56 partial write) will + * have its bio delayed, until corresponding range are marked in the entry. + * + * 2) Update the write-intent bitmaps + * The entries will be populated, and write back to all writeable devices, + * with FUA flag set. + * Will wait until the write reaches disks. + * + * 3) Allow the involved write bios to be submitted + * + * 4) Write bios finish + * The corresponding range will be recorded to be freed at next flush. + * + * 5) All devices get flushed (caused by btrfs super block writeback or bitmaps + * pressure) + * The recorded ranges will be cleared. And if an entry is empty, it will be + * freed. + * Then update the write-intent bitmaps with its superblock (writeback with FUA + * flag and wait for it). + */ + +#define WRITE_INTENT_SETGET_FUNCS(name, type, member, bits) \ +static inline u##bits wi_##name(const type *s) \ +{ \ + return le##bits##_to_cpu(s->member); \ +} \ +static inline void wi_set_##name(type *s, u##bits val) \ +{ \ + s->member = cpu_to_le##bits(val); \ +} + +WRITE_INTENT_SETGET_FUNCS(super_magic, struct write_intent_super, magic, 64); +WRITE_INTENT_SETGET_FUNCS(super_flags, struct write_intent_super, flags, 64); +WRITE_INTENT_SETGET_FUNCS(super_events, struct write_intent_super, events, 64); +WRITE_INTENT_SETGET_FUNCS(super_size, struct write_intent_super, size, 64); +WRITE_INTENT_SETGET_FUNCS(super_nr_entries, struct write_intent_super, + nr_entries, 64); +WRITE_INTENT_SETGET_FUNCS(super_blocksize, struct write_intent_super, + blocksize, 32); +WRITE_INTENT_SETGET_FUNCS(super_csum_type, struct write_intent_super, + csum_type, 16); +WRITE_INTENT_SETGET_FUNCS(entry_bytenr, struct write_intent_entry, bytenr, 64); + +static inline void wie_get_bitmap(struct write_intent_entry *entry, + unsigned long *bitmap) +{ +#ifdef __LITTLE_ENDIAN + bitmap_from_arr64(bitmap, &entry->bitmap, WRITE_INTENT_BITS_PER_ENTRY); +#else + u64 val = le64_to_cpu(entry->bitmap); + + bitmap_from_arr64(bitmap, &val, WRITE_INTENT_BITS_PER_ENTRY); +#endif +} + +static inline void wie_set_bitmap(struct write_intent_entry *entry, + unsigned long *bitmap) +{ +#ifdef __LITTLE_ENDIAN + bitmap_to_arr64(&entry->bitmap, bitmap, WRITE_INTENT_BITS_PER_ENTRY); +#else + u64 val; + + bitmap_to_arr64(&val, bitmap, WRITE_INTENT_BITS_PER_ENTRY); + entry->bitmap = cpu_to_le64(val); +#endif +} + +#endif From patchwork Thu Jul 7 05:32:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909063 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BCB63C433EF for ; Thu, 7 Jul 2022 05:33:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234908AbiGGFdE (ORCPT ); Thu, 7 Jul 2022 01:33:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53700 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229538AbiGGFdD (ORCPT ); Thu, 7 Jul 2022 01:33:03 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7E9343134C for ; Wed, 6 Jul 2022 22:33:02 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 3C37721D94 for ; Thu, 7 Jul 2022 05:33:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171981; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c0ZUXn5ypN199wPptlBGguCI5w7lGApL6vasq0MC4OY=; b=qPCHSgLPelzS7WCe2a9GhU9mZHuHTjnPSLcXmzi97T4w/92zB2Uyos0jDYJtwyj7s7rZ8G bmaBLDl8JvbOawcLeyttaU+qNXceZrtg2Rs4pRVpjdgnFytaMmfpw9KXfcsyqx8INHZIn1 2a8QJb3FbmA2NQyjwCFB6X/r4CyM0Fs= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 95E6113488 for ; Thu, 7 Jul 2022 05:33:00 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id WFx0GAxwxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:33:00 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 04/12] btrfs: load/create write-intent bitmaps at mount time Date: Thu, 7 Jul 2022 13:32:29 +0800 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This patch will introduce btrfs_fs_info::wi_ctrl, which will have a non-highmem page for the write intent bitmaps. Please note that, if we can't find a valid bitmaps, the newly create one will only be in memory for now, the bitmaps writeback functionality will be introduced in the next commit. Signed-off-by: Qu Wenruo --- fs/btrfs/Makefile | 2 +- fs/btrfs/ctree.h | 1 + fs/btrfs/disk-io.c | 7 ++ fs/btrfs/write-intent.c | 174 ++++++++++++++++++++++++++++++++++++++++ fs/btrfs/write-intent.h | 15 ++++ 5 files changed, 198 insertions(+), 1 deletion(-) create mode 100644 fs/btrfs/write-intent.c diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 99f9995670ea..af93119d52e2 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -31,7 +31,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \ block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \ - subpage.o tree-mod-log.o + subpage.o tree-mod-log.o write-intent.o btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 908a735a66cf..fcc8ae4b7fb4 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -989,6 +989,7 @@ struct btrfs_fs_info { struct workqueue_struct *scrub_wr_completion_workers; struct workqueue_struct *scrub_parity_workers; struct btrfs_subpage_info *subpage_info; + struct write_intent_ctrl *wi_ctrl; struct btrfs_discard_ctl discard_ctl; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 963a89cd4bfb..edbb21706bda 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3699,6 +3699,12 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device ret); goto fail_block_groups; } + ret = btrfs_write_intent_init(fs_info); + if (ret) { + btrfs_err(fs_info, "failed to init write-intent bitmaps: %d", + ret); + goto fail_block_groups; + } ret = btrfs_recover_balance(fs_info); if (ret) { btrfs_err(fs_info, "failed to recover balance: %d", ret); @@ -4639,6 +4645,7 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) ret = btrfs_commit_super(fs_info); if (ret) btrfs_err(fs_info, "commit super ret %d", ret); + btrfs_write_intent_free(fs_info); } if (BTRFS_FS_ERROR(fs_info)) diff --git a/fs/btrfs/write-intent.c b/fs/btrfs/write-intent.c new file mode 100644 index 000000000000..a7ed21182525 --- /dev/null +++ b/fs/btrfs/write-intent.c @@ -0,0 +1,174 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "ctree.h" +#include "volumes.h" +#include "write-intent.h" + +/* + * Return 0 if a valid write intent bitmap can be found. + * Return 1 if no valid write intent bitmap can be found. + * Return <0 for other fatal errors. + */ +static int write_intent_load(struct btrfs_device *device, struct page *dst) +{ + struct btrfs_fs_info *fs_info = device->fs_info; + struct write_intent_super *wis; + struct bio *bio; + int ret; + + bio = bio_alloc(device->bdev, 1, REQ_OP_READ | REQ_SYNC | REQ_META, + GFP_NOFS); + /* It's backed by fs_bioset. */ + ASSERT(bio); + bio->bi_iter.bi_sector = BTRFS_DEVICE_RANGE_RESERVED >> SECTOR_SHIFT; + __bio_add_page(bio, dst, WRITE_INTENT_BITMAPS_SIZE, 0); + ret = submit_bio_wait(bio); + if (ret < 0) + return ret; + + wis = page_address(dst); + if (wi_super_magic(wis) != WRITE_INTENT_SUPER_MAGIC) + return 1; + + /* Stale bitmaps, doesn't belong to our fs. */ + if (memcmp(wis->fsid, device->fs_devices->fsid, BTRFS_FSID_SIZE)) + return 1; + + /* Above checks pass, but still csum mismatch, a big problem. */ + if (btrfs_super_csum_type(fs_info->super_copy) != + wi_super_csum_type(wis)) { + btrfs_err(fs_info, + "csum type mismatch, write intent bitmap has %u fs has %u", + wi_super_csum_type(wis), + btrfs_super_csum_type(fs_info->super_copy)); + return -EUCLEAN; + } + + if (wi_super_flags(wis) & ~WRITE_INTENT_FLAGS_SUPPORTED) { + btrfs_err(fs_info, "unsupported flags 0x%llx", + wi_super_flags(wis) & ~WRITE_INTENT_FLAGS_SUPPORTED); + return -EOPNOTSUPP; + } + + return ret; +} + +static void write_intent_init(struct btrfs_fs_info *fs_info) +{ + struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; + struct write_intent_super *wis; + + ASSERT(ctrl); + ASSERT(ctrl->page); + + /* Always start event count from 1. */ + atomic64_set(&ctrl->event, 1); + ctrl->blocksize = WRITE_INTENT_BLOCKSIZE; + memzero_page(ctrl->page, 0, WRITE_INTENT_BITMAPS_SIZE); + + wis = page_address(ctrl->page); + memcpy(wis->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE); + wi_set_super_magic(wis, WRITE_INTENT_SUPER_MAGIC); + wi_set_super_csum_type(wis, btrfs_super_csum_type(fs_info->super_copy)); + wi_set_super_events(wis, 1); + wi_set_super_flags(wis, WRITE_INTENT_FLAGS_SUPPORTED); + wi_set_super_size(wis, WRITE_INTENT_BITMAPS_SIZE); + wi_set_super_blocksize(wis, ctrl->blocksize); + wi_set_super_nr_entries(wis, 0); + btrfs_info(fs_info, "creating new write intent bitmaps"); +} + +int btrfs_write_intent_init(struct btrfs_fs_info *fs_info) +{ + struct btrfs_device *highest_dev = NULL; + struct btrfs_device *dev; + struct write_intent_super *wis; + u64 highest_event = 0; + int ret; + + ASSERT(fs_info->wi_ctrl == NULL); + if (!btrfs_fs_compat_ro(fs_info, WRITE_INTENT_BITMAP)) + return 0; + + fs_info->wi_ctrl = kzalloc(sizeof(*fs_info->wi_ctrl), GFP_NOFS); + if (!fs_info->wi_ctrl) + return -ENOMEM; + + fs_info->wi_ctrl->page = alloc_page(GFP_NOFS); + if (!fs_info->wi_ctrl->page) { + ret = -ENOMEM; + goto cleanup; + } + + /* + * Go through every writeable device to find the highest event. + * + * Only the write-intent with highest event number makes sense. + * If during bitmap writeback we lost power, some dev may have old + * bitmap which is already stale. + */ + list_for_each_entry(dev, &fs_info->fs_devices->devices, dev_list) { + u64 cur_event; + + if (!dev->bdev) + continue; + + ret = write_intent_load(dev, fs_info->wi_ctrl->page); + if (ret > 0) + continue; + if (ret < 0) { + btrfs_err(fs_info, + "failed to load write intent from devid %llu: %d", + dev->devid, ret); + goto cleanup; + } + wis = page_address(fs_info->wi_ctrl->page); + cur_event = wi_super_events(wis); + if (cur_event > highest_event) { + highest_dev = dev; + highest_event = cur_event; + } + } + + /* Load the bitmap with lowest event as our bitmap. */ + if (highest_dev) { + ret = write_intent_load(highest_dev, fs_info->wi_ctrl->page); + if (ret < 0) { + btrfs_err(fs_info, + "failed to load write intent from devid %llu: %d", + dev->devid, ret); + goto cleanup; + } + wis = page_address(fs_info->wi_ctrl->page); + atomic64_set(&fs_info->wi_ctrl->event, wi_super_events(wis)); + fs_info->wi_ctrl->blocksize = wi_super_blocksize(wis); + btrfs_info(fs_info, + "loaded write intent bitmaps, event count %llu", + atomic64_read(&fs_info->wi_ctrl->event)); + return 0; + } + + /* No valid bitmap found, create a new one. */ + write_intent_init(fs_info); + return 0; +cleanup: + if (fs_info->wi_ctrl) { + if (fs_info->wi_ctrl->page) + __free_page(fs_info->wi_ctrl->page); + kfree(fs_info->wi_ctrl); + fs_info->wi_ctrl = NULL; + } + return ret; +} + +void btrfs_write_intent_free(struct btrfs_fs_info *fs_info) +{ + struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; + + if (!ctrl) + return; + ASSERT(ctrl->page); + __free_page(ctrl->page); + kfree(ctrl); + fs_info->wi_ctrl = NULL; +} diff --git a/fs/btrfs/write-intent.h b/fs/btrfs/write-intent.h index b851917bb0b6..2c5cd434e978 100644 --- a/fs/btrfs/write-intent.h +++ b/fs/btrfs/write-intent.h @@ -106,6 +106,18 @@ struct write_intent_entry { /* The number of bits we can have in one entry. */ #define WRITE_INTENT_BITS_PER_ENTRY (64) +/* In-memory write-intent control structure. */ +struct write_intent_ctrl { + /* For the write_intent super and entries. */ + struct page *page; + + /* Cached event counter.*/ + atomic64_t event; + + /* Cached blocksize from write intent super. */ + u32 blocksize; +}; + /* * ON-DISK FORMAT * ============== @@ -196,4 +208,7 @@ static inline void wie_set_bitmap(struct write_intent_entry *entry, #endif } +int btrfs_write_intent_init(struct btrfs_fs_info *fs_info); +void btrfs_write_intent_free(struct btrfs_fs_info *fs_info); + #endif From patchwork Thu Jul 7 05:32:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909064 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7784C43334 for ; Thu, 7 Jul 2022 05:33:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234928AbiGGFdF (ORCPT ); Thu, 7 Jul 2022 01:33:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53714 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234905AbiGGFdE (ORCPT ); Thu, 7 Jul 2022 01:33:04 -0400 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 820163134E for ; Wed, 6 Jul 2022 22:33:03 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 40EA51F920 for ; Thu, 7 Jul 2022 05:33:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171982; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3mNEcPMIA/u8zlu8+71Blz3ecRV7A/aIqkjlqxCFskk=; b=ttoYFSXB7B1VZLTTp9G8AD5vBi/7Li5b7a+Np8XyURLqHHKeC01eRP5YK6cypnhpNSRdkw 2adxx/fsFmpCvOFNLmz7KGhbgOBnrtYzzMG7nVcVG4C1HJBP/ubvfCnlz7V+usTi9ZwmHu UGDlII2T65pxBoMCKfjiNS/3h5b7PJo= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 9B01913488 for ; Thu, 7 Jul 2022 05:33:01 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id oAakGQ1wxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:33:01 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 05/12] btrfs: write-intent: write the newly created bitmaps to all disks Date: Thu, 7 Jul 2022 13:32:30 +0800 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This write back will happen even for RO mounts. This will ensure we always have write-intent bitmaps for fses with that compat RO flags set. Signed-off-by: Qu Wenruo --- fs/btrfs/write-intent.c | 154 ++++++++++++++++++++++++++++++++++++++-- fs/btrfs/write-intent.h | 6 ++ 2 files changed, 155 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/write-intent.c b/fs/btrfs/write-intent.c index a7ed21182525..d1c5e8e206ba 100644 --- a/fs/btrfs/write-intent.c +++ b/fs/btrfs/write-intent.c @@ -1,8 +1,140 @@ // SPDX-License-Identifier: GPL-2.0 +#include +#include #include "ctree.h" #include "volumes.h" #include "write-intent.h" +#include "rcu-string.h" + +struct bitmap_writeback_contrl { + atomic_t pending_bios; + atomic_t errors; + wait_queue_head_t wait; +}; + +static void write_intent_end_write(struct bio *bio) +{ + struct bitmap_writeback_contrl *wb_ctrl = bio->bi_private; + + if (bio->bi_status) + atomic_inc(&wb_ctrl->errors); + atomic_dec(&wb_ctrl->pending_bios); + wake_up(&wb_ctrl->wait); + + bio_put(bio); +} + +static int submit_one_device(struct btrfs_device *dev, + struct bitmap_writeback_contrl *wb_ctrl) +{ + struct btrfs_fs_info *fs_info = dev->fs_info; + struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; + struct bio *bio; + + if (!dev->bdev) + return -EIO; + + if (!test_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &dev->dev_state) || + !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state)) + return 0; + + atomic_inc(&wb_ctrl->pending_bios); + bio = bio_alloc(dev->bdev, 1, REQ_OP_WRITE | REQ_SYNC | + REQ_META | REQ_PRIO | REQ_FUA, GFP_NOFS); + bio->bi_iter.bi_sector = BTRFS_DEVICE_RANGE_RESERVED >> SECTOR_SHIFT; + bio->bi_private = wb_ctrl; + bio->bi_end_io = write_intent_end_write; + __bio_add_page(bio, ctrl->commit_page, WRITE_INTENT_BITMAPS_SIZE, + offset_in_page(BTRFS_DEVICE_RANGE_RESERVED)); + submit_bio(bio); + return 0; +} + +/* Write back the bitmaps page to all devices. */ +static int write_intent_writeback(struct btrfs_fs_info *fs_info) +{ + struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; + struct write_intent_super *wis; + struct btrfs_device *dev; + struct btrfs_device **found_devs; + struct bitmap_writeback_contrl wb_ctrl = {0}; + SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); + const int nr_devs_max = fs_info->fs_devices->open_devices + 4; + int nr_devs = 0; + int total_errors = 0; + int ret; + int i; + + ASSERT(ctrl); + + found_devs = kcalloc(nr_devs_max, sizeof(struct btrfs_device *), + GFP_NOFS); + if (!found_devs) + return -ENOMEM; + + shash->tfm = fs_info->csum_shash; + + spin_lock(&ctrl->lock); + wis = page_address(ctrl->page); + + /* + * Bump up the event counter each time this bitmap is going to be + * written. + */ + wi_set_super_events(wis, wi_super_events(wis) + 1); + crypto_shash_digest(shash, (unsigned char *)wis + BTRFS_CSUM_SIZE, + WRITE_INTENT_BITMAPS_SIZE - BTRFS_CSUM_SIZE, + wis->csum); + atomic64_inc(&ctrl->event); + memcpy_page(ctrl->commit_page, 0, ctrl->page, 0, + WRITE_INTENT_BITMAPS_SIZE); + spin_unlock(&ctrl->lock); + + init_waitqueue_head(&wb_ctrl.wait); + atomic_set(&wb_ctrl.pending_bios, 0); + atomic_set(&wb_ctrl.errors, 0); + + rcu_read_lock(); + /* + * Record all the writeable devices into found_devs[]. + * + * We have to do this to keep a consistent view of writeable devices, + * without holding device_list_mutex. + * As dev-replace/dev-removal will all hold that mutex and wait for + * submitted bios to finish. + * If we try to hold device_list_mutex at bio submission path, we will + * deadlock with above dev-replace/dev-removal + * + * So here we just grab a local list of devices, and since we're at + * bio submission path, the device will never disapper before the bio + * finished. + */ + list_for_each_entry_rcu(dev, &fs_info->fs_devices->devices, dev_list) { + found_devs[nr_devs] = dev; + nr_devs++; + + if (unlikely(nr_devs >= nr_devs_max)) + break; + } + rcu_read_unlock(); + + /* Go through all the recorded devices, and submit the commit_page. */ + for (i = 0; i < nr_devs; i++) { + ret = submit_one_device(found_devs[i], &wb_ctrl); + if (ret < 0) + total_errors++; + } + wait_event(wb_ctrl.wait, atomic_read(&wb_ctrl.pending_bios) == 0); + + if (total_errors + atomic_read(&wb_ctrl.errors) > + btrfs_super_num_devices(fs_info->super_copy) - 1) { + btrfs_err(fs_info, "failed to writeback write-intent bitmaps"); + ret = -EIO; + } + kfree(found_devs); + return ret; +} /* * Return 0 if a valid write intent bitmap can be found. @@ -53,10 +185,11 @@ static int write_intent_load(struct btrfs_device *device, struct page *dst) return ret; } -static void write_intent_init(struct btrfs_fs_info *fs_info) +static int write_intent_init(struct btrfs_fs_info *fs_info) { struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; struct write_intent_super *wis; + int ret; ASSERT(ctrl); ASSERT(ctrl->page); @@ -75,7 +208,12 @@ static void write_intent_init(struct btrfs_fs_info *fs_info) wi_set_super_size(wis, WRITE_INTENT_BITMAPS_SIZE); wi_set_super_blocksize(wis, ctrl->blocksize); wi_set_super_nr_entries(wis, 0); - btrfs_info(fs_info, "creating new write intent bitmaps"); + + ret = write_intent_writeback(fs_info); + if (ret < 0) + return ret; + btrfs_info(fs_info, "new write intent bitmaps created"); + return 0; } int btrfs_write_intent_init(struct btrfs_fs_info *fs_info) @@ -95,11 +233,14 @@ int btrfs_write_intent_init(struct btrfs_fs_info *fs_info) return -ENOMEM; fs_info->wi_ctrl->page = alloc_page(GFP_NOFS); - if (!fs_info->wi_ctrl->page) { + fs_info->wi_ctrl->commit_page = alloc_page(GFP_NOFS); + if (!fs_info->wi_ctrl->page || !fs_info->wi_ctrl->commit_page) { ret = -ENOMEM; goto cleanup; } + spin_lock_init(&fs_info->wi_ctrl->lock); + /* * Go through every writeable device to find the highest event. * @@ -149,12 +290,15 @@ int btrfs_write_intent_init(struct btrfs_fs_info *fs_info) } /* No valid bitmap found, create a new one. */ - write_intent_init(fs_info); - return 0; + ret = write_intent_init(fs_info); + + return ret; cleanup: if (fs_info->wi_ctrl) { if (fs_info->wi_ctrl->page) __free_page(fs_info->wi_ctrl->page); + if (fs_info->wi_ctrl->commit_page) + __free_page(fs_info->wi_ctrl->commit_page); kfree(fs_info->wi_ctrl); fs_info->wi_ctrl = NULL; } diff --git a/fs/btrfs/write-intent.h b/fs/btrfs/write-intent.h index 2c5cd434e978..797e57aef0e1 100644 --- a/fs/btrfs/write-intent.h +++ b/fs/btrfs/write-intent.h @@ -111,9 +111,15 @@ struct write_intent_ctrl { /* For the write_intent super and entries. */ struct page *page; + /* A copy for writeback. */ + struct page *commit_page; + /* Cached event counter.*/ atomic64_t event; + /* Lock for reading/writing above @page. */ + spinlock_t lock; + /* Cached blocksize from write intent super. */ u32 blocksize; }; From patchwork Thu Jul 7 05:32:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909065 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01DDCC433EF for ; Thu, 7 Jul 2022 05:33:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234933AbiGGFdH (ORCPT ); Thu, 7 Jul 2022 01:33:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234919AbiGGFdF (ORCPT ); Thu, 7 Jul 2022 01:33:05 -0400 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 996E03134C for ; Wed, 6 Jul 2022 22:33:04 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 53C461F97B for ; Thu, 7 Jul 2022 05:33:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171983; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Vj+zlSA6fRiXKqrpj6UJkGYwv59YFjMTJm+laW019zo=; b=ZXSxiBSbYLlG8HBTfzDvadSat2E5mOoFSjLH6txRr39L7wXDpFNnkq/cn++IVXB0h7lNuS gG684dhUAxWlsiDjUQbsE3xggAMAyh3hsW/mXa4czEDC2giYi+wlOlWYjGvECotGs+C1Rl l0ksfKxEJpjbxwG4yK1QlYXNrtxfJgA= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id A13B013488 for ; Thu, 7 Jul 2022 05:33:02 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 0OYeGw5wxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:33:02 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 06/12] btrfs: write-intent: introduce an internal helper to set bits for a range. Date: Thu, 7 Jul 2022 13:32:31 +0800 Message-Id: <1574950e8caee003d1682ca6a9c6c85142cef5bd.1657171615.git.wqu@suse.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Signed-off-by: Qu Wenruo Reported-by: kernel test robot Reported-by: kernel test robot --- fs/btrfs/write-intent.c | 246 ++++++++++++++++++++++++++++++++++++++++ fs/btrfs/write-intent.h | 19 ++++ 2 files changed, 265 insertions(+) diff --git a/fs/btrfs/write-intent.c b/fs/btrfs/write-intent.c index d1c5e8e206ba..eaf6d010462e 100644 --- a/fs/btrfs/write-intent.c +++ b/fs/btrfs/write-intent.c @@ -216,6 +216,252 @@ static int write_intent_init(struct btrfs_fs_info *fs_info) return 0; } +static struct write_intent_entry *write_intent_entry_nr( + struct write_intent_ctrl *ctrl, int nr) +{ + + ASSERT(nr < WRITE_INTENT_INTERNAL_BITMAPS_MAX_ENTRIES); + return (page_address(ctrl->page) + + sizeof(struct write_intent_super) + + nr * sizeof(struct write_intent_entry)); +} + +/* + * Return <0 if the bytenr is before the given entry. + * Return 0 if the bytenr is inside the given entry. + * Return >0 if the bytenr is after the given entry. + */ +static int compare_bytenr_to_range(u64 bytenr, u64 start, u32 len) +{ + if (bytenr < start) + return -1; + if (start <= bytenr && bytenr < start + len) + return 0; + return 1; +} + +/* + * Move all non-empty entries starting from @nr, to the right, and make room + * for @nr_new entries. + * Those new entries will be all zero filled. + * + * Caller should ensure we have enough room for @nr_new new entries. + */ +static void move_entries_right(struct write_intent_ctrl *ctrl, int nr, + int nr_new) +{ + struct write_intent_super *wis = page_address(ctrl->page); + int move_size; + + ASSERT(nr_new > 0); + ASSERT(wi_super_nr_entries(wis) + nr_new <= + WRITE_INTENT_INTERNAL_BITMAPS_MAX_ENTRIES); + + move_size = (wi_super_nr_entries(wis) - nr) * + sizeof(struct write_intent_entry); + + memmove(write_intent_entry_nr(ctrl, nr + nr_new), + write_intent_entry_nr(ctrl, nr), move_size); + memset(write_intent_entry_nr(ctrl, nr), 0, + nr_new * sizeof(struct write_intent_entry)); + wi_set_super_nr_entries(wis, wi_super_nr_entries(wis) + nr_new); +} + +static void set_bits_in_one_entry(struct write_intent_ctrl *ctrl, + struct write_intent_entry *entry, + u64 bytenr, u32 len) +{ + const u64 entry_start = wi_entry_bytenr(entry); + const u32 entry_len = write_intent_entry_size(ctrl); + unsigned long bitmaps[WRITE_INTENT_BITS_PER_ENTRY / BITS_PER_LONG]; + + wie_get_bitmap(entry, bitmaps); + + ASSERT(entry_start <= bytenr && bytenr + len <= entry_start + entry_len); + bitmap_set(bitmaps, (bytenr - entry_start) / ctrl->blocksize, + len / ctrl->blocksize); + wie_set_bitmap(entry, bitmaps); +} + +/* + * Insert new entries for the range [@bytenr, @bytenr + @len) at slot @nr + * and fill the new entries with proper bytenr and bitmaps. + */ +static void insert_new_entries(struct write_intent_ctrl *ctrl, int nr, + u64 bytenr, u32 len) +{ + const u32 entry_size = write_intent_entry_size(ctrl); + u64 entry_start; + u64 new_start = round_down(bytenr, entry_size); + u64 new_end; + int nr_new_entries; + u64 cur; + + if (nr >= wi_super_nr_entries(page_address(ctrl->page)) || + nr >= WRITE_INTENT_INTERNAL_BITMAPS_MAX_ENTRIES) + entry_start = U64_MAX; + else + entry_start = wi_entry_bytenr(write_intent_entry_nr(ctrl, nr)); + + ASSERT(bytenr < entry_start); + + new_end = min(entry_start, round_up(bytenr + len, entry_size)); + nr_new_entries = (new_end - new_start) / entry_size; + + if (nr_new_entries == 0) + return; + + move_entries_right(ctrl, nr, nr_new_entries); + + for (cur = new_start; cur < new_end; cur += entry_size, nr++) { + struct write_intent_entry *entry = + write_intent_entry_nr(ctrl, nr); + u64 range_start = max(cur, bytenr); + u64 range_len = min(cur + entry_size, bytenr + len) - + range_start; + + /* Fill the bytenr into the new empty entries.*/ + wi_set_entry_bytenr(entry, cur); + + /* And set the bitmap. */ + set_bits_in_one_entry(ctrl, entry, range_start, range_len); + } +} + +/* + * This should be only called when we have enough room in the bitmaps, and hold + * the wi_ctrl->lock. + * + * This function is only exported for selftests, which doesn't need to hold any + * lock. + */ +void write_intent_set_bits(struct write_intent_ctrl *ctrl, u64 bytenr, u32 len) +{ + struct write_intent_super *wis = page_address(ctrl->page); + const u32 entry_size = write_intent_entry_size(ctrl); + int i; + u64 nr_entries = wi_super_nr_entries(wis); + u64 cur_bytenr; + + /* + * Currently we only accept full stripe length, which should be + * aligned to 64KiB. + */ + ASSERT(IS_ALIGNED(len, BTRFS_STRIPE_LEN)); + + /* + * We should have room to contain the worst case scenario, in which we + * need to create one or more new entry. + */ + ASSERT(nr_entries + bytes_to_entries(bytenr, len, BTRFS_STRIPE_LEN) <= + WRITE_INTENT_INTERNAL_BITMAPS_MAX_ENTRIES); + + /* + * Iterate through the existing entries to insert new entries or set + * bits in the existing ones. + */ + for (i = 0, cur_bytenr = bytenr; + i < wi_super_nr_entries(wis) && cur_bytenr < bytenr + len; i++) { + struct write_intent_entry *entry = write_intent_entry_nr(ctrl, i); + u64 entry_start = wi_entry_bytenr(entry); + u64 entry_end = entry_start + entry_size; + + /* + * |<-- entry -->| + * |<-- bytenr/len -->| + * + * Or + * + * |<-- entry -->| + * |<-- bytenr/len -->| + * + * Or + * + * |<-- entry -->| + * |<-- bytenr/len -->| + * + * We need to insert one or more new entries for the range not + * covered by the existing entry. + */ + if (compare_bytenr_to_range(cur_bytenr, entry_start, + entry_size) < 0) { + u64 new_range_end; + + new_range_end = min(entry_start, bytenr + len); + insert_new_entries(ctrl, i, cur_bytenr, + new_range_end - cur_bytenr); + + cur_bytenr = new_range_end; + continue; + } + /* + * |<-- entry -->| + * |<-- bytenr/len -->| + * + * Or + * + * |<-------- entry ------->| + * |<- bytenr/len ->| + * + * In this case, we just set the bitmap in the current entry, and + * advance @cur_bytenr to the end of the existing entry. + * By this, we either go check the range against the next entry, + * or we finish our current range. + */ + if (compare_bytenr_to_range(cur_bytenr, entry_start, + entry_size) == 0) { + u64 range_end = min(entry_end, bytenr + len); + + set_bits_in_one_entry(ctrl, entry, cur_bytenr, + range_end - cur_bytenr); + cur_bytenr = range_end; + continue; + } + + /* + * (A) + * |<-- entry -->| |<--- next -->| + * |<-- bytenr/len -->| + * + * OR + * + * (B) + * |<-- entry -->| |<--- next -->| + * |<-- bytenr/len -->| + * + * OR + * + * (C) + * |<-- entry -->|<--- next -->| + * |<-- bytenr/len -->| + * + * OR + * + * (D) + * |<-- entry -->| + * |<-- bytenr/len -->| + * + * For all above cases, we just skip to the next entry. + * + * For case (A) and (B), we will insert new entries before + * the next one at the next loop. + * + * For case (C), we just do the regular set bits. + * Thus case (A) ~ (C) are all handled properly. + * + * For case (D), we will handle it after the loop. + */ + } + + /* + * We still have some range not handled after the existing entries, + * just insert new entries. + */ + if (cur_bytenr < bytenr + len) + insert_new_entries(ctrl, i, cur_bytenr, + bytenr + len - cur_bytenr); +} + int btrfs_write_intent_init(struct btrfs_fs_info *fs_info) { struct btrfs_device *highest_dev = NULL; diff --git a/fs/btrfs/write-intent.h b/fs/btrfs/write-intent.h index 797e57aef0e1..707ccf73e13a 100644 --- a/fs/btrfs/write-intent.h +++ b/fs/btrfs/write-intent.h @@ -106,6 +106,15 @@ struct write_intent_entry { /* The number of bits we can have in one entry. */ #define WRITE_INTENT_BITS_PER_ENTRY (64) +static inline u32 bytes_to_entries(u64 start, u32 length, u32 blocksize) +{ + u32 entry_len = blocksize * WRITE_INTENT_BITS_PER_ENTRY; + u64 entry_start = round_down(start, entry_len); + u64 entry_end = round_up(start + length, entry_len); + + return DIV_ROUND_UP((u32)(entry_end - entry_start), entry_len); +} + /* In-memory write-intent control structure. */ struct write_intent_ctrl { /* For the write_intent super and entries. */ @@ -189,6 +198,13 @@ WRITE_INTENT_SETGET_FUNCS(super_csum_type, struct write_intent_super, csum_type, 16); WRITE_INTENT_SETGET_FUNCS(entry_bytenr, struct write_intent_entry, bytenr, 64); +static inline u32 write_intent_entry_size(struct write_intent_ctrl *ctrl) +{ + struct write_intent_super *wis = page_address(ctrl->page); + + return wi_super_blocksize(wis) * WRITE_INTENT_BITS_PER_ENTRY; +} + static inline void wie_get_bitmap(struct write_intent_entry *entry, unsigned long *bitmap) { @@ -214,6 +230,9 @@ static inline void wie_set_bitmap(struct write_intent_entry *entry, #endif } +/* This function is only exported for selftests. */ +void write_intent_set_bits(struct write_intent_ctrl *ctrl, u64 bytenr, u32 len); + int btrfs_write_intent_init(struct btrfs_fs_info *fs_info); void btrfs_write_intent_free(struct btrfs_fs_info *fs_info); From patchwork Thu Jul 7 05:32:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909066 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E940CCA479 for ; Thu, 7 Jul 2022 05:33:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234939AbiGGFdH (ORCPT ); Thu, 7 Jul 2022 01:33:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53738 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234905AbiGGFdG (ORCPT ); Thu, 7 Jul 2022 01:33:06 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0BB763134F for ; Wed, 6 Jul 2022 22:33:06 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id B80AE21D94 for ; Thu, 7 Jul 2022 05:33:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171984; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oIuOh+Ik7bfe6hfSa2OmyJ1dpKtXOE7wMKvobZhYwD8=; b=fsnwCDJFitfUEVfEy8Hp72j1YokebgfNT09NWdk+JwOeMO/wxvCkb9QIOM+9TWYYTHBdxs OzxZngqIeXDVTG3+poXtiOWQ0fqk6vbgW8S8deAYNDWvNdERKWHRtAoknjlDrzpIBhpNyp ErlTiFck9pahFSdB5axYUTA1dwYNEcU= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id A86D513488 for ; Thu, 7 Jul 2022 05:33:03 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id cC31HA9wxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:33:03 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 07/12] btrfs: write-intent: introduce an internal helper to clear bits for a range. Date: Thu, 7 Jul 2022 13:32:32 +0800 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This new helper. write_intent_clear_bits(), is much simpler than the set bits counter part. As if we can not find a entry for our target range, then it must be something wrong, and we only need to warn and skip to next entry. Although there has one extra thing to do, if we have emptied one entry, we have to delete that empty entry. Signed-off-by: Qu Wenruo --- fs/btrfs/write-intent.c | 172 ++++++++++++++++++++++++++++++++++++++++ fs/btrfs/write-intent.h | 4 +- 2 files changed, 175 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/write-intent.c b/fs/btrfs/write-intent.c index eaf6d010462e..8a69bf39a994 100644 --- a/fs/btrfs/write-intent.c +++ b/fs/btrfs/write-intent.c @@ -283,6 +283,36 @@ static void set_bits_in_one_entry(struct write_intent_ctrl *ctrl, wie_set_bitmap(entry, bitmaps); } +static bool is_entry_empty(struct write_intent_ctrl *ctrl, + struct write_intent_entry *entry) +{ + unsigned long bitmaps[2]; + + wie_get_bitmap(entry, bitmaps); + + return bitmap_empty(bitmaps, WRITE_INTENT_BITS_PER_ENTRY); +} + +/* + * NOTE: This function may clear all bits of an entry, caller must check if + * that's the case, and delete the empty entry if needed. + */ +static void clear_bits_in_one_entry(struct write_intent_ctrl *ctrl, + struct write_intent_entry *entry, + u64 bytenr, u32 len) +{ + const u64 entry_start = wi_entry_bytenr(entry); + const u32 entry_len = write_intent_entry_size(ctrl); + unsigned long bitmaps[WRITE_INTENT_BITS_PER_ENTRY / BITS_PER_LONG]; + + wie_get_bitmap(entry, bitmaps); + + ASSERT(entry_start <= bytenr && bytenr + len <= entry_start + entry_len); + bitmap_clear(bitmaps, (bytenr - entry_start) / ctrl->blocksize, + len / ctrl->blocksize); + wie_set_bitmap(entry, bitmaps); +} + /* * Insert new entries for the range [@bytenr, @bytenr + @len) at slot @nr * and fill the new entries with proper bytenr and bitmaps. @@ -328,6 +358,25 @@ static void insert_new_entries(struct write_intent_ctrl *ctrl, int nr, } } +static void delete_one_entry(struct write_intent_ctrl *ctrl, int nr) +{ + struct write_intent_super *wis = page_address(ctrl->page); + int cur_nr_entries = wi_super_nr_entries(wis); + + ASSERT(is_entry_empty(ctrl, write_intent_entry_nr(ctrl, nr))); + ASSERT(nr < cur_nr_entries); + + /* Move all the entries after slot @nr by one slot. */ + memmove(write_intent_entry_nr(ctrl, nr), + write_intent_entry_nr(ctrl, nr + 1), + (cur_nr_entries - nr - 1) * sizeof(struct write_intent_entry)); + + /* Memzero the right most entry. */ + memset(write_intent_entry_nr(ctrl, cur_nr_entries - 1), 0, + sizeof(struct write_intent_entry)); + wi_set_super_nr_entries(wis, cur_nr_entries - 1); +} + /* * This should be only called when we have enough room in the bitmaps, and hold * the wi_ctrl->lock. @@ -462,6 +511,129 @@ void write_intent_set_bits(struct write_intent_ctrl *ctrl, u64 bytenr, u32 len) bytenr + len - cur_bytenr); } +/* This should be only called with wi_ctrl->lock hold, except for selftests. */ +void write_intent_clear_bits(struct write_intent_ctrl *ctrl, u64 bytenr, + u32 len) +{ + struct write_intent_super *wis = page_address(ctrl->page); + const u32 entry_size = write_intent_entry_size(ctrl); + int i; + u64 cur_bytenr; + + /* + * Currently we only accept full stripe length, which should be + * aligned to 64KiB. + */ + ASSERT(IS_ALIGNED(len, BTRFS_STRIPE_LEN)); + + /* + * Iterate through the existing entries to delete entries or clear + * bits in the existing ones. + */ + for (i = 0, cur_bytenr = bytenr; + i < wi_super_nr_entries(wis) && cur_bytenr < bytenr + len; i++) { + struct write_intent_entry *entry = write_intent_entry_nr(ctrl, i); + u64 entry_start = wi_entry_bytenr(entry); + u64 entry_end = entry_start + entry_size; + + /* + * |<-- entry -->| + * |<-- bytenr/len -->| + * + * Or + * + * |<-- entry -->| + * |<-- bytenr/len -->| + * + * Or + * + * |<-- entry -->| + * |<-- bytenr/len -->| + * + * This case should not happen, it means we have some logged + * dirty range, but it's no longer there. + * Just warn and skip to the next covered range. + */ + if (compare_bytenr_to_range(cur_bytenr, entry_start, entry_size) < 0) { + WARN_ON_ONCE(1); + cur_bytenr = min(bytenr + len, entry_start); + continue; + } + + /* + * |<-- entry -->| + * |<-- bytenr/len -->| + * + * Or + * + * |<-------- entry ------->| + * |<- bytenr/len ->| + * + * In this case, we just clear the bitmap in current entry, and + * advance @cur_bytenr. + * By this, we either go check the range against the next entry, + * or we finish our current range. + */ + if (compare_bytenr_to_range(cur_bytenr, entry_start, entry_size) == 0) { + u64 range_end = min(entry_end, bytenr + len); + + clear_bits_in_one_entry(ctrl, entry, cur_bytenr, + range_end - cur_bytenr); + cur_bytenr = range_end; + /* + * If the current entry is empty, we need to delete the + * entry and decrease @nr. + */ + if (is_entry_empty(ctrl, entry)) { + delete_one_entry(ctrl, i); + i--; + } + continue; + } + + /* + * (A) + * |<-- entry -->| |<--- next -->| + * |<-- bytenr/len -->| + * + * OR + * + * (B) + * |<-- entry -->| |<--- next -->| + * |<-- bytenr/len -->| + * + * OR + * + * (C) + * |<-- entry -->|<--- next -->| + * |<-- bytenr/len -->| + * + * OR + * + * (D) + * |<-- entry -->| + * |<-- bytenr/len -->| + * + * For all above cases, we just skip to the next entry. + * + * For case (A) and (B), we will trigger wanring as we + * don't have expected entries to clear. + * + * For case (C), we just do the regular clear bits. + * Thus case (A) ~ (C) are all handled properly. + * + * For case (D), we will handle it after the loop. + */ + } + /* + * There is some range not handled, and there is no more entries, + * another unexpected case. + * Just do warning. + */ + if (cur_bytenr < bytenr + len) + WARN_ON_ONCE(1); +} + int btrfs_write_intent_init(struct btrfs_fs_info *fs_info) { struct btrfs_device *highest_dev = NULL; diff --git a/fs/btrfs/write-intent.h b/fs/btrfs/write-intent.h index 707ccf73e13a..0da1b7421590 100644 --- a/fs/btrfs/write-intent.h +++ b/fs/btrfs/write-intent.h @@ -230,8 +230,10 @@ static inline void wie_set_bitmap(struct write_intent_entry *entry, #endif } -/* This function is only exported for selftests. */ +/* These two functions are only exported for selftests. */ void write_intent_set_bits(struct write_intent_ctrl *ctrl, u64 bytenr, u32 len); +void write_intent_clear_bits(struct write_intent_ctrl *ctrl, u64 bytenr, + u32 len); int btrfs_write_intent_init(struct btrfs_fs_info *fs_info); void btrfs_write_intent_free(struct btrfs_fs_info *fs_info); From patchwork Thu Jul 7 05:32:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909067 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 14424C43334 for ; Thu, 7 Jul 2022 05:33:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234940AbiGGFdJ (ORCPT ); Thu, 7 Jul 2022 01:33:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53748 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234905AbiGGFdI (ORCPT ); Thu, 7 Jul 2022 01:33:08 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E3783134C for ; Wed, 6 Jul 2022 22:33:07 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id C056C21EDA for ; Thu, 7 Jul 2022 05:33:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171985; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9ixV9ZO8TtEoDFyIMcivCPdUEX0tcGN5Egu7vdgIsTc=; b=NesgbPmY2NKmrCxyqBIB8ead6QnS3f/tNw5S9ZtKizWf6OGrDJRGjwCGDSh3h+4wnnn05g pzW9+BLNU07pO8j3HNKTDBTEgpzltzZ7qA4HRhRDHMjWaG3CenG4mwJZi3tpsIAqvt/GOO hhZk2T3MeUl3KPZsrPX7EDAjSA6l6Ck= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 22CE113488 for ; Thu, 7 Jul 2022 05:33:04 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id cMz0NxBwxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:33:04 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 08/12] btrfs: selftests: add selftests for write-intent bitmaps Date: Thu, 7 Jul 2022 13:32:33 +0800 Message-Id: <71df1f42e026a2da1e24de5f961175b669f47935.1657171615.git.wqu@suse.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org It turns out such sparse bitmap still has a lot of things to go wrong, definitely needs some tests to cover all the different corner cases. Signed-off-by: Qu Wenruo --- fs/btrfs/Makefile | 3 +- fs/btrfs/tests/btrfs-tests.c | 4 + fs/btrfs/tests/btrfs-tests.h | 2 + fs/btrfs/tests/write-intent-bitmaps-tests.c | 245 ++++++++++++++++++++ fs/btrfs/write-intent.c | 10 - fs/btrfs/write-intent.h | 12 + 6 files changed, 265 insertions(+), 11 deletions(-) create mode 100644 fs/btrfs/tests/write-intent-bitmaps-tests.c diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index af93119d52e2..a9658782d363 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -42,4 +42,5 @@ btrfs-$(CONFIG_FS_VERITY) += verity.o btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \ tests/extent-buffer-tests.o tests/btrfs-tests.o \ tests/extent-io-tests.o tests/inode-tests.o tests/qgroup-tests.o \ - tests/free-space-tree-tests.o tests/extent-map-tests.o + tests/free-space-tree-tests.o tests/extent-map-tests.o \ + tests/write-intent-bitmaps-tests.o diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c index 1591bfa55bcc..27bb34acb156 100644 --- a/fs/btrfs/tests/btrfs-tests.c +++ b/fs/btrfs/tests/btrfs-tests.c @@ -27,6 +27,7 @@ const char *test_error[] = { [TEST_ALLOC_INODE] = "cannot allocate inode", [TEST_ALLOC_BLOCK_GROUP] = "cannot allocate block group", [TEST_ALLOC_EXTENT_MAP] = "cannot allocate extent map", + [TEST_ALLOC_WRITE_INTENT_CTRL] = "cannot allocate write intent control", }; static const struct super_operations btrfs_test_super_ops = { @@ -279,6 +280,9 @@ int btrfs_run_sanity_tests(void) } } ret = btrfs_test_extent_map(); + if (ret) + goto out; + ret = btrfs_test_write_intent_bitmaps(); out: btrfs_destroy_test_fs(); diff --git a/fs/btrfs/tests/btrfs-tests.h b/fs/btrfs/tests/btrfs-tests.h index 7a2d7ffbe30e..1845bfb6465d 100644 --- a/fs/btrfs/tests/btrfs-tests.h +++ b/fs/btrfs/tests/btrfs-tests.h @@ -23,6 +23,7 @@ enum { TEST_ALLOC_INODE, TEST_ALLOC_BLOCK_GROUP, TEST_ALLOC_EXTENT_MAP, + TEST_ALLOC_WRITE_INTENT_CTRL, }; extern const char *test_error[]; @@ -37,6 +38,7 @@ int btrfs_test_inodes(u32 sectorsize, u32 nodesize); int btrfs_test_qgroups(u32 sectorsize, u32 nodesize); int btrfs_test_free_space_tree(u32 sectorsize, u32 nodesize); int btrfs_test_extent_map(void); +int btrfs_test_write_intent_bitmaps(void); struct inode *btrfs_new_test_inode(void); struct btrfs_fs_info *btrfs_alloc_dummy_fs_info(u32 nodesize, u32 sectorsize); void btrfs_free_dummy_fs_info(struct btrfs_fs_info *fs_info); diff --git a/fs/btrfs/tests/write-intent-bitmaps-tests.c b/fs/btrfs/tests/write-intent-bitmaps-tests.c new file mode 100644 index 000000000000..c956acaea1c7 --- /dev/null +++ b/fs/btrfs/tests/write-intent-bitmaps-tests.c @@ -0,0 +1,245 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "../ctree.h" +#include "../volumes.h" +#include "../write-intent.h" +#include "btrfs-tests.h" + +static struct write_intent_ctrl *alloc_dummy_ctrl(void) +{ + struct write_intent_ctrl *ctrl; + struct write_intent_super *wis; + + ctrl = kzalloc(sizeof(*ctrl), GFP_NOFS); + if (!ctrl) + return NULL; + /* + * For dummy tests, we only need the primary page, no need for the + * commit page. + */ + ctrl->page = alloc_page(GFP_NOFS); + if (!ctrl->page) { + kfree(ctrl); + return NULL; + } + ctrl->blocksize = BTRFS_STRIPE_LEN; + atomic64_set(&ctrl->event, 1); + spin_lock_init(&ctrl->lock); + memzero_page(ctrl->page, 0, WRITE_INTENT_BITMAPS_SIZE); + wis = page_address(ctrl->page); + wi_set_super_magic(wis, WRITE_INTENT_SUPER_MAGIC); + wi_set_super_csum_type(wis, 0); + wi_set_super_events(wis, 1); + wi_set_super_flags(wis, WRITE_INTENT_FLAGS_SUPPORTED); + wi_set_super_size(wis, WRITE_INTENT_BITMAPS_SIZE); + wi_set_super_blocksize(wis, ctrl->blocksize); + wi_set_super_nr_entries(wis, 0); + return ctrl; +} + +static void zero_bitmaps(struct write_intent_ctrl *ctrl) +{ + struct write_intent_super *wis = page_address(ctrl->page); + + memzero_page(ctrl->page, sizeof(struct write_intent_super), + WRITE_INTENT_BITMAPS_SIZE - + sizeof(struct write_intent_super)); + wi_set_super_nr_entries(wis, 0); +} + +static int compare_bitmaps(struct write_intent_ctrl *ctrl, + int nr_entries, u64 *bytenrs, u64 *bitmaps) +{ + struct write_intent_super *wis = page_address(ctrl->page); + struct write_intent_entry empty = {0}; + int i; + + if (wi_super_nr_entries(wis) != nr_entries) { + test_err("nr entries mismatch, has %llu expect %u", + wi_super_nr_entries(wis), nr_entries); + goto err; + } + + for (i = 0; i < nr_entries; i++) { + struct write_intent_entry *entry = + write_intent_entry_nr(ctrl, i); + + if (wi_entry_bytenr(entry) != bytenrs[i]) { + test_err("bytenr mismatch, has %llu expect %llu", + wi_entry_bytenr(entry), bytenrs[i]); + goto err; + } + if (wi_entry_raw_bitmap(entry) != bitmaps[i]) { + test_err("bitmap mismatch, has 0x%016llx expect 0x%016llx", + wi_entry_raw_bitmap(entry), bitmaps[i]); + goto err; + } + } + + /* The unused entries should all be zero. */ + for (i = nr_entries; i < WRITE_INTENT_INTERNAL_BITMAPS_MAX_ENTRIES; + i++) { + if (memcmp(write_intent_entry_nr(ctrl, i), &empty, + sizeof(empty))) { + test_err( + "unused entry is not empty, entry %u nr_entries %u", + i, nr_entries); + goto err; + } + } + return 0; +err: + /* Dump the bitmaps for better debugging. */ + test_err("dumping bitmaps, nr_entries=%llu:", wi_super_nr_entries(wis)); + for (i = 0; i < wi_super_nr_entries(wis); i++) { + struct write_intent_entry *entry = + write_intent_entry_nr(ctrl, i); + + test_err(" entry=%u bytenr=%llu bitmap=0x%016llx\n", + i, wi_entry_bytenr(entry), wi_entry_raw_bitmap(entry)); + } + return -EUCLEAN; +} + +static void free_dummy_ctrl(struct write_intent_ctrl *ctrl) +{ + __free_page(ctrl->page); + ASSERT(ctrl->commit_page == NULL); + kfree(ctrl); +} + +/* + * Basic tests to ensure set and clear can properly handle bits set/clear in + * one entry. + */ +static int test_case_simple_entry(struct write_intent_ctrl *ctrl) +{ + const u32 blocksize = BTRFS_STRIPE_LEN; + u64 bitmaps[1] = { 0 }; + u64 bytenrs[1] = { 0 }; + int ret; + + zero_bitmaps(ctrl); + + write_intent_set_bits(ctrl, 0, blocksize * 3); + + bitmaps[0] = 0x7; + bytenrs[0] = 0; + ret = compare_bitmaps(ctrl, 1, bytenrs, bitmaps); + if (ret < 0) + return ret; + + write_intent_clear_bits(ctrl, 0, blocksize * 3); + ret = compare_bitmaps(ctrl, 0, bytenrs, bitmaps); + if (ret < 0) + return ret; + + write_intent_set_bits(ctrl, blocksize * 8, blocksize * 3); + + bitmaps[0] = 0x700; + bytenrs[0] = 0; + ret = compare_bitmaps(ctrl, 1, bytenrs, bitmaps); + if (ret < 0) + return ret; + + write_intent_clear_bits(ctrl, blocksize * 9, blocksize * 2); + bitmaps[0] = 0x100; + bytenrs[0] = 0; + ret = compare_bitmaps(ctrl, 1, bytenrs, bitmaps); + if (ret < 0) + return ret; + + write_intent_clear_bits(ctrl, blocksize * 8, blocksize * 1); + ret = compare_bitmaps(ctrl, 0, bytenrs, bitmaps); + if (ret < 0) + return ret; + + /* Tests at high bits. */ + write_intent_set_bits(ctrl, blocksize * 61, blocksize * 3); + bitmaps[0] = 0xe000000000000000L; + bytenrs[0] = 0; + ret = compare_bitmaps(ctrl, 1, bytenrs, bitmaps); + if (ret < 0) + return ret; + write_intent_clear_bits(ctrl, blocksize * 61, blocksize * 1); + bitmaps[0] = 0xc000000000000000L; + bytenrs[0] = 0; + ret = compare_bitmaps(ctrl, 1, bytenrs, bitmaps); + if (ret < 0) + return ret; + write_intent_clear_bits(ctrl, blocksize * 62, blocksize * 2); + ret = compare_bitmaps(ctrl, 0, bytenrs, bitmaps); + if (ret < 0) + return ret; + return 0; +} + +/* Tests set/clear that cross entry boundaries. */ +static int test_case_cross_entries(struct write_intent_ctrl *ctrl) +{ + const u32 blocksize = BTRFS_STRIPE_LEN; + u64 bitmaps[3] = { 0 }; + u64 bytenrs[3] = { 0 }; + int ret; + + zero_bitmaps(ctrl); + + write_intent_set_bits(ctrl, blocksize * 32, blocksize * 64); + bitmaps[0] = 0xffffffff00000000L; + bytenrs[0] = 0; + bitmaps[1] = 0x00000000ffffffffL; + bytenrs[1] = 4194304; + ret = compare_bitmaps(ctrl, 2, bytenrs, bitmaps); + if (ret < 0) + return ret; + + write_intent_set_bits(ctrl, blocksize * 96, blocksize * 64); + bitmaps[0] = 0xffffffff00000000L; + bytenrs[0] = 0; + bitmaps[1] = 0xffffffffffffffffL; + bytenrs[1] = 4194304; + bitmaps[2] = 0x00000000ffffffffL; + bytenrs[2] = 8388608; + ret = compare_bitmaps(ctrl, 3, bytenrs, bitmaps); + if (ret < 0) + return ret; + + write_intent_clear_bits(ctrl, blocksize * 33, blocksize * 126); + bitmaps[0] = 0x0000000100000000L; + bytenrs[0] = 0; + bitmaps[1] = 0x0000000080000000L; + bytenrs[1] = 8388608; + ret = compare_bitmaps(ctrl, 2, bytenrs, bitmaps); + if (ret < 0) + return ret; + + return 0; +} + +int btrfs_test_write_intent_bitmaps(void) +{ + struct write_intent_ctrl *ctrl; + int ret; + + ctrl = alloc_dummy_ctrl(); + if (!ctrl) { + test_std_err(TEST_ALLOC_WRITE_INTENT_CTRL); + return -ENOMEM; + } + test_msg("running extent_map tests"); + + ret = test_case_simple_entry(ctrl); + if (ret < 0) { + test_err("failed set/clear tests in one simple entry"); + goto out; + } + + ret = test_case_cross_entries(ctrl); + if (ret < 0) { + test_err("failed set/clear tests across entry boundaries"); + goto out; + } +out: + free_dummy_ctrl(ctrl); + return ret; +} diff --git a/fs/btrfs/write-intent.c b/fs/btrfs/write-intent.c index 8a69bf39a994..b4a205cb0c88 100644 --- a/fs/btrfs/write-intent.c +++ b/fs/btrfs/write-intent.c @@ -216,16 +216,6 @@ static int write_intent_init(struct btrfs_fs_info *fs_info) return 0; } -static struct write_intent_entry *write_intent_entry_nr( - struct write_intent_ctrl *ctrl, int nr) -{ - - ASSERT(nr < WRITE_INTENT_INTERNAL_BITMAPS_MAX_ENTRIES); - return (page_address(ctrl->page) + - sizeof(struct write_intent_super) + - nr * sizeof(struct write_intent_entry)); -} - /* * Return <0 if the bytenr is before the given entry. * Return 0 if the bytenr is inside the given entry. diff --git a/fs/btrfs/write-intent.h b/fs/btrfs/write-intent.h index 0da1b7421590..00eb116e4c38 100644 --- a/fs/btrfs/write-intent.h +++ b/fs/btrfs/write-intent.h @@ -197,6 +197,8 @@ WRITE_INTENT_SETGET_FUNCS(super_blocksize, struct write_intent_super, WRITE_INTENT_SETGET_FUNCS(super_csum_type, struct write_intent_super, csum_type, 16); WRITE_INTENT_SETGET_FUNCS(entry_bytenr, struct write_intent_entry, bytenr, 64); +WRITE_INTENT_SETGET_FUNCS(entry_raw_bitmap, struct write_intent_entry, + bitmap, 64); static inline u32 write_intent_entry_size(struct write_intent_ctrl *ctrl) { @@ -205,6 +207,16 @@ static inline u32 write_intent_entry_size(struct write_intent_ctrl *ctrl) return wi_super_blocksize(wis) * WRITE_INTENT_BITS_PER_ENTRY; } +static inline struct write_intent_entry *write_intent_entry_nr( + struct write_intent_ctrl *ctrl, int nr) +{ + + ASSERT(nr < WRITE_INTENT_INTERNAL_BITMAPS_MAX_ENTRIES); + return (page_address(ctrl->page) + + sizeof(struct write_intent_super) + + nr * sizeof(struct write_intent_entry)); +} + static inline void wie_get_bitmap(struct write_intent_entry *entry, unsigned long *bitmap) { From patchwork Thu Jul 7 05:32:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909068 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96E71C433EF for ; Thu, 7 Jul 2022 05:33:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234969AbiGGFdL (ORCPT ); Thu, 7 Jul 2022 01:33:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53760 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233366AbiGGFdJ (ORCPT ); Thu, 7 Jul 2022 01:33:09 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15FE931345 for ; Wed, 6 Jul 2022 22:33:08 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id C709221F08 for ; Thu, 7 Jul 2022 05:33:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171986; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GzjKHXwZrxhu5vz/p/18q8oKYd23ypFK+Gn2fOAhUxI=; b=ZScvjZoJWpo/4F0T8v8+WxD7969tsOD8gXPtCTZFmsrMUBTZtCSbx0nXKGqz+Ebyfoos9x K0ZS4unisV5FdVEGviSXrxqWGGDnIHK8+IaetJdSPBiOZb9vhNBzBKaswj3fEMib330S7E pyZXoc3mHUUT6XCwiHBK30XQholGvQM= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2AB0313488 for ; Thu, 7 Jul 2022 05:33:05 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id eALEORFwxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:33:05 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 09/12] btrfs: write back write intent bitmap after barrier_all_devices() Date: Thu, 7 Jul 2022 13:32:34 +0800 Message-Id: <3cda79498dbdb3218a06dab469d8548b63b9694e.1657171615.git.wqu@suse.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The new function, btrfs_write_intent_writeback(), will also accept a u64 parameter, @event, to do extra fastpath check to avoid unnecessary writeback. For now we just pass 0 to write the current bitmap to disk. Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 9 +++ fs/btrfs/tests/write-intent-bitmaps-tests.c | 2 + fs/btrfs/write-intent.c | 79 ++++++++++++++++++++- fs/btrfs/write-intent.h | 25 +++++++ 4 files changed, 114 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index edbb21706bda..bd30decd38e4 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4357,6 +4357,15 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) } } + ret = btrfs_write_intent_writeback(fs_info, 0); + if (ret < 0) { + mutex_unlock( + &fs_info->fs_devices->device_list_mutex); + btrfs_handle_fs_error(fs_info, ret, + "errors while writing back write intent bitmaps."); + return ret; + } + list_for_each_entry(dev, head, dev_list) { if (!dev->bdev) { total_errors++; diff --git a/fs/btrfs/tests/write-intent-bitmaps-tests.c b/fs/btrfs/tests/write-intent-bitmaps-tests.c index c956acaea1c7..1b81720f21d1 100644 --- a/fs/btrfs/tests/write-intent-bitmaps-tests.c +++ b/fs/btrfs/tests/write-intent-bitmaps-tests.c @@ -25,6 +25,8 @@ static struct write_intent_ctrl *alloc_dummy_ctrl(void) ctrl->blocksize = BTRFS_STRIPE_LEN; atomic64_set(&ctrl->event, 1); spin_lock_init(&ctrl->lock); + init_waitqueue_head(&ctrl->overflow_wait); + init_waitqueue_head(&ctrl->write_wait); memzero_page(ctrl->page, 0, WRITE_INTENT_BITMAPS_SIZE); wis = page_address(ctrl->page); wi_set_super_magic(wis, WRITE_INTENT_SUPER_MAGIC); diff --git a/fs/btrfs/write-intent.c b/fs/btrfs/write-intent.c index b4a205cb0c88..5e26813a06e0 100644 --- a/fs/btrfs/write-intent.c +++ b/fs/btrfs/write-intent.c @@ -76,8 +76,17 @@ static int write_intent_writeback(struct btrfs_fs_info *fs_info) shash->tfm = fs_info->csum_shash; spin_lock(&ctrl->lock); - wis = page_address(ctrl->page); + /* No update on the bitmap, just skip this writeback. */ + if (!memcmp(page_address(ctrl->page), page_address(ctrl->commit_page), + WRITE_INTENT_BITMAPS_SIZE)) { + ctrl->writing_event = 0; + spin_unlock(&ctrl->lock); + wake_up(&ctrl->write_wait); + return 0; + } + + wis = page_address(ctrl->page); /* * Bump up the event counter each time this bitmap is going to be * written. @@ -127,6 +136,13 @@ static int write_intent_writeback(struct btrfs_fs_info *fs_info) } wait_event(wb_ctrl.wait, atomic_read(&wb_ctrl.pending_bios) == 0); + spin_lock(&ctrl->lock); + if (ctrl->writing_event > ctrl->committed_event) + ctrl->committed_event = ctrl->writing_event; + ctrl->writing_event = 0; + spin_unlock(&ctrl->lock); + wake_up(&ctrl->write_wait); + if (total_errors + atomic_read(&wb_ctrl.errors) > btrfs_super_num_devices(fs_info->super_copy) - 1) { btrfs_err(fs_info, "failed to writeback write-intent bitmaps"); @@ -624,6 +640,63 @@ void write_intent_clear_bits(struct write_intent_ctrl *ctrl, u64 bytenr, WARN_ON_ONCE(1); } +int btrfs_write_intent_writeback(struct btrfs_fs_info *fs_info, u64 event) +{ + struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; + + if (!btrfs_fs_compat_ro(fs_info, WRITE_INTENT_BITMAP)) + return 0; + + ASSERT(ctrl); + +again: + spin_lock(&ctrl->lock); + + /* + * The bitmap has already been written to disk at least once. Our update + * has already reached disk. + */ + if (event && ctrl->committed_event > event) { + spin_unlock(&ctrl->lock); + return 0; + } + + /* Previous writing hasn't finished, wait for it and retry. */ + if (ctrl->writing_event && ctrl->writing_event <= event) { + DEFINE_WAIT(__wait); + + prepare_to_wait_event(&ctrl->write_wait, &__wait, + TASK_UNINTERRUPTIBLE); + spin_unlock(&ctrl->lock); + schedule(); + finish_wait(&ctrl->write_wait, &__wait); + goto again; + } + + /* Someone is already writing back the bitmap. */ + if (ctrl->writing_event) { + DEFINE_WAIT(__wait); + + ASSERT(ctrl->writing_event > event); + prepare_to_wait_event(&ctrl->write_wait, &__wait, + TASK_UNINTERRUPTIBLE); + spin_unlock(&ctrl->lock); + schedule(); + finish_wait(&ctrl->write_wait, &__wait); + return 0; + } + + /* + * We're the one to write back the bitmap, update @writing_event so + * all the other caller will just wait for us. + */ + ctrl->writing_event = atomic64_read(&ctrl->event) + 1; + spin_unlock(&ctrl->lock); + + /* Slow path, do the submission and wait. */ + return write_intent_writeback(fs_info); +} + int btrfs_write_intent_init(struct btrfs_fs_info *fs_info) { struct btrfs_device *highest_dev = NULL; @@ -648,6 +721,8 @@ int btrfs_write_intent_init(struct btrfs_fs_info *fs_info) } spin_lock_init(&fs_info->wi_ctrl->lock); + init_waitqueue_head(&fs_info->wi_ctrl->overflow_wait); + init_waitqueue_head(&fs_info->wi_ctrl->write_wait); /* * Go through every writeable device to find the highest event. @@ -688,6 +763,8 @@ int btrfs_write_intent_init(struct btrfs_fs_info *fs_info) dev->devid, ret); goto cleanup; } + memcpy_page(fs_info->wi_ctrl->commit_page, 0, + fs_info->wi_ctrl->page, 0, WRITE_INTENT_BITMAPS_SIZE); wis = page_address(fs_info->wi_ctrl->page); atomic64_set(&fs_info->wi_ctrl->event, wi_super_events(wis)); fs_info->wi_ctrl->blocksize = wi_super_blocksize(wis); diff --git a/fs/btrfs/write-intent.h b/fs/btrfs/write-intent.h index 00eb116e4c38..bf84737e0706 100644 --- a/fs/btrfs/write-intent.h +++ b/fs/btrfs/write-intent.h @@ -123,12 +123,30 @@ struct write_intent_ctrl { /* A copy for writeback. */ struct page *commit_page; + /* + * For callers who has updated their bitmap, wait for the bitmap to be + * flushed to disk. + */ + wait_queue_head_t write_wait; + + /* + * For callers whose bits can not be updated as no enough space left + * int the bitmap. + */ + wait_queue_head_t overflow_wait; + /* Cached event counter.*/ atomic64_t event; /* Lock for reading/writing above @page. */ spinlock_t lock; + /* Event number for the bitmap being written. */ + u64 writing_event; + + /* Event number for the last committed bitmap. */ + u64 committed_event; + /* Cached blocksize from write intent super. */ u32 blocksize; }; @@ -250,4 +268,11 @@ void write_intent_clear_bits(struct write_intent_ctrl *ctrl, u64 bytenr, int btrfs_write_intent_init(struct btrfs_fs_info *fs_info); void btrfs_write_intent_free(struct btrfs_fs_info *fs_info); +/* + * Ensure the bitmap for @event is already written to disk. + * + * If @event is 0, it means to write current bitmap to disk. + */ +int btrfs_write_intent_writeback(struct btrfs_fs_info *fs_info, u64 event); + #endif From patchwork Thu Jul 7 05:32:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909069 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3DED1C43334 for ; Thu, 7 Jul 2022 05:33:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234765AbiGGFdM (ORCPT ); Thu, 7 Jul 2022 01:33:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53780 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234905AbiGGFdK (ORCPT ); Thu, 7 Jul 2022 01:33:10 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 802FD3134E for ; Wed, 6 Jul 2022 22:33:09 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 3F11421D94 for ; Thu, 7 Jul 2022 05:33:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171988; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NXgDBrF40f2q77S+R2o6IqWa1GIhL4ttGgInpF3+0nM=; b=hvyPwmRx+Iu8fb0Sh9te/omaMszSlhq/VDLbWSXc1JCfBUcNkngav1xeRotERvtlXmEtVL Dlq0zILlMhxJyrGQYWXKRrvgEUnEfi0/3Ha6FTJBaaugN6Ph+DeADegmTjbw31R2wTWLCV IS23GMsZz6XYofXPfSzdqCDKBsTzq8I= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 3347D13488 for ; Thu, 7 Jul 2022 05:33:07 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 4EolABNwxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:33:07 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 10/12] btrfs: update and writeback the write-intent bitmap for RAID56 write. Date: Thu, 7 Jul 2022 13:32:35 +0800 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This allows us to mark the write-intent bitmaps dirty for later recovery usage. For now, we only mark the bitmaps dirty but without really clearing them, this is going to cause problems (hang the fs if the bitmap is full), but this also allows us to debug the bitmap with the new dump-write-intent tool: csum_type 0 (crc32c) csum 0xad622029 [match] magic _wIbSb_Q [match] fsid 46bcd711-6c9b-400f-aaba-bf99aa3dd321 flags 0x7 ( TARGET_RAID56 | INTERNAL | BYTENR_LOGICAL ) events 10 size 4096 blocksize 65536 nr_entries 1 entry 0, bytenr 385875968, bitmap 0x000000000003fffc This is doing 1MiB write for logical 388104192, which matches the above output bitmap. Signed-off-by: Qu Wenruo --- fs/btrfs/raid56.c | 8 +++++++ fs/btrfs/write-intent.c | 46 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/write-intent.h | 9 ++++++++ 3 files changed, 63 insertions(+) diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 7b2d2b6c8c61..9f18d6f6f4dc 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -1185,6 +1185,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio) bool has_qstripe; struct bio_list bio_list; struct bio *bio; + u64 event; int ret; bio_list_init(&bio_list); @@ -1338,6 +1339,13 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio) atomic_set(&rbio->stripes_pending, bio_list_size(&bio_list)); BUG_ON(atomic_read(&rbio->stripes_pending) == 0); + /* Update the write intent bitmap before we start submitting bios. */ + btrfs_write_intent_mark_dirty(bioc->fs_info, rbio->bioc->raid_map[0], + rbio->nr_data * BTRFS_STRIPE_LEN, &event); + ret = btrfs_write_intent_writeback(bioc->fs_info, event); + + if (ret < 0) + goto cleanup; while ((bio = bio_list_pop(&bio_list))) { bio->bi_end_io = raid_write_end_io; diff --git a/fs/btrfs/write-intent.c b/fs/btrfs/write-intent.c index 5e26813a06e0..8520105d0d84 100644 --- a/fs/btrfs/write-intent.c +++ b/fs/btrfs/write-intent.c @@ -381,6 +381,9 @@ static void delete_one_entry(struct write_intent_ctrl *ctrl, int nr) memset(write_intent_entry_nr(ctrl, cur_nr_entries - 1), 0, sizeof(struct write_intent_entry)); wi_set_super_nr_entries(wis, cur_nr_entries - 1); + + /* We freed one entry, wake up who are waiting for the extra space. */ + wake_up(&ctrl->overflow_wait); } /* @@ -640,6 +643,49 @@ void write_intent_clear_bits(struct write_intent_ctrl *ctrl, u64 bytenr, WARN_ON_ONCE(1); } +void btrfs_write_intent_mark_dirty(struct btrfs_fs_info *fs_info, u64 logical, + u32 len, u64 *event_ret) +{ + struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; + struct write_intent_super *wis; + u32 entry_len; + int nr_entries; + + if (!btrfs_fs_compat_ro(fs_info, WRITE_INTENT_BITMAP)) + return; + + ASSERT(ctrl); + ASSERT(IS_ALIGNED(len, BTRFS_STRIPE_LEN)); + +again: + spin_lock(&ctrl->lock); + entry_len = ctrl->blocksize * WRITE_INTENT_BITS_PER_ENTRY; + nr_entries = (round_up(logical + len, entry_len) - + round_down(logical, entry_len)) / entry_len; + wis = page_address(ctrl->page); + + /* + * May not have enough space left. This calculation is definitely + * overkilled, but will ensure we have enough space for it. + */ + if (unlikely(wi_super_nr_entries(wis) + nr_entries) >= + WRITE_INTENT_INTERNAL_BITMAPS_MAX_ENTRIES) { + DEFINE_WAIT(__wait); + + prepare_to_wait_event(&ctrl->overflow_wait, &__wait, + TASK_UNINTERRUPTIBLE); + spin_unlock(&ctrl->lock); + schedule(); + finish_wait(&ctrl->write_wait, &__wait); + goto again; + } + + /* Update the bitmap. */ + write_intent_set_bits(ctrl, logical, len); + *event_ret = atomic64_read(&ctrl->event); + spin_unlock(&ctrl->lock); +} + int btrfs_write_intent_writeback(struct btrfs_fs_info *fs_info, u64 event) { struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; diff --git a/fs/btrfs/write-intent.h b/fs/btrfs/write-intent.h index bf84737e0706..e1e3a16f8929 100644 --- a/fs/btrfs/write-intent.h +++ b/fs/btrfs/write-intent.h @@ -275,4 +275,13 @@ void btrfs_write_intent_free(struct btrfs_fs_info *fs_info); */ int btrfs_write_intent_writeback(struct btrfs_fs_info *fs_info, u64 event); +/* + * Mark the range dirty in write intent bitmaps. + * + * May (but unlikely) sleep if there is not enough free entries. + * In that case, we will wait for enough free entries to be released. + */ +void btrfs_write_intent_mark_dirty(struct btrfs_fs_info *fs_info, u64 logical, + u32 len, u64 *event_ret); + #endif From patchwork Thu Jul 7 05:32:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909070 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D784C433EF for ; Thu, 7 Jul 2022 05:33:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234905AbiGGFdN (ORCPT ); Thu, 7 Jul 2022 01:33:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234970AbiGGFdL (ORCPT ); Thu, 7 Jul 2022 01:33:11 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9078B31352 for ; Wed, 6 Jul 2022 22:33:10 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 4720321EDA for ; Thu, 7 Jul 2022 05:33:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171989; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=id6jUpPgXB9PILuZBSrwDXYcOOLKyYJYqiGXOaoKYXQ=; b=P1C1wM8nTALQJqW7TdC+YVTosRPhCndBFPWDSAET2fuV/BJZJPOniM1ETs0ROdCd02N+S/ uqKz8REoSgZ2flBUzpONQmVnAooVcDUBBW8B+ZvEFvOuSwJh7KKRDlSWEl6BHAbN72fkK0 /u0zZrBrPI1p4j3GN+FRzLzzR7WRHfk= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 9EF3D13488 for ; Thu, 7 Jul 2022 05:33:08 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id EN5+GhRwxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:33:08 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 11/12] btrfs: raid56: clear write-intent bimaps when a full stripe finishes. Date: Thu, 7 Jul 2022 13:32:36 +0800 Message-Id: <8f1942505029d94a66310dfd3b189c11f7044e94.1657171615.git.wqu@suse.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Signed-off-by: Qu Wenruo --- fs/btrfs/raid56.c | 7 +++++++ fs/btrfs/write-intent.c | 45 +++++++++++++++++++++++++++++------------ fs/btrfs/write-intent.h | 8 ++++++++ 3 files changed, 47 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 9f18d6f6f4dc..f2b0ea2315c5 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -817,6 +817,13 @@ static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, blk_status_t err) if (rbio->generic_bio_cnt) btrfs_bio_counter_sub(rbio->bioc->fs_info, rbio->generic_bio_cnt); + + /* Clear the write-intent bitmap range for write operation. */ + if (rbio->operation == BTRFS_RBIO_WRITE) + btrfs_write_intent_clear_dirty(rbio->bioc->fs_info, + rbio->bioc->raid_map[0], + rbio->nr_data * BTRFS_STRIPE_LEN); + /* * Clear the data bitmap, as the rbio may be cached for later usage. * do this before before unlock_stripe() so there will be no new bio diff --git a/fs/btrfs/write-intent.c b/fs/btrfs/write-intent.c index 8520105d0d84..40d579574f3d 100644 --- a/fs/btrfs/write-intent.c +++ b/fs/btrfs/write-intent.c @@ -61,6 +61,7 @@ static int write_intent_writeback(struct btrfs_fs_info *fs_info) struct bitmap_writeback_contrl wb_ctrl = {0}; SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); const int nr_devs_max = fs_info->fs_devices->open_devices + 4; + unsigned long flags; int nr_devs = 0; int total_errors = 0; int ret; @@ -75,13 +76,13 @@ static int write_intent_writeback(struct btrfs_fs_info *fs_info) shash->tfm = fs_info->csum_shash; - spin_lock(&ctrl->lock); + spin_lock_irqsave(&ctrl->lock, flags); /* No update on the bitmap, just skip this writeback. */ if (!memcmp(page_address(ctrl->page), page_address(ctrl->commit_page), WRITE_INTENT_BITMAPS_SIZE)) { ctrl->writing_event = 0; - spin_unlock(&ctrl->lock); + spin_unlock_irqrestore(&ctrl->lock, flags); wake_up(&ctrl->write_wait); return 0; } @@ -98,7 +99,7 @@ static int write_intent_writeback(struct btrfs_fs_info *fs_info) atomic64_inc(&ctrl->event); memcpy_page(ctrl->commit_page, 0, ctrl->page, 0, WRITE_INTENT_BITMAPS_SIZE); - spin_unlock(&ctrl->lock); + spin_unlock_irqrestore(&ctrl->lock, flags); init_waitqueue_head(&wb_ctrl.wait); atomic_set(&wb_ctrl.pending_bios, 0); @@ -136,11 +137,11 @@ static int write_intent_writeback(struct btrfs_fs_info *fs_info) } wait_event(wb_ctrl.wait, atomic_read(&wb_ctrl.pending_bios) == 0); - spin_lock(&ctrl->lock); + spin_lock_irqsave(&ctrl->lock, flags); if (ctrl->writing_event > ctrl->committed_event) ctrl->committed_event = ctrl->writing_event; ctrl->writing_event = 0; - spin_unlock(&ctrl->lock); + spin_unlock_irqrestore(&ctrl->lock, flags); wake_up(&ctrl->write_wait); if (total_errors + atomic_read(&wb_ctrl.errors) > @@ -648,6 +649,7 @@ void btrfs_write_intent_mark_dirty(struct btrfs_fs_info *fs_info, u64 logical, { struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; struct write_intent_super *wis; + unsigned long flags; u32 entry_len; int nr_entries; @@ -658,7 +660,7 @@ void btrfs_write_intent_mark_dirty(struct btrfs_fs_info *fs_info, u64 logical, ASSERT(IS_ALIGNED(len, BTRFS_STRIPE_LEN)); again: - spin_lock(&ctrl->lock); + spin_lock_irqsave(&ctrl->lock, flags); entry_len = ctrl->blocksize * WRITE_INTENT_BITS_PER_ENTRY; nr_entries = (round_up(logical + len, entry_len) - round_down(logical, entry_len)) / entry_len; @@ -674,7 +676,7 @@ void btrfs_write_intent_mark_dirty(struct btrfs_fs_info *fs_info, u64 logical, prepare_to_wait_event(&ctrl->overflow_wait, &__wait, TASK_UNINTERRUPTIBLE); - spin_unlock(&ctrl->lock); + spin_unlock_irqrestore(&ctrl->lock, flags); schedule(); finish_wait(&ctrl->write_wait, &__wait); goto again; @@ -683,12 +685,29 @@ void btrfs_write_intent_mark_dirty(struct btrfs_fs_info *fs_info, u64 logical, /* Update the bitmap. */ write_intent_set_bits(ctrl, logical, len); *event_ret = atomic64_read(&ctrl->event); - spin_unlock(&ctrl->lock); + spin_unlock_irqrestore(&ctrl->lock, flags); +} + +void btrfs_write_intent_clear_dirty(struct btrfs_fs_info *fs_info, u64 logical, + u32 len) +{ + struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; + unsigned long flags; + + if (!btrfs_fs_compat_ro(fs_info, WRITE_INTENT_BITMAP)) + return; + ASSERT(ctrl); + ASSERT(IS_ALIGNED(len, BTRFS_STRIPE_LEN)); + + spin_lock_irqsave(&ctrl->lock, flags); + write_intent_clear_bits(ctrl, logical, len); + spin_unlock_irqrestore(&ctrl->lock, flags); } int btrfs_write_intent_writeback(struct btrfs_fs_info *fs_info, u64 event) { struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; + unsigned long flags; if (!btrfs_fs_compat_ro(fs_info, WRITE_INTENT_BITMAP)) return 0; @@ -696,14 +715,14 @@ int btrfs_write_intent_writeback(struct btrfs_fs_info *fs_info, u64 event) ASSERT(ctrl); again: - spin_lock(&ctrl->lock); + spin_lock_irqsave(&ctrl->lock, flags); /* * The bitmap has already been written to disk at least once. Our update * has already reached disk. */ if (event && ctrl->committed_event > event) { - spin_unlock(&ctrl->lock); + spin_unlock_irqrestore(&ctrl->lock, flags); return 0; } @@ -713,7 +732,7 @@ int btrfs_write_intent_writeback(struct btrfs_fs_info *fs_info, u64 event) prepare_to_wait_event(&ctrl->write_wait, &__wait, TASK_UNINTERRUPTIBLE); - spin_unlock(&ctrl->lock); + spin_unlock_irqrestore(&ctrl->lock, flags); schedule(); finish_wait(&ctrl->write_wait, &__wait); goto again; @@ -726,7 +745,7 @@ int btrfs_write_intent_writeback(struct btrfs_fs_info *fs_info, u64 event) ASSERT(ctrl->writing_event > event); prepare_to_wait_event(&ctrl->write_wait, &__wait, TASK_UNINTERRUPTIBLE); - spin_unlock(&ctrl->lock); + spin_unlock_irqrestore(&ctrl->lock, flags); schedule(); finish_wait(&ctrl->write_wait, &__wait); return 0; @@ -737,7 +756,7 @@ int btrfs_write_intent_writeback(struct btrfs_fs_info *fs_info, u64 event) * all the other caller will just wait for us. */ ctrl->writing_event = atomic64_read(&ctrl->event) + 1; - spin_unlock(&ctrl->lock); + spin_unlock_irqrestore(&ctrl->lock, flags); /* Slow path, do the submission and wait. */ return write_intent_writeback(fs_info); diff --git a/fs/btrfs/write-intent.h b/fs/btrfs/write-intent.h index e1e3a16f8929..872a707ef67d 100644 --- a/fs/btrfs/write-intent.h +++ b/fs/btrfs/write-intent.h @@ -284,4 +284,12 @@ int btrfs_write_intent_writeback(struct btrfs_fs_info *fs_info, u64 event); void btrfs_write_intent_mark_dirty(struct btrfs_fs_info *fs_info, u64 logical, u32 len, u64 *event_ret); +/* + * Clear the range dirty in write intent bitmaps. + * + * This function should not sleep, and no need to wait for the bitmap to be + * flushed. + */ +void btrfs_write_intent_clear_dirty(struct btrfs_fs_info *fs_info, u64 logical, + u32 len); #endif From patchwork Thu Jul 7 05:32:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12909071 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 665A4C43334 for ; Thu, 7 Jul 2022 05:33:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233947AbiGGFdO (ORCPT ); Thu, 7 Jul 2022 01:33:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53840 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234980AbiGGFdM (ORCPT ); Thu, 7 Jul 2022 01:33:12 -0400 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A3FD831353 for ; Wed, 6 Jul 2022 22:33:11 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 4E8A31F934 for ; Thu, 7 Jul 2022 05:33:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1657171990; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OrDNvaio6oSbKGc0RfBVqCuGovWAV4kyQ3LxuYtCkDA=; b=u+An6mEfuo1NRitXzN2XIhq5dIVzPBCwcyx2nx0LV2FkoFtWBCN6YF8ZtNh8BVN172orjC FUN5o2tMMh2AYLFSyLRd8PhV0zWGTZdUHijLVaeU94rtC46GVeQahK3dFhYNAUBJq4WLao MHYpQHz6GUhwAqrsX5yw8/2dhHEzWRU= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id A66C813488 for ; Thu, 7 Jul 2022 05:33:09 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id CN95HBVwxmKcLAAAMHmgww (envelope-from ) for ; Thu, 07 Jul 2022 05:33:09 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 12/12] btrfs: warn and clear bitmaps if there is dirty bitmap at mount time Date: Thu, 7 Jul 2022 13:32:37 +0800 Message-Id: <8c94d4c4ec1a15896e4bbcd0e2f79d16934b7c81.1657171616.git.wqu@suse.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org To properly cleanup the bitmaps, we need to scrub the logical ranges in the bitmaps. Unfortunately there is no such convient interface at all (scrub only works at device offset for now). So just introduce a place holder to warn and clear the bitmap. Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 7 +++++++ fs/btrfs/write-intent.c | 35 +++++++++++++++++++++++++++++++++++ fs/btrfs/write-intent.h | 8 ++++++++ 3 files changed, 50 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index bd30decd38e4..ccc136023303 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3705,6 +3705,13 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device ret); goto fail_block_groups; } + + ret = btrfs_write_intent_recover(fs_info); + if (ret < 0) { + btrfs_err(fs_info, "failed to recover write-intent bitmap: %d", + ret); + goto fail_block_groups; + } ret = btrfs_recover_balance(fs_info); if (ret) { btrfs_err(fs_info, "failed to recover balance: %d", ret); diff --git a/fs/btrfs/write-intent.c b/fs/btrfs/write-intent.c index 40d579574f3d..82228713e621 100644 --- a/fs/btrfs/write-intent.c +++ b/fs/btrfs/write-intent.c @@ -704,6 +704,41 @@ void btrfs_write_intent_clear_dirty(struct btrfs_fs_info *fs_info, u64 logical, spin_unlock_irqrestore(&ctrl->lock, flags); } +int btrfs_write_intent_recover(struct btrfs_fs_info *fs_info) +{ + struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; + struct write_intent_super *wis; + int ret = 0; + + if (!btrfs_fs_compat_ro(fs_info, WRITE_INTENT_BITMAP)) + return ret; + + ASSERT(ctrl); + wis = page_address(ctrl->page); + + if (wi_super_nr_entries(wis) != 0) { + int i; + + btrfs_warn(fs_info, "dirty write intent bitmap found:"); + for (i = 0; i < wi_super_nr_entries(wis); i++) { + struct write_intent_entry *entry = + write_intent_entry_nr(ctrl, i); + + btrfs_warn(fs_info, + " entry=%u bytenr=%llu bitmap=0x%016llx\n", i, + wi_entry_bytenr(entry), + wi_entry_raw_bitmap(entry)); + } + /* For now, we just clear the whole bitmap. */ + memzero_page(ctrl->page, sizeof(struct write_intent_super), + WRITE_INTENT_BITMAPS_SIZE - + sizeof(struct write_intent_super)); + wi_set_super_nr_entries(wis, 0); + ret = write_intent_writeback(fs_info); + } + return ret; +} + int btrfs_write_intent_writeback(struct btrfs_fs_info *fs_info, u64 event) { struct write_intent_ctrl *ctrl = fs_info->wi_ctrl; diff --git a/fs/btrfs/write-intent.h b/fs/btrfs/write-intent.h index 872a707ef67d..bb6e9b599373 100644 --- a/fs/btrfs/write-intent.h +++ b/fs/btrfs/write-intent.h @@ -292,4 +292,12 @@ void btrfs_write_intent_mark_dirty(struct btrfs_fs_info *fs_info, u64 logical, */ void btrfs_write_intent_clear_dirty(struct btrfs_fs_info *fs_info, u64 logical, u32 len); + +/* + * Rebuild the range in the write-intent bitmaps. + * + * Currently not working, it will just output a warning and clear the bitmap. + */ +int btrfs_write_intent_recover(struct btrfs_fs_info *fs_info); + #endif