From patchwork Tue Oct 30 14:43:23 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10660985 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1765D14BD for ; Tue, 30 Oct 2018 14:43:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0724828F47 for ; Tue, 30 Oct 2018 14:43:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EF5F02902F; Tue, 30 Oct 2018 14:43:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EDD1B2907D for ; Tue, 30 Oct 2018 14:43:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726580AbeJ3XhS (ORCPT ); Tue, 30 Oct 2018 19:37:18 -0400 Received: from mx2.suse.de ([195.135.220.15]:42302 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726219AbeJ3XhR (ORCPT ); Tue, 30 Oct 2018 19:37:17 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 6E44EB0F4 for ; Tue, 30 Oct 2018 14:43:31 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Date: Tue, 30 Oct 2018 16:43:23 +0200 Message-Id: <1540910608-4121-2-git-send-email-nborisov@suse.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1540910608-4121-1-git-send-email-nborisov@suse.com> References: <1540910608-4121-1-git-send-email-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This field is going to be used when the user wants to change the UUID of the filesystem without having to rewrite all metadata blocks. This field adds another level of indirection such that when the FSID is changed what really happens is the current UUID (the one with which the fs was created) is copied to the 'metadata_uuid' field in the superblock as well as a new incompat flag is set METADATA_UUID. When the kernel detects this flag is set it knows that the superblock in fact has 2 UUIDs: 1. Is the UUID which is user-visible, currently known as FSID. 2. Metadata UUID - this is the UUID which is stamped into all on-disk datastructures belonging to this file system. When the new incompat flag is present device scaning checks whether both fsid/metadata_uuid of the scanned device match to any of the registed filesystems. When the flag is not set then both UUIDs are equal and only the FSID is retained on disk, metadata_uuid is set only in-memory during mount. Additionally a new metadata_uuid field is also added to the fs_info struct. It's initialised either with the FSID in case METADATA_UUID incompat flag is not set or with the metdata_uuid of the superblock otherwise. This commit introduces the new fields as well as the new incompat flag and switches all users of the fsid to the new logic. Signed-off-by: Nikolay Borisov --- fs/btrfs/ctree.c | 4 +-- fs/btrfs/ctree.h | 12 ++++--- fs/btrfs/disk-io.c | 32 ++++++++++++++---- fs/btrfs/extent-tree.c | 2 +- fs/btrfs/volumes.c | 72 ++++++++++++++++++++++++++++++++--------- fs/btrfs/volumes.h | 1 + include/uapi/linux/btrfs.h | 1 + include/uapi/linux/btrfs_tree.h | 1 + 8 files changed, 97 insertions(+), 28 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 539901fb5165..75cd41bf12f7 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -224,7 +224,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, else btrfs_set_header_owner(cow, new_root_objectid); - write_extent_buffer_fsid(cow, fs_info->fsid); + write_extent_buffer_fsid(cow, fs_info->metadata_fsid); WARN_ON(btrfs_header_generation(buf) > trans->transid); if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID) @@ -1050,7 +1050,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, else btrfs_set_header_owner(cow, root->root_key.objectid); - write_extent_buffer_fsid(cow, fs_info->fsid); + write_extent_buffer_fsid(cow, fs_info->metadata_fsid); ret = update_ref_for_cow(trans, root, buf, cow, &last_ref); if (ret) { diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 68ca41dbbef3..501ada9ec7bd 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -197,7 +197,7 @@ struct btrfs_root_backup { struct btrfs_super_block { u8 csum[BTRFS_CSUM_SIZE]; /* the first 4 fields must match struct btrfs_header */ - u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */ + u8 fsid[BTRFS_FSID_SIZE]; /* userfacing FS specific uuid */ __le64 bytenr; /* this block number */ __le64 flags; @@ -234,8 +234,10 @@ struct btrfs_super_block { __le64 cache_generation; __le64 uuid_tree_generation; + u8 metadata_uuid[BTRFS_FSID_SIZE]; /* The uuid written into btree blocks */ + /* future expansion */ - __le64 reserved[30]; + __le64 reserved[28]; u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE]; struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS]; } __attribute__ ((__packed__)); @@ -265,7 +267,8 @@ struct btrfs_super_block { BTRFS_FEATURE_INCOMPAT_RAID56 | \ BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF | \ BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \ - BTRFS_FEATURE_INCOMPAT_NO_HOLES) + BTRFS_FEATURE_INCOMPAT_NO_HOLES | \ + BTRFS_FEATURE_INCOMPAT_METADATA_UUID) #define BTRFS_FEATURE_INCOMPAT_SAFE_SET \ (BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF) @@ -746,7 +749,8 @@ struct btrfs_delayed_root; #define BTRFS_FS_BALANCE_RUNNING 18 struct btrfs_fs_info { - u8 fsid[BTRFS_FSID_SIZE]; + u8 fsid[BTRFS_FSID_SIZE]; /* User-visible fs UUID */ + u8 metadata_fsid[BTRFS_FSID_SIZE]; /* UUID written to btree blocks */ u8 chunk_tree_uuid[BTRFS_UUID_SIZE]; unsigned long flags; struct btrfs_root *extent_root; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index b0ab41da91d1..b76b18388b93 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -551,7 +551,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page) if (WARN_ON(!PageUptodate(page))) return -EUCLEAN; - ASSERT(memcmp_extent_buffer(eb, fs_info->fsid, + ASSERT(memcmp_extent_buffer(eb, fs_info->metadata_fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0); return csum_tree_block(fs_info, eb, 0); @@ -566,7 +566,19 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info, read_extent_buffer(eb, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE); while (fs_devices) { - if (!memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE)) { + u8 *metadata_uuid; + /* + * Checking the incompat flag is only valid for the current + * fs. For seed devices it's forbidden to have their uuid + * changed so reading ->fsid in this case is fine + */ + if (fs_devices == fs_info->fs_devices && + btrfs_fs_incompat(fs_info, METADATA_UUID)) + metadata_uuid = fs_devices->metadata_uuid; + else + metadata_uuid = fs_devices->fsid; + + if (!memcmp(fsid, metadata_uuid, BTRFS_FSID_SIZE)) { ret = 0; break; } @@ -2478,10 +2490,11 @@ static int validate_super(struct btrfs_fs_info *fs_info, ret = -EINVAL; } - if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_FSID_SIZE) != 0) { + if (memcmp(fs_info->metadata_fsid, sb->dev_item.fsid, + BTRFS_FSID_SIZE) != 0) { btrfs_err(fs_info, - "dev_item UUID does not match fsid: %pU != %pU", - fs_info->fsid, sb->dev_item.fsid); + "dev_item UUID does not match metadata fsid: %pU != %pU", + fs_info->metadata_fsid, sb->dev_item.fsid); ret = -EINVAL; } @@ -2822,6 +2835,12 @@ int open_ctree(struct super_block *sb, brelse(bh); memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE); + if (btrfs_fs_incompat(fs_info, METADATA_UUID)) { + memcpy(fs_info->metadata_fsid, + fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE); + } else { + memcpy(fs_info->metadata_fsid, fs_info->fsid, BTRFS_FSID_SIZE); + } ret = btrfs_validate_mount_super(fs_info); if (ret) { @@ -3760,7 +3779,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) btrfs_set_stack_device_io_width(dev_item, dev->io_width); btrfs_set_stack_device_sector_size(dev_item, dev->sector_size); memcpy(dev_item->uuid, dev->uuid, BTRFS_UUID_SIZE); - memcpy(dev_item->fsid, dev->fs_devices->fsid, BTRFS_FSID_SIZE); + memcpy(dev_item->fsid, dev->fs_devices->metadata_uuid, + BTRFS_FSID_SIZE); flags = btrfs_super_flags(sb); btrfs_set_super_flags(sb, flags | BTRFS_HEADER_FLAG_WRITTEN); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a1febf155747..047fba06a8d8 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -8169,7 +8169,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root, btrfs_set_header_generation(buf, trans->transid); btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV); btrfs_set_header_owner(buf, owner); - write_extent_buffer_fsid(buf, fs_info->fsid); + write_extent_buffer_fsid(buf, fs_info->metadata_fsid); write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid); if (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) { buf->log_index = root->log_transid % 2; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f435d397019e..e3e12c94834f 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -238,13 +238,15 @@ struct list_head *btrfs_get_fs_uuids(void) /* * alloc_fs_devices - allocate struct btrfs_fs_devices - * @fsid: if not NULL, copy the uuid to fs_devices::fsid + * @fsid: if not NULL, copy the uuid to fs_devices::fsid + * @metadata_fsid: if not NULL, copy the uuid to fs_devices::metadata_fsid * * Return a pointer to a new struct btrfs_fs_devices on success, or ERR_PTR(). * The returned struct is not linked onto any lists and can be destroyed with * kfree() right away. */ -static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid) +static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid, + const u8 *metadata_fsid) { struct btrfs_fs_devices *fs_devs; @@ -261,6 +263,11 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid) if (fsid) memcpy(fs_devs->fsid, fsid, BTRFS_FSID_SIZE); + if (metadata_fsid) + memcpy(fs_devs->metadata_uuid, metadata_fsid, BTRFS_FSID_SIZE); + else if (fsid) + memcpy(fs_devs->metadata_uuid, fsid, BTRFS_FSID_SIZE); + return fs_devs; } @@ -368,13 +375,24 @@ static struct btrfs_device *find_device(struct btrfs_fs_devices *fs_devices, return NULL; } -static noinline struct btrfs_fs_devices *find_fsid(u8 *fsid) +static noinline struct btrfs_fs_devices * +find_fsid(const u8 *fsid, const u8 *metadata_fsid) { struct btrfs_fs_devices *fs_devices; + ASSERT(fsid); + list_for_each_entry(fs_devices, &fs_uuids, fs_list) { - if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0) - return fs_devices; + if (metadata_fsid) { + if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0 + && memcmp(metadata_fsid, fs_devices->metadata_uuid, + BTRFS_FSID_SIZE) == 0) + return fs_devices; + } else { + if (memcmp(fsid, fs_devices->fsid, + BTRFS_FSID_SIZE) == 0) + return fs_devices; + } } return NULL; } @@ -709,6 +727,12 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices, device->generation = btrfs_super_generation(disk_super); if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING) { + if (btrfs_super_incompat_flags(disk_super) & + BTRFS_FEATURE_INCOMPAT_METADATA_UUID) { + pr_err("BTRFS: Invalid seeding and uuid-changed device detected\n"); + goto error_brelse; + } + clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); fs_devices->seeding = 1; } else { @@ -759,10 +783,21 @@ static noinline struct btrfs_device *device_list_add(const char *path, struct rcu_string *name; u64 found_transid = btrfs_super_generation(disk_super); u64 devid = btrfs_stack_device_id(&disk_super->dev_item); + bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) & + BTRFS_FEATURE_INCOMPAT_METADATA_UUID); + + if (has_metadata_uuid) + fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid); + else + fs_devices = find_fsid(disk_super->fsid, NULL); - fs_devices = find_fsid(disk_super->fsid); if (!fs_devices) { - fs_devices = alloc_fs_devices(disk_super->fsid); + if (has_metadata_uuid) + fs_devices = alloc_fs_devices(disk_super->fsid, + disk_super->metadata_uuid); + else + fs_devices = alloc_fs_devices(disk_super->fsid, NULL); + if (IS_ERR(fs_devices)) return ERR_CAST(fs_devices); @@ -884,7 +919,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) struct btrfs_device *device; struct btrfs_device *orig_dev; - fs_devices = alloc_fs_devices(orig->fsid); + fs_devices = alloc_fs_devices(orig->fsid, NULL); if (IS_ERR(fs_devices)) return fs_devices; @@ -1709,7 +1744,8 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans, ptr = btrfs_device_uuid(dev_item); write_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE); ptr = btrfs_device_fsid(dev_item); - write_extent_buffer(leaf, trans->fs_info->fsid, ptr, BTRFS_FSID_SIZE); + write_extent_buffer(leaf, trans->fs_info->metadata_fsid, ptr, + BTRFS_FSID_SIZE); btrfs_mark_buffer_dirty(leaf); ret = 0; @@ -2132,7 +2168,11 @@ static struct btrfs_device *btrfs_find_device_by_path( disk_super = (struct btrfs_super_block *)bh->b_data; devid = btrfs_stack_device_id(&disk_super->dev_item); dev_uuid = disk_super->dev_item.uuid; - device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid); + if (btrfs_fs_incompat(fs_info, METADATA_UUID)) + device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->metadata_uuid); + else + device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid); + brelse(bh); if (!device) device = ERR_PTR(-ENOENT); @@ -2202,7 +2242,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info) if (!fs_devices->seeding) return -EINVAL; - seed_devices = alloc_fs_devices(NULL); + seed_devices = alloc_fs_devices(NULL, NULL); if (IS_ERR(seed_devices)) return PTR_ERR(seed_devices); @@ -2239,6 +2279,8 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info) generate_random_uuid(fs_devices->fsid); memcpy(fs_info->fsid, fs_devices->fsid, BTRFS_FSID_SIZE); + memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE); + memcpy(fs_info->metadata_fsid, fs_devices->fsid, BTRFS_FSID_SIZE); memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE); mutex_unlock(&fs_devices->device_list_mutex); @@ -6245,7 +6287,7 @@ struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid, cur_devices = fs_info->fs_devices; while (cur_devices) { if (!fsid || - !memcmp(cur_devices->fsid, fsid, BTRFS_FSID_SIZE)) { + !memcmp(cur_devices->metadata_uuid, fsid, BTRFS_FSID_SIZE)) { device = find_device(cur_devices, devid, uuid); if (device) return device; @@ -6574,12 +6616,12 @@ static struct btrfs_fs_devices *open_seed_devices(struct btrfs_fs_info *fs_info, fs_devices = fs_devices->seed; } - fs_devices = find_fsid(fsid); + fs_devices = find_fsid(fsid, NULL); if (!fs_devices) { if (!btrfs_test_opt(fs_info, DEGRADED)) return ERR_PTR(-ENOENT); - fs_devices = alloc_fs_devices(fsid); + fs_devices = alloc_fs_devices(fsid, NULL); if (IS_ERR(fs_devices)) return fs_devices; @@ -6629,7 +6671,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info, read_extent_buffer(leaf, fs_uuid, btrfs_device_fsid(dev_item), BTRFS_FSID_SIZE); - if (memcmp(fs_uuid, fs_info->fsid, BTRFS_FSID_SIZE)) { + if (memcmp(fs_uuid, fs_info->metadata_fsid, BTRFS_FSID_SIZE)) { fs_devices = open_seed_devices(fs_info, fs_uuid); if (IS_ERR(fs_devices)) return PTR_ERR(fs_devices); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index aefce895e994..04860497b33c 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -210,6 +210,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used); struct btrfs_fs_devices { u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */ + u8 metadata_uuid[BTRFS_FSID_SIZE]; struct list_head fs_list; u64 num_devices; diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index 5ca1d21fc4a7..e0763bc4158e 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -269,6 +269,7 @@ struct btrfs_ioctl_fs_info_args { #define BTRFS_FEATURE_INCOMPAT_RAID56 (1ULL << 7) #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8) #define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9) +#define BTRFS_FEATURE_INCOMPAT_METADATA_UUID (1ULL << 10) struct btrfs_ioctl_feature_flags { __u64 compat_flags; diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index aff1356c2bb8..e974f4bb5378 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -458,6 +458,7 @@ struct btrfs_free_space_header { #define BTRFS_SUPER_FLAG_METADUMP (1ULL << 33) #define BTRFS_SUPER_FLAG_METADUMP_V2 (1ULL << 34) #define BTRFS_SUPER_FLAG_CHANGING_FSID (1ULL << 35) +#define BTRFS_SUPER_FLAG_CHANGING_FSID_V2 (1ULL << 36) /* From patchwork Tue Oct 30 14:43:24 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10660987 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 11B0B15E9 for ; Tue, 30 Oct 2018 14:43:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 01FC428F47 for ; Tue, 30 Oct 2018 14:43:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EA75A292D1; Tue, 30 Oct 2018 14:43:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 407FF28F47 for ; Tue, 30 Oct 2018 14:43:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726699AbeJ3XhT (ORCPT ); Tue, 30 Oct 2018 19:37:19 -0400 Received: from mx2.suse.de ([195.135.220.15]:42336 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725853AbeJ3XhT (ORCPT ); Tue, 30 Oct 2018 19:37:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id B74BCB0F6 for ; Tue, 30 Oct 2018 14:43:32 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info Date: Tue, 30 Oct 2018 16:43:24 +0200 Message-Id: <1540910608-4121-3-git-send-email-nborisov@suse.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1540910608-4121-1-git-send-email-nborisov@suse.com> References: <1540910608-4121-1-git-send-email-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently btrfs_fs_info structure contains a copy of the fsid/metadata_uuid fields. Same values are also contained in the btrfs_fs_devices structure which fs_info has a reference to. Let's reduce duplication by removing the fields from fs_info and always refer to the ones in fs_devices. No functional changes. Signed-off-by: Nikolay Borisov --- fs/btrfs/check-integrity.c | 2 +- fs/btrfs/ctree.c | 5 +++-- fs/btrfs/ctree.h | 2 -- fs/btrfs/disk-io.c | 21 ++++++++++++--------- fs/btrfs/extent-tree.c | 2 +- fs/btrfs/ioctl.c | 2 +- fs/btrfs/super.c | 2 +- fs/btrfs/volumes.c | 10 ++++------ include/trace/events/btrfs.h | 2 +- 9 files changed, 24 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index 2e43fba44035..781cae168d2a 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -1720,7 +1720,7 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state, num_pages = state->metablock_size >> PAGE_SHIFT; h = (struct btrfs_header *)datav[0]; - if (memcmp(h->fsid, fs_info->fsid, BTRFS_FSID_SIZE)) + if (memcmp(h->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE)) return 1; for (i = 0; i < num_pages; i++) { diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 75cd41bf12f7..61f14a9836a1 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -12,6 +12,7 @@ #include "transaction.h" #include "print-tree.h" #include "locking.h" +#include "volumes.h" static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, int level); @@ -224,7 +225,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, else btrfs_set_header_owner(cow, new_root_objectid); - write_extent_buffer_fsid(cow, fs_info->metadata_fsid); + write_extent_buffer_fsid(cow, fs_info->fs_devices->metadata_uuid); WARN_ON(btrfs_header_generation(buf) > trans->transid); if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID) @@ -1050,7 +1051,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, else btrfs_set_header_owner(cow, root->root_key.objectid); - write_extent_buffer_fsid(cow, fs_info->metadata_fsid); + write_extent_buffer_fsid(cow, fs_info->fs_devices->metadata_uuid); ret = update_ref_for_cow(trans, root, buf, cow, &last_ref); if (ret) { diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 501ada9ec7bd..8531f0f5d672 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -749,8 +749,6 @@ struct btrfs_delayed_root; #define BTRFS_FS_BALANCE_RUNNING 18 struct btrfs_fs_info { - u8 fsid[BTRFS_FSID_SIZE]; /* User-visible fs UUID */ - u8 metadata_fsid[BTRFS_FSID_SIZE]; /* UUID written to btree blocks */ u8 chunk_tree_uuid[BTRFS_UUID_SIZE]; unsigned long flags; struct btrfs_root *extent_root; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index b76b18388b93..a458ef5b605e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -551,7 +551,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page) if (WARN_ON(!PageUptodate(page))) return -EUCLEAN; - ASSERT(memcmp_extent_buffer(eb, fs_info->metadata_fsid, + ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid, btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0); return csum_tree_block(fs_info, eb, 0); @@ -2490,11 +2490,12 @@ static int validate_super(struct btrfs_fs_info *fs_info, ret = -EINVAL; } - if (memcmp(fs_info->metadata_fsid, sb->dev_item.fsid, + if (memcmp(fs_info->fs_devices->metadata_uuid, sb->dev_item.fsid, BTRFS_FSID_SIZE) != 0) { btrfs_err(fs_info, "dev_item UUID does not match metadata fsid: %pU != %pU", - fs_info->metadata_fsid, sb->dev_item.fsid); + fs_info->fs_devices->metadata_uuid, + sb->dev_item.fsid); ret = -EINVAL; } @@ -2834,14 +2835,16 @@ int open_ctree(struct super_block *sb, sizeof(*fs_info->super_for_commit)); brelse(bh); - memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE); + ASSERT(!memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid, + BTRFS_FSID_SIZE)); + if (btrfs_fs_incompat(fs_info, METADATA_UUID)) { - memcpy(fs_info->metadata_fsid, - fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE); - } else { - memcpy(fs_info->metadata_fsid, fs_info->fsid, BTRFS_FSID_SIZE); + ASSERT(!memcmp(fs_info->fs_devices->metadata_uuid, + fs_info->super_copy->metadata_uuid, + BTRFS_FSID_SIZE)); } + ret = btrfs_validate_mount_super(fs_info); if (ret) { btrfs_err(fs_info, "superblock contains fatal errors"); @@ -2961,7 +2964,7 @@ int open_ctree(struct super_block *sb, sb->s_blocksize = sectorsize; sb->s_blocksize_bits = blksize_bits(sectorsize); - memcpy(&sb->s_uuid, fs_info->fsid, BTRFS_FSID_SIZE); + memcpy(&sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE); mutex_lock(&fs_info->chunk_mutex); ret = btrfs_read_sys_array(fs_info); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 047fba06a8d8..c98638155652 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -8169,7 +8169,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root, btrfs_set_header_generation(buf, trans->transid); btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV); btrfs_set_header_owner(buf, owner); - write_extent_buffer_fsid(buf, fs_info->metadata_fsid); + write_extent_buffer_fsid(buf, fs_info->fs_devices->metadata_uuid); write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid); if (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) { buf->log_index = root->log_transid % 2; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index a990a9045139..510635e49319 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3135,7 +3135,7 @@ static long btrfs_ioctl_fs_info(struct btrfs_fs_info *fs_info, } rcu_read_unlock(); - memcpy(&fi_args->fsid, fs_info->fsid, sizeof(fi_args->fsid)); + memcpy(&fi_args->fsid, fs_devices->fsid, sizeof(fi_args->fsid)); fi_args->nodesize = fs_info->nodesize; fi_args->sectorsize = fs_info->sectorsize; fi_args->clone_alignment = fs_info->sectorsize; diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index b362b45dd757..1163183fe6dd 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2090,7 +2090,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf) u64 total_free_data = 0; u64 total_free_meta = 0; int bits = dentry->d_sb->s_blocksize_bits; - __be32 *fsid = (__be32 *)fs_info->fsid; + __be32 *fsid = (__be32 *)fs_info->fs_devices->fsid; unsigned factor = 1; struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv; int ret; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index e3e12c94834f..bf0aa900f22c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1744,8 +1744,8 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans, ptr = btrfs_device_uuid(dev_item); write_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE); ptr = btrfs_device_fsid(dev_item); - write_extent_buffer(leaf, trans->fs_info->metadata_fsid, ptr, - BTRFS_FSID_SIZE); + write_extent_buffer(leaf, trans->fs_info->fs_devices->metadata_uuid, + ptr, BTRFS_FSID_SIZE); btrfs_mark_buffer_dirty(leaf); ret = 0; @@ -2278,9 +2278,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info) fs_devices->seed = seed_devices; generate_random_uuid(fs_devices->fsid); - memcpy(fs_info->fsid, fs_devices->fsid, BTRFS_FSID_SIZE); memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE); - memcpy(fs_info->metadata_fsid, fs_devices->fsid, BTRFS_FSID_SIZE); memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE); mutex_unlock(&fs_devices->device_list_mutex); @@ -2522,7 +2520,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path * so rename the fsid on the sysfs */ snprintf(fsid_buf, BTRFS_UUID_UNPARSED_SIZE, "%pU", - fs_info->fsid); + fs_info->fs_devices->fsid); if (kobject_rename(&fs_devices->fsid_kobj, fsid_buf)) btrfs_warn(fs_info, "sysfs: failed to create fsid for sprout"); @@ -6671,7 +6669,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info, read_extent_buffer(leaf, fs_uuid, btrfs_device_fsid(dev_item), BTRFS_FSID_SIZE); - if (memcmp(fs_uuid, fs_info->metadata_fsid, BTRFS_FSID_SIZE)) { + if (memcmp(fs_uuid, fs_devices->metadata_uuid, BTRFS_FSID_SIZE)) { fs_devices = open_seed_devices(fs_info, fs_uuid); if (IS_ERR(fs_devices)) return PTR_ERR(fs_devices); diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index 8568946f491d..4b8400f7d4fa 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -92,7 +92,7 @@ TRACE_DEFINE_ENUM(COMMIT_TRANS); #define TP_STRUCT__entry_fsid __array(u8, fsid, BTRFS_FSID_SIZE) #define TP_fast_assign_fsid(fs_info) \ - memcpy(__entry->fsid, fs_info->fsid, BTRFS_FSID_SIZE) + memcpy(__entry->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE) #define TP_STRUCT__entry_btrfs(args...) \ TP_STRUCT__entry( \ From patchwork Tue Oct 30 14:43:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10660983 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 25A5E14BD for ; Tue, 30 Oct 2018 14:43:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 15FE428F47 for ; Tue, 30 Oct 2018 14:43:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 09E2C292CC; Tue, 30 Oct 2018 14:43:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 947A328F47 for ; Tue, 30 Oct 2018 14:43:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726523AbeJ3XhR (ORCPT ); Tue, 30 Oct 2018 19:37:17 -0400 Received: from mx2.suse.de ([195.135.220.15]:42318 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725853AbeJ3XhR (ORCPT ); Tue, 30 Oct 2018 19:37:17 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 0F797B0F5 for ; Tue, 30 Oct 2018 14:43:32 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change Date: Tue, 30 Oct 2018 16:43:25 +0200 Message-Id: <1540910608-4121-4-git-send-email-nborisov@suse.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1540910608-4121-1-git-send-email-nborisov@suse.com> References: <1540910608-4121-1-git-send-email-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Even though fsid change without rewrite is a very quick operations it's still possible to experience a split brain scenario if power loss occurs at the right time. This patch handle the case where power failure occurs while the first transaction (the one setting CHANGING_FSID_V2) flag is being persisted on disk. This can cause the btrfs_fs_devices of this filesystem to be created by a device which: a) has the CHANGING_FSID_V2 flag set but its fsid value is intact b) or a device which doesn't have CHANGING_FSID_V2 flag set and its fsid value is intact This situation is trivially handled by the current find_fsid code since in both cases the devices are going to be treated like ordinary devices. Since btrfs is always mounted using the superblock of the latest device (the one with highest generation number), meaning it will have the CHANGING_FSID_V2 flag set, ensure it's being cleared on mount. On the first transaction commit following mount all disks will have it cleared. Signed-off-by: Nikolay Borisov --- fs/btrfs/disk-io.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a458ef5b605e..6498434c2e06 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2831,10 +2831,10 @@ int open_ctree(struct super_block *sb, * the whole block of INFO_SIZE */ memcpy(fs_info->super_copy, bh->b_data, sizeof(*fs_info->super_copy)); - memcpy(fs_info->super_for_commit, fs_info->super_copy, - sizeof(*fs_info->super_for_commit)); brelse(bh); + disk_super = fs_info->super_copy; + ASSERT(!memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE)); @@ -2844,6 +2844,15 @@ int open_ctree(struct super_block *sb, BTRFS_FSID_SIZE)); } + features = btrfs_super_flags(disk_super); + if (features & BTRFS_SUPER_FLAG_CHANGING_FSID_V2) { + features &= ~BTRFS_SUPER_FLAG_CHANGING_FSID_V2; + btrfs_set_super_flags(disk_super, features); + btrfs_info(fs_info, "found metadata uuid in progress flag. Clearing"); + } + + memcpy(fs_info->super_for_commit, fs_info->super_copy, + sizeof(*fs_info->super_for_commit)); ret = btrfs_validate_mount_super(fs_info); if (ret) { @@ -2852,7 +2861,6 @@ int open_ctree(struct super_block *sb, goto fail_alloc; } - disk_super = fs_info->super_copy; if (!btrfs_super_root(disk_super)) goto fail_alloc; From patchwork Tue Oct 30 14:43:26 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10660989 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0FC0E15E9 for ; Tue, 30 Oct 2018 14:43:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F359A28F47 for ; Tue, 30 Oct 2018 14:43:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E790C2907D; Tue, 30 Oct 2018 14:43:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 90BB82902F for ; Tue, 30 Oct 2018 14:43:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726720AbeJ3XhU (ORCPT ); Tue, 30 Oct 2018 19:37:20 -0400 Received: from mx2.suse.de ([195.135.220.15]:42368 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726219AbeJ3XhT (ORCPT ); Tue, 30 Oct 2018 19:37:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 924ACB0E4 for ; Tue, 30 Oct 2018 14:43:33 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices Date: Tue, 30 Oct 2018 16:43:26 +0200 Message-Id: <1540910608-4121-5-git-send-email-nborisov@suse.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1540910608-4121-1-git-send-email-nborisov@suse.com> References: <1540910608-4121-1-git-send-email-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In order to gracefully handle split-brain scenario which are very unlikely, yet possible, while performing the FSID change I'm gonna need two more pieces of information: 1. The highest generation number among all devices registered to a particular btrfs_fs_devices 2. A boolean flag whether a given btrfs_fs_devices was created by a device which had the FSID_CHANGING_V2 flag set. This is a preparatory patch and just introduces the variables as well as code which sets them, their actual use is going to happen in a later patch. Signed-off-by: Nikolay Borisov --- fs/btrfs/volumes.c | 9 ++++++++- fs/btrfs/volumes.h | 5 +++++ 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index bf0aa900f22c..f9dcbe74093c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -785,6 +785,8 @@ static noinline struct btrfs_device *device_list_add(const char *path, u64 devid = btrfs_stack_device_id(&disk_super->dev_item); bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) & BTRFS_FEATURE_INCOMPAT_METADATA_UUID); + bool fsid_change_in_progress = (btrfs_super_flags(disk_super) & + BTRFS_SUPER_FLAG_CHANGING_FSID_V2); if (has_metadata_uuid) fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid); @@ -798,6 +800,8 @@ static noinline struct btrfs_device *device_list_add(const char *path, else fs_devices = alloc_fs_devices(disk_super->fsid, NULL); + fs_devices->fsid_change = fsid_change_in_progress; + if (IS_ERR(fs_devices)) return ERR_CAST(fs_devices); @@ -904,8 +908,11 @@ static noinline struct btrfs_device *device_list_add(const char *path, * it back. We need it to pick the disk with largest generation * (as above). */ - if (!fs_devices->opened) + if (!fs_devices->opened) { device->generation = found_transid; + fs_devices->latest_generation = max(found_transid, + fs_devices->latest_generation); + } fs_devices->total_devices = btrfs_super_num_devices(disk_super); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 04860497b33c..6b2a01c55426 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -211,6 +211,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used); struct btrfs_fs_devices { u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */ u8 metadata_uuid[BTRFS_FSID_SIZE]; + bool fsid_change; struct list_head fs_list; u64 num_devices; @@ -219,6 +220,10 @@ struct btrfs_fs_devices { u64 missing_devices; u64 total_rw_bytes; u64 total_devices; + + /* Highest generation number of seen devices */ + u64 latest_generation; + struct block_device *latest_bdev; /* all of the devices in the FS, protected by a mutex From patchwork Tue Oct 30 14:43:27 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10660991 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0339314BD for ; Tue, 30 Oct 2018 14:43:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E831728F47 for ; Tue, 30 Oct 2018 14:43:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DCF962907D; Tue, 30 Oct 2018 14:43:38 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6285B28F47 for ; Tue, 30 Oct 2018 14:43:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726686AbeJ3XhT (ORCPT ); Tue, 30 Oct 2018 19:37:19 -0400 Received: from mx2.suse.de ([195.135.220.15]:42342 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726532AbeJ3XhT (ORCPT ); Tue, 30 Oct 2018 19:37:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id CF501B0F7 for ; Tue, 30 Oct 2018 14:43:32 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change Date: Tue, 30 Oct 2018 16:43:27 +0200 Message-Id: <1540910608-4121-6-git-send-email-nborisov@suse.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1540910608-4121-1-git-send-email-nborisov@suse.com> References: <1540910608-4121-1-git-send-email-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This commit continues hardening the scanning code to handle cases where power loss could have caused disks in a multi-disk filesystem to be in inconsistent state. Namely handle the situation that can occur when some of the disks in multi-disk fs have completed their fsid change i.e they have METADATA_UUID incompat flag set, have cleared the CHANGING_FSID_V2 flag and their fsid/metadata_uuid are different. At the same time the other half of the disks will have their fsid/metadata_uuid unchanged and will only have CHANGING_FSID_V2 flag. This is handled by introducing code in the scan path which: a) Handles the case when a device with CHANGING_FSID_V2 flag is scanned and as a result btrfs_fs_devices is created with matching fsid/metdata_uuid. Subsequently, when a device with completed fsid change is scanned it will detect this via the new code in find_fsid i.e that such an fs_devices exist that fsid_change flag is set to true, it's metadata_uuid/fsid match and the metadata_uuid of the scanned device matches that of the fs_devices. In this case, it's important to note that the devices which has its fsid change completed will have a higher generation number than the device with FSID_CHANGING_V2 flag set, so its superblock block will be used during mount. To prevent an assertion triggering because the sb used for mounting will have differing fsid/metadata_uuid than the ones in the fs_devices struct also add code in device_list_add which overwrites the values in fs_devices. b) Alternatively we can end up with a device that completed its fsid change be scanned first which will create the respective btrfs_fs_devices struct with differing fsid/metadata_uuid. In this case when a device with FSID_CHANGING_V2 flag set is scanned it will call the newly added find_fsid_inprogress function which will return the correct fs_devices. Signed-off-by: Nikolay Borisov --- fs/btrfs/volumes.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 74 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f9dcbe74093c..f967e995feff 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -382,6 +382,26 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid) ASSERT(fsid); + if (metadata_fsid) { + + /* + * Handle scanned device having completed its fsid change but + * belonging to a fs_devices that was created by first scanning + * a device which didn't have it's fsid/metadata_uuid changed + * at all and the CHANGING_FSID_V2 flag set. + */ + list_for_each_entry(fs_devices, &fs_uuids, fs_list) { + if (fs_devices->fsid_change && + memcmp(metadata_fsid, fs_devices->fsid, + BTRFS_FSID_SIZE) == 0 && + memcmp(fs_devices->fsid, fs_devices->metadata_uuid, + BTRFS_FSID_SIZE) == 0) { + return fs_devices; + } + } + } + + /* Handle non-split brain cases */ list_for_each_entry(fs_devices, &fs_uuids, fs_list) { if (metadata_fsid) { if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0 @@ -768,6 +788,27 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices, } /* + * Handle scanned device having its CHANGING_FSID_V2 flag set and the fs_devices + * being created with a disk that has already completed its fsid change. + */ +static struct btrfs_fs_devices *find_fsid_inprogress( + struct btrfs_super_block *disk_super) +{ + struct btrfs_fs_devices *fs_devices; + + list_for_each_entry(fs_devices, &fs_uuids, fs_list) { + if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid, + BTRFS_FSID_SIZE) != 0 && + memcmp(fs_devices->metadata_uuid, disk_super->fsid, + BTRFS_FSID_SIZE) == 0 && !fs_devices->fsid_change) { + return fs_devices; + } + } + + return NULL; +} + +/* * Add new device to list of registered devices * * Returns: @@ -779,7 +820,7 @@ static noinline struct btrfs_device *device_list_add(const char *path, bool *new_device_added) { struct btrfs_device *device; - struct btrfs_fs_devices *fs_devices; + struct btrfs_fs_devices *fs_devices = NULL; struct rcu_string *name; u64 found_transid = btrfs_super_generation(disk_super); u64 devid = btrfs_stack_device_id(&disk_super->dev_item); @@ -788,10 +829,24 @@ static noinline struct btrfs_device *device_list_add(const char *path, bool fsid_change_in_progress = (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_CHANGING_FSID_V2); - if (has_metadata_uuid) - fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid); - else + if (fsid_change_in_progress && !has_metadata_uuid) { + /* + * When we have an image which has CHANGING_FSID_V2 set it might + * belong to either a filesystem which has disks with completed + * fsid change or it might belong to fs with no uuid changes in + * effect, handle both. + */ + fs_devices = find_fsid_inprogress(disk_super); + if (!fs_devices) + fs_devices = find_fsid(disk_super->fsid, NULL); + + } else if (has_metadata_uuid) { + fs_devices = find_fsid(disk_super->fsid, + disk_super->metadata_uuid); + } else { fs_devices = find_fsid(disk_super->fsid, NULL); + } + if (!fs_devices) { if (has_metadata_uuid) @@ -813,6 +868,21 @@ static noinline struct btrfs_device *device_list_add(const char *path, mutex_lock(&fs_devices->device_list_mutex); device = find_device(fs_devices, devid, disk_super->dev_item.uuid); + + /* + * If this disk has been pulled into an fs devices created by + * a device which had the FSID_CHANGING flag then replace the + * metadata_uuid/fsid values of the fs_devices. + */ + if (has_metadata_uuid && fs_devices->fsid_change && + found_transid > fs_devices->latest_generation) { + memcpy(fs_devices->fsid, disk_super->fsid, + BTRFS_FSID_SIZE); + memcpy(fs_devices->metadata_uuid, + disk_super->metadata_uuid, BTRFS_FSID_SIZE); + + fs_devices->fsid_change = false; + } } if (!device) { From patchwork Tue Oct 30 14:43:28 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 10660993 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8990E1751 for ; Tue, 30 Oct 2018 14:43:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7ACFD28F47 for ; Tue, 30 Oct 2018 14:43:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6F9A92902F; Tue, 30 Oct 2018 14:43:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EE88A292CC for ; Tue, 30 Oct 2018 14:43:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726731AbeJ3XhW (ORCPT ); Tue, 30 Oct 2018 19:37:22 -0400 Received: from mx2.suse.de ([195.135.220.15]:42348 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726537AbeJ3XhT (ORCPT ); Tue, 30 Oct 2018 19:37:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id E7405B0F8 for ; Tue, 30 Oct 2018 14:43:32 +0000 (UTC) From: Nikolay Borisov To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov Subject: [PATCH 6/6] btrfs: Handle final split-brain possibility during fsid change Date: Tue, 30 Oct 2018 16:43:28 +0200 Message-Id: <1540910608-4121-7-git-send-email-nborisov@suse.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1540910608-4121-1-git-send-email-nborisov@suse.com> References: <1540910608-4121-1-git-send-email-nborisov@suse.com> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch lands the last case which needs to be handled by the fsid change code. Namely, this is the case where a multidisk filesystem has already undergone at least one successful fsid change i.e all disks have the METADATA_UUID incompat bit and power failure occurs as another fsid change is in progress. When such an event occurs, disks could be split in 2 groups. One of the groups will have both METADATA_UUID and CHANGING_FSID_V2 flags set coupled with old fsid/metadata_uuid pairs. The other group of disks will have only METADATA_UUID bit set and their fsid will be different than the one in disks in the first group. Here we look at the following cases: a) A disk from the first group is scanned first, so fs_devices is created with stale fsid/metdata_uuid. Then when a disk from the second group is scanned it needs to first check whether there exists such an fs_devices that has fsid_change set to true (because it was created with a disk having the CHANGING_FSID_V2 flag), the metadata_uuid and fsid of the fs_devices will be different (since it was created by a disk which already has had at least 1 successful fsid change) and finally the metadata_uuid of the fs_devices will equal that of the currently scanned disk (because metadata_uuid never really changes). When the correct fs_devices is found the information from the scanned disk will replace the current one in fs_devices since the scanned disk will have higher generation number. b) A disk from the second group is scanned so fs_devices is created as usual with differing fsid/metdata_uid. Then when a disk from the first group is scanned the code detects that it has both CHANGING_FSID_V2 and METADATA_UUID flags set and will search for fs_devices that has differing metadata_uuid/fsid and whose metadata_uuid is the same as that of the scanned device. Signed-off-by: Nikolay Borisov --- fs/btrfs/volumes.c | 65 ++++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 53 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f967e995feff..0ce2600c63b8 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -383,7 +383,6 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid) ASSERT(fsid); if (metadata_fsid) { - /* * Handle scanned device having completed its fsid change but * belonging to a fs_devices that was created by first scanning @@ -399,6 +398,21 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid) return fs_devices; } } + /* + * Handle scanned device having completed its fsid change but + * belonging to a fs_devices that was created by a device that + * has an outdated pair of fsid/metadata_uuid and + * CHANGING_FSID_V2 flag set. + */ + list_for_each_entry(fs_devices, &fs_uuids, fs_list) { + if (fs_devices->fsid_change && + memcmp(fs_devices->metadata_uuid, + fs_devices->fsid, BTRFS_FSID_SIZE) != 0 && + memcmp(metadata_fsid, fs_devices->metadata_uuid, + BTRFS_FSID_SIZE) == 0) { + return fs_devices; + } + } } /* Handle non-split brain cases */ @@ -808,6 +822,30 @@ static struct btrfs_fs_devices *find_fsid_inprogress( return NULL; } + +static struct btrfs_fs_devices *find_fsid_changed( + struct btrfs_super_block *disk_super) +{ + struct btrfs_fs_devices *fs_devices; + + /* + * Handles the case where scanned device is part of an fs that had + * multiple successful changes of FSID but curently device didn't + * observe it. Meaning our fsid will be different than theirs. + */ + list_for_each_entry(fs_devices, &fs_uuids, fs_list) { + if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid, + BTRFS_FSID_SIZE) != 0 && + memcmp(fs_devices->metadata_uuid, disk_super->metadata_uuid, + BTRFS_FSID_SIZE) == 0 && + memcmp(fs_devices->fsid, disk_super->fsid, + BTRFS_FSID_SIZE) != 0) { + return fs_devices; + } + } + + return NULL; +} /* * Add new device to list of registered devices * @@ -829,17 +867,20 @@ static noinline struct btrfs_device *device_list_add(const char *path, bool fsid_change_in_progress = (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_CHANGING_FSID_V2); - if (fsid_change_in_progress && !has_metadata_uuid) { - /* - * When we have an image which has CHANGING_FSID_V2 set it might - * belong to either a filesystem which has disks with completed - * fsid change or it might belong to fs with no uuid changes in - * effect, handle both. - */ - fs_devices = find_fsid_inprogress(disk_super); - if (!fs_devices) - fs_devices = find_fsid(disk_super->fsid, NULL); - + if (fsid_change_in_progress) { + if (!has_metadata_uuid) { + /* + * When we have an image which has CHANGING_FSID_V2 set + * it might belong to either a filesystem which has + * disks with completed fsid change or it might belong + * to fs with no uuid changes in effect, handle both. + */ + fs_devices = find_fsid_inprogress(disk_super); + if (!fs_devices) + fs_devices = find_fsid(disk_super->fsid, NULL); + } else { + fs_devices = find_fsid_changed(disk_super); + } } else if (has_metadata_uuid) { fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);