From patchwork Sun Mar 6 18:14:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goffredo Baroncelli X-Patchwork-Id: 12770881 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52A42C433F5 for ; Sun, 6 Mar 2022 18:15:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232319AbiCFSQs (ORCPT ); Sun, 6 Mar 2022 13:16:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47950 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231402AbiCFSQq (ORCPT ); Sun, 6 Mar 2022 13:16:46 -0500 Received: from smtp.tiscali.it (michael.mail.tiscali.it [213.205.33.246]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 6A5B865D19 for ; Sun, 6 Mar 2022 10:15:54 -0800 (PST) Received: from venice.bhome ([78.12.27.75]) by michael.mail.tiscali.it with id 36El2700C1dDdji016EnH3; Sun, 06 Mar 2022 18:14:47 +0000 x-auth-user: kreijack@tiscali.it From: Goffredo Baroncelli To: linux-btrfs@vger.kernel.org Cc: Zygo Blaxell , Josef Bacik , David Sterba , Sinnamohideen Shafeeq , Paul Jones , Boris Burkov , Goffredo Baroncelli Subject: [PATCH 1/5] btrfs: add flags to give an hint to the chunk allocator Date: Sun, 6 Mar 2022 19:14:39 +0100 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: Reply-To: Goffredo Baroncelli MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tiscali.it; s=smtp; t=1646590488; bh=k4fStNXRZTOhSgYE1Jym6cSH926IWoozudV13F4pQj0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To; b=tl27NA84w7V+AUpaRa44HkFCUUSo444kkJg8T8bmK13wOUBaKpc1W/BEp55TJ+L/3 YY+0qRPwfnjxUBzVnOiGwx2zJ3jZsinmBOCzjPW3mrWQszzKTtvetdtnpS5hJWBS0v eWXC1nu80gKQ+5XbYy/HL046TvvePLOlo9EUyRGg= Precedence: bulk List-ID: 
X-Mailing-List: linux-btrfs@vger.kernel.org From: Goffredo Baroncelli Add the following flags to give a hint about which chunk should be allocated on which disk: - BTRFS_DEV_ALLOCATION_HINT_DATA_PREFERRED preferred for data chunks, but metadata chunks allowed - BTRFS_DEV_ALLOCATION_HINT_METADATA_PREFERRED preferred for metadata chunks, but data chunks allowed - BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY only metadata chunks allowed - BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY only data chunks allowed Signed-off-by: Goffredo Baroncelli --- include/uapi/linux/btrfs_tree.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index b069752a8ecf..e0d842c2e616 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -389,6 +389,22 @@ struct btrfs_key { __u64 offset; } __attribute__ ((__packed__)); +/* dev_item.type */ + +/* btrfs chunk allocation hint */ +#define BTRFS_DEV_ALLOCATION_HINT_BIT_COUNT 2 +/* btrfs chunk allocation hint mask */ +#define BTRFS_DEV_ALLOCATION_HINT_MASK \ + ((1 << BTRFS_DEV_ALLOCATION_HINT_BIT_COUNT) - 1) +/* preferred data chunk, but metadata chunk allowed */ +#define BTRFS_DEV_ALLOCATION_HINT_DATA_PREFERRED (0ULL) +/* preferred metadata chunk, but data chunk allowed */ +#define BTRFS_DEV_ALLOCATION_HINT_METADATA_PREFERRED (1ULL) +/* only metadata chunks allowed */ +#define BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY (2ULL) +/* only data chunk allowed */ +#define BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY (3ULL) + struct btrfs_dev_item { /* the internal btrfs device id */ __le64 devid; From patchwork Sun Mar 6 18:14:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goffredo Baroncelli X-Patchwork-Id: 12770879 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) 
by smtp.lore.kernel.org (Postfix) with ESMTP id 2CF96C433EF for ; Sun, 6 Mar 2022 18:15:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232159AbiCFSQs (ORCPT ); Sun, 6 Mar 2022 13:16:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47946 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230445AbiCFSQq (ORCPT ); Sun, 6 Mar 2022 13:16:46 -0500 X-Greylist: delayed 63 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Sun, 06 Mar 2022 10:15:53 PST Received: from smtp.tiscali.it (michael.mail.tiscali.it [213.205.33.246]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 62BC765D0E for ; Sun, 6 Mar 2022 10:15:53 -0800 (PST) Received: from venice.bhome ([78.12.27.75]) by michael.mail.tiscali.it with id 36El2700C1dDdji016EoHV; Sun, 06 Mar 2022 18:14:48 +0000 x-auth-user: kreijack@tiscali.it From: Goffredo Baroncelli To: linux-btrfs@vger.kernel.org Cc: Zygo Blaxell , Josef Bacik , David Sterba , Sinnamohideen Shafeeq , Paul Jones , Boris Burkov , Goffredo Baroncelli Subject: [PATCH 2/5] btrfs: export the device allocation_hint property in sysfs Date: Sun, 6 Mar 2022 19:14:40 +0100 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: Reply-To: Goffredo Baroncelli MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tiscali.it; s=smtp; t=1646590488; bh=JjRHIaQBH8Q68/ZAJRphaTfGINGKVAAb0+OMlvGVml4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To; b=0CnVRQIdUNLa+/FrN75RNDTSSm2D0vfKq2ndNsBUHUcqp8B8sDspzet1zL5y3VrFp sDRN+wNn9FkBPc9wqKdTv1KMjZsbvsnLpXIJN+hrZM7bkTMJ1NxWqEZJwcFMZEb0Hu w2QsoZj9LwSjaR1+UtIr30YooeMSBEO11VOf2a0w= Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goffredo Baroncelli Export the device allocation_hint property via /sys/fs/btrfs//devinfo//allocation_hint Signed-off-by: Goffredo Baroncelli Reported-by: kernel test robot --- fs/btrfs/sysfs.c | 31 +++++++++++++++++++++++++++++++ 1 
file changed, 31 insertions(+) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 17389a42a3ab..59d92a385a96 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -1578,6 +1578,36 @@ static ssize_t btrfs_devinfo_error_stats_show(struct kobject *kobj, } BTRFS_ATTR(devid, error_stats, btrfs_devinfo_error_stats_show); + +struct allocation_hint_name_t { + const char *name; + const u64 value; +} allocation_hint_name[] = { + { "DATA_PREFERRED", BTRFS_DEV_ALLOCATION_HINT_DATA_PREFERRED }, + { "METADATA_PREFERRED", BTRFS_DEV_ALLOCATION_HINT_METADATA_PREFERRED }, + { "DATA_ONLY", BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY }, + { "METADATA_ONLY", BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY }, +}; + +static ssize_t btrfs_devinfo_allocation_hint_show(struct kobject *kobj, + struct kobj_attribute *a, char *buf) +{ + int i; + struct btrfs_device *device = container_of(kobj, struct btrfs_device, + devid_kobj); + + for (i = 0 ; i < ARRAY_SIZE(allocation_hint_name) ; i++) { + if ((device->type & BTRFS_DEV_ALLOCATION_HINT_MASK) != + allocation_hint_name[i].value) + continue; + + return scnprintf(buf, PAGE_SIZE, "%s\n", + allocation_hint_name[i].name); + } + return scnprintf(buf, PAGE_SIZE, "\n"); +} +BTRFS_ATTR(devid, allocation_hint, btrfs_devinfo_allocation_hint_show); + /* * Information about one device. 
* @@ -1591,6 +1621,7 @@ static struct attribute *devid_attrs[] = { BTRFS_ATTR_PTR(devid, replace_target), BTRFS_ATTR_PTR(devid, scrub_speed_max), BTRFS_ATTR_PTR(devid, writeable), + BTRFS_ATTR_PTR(devid, allocation_hint), NULL }; ATTRIBUTE_GROUPS(devid); From patchwork Sun Mar 6 18:14:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goffredo Baroncelli X-Patchwork-Id: 12770880 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40512C433FE for ; Sun, 6 Mar 2022 18:15:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232029AbiCFSQr (ORCPT ); Sun, 6 Mar 2022 13:16:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47944 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229818AbiCFSQq (ORCPT ); Sun, 6 Mar 2022 13:16:46 -0500 Received: from smtp.tiscali.it (michael.mail.tiscali.it [213.205.33.246]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 5B94065D0C for ; Sun, 6 Mar 2022 10:15:53 -0800 (PST) Received: from venice.bhome ([78.12.27.75]) by michael.mail.tiscali.it with id 36El2700C1dDdji016EoHv; Sun, 06 Mar 2022 18:14:48 +0000 x-auth-user: kreijack@tiscali.it From: Goffredo Baroncelli To: linux-btrfs@vger.kernel.org Cc: Zygo Blaxell , Josef Bacik , David Sterba , Sinnamohideen Shafeeq , Paul Jones , Boris Burkov , Goffredo Baroncelli Subject: [PATCH 3/5] btrfs: change the device allocation_hint property via sysfs Date: Sun, 6 Mar 2022 19:14:41 +0100 Message-Id: <7c56077a080b9ab77d1a722cb3bdde50e83895c4.1646589622.git.kreijack@inwind.it> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: Reply-To: Goffredo Baroncelli MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tiscali.it; s=smtp; 
t=1646590488; bh=QEcrKhcmRlCqQhsUfRiL4nKoOngJPihtP7ZVa+dX7+o=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To; b=VT4CkhblbbmupjVtL7Xhukl3E2p35rhUk1IFTCEiLEKoNPMUqdp1V347uZt1JOPyz coYNoZ1e8ujTF70bvFFZ3KSvLaDasAOJu4tdP9PEG6LS7EoSUJHvQ4XBxKtZPfy18D I/oshhaH4PecV0cJ71wGVN8SY4SiqKnVZg1beeU4= Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goffredo Baroncelli Allow the allocation_hint property to be changed by writing one of the hint names (e.g. METADATA_PREFERRED, matched case-insensitively) to the file /sys/fs/btrfs//devinfo//allocation_hint. A matching "allocation_hint" property is added to btrfs-progs as well. Signed-off-by: Goffredo Baroncelli --- fs/btrfs/sysfs.c | 76 +++++++++++++++++++++++++++++++++++++++++++++- fs/btrfs/volumes.c | 2 +- fs/btrfs/volumes.h | 2 ++ 3 files changed, 78 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 59d92a385a96..c6723456c0e1 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -1606,7 +1606,81 @@ static ssize_t btrfs_devinfo_allocation_hint_show(struct kobject *kobj, } return scnprintf(buf, PAGE_SIZE, "\n"); } -BTRFS_ATTR(devid, allocation_hint, btrfs_devinfo_allocation_hint_show); + +static ssize_t btrfs_devinfo_allocation_hint_store(struct kobject *kobj, + struct kobj_attribute *a, + const char *buf, size_t len) +{ + struct btrfs_fs_info *fs_info; + struct btrfs_root *root; + struct btrfs_device *device; + int ret; + struct btrfs_trans_handle *trans; + int i, l; + u64 type, prev_type; + + if (len < 1) + return -EINVAL; + + /* remove trailing newline */ + l = len; + if (buf[len-1] == '\n') + l--; + + for (i = 0 ; i < ARRAY_SIZE(allocation_hint_name) ; i++) { + if (l != strlen(allocation_hint_name[i].name)) + continue; + + if (strncasecmp(allocation_hint_name[i].name, buf, l)) + continue; + + type = allocation_hint_name[i].value; + break; + } + + if (i >= ARRAY_SIZE(allocation_hint_name)) + return -EINVAL; + + device = container_of(kobj, struct btrfs_device, devid_kobj); + fs_info = 
device->fs_info; + if (!fs_info) + return -EPERM; + + root = fs_info->chunk_root; + if (sb_rdonly(fs_info->sb)) + return -EROFS; + + /* check if a change is really needed */ + if ((device->type & BTRFS_DEV_ALLOCATION_HINT_MASK) == type) + return len; + + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + prev_type = device->type; + device->type = (device->type & ~BTRFS_DEV_ALLOCATION_HINT_MASK) | type; + + ret = btrfs_update_device(trans, device); + + if (ret < 0) { + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + goto abort; + } + + ret = btrfs_commit_transaction(trans); + if (ret < 0) + goto abort; + + return len; +abort: + device->type = prev_type; + return ret; +} +BTRFS_ATTR_RW(devid, allocation_hint, btrfs_devinfo_allocation_hint_show, + btrfs_devinfo_allocation_hint_store); + /* * Information about one device. diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 5e3e13d4940b..d4ac90f5c949 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2846,7 +2846,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path return ret; } -static noinline int btrfs_update_device(struct btrfs_trans_handle *trans, +noinline int btrfs_update_device(struct btrfs_trans_handle *trans, struct btrfs_device *device) { int ret; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index bd297f23d19e..93ac27d8097c 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -636,5 +636,7 @@ int btrfs_bg_type_to_factor(u64 flags); const char *btrfs_bg_type_to_raid_name(u64 flags); int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info); bool btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical); +int btrfs_update_device(struct btrfs_trans_handle *trans, + struct btrfs_device *device); #endif From patchwork Sun Mar 6 18:14:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goffredo Baroncelli 
X-Patchwork-Id: 12770882 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 26624C4332F for ; Sun, 6 Mar 2022 18:15:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232389AbiCFSQt (ORCPT ); Sun, 6 Mar 2022 13:16:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47954 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231410AbiCFSQr (ORCPT ); Sun, 6 Mar 2022 13:16:47 -0500 Received: from smtp.tiscali.it (michael.mail.tiscali.it [213.205.33.246]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 6537665D18 for ; Sun, 6 Mar 2022 10:15:54 -0800 (PST) Received: from venice.bhome ([78.12.27.75]) by michael.mail.tiscali.it with id 36El2700C1dDdji016EoJF; Sun, 06 Mar 2022 18:14:49 +0000 x-auth-user: kreijack@tiscali.it From: Goffredo Baroncelli To: linux-btrfs@vger.kernel.org Cc: Zygo Blaxell , Josef Bacik , David Sterba , Sinnamohideen Shafeeq , Paul Jones , Boris Burkov , Goffredo Baroncelli Subject: [PATCH 4/5] btrfs: add allocation_hint mode Date: Sun, 6 Mar 2022 19:14:42 +0100 Message-Id: <2291ba747c6c9701952fa75140684535cfe4ab3e.1646589622.git.kreijack@inwind.it> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: Reply-To: Goffredo Baroncelli MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tiscali.it; s=smtp; t=1646590489; bh=1/JEtmf2CVZjP3zFhRx9qnbnBKpvX8AsJX5om/0uWBA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To; b=WRVpWHaB1oI+Sue7xla+NaAJHKZOIG7j8vhpzzZtEZSAvTv6gCEpRCadTpbS/jjEG Onu4gSxDeX7XLAefuEmp1fEfcmrCezaf+4delX1RJEGfWRiQnCdea50ilzra9koDz8 /HlH6jrwy3s8r5bH+qAtrtkxUYWMBQJbgVnbctjw= Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goffredo Baroncelli The chunk allocation policy is modified as follow. 
Each disk may have one of the following tags:
- BTRFS_DEV_ALLOCATION_HINT_METADATA_PREFERRED
- BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY
- BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY
- BTRFS_DEV_ALLOCATION_HINT_DATA_PREFERRED (default)

During a *mixed data/metadata* chunk allocation, BTRFS works as usual.

During a *data* chunk allocation, space is searched first on the disks
tagged BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY. If the space found is not
enough (e.g. in raid5, only two disks are available), the disks tagged
BTRFS_DEV_ALLOCATION_HINT_DATA_PREFERRED are also considered. If the
space is still not enough, the disks tagged
BTRFS_DEV_ALLOCATION_HINT_METADATA_PREFERRED are considered as well. If
even then the space is not sufficient, -ENOSPC is returned. A disk
tagged BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY is never considered for a
data BG allocation.

During a *metadata* chunk allocation, the same algorithm applies with
_DATA_ and _METADATA_ swapped.

By default the disks are tagged BTRFS_DEV_ALLOCATION_HINT_DATA_PREFERRED,
so BTRFS behaves as usual. A user who prefers to store the metadata on
the faster disks (e.g. SSDs) can tag them
BTRFS_DEV_ALLOCATION_HINT_METADATA_PREFERRED: the metadata BGs then go
to the BTRFS_DEV_ALLOCATION_HINT_METADATA_PREFERRED disks and the data
BGs to the other ones. When one set of disks is full, the other is used.
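
The policy described above can be sketched in plain userspace C. This is
only an illustration under simplifying assumptions, not the kernel code:
free space is ignored (the patch also sorts by max_avail and
total_avail), and the names struct dev, sample_devs, prio_map and
pick_devices() are hypothetical, introduced here for the example. It
shows the three steps: drop the disks that are excluded for the
requested chunk type, sort by hint priority, then add whole priority
groups until devs_min disks are available.

```c
#include <assert.h>
#include <stdlib.h>

/* Hint values, matching the numeric values defined in patch 1/5. */
enum hint {
    HINT_DATA_PREFERRED = 0,
    HINT_METADATA_PREFERRED = 1,
    HINT_METADATA_ONLY = 2,
    HINT_DATA_ONLY = 3,
};

/* Priority for a metadata allocation; negated for a data allocation. */
static const int prio_map[4] = {
    [HINT_DATA_ONLY] = -1,
    [HINT_DATA_PREFERRED] = 0,
    [HINT_METADATA_PREFERRED] = 1,
    [HINT_METADATA_ONLY] = 2,
};

/* Hypothetical, simplified device record (no free-space tracking). */
struct dev {
    int id;
    int hint;
    int prio;
};

/* Hypothetical sample pool: one disk per hint value. */
static const struct dev sample_devs[4] = {
    { 1, HINT_METADATA_ONLY, 0 },
    { 2, HINT_METADATA_PREFERRED, 0 },
    { 3, HINT_DATA_PREFERRED, 0 },
    { 4, HINT_DATA_ONLY, 0 },
};

/* Sort by priority, highest first. */
static int cmp_prio_desc(const void *a, const void *b)
{
    const struct dev *da = a, *db = b;

    return (db->prio > da->prio) - (db->prio < da->prio);
}

/*
 * Select the disks for one chunk. Returns the number of disks written
 * to 'out' (which must have room for n entries), or -1 when fewer than
 * devs_min disks are eligible (the -ENOSPC case).
 */
static int pick_devices(const struct dev *all, int n, int is_data,
                        int devs_min, struct dev *out)
{
    int cnt = 0, end = 0;

    assert(devs_min > 0);

    /* Step 1: skip the disks that are excluded for this chunk type. */
    for (int i = 0; i < n; i++) {
        int h = all[i].hint;

        if (is_data && h == HINT_METADATA_ONLY)
            continue;
        if (!is_data && h == HINT_DATA_ONLY)
            continue;
        out[cnt] = all[i];
        out[cnt].prio = is_data ? -prio_map[h] : prio_map[h];
        cnt++;
    }

    /* Step 2: best-matching hints first. */
    qsort(out, cnt, sizeof(*out), cmp_prio_desc);

    /* Step 3: add whole priority groups until devs_min is reached. */
    while (end < cnt) {
        int p = out[end++].prio;

        while (end < cnt && out[end].prio == p)
            end++;
        if (end >= devs_min)
            break;
    }
    return end >= devs_min ? end : -1;
}
```

With the sample pool, a data allocation that needs two disks selects
disk 4 (DATA_ONLY) and then disk 3 (DATA_PREFERRED), a metadata
allocation that needs one disk selects only disk 1 (METADATA_ONLY), and
a metadata allocation that needs four disks fails because disk 4 is
never eligible for metadata.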
Signed-off-by: Goffredo Baroncelli --- fs/btrfs/volumes.c | 113 +++++++++++++++++++++++++++++++++++++++++++-- fs/btrfs/volumes.h | 1 + 2 files changed, 111 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index d4ac90f5c949..7b37db9bb887 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -184,6 +184,27 @@ enum btrfs_raid_types __attribute_const__ btrfs_bg_flags_to_raid_index(u64 flags return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */ } +#define BTRFS_DEV_ALLOCATION_HINT_COUNT (1ULL << \ + BTRFS_DEV_ALLOCATION_HINT_BIT_COUNT) + +/* + * The order of BTRFS_DEV_ALLOCATION_HINT_* values are not + * good, because BTRFS_DEV_ALLOCATION_HINT_DATA_PREFERRED is 0 + * (for backward compatibility reason), and the other + * values are greater (because the field is unsigned). So we + * need a map that rearranges the order giving to _DATA_PREFERRED + * an intermediate priority. + * These values give to METADATA_ONLY the highest priority, and are + * valid for metadata BG allocation. When a data + * BG is allocated we negate these values to reverse the priority. 
+ */ +static const int alloc_hint_map[BTRFS_DEV_ALLOCATION_HINT_COUNT] = { + [BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY] = -1, + [BTRFS_DEV_ALLOCATION_HINT_DATA_PREFERRED] = 0, + [BTRFS_DEV_ALLOCATION_HINT_METADATA_PREFERRED] = 1, + [BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY] = 2, +}; + const char *btrfs_bg_type_to_raid_name(u64 flags) { const int index = btrfs_bg_flags_to_raid_index(flags); @@ -5030,13 +5051,18 @@ static int btrfs_add_system_chunk(struct btrfs_fs_info *fs_info, } /* - * sort the devices in descending order by max_avail, total_avail + * sort the devices in descending order by alloc_hint, + * max_avail, total_avail */ static int btrfs_cmp_device_info(const void *a, const void *b) { const struct btrfs_device_info *di_a = a; const struct btrfs_device_info *di_b = b; + if (di_a->alloc_hint > di_b->alloc_hint) + return -1; + if (di_a->alloc_hint < di_b->alloc_hint) + return 1; if (di_a->max_avail > di_b->max_avail) return -1; if (di_a->max_avail < di_b->max_avail) @@ -5199,6 +5225,7 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices, int ndevs = 0; u64 max_avail; u64 dev_offset; + int hint; /* * in the first pass through the devices list, we gather information @@ -5251,17 +5278,95 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices, devices_info[ndevs].max_avail = max_avail; devices_info[ndevs].total_avail = total_avail; devices_info[ndevs].dev = device; + + if ((ctl->type & BTRFS_BLOCK_GROUP_DATA) && + (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) { + /* + * if mixed bg set all the alloc_hint + * fields to the same value, so the sorting + * is not affected + */ + devices_info[ndevs].alloc_hint = 0; + } else if (ctl->type & BTRFS_BLOCK_GROUP_DATA) { + hint = device->type & BTRFS_DEV_ALLOCATION_HINT_MASK; + + /* + * skip BTRFS_DEV_METADATA_ONLY disks + */ + if (hint == BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY) + continue; + /* + * if a data chunk must be allocated, + * sort also by hint (data disk + * higher priority) + */ + 
devices_info[ndevs].alloc_hint = -alloc_hint_map[hint]; + } else { /* BTRFS_BLOCK_GROUP_METADATA */ + hint = device->type & BTRFS_DEV_ALLOCATION_HINT_MASK; + + /* + * skip BTRFS_DEV_DATA_ONLY disks + */ + if (hint == BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY) + continue; + /* + * if a metadata chunk must be allocated, + * sort also by hint (metadata hint + * higher priority) + */ + devices_info[ndevs].alloc_hint = alloc_hint_map[hint]; + } + ++ndevs; } ctl->ndevs = ndevs; + return 0; +} + +static void sort_and_reduce_device_info(struct alloc_chunk_ctl *ctl, + struct btrfs_device_info *devices_info) +{ + int ndevs, hint, i; + + ndevs = ctl->ndevs; /* - * now sort the devices by hole size / available space + * now sort the devices by hint / hole size / available space */ sort(devices_info, ndevs, sizeof(struct btrfs_device_info), btrfs_cmp_device_info, NULL); - return 0; + /* + * select the minimum set of disks grouped by hint that + * can host the chunk + */ + ndevs = 0; + while (ndevs < ctl->ndevs) { + hint = devices_info[ndevs++].alloc_hint; + while (ndevs < ctl->ndevs) { + if (devices_info[ndevs].alloc_hint != hint) + break; + ndevs++; + } + if (ndevs >= ctl->devs_min) + break; + } + + ctl->ndevs = ndevs; + + /* + * the next layers require the devices_info ordered by + * max_avail. If we are returning two (or more) different + * group of alloc_hint, this is not always true. So sort + * these again. 
+ */ + + for (i = 0 ; i < ndevs ; i++) + devices_info[i].alloc_hint = 0; + + sort(devices_info, ndevs, sizeof(struct btrfs_device_info), + btrfs_cmp_device_info, NULL); + } static int decide_stripe_size_regular(struct alloc_chunk_ctl *ctl, @@ -5513,6 +5618,8 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans, goto out; } + sort_and_reduce_device_info(&ctl, devices_info); + ret = decide_stripe_size(fs_devices, &ctl, devices_info); if (ret < 0) { block_group = ERR_PTR(ret); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 93ac27d8097c..b066f9af216a 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -404,6 +404,7 @@ struct btrfs_device_info { u64 dev_offset; u64 max_avail; u64 total_avail; + int alloc_hint; }; struct btrfs_raid_attr { From patchwork Sun Mar 6 18:14:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goffredo Baroncelli X-Patchwork-Id: 12770883 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A861C43217 for ; Sun, 6 Mar 2022 18:15:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232542AbiCFSQt (ORCPT ); Sun, 6 Mar 2022 13:16:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47956 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231678AbiCFSQr (ORCPT ); Sun, 6 Mar 2022 13:16:47 -0500 Received: from smtp.tiscali.it (michael.mail.tiscali.it [213.205.33.246]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 6FABC65D1A for ; Sun, 6 Mar 2022 10:15:54 -0800 (PST) Received: from venice.bhome ([78.12.27.75]) by michael.mail.tiscali.it with id 36El2700C1dDdji016EpJh; Sun, 06 Mar 2022 18:14:49 +0000 x-auth-user: kreijack@tiscali.it From: Goffredo Baroncelli To: 
linux-btrfs@vger.kernel.org Cc: Zygo Blaxell , Josef Bacik , David Sterba , Sinnamohideen Shafeeq , Paul Jones , Boris Burkov , Goffredo Baroncelli Subject: [PATCH 5/5] btrfs: rename dev_item->type to dev_item->flags Date: Sun, 6 Mar 2022 19:14:43 +0100 Message-Id: <7c805844c54fab100402ac7392d1a8c9d28372b9.1646589622.git.kreijack@inwind.it> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: Reply-To: Goffredo Baroncelli MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tiscali.it; s=smtp; t=1646590489; bh=vBjL+Gr66VOY8EXQkOc5K7KSwkkB/VnlPpI4scTqCb8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To; b=ydwiBBuKRjEu/cNSrPn0jh+vHSoTKipzmfci/EuiwdoJxcf6cZe8q1nxLyCGgRKXF UqWMcWPzu8wcIK+rAnD/Tq8ITJluwLk8bWShv99thOr3qQfOTgY91VKcnMaBBvRiTv lodpDuzwmllranK2i74z/2Nhk8T7rkywi376mqE4= Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goffredo Baroncelli Rename the field type of dev_item from 'type' to 'flags' changing the struct btrfs_device and btrfs_dev_item. 
Signed-off-by: Goffredo Baroncelli --- fs/btrfs/ctree.h | 4 ++-- fs/btrfs/disk-io.c | 2 +- fs/btrfs/sysfs.c | 15 ++++++++------- fs/btrfs/volumes.c | 10 +++++----- fs/btrfs/volumes.h | 4 ++-- include/uapi/linux/btrfs_tree.h | 4 ++-- 6 files changed, 20 insertions(+), 19 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 4db17bd05a21..afa47061a47a 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1707,7 +1707,7 @@ static inline void btrfs_set_device_total_bytes(const struct extent_buffer *eb, } -BTRFS_SETGET_FUNCS(device_type, struct btrfs_dev_item, type, 64); +BTRFS_SETGET_FUNCS(device_flags, struct btrfs_dev_item, flags, 64); BTRFS_SETGET_FUNCS(device_bytes_used, struct btrfs_dev_item, bytes_used, 64); BTRFS_SETGET_FUNCS(device_io_align, struct btrfs_dev_item, io_align, 32); BTRFS_SETGET_FUNCS(device_io_width, struct btrfs_dev_item, io_width, 32); @@ -1720,7 +1720,7 @@ BTRFS_SETGET_FUNCS(device_seek_speed, struct btrfs_dev_item, seek_speed, 8); BTRFS_SETGET_FUNCS(device_bandwidth, struct btrfs_dev_item, bandwidth, 8); BTRFS_SETGET_FUNCS(device_generation, struct btrfs_dev_item, generation, 64); -BTRFS_SETGET_STACK_FUNCS(stack_device_type, struct btrfs_dev_item, type, 64); +BTRFS_SETGET_STACK_FUNCS(stack_device_flags, struct btrfs_dev_item, flags, 64); BTRFS_SETGET_STACK_FUNCS(stack_device_total_bytes, struct btrfs_dev_item, total_bytes, 64); BTRFS_SETGET_STACK_FUNCS(stack_device_bytes_used, struct btrfs_dev_item, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 6a0b4dbd70e9..0c2f5a98b9df 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4407,7 +4407,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) continue; btrfs_set_stack_device_generation(dev_item, 0); - btrfs_set_stack_device_type(dev_item, dev->type); + btrfs_set_stack_device_flags(dev_item, dev->flags); btrfs_set_stack_device_id(dev_item, dev->devid); btrfs_set_stack_device_total_bytes(dev_item, dev->commit_total_bytes); diff --git 
a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index c6723456c0e1..8d0581c5383d 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -1597,7 +1597,7 @@ static ssize_t btrfs_devinfo_allocation_hint_show(struct kobject *kobj, devid_kobj); for (i = 0 ; i < ARRAY_SIZE(allocation_hint_name) ; i++) { - if ((device->type & BTRFS_DEV_ALLOCATION_HINT_MASK) != + if ((device->flags & BTRFS_DEV_ALLOCATION_HINT_MASK) != allocation_hint_name[i].value) continue; @@ -1617,7 +1617,7 @@ static ssize_t btrfs_devinfo_allocation_hint_store(struct kobject *kobj, int ret; struct btrfs_trans_handle *trans; int i, l; - u64 type, prev_type; + u64 flags, prev_flags; if (len < 1) return -EINVAL; @@ -1634,7 +1634,7 @@ static ssize_t btrfs_devinfo_allocation_hint_store(struct kobject *kobj, if (strncasecmp(allocation_hint_name[i].name, buf, l)) continue; - type = allocation_hint_name[i].value; + flags = allocation_hint_name[i].value; break; } @@ -1651,15 +1651,16 @@ static ssize_t btrfs_devinfo_allocation_hint_store(struct kobject *kobj, return -EROFS; /* check if a change is really needed */ - if ((device->type & BTRFS_DEV_ALLOCATION_HINT_MASK) == type) + if ((device->flags & BTRFS_DEV_ALLOCATION_HINT_MASK) == flags) return len; trans = btrfs_start_transaction(root, 1); if (IS_ERR(trans)) return PTR_ERR(trans); - prev_type = device->type; - device->type = (device->type & ~BTRFS_DEV_ALLOCATION_HINT_MASK) | type; + prev_flags = device->flags; + device->flags = (device->flags & ~BTRFS_DEV_ALLOCATION_HINT_MASK) | + flags; ret = btrfs_update_device(trans, device); @@ -1675,7 +1676,7 @@ static ssize_t btrfs_devinfo_allocation_hint_store(struct kobject *kobj, return len; abort: - device->type = prev_type; + device->flags = prev_flags; return ret; } BTRFS_ATTR_RW(devid, allocation_hint, btrfs_devinfo_allocation_hint_show, diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 7b37db9bb887..728e3a7582bc 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1871,7 +1871,7 @@ static int 
btrfs_add_dev_item(struct btrfs_trans_handle *trans, btrfs_set_device_id(leaf, dev_item, device->devid); btrfs_set_device_generation(leaf, dev_item, 0); - btrfs_set_device_type(leaf, dev_item, device->type); + btrfs_set_device_flags(leaf, dev_item, device->flags); btrfs_set_device_io_align(leaf, dev_item, device->io_align); btrfs_set_device_io_width(leaf, dev_item, device->io_width); btrfs_set_device_sector_size(leaf, dev_item, device->sector_size); @@ -2898,7 +2898,7 @@ noinline int btrfs_update_device(struct btrfs_trans_handle *trans, dev_item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_dev_item); btrfs_set_device_id(leaf, dev_item, device->devid); - btrfs_set_device_type(leaf, dev_item, device->type); + btrfs_set_device_flags(leaf, dev_item, device->flags); btrfs_set_device_io_align(leaf, dev_item, device->io_align); btrfs_set_device_io_width(leaf, dev_item, device->io_width); btrfs_set_device_sector_size(leaf, dev_item, device->sector_size); @@ -5288,7 +5288,7 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices, */ devices_info[ndevs].alloc_hint = 0; } else if (ctl->type & BTRFS_BLOCK_GROUP_DATA) { - hint = device->type & BTRFS_DEV_ALLOCATION_HINT_MASK; + hint = device->flags & BTRFS_DEV_ALLOCATION_HINT_MASK; /* * skip BTRFS_DEV_METADATA_ONLY disks @@ -5302,7 +5302,7 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices, */ devices_info[ndevs].alloc_hint = -alloc_hint_map[hint]; } else { /* BTRFS_BLOCK_GROUP_METADATA */ - hint = device->type & BTRFS_DEV_ALLOCATION_HINT_MASK; + hint = device->flags & BTRFS_DEV_ALLOCATION_HINT_MASK; /* * skip BTRFS_DEV_DATA_ONLY disks @@ -7308,7 +7308,7 @@ static void fill_device_from_item(struct extent_buffer *leaf, device->commit_total_bytes = device->disk_total_bytes; device->bytes_used = btrfs_device_bytes_used(leaf, dev_item); device->commit_bytes_used = device->bytes_used; - device->type = btrfs_device_type(leaf, dev_item); + device->flags = btrfs_device_flags(leaf, dev_item); 
device->io_align = btrfs_device_io_align(leaf, dev_item); device->io_width = btrfs_device_io_width(leaf, dev_item); device->sector_size = btrfs_device_sector_size(leaf, dev_item); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index b066f9af216a..6230d911e7af 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -101,8 +101,8 @@ struct btrfs_device { /* optimal io width for this device */ u32 io_width; - /* type and info about this device */ - u64 type; + /* device flags (e.g. allocation hint) */ + u64 flags; /* minimal io size for this device */ u32 sector_size; diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index e0d842c2e616..bfe0b1a7f3a1 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -424,8 +424,8 @@ struct btrfs_dev_item { /* minimal io size for this device */ __le32 sector_size; - /* type and info about this device */ - __le64 type; + /* device flags (e.g. allocation hint) */ + __le64 flags; /* expected generation for this device */ __le64 generation;