From patchwork Fri Dec 17 18:47:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goffredo Baroncelli X-Patchwork-Id: 12685527 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F9FAC43217 for ; Fri, 17 Dec 2021 18:52:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240504AbhLQSwX (ORCPT ); Fri, 17 Dec 2021 13:52:23 -0500 Received: from santino.mail.tiscali.it ([213.205.33.245]:53002 "EHLO smtp.tiscali.it" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S240500AbhLQSwV (ORCPT ); Fri, 17 Dec 2021 13:52:21 -0500 Received: from venice.bhome ([78.12.25.242]) by santino.mail.tiscali.it with id XWnP2601f5DQHji01WnRpf; Fri, 17 Dec 2021 18:47:25 +0000 x-auth-user: kreijack@tiscali.it From: Goffredo Baroncelli To: linux-btrfs@vger.kernel.org Cc: Zygo Blaxell , Josef Bacik , David Sterba , Sinnamohideen Shafeeq , Paul Jones , Goffredo Baroncelli Subject: [PATCH 1/6] btrfs: add flags to give an hint to the chunk allocator Date: Fri, 17 Dec 2021 19:47:17 +0100 Message-Id: <377d6c51cb957fbad5627bb93ff0a76ce9ba79da.1639766364.git.kreijack@inwind.it> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Reply-To: Goffredo Baroncelli MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tiscali.it; s=smtp; t=1639766845; bh=jbkkU8asLw/4hSQLus/n6WoA4G/fKeJmATWmP6RZJEg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To; b=zIWpeK1qkLxxrqsUxv742y7Si0H3tK8S0kXc9IPYAlCkfxnYOlwB8zGln4ZQdiPdd GBEF4BLl9MuBYSgq3kuzxtyb0GW+EUexnR9kFbpLtqk+EfTLDZUMrJr/cSx2NF53T/ D4owOvXbZT5jFQ4MVsMfTfm7d7lz+WJ6t8azl0Es= Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goffredo Baroncelli Add the following flags to give a hint about which chunk should
be allocated on which disk:
- BTRFS_DEV_ALLOCATION_HINT_PREFERRED_DATA: preferred data chunk, but metadata chunk allowed
- BTRFS_DEV_ALLOCATION_HINT_PREFERRED_METADATA: preferred metadata chunk, but data chunk allowed
- BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY: only metadata chunk allowed
- BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY: only data chunk allowed
Signed-off-by: Goffredo Baroncelli --- include/uapi/linux/btrfs_tree.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index 5416f1f1a77a..55da906c2eac 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -386,6 +386,22 @@ struct btrfs_key { __u64 offset; } __attribute__ ((__packed__)); +/* dev_item.type */ + +/* btrfs chunk allocation hint */ +#define BTRFS_DEV_ALLOCATION_HINT_BIT_COUNT 2 +/* btrfs chunk allocation hint mask */ +#define BTRFS_DEV_ALLOCATION_HINT_MASK \ + ((1 << BTRFS_DEV_ALLOCATION_HINT_BIT_COUNT) - 1) +/* preferred data chunk, but metadata chunk allowed */ +#define BTRFS_DEV_ALLOCATION_HINT_PREFERRED_DATA (0ULL) +/* preferred metadata chunk, but data chunk allowed */ +#define BTRFS_DEV_ALLOCATION_HINT_PREFERRED_METADATA (1ULL) +/* only metadata chunk allowed */ +#define BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY (2ULL) +/* only data chunk allowed */ +#define BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY (3ULL) + struct btrfs_dev_item { /* the internal btrfs device id */ __le64 devid; From patchwork Fri Dec 17 18:47:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goffredo Baroncelli X-Patchwork-Id: 12685525 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69EA7C433EF for ; Fri, 17 Dec 2021 18:52:24 +0000 (UTC) Received: (majordomo@vger.kernel.org)
by vger.kernel.org via listexpand id S234036AbhLQSwW (ORCPT ); Fri, 17 Dec 2021 13:52:22 -0500 Received: from santino.mail.tiscali.it ([213.205.33.245]:52998 "EHLO smtp.tiscali.it" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S240501AbhLQSwV (ORCPT ); Fri, 17 Dec 2021 13:52:21 -0500 Received: from venice.bhome ([78.12.25.242]) by santino.mail.tiscali.it with id XWnP2601f5DQHji01WnRqE; Fri, 17 Dec 2021 18:47:25 +0000 x-auth-user: kreijack@tiscali.it From: Goffredo Baroncelli To: linux-btrfs@vger.kernel.org Cc: Zygo Blaxell , Josef Bacik , David Sterba , Sinnamohideen Shafeeq , Paul Jones , Goffredo Baroncelli Subject: [PATCH 2/6] btrfs: export the device allocation_hint property in sysfs Date: Fri, 17 Dec 2021 19:47:18 +0100 Message-Id: <9a3c5371722ab7d10e2eb974c53d07eba53400a5.1639766364.git.kreijack@inwind.it> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Reply-To: Goffredo Baroncelli MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tiscali.it; s=smtp; t=1639766845; bh=/QHeyDz0Pq3haR9fuw3N6nrUgmQ9LDw+7pDkGvAZe7o=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To; b=SwmVV3RiVe7r0umSVPHhyhVkaB30Apn8RZgyEGYjCybfckYpeG5KS4h3ivWOL+3Hr nSnfDdj8oWinJLs0nUW4ikyo455P/A7zINIGpMC5/SQfF7zPFqqlnpbwmQqNvqEn8U GmPvk5c4ikkCUo/Ra3UwBN5RzlgcXY8mJxauIgvE= Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goffredo Baroncelli Export the device allocation_hint property via /sys/fs/btrfs//devinfo//allocation_hint Signed-off-by: Goffredo Baroncelli Reviewed-by: Boris Burkov --- fs/btrfs/sysfs.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index beb7f72d50b8..a8d918700d2b 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -1575,6 +1575,17 @@ static ssize_t btrfs_devinfo_error_stats_show(struct kobject *kobj, } BTRFS_ATTR(devid, error_stats, btrfs_devinfo_error_stats_show); +static ssize_t btrfs_devinfo_allocation_hint_show(struct kobject *kobj, +
struct kobj_attribute *a, char *buf) +{ + struct btrfs_device *device = container_of(kobj, struct btrfs_device, + devid_kobj); + + return scnprintf(buf, PAGE_SIZE, "0x%08llx\n", + device->type & BTRFS_DEV_ALLOCATION_HINT_MASK ); +} +BTRFS_ATTR(devid, allocation_hint, btrfs_devinfo_allocation_hint_show); + /* * Information about one device. * @@ -1588,6 +1599,7 @@ static struct attribute *devid_attrs[] = { BTRFS_ATTR_PTR(devid, replace_target), BTRFS_ATTR_PTR(devid, scrub_speed_max), BTRFS_ATTR_PTR(devid, writeable), + BTRFS_ATTR_PTR(devid, allocation_hint), NULL }; ATTRIBUTE_GROUPS(devid); From patchwork Fri Dec 17 18:47:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goffredo Baroncelli X-Patchwork-Id: 12685537 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C94AC433FE for ; Fri, 17 Dec 2021 18:52:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234138AbhLQSwd (ORCPT ); Fri, 17 Dec 2021 13:52:33 -0500 Received: from santino.mail.tiscali.it ([213.205.33.245]:53002 "EHLO smtp.tiscali.it" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S240507AbhLQSwY (ORCPT ); Fri, 17 Dec 2021 13:52:24 -0500 Received: from venice.bhome ([78.12.25.242]) by santino.mail.tiscali.it with id XWnP2601f5DQHji01WnRqt; Fri, 17 Dec 2021 18:47:26 +0000 x-auth-user: kreijack@tiscali.it From: Goffredo Baroncelli To: linux-btrfs@vger.kernel.org Cc: Zygo Blaxell , Josef Bacik , David Sterba , Sinnamohideen Shafeeq , Paul Jones , Goffredo Baroncelli Subject: [PATCH 3/6] btrfs: change the device allocation_hint property via sysfs Date: Fri, 17 Dec 2021 19:47:19 +0100 Message-Id: <1425d4ea491c4f4b2019bc5e9d2f405a7573b5ed.1639766364.git.kreijack@inwind.it> X-Mailer: git-send-email 2.34.1 
In-Reply-To: References: Reply-To: Goffredo Baroncelli MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tiscali.it; s=smtp; t=1639766846; bh=e5asLBdo8yeqlljTT9u9/0jdcaXV7Wk0RqKoM3r2reM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To; b=nIoNTaz1FAbi7pgTfz/mp8IvZM1QIm/jmISV/iFG4LJw3Rgix6Nx3MzYOKB6uqyKH HBw1/+ttGHSlauvTid0o3KeoLQywYoZAL+dkvYZQGmOoOUq9TIL/lkAjcyhbiKEG9i BsA3LN1OTFJTU7TzB+4YTby/qMS/zMJEmhgaqzWM= Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goffredo Baroncelli Signed-off-by: Goffredo Baroncelli --- fs/btrfs/sysfs.c | 62 +++++++++++++++++++++++++++++++++++++++++++++- fs/btrfs/volumes.c | 2 +- fs/btrfs/volumes.h | 2 ++ 3 files changed, 64 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index a8d918700d2b..53acc66065dd 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -1584,7 +1584,67 @@ static ssize_t btrfs_devinfo_allocation_hint_show(struct kobject *kobj, return scnprintf(buf, PAGE_SIZE, "0x%08llx\n", device->type & BTRFS_DEV_ALLOCATION_HINT_MASK ); } -BTRFS_ATTR(devid, allocation_hint, btrfs_devinfo_allocation_hint_show); + +static ssize_t btrfs_devinfo_allocation_hint_store(struct kobject *kobj, + struct kobj_attribute *a, + const char *buf, size_t len) +{ + struct btrfs_fs_info *fs_info; + struct btrfs_root *root; + struct btrfs_device *device; + int ret; + struct btrfs_trans_handle *trans; + + u64 type, prev_type; + + device = container_of(kobj, struct btrfs_device, devid_kobj); + fs_info = device->fs_info; + if (!fs_info) + return -EPERM; + + root = fs_info->chunk_root; + if (sb_rdonly(fs_info->sb)) + return -EROFS; + + ret = kstrtou64(buf, 0, &type); + if (ret < 0) + return -EINVAL; + + /* for now, allow to touch only the 'allocation hint' bits */ + if (type & ~BTRFS_DEV_ALLOCATION_HINT_MASK) + return -EINVAL; + + /* check if a change is really needed */ + if ((device->type & BTRFS_DEV_ALLOCATION_HINT_MASK) == type) + return len; + + trans = 
btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + prev_type = device->type; + device->type = (device->type & ~BTRFS_DEV_ALLOCATION_HINT_MASK) | type; + + ret = btrfs_update_device(trans, device); + + if (ret < 0) { + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + goto abort; + } + + ret = btrfs_commit_transaction(trans); + if (ret < 0) + goto abort; + + return len; +abort: + device->type = prev_type; + return ret; +} +BTRFS_ATTR_RW(devid, allocation_hint, btrfs_devinfo_allocation_hint_show, + btrfs_devinfo_allocation_hint_store); + /* * Information about one device. diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a7071f34fe64..806b599c6a46 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2859,7 +2859,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path return ret; } -static noinline int btrfs_update_device(struct btrfs_trans_handle *trans, +noinline int btrfs_update_device(struct btrfs_trans_handle *trans, struct btrfs_device *device) { int ret; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 9cf1d93a3d66..5097c0c12a8e 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -638,5 +638,7 @@ int btrfs_bg_type_to_factor(u64 flags); const char *btrfs_bg_type_to_raid_name(u64 flags); int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info); bool btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical); +int btrfs_update_device(struct btrfs_trans_handle *trans, + struct btrfs_device *device); #endif From patchwork Fri Dec 17 18:47:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goffredo Baroncelli X-Patchwork-Id: 12685531 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D138C433FE 
for ; Fri, 17 Dec 2021 18:52:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240528AbhLQSwa (ORCPT ); Fri, 17 Dec 2021 13:52:30 -0500 Received: from santino.mail.tiscali.it ([213.205.33.245]:52998 "EHLO smtp.tiscali.it" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S240520AbhLQSw0 (ORCPT ); Fri, 17 Dec 2021 13:52:26 -0500 Received: from venice.bhome ([78.12.25.242]) by santino.mail.tiscali.it with id XWnP2601f5DQHji01WnSrN; Fri, 17 Dec 2021 18:47:26 +0000 x-auth-user: kreijack@tiscali.it From: Goffredo Baroncelli To: linux-btrfs@vger.kernel.org Cc: Zygo Blaxell , Josef Bacik , David Sterba , Sinnamohideen Shafeeq , Paul Jones , Goffredo Baroncelli Subject: [PATCH 4/6] btrfs: add allocation_hint mode Date: Fri, 17 Dec 2021 19:47:20 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Reply-To: Goffredo Baroncelli MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tiscali.it; s=smtp; t=1639766846; bh=l3HdplbtJ4rePCwDyojsbaAWdVGb8yutr0rSfbenvEA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To; b=6TXDuJmn6e9higF451pA4moBc6ZXWn0PvNi8oJrEpCzicDwu64k2YD391dBKRmpo3 WNQm9ZGyyQ0q93CPAwQL1rzPaoY5wgU0tmCU8n2+SXffFmN/FqQtF/qBIxbDiTlXFg /SqX5q2zBstpwbmaiasqUbdW8SkMyUQUfsS7l7DY= Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goffredo Baroncelli

The chunk allocation policy is modified as follows.

Each disk may have one of the following tags:
- BTRFS_DEV_ALLOCATION_HINT_PREFERRED_METADATA
- BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY
- BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY
- BTRFS_DEV_ALLOCATION_HINT_PREFERRED_DATA (default)

During a *mixed data/metadata* chunk allocation, BTRFS works as usual.

During a *data* chunk allocation, space is searched first on the disks tagged BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY and BTRFS_DEV_ALLOCATION_HINT_PREFERRED_DATA. If no space is found or the space found is not enough (e.g.
in raid5, only two disks are available), the disks tagged BTRFS_DEV_ALLOCATION_HINT_PREFERRED_METADATA are considered as well. If the space is still not sufficient, -ENOSPC is returned. A disk tagged BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY is never considered for a data BG allocation.

During a *metadata* chunk allocation, space is searched first on the disks tagged BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY and BTRFS_DEV_ALLOCATION_HINT_PREFERRED_METADATA. If no space is found or the space found is not enough (e.g. in raid5, only two disks are available), the disks tagged BTRFS_DEV_ALLOCATION_HINT_PREFERRED_DATA are considered as well. If the space is still not sufficient, -ENOSPC is returned. A disk tagged BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY is never considered for a metadata BG allocation.

By default the disks are tagged BTRFS_DEV_ALLOCATION_HINT_PREFERRED_DATA, so the default behavior is unchanged. If the user prefers to store the metadata on the faster disks (e.g. the SSDs), they can tag these with BTRFS_DEV_ALLOCATION_HINT_PREFERRED_METADATA: in this case the data BGs go to the BTRFS_DEV_ALLOCATION_HINT_PREFERRED_DATA disks and the metadata BGs to the others, as long as there is enough space. Only when one set of disks is full is the other set used.

WARNING: if the user tags a disk with BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY, this disk will never be used for allocating metadata, which increases the likelihood of exhausting the metadata space.
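The policy above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel code: struct dev_info, hint_rank, cmp_dev() and select_devices() are hypothetical names, the hint values mirror the defines from patch 1/6, and mixed data/metadata chunks are not modelled. Following the diff in this patch, each hint value forms its own priority group, and whole groups are added until devs_min devices are available.

```c
#include <assert.h>
#include <stdlib.h>

/* Hint values as defined in patch 1/6. */
enum { PREFERRED_DATA = 0, PREFERRED_METADATA = 1,
       METADATA_ONLY = 2, DATA_ONLY = 3 };

/* Hypothetical stand-in for struct btrfs_device_info. */
struct dev_info {
	int devid;
	int hint;
	unsigned long long max_avail;
	int prio;		/* derived from hint; higher is tried first */
};

/* Rank for a metadata chunk; negated for a data chunk. */
static const int hint_rank[4] = {
	[DATA_ONLY] = -1,
	[PREFERRED_DATA] = 0,
	[PREFERRED_METADATA] = 1,
	[METADATA_ONLY] = 2,
};

/* Descending by prio, then descending by max_avail. */
static int cmp_dev(const void *a, const void *b)
{
	const struct dev_info *x = a, *y = b;

	if (x->prio != y->prio)
		return x->prio > y->prio ? -1 : 1;
	if (x->max_avail != y->max_avail)
		return x->max_avail > y->max_avail ? -1 : 1;
	return 0;
}

/*
 * Copy the selected devices into out[] (room for n entries) and return
 * how many were selected. for_data != 0 selects for a data chunk,
 * otherwise for a metadata chunk. The caller decides whether the
 * returned count is enough or whether to fail with ENOSPC.
 */
static int select_devices(const struct dev_info *in, int n, int for_data,
			  int devs_min, struct dev_info *out)
{
	int kept = 0, picked = 0, i;

	assert(n >= 0 && devs_min > 0);

	/* Drop devices whose hint forbids this chunk type. */
	for (i = 0; i < n; i++) {
		if (in[i].hint == (for_data ? METADATA_ONLY : DATA_ONLY))
			continue;
		out[kept] = in[i];
		out[kept].prio = for_data ? -hint_rank[in[i].hint]
					  : hint_rank[in[i].hint];
		kept++;
	}
	if (!kept)
		return 0;

	qsort(out, kept, sizeof(*out), cmp_dev);

	/* Take whole priority groups until devs_min is reached. */
	while (picked < kept) {
		int prio = out[picked++].prio;

		while (picked < kept && out[picked].prio == prio)
			picked++;
		if (picked >= devs_min)
			break;
	}

	/* Later stages expect ordering by available space only. */
	for (i = 0; i < picked; i++)
		out[i].prio = 0;
	qsort(out, picked, sizeof(*out), cmp_dev);

	return picked;
}
```

For example, with one METADATA_ONLY disk, one PREFERRED_METADATA disk and two PREFERRED_DATA disks, a metadata chunk that needs a single device lands on the METADATA_ONLY disk alone, while a data chunk that needs three devices falls back to the PREFERRED_METADATA disk as well.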
Signed-off-by: Goffredo Baroncelli --- fs/btrfs/volumes.c | 94 +++++++++++++++++++++++++++++++++++++++++++++- fs/btrfs/volumes.h | 1 + 2 files changed, 94 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 806b599c6a46..beee7d1ae79d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -184,6 +184,16 @@ enum btrfs_raid_types __attribute_const__ btrfs_bg_flags_to_raid_index(u64 flags return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */ } +#define BTRFS_DEV_ALLOCATION_HINT_COUNT (1ULL << \ + BTRFS_DEV_ALLOCATION_HINT_BIT_COUNT) + +static const s8 alloc_hint_map[BTRFS_DEV_ALLOCATION_HINT_COUNT] = { + [BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY] = -1, + [BTRFS_DEV_ALLOCATION_HINT_PREFERRED_DATA] = 0, + [BTRFS_DEV_ALLOCATION_HINT_PREFERRED_METADATA] = 1, + [BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY] = 2, +}; + const char *btrfs_bg_type_to_raid_name(u64 flags) { const int index = btrfs_bg_flags_to_raid_index(flags); @@ -5037,13 +5047,18 @@ static int btrfs_add_system_chunk(struct btrfs_fs_info *fs_info, } /* - * sort the devices in descending order by max_avail, total_avail + * sort the devices in descending order by alloc_hint, + * max_avail, total_avail */ static int btrfs_cmp_device_info(const void *a, const void *b) { const struct btrfs_device_info *di_a = a; const struct btrfs_device_info *di_b = b; + if (di_a->alloc_hint > di_b->alloc_hint) + return -1; + if (di_a->alloc_hint < di_b->alloc_hint) + return 1; if (di_a->max_avail > di_b->max_avail) return -1; if (di_a->max_avail < di_b->max_avail) @@ -5206,6 +5221,8 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices, int ndevs = 0; u64 max_avail; u64 dev_offset; + int hint; + int i; /* * in the first pass through the devices list, we gather information @@ -5258,16 +5275,91 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices, devices_info[ndevs].max_avail = max_avail; devices_info[ndevs].total_avail = total_avail; devices_info[ndevs].dev =
device; + + if ((ctl->type & BTRFS_BLOCK_GROUP_DATA) && + (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) { + /* + * if mixed bg set all the alloc_hint + * fields to the same value, so the sorting + * is not affected + */ + devices_info[ndevs].alloc_hint = 0; + } else if (ctl->type & BTRFS_BLOCK_GROUP_DATA) { + hint = device->type & BTRFS_DEV_ALLOCATION_HINT_MASK; + + /* + * skip BTRFS_DEV_METADATA_ONLY disks + */ + if (hint == BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY) + continue; + /* + * if a data chunk must be allocated, + * sort also by hint (data disk + * higher priority) + */ + devices_info[ndevs].alloc_hint = -alloc_hint_map[hint]; + } else { /* BTRFS_BLOCK_GROUP_METADATA */ + hint = device->type & BTRFS_DEV_ALLOCATION_HINT_MASK; + + /* + * skip BTRFS_DEV_DATA_ONLY disks + */ + if (hint == BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY) + continue; + /* + * if a metadata chunk must be allocated, + * sort also by hint (metadata hint + * higher priority) + */ + devices_info[ndevs].alloc_hint = alloc_hint_map[hint]; + } + ++ndevs; } ctl->ndevs = ndevs; + /* + * no devices available + */ + if (!ndevs) + return 0; + /* * now sort the devices by hole size / available space */ sort(devices_info, ndevs, sizeof(struct btrfs_device_info), btrfs_cmp_device_info, NULL); + /* + * select the minimum set of disks grouped by hint that + * can host the chunk + */ + ndevs = 0; + while (ndevs < ctl->ndevs) { + hint = devices_info[ndevs++].alloc_hint; + while (ndevs < ctl->ndevs && + devices_info[ndevs].alloc_hint == hint) + ndevs++; + if (ndevs >= ctl->devs_min) + break; + } + + BUG_ON(ndevs > ctl->ndevs); + ctl->ndevs = ndevs; + + /* + * the next layers require the devices_info ordered by + * max_avail. If we are returning two (or more) different + * groups of alloc_hint, this is not always true. So sort + * them again.
+ */ + + for (i = 0 ; i < ndevs ; i++) + devices_info[i].alloc_hint = 0; + + sort(devices_info, ndevs, sizeof(struct btrfs_device_info), + btrfs_cmp_device_info, NULL); + return 0; } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 5097c0c12a8e..61c0cba045e9 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -406,6 +406,7 @@ struct btrfs_device_info { u64 dev_offset; u64 max_avail; u64 total_avail; + int alloc_hint; }; struct btrfs_raid_attr { From patchwork Fri Dec 17 18:47:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goffredo Baroncelli X-Patchwork-Id: 12685533 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00BABC4332F for ; Fri, 17 Dec 2021 18:52:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240500AbhLQSwb (ORCPT ); Fri, 17 Dec 2021 13:52:31 -0500 Received: from santino.mail.tiscali.it ([213.205.33.245]:53000 "EHLO smtp.tiscali.it" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S240519AbhLQSw0 (ORCPT ); Fri, 17 Dec 2021 13:52:26 -0500 X-Greylist: delayed 304 seconds by postgrey-1.27 at vger.kernel.org; Fri, 17 Dec 2021 13:52:21 EST Received: from venice.bhome ([78.12.25.242]) by santino.mail.tiscali.it with id XWnP2601f5DQHji01WnSru; Fri, 17 Dec 2021 18:47:27 +0000 x-auth-user: kreijack@tiscali.it From: Goffredo Baroncelli To: linux-btrfs@vger.kernel.org Cc: Zygo Blaxell , Josef Bacik , David Sterba , Sinnamohideen Shafeeq , Paul Jones , Goffredo Baroncelli Subject: [PATCH 5/6] btrfs: rename dev_item->type to dev_item->flags Date: Fri, 17 Dec 2021 19:47:21 +0100 Message-Id: X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Reply-To: Goffredo Baroncelli MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=tiscali.it; s=smtp; t=1639766847; bh=8qdEf8vUUxuTZ21PV9N1pj2eUYcmTxXdmeXdB/BTiyQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To; b=z6XZTqzovjedmVAS0GyG1SUzXZ5TAewK4k4Bml5ggxLJA0jh/4WihbLUEYj1Jjv1+ HOiOyYzrM5LXxYv0T5ekB3JRxvfUUpjubsLF8ES3+k3HhGcHtASHASJbe7ALQFYf41 Sv1//fx6VHSQb63z1VdckVHeVb4zVRKw5eQ52KFY= Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goffredo Baroncelli Rename the field type of dev_item from 'type' to 'flags' changing the struct btrfs_device and btrfs_dev_item. Signed-off-by: Goffredo Baroncelli --- fs/btrfs/ctree.h | 4 ++-- fs/btrfs/disk-io.c | 2 +- fs/btrfs/sysfs.c | 17 +++++++++-------- fs/btrfs/volumes.c | 10 +++++----- fs/btrfs/volumes.h | 4 ++-- include/uapi/linux/btrfs_tree.h | 4 ++-- 6 files changed, 21 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 459d00211181..778c7c807289 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1669,7 +1669,7 @@ static inline void btrfs_set_device_total_bytes(const struct extent_buffer *eb, } -BTRFS_SETGET_FUNCS(device_type, struct btrfs_dev_item, type, 64); +BTRFS_SETGET_FUNCS(device_flags, struct btrfs_dev_item, flags, 64); BTRFS_SETGET_FUNCS(device_bytes_used, struct btrfs_dev_item, bytes_used, 64); BTRFS_SETGET_FUNCS(device_io_align, struct btrfs_dev_item, io_align, 32); BTRFS_SETGET_FUNCS(device_io_width, struct btrfs_dev_item, io_width, 32); @@ -1682,7 +1682,7 @@ BTRFS_SETGET_FUNCS(device_seek_speed, struct btrfs_dev_item, seek_speed, 8); BTRFS_SETGET_FUNCS(device_bandwidth, struct btrfs_dev_item, bandwidth, 8); BTRFS_SETGET_FUNCS(device_generation, struct btrfs_dev_item, generation, 64); -BTRFS_SETGET_STACK_FUNCS(stack_device_type, struct btrfs_dev_item, type, 64); +BTRFS_SETGET_STACK_FUNCS(stack_device_flags, struct btrfs_dev_item, flags, 64); BTRFS_SETGET_STACK_FUNCS(stack_device_total_bytes, struct btrfs_dev_item, total_bytes, 64); BTRFS_SETGET_STACK_FUNCS(stack_device_bytes_used, struct btrfs_dev_item, diff 
--git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index fc7dd5109806..02ffb8bc7d6b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4342,7 +4342,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) continue; btrfs_set_stack_device_generation(dev_item, 0); - btrfs_set_stack_device_type(dev_item, dev->type); + btrfs_set_stack_device_flags(dev_item, dev->flags); btrfs_set_stack_device_id(dev_item, dev->devid); btrfs_set_stack_device_total_bytes(dev_item, dev->commit_total_bytes); diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 53acc66065dd..be4196a1645c 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -1582,7 +1582,7 @@ static ssize_t btrfs_devinfo_allocation_hint_show(struct kobject *kobj, devid_kobj); return scnprintf(buf, PAGE_SIZE, "0x%08llx\n", - device->type & BTRFS_DEV_ALLOCATION_HINT_MASK ); + device->flags & BTRFS_DEV_ALLOCATION_HINT_MASK ); } static ssize_t btrfs_devinfo_allocation_hint_store(struct kobject *kobj, @@ -1595,7 +1595,7 @@ static ssize_t btrfs_devinfo_allocation_hint_store(struct kobject *kobj, int ret; struct btrfs_trans_handle *trans; - u64 type, prev_type; + u64 flags, prev_flags; device = container_of(kobj, struct btrfs_device, devid_kobj); fs_info = device->fs_info; @@ -1606,24 +1606,25 @@ static ssize_t btrfs_devinfo_allocation_hint_store(struct kobject *kobj, if (sb_rdonly(fs_info->sb)) return -EROFS; - ret = kstrtou64(buf, 0, &type); + ret = kstrtou64(buf, 0, &flags); if (ret < 0) return -EINVAL; /* for now, allow to touch only the 'allocation hint' bits */ - if (type & ~BTRFS_DEV_ALLOCATION_HINT_MASK) + if (flags & ~BTRFS_DEV_ALLOCATION_HINT_MASK) return -EINVAL; /* check if a change is really needed */ - if ((device->type & BTRFS_DEV_ALLOCATION_HINT_MASK) == type) + if ((device->flags & BTRFS_DEV_ALLOCATION_HINT_MASK) == flags) return len; trans = btrfs_start_transaction(root, 1); if (IS_ERR(trans)) return PTR_ERR(trans); - prev_type = device->type; - device->type = (device->type & 
~BTRFS_DEV_ALLOCATION_HINT_MASK) | type; + prev_flags = device->flags; + device->flags = (device->flags & ~BTRFS_DEV_ALLOCATION_HINT_MASK) | + flags; ret = btrfs_update_device(trans, device); @@ -1639,7 +1640,7 @@ static ssize_t btrfs_devinfo_allocation_hint_store(struct kobject *kobj, return len; abort: - device->type = prev_type; + device->flags = prev_flags; return ret; } BTRFS_ATTR_RW(devid, allocation_hint, btrfs_devinfo_allocation_hint_show, diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index beee7d1ae79d..9184570c51b0 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1876,7 +1876,7 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans, btrfs_set_device_id(leaf, dev_item, device->devid); btrfs_set_device_generation(leaf, dev_item, 0); - btrfs_set_device_type(leaf, dev_item, device->type); + btrfs_set_device_flags(leaf, dev_item, device->flags); btrfs_set_device_io_align(leaf, dev_item, device->io_align); btrfs_set_device_io_width(leaf, dev_item, device->io_width); btrfs_set_device_sector_size(leaf, dev_item, device->sector_size); @@ -2900,7 +2900,7 @@ noinline int btrfs_update_device(struct btrfs_trans_handle *trans, dev_item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_dev_item); btrfs_set_device_id(leaf, dev_item, device->devid); - btrfs_set_device_type(leaf, dev_item, device->type); + btrfs_set_device_flags(leaf, dev_item, device->flags); btrfs_set_device_io_align(leaf, dev_item, device->io_align); btrfs_set_device_io_width(leaf, dev_item, device->io_width); btrfs_set_device_sector_size(leaf, dev_item, device->sector_size); @@ -5285,7 +5285,7 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices, */ devices_info[ndevs].alloc_hint = 0; } else if (ctl->type & BTRFS_BLOCK_GROUP_DATA) { - hint = device->type & BTRFS_DEV_ALLOCATION_HINT_MASK; + hint = device->flags & BTRFS_DEV_ALLOCATION_HINT_MASK; /* * skip BTRFS_DEV_METADATA_ONLY disks @@ -5299,7 +5299,7 @@ static int gather_device_info(struct 
btrfs_fs_devices *fs_devices, */ devices_info[ndevs].alloc_hint = -alloc_hint_map[hint]; } else { /* BTRFS_BLOCK_GROUP_METADATA */ - hint = device->type & BTRFS_DEV_ALLOCATION_HINT_MASK; + hint = device->flags & BTRFS_DEV_ALLOCATION_HINT_MASK; /* * skip BTRFS_DEV_DATA_ONLY disks @@ -7293,7 +7293,7 @@ static void fill_device_from_item(struct extent_buffer *leaf, device->commit_total_bytes = device->disk_total_bytes; device->bytes_used = btrfs_device_bytes_used(leaf, dev_item); device->commit_bytes_used = device->bytes_used; - device->type = btrfs_device_type(leaf, dev_item); + device->flags = btrfs_device_flags(leaf, dev_item); device->io_align = btrfs_device_io_align(leaf, dev_item); device->io_width = btrfs_device_io_width(leaf, dev_item); device->sector_size = btrfs_device_sector_size(leaf, dev_item); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 61c0cba045e9..27ecf062d50c 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -96,8 +96,8 @@ struct btrfs_device { /* optimal io width for this device */ u32 io_width; - /* type and info about this device */ - u64 type; + /* device flags (e.g. allocation hint) */ + u64 flags; /* minimal io size for this device */ u32 sector_size; diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index 55da906c2eac..f9891c94a75e 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -421,8 +421,8 @@ struct btrfs_dev_item { /* minimal io size for this device */ __le32 sector_size; - /* type and info about this device */ - __le64 type; + /* device flags (e.g. 
allocation hint) */ + __le64 flags; /* expected generation for this device */ __le64 generation; From patchwork Fri Dec 17 18:47:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goffredo Baroncelli X-Patchwork-Id: 12685535 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C29BDC433F5 for ; Fri, 17 Dec 2021 18:52:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239838AbhLQSwb (ORCPT ); Fri, 17 Dec 2021 13:52:31 -0500 Received: from santino.mail.tiscali.it ([213.205.33.245]:53124 "EHLO smtp.tiscali.it" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S240526AbhLQSw2 (ORCPT ); Fri, 17 Dec 2021 13:52:28 -0500 Received: from venice.bhome ([78.12.25.242]) by santino.mail.tiscali.it with id XWnP2601f5DQHji01WnTsN; Fri, 17 Dec 2021 18:47:27 +0000 x-auth-user: kreijack@tiscali.it From: Goffredo Baroncelli To: linux-btrfs@vger.kernel.org Cc: Zygo Blaxell , Josef Bacik , David Sterba , Sinnamohideen Shafeeq , Paul Jones , Goffredo Baroncelli Subject: [PATCH 6/6] btrfs: add allocation_hint option. 
Date: Fri, 17 Dec 2021 19:47:22 +0100 Message-Id: <3dddd204abe208d8744913c801f32138f4924f4f.1639766364.git.kreijack@inwind.it> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Reply-To: Goffredo Baroncelli MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tiscali.it; s=smtp; t=1639766847; bh=Ko04ipVyYH9AXE8DoqsF3+Af5SCYy88vvP2e/sp3RrE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Reply-To; b=pS0mW5+PMCS0/1mpO/WNgji/RUpsvvdDZnQwg/CLOrTHoEQvRcGxWJuxaRElzwf2S +HcgyQ9LDq5c5nhmuQbrG9ZWDKGlZ1CO2acJq+e56KM15OtATsd6JhXYWbR5meN87V ba1cuVEGj/bLRBZcwmI1ZcUWAkYbUNmnB9Qhzt10= Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goffredo Baroncelli Add allocation_hint mount option. This option accepts the following values: - 0 (default): the chunks allocator ignores the disk hints - 1: the chunks allocator considers the disk hints Signed-off-by: Goffredo Baroncelli --- fs/btrfs/ctree.h | 14 ++++++++++++++ fs/btrfs/disk-io.c | 2 ++ fs/btrfs/super.c | 17 +++++++++++++++++ fs/btrfs/volumes.c | 9 ++++++--- 4 files changed, 39 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 778c7c807289..bb31cdcaf959 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -620,6 +620,15 @@ enum btrfs_exclusive_operation { BTRFS_EXCLOP_SWAP_ACTIVATE, }; +/* + * allocation_hint mode + */ + +enum btrfs_allocation_hint_modes { + BTRFS_ALLOCATION_HINT_DISABLED, + BTRFS_ALLOCATION_HINT_ENABLED +}; + struct btrfs_fs_info { u8 chunk_tree_uuid[BTRFS_UUID_SIZE]; unsigned long flags; @@ -1021,6 +1030,11 @@ struct btrfs_fs_info { u64 zoned; }; + /* allocation_hint mode */ + int allocation_hint_mode; + + /* Max size to emit ZONE_APPEND write command */ + u64 max_zone_append_size; struct mutex zoned_meta_io_lock; spinlock_t treelog_bg_lock; u64 treelog_bg; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 02ffb8bc7d6b..09d365a689c9 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3160,6 +3160,8 @@ 
void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) spin_lock_init(&fs_info->swapfile_pins_lock); fs_info->swapfile_pins = RB_ROOT; + fs_info->allocation_hint_mode = BTRFS_ALLOCATION_HINT_DISABLED; + fs_info->bg_reclaim_threshold = BTRFS_DEFAULT_RECLAIM_THRESH; INIT_WORK(&fs_info->reclaim_bgs_work, btrfs_reclaim_bgs_work); } diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index a1c54a2c787c..68911152420a 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -373,6 +373,7 @@ enum { Opt_thread_pool, Opt_treelog, Opt_notreelog, Opt_user_subvol_rm_allowed, + Opt_allocation_hint, /* Rescue options */ Opt_rescue, @@ -446,6 +447,7 @@ static const match_table_t tokens = { {Opt_treelog, "treelog"}, {Opt_notreelog, "notreelog"}, {Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"}, + {Opt_allocation_hint, "allocation_hint=%d"}, /* Rescue options */ {Opt_rescue, "rescue=%s"}, @@ -903,6 +905,19 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options, case Opt_user_subvol_rm_allowed: btrfs_set_opt(info->mount_opt, USER_SUBVOL_RM_ALLOWED); break; + case Opt_allocation_hint: + ret = match_int(&args[0], &intarg); + if (ret || (intarg != 1 && intarg != 0)) { + btrfs_err(info, "invalid allocation_hint= parameter\n"); + ret = -EINVAL; + goto out; + } + if (intarg) + btrfs_info(info, "allocation_hint enabled"); + else + btrfs_info(info, "allocation_hint disabled"); + info->allocation_hint_mode = intarg; + break; case Opt_enospc_debug: btrfs_set_opt(info->mount_opt, ENOSPC_DEBUG); break; @@ -1497,6 +1512,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry) seq_puts(seq, ",clear_cache"); if (btrfs_test_opt(info, USER_SUBVOL_RM_ALLOWED)) seq_puts(seq, ",user_subvol_rm_allowed"); + if (info->allocation_hint_mode) + seq_puts(seq, ",allocation_hint=1"); if (btrfs_test_opt(info, ENOSPC_DEBUG)) seq_puts(seq, ",enospc_debug"); if (btrfs_test_opt(info, AUTO_DEFRAG)) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 
9184570c51b0..15302c068008 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5276,10 +5276,13 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices, devices_info[ndevs].total_avail = total_avail; devices_info[ndevs].dev = device; - if ((ctl->type & BTRFS_BLOCK_GROUP_DATA) && - (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) { + if (((ctl->type & BTRFS_BLOCK_GROUP_DATA) && + (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) || + info->allocation_hint_mode == + BTRFS_ALLOCATION_HINT_DISABLED) { /* - * if mixed bg set all the alloc_hint + * if mixed bg or the allocator hint is + * disabled, set all the alloc_hint * fields to the same value, so the sorting * is not affected */