From patchwork Wed Dec 4 08:24:59 2019
X-Patchwork-Id: 11272401
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v5 01/15] btrfs-progs: utils: Introduce queue_param helper function
Date: Wed, 4 Dec 2019 17:24:59 +0900
Message-Id: <20191204082513.857320-2-naohiro.aota@wdc.com>
In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com>
References: <20191204082513.857320-1-naohiro.aota@wdc.com>

Introduce the queue_param() helper function to read a request queue
parameter of a device from sysfs (/sys/block/<disk>/queue/<param>). This
helper will be used later to query information about a zoned device.

Also, rewrite is_ssd() using the helper function.

Signed-off-by: Damien Le Moal
[Naohiro] fixed error return value
Signed-off-by: Naohiro Aota
---
 common/device-utils.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 common/device-utils.h |  1 +
 mkfs/main.c           | 40 ++--------------------------------------
 3 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/common/device-utils.c b/common/device-utils.c
index b03d62faaf21..7fa9386f4677 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -252,3 +252,49 @@ u64 get_partition_size(const char *dev)
 
 	return result;
 }
+/*
+ * Get a device request queue parameter.
+ */
+int queue_param(const char *file, const char *param, char *buf, size_t len)
+{
+	blkid_probe probe;
+	char wholedisk[PATH_MAX];
+	char sysfs_path[PATH_MAX];
+	dev_t devno;
+	int fd;
+	int ret;
+
+	probe = blkid_new_probe_from_filename(file);
+	if (!probe)
+		return 0;
+
+	/* Device number of this disk (possibly a partition) */
+	devno = blkid_probe_get_devno(probe);
+	if (!devno) {
+		blkid_free_probe(probe);
+		return 0;
+	}
+
+	/* Get whole disk name (not full path) for this devno */
+	ret = blkid_devno_to_wholedisk(devno,
+			wholedisk, sizeof(wholedisk), NULL);
+	if (ret) {
+		blkid_free_probe(probe);
+		return 0;
+	}
+
+	snprintf(sysfs_path, PATH_MAX, "/sys/block/%s/queue/%s",
+		 wholedisk, param);
+
+	blkid_free_probe(probe);
+
+	fd = open(sysfs_path, O_RDONLY);
+	if (fd < 0)
+		return 0;
+
+	len = read(fd, buf, len);
+	close(fd);
+
+	return len;
+}
+
diff --git a/common/device-utils.h b/common/device-utils.h
index 70d19cae3e50..d1799323d002 100644
--- a/common/device-utils.h
+++ b/common/device-utils.h
@@ -29,5 +29,6 @@ u64 disk_size(const char *path);
 u64 btrfs_device_size(int fd, struct stat *st);
 int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret,
 		u64 max_block_count, unsigned opflags);
+int queue_param(const char *file, const char *param, char *buf, size_t len);
 
 #endif
diff --git a/mkfs/main.c b/mkfs/main.c
index 316ea82e45c6..14e9ae7aeb6d 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -432,49 +432,13 @@ static int zero_output_file(int out_fd, u64 size)
 
 static int is_ssd(const char *file)
 {
-	blkid_probe probe;
-	char wholedisk[PATH_MAX];
-	char sysfs_path[PATH_MAX];
-	dev_t devno;
-	int fd;
 	char rotational;
 	int ret;
 
-	probe = blkid_new_probe_from_filename(file);
-	if (!probe)
+	ret = queue_param(file, "rotational", &rotational, 1);
+	if (ret < 1)
 		return 0;
 
-	/* Device number of this disk (possibly a partition) */
-	devno = blkid_probe_get_devno(probe);
-	if (!devno) {
-		blkid_free_probe(probe);
-		return 0;
-	}
-
-	/* Get whole disk name (not full path) for this devno */
-	ret = blkid_devno_to_wholedisk(devno,
-			wholedisk, sizeof(wholedisk), NULL);
-	if (ret) {
-		blkid_free_probe(probe);
-		return 0;
-	}
-
-	snprintf(sysfs_path, PATH_MAX, "/sys/block/%s/queue/rotational",
-		 wholedisk);
-
-	blkid_free_probe(probe);
-
-	fd = open(sysfs_path, O_RDONLY);
-	if (fd < 0) {
-		return 0;
-	}
-
-	if (read(fd, &rotational, 1) < 1) {
-		close(fd);
-		return 0;
-	}
-	close(fd);
-
 	return rotational == '0';
 }
 
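As a rough illustration of what queue_param() reads, the sketch below builds the same /sys/block/<disk>/queue/<param> path and reads it without libblkid. It assumes the caller already knows the whole-disk name (the patch resolves it with blkid_devno_to_wholedisk()). The helper names queue_attr_path(), read_queue_attr() and rotational_is_ssd() are hypothetical. Unlike queue_param() in this patch, the sketch NUL-terminates the buffer and strips the trailing newline, so the result can be passed directly to string functions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Build "/sys/block/<disk>/queue/<param>" into path.
 * Returns 0 on success, -1 if the result does not fit.
 */
static int queue_attr_path(const char *disk, const char *param,
			   char *path, size_t size)
{
	int n = snprintf(path, size, "/sys/block/%s/queue/%s", disk, param);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Read a queue attribute of a whole disk (e.g. "sda", not "sda1") into buf
 * as a NUL-terminated string without the trailing newline.
 * Returns the string length, or -1 on error.
 */
static ssize_t read_queue_attr(const char *disk, const char *param,
			       char *buf, size_t size)
{
	char path[256];	/* hypothetical fixed size, ample for sysfs paths */
	ssize_t n;
	int fd;

	assert(buf != NULL);
	if (size < 2 || queue_attr_path(disk, param, path, sizeof(path)))
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, size - 1);	/* keep one byte for the terminator */
	close(fd);
	if (n < 0)
		return -1;

	buf[n] = '\0';
	if (n > 0 && buf[n - 1] == '\n')
		buf[--n] = '\0';
	return n;
}

/* Same decision as is_ssd(): "0" in the rotational attribute means non-rotational. */
static int rotational_is_ssd(const char *value)
{
	return value[0] == '0';
}
```

On a typical system, read_queue_attr("sda", "rotational", buf, sizeof(buf)) would be expected to yield "0" or "1"; a missing disk or attribute simply returns -1.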
From patchwork Wed Dec 4 08:25:00 2019
X-Patchwork-Id: 11272405
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v5 02/15] btrfs-progs: introduce raid parameters variables
Date: Wed, 4 Dec 2019 17:25:00 +0900
Message-Id: <20191204082513.857320-3-naohiro.aota@wdc.com>
In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com>
References: <20191204082513.857320-1-naohiro.aota@wdc.com>

Userland btrfs_alloc_chunk() and its kernel-side counterpart
__btrfs_alloc_chunk() have diverged so much that it is difficult to reuse
the kernel code as is.

This commit introduces some RAID parameter variables and reads them from
btrfs_raid_array, as the kernel side does.
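To show how parameters like these are typically combined when sizing a chunk, here is a hedged sketch modeled loosely on the kernel's chunk allocator; the exact order of checks there may differ. Grounded in the diff: a devs_max of 0 falls back to BTRFS_MAX_DEVS(info), and RAID10 uses 2 sub-stripes and needs at least 4 devices. The struct raid_params, the three table entries, and calc_stripes() are hypothetical; the entry values are assumptions to be checked against btrfs_raid_array.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of the per-profile fields read from btrfs_raid_array[index]. */
struct raid_params {
	int sub_stripes;
	int dev_stripes;	/* stripes per device */
	int devs_max;		/* 0: no fixed limit, use max_devs instead */
	int devs_min;		/* minimum number of devices */
	int devs_increment;	/* ndevs must be a multiple of this */
	int ncopies;		/* copies of each piece of data */
	int nparity;		/* stripes' worth of parity */
};

/* Assumed values for three profiles; verify against btrfs_raid_array. */
static const struct raid_params single_params = { 1, 1, 1, 1, 1, 1, 0 };
static const struct raid_params raid1_params  = { 1, 1, 2, 2, 2, 2, 0 };
static const struct raid_params raid10_params = { 2, 1, 0, 4, 2, 2, 0 };

/*
 * Size a chunk for ndevs usable devices. max_devs stands in for
 * BTRFS_MAX_DEVS(info). Returns 0 and fills in the stripe counts, or
 * -ENOSPC if the profile cannot be satisfied.
 */
static int calc_stripes(const struct raid_params *p, int ndevs, int max_devs,
			int *num_stripes, int *data_stripes)
{
	int devs_max = p->devs_max ? p->devs_max : max_devs;

	assert(p->devs_increment > 0 && p->ncopies > 0);

	ndevs -= ndevs % p->devs_increment;	/* round down to a usable count */
	if (ndevs < p->devs_min)
		return -ENOSPC;
	if (ndevs > devs_max)
		ndevs = devs_max;

	*num_stripes = ndevs * p->dev_stripes;
	*data_stripes = (*num_stripes - p->nparity) / p->ncopies;
	return 0;
}
```

With five devices, for example, RAID10 rounds down to four stripes (two of them data stripes), while RAID1 is capped at two.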
Signed-off-by: Naohiro Aota
---
 volumes.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/volumes.c b/volumes.c
index 143164f02ac0..8bfffa5586eb 100644
--- a/volumes.c
+++ b/volumes.c
@@ -1014,6 +1014,18 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int max_stripes = 0;
 	int min_stripes = 1;
 	int sub_stripes = 1;
+	int dev_stripes __attribute__((unused));
+				/* stripes per dev */
+	int devs_max;		/* max devs to use */
+	int devs_min __attribute__((unused));
+				/* min devs needed */
+	int devs_increment __attribute__((unused));
+				/* ndevs has to be a multiple of this */
+	int ncopies __attribute__((unused));
+				/* how many copies to data has */
+	int nparity __attribute__((unused));
+				/* number of stripes worth of bytes to
+				   store parity information */
 	int looped = 0;
 	int ret;
 	int index;
@@ -1025,6 +1037,18 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		return -ENOSPC;
 	}
 
+	index = btrfs_bg_flags_to_raid_index(type);
+
+	sub_stripes = btrfs_raid_array[index].sub_stripes;
+	dev_stripes = btrfs_raid_array[index].dev_stripes;
+	devs_max = btrfs_raid_array[index].devs_max;
+	if (!devs_max)
+		devs_max = BTRFS_MAX_DEVS(info);
+	devs_min = btrfs_raid_array[index].devs_min;
+	devs_increment = btrfs_raid_array[index].devs_increment;
+	ncopies = btrfs_raid_array[index].ncopies;
+	nparity = btrfs_raid_array[index].nparity;
+
 	if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
 		if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
 			calc_size = SZ_8M;
@@ -1085,7 +1109,6 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		if (num_stripes < 4)
 			return -ENOSPC;
 		num_stripes &= ~(u32)1;
-		sub_stripes = 2;
 		min_stripes = 4;
 	}
 	if (type & (BTRFS_BLOCK_GROUP_RAID5)) {

From patchwork Wed Dec 4 08:25:01 2019
X-Patchwork-Id: 11272411
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v5 03/15] btrfs-progs: build: Check zoned block device support
Date: Wed, 4 Dec 2019 17:25:01 +0900
Message-Id: <20191204082513.857320-4-naohiro.aota@wdc.com>
In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com>
References: <20191204082513.857320-1-naohiro.aota@wdc.com>

If the installed kernel headers support zoned block devices, the file
/usr/include/linux/blkzoned.h is present. Check for this header and, if
it is present, define BTRFS_ZONED to enable the HMZONED feature;
otherwise, disable the feature. Zoned support can also be turned off
explicitly with --disable-zoned.
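With this check, configure enables zoned support by default when linux/blkzoned.h is found, turns it off with --disable-zoned, and fails with an error if zoned support is explicitly requested but the header is missing. Code that needs the zoned UAPI can then be guarded by BTRFS_ZONED, as common/fsfeatures.c and common/hmzoned.c do later in this series. The minimal sketch below shows that guard pattern; zoned_support_built() and check_zoned_request() are hypothetical helpers, not part of btrfs-progs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifdef BTRFS_ZONED
#include <linux/blkzoned.h>	/* struct blk_zone, BLKREPORTZONE, ... */
#endif

/* True when configure defined BTRFS_ZONED for this build. */
static bool zoned_support_built(void)
{
#ifdef BTRFS_ZONED
	return true;
#else
	return false;
#endif
}

/*
 * Hypothetical option check: refuse a request for zoned mode when the
 * tools were built without zoned support.
 */
static int check_zoned_request(bool want_zoned)
{
	if (want_zoned && !zoned_support_built())
		return -EOPNOTSUPP;
	return 0;
}
```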
Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
---
 configure.ac | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/configure.ac b/configure.ac
index cf792eb5488b..c637f72a8fe6 100644
--- a/configure.ac
+++ b/configure.ac
@@ -206,6 +206,18 @@ else
 AC_DEFINE([HAVE_OWN_FIEMAP_EXTENT_SHARED_DEFINE], [0], [We did not define FIEMAP_EXTENT_SHARED])
 fi
 
+AC_CHECK_HEADER(linux/blkzoned.h, [blkzoned_found=yes], [blkzoned_found=no])
+AC_ARG_ENABLE([zoned],
+  AS_HELP_STRING([--disable-zoned], [disable zoned block device support]),
+  [], [enable_zoned=$blkzoned_found]
+)
+
+AS_IF([test "x$enable_zoned" = xyes], [
+	AC_CHECK_HEADER(linux/blkzoned.h, [],
+	[AC_MSG_ERROR([Couldn't find linux/blkzoned.h])])
+	AC_DEFINE([BTRFS_ZONED], [1], [enable zoned block device support])
+])
+
 dnl Define _LIBS= and _CFLAGS= by pkg-config
 dnl
 dnl The default PKG_CHECK_MODULES() action-if-not-found is end the
@@ -307,6 +319,7 @@ AC_MSG_RESULT([
 	btrfs-restore zstd: ${enable_zstd}
 	Python bindings:    ${enable_python}
 	Python interpreter: ${PYTHON}
+	zoned device:       ${enable_zoned}
 
 	Type 'make' to compile.
 ])

From patchwork Wed Dec 4 08:25:02 2019
X-Patchwork-Id: 11272413
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v5 04/15] btrfs-progs: add new HMZONED feature flag
Date: Wed, 4 Dec 2019 17:25:02 +0900
Message-Id: <20191204082513.857320-5-naohiro.aota@wdc.com>
In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com>
References: <20191204082513.857320-1-naohiro.aota@wdc.com>

With this feature enabled, a zoned-block-device-aware btrfs allocates
block groups aligned to the device zones and always writes sequentially
at the write pointer position of sequential zones.

Enabling this feature also disables conversion from ext4 volumes.

Signed-off-by: Naohiro Aota
---
 cmds/inspect-dump-super.c | 1 +
 common/fsfeatures.c       | 8 ++++++++
 common/fsfeatures.h       | 2 +-
 ctree.h                   | 4 +++-
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/cmds/inspect-dump-super.c b/cmds/inspect-dump-super.c
index f22633b99390..ddb2120fb397 100644
--- a/cmds/inspect-dump-super.c
+++ b/cmds/inspect-dump-super.c
@@ -229,6 +229,7 @@ static struct readable_flag_entry incompat_flags_array[] = {
 	DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES),
 	DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID),
 	DEF_INCOMPAT_FLAG_ENTRY(RAID1C34),
+	DEF_INCOMPAT_FLAG_ENTRY(HMZONED)
 };
 static const int incompat_flags_num = sizeof(incompat_flags_array) /
 				      sizeof(struct readable_flag_entry);
diff --git a/common/fsfeatures.c b/common/fsfeatures.c
index ac12d57b25a3..929a076e7b69 100644
--- a/common/fsfeatures.c
+++ b/common/fsfeatures.c
@@ -92,6 +92,14 @@ static const struct btrfs_fs_feature {
 		NULL, 0,
 		NULL, 0,
 		"RAID1 with 3 or 4 copies" },
+#ifdef BTRFS_ZONED
+	{ "hmzoned", BTRFS_FEATURE_INCOMPAT_HMZONED,
+		"hmzoned",
+		NULL, 0,
+		NULL, 0,
+		NULL, 0,
+		"support Host-Managed Zoned devices" },
+#endif
 	/* Keep this one last */
 	{ "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
 };
diff --git a/common/fsfeatures.h b/common/fsfeatures.h
index 3cc9452a3327..0918ee1aa113 100644
--- a/common/fsfeatures.h
+++ b/common/fsfeatures.h
@@ -25,7 +25,7 @@
 	 | BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
 
 /*
- * Avoid multi-device features (RAID56) and mixed block groups
+ * Avoid multi-device features (RAID56), mixed block groups, and hmzoned device
  */
 #define BTRFS_CONVERT_ALLOWED_FEATURES				\
 	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF			\
diff --git a/ctree.h b/ctree.h
index 3e50d0863bde..34fd7d00cabf 100644
--- a/ctree.h
+++ b/ctree.h
@@ -493,6 +493,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID	(1ULL << 10)
 #define BTRFS_FEATURE_INCOMPAT_RAID1C34		(1ULL << 11)
+#define BTRFS_FEATURE_INCOMPAT_HMZONED		(1ULL << 12)
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 
@@ -517,7 +518,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\
 	 BTRFS_FEATURE_INCOMPAT_RAID1C34 |		\
-	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID |		\
+	 BTRFS_FEATURE_INCOMPAT_HMZONED)
 
 /*
  * A leaf is full of items. offset and size tell us where to find
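Because HMZONED is an incompat flag, an implementation that does not list the bit in its supported mask is expected to refuse the filesystem, which is why the patch also adds it to BTRFS_FEATURE_INCOMPAT_SUPP. The sketch below shows that kind of mask check, assuming plain 64-bit masks. The two bit values are copied from ctree.h as changed here; incompat_ok() and fs_is_hmzoned() are hypothetical helpers, not the functions btrfs-progs uses when opening a filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from ctree.h as changed by this patch. */
#define BTRFS_FEATURE_INCOMPAT_RAID1C34	(1ULL << 11)
#define BTRFS_FEATURE_INCOMPAT_HMZONED	(1ULL << 12)

/*
 * A filesystem is acceptable only if every incompat bit set on disk is
 * also present in the implementation's supported mask.
 */
static bool incompat_ok(uint64_t on_disk, uint64_t supported)
{
	return (on_disk & ~supported) == 0;
}

/* Test for the HMZONED bit in a superblock's incompat flags. */
static bool fs_is_hmzoned(uint64_t incompat_flags)
{
	return (incompat_flags & BTRFS_FEATURE_INCOMPAT_HMZONED) != 0;
}
```

An older supported mask that lacks bit 12 therefore rejects an HMZONED filesystem, while the updated mask accepts it.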
From patchwork Wed Dec 4 08:25:03 2019
X-Patchwork-Id: 11272417
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v5 05/15] btrfs-progs: Introduce zone block device helper functions
Date: Wed, 4 Dec 2019 17:25:03 +0900
Message-Id: <20191204082513.857320-6-naohiro.aota@wdc.com>
In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com>
References: <20191204082513.857320-1-naohiro.aota@wdc.com>

This patch introduces several zone-related functions:
btrfs_get_zone_info() to get zone information from the specified device
and store it in zinfo, and zone_is_sequential() to check whether a zone
is a sequential-write-required zone. It also adds zoned_model() and
zone_size() to read the zone model and zone size of a device from sysfs.

btrfs_get_zone_info() intentionally works with "struct
btrfs_zoned_device_info" instead of "struct btrfs_device": the zone
information must be loaded in btrfs_prepare_device(), where no
"struct btrfs_device" exists yet.
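The arithmetic behind zone_size(), the zone count computed in report_zones(), and the lookup in zone_is_sequential() can be checked in isolation. The sketch below assumes the sysfs chunk_sectors value has already been read as a string. The helper names are hypothetical, and the enum mirrors the BLK_ZONE_TYPE_* values from <linux/blkzoned.h> under the assumption that they are 1, 2 and 3. Unlike zone_is_sequential() in the patch, the lookup here asserts that the zone index is in range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for BLK_ZONE_TYPE_* from <linux/blkzoned.h>; values assumed to match. */
enum zone_type {
	ZONE_TYPE_CONVENTIONAL = 0x1,
	ZONE_TYPE_SEQWRITE_REQ = 0x2,
	ZONE_TYPE_SEQWRITE_PREF = 0x3,
};

/* sysfs chunk_sectors is in 512-byte sectors; convert it to bytes. */
static uint64_t zone_bytes_from_chunk_sectors(const char *chunk_sectors)
{
	return (uint64_t)strtoull(chunk_sectors, NULL, 10) << 9;
}

static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Zones needed to cover block_count bytes; a partial last zone counts. */
static uint64_t count_zones(uint64_t block_count, uint64_t zone_bytes)
{
	return (block_count + zone_bytes - 1) / zone_bytes;
}

/* Find the zone holding bytenr and test whether it is sequential-write-required. */
static bool bytenr_in_seq_zone(const enum zone_type *types, uint64_t nr_zones,
			       uint64_t zone_bytes, uint64_t bytenr)
{
	uint64_t zno = bytenr / zone_bytes;

	assert(zno < nr_zones);
	return types[zno] == ZONE_TYPE_SEQWRITE_REQ;
}
```

For example, a chunk_sectors value of 524288 corresponds to 256 MiB zones, and count_zones() rounds up exactly as the patch does by adding one zone when block_count is not a multiple of the zone size.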
Signed-off-by: Naohiro Aota
---
 Makefile         |   3 +-
 common/hmzoned.c | 219 +++++++++++++++++++++++++++++++++++++++++++++++
 common/hmzoned.h |  63 ++++++++++++++
 3 files changed, 284 insertions(+), 1 deletion(-)
 create mode 100644 common/hmzoned.c
 create mode 100644 common/hmzoned.h

diff --git a/Makefile b/Makefile
index b00eafe44a8d..a67bf7ce7833 100644
--- a/Makefile
+++ b/Makefile
@@ -146,7 +146,8 @@ objects = dir-item.o inode-map.o \
 	  inode.o file.o find-root.o common/help.o send-dump.o \
 	  common/fsfeatures.o \
 	  common/format-output.o \
-	  common/device-utils.o
+	  common/device-utils.o \
+	  common/hmzoned.o
 cmds_objects = cmds/subvolume.o cmds/filesystem.o cmds/device.o cmds/scrub.o \
 	       cmds/inspect.o cmds/balance.o cmds/send.o cmds/receive.o \
 	       cmds/quota.o cmds/qgroup.o cmds/replace.o check/main.o \
diff --git a/common/hmzoned.c b/common/hmzoned.c
new file mode 100644
index 000000000000..e11e56210709
--- /dev/null
+++ b/common/hmzoned.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ * Authors:
+ *	Naohiro Aota
+ *	Damien Le Moal
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include
+
+#include "common/utils.h"
+#include "common/device-utils.h"
+#include "common/messages.h"
+#include "mkfs/common.h"
+#include "common/hmzoned.h"
+
+#define BTRFS_REPORT_NR_ZONES	8192
+
+enum btrfs_zoned_model zoned_model(const char *file)
+{
+	char model[32];
+	int ret;
+
+	ret = queue_param(file, "zoned", model, sizeof(model));
+	if (ret <= 0)
+		return ZONED_NONE;
+
+	if (strncmp(model, "host-aware", 10) == 0)
+		return ZONED_HOST_AWARE;
+	if (strncmp(model, "host-managed", 12) == 0)
+		return ZONED_HOST_MANAGED;
+
+	return ZONED_NONE;
+}
+
+size_t zone_size(const char *file)
+{
+	char chunk[32];
+	int ret;
+
+	ret = queue_param(file, "chunk_sectors", chunk, sizeof(chunk));
+	if (ret <= 0)
+		return 0;
+
+	return strtoul((const char *)chunk, NULL, 10) << 9;
+}
+
+#ifdef BTRFS_ZONED
+bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr)
+{
+	unsigned int zno;
+
+	if (!zinfo || zinfo->model == ZONED_NONE)
+		return false;
+
+	zno = bytenr / zinfo->zone_size;
+
+	/*
+	 * Only sequential write required zones on host-managed
+	 * devices cannot be written randomly.
+	 */
+	return zinfo->zones[zno].type == BLK_ZONE_TYPE_SEQWRITE_REQ;
+}
+
+static int report_zones(int fd, const char *file, u64 block_count,
+			struct btrfs_zoned_device_info *zinfo)
+{
+	size_t zone_bytes = zone_size(file);
+	size_t rep_size;
+	u64 sector = 0;
+	struct blk_zone_report *rep;
+	struct blk_zone *zone;
+	unsigned int i, n = 0;
+	int ret;
+
+	/*
+	 * Zones are guaranteed (by the kernel) to be a power of 2 number of
+	 * sectors. Check this here and make sure that zones are not too
+	 * small.
+ */ + if (!zone_bytes || !is_power_of_2(zone_bytes)) { + error("Illegal zone size %zu (not a power of 2)", zone_bytes); + exit(1); + } + if (zone_bytes < BTRFS_MKFS_SYSTEM_GROUP_SIZE) { + error("Illegal zone size %zu (smaller than %d)", zone_bytes, + BTRFS_MKFS_SYSTEM_GROUP_SIZE); + exit(1); + } + + /* Allocate the zone information array */ + zinfo->zone_size = zone_bytes; + zinfo->nr_zones = block_count / zone_bytes; + if (block_count & (zone_bytes - 1)) + zinfo->nr_zones++; + zinfo->zones = calloc(zinfo->nr_zones, sizeof(struct blk_zone)); + if (!zinfo->zones) { + error("No memory for zone information"); + exit(1); + } + + /* Allocate a zone report */ + rep_size = sizeof(struct blk_zone_report) + + sizeof(struct blk_zone) * BTRFS_REPORT_NR_ZONES; + rep = malloc(rep_size); + if (!rep) { + error("No memory for zones report"); + exit(1); + } + + /* Get zone information */ + zone = (struct blk_zone *)(rep + 1); + while (n < zinfo->nr_zones) { + memset(rep, 0, rep_size); + rep->sector = sector; + rep->nr_zones = BTRFS_REPORT_NR_ZONES; + + ret = ioctl(fd, BLKREPORTZONE, rep); + if (ret != 0) { + error("ioctl BLKREPORTZONE failed (%s)", + strerror(errno)); + exit(1); + } + + if (!rep->nr_zones) + break; + + for (i = 0; i < rep->nr_zones; i++) { + if (n >= zinfo->nr_zones) + break; + memcpy(&zinfo->zones[n], &zone[i], + sizeof(struct blk_zone)); + n++; + } + + sector = zone[rep->nr_zones - 1].start + + zone[rep->nr_zones - 1].len; + } + + free(rep); + + return 0; +} + +#endif + +int btrfs_get_zone_info(int fd, const char *file, bool hmzoned, + struct btrfs_zoned_device_info **zinfo_ret) +{ +#ifdef BTRFS_ZONED + struct btrfs_zoned_device_info *zinfo; +#endif + struct stat st; + enum btrfs_zoned_model model; + int ret; + + *zinfo_ret = NULL; + + ret = fstat(fd, &st); + if (ret < 0) { + error("unable to stat %s", file); + return -ENOENT; + } + + if (!S_ISBLK(st.st_mode)) + return 0; + + /* Check zone model */ + model = zoned_model(file); + if (model == ZONED_NONE) + return 
0; + + if (model == ZONED_HOST_MANAGED && !hmzoned) { + error( +"%s: host-managed zoned block device (enable zone block device support with -O hmzoned)", + file); + return -EIO; + } + + /* Treat host-aware devices as regular devices */ + if (!hmzoned) + return 0; + +#ifdef BTRFS_ZONED + zinfo = malloc(sizeof(*zinfo)); + if (!zinfo) { + error("No memory for zone information"); + exit(1); + } + + memset(zinfo, 0, sizeof(struct btrfs_zoned_device_info)); + zinfo->model = model; + + /* Get zone information */ + ret = report_zones(fd, file, btrfs_device_size(fd, &st), zinfo); + if (ret != 0) { + kfree(zinfo); + return ret; + } + *zinfo_ret = zinfo; +#else + error("%s: Unsupported host-%s zoned block device", file, + model == ZONED_HOST_MANAGED ? "managed" : "aware"); + if (model == ZONED_HOST_MANAGED) + return -EOPNOTSUPP; + + error("%s: handling host-aware block device as a regular disk", file); +#endif + return 0; +} diff --git a/common/hmzoned.h b/common/hmzoned.h new file mode 100644 index 000000000000..098952061bfb --- /dev/null +++ b/common/hmzoned.h @@ -0,0 +1,63 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2019 Western Digital Corporation or its affiliates. + * Authors: + * Naohiro Aota + * Damien Le Moal + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ + +#ifndef __BTRFS_HMZONED_H__ +#define __BTRFS_HMZONED_H__ + +#ifdef BTRFS_ZONED +#include +#else +struct blk_zone { + int dummy; +}; +#endif /* BTRFS_ZONED */ + +/* + * Zoned block device models. 
+ */ +enum btrfs_zoned_model { + ZONED_NONE = 0, + ZONED_HOST_AWARE, + ZONED_HOST_MANAGED, +}; + +/* + * Zone information for a zoned block device. + */ +struct btrfs_zoned_device_info { + enum btrfs_zoned_model model; + u64 zone_size; + u32 nr_zones; + struct blk_zone *zones; +}; + +enum btrfs_zoned_model zoned_model(const char *file); +size_t zone_size(const char *file); +int btrfs_get_zone_info(int fd, const char *file, bool hmzoned, + struct btrfs_zoned_device_info **zinfo); + +#ifdef BTRFS_ZONED +bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr); +#else +static inline bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, + u64 bytenr) +{ + return true; +} +#endif /* BTRFS_ZONED */ + +#endif /* __BTRFS_HMZONED_H__ */ From patchwork Wed Dec 4 08:25:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11272421 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E368C6C1 for ; Wed, 4 Dec 2019 08:27:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C177E2073C for ; Wed, 4 Dec 2019 08:27:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="L2DQ43c+" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727377AbfLDI1n (ORCPT ); Wed, 4 Dec 2019 03:27:43 -0500 Received: from esa2.hgst.iphmx.com ([68.232.143.124]:1552 "EHLO esa2.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727340AbfLDI1m (ORCPT ); Wed, 4 Dec 2019 03:27:42 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1575448063; x=1606984063; 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Johannes Thumshirn , Hannes Reinecke , Anand Jain , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v5 06/15] btrfs-progs: load and check zone information Date: Wed, 4 Dec 2019 17:25:04 +0900 Message-Id: <20191204082513.857320-7-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com> References: <20191204082513.857320-1-naohiro.aota@wdc.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

This patch checks whether a device added to btrfs is a zoned block device. If it is, load its zone information and zone size. For a btrfs volume composed of multiple zoned block devices, all devices must have the same zone size.

Signed-off-by: Naohiro Aota --- common/device-scan.c | 15 +++++++++++++++ common/hmzoned.h | 2 ++ volumes.c | 31 +++++++++++++++++++++++++++++++ volumes.h | 4 ++++ 4 files changed, 52 insertions(+) diff --git a/common/device-scan.c b/common/device-scan.c index 48dbd9e19715..548e1322bb70 100644 --- a/common/device-scan.c +++ b/common/device-scan.c @@ -29,6 +29,7 @@ #include "kernel-lib/overflow.h" #include "common/path-utils.h" #include "common/device-scan.h" +#include "common/hmzoned.h" #include "common/messages.h" #include "common/utils.h" #include "common-defs.h" @@ -137,6 +138,19 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans, goto out; } + ret = btrfs_get_zone_info(fd, path, fs_info->fs_devices->hmzoned, + &device->zone_info); + if (ret) + goto out; + if (fs_info->fs_devices->hmzoned) { + if (device->zone_info->zone_size != + fs_info->fs_devices->zone_size) { + error("Device zone size differ"); + ret = -EINVAL; + goto out; + } + } + disk_super = (struct btrfs_super_block *)buf; dev_item = 
&disk_super->dev_item; @@ -197,6 +211,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans, return 0; out: + free(device->zone_info); free(device); free(buf); return ret; diff --git a/common/hmzoned.h b/common/hmzoned.h index 098952061bfb..d229b946e5ed 100644 --- a/common/hmzoned.h +++ b/common/hmzoned.h @@ -18,6 +18,8 @@ #ifndef __BTRFS_HMZONED_H__ #define __BTRFS_HMZONED_H__ +#include + #ifdef BTRFS_ZONED #include #else diff --git a/volumes.c b/volumes.c index 8bfffa5586eb..d92052e19330 100644 --- a/volumes.c +++ b/volumes.c @@ -27,6 +27,7 @@ #include "transaction.h" #include "print-tree.h" #include "volumes.h" +#include "common/hmzoned.h" #include "common/utils.h" #include "kernel-lib/raid56.h" @@ -214,6 +215,8 @@ static int device_list_add(const char *path, u64 found_transid = btrfs_super_generation(disk_super); bool metadata_uuid = (btrfs_super_incompat_flags(disk_super) & BTRFS_FEATURE_INCOMPAT_METADATA_UUID); + bool hmzoned = btrfs_super_incompat_flags(disk_super) & + BTRFS_FEATURE_INCOMPAT_HMZONED; if (metadata_uuid) fs_devices = find_fsid(disk_super->fsid, @@ -238,8 +241,18 @@ static int device_list_add(const char *path, fs_devices->latest_devid = devid; fs_devices->latest_trans = found_transid; fs_devices->lowest_devid = (u64)-1; + fs_devices->hmzoned = hmzoned; device = NULL; } else { + if (fs_devices->hmzoned != hmzoned) { + if (hmzoned) + error( + "Cannot add HMZONED device to non-HMZONED file system"); + else + error( + "Cannot add non-HMZONED device to HMZONED file system"); + return -EINVAL; + } device = find_device(fs_devices, devid, disk_super->dev_item.uuid); } @@ -335,6 +348,7 @@ again: /* free the memory */ free(device->name); free(device->label); + free(device->zone_info); free(device); } @@ -373,6 +387,8 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, int flags) struct btrfs_device *device; int ret; + fs_devices->zone_size = 0; + list_for_each_entry(device, &fs_devices->devices, dev_list) { if (!device->name) { 
printk("no name for device %llu, skip it now\n", device->devid); @@ -396,6 +412,21 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, int flags) device->fd = fd; if (flags & O_RDWR) device->writeable = 1; + + ret = btrfs_get_zone_info(fd, device->name, fs_devices->hmzoned, + &device->zone_info); + if (ret != 0) + goto fail; + if (!device->zone_info) + continue; + if (!fs_devices->zone_size) { + fs_devices->zone_size = device->zone_info->zone_size; + } else if (device->zone_info->zone_size != + fs_devices->zone_size) { + error("Device zone size differ"); + ret = -EINVAL; + goto fail; + } } return 0; fail: diff --git a/volumes.h b/volumes.h index 41574f21dd23..d52dbcba0410 100644 --- a/volumes.h +++ b/volumes.h @@ -28,6 +28,7 @@ struct btrfs_device { struct list_head dev_list; struct btrfs_root *dev_root; struct btrfs_fs_devices *fs_devices; + struct btrfs_zoned_device_info *zone_info; u64 total_ios; @@ -87,6 +88,9 @@ struct btrfs_fs_devices { int seeding; struct btrfs_fs_devices *seed; + + u64 zone_size; + bool hmzoned; }; struct btrfs_bio_stripe { From patchwork Wed Dec 4 08:25:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11272425 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 32CAA14B7 for ; Wed, 4 Dec 2019 08:27:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 105CF2073B for ; Wed, 4 Dec 2019 08:27:46 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="Pn5LAFgK" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727388AbfLDI1p (ORCPT ); Wed, 4 Dec 2019 03:27:45 -0500 Received: from esa2.hgst.iphmx.com ([68.232.143.124]:1552 
"EHLO esa2.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726599AbfLDI1o (ORCPT ); Wed, 4 Dec 2019 03:27:44 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1575448066; x=1606984066; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=G4NpQ2JyYnw5D/LgV6JIxZP9vvkUA4GtujvP4ktMxWo=; b=Pn5LAFgKZZHCMEcheOWe4KwU7E6FD0O4+0O93Z6iYGYfd+kPXzqtJxJk as2OXq8jY+tv5GeUcVHbythhn6KpR0Y3J9jI9qv1GYS5Um+aX9fHjQZGJ +E2hNVoalMOKhhW1knJfXu1j6PCOSW63ppP1VBSlv95N+aLDM27Kko3i5 y2CSlI18cUEnryZV2GKUNbq/DB9XwtqKbF5MZBMWAuN98bq3oJjc7EG5g wSII4mSPApusWwlhqfbDx9ApEZN9ZzX+F4vb+zpClqDEsqyqemfgXJR3h elbHeqApv5jzsjOqmVBOFTEZGUxQoR9QGB+AVO6cqDUIMzSDql5SWyV9A w==; IronPort-SDR: LwZp0I7m1Uzk74Pmqr45pkeoa+V4quP//SwIakFrYR1c2EfFoQlyFvZCqeSxWdaECLI/OijoY/ 5MSfj1el3y+S9SWr/NrWxMiFJvKwjUoiWoj++v/HGjnXAcURjMqgZdBhA+xtrcWezEVVVBJ4ju jWUgjLYsGSH/zFb6fQionConr3GVDbwmfO7q3yJYmOGaajTXjxcXkJPm40sZxAEJ4SxVPzOX6u avjGz89hZHf8hdV+c1auTsnswVd2uFSX8on2028vjCHWxwKzKPVrvEU3QRreUuwEeEhJP7iEiV dHY= X-IronPort-AV: E=Sophos;i="5.69,276,1571673600"; d="scan'208";a="226031745" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 04 Dec 2019 16:27:46 +0800 IronPort-SDR: bA9O/UAHB9YEaUXHhh8FuJTzgxtnQwv7sKvdT32e74Cl6G5yCeKYb+y41qTpqWiPR/iyJYqG5Z CxEuTzUY769GqE+3BDkmmX9k6p60plPw1eopjJtffyIPnzFG8Br/oPYVhQZIf4tsC126LXMjCN KHYezjgVrWzIXD6UZT/+75OXAqECcTLnT8wktkq0vuGBp87pkFrkqa/EdD22H4p5tx8JKhNSh5 DFt82zm86Jzs5vnlEGfIvDktJy7d+CpmKaJcMRn0PUU0V/VxF8rKPkPfISDC83Udk4+lQ2soeX 9I1uObbE9dhpvXNndGMjxhB4 Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Dec 2019 00:22:31 -0800 IronPort-SDR: q9vZ9QIF3fcyKL17H7ImZOzp/G5J8ItwSmbBGj6Baq2cp2idWCPr7RlB5GdHzt7wkjzhXl7mtn k4qBEdKsMqxY5DScXpOswZiToafwRiFQXF00HclbNNJD2C+ajC/v92Aau316U/HrLdOr61LJht 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Johannes Thumshirn , Hannes Reinecke , Anand Jain , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v5 07/15] btrfs-progs: support discarding zoned device Date: Wed, 4 Dec 2019 17:25:05 +0900 Message-Id: <20191204082513.857320-8-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com> References: <20191204082513.857320-1-naohiro.aota@wdc.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

All zones of a zoned block device should be reset before the device is written. Support this by introducing the PREP_DEVICE_HMZONED flag for btrfs_prepare_device(). This commit also exports discard_blocks() and uses it from btrfs_discard_all_zones() to discard conventional zones.
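To make the per-zone decision concrete, here is a minimal, self-contained C sketch of the policy that btrfs_discard_all_zones() applies: conventional zones are discarded, non-empty sequential zones are reset, and empty sequential zones are left alone. The sketch issues no ioctls. The sketch_zone type, the SKETCH_* constants and the helper names are hypothetical stand-ins for struct blk_zone and the BLK_ZONE_TYPE_*/BLK_ZONE_COND_* values, chosen so that it builds without a zoned device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for struct blk_zone and the BLK_ZONE_TYPE_* /
 * BLK_ZONE_COND_* constants from <linux/blkzoned.h>.
 */
enum sketch_zone_type { SKETCH_TYPE_CONVENTIONAL, SKETCH_TYPE_SEQWRITE_REQ };
enum sketch_zone_cond { SKETCH_COND_EMPTY, SKETCH_COND_OPEN, SKETCH_COND_FULL };

struct sketch_zone {
	enum sketch_zone_type type;
	enum sketch_zone_cond cond;
	uint64_t start;		/* zone start in 512-byte sectors */
};

enum zone_action { ZONE_SKIP, ZONE_DISCARD, ZONE_RESET };

/* Per-zone choice, mirroring the branches in btrfs_discard_all_zones(). */
static enum zone_action zone_prep_action(const struct sketch_zone *z)
{
	if (z->type == SKETCH_TYPE_CONVENTIONAL)
		return ZONE_DISCARD;	/* plain discard of the whole zone */
	if (z->cond != SKETCH_COND_EMPTY)
		return ZONE_RESET;	/* rewind the write pointer */
	return ZONE_SKIP;		/* empty sequential zone: nothing to do */
}

/* Byte offset of a zone, as passed to discard_blocks() (start << 9). */
static uint64_t zone_start_bytes(const struct sketch_zone *z)
{
	return z->start << 9;
}

/* Number of zone resets a full device preparation would issue. */
static size_t count_zone_resets(const struct sketch_zone *zones, size_t nr)
{
	size_t i, resets = 0;

	assert(zones != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (zone_prep_action(&zones[i]) == ZONE_RESET)
			resets++;
	return resets;
}
```

Zone positions are reported in 512-byte sectors, which is why the patch passes start << 9 to discard_blocks() and zone_size >> 9 as the sector count in struct blk_zone_range.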
Signed-off-by: Naohiro Aota --- common/device-utils.c | 32 ++++++++++++++++++++++++++++++-- common/device-utils.h | 2 ++ common/hmzoned.c | 29 +++++++++++++++++++++++++++++ common/hmzoned.h | 6 ++++++ 4 files changed, 67 insertions(+), 2 deletions(-) diff --git a/common/device-utils.c b/common/device-utils.c index 7fa9386f4677..2689f157aeea 100644 --- a/common/device-utils.c +++ b/common/device-utils.c @@ -29,6 +29,7 @@ #include "common/internal.h" #include "common/messages.h" #include "common/utils.h" +#include "common/hmzoned.h" #ifndef BLKDISCARD #define BLKDISCARD _IO(0x12,119) @@ -49,7 +50,7 @@ static int discard_range(int fd, u64 start, u64 len) /* * Discard blocks in the given range in 1G chunks, the process is interruptible */ -static int discard_blocks(int fd, u64 start, u64 len) +int discard_blocks(int fd, u64 start, u64 len) { while (len > 0) { /* 1G granularity */ @@ -155,6 +156,7 @@ out: int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, u64 max_block_count, unsigned opflags) { + struct btrfs_zoned_device_info *zinfo = NULL; u64 block_count; struct stat st; int i, ret; @@ -173,7 +175,30 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, if (max_block_count) block_count = min(block_count, max_block_count); - if (opflags & PREP_DEVICE_DISCARD) { + ret = btrfs_get_zone_info(fd, file, opflags & PREP_DEVICE_HMZONED, + &zinfo); + if (ret < 0) + return 1; + + if (opflags & PREP_DEVICE_HMZONED) { + if (!zinfo) { + error("unable to load zone information of %s", file); + return 1; + } + if (opflags & PREP_DEVICE_VERBOSE) + printf("Resetting device zones %s (%u zones) ...\n", + file, zinfo->nr_zones); + /* + * We cannot ignore zone discard (reset) errors for a zoned + * block device as this could result in the inability to + * write to non-empty sequential zones of the device. 
+ */ + if (btrfs_discard_all_zones(fd, zinfo)) { + error("failed to reset device '%s' zones", file); + kfree(zinfo); + return 1; + } + } else if (opflags & PREP_DEVICE_DISCARD) { /* * We intentionally ignore errors from the discard ioctl. It * is not necessary for the mkfs functionality but just an @@ -198,6 +223,7 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, if (ret < 0) { errno = -ret; error("failed to zero device '%s': %m", file); + kfree(zinfo); return 1; } @@ -207,6 +233,8 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, return 1; } + kfree(zinfo); + *block_count_ret = block_count; return 0; } diff --git a/common/device-utils.h b/common/device-utils.h index d1799323d002..885a46937e0d 100644 --- a/common/device-utils.h +++ b/common/device-utils.h @@ -23,7 +23,9 @@ #define PREP_DEVICE_ZERO_END (1U << 0) #define PREP_DEVICE_DISCARD (1U << 1) #define PREP_DEVICE_VERBOSE (1U << 2) +#define PREP_DEVICE_HMZONED (1U << 3) +int discard_blocks(int fd, u64 start, u64 len); u64 get_partition_size(const char *dev); u64 disk_size(const char *path); u64 btrfs_device_size(int fd, struct stat *st); diff --git a/common/hmzoned.c b/common/hmzoned.c index e11e56210709..5803b2c17a2b 100644 --- a/common/hmzoned.c +++ b/common/hmzoned.c @@ -16,6 +16,7 @@ */ #include +#include #include "common/utils.h" #include "common/device-utils.h" @@ -151,6 +152,34 @@ static int report_zones(int fd, const char *file, u64 block_count, return 0; } +/* + * Discard blocks in the zones of a zoned block device. Process this + * with zone size granularity so that blocks in conventional zones are + * discarded using discard_blocks() and blocks in sequential zones are + * discarded through a zone reset.
+ */ +int btrfs_discard_all_zones(int fd, struct btrfs_zoned_device_info *zinfo) +{ + unsigned int i; + + ASSERT(zinfo); + + /* Zone size granularity */ + for (i = 0; i < zinfo->nr_zones; i++) { + if (zinfo->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL) { + discard_blocks(fd, zinfo->zones[i].start << 9, + zinfo->zone_size); + } else if (zinfo->zones[i].cond != BLK_ZONE_COND_EMPTY) { + struct blk_zone_range range = { + zinfo->zones[i].start, + zinfo->zone_size >> 9 }; + if (ioctl(fd, BLKRESETZONE, &range) < 0) + return errno; + } + } + return fsync(fd); +} + #endif int btrfs_get_zone_info(int fd, const char *file, bool hmzoned, diff --git a/common/hmzoned.h b/common/hmzoned.h index d229b946e5ed..631780537a77 100644 --- a/common/hmzoned.h +++ b/common/hmzoned.h @@ -54,12 +54,18 @@ int btrfs_get_zone_info(int fd, const char *file, bool hmzoned, #ifdef BTRFS_ZONED bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr); +int btrfs_discard_all_zones(int fd, struct btrfs_zoned_device_info *zinfo); #else static inline bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr) { return true; } +static inline int btrfs_discard_all_zones(int fd, + struct btrfs_zoned_device_info *zinfo) +{ + return -EOPNOTSUPP; +} #endif /* BTRFS_ZONED */ #endif /* __BTRFS_HMZONED_H__ */ From patchwork Wed Dec 4 08:25:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11272429 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EEA6B6C1 for ; Wed, 4 Dec 2019 08:27:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CCCE82073B for ; Wed, 4 Dec 2019 08:27:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Johannes Thumshirn , Hannes Reinecke , Anand Jain , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v5 08/15] btrfs-progs: support zero out on zoned block device Date: Wed, 4 Dec 2019 17:25:06 +0900 Message-Id: <20191204082513.857320-9-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com> References: <20191204082513.857320-1-naohiro.aota@wdc.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

If we zero out a region in a sequential write required zone, we cannot write to that region again until the zone is reset. Thus, we must not zero out any part of a sequential write required zone. zero_dev_clamped() now takes the zone information and, if the device is host-managed, calls zero_zone_blocks() instead of zero_blocks() so that it never writes to sequential write required zones.
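A minimal C sketch of the splitting that zero_zone_blocks() performs: the byte range is cut at zone boundaries, and only chunks that fall in non-sequential zones are handed to the zeroing routine. The zone size is assumed to be a power of 2, as report_zones() enforces. The sketch_dev type, the zero_fn callback and sum_zeroed() are hypothetical. The callback stands in for zero_blocks() so the logic can be exercised without touching a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical device model: zone_is_seq[i] is true when zone i is a
 * sequential write required zone. zone_size must be a power of 2.
 */
struct sketch_dev {
	uint64_t zone_size;
	const bool *zone_is_seq;
	size_t nr_zones;
};

/* Stand-in for zero_blocks(): zero [ofs, ofs + len), return 0 on success. */
typedef int (*zero_fn)(void *ctx, uint64_t ofs, uint64_t len);

/* Zero [start, start + len) one zone at a time, skipping sequential zones. */
static int zero_skipping_seq_zones(const struct sketch_dev *dev,
				   uint64_t start, uint64_t len,
				   zero_fn zero, void *ctx)
{
	uint64_t ofs = start;

	assert(dev->zone_size && !(dev->zone_size & (dev->zone_size - 1)));
	while (len > 0) {
		/* Bytes left before the end of the zone containing ofs */
		uint64_t room = dev->zone_size - (ofs & (dev->zone_size - 1));
		uint64_t count = len < room ? len : room;
		uint64_t zno = ofs / dev->zone_size;

		/* Defensive bounds check; the patch relies on prior clamping */
		if (zno < dev->nr_zones && !dev->zone_is_seq[zno]) {
			int ret = zero(ctx, ofs, count);

			if (ret != 0)
				return ret;
		}
		len -= count;
		ofs += count;
	}
	return 0;
}

/* Dry-run callback: only totals the bytes that would be zeroed. */
static int sum_zeroed(void *ctx, uint64_t ofs, uint64_t len)
{
	(void)ofs;
	*(uint64_t *)ctx += len;
	return 0;
}
```

The bounds check on zno is an addition of this sketch. In the patch, zero_dev_clamped() first clamps the range to the device size, so the zone index stays within the reported zones. That path is also only taken for host-managed devices; other devices still go through zero_blocks() directly.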
Signed-off-by: Naohiro Aota --- common/device-utils.c | 14 +++++++++----- common/device-utils.h | 1 + common/hmzoned.c | 28 ++++++++++++++++++++++++++++ common/hmzoned.h | 8 ++++++++ 4 files changed, 46 insertions(+), 5 deletions(-) diff --git a/common/device-utils.c b/common/device-utils.c index 2689f157aeea..2ac8e7d9802a 100644 --- a/common/device-utils.c +++ b/common/device-utils.c @@ -67,7 +67,7 @@ int discard_blocks(int fd, u64 start, u64 len) return 0; } -static int zero_blocks(int fd, off_t start, size_t len) +int zero_blocks(int fd, off_t start, size_t len) { char *buf = malloc(len); int ret = 0; @@ -86,7 +86,8 @@ static int zero_blocks(int fd, off_t start, size_t len) #define ZERO_DEV_BYTES SZ_2M /* don't write outside the device by clamping the region to the device size */ -static int zero_dev_clamped(int fd, off_t start, ssize_t len, u64 dev_size) +static int zero_dev_clamped(int fd, struct btrfs_zoned_device_info *zinfo, + off_t start, ssize_t len, u64 dev_size) { off_t end = max(start, start + len); @@ -99,6 +100,9 @@ static int zero_dev_clamped(int fd, off_t start, ssize_t len, u64 dev_size) start = min_t(u64, start, dev_size); end = min_t(u64, end, dev_size); + if (zinfo && zinfo->model == ZONED_HOST_MANAGED) + return zero_zone_blocks(fd, zinfo, start, end - start); + return zero_blocks(fd, start, end - start); } @@ -212,12 +216,12 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, } } - ret = zero_dev_clamped(fd, 0, ZERO_DEV_BYTES, block_count); + ret = zero_dev_clamped(fd, zinfo, 0, ZERO_DEV_BYTES, block_count); for (i = 0 ; !ret && i < BTRFS_SUPER_MIRROR_MAX; i++) - ret = zero_dev_clamped(fd, btrfs_sb_offset(i), + ret = zero_dev_clamped(fd, zinfo, btrfs_sb_offset(i), BTRFS_SUPER_INFO_SIZE, block_count); if (!ret && (opflags & PREP_DEVICE_ZERO_END)) - ret = zero_dev_clamped(fd, block_count - ZERO_DEV_BYTES, + ret = zero_dev_clamped(fd, zinfo, block_count - ZERO_DEV_BYTES, ZERO_DEV_BYTES, block_count); if (ret < 0) { diff 
--git a/common/device-utils.h b/common/device-utils.h index 885a46937e0d..7d5b622b8957 100644 --- a/common/device-utils.h +++ b/common/device-utils.h @@ -26,6 +26,7 @@ #define PREP_DEVICE_HMZONED (1U << 3) int discard_blocks(int fd, u64 start, u64 len); +int zero_blocks(int fd, off_t start, size_t len); u64 get_partition_size(const char *dev); u64 disk_size(const char *path); u64 btrfs_device_size(int fd, struct stat *st); diff --git a/common/hmzoned.c b/common/hmzoned.c index 5803b2c17a2b..484877743948 100644 --- a/common/hmzoned.c +++ b/common/hmzoned.c @@ -180,6 +180,34 @@ int btrfs_discard_all_zones(int fd, struct btrfs_zoned_device_info *zinfo) return fsync(fd); } +int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start, + size_t len) +{ + size_t zone_len = zinfo->zone_size; + off_t ofst = start; + size_t count; + int ret; + + /* Make sure that zero_blocks does not write sequential zones */ + while (len > 0) { + /* Limit zero_blocks to a single zone */ + count = min_t(size_t, len, zone_len); + if (count > zone_len - (ofst & (zone_len - 1))) + count = zone_len - (ofst & (zone_len - 1)); + + if (!zone_is_sequential(zinfo, ofst)) { + ret = zero_blocks(fd, ofst, count); + if (ret != 0) + return ret; + } + + len -= count; + ofst += count; + } + + return 0; +} + #endif int btrfs_get_zone_info(int fd, const char *file, bool hmzoned, diff --git a/common/hmzoned.h b/common/hmzoned.h index 631780537a77..a902717335b0 100644 --- a/common/hmzoned.h +++ b/common/hmzoned.h @@ -55,6 +55,8 @@ int btrfs_get_zone_info(int fd, const char *file, bool hmzoned, #ifdef BTRFS_ZONED bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr); int btrfs_discard_all_zones(int fd, struct btrfs_zoned_device_info *zinfo); +int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start, + size_t len); #else static inline bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr) @@ -66,6 +68,12 @@ static inline int 
btrfs_discard_all_zones(int fd, { return -EOPNOTSUPP; } +static inline int zero_zone_blocks(int fd, + struct btrfs_zoned_device_info *zinfo, + off_t start, size_t len) +{ + return -EOPNOTSUPP; +} #endif /* BTRFS_ZONED */ #endif /* __BTRFS_HMZONED_H__ */ From patchwork Wed Dec 4 08:25:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11272433 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5740B14B7 for ; Wed, 4 Dec 2019 08:27:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2328820637 for ; Wed, 4 Dec 2019 08:27:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="lmixI8oU" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727417AbfLDI1u (ORCPT ); Wed, 4 Dec 2019 03:27:50 -0500 Received: from esa2.hgst.iphmx.com ([68.232.143.124]:1552 "EHLO esa2.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726599AbfLDI1t (ORCPT ); Wed, 4 Dec 2019 03:27:49 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1575448074; x=1606984074; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3NbJiwu7S9fnW557eUzb1z7V57FTzI46lTPoMii6Ks8=; b=lmixI8oUvYHZLA8u/mrJBMXhOgKx2/xNQMOtA2lDsPcYx0exMcZba2zc IGEhLYd+iVM75oXn4FcUiQa2ypHKHlkTUsnBmnyhChmfymAwS9LP/GxWi QQwLjNsg2CRjL6YudNEsxKPgTutNJ+cOnWXSpdP2fF0ylENLSA3cUvSOl aN9PhmtXdNU7VWgzjmnNZq6ptkpIw8q/WrZcb/DteljtXcB5gw9PVxx7w 49AY1Vh68RyXtZpz+Oqulf8cPiq2K/WhkKMj877nKhI9bHZ+0XMdm0Kik bL+ULqtGZpYHU5+YlPHTIanwdqNkdWKW4gYd0PEKioPikxvhDo1faXOos Q==; IronPort-SDR: 
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Johannes Thumshirn , Hannes Reinecke , Anand Jain , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v5 09/15] btrfs-progs: implement log-structured superblock for HMZONED mode Date: Wed, 4 Dec 2019 17:25:07 +0900 Message-Id: <20191204082513.857320-10-naohiro.aota@wdc.com> In-Reply-To:
<20191204082513.857320-1-naohiro.aota@wdc.com> References: <20191204082513.857320-1-naohiro.aota@wdc.com>

The superblock (and its copies) is the only data structure in btrfs that has a fixed location on a device. Since we cannot overwrite data in a sequential write required zone, we cannot place the superblock in such a zone. One easy solution is to place the superblock and its copies only in conventional zones. However, this method has two downsides. The first is a reduced number of superblock copies: the second copy of the superblock is located at 256GB, which falls in a sequential write required zone on typical devices on the market today, so the number of superblocks (primary plus copies) is limited to two. The second downside is that we cannot support devices that have no conventional zones at all.

To solve these two problems, we employ superblock log writing. It uses two zones as a circular buffer to write updated superblocks. Once the first zone is filled up, we start writing into the second zone and reset the first one. We can determine the position of the latest superblock by reading the write pointer information from the device.

The following zones are reserved as the circular buffer on HMZONED btrfs.
- The primary superblock: zones 0 and 1
- The first copy: zones 16 and 17
- The second copy: zone 1024 or the zone at 256GB, whichever comes first, and the zone next to it

Signed-off-by: Naohiro Aota --- cmds/inspect-dump-super.c | 3 +- common/device-scan.c | 4 +- common/device-utils.c | 17 ++- common/hmzoned.c | 227 ++++++++++++++++++++++++++++++++++++++ common/hmzoned.h | 23 ++++ disk-io.c | 14 +-- kerncompat.h | 7 ++ 7 files changed, 281 insertions(+), 14 deletions(-) diff --git a/cmds/inspect-dump-super.c b/cmds/inspect-dump-super.c index ddb2120fb397..e49dec560ca7 100644 --- a/cmds/inspect-dump-super.c +++ b/cmds/inspect-dump-super.c @@ -34,6 +34,7 @@ #include "cmds/commands.h" #include "crypto/crc32c.h" #include "common/help.h" +#include "common/hmzoned.h" static int check_csum_sblock(void *sb, int csum_size, u16 csum_type) { @@ -491,7 +492,7 @@ static int load_and_dump_sb(char *filename, int fd, u64 sb_bytenr, int full, sb = (struct btrfs_super_block *)super_block_data; - ret = pread64(fd, super_block_data, BTRFS_SUPER_INFO_SIZE, sb_bytenr); + ret = sbread(fd, super_block_data, sb_bytenr); if (ret != BTRFS_SUPER_INFO_SIZE) { /* check if the disk if too short for further superblock */ if (ret == 0 && errno == 0) diff --git a/common/device-scan.c b/common/device-scan.c index 548e1322bb70..7760ce50ad72 100644 --- a/common/device-scan.c +++ b/common/device-scan.c @@ -202,7 +202,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans, btrfs_set_stack_device_bytes_used(dev_item, device->bytes_used); memcpy(&dev_item->uuid, device->uuid, BTRFS_UUID_SIZE); - ret = pwrite(fd, buf, sectorsize, BTRFS_SUPER_INFO_OFFSET); + ret = sbwrite(fd, buf, BTRFS_SUPER_INFO_OFFSET); BUG_ON(ret != sectorsize); free(buf); @@ -279,7 +279,7 @@ int btrfs_device_already_in_root(struct btrfs_root *root, int fd, ret = -ENOMEM; goto out; } - ret = pread(fd, buf, BTRFS_SUPER_INFO_SIZE, super_offset); + ret = sbread(fd, buf, super_offset); if (ret != BTRFS_SUPER_INFO_SIZE) goto brelse; diff --git
a/common/device-utils.c b/common/device-utils.c index 2ac8e7d9802a..d7bbac0e1730 100644 --- a/common/device-utils.c +++ b/common/device-utils.c @@ -231,10 +231,19 @@ int btrfs_prepare_device(int fd, const char *file, u64 *block_count_ret, return 1; } - ret = btrfs_wipe_existing_sb(fd); - if (ret < 0) { - error("cannot wipe superblocks on %s", file); - return 1; + if (!zinfo) { + ret = btrfs_wipe_existing_sb(fd); + if (ret < 0) { + error("cannot wipe superblocks on %s", file); + return 1; + } + } else { + ret = btrfs_wipe_sb_zones(fd, zinfo); + if (ret) { + error("cannot wipe superblock log zones on %s", file); + kfree(zinfo); + return 1; + } } kfree(zinfo); diff --git a/common/hmzoned.c b/common/hmzoned.c index 484877743948..5080bd7dea5b 100644 --- a/common/hmzoned.c +++ b/common/hmzoned.c @@ -18,6 +18,7 @@ #include #include +#include "disk-io.h" #include "common/utils.h" #include "common/device-utils.h" #include "common/messages.h" @@ -56,6 +57,24 @@ size_t zone_size(const char *file) } #ifdef BTRFS_ZONED +static u32 sb_zone_number(u64 zone_size, int mirror) +{ + ASSERT(mirror < BTRFS_SUPER_MIRROR_MAX); + + switch (mirror) { + case 0: + return 0; + case 1: + return 16; + case 2: + return min(btrfs_sb_offset(mirror) / zone_size, 1024ULL); + default: + BUG(); + } + + return 0; +} + bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr) { unsigned int zno; @@ -180,6 +199,39 @@ int btrfs_discard_all_zones(int fd, struct btrfs_zoned_device_info *zinfo) return fsync(fd); } +int btrfs_wipe_sb_zones(int fd, struct btrfs_zoned_device_info *zinfo) +{ + struct blk_zone_range range; + int i; + + if (!zinfo) + return 0; + + if (zinfo->model == ZONED_NONE) + return 0; + + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + u32 sb_pos = sb_zone_number(zinfo->zone_size, i); + + if (sb_pos + 1 >= zinfo->nr_zones) + break; + + range.sector = (sb_pos * zinfo->zone_size) >> SECTOR_SHIFT; + range.nr_sectors = (2 * zinfo->zone_size) >> SECTOR_SHIFT; + + if
(ioctl(fd, BLKRESETZONE, &range) < 0) { + error("failed to reset zone %u: %s", + sb_pos, strerror(errno)); + return 1; + } + } + + if (fsync(fd)) + return 1; + + return 0; +} + int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start, size_t len) { @@ -208,6 +260,181 @@ int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start, return 0; } +static int sb_write_pointer(struct blk_zone *zones, u64 *wp_ret) +{ + bool empty[2]; + bool full[2]; + sector_t sector; + + if (zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL) { + *wp_ret = zones[0].start << SECTOR_SHIFT; + return -ENOENT; + } + + empty[0] = zones[0].cond == BLK_ZONE_COND_EMPTY; + empty[1] = zones[1].cond == BLK_ZONE_COND_EMPTY; + full[0] = zones[0].cond == BLK_ZONE_COND_FULL; + full[1] = zones[1].cond == BLK_ZONE_COND_FULL; + + /* + * Possible state of log buffer zones + * + * E I F + * E * x 0 + * I 0 x 0 + * F 1 1 x + * + * Row: zones[0] + * Col: zones[1] + * State: + * E: Empty, I: In-Use, F: Full + * Log position: + * *: Special case, no superblock is written + * 0: Use write pointer of zones[0] + * 1: Use write pointer of zones[1] + * x: Invalid state + */ + + if (empty[0] && empty[1]) { + /* special case to distinguish no superblock to read */ + *wp_ret = zones[0].start << SECTOR_SHIFT; + return -ENOENT; + } else if (full[0] && full[1]) { + /* cannot determine which zone has the newer superblock */ + return -EUCLEAN; + } else if (!full[0] && (empty[1] || full[1])) { + sector = zones[0].wp; + } else if (full[0]) { + sector = zones[1].wp; + } else { + return -EUCLEAN; + } + *wp_ret = sector << SECTOR_SHIFT; + return 0; +} + +size_t btrfs_sb_io(int fd, void *buf, off_t offset, int rw) +{ + size_t count = BTRFS_SUPER_INFO_SIZE; + struct blk_zone_report *rep; + struct blk_zone *zones; + const u64 sb_size_sector = BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT; + u64 mapped; + u32 zone_num; + int reset_target; + u32 zone_size_sector; + size_t rep_size; + int mirror = -1; + int 
i; + int ret; + size_t ret_sz; + + ASSERT(rw == READ || rw == WRITE); + + ret = ioctl(fd, BLKGETZONESZ, &zone_size_sector); + if (ret) { + error("ioctl BLKGETZONESZ failed (%s)", strerror(errno)); + exit(1); + } + + if (zone_size_sector == 0) { + if (rw == READ) + return pread64(fd, buf, count, offset); + return pwrite64(fd, buf, count, offset); + } + + ASSERT(IS_ALIGNED(zone_size_sector, sb_size_sector)); + + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + if (offset == btrfs_sb_offset(i)) { + mirror = i; + break; + } + } + ASSERT(mirror != -1); + + zone_num = sb_zone_number(zone_size_sector * 512, mirror); + + rep_size = sizeof(struct blk_zone_report) + sizeof(struct blk_zone) * 2; + rep = malloc(rep_size); + if (!rep) { + error("No memory for zones report"); + exit(1); + } + + memset(rep, 0, rep_size); + rep->sector = zone_num * zone_size_sector; + rep->nr_zones = 2; + + ret = ioctl(fd, BLKREPORTZONE, rep); + if (ret) { + error("ioctl BLKREPORTZONE failed (%s)", strerror(errno)); + exit(1); + } + if (rep->nr_zones != 2) { + if (errno == ENOENT || errno == 0) + return (rw == WRITE ? 
count : 0); + error("failed to read zone info of %u and %u: %s", zone_num, + zone_num + 1, strerror(errno)); + free(rep); + return 0; + } + + zones = (struct blk_zone *)(rep + 1); + + ret = sb_write_pointer(zones, &mapped); + if (ret != -ENOENT && ret) + return -EIO; + if (rw == READ) { + if (ret != -ENOENT) { + if (mapped == zones[0].start << SECTOR_SHIFT) + mapped = (zones[1].start + zones[1].len) + << SECTOR_SHIFT; + mapped -= BTRFS_SUPER_INFO_SIZE; + } + return pread64(fd, buf, count, mapped); + } + + ret_sz = pwrite64(fd, buf, count, mapped); + if (zone_size_sector == 0) + return ret_sz; + + if (ret_sz != count) + return ret_sz; + if (fsync(fd)) { + error("failed to synchronize superblock: %s", strerror(errno)); + exit(1); + } + + reset_target = -1; + mapped += BTRFS_SUPER_INFO_SIZE; + if (mapped == (zones[0].start + zones[0].len) << SECTOR_SHIFT && + zones[1].cond != BLK_ZONE_COND_EMPTY) + reset_target = 1; + else if (mapped == (zones[1].start + zones[1].len) << SECTOR_SHIFT && + zones[0].cond != BLK_ZONE_COND_EMPTY) + reset_target = 0; + + if (reset_target != -1) { + struct blk_zone_range range = { + zone_size_sector * (zone_num + reset_target), + zone_size_sector, + }; + if (ioctl(fd, BLKRESETZONE, &range) < 0) { + error("failed to reset zone %u: %s", + zone_num + reset_target, strerror(errno)); + exit(1); + } + if (fsync(fd)) { + error("failed to synchronize zone reset: %s", + strerror(errno)); + exit(1); + } + } + + return ret_sz; +} + #endif int btrfs_get_zone_info(int fd, const char *file, bool hmzoned, diff --git a/common/hmzoned.h b/common/hmzoned.h index a902717335b0..920f992dbb93 100644 --- a/common/hmzoned.h +++ b/common/hmzoned.h @@ -57,6 +57,16 @@ bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr); int btrfs_discard_all_zones(int fd, struct btrfs_zoned_device_info *zinfo); int zero_zone_blocks(int fd, struct btrfs_zoned_device_info *zinfo, off_t start, size_t len); +size_t btrfs_sb_io(int fd, void *buf, off_t offset, int 
rw); +static inline size_t sbread(int fd, void *buf, off_t offset) +{ + return btrfs_sb_io(fd, buf, offset, READ); +} +static inline size_t sbwrite(int fd, void *buf, off_t offset) +{ + return btrfs_sb_io(fd, buf, offset, WRITE); +} +int btrfs_wipe_sb_zones(int fd, struct btrfs_zoned_device_info *zinfo); #else static inline bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr) @@ -74,6 +84,19 @@ static inline int zero_zone_blocks(int fd, { return -EOPNOTSUPP; } +static inline u64 btrfs_map_sb_offset_for_zoned(int fd, u64 offset) +{ + return offset; +} +#define sbread(fd, buf, offset) \ + pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset) +#define sbwrite(fd, buf, offset) \ + pwrite64(fd, buf, BTRFS_SUPER_INFO_SIZE, offset) +static inline int btrfs_wipe_sb_zones(int fd, + struct btrfs_zoned_device_info *zinfo) +{ + return 0; +} #endif /* BTRFS_ZONED */ #endif /* __BTRFS_HMZONED_H__ */ diff --git a/disk-io.c b/disk-io.c index 659f8b93a7ca..92f781ce4abe 100644 --- a/disk-io.c +++ b/disk-io.c @@ -35,6 +35,7 @@ #include "common/rbtree-utils.h" #include "common/device-scan.h" #include "crypto/hash.h" +#include "common/hmzoned.h" /* specified errno for check_tree_block */ #define BTRFS_BAD_BYTENR (-1) @@ -1553,7 +1554,7 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr, u64 bytenr; if (sb_bytenr != BTRFS_SUPER_INFO_OFFSET) { - ret = pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, sb_bytenr); + ret = sbread(fd, buf, sb_bytenr); /* real error */ if (ret < 0) return -errno; @@ -1581,7 +1582,8 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr, for (i = 0; i < max_super; i++) { bytenr = btrfs_sb_offset(i); - ret = pread64(fd, buf, BTRFS_SUPER_INFO_SIZE, bytenr); + ret = sbread(fd, buf, bytenr); + if (ret < BTRFS_SUPER_INFO_SIZE) break; @@ -1653,9 +1655,8 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info, * super_copy is BTRFS_SUPER_INFO_SIZE bytes and is * zero filled, we can use it directly */ - 
ret = pwrite64(device->fd, fs_info->super_copy, - BTRFS_SUPER_INFO_SIZE, - fs_info->super_bytenr); + ret = sbwrite(device->fd, fs_info->super_copy, + fs_info->super_bytenr); if (ret != BTRFS_SUPER_INFO_SIZE) { errno = EIO; error( @@ -1688,8 +1689,7 @@ static int write_dev_supers(struct btrfs_fs_info *fs_info, * super_copy is BTRFS_SUPER_INFO_SIZE bytes and is * zero filled, we can use it directly */ - ret = pwrite64(device->fd, fs_info->super_copy, - BTRFS_SUPER_INFO_SIZE, bytenr); + ret = sbwrite(device->fd, fs_info->super_copy, bytenr); if (ret != BTRFS_SUPER_INFO_SIZE) { errno = EIO; error( diff --git a/kerncompat.h b/kerncompat.h index 01fd93a7b540..c38643437747 100644 --- a/kerncompat.h +++ b/kerncompat.h @@ -76,6 +76,10 @@ #define ULONG_MAX (~0UL) #endif +#ifndef SECTOR_SHIFT +#define SECTOR_SHIFT 9 +#endif + #define __token_glue(a,b,c) ___token_glue(a,b,c) #define ___token_glue(a,b,c) a ## b ## c #ifdef DEBUG_BUILD_CHECKS @@ -162,6 +166,7 @@ typedef long long s64; typedef int s32; #endif +typedef u64 sector_t; struct vma_shared { int prio_tree_node; }; struct vm_area_struct { @@ -362,6 +367,8 @@ typedef u32 __bitwise __be32; typedef u64 __bitwise __le64; typedef u64 __bitwise __be64; +#define U64_MAX UINT64_MAX + /* Macros to generate set/get funcs for the struct fields * assume there is a lefoo_to_cpu for every type, so lets make a simple * one for u8:

From patchwork Wed Dec 4 08:25:08 2019
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Johannes Thumshirn , Hannes Reinecke , Anand Jain , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v5 10/15] btrfs-progs: align device extent allocation to zone boundary Date: Wed, 4 Dec 2019 17:25:08 +0900 Message-Id: <20191204082513.857320-11-naohiro.aota@wdc.com> In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com> References: <20191204082513.857320-1-naohiro.aota@wdc.com>

In HMZONED mode, align device extents to zone boundaries so that a zone reset affects only that device extent and does not change the state of blocks in neighboring device extents. Also, check that an allocated region never covers non-empty sequential zones, does not mix conventional and sequential zones, and does not overlap any superblock log zone.
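The allocation rules above can be illustrated with a minimal, self-contained C sketch. It assumes a hypothetical, simplified `struct zone_sketch` in place of the real `struct btrfs_zoned_device_info`, and `zone_align_up()` and `region_allocatable()` are illustrative names, not btrfs-progs functions. The superblock zone numbers follow the list in the previous patch (zones 0/1, 16/17, and the smaller of zone 1024 or the zone holding 256GiB, plus the next zone). One difference from the patch: this sketch uses an exact half-open overlap test, whereas the patch's comparison also rejects a region that ends immediately before a superblock zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct btrfs_zoned_device_info. */
struct zone_sketch {
	uint64_t zone_size;      /* bytes; assumed to be a power of two */
	uint32_t nr_zones;
	const bool *sequential;  /* per zone: sequential write required? */
	const bool *empty;       /* per zone: write pointer at zone start? */
};

/* Round pos up to the next zone boundary, like ALIGN(pos, zone_size). */
uint64_t zone_align_up(uint64_t pos, uint64_t zone_size)
{
	assert(zone_size != 0 && (zone_size & (zone_size - 1)) == 0);
	return (pos + zone_size - 1) & ~(zone_size - 1);
}

/* First zone of each superblock log pair (the pair is sb and sb + 1). */
static uint64_t sb_zone(uint64_t zone_size, int mirror)
{
	uint64_t z;

	if (mirror == 0)
		return 0;
	if (mirror == 1)
		return 16;
	z = (256ULL << 30) / zone_size;	/* zone holding the 256GiB copy */
	return z < 1024 ? z : 1024;
}

/* Can [pos, pos + len) be used for a new device extent? */
bool region_allocatable(const struct zone_sketch *zi, uint64_t pos,
			uint64_t len)
{
	uint64_t begin, end, z;
	bool seq;
	int i;

	/* The region must consist of whole, zone-aligned zones. */
	if (len == 0 || pos % zi->zone_size || len % zi->zone_size)
		return false;
	begin = pos / zi->zone_size;
	end = begin + len / zi->zone_size;	/* exclusive */
	if (end > zi->nr_zones)
		return false;

	/* Never overlap a superblock log pair. */
	for (i = 0; i < 3; i++) {
		uint64_t sb = sb_zone(zi->zone_size, i);

		if (begin <= sb + 1 && sb < end)
			return false;
	}

	/* All zones must share one type; sequential ones must be empty. */
	seq = zi->sequential[begin];
	for (z = begin; z < end; z++) {
		if (zi->sequential[z] != seq)
			return false;
		if (seq && !zi->empty[z])
			return false;
	}
	return true;
}
```

For example, on a device with 256MiB zones where zones 0-3 are conventional, a one-zone region at zone 2 is accepted, while a two-zone region starting at zone 3 is rejected because it mixes a conventional and a sequential zone. Rejecting mixed-type regions keeps a whole device extent under a single write rule.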
Signed-off-by: Naohiro Aota --- common/hmzoned.c | 70 ++++++++++++++++++++++++++++++++++++++++++++ common/hmzoned.h | 23 +++++++++++++++ kerncompat.h | 2 ++ volumes.c | 76 +++++++++++++++++++++++++++++++++++++++++++----- 4 files changed, 163 insertions(+), 8 deletions(-) diff --git a/common/hmzoned.c b/common/hmzoned.c index 5080bd7dea5b..2cbf2fc88cb0 100644 --- a/common/hmzoned.c +++ b/common/hmzoned.c @@ -24,6 +24,8 @@ #include "common/messages.h" #include "mkfs/common.h" #include "common/hmzoned.h" +#include "volumes.h" +#include "disk-io.h" #define BTRFS_REPORT_NR_ZONES 8192 @@ -435,6 +437,74 @@ size_t btrfs_sb_io(int fd, void *buf, off_t offset, int rw) return ret_sz; } +static inline bool btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + unsigned int zno; + + if (!zone_is_sequential(zinfo, pos)) + return true; + + zno = pos / zinfo->zone_size; + return zinfo->zones[zno].cond == BLK_ZONE_COND_EMPTY; +} + +/* + * btrfs_check_allocatable_zones - check if the specified region is + * suitable for allocation + * @device: the device to allocate a region + * @pos: the position of the region + * @num_bytes: the size of the region + * + * On a non-ZONED device, any region is suitable for allocation.
In ZONED + * device, check if + * 1) the region is not on non-empty sequential zones, + * 2) all zones in the region have the same zone type, + * 3) it does not contain super block location + */ +bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, + u64 num_bytes) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u64 nzones, begin, end; + u64 sb_pos; + bool is_sequential; + int i; + + if (!zinfo || zinfo->model == ZONED_NONE) + return true; + + nzones = num_bytes / zinfo->zone_size; + begin = pos / zinfo->zone_size; + end = begin + nzones; + + ASSERT(IS_ALIGNED(pos, zinfo->zone_size)); + ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); + + if (end > zinfo->nr_zones) + return false; + + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + sb_pos = sb_zone_number(zinfo->zone_size, i); + if (!(end < sb_pos || sb_pos + 1 < begin)) + return false; + } + + is_sequential = btrfs_dev_is_sequential(device, pos); + + while (num_bytes) { + if (is_sequential && !btrfs_dev_is_empty_zone(device, pos)) + return false; + if (is_sequential != btrfs_dev_is_sequential(device, pos)) + return false; + + pos += zinfo->zone_size; + num_bytes -= zinfo->zone_size; + } + + return true; +} + #endif int btrfs_get_zone_info(int fd, const char *file, bool hmzoned, diff --git a/common/hmzoned.h b/common/hmzoned.h index 920f992dbb93..3444e2c1b0f5 100644 --- a/common/hmzoned.h +++ b/common/hmzoned.h @@ -19,6 +19,7 @@ #define __BTRFS_HMZONED_H__ #include +#include "volumes.h" #ifdef BTRFS_ZONED #include @@ -67,6 +68,8 @@ static inline size_t sbwrite(int fd, void *buf, off_t offset) return btrfs_sb_io(fd, buf, offset, WRITE); } int btrfs_wipe_sb_zones(int fd, struct btrfs_zoned_device_info *zinfo); +bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, + u64 num_bytes); #else static inline bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr) @@ -97,6 +100,26 @@ static inline int btrfs_wipe_sb_zones(int fd, { return 0; } 
+static inline bool btrfs_check_allocatable_zones(struct btrfs_device *device, + u64 pos, u64 num_bytes) +{ + return true; +} + #endif /* BTRFS_ZONED */ +static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) +{ + return zone_is_sequential(device->zone_info, pos); +} +static inline u64 btrfs_zone_align(struct btrfs_device *device, u64 pos) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + + if (!zinfo || zinfo->model == ZONED_NONE) + return pos; + + return ALIGN(pos, zinfo->zone_size); +} + #endif /* __BTRFS_HMZONED_H__ */ diff --git a/kerncompat.h b/kerncompat.h index c38643437747..58cdcf921c5e 100644 --- a/kerncompat.h +++ b/kerncompat.h @@ -28,6 +28,7 @@ #include #include #include +#include #include #include @@ -354,6 +355,7 @@ static inline void assert_trace(const char *assertion, const char *filename, /* Alignment check */ #define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0) +#define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) static inline int is_power_of_2(unsigned long n) { diff --git a/volumes.c b/volumes.c index d92052e19330..148169d5b2a2 100644 --- a/volumes.c +++ b/volumes.c @@ -496,6 +496,7 @@ static int find_free_dev_extent_start(struct btrfs_device *device, int slot; struct extent_buffer *l; u64 min_search_start; + u64 zone_size = 0; /* * We don't want to overwrite the superblock on the drive nor any area @@ -504,6 +505,14 @@ static int find_free_dev_extent_start(struct btrfs_device *device, */ min_search_start = max(root->fs_info->alloc_start, (u64)SZ_1M); search_start = max(search_start, min_search_start); + /* + * For a zoned block device, skip the first zone of the device + * entirely. 
+ */ + if (device->zone_info) + zone_size = device->zone_info->zone_size; + search_start = max_t(u64, search_start, zone_size); + search_start = btrfs_zone_align(device, search_start); path = btrfs_alloc_path(); if (!path) @@ -512,6 +521,7 @@ static int find_free_dev_extent_start(struct btrfs_device *device, max_hole_start = search_start; max_hole_size = 0; +again: if (search_start >= search_end) { ret = -ENOSPC; goto out; @@ -556,6 +566,13 @@ static int find_free_dev_extent_start(struct btrfs_device *device, goto next; if (key.offset > search_start) { + if (!btrfs_check_allocatable_zones(device, search_start, + num_bytes)) { + search_start += zone_size; + btrfs_release_path(path); + goto again; + } + hole_size = key.offset - search_start; /* @@ -598,6 +615,13 @@ next: * search_end may be smaller than search_start. */ if (search_end > search_start) { + if (!btrfs_check_allocatable_zones(device, search_start, + num_bytes)) { + search_start += zone_size; + btrfs_release_path(path); + goto again; + } + hole_size = search_end - search_start; if (hole_size > max_hole_size) { @@ -613,6 +637,7 @@ next: ret = 0; out: + ASSERT(zone_size == 0 || IS_ALIGNED(max_hole_start, zone_size)); btrfs_free_path(path); *start = max_hole_start; if (len) @@ -641,6 +666,11 @@ int btrfs_insert_dev_extent(struct btrfs_trans_handle *trans, struct extent_buffer *leaf; struct btrfs_key key; + /* Check alignment to zone for a zoned block device */ + ASSERT(!device->zone_info || + device->zone_info->model != ZONED_HOST_MANAGED || + IS_ALIGNED(start, device->zone_info->zone_size)); + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -1045,17 +1075,13 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, int max_stripes = 0; int min_stripes = 1; int sub_stripes = 1; - int dev_stripes __attribute__((unused)); - /* stripes per dev */ + int dev_stripes; /* stripes per dev */ int devs_max; /* max devs to use */ - int devs_min __attribute__((unused)); - /* min devs needed */ + int devs_min; 
/* min devs needed */ int devs_increment __attribute__((unused)); /* ndevs has to be a multiple of this */ - int ncopies __attribute__((unused)); - /* how many copies to data has */ - int nparity __attribute__((unused)); - /* number of stripes worth of bytes to + int ncopies; /* how many copies to data has */ + int nparity; /* number of stripes worth of bytes to store parity information */ int looped = 0; int ret; @@ -1063,6 +1089,8 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, int stripe_len = BTRFS_STRIPE_LEN; struct btrfs_key key; u64 offset; + bool hmzoned = info->fs_devices->hmzoned; + u64 zone_size = info->fs_devices->zone_size; if (list_empty(dev_list)) { return -ENOSPC; @@ -1163,13 +1191,40 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, btrfs_super_stripesize(info->super_copy)); } + if (hmzoned) { + calc_size = zone_size; + max_chunk_size = max(max_chunk_size, zone_size); + max_chunk_size = round_down(max_chunk_size, zone_size); + } + /* we don't want a chunk larger than 10% of the FS */ percent_max = div_factor(btrfs_super_total_bytes(info->super_copy), 1); max_chunk_size = min(percent_max, max_chunk_size); + if (hmzoned) { + int min_num_stripes = devs_min * dev_stripes; + int min_data_stripes = (min_num_stripes - nparity) / ncopies; + u64 min_chunk_size = min_data_stripes * zone_size; + + max_chunk_size = max(round_down(max_chunk_size, + zone_size), + min_chunk_size); + } + again: if (chunk_bytes_by_type(type, calc_size, num_stripes, sub_stripes) > max_chunk_size) { + if (hmzoned) { + /* + * calc_size is fixed in HMZONED. Reduce + * num_stripes instead. 
+ */ + num_stripes = max_chunk_size * ncopies / calc_size; + if (num_stripes < min_stripes) + return -ENOSPC; + goto again; + } + calc_size = max_chunk_size; calc_size /= num_stripes; calc_size /= stripe_len; @@ -1180,6 +1235,9 @@ again: calc_size /= stripe_len; calc_size *= stripe_len; + + ASSERT(!hmzoned || calc_size == zone_size); + INIT_LIST_HEAD(&private_devs); cur = dev_list->next; index = 0; @@ -1261,6 +1319,8 @@ again: if (ret < 0) goto out_chunk_map; + ASSERT(!zone_size || IS_ALIGNED(dev_offset, zone_size)); + device->bytes_used += calc_size; ret = btrfs_update_device(trans, device); if (ret < 0)

From patchwork Wed Dec 4 08:25:09 2019
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik ,
Nikolay Borisov , Damien Le Moal , Johannes Thumshirn , Hannes Reinecke , Anand Jain , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v5 11/15] btrfs-progs: do sequential allocation in HMZONED mode Date: Wed, 4 Dec 2019 17:25:09 +0900 Message-Id: <20191204082513.857320-12-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com> References: <20191204082513.857320-1-naohiro.aota@wdc.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On HMZONED drives, writes must always be sequential and directed at a block group zone write pointer position. Thus, block allocation in a block group must also be done sequentially using an allocation pointer equal to the block group zone write pointer plus the number of blocks allocated but not yet written. Signed-off-by: Naohiro Aota --- common/hmzoned.c | 406 +++++++++++++++++++++++++++++++++++++++++++++++ common/hmzoned.h | 7 + ctree.h | 6 + extent-tree.c | 16 ++ 4 files changed, 435 insertions(+) diff --git a/common/hmzoned.c b/common/hmzoned.c index 2cbf2fc88cb0..f268f360d8f7 100644 --- a/common/hmzoned.c +++ b/common/hmzoned.c @@ -29,6 +29,11 @@ #define BTRFS_REPORT_NR_ZONES 8192 +/* Invalid allocation pointer value for missing devices */ +#define WP_MISSING_DEV ((u64)-1) +/* Pseudo write pointer value for conventional zone */ +#define WP_CONVENTIONAL ((u64)-2) + enum btrfs_zoned_model zoned_model(const char *file) { char model[32]; @@ -505,6 +510,407 @@ bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, return true; } +static int emulate_write_pointer(struct btrfs_fs_info *fs_info, + struct btrfs_block_group_cache *cache, + u64 *offset_ret) +{ + struct btrfs_root *root = fs_info->extent_root; + struct btrfs_path *path; + struct extent_buffer *leaf; + struct btrfs_key search_key; + struct btrfs_key found_key; + int slot; + int ret; + u64 length; + + path = 
btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + search_key.objectid = cache->key.objectid + cache->key.offset; + search_key.type = 0; + search_key.offset = 0; + + ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0); + if (ret < 0) + goto out; + ASSERT(ret != 0); + slot = path->slots[0]; + leaf = path->nodes[0]; + ASSERT(slot != 0); + slot--; + btrfs_item_key_to_cpu(leaf, &found_key, slot); + + if (found_key.objectid < cache->key.objectid) { + *offset_ret = 0; + } else if (found_key.type == BTRFS_BLOCK_GROUP_ITEM_KEY) { + struct btrfs_key extent_item_key; + + if (found_key.objectid != cache->key.objectid) { + ret = -EUCLEAN; + goto out; + } + + length = 0; + + /* metadata may have METADATA_ITEM_KEY */ + if (slot == 0) { + ret = btrfs_prev_leaf(root, path); + if (ret < 0) + goto out; + if (ret == 0) { + slot = btrfs_header_nritems(leaf) - 1; + btrfs_item_key_to_cpu(leaf, &extent_item_key, + slot); + } + } else { + btrfs_item_key_to_cpu(leaf, &extent_item_key, slot - 1); + ret = 0; + } + + if (ret == 0 && + extent_item_key.objectid == cache->key.objectid) { + if (extent_item_key.type == BTRFS_METADATA_ITEM_KEY) + length = fs_info->nodesize; + else if (extent_item_key.type == BTRFS_EXTENT_ITEM_KEY) + length = extent_item_key.offset; + else { + ret = -EUCLEAN; + goto out; + } + } + + *offset_ret = length; + } else if (found_key.type == BTRFS_EXTENT_ITEM_KEY || + found_key.type == BTRFS_METADATA_ITEM_KEY) { + + if (found_key.type == BTRFS_EXTENT_ITEM_KEY) + length = found_key.offset; + else + length = fs_info->nodesize; + + if (!(found_key.objectid >= cache->key.objectid && + found_key.objectid + length <= + cache->key.objectid + cache->key.offset)) { + ret = -EUCLEAN; + goto out; + } + *offset_ret = found_key.objectid + length - cache->key.objectid; + } else { + ret = -ENOENT; + goto out; + } + ret = 0; + +out: + btrfs_free_path(path); + return ret; +} + +static u64 offset_in_dev_extent(struct map_lookup *map, u64 *alloc_offsets, + u64 logical, int idx) 
+{ + u64 profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK; + u64 stripe_nr = logical / map->stripe_len; + u64 full_stripes_cnt = stripe_nr / map->num_stripes; + u32 rest_stripes_cnt = stripe_nr % map->num_stripes; + u64 stripe_start, offset; + int data_stripes = map->num_stripes / map->sub_stripes; + int stripe_idx; + int i; + + ASSERT(profile == BTRFS_BLOCK_GROUP_RAID0 || + profile == BTRFS_BLOCK_GROUP_RAID10); + + stripe_idx = idx / map->sub_stripes; + + if (stripe_idx < rest_stripes_cnt) + return map->stripe_len * (full_stripes_cnt + 1); + + for (i = idx + map->sub_stripes; i < map->num_stripes; + i += map->sub_stripes) { + if (alloc_offsets[i] != WP_CONVENTIONAL && + alloc_offsets[i] > map->stripe_len * full_stripes_cnt) + return map->stripe_len * (full_stripes_cnt + 1); + } + + stripe_start = (full_stripes_cnt * data_stripes + stripe_idx) * + map->stripe_len; + if (stripe_start >= logical) + return full_stripes_cnt * map->stripe_len; + offset = min_t(u64, logical - stripe_start, map->stripe_len); + + return full_stripes_cnt * map->stripe_len + offset; +} + +int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info, + struct btrfs_block_group_cache *cache) +{ + struct btrfs_device *device; + struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree; + struct cache_extent *ce; + struct map_lookup *map; + u64 logical = cache->key.objectid; + u64 length = cache->key.offset; + u64 physical = 0; + int ret = 0; + int i, j; + u64 zone_size = fs_info->fs_devices->zone_size; + u64 *alloc_offsets = NULL; + u64 emulated_offset = 0; + u32 num_sequential = 0, num_conventional = 0; + + if (!btrfs_fs_incompat(fs_info, HMZONED)) + return 0; + + /* Sanity check */ + if (logical == BTRFS_BLOCK_RESERVED_1M_FOR_SUPER) { + if (length + SZ_1M != zone_size) { + error("unaligned initial system block group"); + return -EIO; + } + } else if (!IS_ALIGNED(length, zone_size)) { + error("unaligned block group at %llu + %llu", logical, length); + return -EIO; + } + + /* Get 
the chunk mapping */ + ce = search_cache_extent(&map_tree->cache_tree, logical); + if (!ce) { + error("failed to find block group at %llu", logical); + return -ENOENT; + } + map = container_of(ce, struct map_lookup, ce); + + /* + * Get the zone type: if the group is mapped to a non-sequential zone, + * there is no need for the allocation offset (fit allocation is OK). + */ + alloc_offsets = calloc(map->num_stripes, sizeof(*alloc_offsets)); + if (!alloc_offsets) { + error("failed to allocate alloc_offsets"); + return -ENOMEM; + } + + for (i = 0; i < map->num_stripes; i++) { + bool is_sequential; + struct blk_zone zone; + + device = map->stripes[i].dev; + physical = map->stripes[i].physical; + + if (device->fd == -1) { + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } + + is_sequential = btrfs_dev_is_sequential(device, physical); + if (is_sequential) + num_sequential++; + else + num_conventional++; + + if (!is_sequential) { + alloc_offsets[i] = WP_CONVENTIONAL; + continue; + } + + /* + * The group is mapped to a sequential zone. Get the zone write + * pointer to determine the allocation offset within the zone. 
+ */ + WARN_ON(!IS_ALIGNED(physical, zone_size)); + zone = device->zone_info->zones[physical / zone_size]; + + switch (zone.cond) { + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + error("Offline/readonly zone %llu", + physical / fs_info->fs_devices->zone_size); + ret = -EIO; + goto out; + case BLK_ZONE_COND_EMPTY: + alloc_offsets[i] = 0; + break; + case BLK_ZONE_COND_FULL: + alloc_offsets[i] = zone_size; + break; + default: + /* Partially used zone */ + alloc_offsets[i] = ((zone.wp - zone.start) << 9); + break; + } + } + + if (num_conventional > 0) { + ret = emulate_write_pointer(fs_info, cache, &emulated_offset); + if (ret || map->num_stripes == num_conventional) { + if (!ret) + cache->alloc_offset = emulated_offset; + goto out; + } + } + + switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + case BTRFS_BLOCK_GROUP_DUP: + case BTRFS_BLOCK_GROUP_RAID1: + cache->alloc_offset = WP_MISSING_DEV; + for (i = 0; i < map->num_stripes; i++) { + if (alloc_offsets[i] == WP_MISSING_DEV || + alloc_offsets[i] == WP_CONVENTIONAL) + continue; + if (cache->alloc_offset == WP_MISSING_DEV) + cache->alloc_offset = alloc_offsets[i]; + if (alloc_offsets[i] == cache->alloc_offset) + continue; + + error("write pointer mismatch: block group %llu", + logical); + ret = -EIO; + goto out; + } + if (num_conventional && emulated_offset > cache->alloc_offset) + ret = -EIO; + break; + case BTRFS_BLOCK_GROUP_RAID0: + cache->alloc_offset = 0; + for (i = 0; i < map->num_stripes; i++) { + if (alloc_offsets[i] == WP_MISSING_DEV) { + error( + "cannot recover write pointer: block group %llu", + logical); + ret = -EIO; + goto out; + } + + if (alloc_offsets[i] == WP_CONVENTIONAL) + alloc_offsets[i] = + offset_in_dev_extent(map, alloc_offsets, + emulated_offset, + i); + + /* sanity check */ + if (i > 0) { + if ((alloc_offsets[i] % BTRFS_STRIPE_LEN != 0 && + alloc_offsets[i - 1] % + BTRFS_STRIPE_LEN != 0) || + (alloc_offsets[i - 1] < alloc_offsets[i]) || + 
(alloc_offsets[i - 1] - alloc_offsets[i] > + BTRFS_STRIPE_LEN)) { + error( + "write pointer mismatch at %d: block group %llu", + i, logical); + ret = -EIO; + goto out; + } + } + + cache->alloc_offset += alloc_offsets[i]; + } + break; + case BTRFS_BLOCK_GROUP_RAID10: + /* + * Pass1: check write pointer of RAID1 level: each pointer + * should be equal. + */ + for (i = 0; i < map->num_stripes / map->sub_stripes; i++) { + int base = i * map->sub_stripes; + u64 offset = WP_MISSING_DEV; + int fill = 0, num_conventional = 0; + + for (j = 0; j < map->sub_stripes; j++) { + if (alloc_offsets[base+j] == WP_MISSING_DEV) { + fill++; + continue; + } + if (alloc_offsets[base+j] == WP_CONVENTIONAL) { + fill++; + num_conventional++; + continue; + } + if (offset == WP_MISSING_DEV) + offset = alloc_offsets[base + j]; + if (alloc_offsets[base + j] == offset) + continue; + + error( + "write pointer mismatch: block group %llu", + logical); + ret = -EIO; + goto out; + } + if (!fill) + continue; + /* this RAID0 stripe is free on conventional zones */ + if (num_conventional == map->sub_stripes) + offset = WP_CONVENTIONAL; + /* fill WP_MISSING_DEV or WP_CONVENTIONAL */ + for (j = 0; j < map->sub_stripes; j++) + alloc_offsets[base + j] = offset; + } + + /* Pass2: check write pointer of RAID0 level */ + cache->alloc_offset = 0; + for (i = 0; i < map->num_stripes / map->sub_stripes; i++) { + int base = i * map->sub_stripes; + + if (alloc_offsets[base] == WP_MISSING_DEV) { + error( + "cannot recover write pointer: block group %llu", + logical); + ret = -EIO; + goto out; + } + + if (alloc_offsets[base] == WP_CONVENTIONAL) + alloc_offsets[base] = + offset_in_dev_extent(map, alloc_offsets, + emulated_offset, + base); + + /* sanity check */ + if (i > 0) { + int prev = base - map->sub_stripes; + + if ((alloc_offsets[base] % + BTRFS_STRIPE_LEN != 0 && + alloc_offsets[prev] % + BTRFS_STRIPE_LEN != 0) || + (alloc_offsets[prev] < + alloc_offsets[base]) || + (alloc_offsets[prev] - alloc_offsets[base] > + 
BTRFS_STRIPE_LEN)) { + error( + "write pointer mismatch: block group %llu", + logical); + ret = -EIO; + goto out; + } + } + + cache->alloc_offset += alloc_offsets[base]; + } + break; + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID6: + /* RAID5/6 is not supported yet */ + default: + error("Unsupported profile %llu", + map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK); + ret = -EINVAL; + goto out; + } + +out: + /* an extent is allocated after the write pointer */ + if (num_conventional && emulated_offset > cache->alloc_offset) + ret = -EIO; + + free(alloc_offsets); + return ret; +} + #endif int btrfs_get_zone_info(int fd, const char *file, bool hmzoned, diff --git a/common/hmzoned.h b/common/hmzoned.h index 3444e2c1b0f5..a6b16d0ed35a 100644 --- a/common/hmzoned.h +++ b/common/hmzoned.h @@ -70,6 +70,8 @@ static inline size_t sbwrite(int fd, void *buf, off_t offset) int btrfs_wipe_sb_zones(int fd, struct btrfs_zoned_device_info *zinfo); bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, u64 num_bytes); +int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info, + struct btrfs_block_group_cache *cache); #else static inline bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr) @@ -105,6 +107,11 @@ static inline bool btrfs_check_allocatable_zones(struct btrfs_device *device, { return true; } +static inline int btrfs_load_block_group_zone_info( + struct btrfs_fs_info *fs_info, struct btrfs_block_group_cache *cache) +{ + return 0; +} #endif /* BTRFS_ZONED */ diff --git a/ctree.h b/ctree.h index 34fd7d00cabf..fe72bd8921b0 100644 --- a/ctree.h +++ b/ctree.h @@ -1119,6 +1119,12 @@ struct btrfs_block_group_cache { */ u32 bitmap_low_thresh; + /* + * Allocation offset for the block group to implement + * sequential allocation. This is used only with HMZONED mode + * enabled. 
+ */ + u64 alloc_offset; }; struct btrfs_device; diff --git a/extent-tree.c b/extent-tree.c index 53be4f4c7369..89a8b935b602 100644 --- a/extent-tree.c +++ b/extent-tree.c @@ -30,6 +30,7 @@ #include "volumes.h" #include "free-space-cache.h" #include "free-space-tree.h" +#include "common/hmzoned.h" #include "common/utils.h" #define PENDING_EXTENT_INSERT 0 @@ -258,6 +259,14 @@ again: if (cache->ro || !block_group_bits(cache, data)) goto new_group; + if (root->fs_info->fs_devices->hmzoned) { + if (cache->key.offset - cache->alloc_offset < num) + goto new_group; + *start_ret = cache->key.objectid + cache->alloc_offset; + cache->alloc_offset += num; + return 0; + } + while(1) { ret = find_first_extent_bit(&root->fs_info->free_space_cache, last, &start, &end, EXTENT_DIRTY); @@ -2720,6 +2729,10 @@ static int read_one_block_group(struct btrfs_fs_info *fs_info, } cache->space_info = space_info; + ret = btrfs_load_block_group_zone_info(fs_info, cache); + if (ret) + return ret; + set_extent_bits(block_group_cache, cache->key.objectid, cache->key.objectid + cache->key.offset - 1, bit | EXTENT_LOCKED); @@ -2785,6 +2798,9 @@ btrfs_add_block_group(struct btrfs_fs_info *fs_info, u64 bytes_used, u64 type, cache->key.objectid = chunk_offset; cache->key.offset = size; + ret = btrfs_load_block_group_zone_info(fs_info, cache); + BUG_ON(ret); + cache->key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; cache->used = bytes_used; cache->flags = type; From patchwork Wed Dec 4 08:25:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11272445 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A8F86C1 for ; Wed, 4 Dec 2019 08:27:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 09A0C2073B for ; Wed, 4 Dec 2019 08:27:58 
+0000 (UTC) From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Johannes Thumshirn , Hannes Reinecke , Anand Jain , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v5 12/15] btrfs-progs: redirty clean extent buffers in seq Date: Wed, 4 Dec 2019 17:25:10 +0900 Message-Id: <20191204082513.857320-13-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com> References: <20191204082513.857320-1-naohiro.aota@wdc.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org
Tree-manipulating operations such as merging nodes often release tree nodes that were already allocated. Btrfs cleans such nodes so that their pages are not needlessly written out. On HMZONED drives, however, this optimization blocks subsequent I/Os: cancelling the write-out of the freed blocks leaves a hole in the sequential write stream that the device expects. This patch checks whether the next dirty extent buffer is contiguous with the previously written one. If it is not, the patch redirties the extent buffers between the previous one and the next one, so that all dirty buffers are written sequentially.
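The commit-time behaviour described above can be illustrated with a minimal, self-contained C model. It is not btrfs-progs code: struct model_bg, NODESIZE, NBLOCKS and the model_* helpers are hypothetical, and each dirty buffer is handled one node at a time instead of as a dirty range. Freed-and-cleaned buffers leave holes below the next dirty buffer; the loop fills each hole by re-dirtying the buffer at the write pointer and rescanning, so blocks are written strictly in address order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one block group split into NBLOCKS node-sized slots. */
#define NODESIZE 16384u
#define NBLOCKS 8

struct model_bg {
	uint64_t start;        /* logical start of the block group */
	uint64_t write_offset; /* bytes written so far, in address order */
	bool dirty[NBLOCKS];   /* dirty flag per node-sized buffer */
};

/* Index of the lowest dirty buffer, or -1 if none is dirty. */
static int model_first_dirty(const struct model_bg *bg)
{
	for (int i = 0; i < NBLOCKS; i++)
		if (bg->dirty[i])
			return i;
	return -1;
}

/*
 * Same idea as btrfs_redirty_extent_buffer_for_hmzoned(): if the next dirty
 * buffer lies beyond the write pointer, re-dirty the buffer sitting at the
 * write pointer and tell the caller to rescan; otherwise advance the pointer.
 */
static bool model_redirty_gap(struct model_bg *bg, uint64_t start)
{
	uint64_t next = bg->start + bg->write_offset;

	if (next < start) {
		assert(next + NODESIZE <= start);
		bg->dirty[(next - bg->start) / NODESIZE] = true;
		return true;
	}
	bg->write_offset += NODESIZE;
	return false;
}

/* Commit loop: "write" dirty buffers lowest-first and record the order. */
static int model_commit(struct model_bg *bg, uint64_t *order, int max)
{
	int n = 0;

	for (;;) {
		int i = model_first_dirty(bg);
		uint64_t start;

		if (i < 0)
			break;
		start = bg->start + (uint64_t)i * NODESIZE;
		if (model_redirty_gap(bg, start))
			continue; /* like "goto again" in __commit_transaction() */
		bg->dirty[i] = false;
		if (n < max)
			order[n++] = start;
	}
	return n;
}
```

For example, with only slots 0 and 3 dirty (slots 1 and 2 freed and cleaned), model_commit() writes slots 0, 1, 2 and 3 in that order and leaves write_offset at 4 * NODESIZE.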
Signed-off-by: Naohiro Aota --- common/hmzoned.c | 30 ++++++++++++++++++++++++++++++ common/hmzoned.h | 7 +++++++ ctree.h | 1 + transaction.c | 7 +++++++ 4 files changed, 45 insertions(+) diff --git a/common/hmzoned.c b/common/hmzoned.c index f268f360d8f7..53c9e1cfd472 100644 --- a/common/hmzoned.c +++ b/common/hmzoned.c @@ -907,10 +907,40 @@ out: if (num_conventional && emulated_offset > cache->alloc_offset) ret = -EIO; + if (!ret) + cache->write_offset = cache->alloc_offset; + free(alloc_offsets); return ret; } +bool btrfs_redirty_extent_buffer_for_hmzoned(struct btrfs_fs_info *fs_info, + u64 start, u64 end) +{ + u64 next; + struct btrfs_block_group_cache *cache; + struct extent_buffer *eb; + + if (!fs_info->fs_devices->hmzoned) + return false; + + cache = btrfs_lookup_first_block_group(fs_info, start); + BUG_ON(!cache); + + if (cache->key.objectid + cache->write_offset < start) { + next = cache->key.objectid + cache->write_offset; + BUG_ON(next + fs_info->nodesize > start); + eb = btrfs_find_create_tree_block(fs_info, next); + btrfs_mark_buffer_dirty(eb); + free_extent_buffer(eb); + return true; + } + + cache->write_offset += (end + 1 - start); + + return false; +} + #endif int btrfs_get_zone_info(int fd, const char *file, bool hmzoned, diff --git a/common/hmzoned.h b/common/hmzoned.h index a6b16d0ed35a..ee2fab311967 100644 --- a/common/hmzoned.h +++ b/common/hmzoned.h @@ -72,6 +72,8 @@ bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos, u64 num_bytes); int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info, struct btrfs_block_group_cache *cache); +bool btrfs_redirty_extent_buffer_for_hmzoned(struct btrfs_fs_info *fs_info, + u64 start, u64 end); #else static inline bool zone_is_sequential(struct btrfs_zoned_device_info *zinfo, u64 bytenr) @@ -112,6 +114,11 @@ static inline int btrfs_load_block_group_zone_info( { return 0; } +static inline bool btrfs_redirty_extent_buffer_for_hmzoned( + struct btrfs_fs_info *fs_info, u64 start, 
u64 end) +{ + return false; +} #endif /* BTRFS_ZONED */ diff --git a/ctree.h b/ctree.h index fe72bd8921b0..7418627cade3 100644 --- a/ctree.h +++ b/ctree.h @@ -1125,6 +1125,7 @@ struct btrfs_block_group_cache { * enabled. */ u64 alloc_offset; + u64 write_offset; }; struct btrfs_device; diff --git a/transaction.c b/transaction.c index 45bb9e1f9de6..7b37f12f118f 100644 --- a/transaction.c +++ b/transaction.c @@ -18,6 +18,7 @@ #include "disk-io.h" #include "transaction.h" #include "delayed-ref.h" +#include "common/hmzoned.h" #include "common/messages.h" @@ -136,10 +137,16 @@ int __commit_transaction(struct btrfs_trans_handle *trans, int ret; while(1) { +again: ret = find_first_extent_bit(tree, 0, &start, &end, EXTENT_DIRTY); if (ret) break; + + if (btrfs_redirty_extent_buffer_for_hmzoned(fs_info, start, + end)) + goto again; + while(start <= end) { eb = find_first_extent_buffer(tree, start); BUG_ON(!eb || eb->start != start); From patchwork Wed Dec 4 08:25:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 11272449 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9A9126C1 for ; Wed, 4 Dec 2019 08:28:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6FFE52073B for ; Wed, 4 Dec 2019 08:28:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="aZCCrcqb" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727457AbfLDI17 (ORCPT ); Wed, 4 Dec 2019 03:27:59 -0500 Received: from esa2.hgst.iphmx.com ([68.232.143.124]:1552 "EHLO esa2.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727298AbfLDI16 (ORCPT ); Wed, 4 Dec 2019 03:27:58 
-0500 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, David Sterba Cc: Chris Mason , Josef Bacik , Nikolay Borisov , Damien Le Moal , Johannes Thumshirn , Hannes Reinecke , Anand Jain , linux-fsdevel@vger.kernel.org, Naohiro Aota Subject: [PATCH v5 13/15] btrfs-progs: mkfs: Zoned block device support Date: Wed, 4 Dec 2019 17:25:11 +0900 Message-Id: <20191204082513.857320-14-naohiro.aota@wdc.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com> References: <20191204082513.857320-1-naohiro.aota@wdc.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org
This patch makes the size of the temporary system group chunk equal to the device zone size and places the chunk at the start of the third zone, because the first two zones are reserved for the superblock. It also passes PREP_DEVICE_HMZONED to btrfs_prepare_device() when the user enables the HMZONED feature, which is done with the option "-O hmzoned". For now, this feature is incompatible with populating the filesystem from a source directory (-r).
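The placement of the initial system block group can be sketched as follows. This is an illustrative model, not the mkfs code: the helper names are hypothetical, and the non-zoned constants assume BTRFS_BLOCK_RESERVED_1M_FOR_SUPER is 1 MiB and BTRFS_MKFS_SYSTEM_GROUP_SIZE is 4 MiB. With HMZONED, zones 0 and 1 are left to the superblock, the system group fills zone 2, and the group's write pointer starts right after the tree blocks that make_btrfs() wrote into it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M (1024ULL * 1024ULL)
/* Assumed to match BTRFS_BLOCK_RESERVED_1M_FOR_SUPER. */
#define MODEL_RESERVED_FOR_SUPER SZ_1M
/* Assumed to match BTRFS_MKFS_SYSTEM_GROUP_SIZE (4 MiB). */
#define MODEL_SYSTEM_GROUP_SIZE (4 * SZ_1M)

struct model_sys_group {
	uint64_t offset; /* physical == logical start of the group */
	uint64_t size;   /* length of the group */
};

/* Hypothetical helper: where mkfs puts the temporary system group. */
static struct model_sys_group model_system_group(bool hmzoned,
						 uint64_t zone_size)
{
	struct model_sys_group g = { MODEL_RESERVED_FOR_SUPER,
				     MODEL_SYSTEM_GROUP_SIZE };

	if (hmzoned) {
		/* Zones 0 and 1 hold the superblock; use zone 2 whole. */
		g.offset = zone_size * 2;
		g.size = zone_size;
	}
	return g;
}

/* Address of the i-th initial tree block (i >= 1), as in make_btrfs(). */
static uint64_t model_block_addr(struct model_sys_group g, uint32_t nodesize,
				 int i)
{
	assert(i >= 1);
	return g.offset + (uint64_t)nodesize * (uint64_t)(i - 1);
}

/* Initial write pointer: block_count stands in for MKFS_BLOCK_COUNT. */
static uint64_t model_initial_alloc_offset(uint32_t nodesize, int block_count)
{
	return (uint64_t)nodesize * (uint64_t)(block_count - 1);
}
```

For a 256 MiB zone, the system group starts at 512 MiB and is 256 MiB long. The first allocation after mkfs lands immediately after the last initial tree block, which keeps the zone's writes sequential.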
Signed-off-by: Naohiro Aota --- extent-tree.c | 8 ++++++++ mkfs/common.c | 38 +++++++++++++++++++++++++------------- mkfs/common.h | 1 + mkfs/main.c | 50 +++++++++++++++++++++++++++++++++++++++++++------- 4 files changed, 77 insertions(+), 20 deletions(-) diff --git a/extent-tree.c b/extent-tree.c index 89a8b935b602..23b8bf44f4fe 100644 --- a/extent-tree.c +++ b/extent-tree.c @@ -32,6 +32,7 @@ #include "free-space-tree.h" #include "common/hmzoned.h" #include "common/utils.h" +#include "mkfs/common.h" #define PENDING_EXTENT_INSERT 0 #define PENDING_EXTENT_DELETE 1 @@ -2799,6 +2800,13 @@ btrfs_add_block_group(struct btrfs_fs_info *fs_info, u64 bytes_used, u64 type, cache->key.offset = size; ret = btrfs_load_block_group_zone_info(fs_info, cache); + if (ret == -ENOENT && + cache->key.objectid == fs_info->fs_devices->zone_size * 2) { + /* Write pointer for initial SYSTEM block group */ + cache->write_offset = cache->alloc_offset = + fs_info->nodesize * (MKFS_BLOCK_COUNT - 1); + ret = 0; + } BUG_ON(ret); cache->key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; diff --git a/mkfs/common.c b/mkfs/common.c index 469b88d6a8d3..c7406a3bd230 100644 --- a/mkfs/common.c +++ b/mkfs/common.c @@ -25,6 +25,7 @@ #include "common/utils.h" #include "common/path-utils.h" #include "common/device-utils.h" +#include "common/hmzoned.h" #include "mkfs/common.h" static u64 reference_root_table[] = { @@ -155,6 +156,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) int skinny_metadata = !!(cfg->features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA); u64 num_bytes; + u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER; + u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE; + + if ((cfg->features & BTRFS_FEATURE_INCOMPAT_HMZONED)) { + system_group_offset = cfg->zone_size * 2; + system_group_size = cfg->zone_size; + } buf = malloc(sizeof(*buf) + max(cfg->sectorsize, cfg->nodesize)); if (!buf) @@ -186,7 +194,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) cfg->blocks[MKFS_SUPER_BLOCK] 
= BTRFS_SUPER_INFO_OFFSET;
 	for (i = 1; i < MKFS_BLOCK_COUNT; i++) {
-		cfg->blocks[i] = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER +
+		cfg->blocks[i] = system_group_offset +
 			cfg->nodesize * (i - 1);
 	}
@@ -204,7 +212,10 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_super_stripesize(&super, cfg->stripesize);
 	btrfs_set_super_csum_type(&super, cfg->csum_type);
 	btrfs_set_super_chunk_root_generation(&super, 1);
-	btrfs_set_super_cache_generation(&super, -1);
+	if (cfg->features & BTRFS_FEATURE_INCOMPAT_HMZONED)
+		btrfs_set_super_cache_generation(&super, 0);
+	else
+		btrfs_set_super_cache_generation(&super, -1);
 	btrfs_set_super_incompat_flags(&super, cfg->features);
 	if (cfg->label)
 		__strncpy_null(super.label, cfg->label, BTRFS_LABEL_SIZE - 1);
@@ -320,8 +331,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_device_id(buf, dev_item, 1);
 	btrfs_set_device_generation(buf, dev_item, 0);
 	btrfs_set_device_total_bytes(buf, dev_item, num_bytes);
-	btrfs_set_device_bytes_used(buf, dev_item,
-				    BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+	btrfs_set_device_bytes_used(buf, dev_item, system_group_size);
 	btrfs_set_device_io_align(buf, dev_item, cfg->sectorsize);
 	btrfs_set_device_io_width(buf, dev_item, cfg->sectorsize);
 	btrfs_set_device_sector_size(buf, dev_item, cfg->sectorsize);
@@ -342,14 +352,14 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 
 	/* then we have chunk 0 */
 	btrfs_set_disk_key_objectid(&disk_key, BTRFS_FIRST_CHUNK_TREE_OBJECTID);
-	btrfs_set_disk_key_offset(&disk_key, BTRFS_BLOCK_RESERVED_1M_FOR_SUPER);
+	btrfs_set_disk_key_offset(&disk_key, system_group_offset);
 	btrfs_set_disk_key_type(&disk_key, BTRFS_CHUNK_ITEM_KEY);
 	btrfs_set_item_key(buf, &disk_key, nritems);
 	btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff);
 	btrfs_set_item_size(buf, btrfs_item_nr(nritems), item_size);
 
 	chunk = btrfs_item_ptr(buf, nritems, struct btrfs_chunk);
-	btrfs_set_chunk_length(buf, chunk, BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+	btrfs_set_chunk_length(buf, chunk, system_group_size);
 	btrfs_set_chunk_owner(buf, chunk, BTRFS_EXTENT_TREE_OBJECTID);
 	btrfs_set_chunk_stripe_len(buf, chunk, BTRFS_STRIPE_LEN);
 	btrfs_set_chunk_type(buf, chunk, BTRFS_BLOCK_GROUP_SYSTEM);
@@ -359,7 +369,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_chunk_num_stripes(buf, chunk, 1);
 	btrfs_set_stripe_devid_nr(buf, chunk, 0, 1);
 	btrfs_set_stripe_offset_nr(buf, chunk, 0,
-				   BTRFS_BLOCK_RESERVED_1M_FOR_SUPER);
+				   system_group_offset);
 	nritems++;
 
 	write_extent_buffer(buf, super.dev_item.uuid,
@@ -398,7 +408,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 			sizeof(struct btrfs_dev_extent);
 
 	btrfs_set_disk_key_objectid(&disk_key, 1);
-	btrfs_set_disk_key_offset(&disk_key, BTRFS_BLOCK_RESERVED_1M_FOR_SUPER);
+	btrfs_set_disk_key_offset(&disk_key, system_group_offset);
 	btrfs_set_disk_key_type(&disk_key, BTRFS_DEV_EXTENT_KEY);
 	btrfs_set_item_key(buf, &disk_key, nritems);
 	btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff);
@@ -410,14 +420,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	btrfs_set_dev_extent_chunk_objectid(buf, dev_extent,
 					BTRFS_FIRST_CHUNK_TREE_OBJECTID);
 	btrfs_set_dev_extent_chunk_offset(buf, dev_extent,
-					  BTRFS_BLOCK_RESERVED_1M_FOR_SUPER);
+					  system_group_offset);
 
 	write_extent_buffer(buf, chunk_tree_uuid,
 		    (unsigned long)btrfs_dev_extent_chunk_tree_uuid(dev_extent),
 		    BTRFS_UUID_SIZE);
 
-	btrfs_set_dev_extent_length(buf, dev_extent,
-				    BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+	btrfs_set_dev_extent_length(buf, dev_extent, system_group_size);
 	nritems++;
 
 	btrfs_set_header_bytenr(buf, cfg->blocks[MKFS_DEV_TREE]);
@@ -464,13 +473,16 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	buf->len = BTRFS_SUPER_INFO_SIZE;
 	csum_tree_block_size(buf, btrfs_csum_type_size(cfg->csum_type), 0,
 			     cfg->csum_type);
-	ret = pwrite(fd, buf->data, BTRFS_SUPER_INFO_SIZE,
-			cfg->blocks[MKFS_SUPER_BLOCK]);
+	ret = sbwrite(fd, buf->data, cfg->blocks[MKFS_SUPER_BLOCK]);
 	if (ret != BTRFS_SUPER_INFO_SIZE) {
 		ret = (ret < 0 ? -errno : -EIO);
 		goto out;
 	}
 
+	ret = fsync(fd);
+	if (ret)
+		goto out;
+
 	ret = 0;
 
 out:
diff --git a/mkfs/common.h b/mkfs/common.h
index 1ca71a4fcce5..b7742dedbae1 100644
--- a/mkfs/common.h
+++ b/mkfs/common.h
@@ -55,6 +55,7 @@ struct btrfs_mkfs_config {
 	u64 num_bytes;
 	/* checksum algorithm to use */
 	enum btrfs_csum_type csum_type;
+	u64 zone_size;
 
 	/* Output fields, set during creation */
 
diff --git a/mkfs/main.c b/mkfs/main.c
index 14e9ae7aeb6d..0aa73cce728b 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -48,6 +48,7 @@
 #include "crypto/crc32c.h"
 #include "common/fsfeatures.h"
 #include "common/box.h"
+#include "common/hmzoned.h"
 
 static int verbose = 1;
 
@@ -68,8 +69,16 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
 	u64 bytes_used;
 	u64 chunk_start = 0;
 	u64 chunk_size = 0;
+	u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER;
+	u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE;
 	int ret;
 
+	if (fs_info->fs_devices->hmzoned) {
+		/* Two zones are reserved for superblock */
+		system_group_offset = fs_info->fs_devices->zone_size * 2;
+		system_group_size = fs_info->fs_devices->zone_size;
+	}
+
 	if (mixed)
 		flags |= BTRFS_BLOCK_GROUP_DATA;
 
@@ -89,9 +98,8 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed,
 	 */
 	ret = btrfs_make_block_group(trans, fs_info, bytes_used,
 				     BTRFS_BLOCK_GROUP_SYSTEM,
-				     BTRFS_BLOCK_RESERVED_1M_FOR_SUPER,
-				     BTRFS_MKFS_SYSTEM_GROUP_SIZE);
-	allocation->system += BTRFS_MKFS_SYSTEM_GROUP_SIZE;
+				     system_group_offset, system_group_size);
+	allocation->system += system_group_size;
 	if (ret)
 		return ret;
 
@@ -789,6 +797,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	int metadata_profile_opt = 0;
 	int discard = 1;
 	int ssd = 0;
+	int hmzoned = 0;
 	int force_overwrite = 0;
 	char *source_dir = NULL;
 	bool source_dir_set = false;
@@ -803,6 +812,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	struct mkfs_allocation allocation = { 0 };
 	struct btrfs_mkfs_config mkfs_cfg;
 	enum btrfs_csum_type csum_type = BTRFS_CSUM_TYPE_CRC32;
+	u64 system_group_size;
 
 	crc32c_optimization_init();
 
@@ -934,6 +944,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	if (dev_cnt == 0)
 		print_usage(1);
 
+	hmzoned = features & BTRFS_FEATURE_INCOMPAT_HMZONED;
+
 	if (source_dir_set && dev_cnt > 1) {
 		error("the option -r is limited to a single device");
 		goto error;
@@ -943,6 +955,11 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 		goto error;
 	}
 
+	if (source_dir_set && hmzoned) {
+		error("The -r and hmzoned feature are incompatible");
+		exit(1);
+	}
+
 	if (*fs_uuid) {
 		uuid_t dummy_uuid;
 
@@ -974,6 +991,16 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 
 	file = argv[optind++];
 	ssd = is_ssd(file);
+	if (hmzoned) {
+		if (zoned_model(file) == ZONED_NONE) {
+			error("%s: not a zoned block device", file);
+			exit(1);
+		}
+		if (!zone_size(file)) {
+			error("%s: zone size undefined", file);
+			exit(1);
+		}
+	}
 
 	/*
 	 * Set default profiles according to number of added devices.
@@ -1130,7 +1157,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	ret = btrfs_prepare_device(fd, file, &dev_block_count, block_count,
 			(zero_end ? PREP_DEVICE_ZERO_END : 0) |
 			(discard ? PREP_DEVICE_DISCARD : 0) |
-			(verbose ? PREP_DEVICE_VERBOSE : 0));
+			(verbose ? PREP_DEVICE_VERBOSE : 0) |
+			(hmzoned ? PREP_DEVICE_HMZONED : 0));
 	if (ret)
 		goto error;
 	if (block_count && block_count > dev_block_count) {
@@ -1141,9 +1169,11 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	}
 
 	/* To create the first block group and chunk 0 in make_btrfs */
-	if (dev_block_count < BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
+	system_group_size = hmzoned ?
+		zone_size(file) : BTRFS_MKFS_SYSTEM_GROUP_SIZE;
+	if (dev_block_count < system_group_size) {
 		error("device is too small to make filesystem, must be at least %llu",
-				(unsigned long long)BTRFS_MKFS_SYSTEM_GROUP_SIZE);
+				(unsigned long long)system_group_size);
 		goto error;
 	}
 
@@ -1161,6 +1191,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	mkfs_cfg.stripesize = stripesize;
 	mkfs_cfg.features = features;
 	mkfs_cfg.csum_type = csum_type;
+	mkfs_cfg.zone_size = zone_size(file);
 
 	ret = make_btrfs(fd, &mkfs_cfg);
 	if (ret) {
@@ -1244,7 +1275,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 				block_count,
 				(verbose ? PREP_DEVICE_VERBOSE : 0) |
 				(zero_end ? PREP_DEVICE_ZERO_END : 0) |
-				(discard ? PREP_DEVICE_DISCARD : 0));
+				(discard ? PREP_DEVICE_DISCARD : 0) |
+				(hmzoned ? PREP_DEVICE_HMZONED : 0));
 		if (ret) {
 			goto error;
 		}
@@ -1341,6 +1373,10 @@ raid_groups:
 			btrfs_group_profile_str(metadata_profile),
 			pretty_size(allocation.system));
 		printf("SSD detected: %s\n", ssd ? "yes" : "no");
+		printf("Zoned device: %s\n", hmzoned ? "yes" : "no");
+		if (hmzoned)
+			printf("Zone size: %s\n",
+			       pretty_size(fs_info->fs_devices->zone_size));
 		btrfs_parse_features_to_string(features_buf, features);
 		printf("Incompat features: %s\n", features_buf);
 		printf("Checksum: %s",

From patchwork Wed Dec 4 08:25:12 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11272453
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v5 14/15] btrfs-progs: device-add: support HMZONED device
Date: Wed, 4 Dec 2019 17:25:12 +0900
Message-Id: <20191204082513.857320-15-naohiro.aota@wdc.com>
In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com>
References: <20191204082513.857320-1-naohiro.aota@wdc.com>

This patch checks whether the target file system is flagged as HMZONED. If
it is, the device to be added is prepared with PREP_DEVICE_HMZONED. It also
adds checks to prevent mixing non-zoned and zoned devices.

Signed-off-by: Naohiro Aota
---
 cmds/device.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/cmds/device.c b/cmds/device.c
index 24158308a41b..f85820cb1cc0 100644
--- a/cmds/device.c
+++ b/cmds/device.c
@@ -36,6 +36,7 @@
 #include "common/path-utils.h"
 #include "common/device-utils.h"
 #include "common/device-scan.h"
+#include "common/hmzoned.h"
 #include "mkfs/common.h"
 
 static const char * const device_cmd_group_usage[] = {
@@ -61,6 +62,9 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 	int discard = 1;
 	int force = 0;
 	int last_dev;
+	int res;
+	int hmzoned;
+	struct btrfs_ioctl_feature_flags feature_flags;
 
 	optind = 0;
 	while (1) {
@@ -96,12 +100,35 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 	if (fdmnt < 0)
 		return 1;
 
+	res = ioctl(fdmnt, BTRFS_IOC_GET_FEATURES, &feature_flags);
+	if (res) {
+		error("error getting feature flags '%s': %m", mntpnt);
+		return 1;
+	}
+	hmzoned = feature_flags.incompat_flags & BTRFS_FEATURE_INCOMPAT_HMZONED;
+
 	for (i = optind; i < last_dev; i++){
 		struct btrfs_ioctl_vol_args ioctl_args;
-		int devfd, res;
+		int devfd;
 		u64 dev_block_count = 0;
 		char *path;
 
+		if (hmzoned && zoned_model(argv[i]) == ZONED_NONE) {
+			error(
+		"cannot add non-zoned device to HMZONED file system '%s'",
+				argv[i]);
+			ret++;
+			continue;
+		}
+
+		if (!hmzoned && zoned_model(argv[i]) == ZONED_HOST_MANAGED) {
+			error(
+	"cannot add host managed zoned device to non-HMZONED file system '%s'",
+				argv[i]);
+			ret++;
+			continue;
+		}
+
 		res = test_dev_for_mkfs(argv[i], force);
 		if (res) {
 			ret++;
@@ -117,7 +144,8 @@ static int cmd_device_add(const struct cmd_struct *cmd,
 
 		res = btrfs_prepare_device(devfd, argv[i], &dev_block_count, 0,
 				PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE |
-				(discard ? PREP_DEVICE_DISCARD : 0));
+				(discard ? PREP_DEVICE_DISCARD : 0) |
+				(hmzoned ? PREP_DEVICE_HMZONED : 0));
 		close(devfd);
 		if (res) {
 			ret++;

From patchwork Wed Dec 4 08:25:13 2019
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 11272457
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, David Sterba
Cc: Chris Mason, Josef Bacik, Nikolay Borisov, Damien Le Moal, Johannes Thumshirn, Hannes Reinecke, Anand Jain, linux-fsdevel@vger.kernel.org, Naohiro Aota
Subject: [PATCH v5 15/15] btrfs-progs: introduce support for device replace HMZONED device
Date: Wed, 4 Dec 2019 17:25:13 +0900
Message-Id: <20191204082513.857320-16-naohiro.aota@wdc.com>
In-Reply-To: <20191204082513.857320-1-naohiro.aota@wdc.com>
References: <20191204082513.857320-1-naohiro.aota@wdc.com>

This patch checks whether the target file system is flagged as HMZONED. If
it is, the replacement target device is prepared with PREP_DEVICE_HMZONED.
It also adds checks to prevent mixing non-zoned and zoned devices.

Signed-off-by: Naohiro Aota
---
 cmds/replace.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/cmds/replace.c b/cmds/replace.c
index 2321aa156fe2..670df68a93f7 100644
--- a/cmds/replace.c
+++ b/cmds/replace.c
@@ -119,6 +119,7 @@ static const char *const cmd_replace_start_usage[] = {
 static int cmd_replace_start(const struct cmd_struct *cmd,
 			     int argc, char **argv)
 {
+	struct btrfs_ioctl_feature_flags feature_flags;
 	struct btrfs_ioctl_dev_replace_args start_args = {0};
 	struct btrfs_ioctl_dev_replace_args status_args = {0};
 	int ret;
@@ -126,6 +127,7 @@ static int cmd_replace_start(const struct cmd_struct *cmd,
 	int c;
 	int fdmnt = -1;
 	int fddstdev = -1;
+	int hmzoned;
 	char *path;
 	char *srcdev;
 	char *dstdev = NULL;
@@ -166,6 +168,13 @@ static int cmd_replace_start(const struct cmd_struct *cmd,
 	if (fdmnt < 0)
 		goto leave_with_error;
 
+	ret = ioctl(fdmnt, BTRFS_IOC_GET_FEATURES, &feature_flags);
+	if (ret) {
+		error("ioctl(GET_FEATURES) on '%s' returns error: %m", path);
+		goto leave_with_error;
+	}
+	hmzoned = feature_flags.incompat_flags & BTRFS_FEATURE_INCOMPAT_HMZONED;
+
 	/* check for possible errors before backgrounding */
 	status_args.cmd = BTRFS_IOCTL_DEV_REPLACE_CMD_STATUS;
 	status_args.result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_RESULT;
@@ -260,7 +269,8 @@ static int cmd_replace_start(const struct cmd_struct *cmd,
 	strncpy((char *)start_args.start.tgtdev_name, dstdev,
 		BTRFS_DEVICE_PATH_NAME_MAX);
 	ret = btrfs_prepare_device(fddstdev, dstdev, &dstdev_block_count, 0,
-			PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE);
+			PREP_DEVICE_ZERO_END | PREP_DEVICE_VERBOSE |
+			(hmzoned ? PREP_DEVICE_HMZONED : 0));
 	if (ret)
 		goto leave_with_error;
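
For illustration only, the accept/reject rule that cmd_device_add() applies to each candidate device before calling btrfs_prepare_device() can be restated as a small standalone C predicate. This is a sketch under stated assumptions, not btrfs-progs code: enum zoned_model_sketch and hmzoned_device_ok() are hypothetical names, ZONED_HOST_AWARE is an assumed third model value (the patch itself only names ZONED_NONE and ZONED_HOST_MANAGED), and the real code obtains its inputs from zoned_model() and the BTRFS_IOC_GET_FEATURES ioctl.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the zoned models referenced by the patch.
 * ZONED_HOST_AWARE is an assumption; the patch only names the other two.
 */
enum zoned_model_sketch {
	ZONED_NONE,
	ZONED_HOST_AWARE,
	ZONED_HOST_MANAGED,
};

/*
 * Returns true if a device of the given model may join a file system whose
 * HMZONED incompat bit is fs_hmzoned. Mirrors the two checks cmd_device_add()
 * performs before preparing the device:
 *   - an HMZONED file system rejects non-zoned devices;
 *   - a non-HMZONED file system rejects host-managed zoned devices.
 */
static bool hmzoned_device_ok(bool fs_hmzoned, enum zoned_model_sketch model)
{
	/* Defensive precondition: only the three known models are meaningful. */
	assert(model <= ZONED_HOST_MANAGED);

	if (fs_hmzoned && model == ZONED_NONE)
		return false;
	if (!fs_hmzoned && model == ZONED_HOST_MANAGED)
		return false;
	return true;
}
```

Under this rule a host-aware device is accepted in both directions; only a non-zoned device on an HMZONED file system, or a host-managed device on a regular one, is rejected.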