From patchwork Thu Aug 9 18:04:34 2018
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10561649
From: Naohiro Aota
To: David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org,
 Hannes Reinecke, Damien Le Moal, Bart Van Assche, Matias Bjorling,
 Naohiro Aota
Subject: [RFC PATCH 01/17] btrfs: introduce HMZONED feature flag
Date: Fri, 10 Aug 2018 03:04:34 +0900
Message-Id: <20180809180450.5091-2-naota@elisp.net>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
References: <20180809180450.5091-1-naota@elisp.net>

This patch introduces the HMZONED incompat flag. The flag indicates that
the volume management will satisfy the constraints imposed by
host-managed zoned block devices.

Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
---
 fs/btrfs/sysfs.c           | 2 ++
 include/uapi/linux/btrfs.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 3717c864ba23..8065d416fb38 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -191,6 +191,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(extended_iref, EXTENDED_IREF);
 BTRFS_FEAT_ATTR_INCOMPAT(raid56, RAID56);
 BTRFS_FEAT_ATTR_INCOMPAT(skinny_metadata, SKINNY_METADATA);
 BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
+BTRFS_FEAT_ATTR_INCOMPAT(hmzoned, HMZONED);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
 
 static struct attribute *btrfs_supported_feature_attrs[] = {
@@ -204,6 +205,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(raid56),
 	BTRFS_FEAT_ATTR_PTR(skinny_metadata),
 	BTRFS_FEAT_ATTR_PTR(no_holes),
+	BTRFS_FEAT_ATTR_PTR(hmzoned),
 	BTRFS_FEAT_ATTR_PTR(free_space_tree),
 	NULL
 };
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 245aace2a400..c37b31a5b29d 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -269,6 +269,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_RAID56		(1ULL << 7)
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_HMZONED		(1ULL << 10)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;

From patchwork Thu Aug 9 18:04:35 2018
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10561647
From: Naohiro Aota
To: David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org,
 Hannes Reinecke, Damien Le Moal, Bart Van Assche, Matias Bjorling,
 Naohiro Aota
Subject: [RFC PATCH 02/17] btrfs: Get zone information of zoned block devices
Date: Fri, 10 Aug 2018 03:04:35 +0900
Message-Id: <20180809180450.5091-3-naota@elisp.net>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
References: <20180809180450.5091-1-naota@elisp.net>

If a zoned block device is found, get its zone information (number of
zones and zone size) using the new helper function btrfs_get_dev_zone().
To avoid costly run-time zone report commands to test the device zone
types during block allocation, attach the seq_zones bitmap to the device
structure to indicate whether a zone is sequential or accepts random
writes.

This patch also introduces the helper function btrfs_dev_is_sequential()
to test if the zone storing a block is a sequential write required zone.

Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
---
 fs/btrfs/volumes.c | 146 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  32 ++++++++++
 2 files changed, 178 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index da86706123ff..35b3a2187653 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -677,6 +677,134 @@ static void btrfs_free_stale_devices(const char *path,
 	}
 }
 
+static int __btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
+				 struct blk_zone **zones,
+				 unsigned int *nr_zones, gfp_t gfp_mask)
+{
+	struct blk_zone *z = *zones;
+	int ret;
+
+	if (!z) {
+		z = kcalloc(*nr_zones, sizeof(struct blk_zone), GFP_KERNEL);
+		if (!z)
+			return -ENOMEM;
+	}
+
+	ret = blkdev_report_zones(device->bdev, pos >> 9,
+				  z, nr_zones, gfp_mask);
+	if (ret != 0) {
+		pr_err("BTRFS: Get zone at %llu failed %d\n",
+		       pos, ret);
+		return ret;
+	}
+
+	*zones = z;
+
+	return 0;
+}
+
+static void btrfs_drop_dev_zonetypes(struct btrfs_device *device)
+{
+	kfree(device->seq_zones);
+	kfree(device->empty_zones);
+	device->seq_zones = NULL;
+	device->empty_zones = NULL;
+	device->nr_zones = 0;
+	device->zone_size = 0;
+	device->zone_size_shift = 0;
+}
+
+int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
+		       struct blk_zone *zone, gfp_t gfp_mask)
+{
+	unsigned int nr_zones = 1;
+	int ret;
+
+	ret = __btrfs_get_dev_zones(device, pos, &zone, &nr_zones, gfp_mask);
+	if (ret != 0 || !nr_zones)
+		return ret ? ret : -EIO;
+
+	return 0;
+}
+
+static int btrfs_get_dev_zonetypes(struct btrfs_device *device)
+{
+	struct block_device *bdev = device->bdev;
+	sector_t nr_sectors = bdev->bd_part->nr_sects;
+	sector_t sector = 0;
+	struct blk_zone *zones = NULL;
+	unsigned int i, n = 0, nr_zones;
+	int ret;
+
+	device->zone_size = 0;
+	device->zone_size_shift = 0;
+	device->nr_zones = 0;
+	device->seq_zones = NULL;
+	device->empty_zones = NULL;
+
+	if (!bdev_is_zoned(bdev))
+		return 0;
+
+	device->zone_size = (u64)bdev_zone_sectors(bdev) << 9;
+	device->zone_size_shift = ilog2(device->zone_size);
+	device->nr_zones = nr_sectors >> ilog2(bdev_zone_sectors(bdev));
+	if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
+		device->nr_zones++;
+
+	device->seq_zones = kcalloc(BITS_TO_LONGS(device->nr_zones),
+				    sizeof(*device->seq_zones), GFP_KERNEL);
+	if (!device->seq_zones)
+		return -ENOMEM;
+
+	device->empty_zones = kcalloc(BITS_TO_LONGS(device->nr_zones),
+				      sizeof(*device->empty_zones), GFP_KERNEL);
+	if (!device->empty_zones)
+		return -ENOMEM;
+
+#define BTRFS_REPORT_NR_ZONES   4096
+
+	/* Get zones type */
+	while (sector < nr_sectors) {
+		nr_zones = BTRFS_REPORT_NR_ZONES;
+		ret = __btrfs_get_dev_zones(device, sector << 9,
+					    &zones, &nr_zones, GFP_KERNEL);
+		if (ret != 0 || !nr_zones) {
+			if (!ret)
+				ret = -EIO;
+			goto out;
+		}
+
+		for (i = 0; i < nr_zones; i++) {
+			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ)
+				set_bit(n, device->seq_zones);
+			if (zones[i].cond == BLK_ZONE_COND_EMPTY)
+				set_bit(n, device->empty_zones);
+			sector = zones[i].start + zones[i].len;
+			n++;
+		}
+	}
+
+	if (n != device->nr_zones) {
+		pr_err("BTRFS: Inconsistent number of zones (%u / %u)\n",
+		       n, device->nr_zones);
+		ret = -EIO;
+		goto out;
+	}
+
+	pr_info("BTRFS: host-%s zoned block device, %u zones of %llu sectors\n",
+		bdev_zoned_model(bdev) == BLK_ZONED_HM ? "managed" : "aware",
+		device->nr_zones, device->zone_size >> 9);
+
+out:
+	kfree(zones);
+
+	if (ret)
+		btrfs_drop_dev_zonetypes(device);
+
+	return ret;
+}
+
+
 static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 			struct btrfs_device *device, fmode_t flags,
 			void *holder)
@@ -726,6 +854,13 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	device->mode = flags;
 
+	/* Get zone type information of zoned block devices */
+	if (bdev_is_zoned(bdev)) {
+		ret = btrfs_get_dev_zonetypes(device);
+		if (ret != 0)
+			goto error_brelse;
+	}
+
 	fs_devices->open_devices++;
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
 	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
@@ -1012,6 +1147,7 @@ static void btrfs_close_bdev(struct btrfs_device *device)
 	}
 
 	blkdev_put(device->bdev, device->mode);
+	btrfs_drop_dev_zonetypes(device);
 }
 
 static void btrfs_close_one_device(struct btrfs_device *device)
@@ -2439,6 +2575,15 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	mutex_unlock(&fs_info->chunk_mutex);
 	mutex_unlock(&fs_devices->device_list_mutex);
 
+	/* Get zone type information of zoned block devices */
+	if (bdev_is_zoned(bdev)) {
+		ret = btrfs_get_dev_zonetypes(device);
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
+			goto error_sysfs;
+		}
+	}
+
 	if (seeding_dev) {
 		mutex_lock(&fs_info->chunk_mutex);
 		ret = init_first_rw_device(trans, fs_info);
@@ -2504,6 +2649,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	return ret;
 
 error_sysfs:
+	btrfs_drop_dev_zonetypes(device);
 	btrfs_sysfs_rm_device_link(fs_devices, device);
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	mutex_lock(&fs_info->chunk_mutex);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 23e9285d88de..13d59bff204f 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -61,6 +61,16 @@ struct btrfs_device {
 
 	struct block_device *bdev;
 
+	/*
+	 * Number of zones, zone size and types of zones if bdev is a
+	 * zoned block device.
+	 */
+	u64 zone_size;
+	u8  zone_size_shift;
+	u32 nr_zones;
+	unsigned long *seq_zones;
+	unsigned long *empty_zones;
+
 	/* the mode sent to blkdev_get */
 	fmode_t mode;
 
@@ -404,6 +414,8 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 			   int mirror_num, int async_submit);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		       fmode_t flags, void *holder);
+int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
+		       struct blk_zone *zone, gfp_t gfp_mask);
 struct btrfs_device *btrfs_scan_one_device(const char *path,
 					   fmode_t flags, void *holder);
 int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
@@ -466,6 +478,26 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
 			     u64 chunk_offset, u64 chunk_size);
 int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset);
 
+static inline int btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
+{
+	unsigned int zno = pos >> device->zone_size_shift;
+
+	if (!device->seq_zones)
+		return 1;
+
+	return test_bit(zno, device->seq_zones);
+}
+
+static inline int btrfs_dev_is_empty_zone(struct btrfs_device *device, u64 pos)
+{
+	unsigned int zno = pos >> device->zone_size_shift;
+
+	if (!device->empty_zones)
+		return 0;
+
+	return test_bit(zno, device->empty_zones);
+}
+
 static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
 				      int index)
 {

From patchwork Thu Aug 9 18:04:36 2018
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10561645
From: Naohiro Aota
To: David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org,
 Hannes Reinecke, Damien Le Moal, Bart Van Assche, Matias Bjorling,
 Naohiro Aota
Subject: [RFC PATCH 03/17] btrfs: Check and enable HMZONED mode
Date: Fri, 10 Aug 2018 03:04:36 +0900
Message-Id: <20180809180450.5091-4-naota@elisp.net>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
References: <20180809180450.5091-1-naota@elisp.net>

HMZONED mode cannot be used together with the RAID5/6 profile. Introduce
the function btrfs_check_hmzoned_mode() to check this. This function
will also check if the HMZONED flag is enabled on the file system and if
the file system consists of zoned devices with equal zone size.

Additionally, as updates to the space cache are in-place, the space
cache cannot be located over sequential zones and there is no guarantee
that the device will have enough conventional zones to store this cache.
Resolve this problem by completely disabling the space cache. This does
not introduce any problems with sequential block groups: all the free
space is located after the allocation pointer and there is no free space
before the pointer, so there is no need for such a cache.

Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h       |  3 ++
 fs/btrfs/dev-replace.c |  7 ++++
 fs/btrfs/disk-io.c     |  7 ++++
 fs/btrfs/super.c       | 12 +++---
 fs/btrfs/volumes.c     | 87 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h     |  1 +
 6 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 66f1d3895bca..14f880126532 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -763,6 +763,9 @@ struct btrfs_fs_info {
 	struct btrfs_root *uuid_root;
 	struct btrfs_root *free_space_root;
 
+	/* Zone size when in HMZONED mode */
+	u64 zone_size;
+
 	/* the log root tree is a directory of all the other log roots */
 	struct btrfs_root *log_root_tree;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index dec01970d8c5..839a35008fd8 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -202,6 +202,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 		return PTR_ERR(bdev);
 	}
 
+	if ((bdev_zoned_model(bdev) == BLK_ZONED_HM &&
+	     !btrfs_fs_incompat(fs_info, HMZONED)) ||
+	    (!bdev_is_zoned(bdev) && btrfs_fs_incompat(fs_info, HMZONED))) {
+		ret = -EINVAL;
+		goto error;
+	}
+
 	filemap_write_and_wait(bdev->bd_inode->i_mapping);
 
 	devices = &fs_info->fs_devices->devices;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5124c15705ce..14f284382ba7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3057,6 +3057,13 @@ int open_ctree(struct super_block *sb,
 
 	btrfs_free_extra_devids(fs_devices, 1);
 
+	ret = btrfs_check_hmzoned_mode(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init hmzoned mode: %d",
+				ret);
+		goto fail_block_groups;
+	}
+
 	ret = btrfs_sysfs_add_fsid(fs_devices, NULL);
 	if (ret) {
 		btrfs_err(fs_info, "failed to init sysfs fsid interface: %d",
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 5fdd95e3de05..cc812e459197 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -435,11 +435,13 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 	bool saved_compress_force;
 	int no_compress = 0;
 
-	cache_gen = btrfs_super_cache_generation(info->super_copy);
-	if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
-		btrfs_set_opt(info->mount_opt, FREE_SPACE_TREE);
-	else if (cache_gen)
-		btrfs_set_opt(info->mount_opt, SPACE_CACHE);
+	if (!btrfs_fs_incompat(info, HMZONED)) {
+		cache_gen = btrfs_super_cache_generation(info->super_copy);
+		if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
+			btrfs_set_opt(info->mount_opt, FREE_SPACE_TREE);
+		else if (cache_gen)
+			btrfs_set_opt(info->mount_opt, SPACE_CACHE);
+	}
 
 	/*
 	 * Even the options are empty, we still need to do extra check
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 35b3a2187653..ba7ebb80de4d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1293,6 +1293,80 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 	return ret;
 }
 
+int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	u64 hmzoned_devices = 0;
+	u64 nr_devices = 0;
+	u64 zone_size = 0;
+	int incompat_hmzoned = btrfs_fs_incompat(fs_info, HMZONED);
+	int ret = 0;
+
+	/* Count zoned devices */
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		if (!device->bdev)
+			continue;
+		if (bdev_zoned_model(device->bdev) == BLK_ZONED_HM ||
+		    (bdev_zoned_model(device->bdev) == BLK_ZONED_HA &&
+		     incompat_hmzoned)) {
+			hmzoned_devices++;
+			if (!zone_size) {
+				zone_size = device->zone_size;
+			} else if (device->zone_size != zone_size) {
+				btrfs_err(fs_info,
+					  "Zoned block devices must have equal zone sizes");
+				ret = -EINVAL;
+				goto out;
+			}
+		}
+		nr_devices++;
+	}
+
+	if (!hmzoned_devices && incompat_hmzoned) {
+		/* No zoned block device, disable HMZONED */
+		btrfs_err(fs_info, "HMZONED enabled file system should have zoned devices");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	fs_info->zone_size = zone_size;
+
+	if (hmzoned_devices != nr_devices) {
+		btrfs_err(fs_info,
+			  "zoned devices mixed with regular devices");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* RAID56 is not allowed */
+	if (btrfs_fs_incompat(fs_info, RAID56)) {
+		btrfs_err(fs_info, "HMZONED mode does not support RAID56");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * SPACE CACHE writing is not cowed. Disable that to avoid
+	 * write errors in sequential zones.
+	 */
+	if (btrfs_test_opt(fs_info, SPACE_CACHE)) {
+		btrfs_info(fs_info,
+			   "disabling disk space caching with HMZONED mode");
+		btrfs_clear_opt(fs_info->mount_opt, SPACE_CACHE);
+	}
+
+	btrfs_set_and_info(fs_info, NOTREELOG,
+			   "disabling tree log with HMZONED mode");
+
+	btrfs_info(fs_info, "HMZONED mode enabled, zone size %llu B",
+		   fs_info->zone_size);
+
+out:
+
+	return ret;
+}
+
 static void btrfs_release_disk_super(struct page *page)
 {
 	kunmap(page);
@@ -2471,6 +2545,13 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
+	if ((bdev_zoned_model(bdev) == BLK_ZONED_HM &&
+	     !btrfs_fs_incompat(fs_info, HMZONED)) ||
+	    (!bdev_is_zoned(bdev) && btrfs_fs_incompat(fs_info, HMZONED))) {
+		ret = -EINVAL;
+		goto error;
+	}
+
 	if (fs_devices->seeding) {
 		seeding_dev = 1;
 		down_write(&sb->s_umount);
@@ -2584,6 +2665,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		}
 	}
 
+	ret = btrfs_check_hmzoned_mode(fs_info);
+	if (ret) {
+		btrfs_abort_transaction(trans, ret);
+		goto error_sysfs;
+	}
+
 	if (seeding_dev) {
 		mutex_lock(&fs_info->chunk_mutex);
 		ret = init_first_rw_device(trans, fs_info);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 13d59bff204f..58053d2e24aa 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -416,6 +416,7 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		       fmode_t flags, void *holder);
 int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 		       struct blk_zone *zone, gfp_t gfp_mask);
+int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info);
 struct btrfs_device *btrfs_scan_one_device(const char *path,
 					   fmode_t flags, void *holder);
 int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);

From patchwork Thu Aug 9 18:04:37 2018
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10561643
From: Naohiro Aota
To: David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org,
 Hannes Reinecke, Damien Le Moal, Bart Van Assche, Matias Bjorling,
 Naohiro Aota
Subject: [RFC PATCH 04/17] btrfs: limit super block locations in HMZONED mode
Date: Fri, 10 Aug 2018 03:04:37 +0900
Message-Id: <20180809180450.5091-5-naota@elisp.net>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
References: <20180809180450.5091-1-naota@elisp.net>

When in HMZONED mode, make sure that device super blocks are located in
randomly writable zones of zoned block devices. That is, do not write
super blocks in sequential write required zones of host-managed zoned
block devices, as updates would not be possible.

Signed-off-by: Damien Le Moal
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 14f284382ba7..6a014632ca1e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3435,6 +3435,13 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
 	return latest;
 }
 
+static int check_super_location(struct btrfs_device *device, u64 pos)
+{
+	/* any address is good on a regular (zone_size == 0) device */
+	/* non-SEQUENTIAL WRITE REQUIRED zones are capable on a zoned device */
+	return device->zone_size == 0 || !btrfs_dev_is_sequential(device, pos);
+}
+
 /*
  * Write superblock @sb to the @device. Do not wait for completion, all the
  * buffer heads we write are pinned.
@@ -3464,6 +3471,8 @@ static int write_dev_supers(struct btrfs_device *device, if (bytenr + BTRFS_SUPER_INFO_SIZE >= device->commit_total_bytes) break; + if (!check_super_location(device, bytenr)) + continue; btrfs_set_super_bytenr(sb, bytenr); From patchwork Thu Aug 9 18:04:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 10561641 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D814213BB for ; Thu, 9 Aug 2018 18:07:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C2D9F2B725 for ; Thu, 9 Aug 2018 18:07:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B72062B7B6; Thu, 9 Aug 2018 18:07:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 694692B725 for ; Thu, 9 Aug 2018 18:07:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727266AbeHIUcA (ORCPT ); Thu, 9 Aug 2018 16:32:00 -0400 Received: from mail-pg1-f193.google.com ([209.85.215.193]:43644 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726944AbeHIUb7 (ORCPT ); Thu, 9 Aug 2018 16:31:59 -0400 Received: by mail-pg1-f193.google.com with SMTP id a14-v6so3096035pgv.10; Thu, 09 Aug 2018 11:06:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; 
bh=th2DL5AhDXI5K/RU49B+KPIKtEk+BoJ2NCXW6+0Iepo=; b=iQJqXaeLIhka/DlpPwd2wv1DhmxZlQGpNNu5gBeW+CnKKKhIRTC1dF3sMe2g9W1sbD YgfphpEXrFq+BUAJBm57L/gYwnZJ9BoN0T1+bmIS9NosNKfmOB2JLQJPmgt79URYIFqX SsB2H+XhLxIDjGjgaH0vHVEQhWgFHNAhr47SPQ0FyFMgyj/HROjLTrG6BxfDYpj4n7tp q/rgeatM/EK6Ahh+dmjFhJLcFBIWNIHGmTv3Vly9yLcIV2ffoZfS8eECYljhNdY4PigB DLjlYei3n4GZeKSNfivGv+uipTugYHabMjoo/Ee2MlZhrJGLDhJVgptHllpeyuEMCF3E 0DTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=th2DL5AhDXI5K/RU49B+KPIKtEk+BoJ2NCXW6+0Iepo=; b=CW+AvIUgm9iuKSObCaU2JgJ1L0FyUU8gvtDbY2Dll0yEd2er6posT3omv63Ker4dMB xk324iSGIFAU+glNeTuGhuhUnHQqQFYWGXt9rSFPN5AjAoC6xnvoqI307u7L7CPHLPEd bNp9yG6DVAtVDBRMZZ5T47eNqBFVcsffkP7ze4NzhPrPIFWnj0GwZgO/IG1euMc3WyV1 klkb8Ksr6KmBqoexikkiiIsmSeYSCI6Maw2LnzDZd70xNwKRlzkB+GWFiLK7hb4WagS8 PogeYXd7e5QmZ4cTJyGi/0JPU1cIJ7cj749UzYWOOlaqq7DMfMDivYyIC61tRdB6nZTN 2ENw== X-Gm-Message-State: AOUpUlEx7plL8RT5FYGlUxvqsGIFA1rFPIdW38vFIbEANWxJmlMeHPbc fTkOE5rnBTGsIEZMs/VEjfk= X-Google-Smtp-Source: AA+uWPxsMzc2+KUPvoyDkxHaP6NzkIW+JjWOQt/zEC36V8ENrtqJyu1ean2G4p65xR5LN85E+Qw4nA== X-Received: by 2002:a63:8a41:: with SMTP id y62-v6mr3050315pgd.291.1533837960238; Thu, 09 Aug 2018 11:06:00 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. 
[101.111.148.72]) by smtp.gmail.com with ESMTPSA id z184-v6sm10354622pgd.83.2018.08.09.11.05.59 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:05:59 -0700 (PDT)
From: Naohiro Aota
To: David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org, Hannes Reinecke, Damien Le Moal, Bart Van Assche, Matias Bjorling, Naohiro Aota
Subject: [RFC PATCH 05/17] btrfs: disable fallocate in HMZONED mode
Date: Fri, 10 Aug 2018 03:04:38 +0900
Message-Id: <20180809180450.5091-6-naota@elisp.net>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
References: <20180809180450.5091-1-naota@elisp.net>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

fallocate() is implemented by reserving actual extents rather than by
making space reservations. This can expose the sequential write
constraint of host-managed zoned block devices to the application,
which would break POSIX semantics for the fallocated file. To avoid
this, report fallocate() as not supported when in HMZONED mode.
Signed-off-by: Naohiro Aota
---
 fs/btrfs/file.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 095f0bb86bb7..6f4546ccb57d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2993,6 +2993,10 @@ static long btrfs_fallocate(struct file *file, int mode,
 	alloc_end = round_up(offset + len, blocksize);
 	cur_offset = alloc_start;
 
+	/* Do not allow fallocate in HMZONED mode */
+	if (btrfs_fs_incompat(btrfs_sb(inode->i_sb), HMZONED))
+		return -EOPNOTSUPP;
+
 	/* Make sure we aren't being give some crap mode */
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
 		     FALLOC_FL_ZERO_RANGE))

From patchwork Thu Aug 9 18:04:39 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10561639
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7891E13B4 for ; Thu, 9 Aug 2018 18:07:23 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 623B92B6CF for ; Thu, 9 Aug 2018 18:07:23 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 567A72B7A6; Thu, 9 Aug 2018 18:07:23 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id ED1F02B6CF for ; Thu, 9 Aug 2018 18:07:22 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727309AbeHIUcC (ORCPT ); Thu, 9 Aug 2018 16:32:02 -0400
Received: from mail-pf1-f196.google.com ([209.85.210.196]:38890 "EHLO
mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726944AbeHIUcB (ORCPT ); Thu, 9 Aug 2018 16:32:01 -0400 Received: by mail-pf1-f196.google.com with SMTP id x17-v6so3213653pfh.5; Thu, 09 Aug 2018 11:06:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=RFAEmNnOCZ/Kp9MrXLAGDUXGdn+QYEpwfWj/EYZS03E=; b=gPQIoQ4lIVOzADQ7cuVb38GRq3XQ+5Nc/eTdm73tvxZJfcYlpN7o5miXBIiYV9ORT2 gaApJ1K5rwvwvJ7hccpRkvs0ThWcFLBdXnvsGToK/9CrC/ixiCIxhLaqCk8hOhg16/mv QvGbtK4hH3XGQJkEvCCLEeEgoIrMFSwobVzeM2Jrv5vFw/su7j5k7sZbx9VckCoHtjWC CExZQS5S0sJ3J3HHRUxVC+d7kNRDACZ2WYBvZyFZdcsy9pEcfyt64Dvqq/vFeaMVS9/w +lBTxo2sNeRlkw/CXix7e3EUbXTY9TumftHuLr1aGDTMepERkBqRza4m0HqsVhVVFUyL o3Sg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=RFAEmNnOCZ/Kp9MrXLAGDUXGdn+QYEpwfWj/EYZS03E=; b=PrvXXmXo9IQoXiQjoCXZyy6HqejQxPVbrilnjywrgEdzH5fxKc7sw9bpkPgQvjjcB0 FvFdhYMvLgHuNrgPyd8kBn5tgTklFweokWjRQvrOWgPYHNQrcENgnCOIfjp+D1jnTEyq Mda4msKng6qcEvcdpgfmsVTlPBzKAXUGRc1lMvAMuKhtvW9IVvzwCyxXBxZotwodx0CB 7vgDO73ypnLLjRNjg8go5eV9DSpY836WLL1+x/jKhVJhYzJ/e1w0PXuny6IHd4pfYEjA /U2uliMN49n47h2Vd1nz20nEuPQ4+KBevfphwyDtHorLJ/zmvFA2BvkGmgCyyPa6MnE9 7B5Q== X-Gm-Message-State: AOUpUlHgdylyy49ISPLAtRXX30bhrwHbA+x+sKSDWYp/T63pF8G3TwKz hLfBVzSc5vNGSi/mVlE+pZQ= X-Google-Smtp-Source: AA+uWPytyB9LSwNkRVph7qMwaie9wNtbDldRK1eR2hKOUeDao4hr2ETfzuwvY6kDTbBLQ1KP8xIQYw== X-Received: by 2002:a62:ccd0:: with SMTP id j77-v6mr3437875pfk.22.1533837962594; Thu, 09 Aug 2018 11:06:02 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. 
[101.111.148.72]) by smtp.gmail.com with ESMTPSA id n9-v6sm13285315pfg.21.2018.08.09.11.06.01 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:06:01 -0700 (PDT)
From: Naohiro Aota
To: David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org, Hannes Reinecke, Damien Le Moal, Bart Van Assche, Matias Bjorling, Naohiro Aota
Subject: [RFC PATCH 06/17] btrfs: disable direct IO in HMZONED mode
Date: Fri, 10 Aug 2018 03:04:39 +0900
Message-Id: <20180809180450.5091-7-naota@elisp.net>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
References: <20180809180450.5091-1-naota@elisp.net>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Direct write I/Os can target existing extents that have already been
written. Such overwrites are prohibited on host-managed zoned block
devices, so disable direct IO support for a volume with HMZONED mode
enabled.
Signed-off-by: Naohiro Aota
---
 fs/btrfs/inode.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 212fa71317d6..05f5e05ccf37 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8523,6 +8523,9 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
 	unsigned int blocksize_mask = fs_info->sectorsize - 1;
 	ssize_t retval = -EINVAL;
 
+	if (btrfs_fs_incompat(fs_info, HMZONED))
+		goto out;
+
 	if (offset & blocksize_mask)
 		goto out;
 

From patchwork Thu Aug 9 18:04:40 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10561637
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 215B513B4 for ; Thu, 9 Aug 2018 18:07:19 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0DA732B6CF for ; Thu, 9 Aug 2018 18:07:19 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 022F32B7A6; Thu, 9 Aug 2018 18:07:18 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 92D082B6CF for ; Thu, 9 Aug 2018 18:07:18 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727333AbeHIUcE (ORCPT ); Thu, 9 Aug 2018 16:32:04 -0400
Received: from mail-pl0-f66.google.com ([209.85.160.66]:45242 "EHLO mail-pl0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726944AbeHIUcE (ORCPT ); Thu, 9 Aug 2018 16:32:04 -0400
Received: by
mail-pl0-f66.google.com with SMTP id j8-v6so2862832pll.12; Thu, 09 Aug 2018 11:06:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=NrqZh18qx3DHoiPixKO4KdHKXt6/Ehm+nt6+jgONI1w=; b=IrhVm3gPnOLyn0d4y9klzRiHMVOKLpQGD9mQx71AtZXKn4VHxeMyBrMdvXWlcCVZQ6 o4giP3iamqlC+znVvN6xiB5aKPb33OI4+p47PzJYIrwz6c3RudrLPHQWFur9B5ND3LRo rVYhsTYQxoLlBDtQ7I1OpzeiIJTZnqZV6vZ3joTboMm/HfHuuAC34Po7/vcAuVkVBqbh ygTR0kxNMfC7IRoCD+oJxqCSi05QYuXI0NkQ6AYhDuQHV8DjuGwMW1QvopxYOGFwWmy4 TtS+KyauBxrcBGshQtVMuG3Hr6l3e3A/hxkZKCTm3+ShCidFXxqivtOkxE/BKyGlWdNw YbGQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=NrqZh18qx3DHoiPixKO4KdHKXt6/Ehm+nt6+jgONI1w=; b=KzF3xTPLu1j0VSWuo7TPrSrpaM+fbaBf0BR5SiqQ/JdoOCvtD8/Zxpf5AHyx64fjq6 CKsvrHWbnkuFZX7+UhXDSn4CHR1vswhfVyAypOlrxl+7LjTX24OeX76fdhjP76y005I8 t7FvDE2yObN0lvlkmBHfpkMBwXxkN3pKiETW1+H5jkv7fGh8Wqav1GMgRTvr/c10/i1y OHfoj8zE9YhINw2YmeWZ12IVkXo3cgrpDi3Z80JXJ7A9E68xXrj+X6bpK7LTUUFETVF4 Y2rwwwUQcvASpPhe+ZkEcCp71A0OIU89eLDOjESZb0r6y1Onb+KnJdfUlH/ynY4G4osi dRRw== X-Gm-Message-State: AOUpUlEHKJtrX13m6YpHeoYOdeYuJYZYTLzmvhVXkFz60+wfJlQJQLKT OZ/4tc2PJ6sF6fAqiumy5ZE= X-Google-Smtp-Source: AA+uWPwxMW68laEP+5a8wKJUo1+Q48DkBgjnnCAzpNMWKuS35nC7mRPNjJZnupBHNxYFLGINthTBLA== X-Received: by 2002:a17:902:925:: with SMTP id 34-v6mr2947097plm.307.1533837965120; Thu, 09 Aug 2018 11:06:05 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. 
[101.111.148.72]) by smtp.gmail.com with ESMTPSA id 75-v6sm15976310pfr.115.2018.08.09.11.06.03 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:06:04 -0700 (PDT)
From: Naohiro Aota
To: David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org, Hannes Reinecke, Damien Le Moal, Bart Van Assche, Matias Bjorling, Naohiro Aota
Subject: [RFC PATCH 07/17] btrfs: disable device replace in HMZONED mode
Date: Fri, 10 Aug 2018 03:04:40 +0900
Message-Id: <20180809180450.5091-8-naota@elisp.net>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
References: <20180809180450.5091-1-naota@elisp.net>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Supporting the device replace feature in HMZONED mode requires several
changes. First, writes must not be replicated from the replace source
device to the replace target device, so that we never write to a
location beyond a zone's write pointer. In addition, the scrub process
must be modified to dispatch write I/Os sequentially and to fill the
holes between extents. Finally, in the RAID case, the write pointers of
the zones must be kept in sync with those of their RAID siblings.

Solving all of these issues needs more work, so disable the device
replace feature in HMZONED mode for now.
Signed-off-by: Naohiro Aota
---
 fs/btrfs/dev-replace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 839a35008fd8..cde61fb217db 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -416,6 +416,9 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	struct btrfs_device *tgt_device = NULL;
 	struct btrfs_device *src_device = NULL;
 
+	if (btrfs_fs_incompat(fs_info, HMZONED))
+		return -EOPNOTSUPP;
+
 	ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
 					   srcdev_name, &src_device);
 	if (ret)

From patchwork Thu Aug 9 18:04:41 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 10561635
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7DE7913BB for ; Thu, 9 Aug 2018 18:07:15 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6A3332B6CF for ; Thu, 9 Aug 2018 18:07:15 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5EC4D2B7A6; Thu, 9 Aug 2018 18:07:15 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D4D182B6CF for ; Thu, 9 Aug 2018 18:07:14 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727358AbeHIUcI (ORCPT ); Thu, 9 Aug 2018 16:32:08 -0400
Received: from mail-pl0-f65.google.com ([209.85.160.65]:36040 "EHLO mail-pl0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id
S1726944AbeHIUcH (ORCPT ); Thu, 9 Aug 2018 16:32:07 -0400 Received: by mail-pl0-f65.google.com with SMTP id e11-v6so2877314plb.3; Thu, 09 Aug 2018 11:06:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=rARAqk1ykm9dLRLpQ087kt8xNkxGu2tjsN++jua7gEc=; b=PUv1yRcr9a16xU8SRrYAwnyBY2Px5fil2rk56rw5TDgSz2H2wd4tiLBrcuixFy8/XM F+gWTjAW8i3iBB2ECBVwWrRLmhWeyd0N4TwYqt1ozlR+fjHl+X33TkjWjZaW0UiylT3N vU896YyPJMdHsTEaSqiBUwl/Z+Eo+dEXxqAG4b62CVfmSgb9X+dsNLwPLqznVvvscWTy B/rCMSZexg/Mu7mZ+6Gq9//zdAQPz/gbumIInZlvXkfYct8ML3EwGpFgDs2X03fxXMoe nvnYwGFvuFcCKC6Heg1Ec9CJQ45+EKQlYXmiHQB2cYKjjqk+1JHPa0BDtcy6urL1dGIp y0kQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=rARAqk1ykm9dLRLpQ087kt8xNkxGu2tjsN++jua7gEc=; b=AF+9pDn/7Y21V9zWpXc5++RHhoa6jxZ2uOjQthSEDs853Vf9U4mCsGcJ6pr2vzDgcx ivWKfkjedKXF8HA7Is6AuR8GoYCWdCq/8xZkoNEkXBpNPGO4ahjY+OBeAHd5M0JKr7DK ELRaDjU3j1HoPut/K5CZpwwbrlijckfh9tiH2qGAJI6hcEEpHsXiZKtOcFgJBQ/XYkst Cjr6ah4McOpw2D0nG0epFCjxGXZ88G4uX82lB0oofJBPQbWivse73RSX9sem6YVAjgjI kruZ3g+Y0/0d5Rt5Mw31a2uQpwgHUJxNxaRG5M334sJOMh2c5OV3+4iuXTNFZQDdm1zJ wqJA== X-Gm-Message-State: AOUpUlGdjXl3tb3PauPzNjKsx6yOiaUKc/GWIRQZDBd7DDyp0P1WVfAC /tuqDPXT9s7uGUutMKexUD4KXUaX3BM= X-Google-Smtp-Source: AA+uWPwStJ318ffx+52ZVy7y6Xisx5t0aOhKRgMNCl4MCYg2Nf6qqmJEFXAoNKcohRrUE11TA91ARw== X-Received: by 2002:a17:902:6b89:: with SMTP id p9-v6mr2983866plk.272.1533837968062; Thu, 09 Aug 2018 11:06:08 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. 
[101.111.148.72]) by smtp.gmail.com with ESMTPSA id s195-v6sm23604653pgs.76.2018.08.09.11.06.07 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:06:07 -0700 (PDT)
From: Naohiro Aota
To: David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org, Hannes Reinecke, Damien Le Moal, Bart Van Assche, Matias Bjorling, Naohiro Aota
Subject: [RFC PATCH 08/17] btrfs: align extent allocation to zone boundary
Date: Fri, 10 Aug 2018 03:04:41 +0900
Message-Id: <20180809180450.5091-9-naota@elisp.net>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
References: <20180809180450.5091-1-naota@elisp.net>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

In HMZONED mode, align device extents to zone boundaries so that write
I/Os can begin at the start of a zone, as required by host-managed
zoned block devices. Also, check that an allocated region always
covers only empty zones.
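
For illustration only (this is not part of the patch), the alignment and
empty-zone scan described above can be modeled in userspace C roughly as
follows. The names zone_align(), region_is_empty() and find_empty_region()
are hypothetical, and the bool array is an assumed stand-in for the
per-device empty-zone state; the kernel code in this patch uses ALIGN() and
btrfs_dev_is_empty_zone() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round pos up to the next zone boundary; zone_size == 0 means "not zoned". */
static uint64_t zone_align(uint64_t pos, uint64_t zone_size)
{
	uint64_t rem;

	if (zone_size == 0)
		return pos;
	rem = pos % zone_size;
	return rem ? pos + (zone_size - rem) : pos;
}

/* Return true if every zone in [pos, pos + num_bytes) is marked empty. */
static bool region_is_empty(const bool *zone_empty, uint64_t nr_zones,
			    uint64_t zone_size, uint64_t pos,
			    uint64_t num_bytes)
{
	uint64_t off;

	if (zone_size == 0)
		return true;

	/* Callers are expected to pass zone-aligned values. */
	assert(pos % zone_size == 0);
	assert(num_bytes % zone_size == 0);

	for (off = 0; off < num_bytes; off += zone_size) {
		uint64_t idx = (pos + off) / zone_size;

		if (idx >= nr_zones || !zone_empty[idx])
			return false;
	}
	return true;
}

/*
 * Return the first zone-aligned offset at or after search_start where
 * num_bytes worth of empty zones are available. The first zone is always
 * skipped. Returns UINT64_MAX if no such region exists.
 */
static uint64_t find_empty_region(const bool *zone_empty, uint64_t nr_zones,
				  uint64_t zone_size, uint64_t search_start,
				  uint64_t num_bytes)
{
	uint64_t start;

	if (zone_size == 0)
		return search_start;

	start = zone_align(search_start, zone_size);
	if (start < zone_size)
		start = zone_size;

	while (start + num_bytes <= nr_zones * zone_size) {
		if (region_is_empty(zone_empty, nr_zones, zone_size,
				    start, num_bytes))
			return start;
		/* Not empty: retry from the next zone boundary. */
		start += zone_size;
	}
	return UINT64_MAX;
}
```

In this model, with four 256-byte zones of which only zone 1 is in use, a
one-zone request searching from offset 0 skips zone 0 (reserved) and zone 1
(not empty) and is placed at offset 512.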
Signed-off-by: Naohiro Aota --- fs/btrfs/extent-tree.c | 3 ++ fs/btrfs/volumes.c | 69 ++++++++++++++++++++++++++++++++++++++---- 2 files changed, 66 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index f77226d8020a..fc3daf0e5b92 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -9527,6 +9527,9 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr) min_free = div64_u64(min_free, dev_min); } + /* We cannot allocate size less than zone_size anyway */ + min_free = max_t(u64, min_free, fs_info->zone_size); + /* We need to do this so that we can look at pending chunks */ trans = btrfs_join_transaction(root); if (IS_ERR(trans)) { diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index ba7ebb80de4d..ada13120c2cd 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1521,6 +1521,31 @@ static int contains_pending_extent(struct btrfs_transaction *transaction, return ret; } +static u64 dev_zone_align(struct btrfs_device *device, u64 pos) +{ + if (device->zone_size) + return ALIGN(pos, device->zone_size); + return pos; +} + +static int is_empty_zone_region(struct btrfs_device *device, + u64 pos, u64 num_bytes) +{ + if (device->zone_size == 0) + return 1; + + WARN_ON(!IS_ALIGNED(pos, device->zone_size)); + WARN_ON(!IS_ALIGNED(num_bytes, device->zone_size)); + + while (num_bytes > 0) { + if (!btrfs_dev_is_empty_zone(device, pos)) + return 0; + pos += device->zone_size; + num_bytes -= device->zone_size; + } + + return 1; +} /* * find_free_dev_extent_start - find free space in the specified device @@ -1564,9 +1589,14 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction, /* * We don't want to overwrite the superblock on the drive nor any area * used by the boot loader (grub for example), so we make sure to start - * at an offset of at least 1MB. + * at an offset of at least 1MB on a regular disk. For a zoned block + * device, skip the first zone of the device entirely. 
*/ - search_start = max_t(u64, search_start, SZ_1M); + if (device->zone_size) + search_start = max_t(u64, dev_zone_align(device, search_start), + device->zone_size); + else + search_start = max_t(u64, search_start, SZ_1M); path = btrfs_alloc_path(); if (!path) @@ -1632,6 +1662,8 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction, if (contains_pending_extent(transaction, device, &search_start, hole_size)) { + search_start = dev_zone_align(device, + search_start); if (key.offset >= search_start) { hole_size = key.offset - search_start; } else { @@ -1640,6 +1672,14 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction, } } + if (!is_empty_zone_region(device, search_start, + num_bytes)) { + search_start = dev_zone_align(device, + search_start+1); + btrfs_release_path(path); + goto again; + } + if (hole_size > max_hole_size) { max_hole_start = search_start; max_hole_size = hole_size; @@ -1664,7 +1704,7 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction, extent_end = key.offset + btrfs_dev_extent_length(l, dev_extent); if (extent_end > search_start) - search_start = extent_end; + search_start = dev_zone_align(device, extent_end); next: path->slots[0]++; cond_resched(); @@ -1680,6 +1720,14 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction, if (contains_pending_extent(transaction, device, &search_start, hole_size)) { + search_start = dev_zone_align(device, + search_start); + btrfs_release_path(path); + goto again; + } + + if (!is_empty_zone_region(device, search_start, num_bytes)) { + search_start = dev_zone_align(device, search_start+1); btrfs_release_path(path); goto again; } @@ -4832,6 +4880,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, int i; int j; int index; + int hmzoned = btrfs_fs_incompat(info, HMZONED); BUG_ON(!alloc_profile_is_valid(type, 0)); @@ -4851,13 +4900,18 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, ncopies = 
btrfs_raid_array[index].ncopies; if (type & BTRFS_BLOCK_GROUP_DATA) { - max_stripe_size = SZ_1G; + if (hmzoned) + max_stripe_size = info->zone_size; + else + max_stripe_size = SZ_1G; max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE; if (!devs_max) devs_max = BTRFS_MAX_DEVS(info); } else if (type & BTRFS_BLOCK_GROUP_METADATA) { /* for larger filesystems, use larger metadata chunks */ - if (fs_devices->total_rw_bytes > 50ULL * SZ_1G) + if (hmzoned) + max_stripe_size = info->zone_size; + else if (fs_devices->total_rw_bytes > 50ULL * SZ_1G) max_stripe_size = SZ_1G; else max_stripe_size = SZ_256M; @@ -4865,7 +4919,10 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, if (!devs_max) devs_max = BTRFS_MAX_DEVS(info); } else if (type & BTRFS_BLOCK_GROUP_SYSTEM) { - max_stripe_size = SZ_32M; + if (hmzoned) + max_stripe_size = info->zone_size; + else + max_stripe_size = SZ_32M; max_chunk_size = 2 * max_stripe_size; if (!devs_max) devs_max = BTRFS_MAX_DEVS_SYS_CHUNK; From patchwork Thu Aug 9 18:04:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 10561633 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DD6A713BB for ; Thu, 9 Aug 2018 18:07:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C7F322B725 for ; Thu, 9 Aug 2018 18:07:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BC21C2B7B6; Thu, 9 Aug 2018 18:07:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org 
[209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D2C4F2B725 for ; Thu, 9 Aug 2018 18:07:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727380AbeHIUcK (ORCPT ); Thu, 9 Aug 2018 16:32:10 -0400 Received: from mail-pl0-f68.google.com ([209.85.160.68]:34392 "EHLO mail-pl0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726757AbeHIUcJ (ORCPT ); Thu, 9 Aug 2018 16:32:09 -0400 Received: by mail-pl0-f68.google.com with SMTP id f6-v6so2883315plo.1; Thu, 09 Aug 2018 11:06:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=kFQ8CVg+jQXVFN/XTX6jSyz9wX2VnSTn+Xrbjcgf3ec=; b=j//YYSy56m+bbKnBz9j6FgDaeAONG2OaouM62P4GStoYrwjqrR8gnnG1eIyXgXhTmH R8NcOwxDI4+Qk4S7uojISpS9rfI0GbsFaHS2NNEM8zUqyVLGCF/ZpDcl8Wc/yTvr2y19 a8ie6eOkQ8YwTwl1mCT8d4smFtuIfHcb061YTtgxXo22pi2YKWe1cdF3/68CzPBHDXrx P74wlxNlrWeyw1vzouRIDONcE3ZH3YRc4xGrg7TCE7AsbCxyADWq7EwfJS2DKJQU1mEh /sJBgRjEN4uC1O6y2ZbHPduA9wjQd+h/BsiR/0rTXFlAQU/4Z9AfXv/QMu0r/73Zd1yA I1fQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=kFQ8CVg+jQXVFN/XTX6jSyz9wX2VnSTn+Xrbjcgf3ec=; b=hpAhkIXGadQg5nzgK//mpelqZ50yEax2bfqSRTa0uMLOqfwu3xc84AVmpdw/J/De2I KGrLeW0BSEBau9RKOFfbmdTMpafR+7a70yUwNy85wecRdir6uiM4cvUwV1bmF09cqCyO K01d3ducmmPgAEN3eX47wZjj+tIRPNl2PWNPY1jrgWO/rLICQG45TRbsl9NleBsB1omg 4vRv9ubnWzjhIK0OOj4GZpg9+2kuP0SsXl0uz5/1lRF1ov15Nl7O4LmK9HEshBj+Vfib 5bFr1GNTE/Wacm4SKrQc8VFUKVPH3BZ19NERRSvg8WXNgMxdZh7t6S0Upeqds6ti0n2Z B+ew== X-Gm-Message-State: AOUpUlHhAoLyC4au3BHDbEXL0qyL6x+6y5PAhMW0b75gEr1FXy3bGeEd 6JAWS7bO+UnqLVSiEDomzZY= X-Google-Smtp-Source: AA+uWPwIlzcvMjHYDkfc/80HMQWRd7D8jn/CeAlT7bH/XUNO6INsSK4yy/+rtLSaUuesn/lSNaG81g== X-Received: by 2002:a17:902:bf44:: with SMTP id u4-v6mr3022703pls.84.1533837970486; Thu, 09 
Aug 2018 11:06:10 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. [101.111.148.72]) by smtp.gmail.com with ESMTPSA id f186-v6sm9495549pgc.4.2018.08.09.11.06.09 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:06:09 -0700 (PDT) From: Naohiro Aota To: David Sterba , linux-btrfs@vger.kernel.org Cc: Chris Mason , Josef Bacik , linux-kernel@vger.kernel.org, Hannes Reinecke , Damien Le Moal , Bart Van Assche , Matias Bjorling , Naohiro Aota Subject: [RFC PATCH 09/17] btrfs: do sequential allocation on HMZONED drives Date: Fri, 10 Aug 2018 03:04:42 +0900 Message-Id: <20180809180450.5091-10-naota@elisp.net> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180809180450.5091-1-naota@elisp.net> References: <20180809180450.5091-1-naota@elisp.net> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On HMZONED drives, writes must always be sequential and directed at a block group zone write pointer position. Thus, block allocation in a block group must also be done sequentially using an allocation pointer equal to the block group zone write pointer plus the number of blocks allocated but not yet written. Signed-off-by: Naohiro Aota --- fs/btrfs/ctree.h | 22 ++++ fs/btrfs/extent-tree.c | 231 ++++++++++++++++++++++++++++++++++++ fs/btrfs/free-space-cache.c | 36 ++++++ fs/btrfs/free-space-cache.h | 10 ++ 4 files changed, 299 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 14f880126532..5060bcdcb72b 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -562,6 +562,20 @@ struct btrfs_full_stripe_locks_tree { struct mutex lock; }; +/* Block group allocation types */ +enum btrfs_alloc_type { + + /* Regular first fit allocation */ + BTRFS_ALLOC_FIT = 0, + + /* + * Sequential allocation: this is for HMZONED mode and + * will result in ignoring free space before a block + * group allocation offset. 
+ */ + BTRFS_ALLOC_SEQ = 1, +}; + struct btrfs_block_group_cache { struct btrfs_key key; struct btrfs_block_group_item item; @@ -674,6 +688,14 @@ struct btrfs_block_group_cache { /* Record locked full stripes for RAID5/6 block group */ struct btrfs_full_stripe_locks_tree full_stripe_locks_root; + + /* + * Allocation offset for the block group to implement sequential + * allocation. This is used only with HMZONED mode enabled and if + * the block group resides on a sequential zone. + */ + enum btrfs_alloc_type alloc_type; + u64 alloc_offset; }; /* delayed seq elem */ diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index fc3daf0e5b92..d4355b9b494e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -7412,6 +7412,15 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info, } have_block_group: + if (block_group->alloc_type == BTRFS_ALLOC_SEQ) { + offset = btrfs_find_space_for_alloc_seq(block_group, + num_bytes, + &max_extent_size); + if (!offset) + goto loop; + goto checks; + } + cached = block_group_cache_done(block_group); if (unlikely(!cached)) { have_caching_bg = true; @@ -9847,11 +9856,223 @@ static void link_block_group(struct btrfs_block_group_cache *cache) } } +static int +btrfs_get_block_group_alloc_offset(struct btrfs_block_group_cache *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree; + struct extent_map *em; + struct map_lookup *map; + struct btrfs_device *device; + u64 logical = cache->key.objectid; + u64 length = cache->key.offset; + u64 physical = 0; + int ret, alloc_type; + int i, j; + u64 *alloc_offsets = NULL; + +#define WP_MISSING_DEV ((u64)-1) + + /* Sanity check */ + if (!IS_ALIGNED(length, fs_info->zone_size)) { + btrfs_err(fs_info, "unaligned block group at %llu + %llu", + logical, length); + return -EIO; + } + + /* Get the chunk mapping */ + em_tree = &fs_info->mapping_tree.map_tree; + read_lock(&em_tree->lock); + em = 
lookup_extent_mapping(em_tree, logical, length); + read_unlock(&em_tree->lock); + + if (!em) + return -EINVAL; + + map = em->map_lookup; + + /* + * Get the zone type: if the group is mapped to a non-sequential zone, + * there is no need for the allocation offset (fit allocation is OK). + */ + alloc_type = -1; + alloc_offsets = kcalloc(map->num_stripes, sizeof(*alloc_offsets), + GFP_NOFS); + if (!alloc_offsets) { + free_extent_map(em); + return -ENOMEM; + } + + for (i = 0; i < map->num_stripes; i++) { + int is_sequential; + struct blk_zone zone; + + device = map->stripes[i].dev; + physical = map->stripes[i].physical; + + if (device->bdev == NULL) { + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } + + is_sequential = btrfs_dev_is_sequential(device, physical); + if (alloc_type == -1) + alloc_type = is_sequential ? + BTRFS_ALLOC_SEQ : BTRFS_ALLOC_FIT; + + if ((is_sequential && alloc_type != BTRFS_ALLOC_SEQ) || + (!is_sequential && alloc_type == BTRFS_ALLOC_SEQ)) { + btrfs_err(fs_info, "found block group of mixed zone types"); + ret = -EIO; + goto out; + } + + if (!is_sequential) + continue; + + /* this zone will be used for allocation, so mark this + * zone non-empty + */ + clear_bit(physical >> device->zone_size_shift, + device->empty_zones); + + /* + * The group is mapped to a sequential zone. Get the zone write + * pointer to determine the allocation offset within the zone. 
+ */ + WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size)); + ret = btrfs_get_dev_zone(device, physical, &zone, GFP_NOFS); + if (ret == -EIO || ret == -EOPNOTSUPP) { + ret = 0; + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } else if (ret) { + goto out; + } + + + switch (zone.cond) { + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + btrfs_err(fs_info, "Offline/readonly zone %llu", + physical >> device->zone_size_shift); + alloc_offsets[i] = WP_MISSING_DEV; + break; + case BLK_ZONE_COND_EMPTY: + alloc_offsets[i] = 0; + break; + case BLK_ZONE_COND_FULL: + alloc_offsets[i] = fs_info->zone_size; + break; + default: + /* Partially used zone */ + alloc_offsets[i] = ((zone.wp - zone.start) << 9); + break; + } + } + + if (alloc_type == BTRFS_ALLOC_FIT) + goto out; + + switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + case BTRFS_BLOCK_GROUP_DUP: + case BTRFS_BLOCK_GROUP_RAID1: + cache->alloc_offset = WP_MISSING_DEV; + for (i = 0; i < map->num_stripes; i++) { + if (alloc_offsets[i] == WP_MISSING_DEV) + continue; + if (cache->alloc_offset == WP_MISSING_DEV) + cache->alloc_offset = alloc_offsets[i]; + if (alloc_offsets[i] != cache->alloc_offset) { + btrfs_err(fs_info, "zones' write pointer mismatch"); + ret = -EIO; + goto out; + } + } + break; + case BTRFS_BLOCK_GROUP_RAID0: + cache->alloc_offset = 0; + for (i = 0; i < map->num_stripes; i++) { + if (alloc_offsets[i] == WP_MISSING_DEV) { + btrfs_err(fs_info, "cannot recover Write pointer"); + ret = -EIO; + goto out; + } + cache->alloc_offset += alloc_offsets[i]; + if (alloc_offsets[0] < alloc_offsets[i]) { + btrfs_err(fs_info, "zones' write pointer mismatch"); + ret = -EIO; + goto out; + } + } + break; + case BTRFS_BLOCK_GROUP_RAID10: + /* + * Pass1: check write pointer of RAID1 level: each pointer + * should be equal + */ + for (i = 0; i < map->num_stripes / map->sub_stripes; i++) { + int base = i*map->sub_stripes; + u64 offset = WP_MISSING_DEV; + + for (j = 0; j < map->sub_stripes; 
j++) { + if (alloc_offsets[base+j] == WP_MISSING_DEV) + continue; + if (offset == WP_MISSING_DEV) + offset = alloc_offsets[base+j]; + if (alloc_offsets[base+j] != offset) { + btrfs_err(fs_info, "zones' write pointer mismatch"); + ret = -EIO; + goto out; + } + } + for (j = 0; j < map->sub_stripes; j++) + alloc_offsets[base+j] = offset; + } + + /* Pass2: check write pointer of RAID1 level */ + cache->alloc_offset = 0; + for (i = 0; i < map->num_stripes / map->sub_stripes; i++) { + int base = i*map->sub_stripes; + + if (alloc_offsets[base] == WP_MISSING_DEV) { + btrfs_err(fs_info, "cannot recover Write pointer"); + ret = -EIO; + goto out; + } + if (alloc_offsets[0] < alloc_offsets[base]) { + btrfs_err(fs_info, "zones' write pointer mismatch"); + ret = -EIO; + goto out; + } + cache->alloc_offset += alloc_offsets[base]; + } + break; + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID6: + /* RAID5/6 is not supported yet */ + default: + btrfs_err(fs_info, "Unsupported profile %llu", + map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK); + ret = -EINVAL; + goto out; + } + +out: + cache->alloc_type = alloc_type; + kfree(alloc_offsets); + free_extent_map(em); + + return ret; +} + static struct btrfs_block_group_cache * btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, u64 start, u64 size) { struct btrfs_block_group_cache *cache; + int ret; cache = kzalloc(sizeof(*cache), GFP_NOFS); if (!cache) @@ -9885,6 +10106,16 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, atomic_set(&cache->trimming, 0); mutex_init(&cache->free_space_lock); btrfs_init_full_stripe_locks_tree(&cache->full_stripe_locks_root); + cache->alloc_type = BTRFS_ALLOC_FIT; + cache->alloc_offset = 0; + + if (btrfs_fs_incompat(fs_info, HMZONED)) { + ret = btrfs_get_block_group_alloc_offset(cache); + if (ret) { + kfree(cache); + return NULL; + } + } return cache; } diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index c3888c113d81..b3ff9809d1e4 100644 --- 
a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2582,6 +2582,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, u64 align_gap = 0; u64 align_gap_len = 0; + WARN_ON(block_group->alloc_type == BTRFS_ALLOC_SEQ); + spin_lock(&ctl->tree_lock); entry = find_free_space(ctl, &offset, &bytes_search, block_group->full_stripe_len, max_extent_size); @@ -2616,6 +2618,38 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, return ret; } +/* + * Simple allocator for sequential only block group. It only allows sequential + * allocation. No need to play with trees. + */ + +u64 btrfs_find_space_for_alloc_seq(struct btrfs_block_group_cache *block_group, + u64 bytes, u64 *max_extent_size) +{ + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 start = block_group->key.objectid; + u64 avail; + u64 ret = 0; + + /* Sanity check */ + if (block_group->alloc_type != BTRFS_ALLOC_SEQ) + return 0; + + spin_lock(&ctl->tree_lock); + avail = block_group->key.offset - block_group->alloc_offset; + if (avail < bytes) { + *max_extent_size = avail; + goto out; + } + + ret = start + block_group->alloc_offset; + block_group->alloc_offset += bytes; + ctl->free_space -= bytes; +out: + spin_unlock(&ctl->tree_lock); + return ret; +} + /* * given a cluster, put all of its extents back into the free space * cache. 
If a block group is passed, this function will only free @@ -2701,6 +2735,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group, struct rb_node *node; u64 ret = 0; + WARN_ON(block_group->alloc_type == BTRFS_ALLOC_SEQ); + spin_lock(&cluster->lock); if (bytes > cluster->max_size) goto out; diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index 794a444c3f73..79b4fa31bc8f 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -80,6 +80,14 @@ static inline int btrfs_add_free_space(struct btrfs_block_group_cache *block_group, u64 bytenr, u64 size) { + if (block_group->alloc_type == BTRFS_ALLOC_SEQ) { + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + + spin_lock(&ctl->tree_lock); + ctl->free_space += size; + spin_unlock(&ctl->tree_lock); + return 0; + } return __btrfs_add_free_space(block_group->fs_info, block_group->free_space_ctl, bytenr, size); @@ -92,6 +100,8 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group_cache u64 btrfs_find_space_for_alloc(struct btrfs_block_group_cache *block_group, u64 offset, u64 bytes, u64 empty_size, u64 *max_extent_size); +u64 btrfs_find_space_for_alloc_seq(struct btrfs_block_group_cache *block_group, + u64 bytes, u64 *max_extent_size); u64 btrfs_find_ino_for_alloc(struct btrfs_root *fs_root); void btrfs_dump_free_space(struct btrfs_block_group_cache *block_group, u64 bytes); From patchwork Thu Aug 9 18:04:43 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 10561631 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 068FB13B4 for ; Thu, 9 Aug 2018 18:07:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E67F12B725 for ; Thu, 9 Aug 2018 
18:07:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DA5E52B7B6; Thu, 9 Aug 2018 18:07:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7791E2B725 for ; Thu, 9 Aug 2018 18:07:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727411AbeHIUcN (ORCPT ); Thu, 9 Aug 2018 16:32:13 -0400 Received: from mail-pl0-f65.google.com ([209.85.160.65]:41198 "EHLO mail-pl0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727407AbeHIUcM (ORCPT ); Thu, 9 Aug 2018 16:32:12 -0400 Received: by mail-pl0-f65.google.com with SMTP id w19-v6so2868932ply.8; Thu, 09 Aug 2018 11:06:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=FlGM7QyTXZZAbwLjW+5tUEcWAtOHF7nYbnZjSh1iwcA=; b=MKEU20A+wvmTbdsM+sl5l9zrk8GHdqI8G1ZAFWNoar3kYscss6EfWAm/g86R5suhSm +5RXmIbkRWAOaKpz6DgfWqUzY3tXQ6ZGQ6gf8kbIuldrqWMg+Ltf9Xcm7mBRW5SB82wS MBJqQRISrp8Qm9gLbuIBxnO1/85cYHMkf/mhiVoNw4sffnnXl5n/kA7Q5fowAqrEeopR +NikCHmJbAOZwA3EX0Dbj2glCUk3kj3nxeTMR6b/3diNSk523/q6PzLAiLulqG/6p1NZ wW6Z72sI6vqIr0gVHZl7wjde4D/H9+pkzrzVUx4cU45LiD5l9P4MbNFqt7pyMsgYQgUT yOEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=FlGM7QyTXZZAbwLjW+5tUEcWAtOHF7nYbnZjSh1iwcA=; b=oCuArz0tUE2iWZ6W5e4xYHZCh6fgXu2Ve+cUSOJtgAe3umr9a/qCBpyC74sktz22D5 Xbf6hl3s4MDtQhLzETLmE8KYbGXElguLo5nmq2Vus5O8+SAIHY7s8wnOzFdVxCu4W0eT 
jJrqF1ToMw4m2AciiN1JFjemfyFccQv5UGDv2/dRLmep4kYhUAf8YMW4DCy6P1AsrURu JCNhigcIXQ9VCtFSerCg0X2fM+NwdVl045K+BfQuNyM05LBIONH1eprQVmI5YYU0WeOM mwaYtCCM2HrhJTwvha5uoW/f4lo5YTu4mDAvOj1hEVfRuxDb60K2TtqMA/pDluuSufIZ G62A== X-Gm-Message-State: AOUpUlHPNYTljs/i2Z4PXzSbk3MxWGBpfqSSd4PEGKhdTLMQiRvr3ELY OG1w8x4PoETD1Wn3zI18VOw= X-Google-Smtp-Source: AA+uWPyavTl/w9ftTLqukREpWaa8XaQ9ouzTXDBrAZuVVBq3my7zDw3BAoNfl9+dLhV2AORquO9R3A== X-Received: by 2002:a17:902:28e4:: with SMTP id f91-v6mr3021796plb.146.1533837973116; Thu, 09 Aug 2018 11:06:13 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. [101.111.148.72]) by smtp.gmail.com with ESMTPSA id i26-v6sm9377561pfo.107.2018.08.09.11.06.11 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:06:12 -0700 (PDT) From: Naohiro Aota To: David Sterba , linux-btrfs@vger.kernel.org Cc: Chris Mason , Josef Bacik , linux-kernel@vger.kernel.org, Hannes Reinecke , Damien Le Moal , Bart Van Assche , Matias Bjorling , Naohiro Aota Subject: [RFC PATCH 10/17] btrfs: split btrfs_map_bio() Date: Fri, 10 Aug 2018 03:04:43 +0900 Message-Id: <20180809180450.5091-11-naota@elisp.net> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180809180450.5091-1-naota@elisp.net> References: <20180809180450.5091-1-naota@elisp.net> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch splits btrfs_map_bio() into two functions so that the following patches can make use of the latter part of this function. The first part of btrfs_map_bio() maps bios to a btrfs_bio and the second part submits the mapped bios in btrfs_bio to the actual devices. By splitting the function, we can now reuse the latter part to send buffered btrfs_bio. 
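The split described above can be pictured with a small user-space model. This is only a sketch under stated assumptions: `struct mapped_req`, `map_req()`, `submit_mapped()` and `map_and_submit()` are hypothetical stand-ins for btrfs_bio, the mapping half of btrfs_map_bio(), __btrfs_map_bio() and btrfs_map_bio() respectively, and the "mapping" is a toy that mirrors the same offset on every stripe.

```c
#include <assert.h>

#define MAX_STRIPES 4

/* Hypothetical stand-in for a mapped request (the role btrfs_bio plays). */
struct mapped_req {
	unsigned long long logical;
	int num_stripes;
	unsigned long long physical[MAX_STRIPES];
	int usable[MAX_STRIPES];	/* 0 models a missing or read-only device */
	int submitted;			/* stripes handed to a device */
	int errors;			/* stripes that could not be sent */
};

/* Phase 1: translate a logical address into per-stripe targets. */
static void map_req(struct mapped_req *req, unsigned long long logical,
		    int num_stripes)
{
	int i;

	assert(num_stripes > 0 && num_stripes <= MAX_STRIPES);
	req->logical = logical;
	req->num_stripes = num_stripes;
	req->submitted = 0;
	req->errors = 0;
	for (i = 0; i < num_stripes; i++) {
		/* Toy mapping: every stripe mirrors the same offset. */
		req->physical[i] = logical;
		req->usable[i] = 1;
	}
}

/* Phase 2: send an already mapped request; independent of phase 1. */
static void submit_mapped(struct mapped_req *req)
{
	int i;

	for (i = 0; i < req->num_stripes; i++) {
		if (!req->usable[i]) {
			req->errors++;	/* models the bbio_error() path */
			continue;
		}
		req->submitted++;
	}
}

/* The original single entry point becomes phase 1 followed by phase 2. */
static void map_and_submit(struct mapped_req *req, unsigned long long logical,
			   int num_stripes)
{
	map_req(req, logical, num_stripes);
	submit_mapped(req);
}
```

Because `submit_mapped()` takes only the mapped request, a caller that parked a request earlier can send it later without mapping it again, which is the reuse the commit message refers to.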
Signed-off-by: Naohiro Aota --- fs/btrfs/volumes.c | 53 +++++++++++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 22 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index ada13120c2cd..08d13da2553f 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6435,17 +6435,44 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) } } +static void __btrfs_map_bio(struct btrfs_fs_info *fs_info, u64 logical, + struct btrfs_bio *bbio, int async_submit) +{ + struct btrfs_device *dev; + int dev_nr; + int total_devs; + struct bio *first_bio = bbio->orig_bio; + struct bio *bio = first_bio; + + total_devs = bbio->num_stripes; + for (dev_nr = 0; dev_nr < total_devs; dev_nr++) { + dev = bbio->stripes[dev_nr].dev; + if (!dev || !dev->bdev || + (bio_op(first_bio) == REQ_OP_WRITE && + !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) { + bbio_error(bbio, first_bio, logical); + continue; + } + + if (dev_nr < total_devs - 1) + bio = btrfs_bio_clone(first_bio); + else + bio = first_bio; + + submit_stripe_bio(bbio, bio, bbio->stripes[dev_nr].physical, + dev_nr, async_submit); + } + btrfs_bio_counter_dec(fs_info); +} + blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num, int async_submit) { - struct btrfs_device *dev; struct bio *first_bio = bio; u64 logical = (u64)bio->bi_iter.bi_sector << 9; u64 length = 0; u64 map_length; int ret; - int dev_nr; - int total_devs; struct btrfs_bio *bbio = NULL; length = bio->bi_iter.bi_size; @@ -6459,7 +6486,6 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, return errno_to_blk_status(ret); } - total_devs = bbio->num_stripes; bbio->orig_bio = first_bio; bbio->private = first_bio->bi_private; bbio->end_io = first_bio->bi_end_io; @@ -6489,24 +6515,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, BUG(); } - for (dev_nr = 0; dev_nr < total_devs; dev_nr++) { - dev = 
bbio->stripes[dev_nr].dev; - if (!dev || !dev->bdev || - (bio_op(first_bio) == REQ_OP_WRITE && - !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) { - bbio_error(bbio, first_bio, logical); - continue; - } - - if (dev_nr < total_devs - 1) - bio = btrfs_bio_clone(first_bio); - else - bio = first_bio; - - submit_stripe_bio(bbio, bio, bbio->stripes[dev_nr].physical, - dev_nr, async_submit); - } - btrfs_bio_counter_dec(fs_info); + __btrfs_map_bio(fs_info, logical, bbio, async_submit); return BLK_STS_OK; } From patchwork Thu Aug 9 18:04:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 10561617 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7DF8613B4 for ; Thu, 9 Aug 2018 18:06:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 66E5B2B6CF for ; Thu, 9 Aug 2018 18:06:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5AC402B7B6; Thu, 9 Aug 2018 18:06:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 729992B6CF for ; Thu, 9 Aug 2018 18:06:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727477AbeHIUcP (ORCPT ); Thu, 9 Aug 2018 16:32:15 -0400 Received: from mail-pg1-f196.google.com ([209.85.215.196]:37317 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727407AbeHIUcO (ORCPT ); Thu, 9 Aug 2018 16:32:14 -0400 
Received: by mail-pg1-f196.google.com with SMTP id n7-v6so3111652pgq.4; Thu, 09 Aug 2018 11:06:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=zw+Oaw/Tl2oNtEEEP6XycO0L6Z2wH1XyyckuMiebc7A=; b=Ajt9eHyvhCtHlMp9/wqCgdpmQ1ildhiCibU8l439eFS+komvQRYTz1cNy+lsoC6Bhh xHEU15Mx506NO07WAIVR7o6sXHXnvRw7gxo1gE6Ys5BbWlyQ/DQgpejwmHe/4XSdvUVE Lgu0X4QunuddHyc1H/wnnvFQF8w4vF/PbZlfM2wILvXbodmiKnbRqgSBV5mxDfpPWhkx GiUm5AV54ILxwpO2JzlIKQvJLSe/MVLTxJxjaarhK0NlpL1O6RBPmnd9lBpcHXK1KM8T P6lJBt+ee3Q+NgdHwdQOQNY988cV6Nh7BBqGdK6oHkDv0spfKAsRJcNNMpg7fEIwXGgx Eq1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=zw+Oaw/Tl2oNtEEEP6XycO0L6Z2wH1XyyckuMiebc7A=; b=Pmp7HpGNEGv9E6hOI84kxRaNYY59AXq+g1E7V/qseyAxh48drVp8oHE5EXlxpEOobZ zARmUL/HO2c+cDPvrS0wAvjXIHRf/8qloRHEgo2B31XvM/C3qRCva1fN2HyBCO3aAZeR 95qkgPrffxrlgAROKn9DdsQ9v8BDxCGpoDBOUxUqSSmx2TolMAYD69XAL9pvoxLw8QPV 4scO7b/FCjuwAIrmKwOfUyXG5T6Tj6Gav9SiLx0EAtGvk9tf0J92O/ewQv3JOjiVjyAd +UHmQYELQ1zmJ8qyMGt/HjNDUihDdeScMACVdMdNja6GCui8zPD/3FHreZh6fvLJuapq jFbg== X-Gm-Message-State: AOUpUlHVzf6erHZH+ni+I2Qza6zjti4oT7IaZ1lcJR8TaCvNRRWPlODM aFtWIiyGDQZMsooBg2Qs9pU= X-Google-Smtp-Source: AA+uWPx109VKCWOn2yfdWuD85NYGOAFWe2LDBbeTm+4kbVhz9NCTZoGU6OmPTHgwukJsSUvkct7Oxw== X-Received: by 2002:a63:8c0b:: with SMTP id m11-v6mr3138084pgd.372.1533837975469; Thu, 09 Aug 2018 11:06:15 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. 
[101.111.148.72]) by smtp.gmail.com with ESMTPSA id f90-v6sm12943493pfh.168.2018.08.09.11.06.14 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:06:14 -0700 (PDT)
From: Naohiro Aota
To: David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org, Hannes Reinecke, Damien Le Moal, Bart Van Assche, Matias Bjorling, Naohiro Aota
Subject: [RFC PATCH 11/17] btrfs: introduce submit buffer
Date: Fri, 10 Aug 2018 03:04:44 +0900
Message-Id: <20180809180450.5091-12-naota@elisp.net>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
References: <20180809180450.5091-1-naota@elisp.net>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Sequential allocation alone is not enough to guarantee that write IOs reach the device in sequential order. Various btrfs features (async compression, async checksumming, ...) can reorder the IOs. This patch introduces a submit buffer that holds the WRITE bios belonging to a block group and sends them out in increasing block address order, so that submit_stripe_bio() issues sequential writes.
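The buffering idea can be modelled in user space as follows. This is a minimal sketch, not the patch's code: `struct seq_group`, `submit_write()` and `drain()` are hypothetical names, a fixed array replaces the kernel list, and there is no locking. One deliberate difference: the patch warns and submits a bio that lies behind submit_offset without buffering it, whereas this sketch simply rejects it.

```c
#include <assert.h>

#define MAX_PENDING 16

/* Hypothetical model of one buffered write: a byte range in the group. */
struct pending_write {
	unsigned long long logical;
	unsigned long long length;
	int in_use;
};

/* Hypothetical model of the per-block-group state. */
struct seq_group {
	unsigned long long submit_offset;	/* next address allowed out */
	struct pending_write buffer[MAX_PENDING];
	unsigned long long sent[MAX_PENDING];	/* log of dispatched addresses */
	int nr_sent;
};

/* Dispatch one write and advance the expected offset past it. */
static void send_now(struct seq_group *g, unsigned long long logical,
		     unsigned long long length)
{
	assert(logical == g->submit_offset);
	assert(g->nr_sent < MAX_PENDING);
	g->sent[g->nr_sent++] = logical;
	g->submit_offset += length;
}

/* Dispatch every buffered write that has become contiguous. */
static void drain(struct seq_group *g)
{
	int progressed;
	int i;

	do {
		progressed = 0;
		for (i = 0; i < MAX_PENDING; i++) {
			struct pending_write *p = &g->buffer[i];

			if (p->in_use && p->logical == g->submit_offset) {
				p->in_use = 0;
				send_now(g, p->logical, p->length);
				progressed = 1;
			}
		}
	} while (progressed);
}

/*
 * Returns 1 if the write was dispatched, 0 if it was buffered, and -1 if it
 * lies behind the expected offset or the buffer is full.
 */
static int submit_write(struct seq_group *g, unsigned long long logical,
			unsigned long long length)
{
	int i;

	if (logical < g->submit_offset)
		return -1;
	if (logical == g->submit_offset) {
		send_now(g, logical, length);
		drain(g);
		return 1;
	}
	for (i = 0; i < MAX_PENDING; i++) {
		if (!g->buffer[i].in_use) {
			g->buffer[i].logical = logical;
			g->buffer[i].length = length;
			g->buffer[i].in_use = 1;
			return 0;
		}
	}
	return -1;
}
```

Writes that arrive out of order stay parked until the write that fills the gap shows up; that one arrival then releases the whole contiguous run in address order.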
Signed-off-by: Naohiro Aota --- fs/btrfs/ctree.h | 3 + fs/btrfs/extent-tree.c | 5 ++ fs/btrfs/volumes.c | 121 +++++++++++++++++++++++++++++++++++++++-- fs/btrfs/volumes.h | 3 + 4 files changed, 128 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 5060bcdcb72b..ebbbf46aa540 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -696,6 +696,9 @@ struct btrfs_block_group_cache { */ enum btrfs_alloc_type alloc_type; u64 alloc_offset; + spinlock_t submit_lock; + u64 submit_offset; + struct list_head submit_buffer; }; /* delayed seq elem */ diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index d4355b9b494e..6b7b632b0791 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -105,6 +105,7 @@ void btrfs_put_block_group(struct btrfs_block_group_cache *cache) if (atomic_dec_and_test(&cache->count)) { WARN_ON(cache->pinned > 0); WARN_ON(cache->reserved > 0); + WARN_ON(!list_empty(&cache->submit_buffer)); /* * If not empty, someone is still holding mutex of @@ -10059,6 +10060,8 @@ btrfs_get_block_group_alloc_offset(struct btrfs_block_group_cache *cache) goto out; } + cache->submit_offset = logical + cache->alloc_offset; + out: cache->alloc_type = alloc_type; kfree(alloc_offsets); @@ -10095,6 +10098,7 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, atomic_set(&cache->count, 1); spin_lock_init(&cache->lock); + spin_lock_init(&cache->submit_lock); init_rwsem(&cache->data_rwsem); INIT_LIST_HEAD(&cache->list); INIT_LIST_HEAD(&cache->cluster_list); @@ -10102,6 +10106,7 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, INIT_LIST_HEAD(&cache->ro_list); INIT_LIST_HEAD(&cache->dirty_list); INIT_LIST_HEAD(&cache->io_list); + INIT_LIST_HEAD(&cache->submit_buffer); btrfs_init_free_space_ctl(cache); atomic_set(&cache->trimming, 0); mutex_init(&cache->free_space_lock); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 08d13da2553f..ca03b7136892 100644 --- a/fs/btrfs/volumes.c +++ 
b/fs/btrfs/volumes.c @@ -513,6 +513,8 @@ static noinline void run_scheduled_bios(struct btrfs_device *device) spin_unlock(&device->io_lock); while (pending) { + struct btrfs_bio *bbio; + struct completion *sent = NULL; rmb(); /* we want to work on both lists, but do more bios on the @@ -550,7 +552,12 @@ static noinline void run_scheduled_bios(struct btrfs_device *device) sync_pending = 0; } + bbio = cur->bi_private; + if (bbio) + sent = bbio->sent; btrfsic_submit_bio(cur); + if (sent) + complete(sent); num_run++; batch_run++; @@ -5542,6 +5549,7 @@ static struct btrfs_bio *alloc_btrfs_bio(int total_stripes, int real_stripes) atomic_set(&bbio->error, 0); refcount_set(&bbio->refs, 1); + INIT_LIST_HEAD(&bbio->list); return bbio; } @@ -6351,7 +6359,7 @@ static void btrfs_end_bio(struct bio *bio) * the work struct is scheduled. */ static noinline void btrfs_schedule_bio(struct btrfs_device *device, - struct bio *bio) + struct bio *bio, int need_seqwrite) { struct btrfs_fs_info *fs_info = device->fs_info; int should_queue = 1; @@ -6365,7 +6373,12 @@ static noinline void btrfs_schedule_bio(struct btrfs_device *device, /* don't bother with additional async steps for reads, right now */ if (bio_op(bio) == REQ_OP_READ) { + struct btrfs_bio *bbio = bio->bi_private; + struct completion *sent = bbio->sent; + btrfsic_submit_bio(bio); + if (sent) + complete(sent); return; } @@ -6373,7 +6386,7 @@ static noinline void btrfs_schedule_bio(struct btrfs_device *device, bio->bi_next = NULL; spin_lock(&device->io_lock); - if (op_is_sync(bio->bi_opf)) + if (op_is_sync(bio->bi_opf) && need_seqwrite == 0) pending_bios = &device->pending_sync_bios; else pending_bios = &device->pending_bios; @@ -6412,8 +6425,21 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio, btrfs_bio_counter_inc_noblocked(fs_info); + /* queue all bios into scheduler if sequential write is required */ + if (bbio->need_seqwrite) { + if (!async) { + DECLARE_COMPLETION_ONSTACK(sent); + + bbio->sent = 
&sent; + btrfs_schedule_bio(dev, bio, bbio->need_seqwrite); + wait_for_completion_io(&sent); + } else { + btrfs_schedule_bio(dev, bio, bbio->need_seqwrite); + } + return; + } if (async) - btrfs_schedule_bio(dev, bio); + btrfs_schedule_bio(dev, bio, bbio->need_seqwrite); else btrfsic_submit_bio(bio); } @@ -6465,6 +6491,90 @@ static void __btrfs_map_bio(struct btrfs_fs_info *fs_info, u64 logical, btrfs_bio_counter_dec(fs_info); } +static void __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info, u64 logical, + struct btrfs_bio *bbio, int async_submit) +{ + u64 length = bbio->orig_bio->bi_iter.bi_size; + struct btrfs_block_group_cache *cache = NULL; + int sent; + LIST_HEAD(submit_list); + + WARN_ON(bio_op(bbio->orig_bio) != REQ_OP_WRITE); + + cache = btrfs_lookup_block_group(fs_info, logical); + if (!cache || cache->alloc_type != BTRFS_ALLOC_SEQ) { + if (cache) + btrfs_put_block_group(cache); + __btrfs_map_bio(fs_info, logical, bbio, async_submit); + return; + } + + bbio->need_seqwrite = 1; + + spin_lock(&cache->submit_lock); + if (cache->submit_offset == logical) + goto send_bios; + + if (cache->submit_offset > logical) { + btrfs_info(fs_info, "sending unaligned bio... 
%llu+%llu %llu\n", + logical, length, cache->submit_offset); + spin_unlock(&cache->submit_lock); + WARN_ON(1); + btrfs_put_block_group(cache); + __btrfs_map_bio(fs_info, logical, bbio, async_submit); + return; + } + + /* buffer the unaligned bio */ + list_add_tail(&bbio->list, &cache->submit_buffer); + spin_unlock(&cache->submit_lock); + btrfs_put_block_group(cache); + + return; + +send_bios: + spin_unlock(&cache->submit_lock); + /* send this bio */ + __btrfs_map_bio(fs_info, logical, bbio, 1); + +loop: + /* and send previously buffered following bios */ + spin_lock(&cache->submit_lock); + cache->submit_offset += length; + length = 0; + INIT_LIST_HEAD(&submit_list); + + /* collect sequential bios into submit_list */ + do { + struct btrfs_bio *next; + + sent = 0; + list_for_each_entry_safe(bbio, next, + &cache->submit_buffer, list) { + struct bio *orig_bio = bbio->orig_bio; + u64 logical = (u64)orig_bio->bi_iter.bi_sector << 9; + + if (logical == cache->submit_offset + length) { + sent = 1; + length += orig_bio->bi_iter.bi_size; + list_move_tail(&bbio->list, &submit_list); + } + } + } while (sent); + spin_unlock(&cache->submit_lock); + + /* send the collected bios */ + list_for_each_entry(bbio, &submit_list, list) { + __btrfs_map_bio(bbio->fs_info, + (u64)bbio->orig_bio->bi_iter.bi_sector << 9, + bbio, 1); + } + + if (length) + goto loop; + btrfs_put_block_group(cache); +} + blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num, int async_submit) { @@ -6515,7 +6625,10 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, BUG(); } - __btrfs_map_bio(fs_info, logical, bbio, async_submit); + if (btrfs_fs_incompat(fs_info, HMZONED) && bio_op(bio) == REQ_OP_WRITE) + __btrfs_map_bio_zoned(fs_info, logical, bbio, async_submit); + else + __btrfs_map_bio(fs_info, logical, bbio, async_submit); return BLK_STS_OK; } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 58053d2e24aa..3db90f5395cd 100644 --- 
a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -317,6 +317,9 @@ struct btrfs_bio { int mirror_num; int num_tgtdevs; int *tgtdev_map; + int need_seqwrite; + struct list_head list; + struct completion *sent; /* * logical block numbers for the start of each stripe * The last one or two are p/q. These are sorted, From patchwork Thu Aug 9 18:04:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 10561619 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A4FF113BB for ; Thu, 9 Aug 2018 18:06:27 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9050D2B6CF for ; Thu, 9 Aug 2018 18:06:27 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 842152B7B6; Thu, 9 Aug 2018 18:06:27 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A9B3D2B6CF for ; Thu, 9 Aug 2018 18:06:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727499AbeHIUcR (ORCPT ); Thu, 9 Aug 2018 16:32:17 -0400 Received: from mail-pl0-f66.google.com ([209.85.160.66]:38314 "EHLO mail-pl0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727407AbeHIUcR (ORCPT ); Thu, 9 Aug 2018 16:32:17 -0400 Received: by mail-pl0-f66.google.com with SMTP id u11-v6so2875284plq.5; Thu, 09 Aug 2018 11:06:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; 
h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=s25zPDFTarFvuJcu4A6FfGkARDGEj3jptP+cE2KN7Qc=; b=PvWqCkn/Jy934v2q55eULxVkKJ0YpmVuzQqaYx9IgN6DZnzdn1hqEW0uXTqdlx3lOj IY7ntGcwoI4aAAa4LkaaUROlOuvJpRJmQ2gqFZmJCtujPHx3XBoErfHSosOGnz0I8hjx vOLTE7joCz82g5rdlHU3B5emim5mEYdySYpni+Aa8V3mvavJk79jwPLemtjeOoWyHEyR KolHHy3myL+JeqULwV/UvWhekpStjL2AVGrsE46YkuFeMaKdKSBHSxZRjUGTeRNzU96c 28usYRa0Qrj637VaAhdl9Hr8t0wWG2nhEw8B1jpMBxfOcp/LQ6SO5EcgvDXU37JmK8Tt EJAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=s25zPDFTarFvuJcu4A6FfGkARDGEj3jptP+cE2KN7Qc=; b=Ok6zqqLEUdYSOFruCqOegrblmJh3KChIsFlUSFw7jtmSNAsF54fvPzt8nZf6qRJjzp EknG0q8lYNmbMniekh0ytpcfPuvPZoBbvrJPZllZiKREBBtu2u/MCMz+PYRXQs6b8YMG 8n2LogmKoBJFYXIPLCVy5VPTKF0egibVf7IEcIMMPJTMbKcNS2U8rMrhiVCemHyTfSYf WUKJ6b2PrRvp/ZOGoOhlLO/YTLD8wUW797abEI/XL0vKP+uVQJH+Pv7cCE9fmPfpqXvP KNUeYaNyO/Lg4XjQGQ0AXCV19r5CNKc0476aqC7kM8cPEOZ/+3fYBMog02+Uu+OVbT8q 0vbA== X-Gm-Message-State: AOUpUlFZ75Q4bwapsD9UFOVDmIrWQ7U6lHpnne6fLMueteg6aXcl/x6S L/8AKSy7ucZ7Hn0VCUP66UKGiYMWIe8= X-Google-Smtp-Source: AA+uWPxgyG5WsxmVOjxx2hFv2tnRC+ULgZJ7nFaMhH7zKCywU3bghUkMTZuypxgceEL43bBK2c1dDA== X-Received: by 2002:a17:902:e10d:: with SMTP id cc13-v6mr2968378plb.96.1533837977882; Thu, 09 Aug 2018 11:06:17 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. 
[101.111.148.72]) by smtp.gmail.com with ESMTPSA id r64-v6sm13943063pfk.157.2018.08.09.11.06.16 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:06:17 -0700 (PDT)
From: Naohiro Aota
To: David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org, Hannes Reinecke, Damien Le Moal, Bart Van Assche, Matias Bjorling, Naohiro Aota
Subject: [RFC PATCH 12/17] btrfs: expire submit buffer on timeout
Date: Fri, 10 Aug 2018 03:04:45 +0900
Message-Id: <20180809180450.5091-13-naota@elisp.net>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
References: <20180809180450.5091-1-naota@elisp.net>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Bios can stall in the submit buffer because of a bug or a device problem. In such a situation, btrfs stops making progress while it waits for the buffered bios to complete. To avoid such a hang, add a worker that cancels the stalled bios after an expiration timeout.
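The expiry policy can be sketched in user space like this. Everything here is an assumption made for illustration: `struct seq_group`, `note_submit()`, `buffer_write()` and `expire_stalled()` are hypothetical names, time is a plain counter rather than jiffies, `ERR_IO` stands in for -EIO, and the periodic caller that the patch implements as a worker is left out.

```c
#include <assert.h>

#define MAX_PENDING 8
#define ERR_IO (-5)	/* stand-in for -EIO */

/* Hypothetical model of one write parked in the submit buffer. */
struct buffered_write {
	unsigned long long logical;
	int in_use;
	int status;	/* 0 while pending, negative once cancelled */
};

/* Hypothetical model of the per-block-group expiry state. */
struct seq_group {
	struct buffered_write buffer[MAX_PENDING];
	unsigned long last_submit;	/* time of the last dispatch */
	int expired;			/* set once the group gave up waiting */
};

/* Record that a write went out, which resets the stall timer. */
static void note_submit(struct seq_group *g, unsigned long now)
{
	g->last_submit = now;
}

/* Park a write that cannot be sent yet; fails once the group expired. */
static int buffer_write(struct seq_group *g, unsigned long long logical)
{
	int i;

	if (g->expired)
		return ERR_IO;
	for (i = 0; i < MAX_PENDING; i++) {
		if (!g->buffer[i].in_use) {
			g->buffer[i].logical = logical;
			g->buffer[i].in_use = 1;
			g->buffer[i].status = 0;
			return 0;
		}
	}
	return ERR_IO;
}

/*
 * Meant to be called periodically.  If nothing has been dispatched for
 * longer than timeout, fail every buffered write; the group is marked
 * expired only when at least one write was cancelled.  Returns the number
 * of writes cancelled.
 */
static int expire_stalled(struct seq_group *g, unsigned long now,
			  unsigned long timeout)
{
	int cancelled = 0;
	int i;

	assert(now >= g->last_submit);
	if (g->expired || now - g->last_submit <= timeout)
		return 0;
	for (i = 0; i < MAX_PENDING; i++) {
		if (g->buffer[i].in_use) {
			g->buffer[i].in_use = 0;
			g->buffer[i].status = ERR_IO;
			cancelled++;
		}
	}
	if (cancelled)
		g->expired = 1;
	return cancelled;
}
```

Failing later writes to an expired group, instead of buffering them again, keeps a stuck block group from accumulating waiters; the patch applies the same idea when it reports "IO in expired block group".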
Signed-off-by: Naohiro Aota --- fs/btrfs/async-thread.c | 1 + fs/btrfs/async-thread.h | 1 + fs/btrfs/ctree.h | 5 +++ fs/btrfs/disk-io.c | 7 +++- fs/btrfs/extent-tree.c | 20 ++++++++++ fs/btrfs/super.c | 20 ++++++++++ fs/btrfs/volumes.c | 83 ++++++++++++++++++++++++++++++++++++++++- fs/btrfs/volumes.h | 1 + 8 files changed, 136 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index d522494698fa..86735dfbabcc 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -109,6 +109,7 @@ BTRFS_WORK_HELPER(scrub_helper); BTRFS_WORK_HELPER(scrubwrc_helper); BTRFS_WORK_HELPER(scrubnc_helper); BTRFS_WORK_HELPER(scrubparity_helper); +BTRFS_WORK_HELPER(bio_expire_helper); static struct __btrfs_workqueue * __btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info, const char *name, diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h index 7861c9feba5f..2c041f0668d4 100644 --- a/fs/btrfs/async-thread.h +++ b/fs/btrfs/async-thread.h @@ -54,6 +54,7 @@ BTRFS_WORK_HELPER_PROTO(scrub_helper); BTRFS_WORK_HELPER_PROTO(scrubwrc_helper); BTRFS_WORK_HELPER_PROTO(scrubnc_helper); BTRFS_WORK_HELPER_PROTO(scrubparity_helper); +BTRFS_WORK_HELPER_PROTO(bio_expire_helper); struct btrfs_workqueue *btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info, diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ebbbf46aa540..8f85c96cd262 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -699,6 +699,10 @@ struct btrfs_block_group_cache { spinlock_t submit_lock; u64 submit_offset; struct list_head submit_buffer; + struct btrfs_work work; + unsigned long last_submit; + int expired:1; + struct task_struct *expire_thread; }; /* delayed seq elem */ @@ -974,6 +978,7 @@ struct btrfs_fs_info { struct btrfs_workqueue *submit_workers; struct btrfs_workqueue *caching_workers; struct btrfs_workqueue *readahead_workers; + struct btrfs_workqueue *bio_expire_workers; /* * fixup workers take dirty pages that didn't properly go through diff --git 
a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 6a014632ca1e..00fa6aca9bb5 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2040,6 +2040,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) */ btrfs_destroy_workqueue(fs_info->endio_meta_workers); btrfs_destroy_workqueue(fs_info->endio_meta_write_workers); + btrfs_destroy_workqueue(fs_info->bio_expire_workers); } static void free_root_extent_buffers(struct btrfs_root *root) @@ -2245,6 +2246,9 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info, btrfs_alloc_workqueue(fs_info, "extent-refs", flags, min_t(u64, fs_devices->num_devices, max_active), 8); + fs_info->bio_expire_workers = + btrfs_alloc_workqueue(fs_info, "bio-expire", flags, + max_active, 0); if (!(fs_info->workers && fs_info->delalloc_workers && fs_info->submit_workers && fs_info->flush_workers && @@ -2256,7 +2260,8 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info, fs_info->caching_workers && fs_info->readahead_workers && fs_info->fixup_workers && fs_info->delayed_workers && fs_info->extent_workers && - fs_info->qgroup_rescan_workers)) { + fs_info->qgroup_rescan_workers && + fs_info->bio_expire_workers)) { return -ENOMEM; } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 6b7b632b0791..a5f5935315c8 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -9745,6 +9745,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) block_group->cached == BTRFS_CACHE_ERROR) free_excluded_extents(block_group); + if (block_group->alloc_type == BTRFS_ALLOC_SEQ) { + spin_lock(&block_group->submit_lock); + if (block_group->expire_thread) + wake_up_process(block_group->expire_thread); + spin_unlock(&block_group->submit_lock); + flush_work(&block_group->work.normal_work); + } + btrfs_remove_free_space_cache(block_group); ASSERT(block_group->cached != BTRFS_CACHE_STARTED); ASSERT(list_empty(&block_group->dirty_list)); @@ -10061,6 +10069,10 @@ 
btrfs_get_block_group_alloc_offset(struct btrfs_block_group_cache *cache) } cache->submit_offset = logical + cache->alloc_offset; + btrfs_init_work(&cache->work, btrfs_bio_expire_helper, + expire_bios_fn, NULL, NULL); + cache->last_submit = 0; + cache->expired = 0; out: cache->alloc_type = alloc_type; @@ -10847,6 +10859,14 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) } spin_unlock(&fs_info->unused_bgs_lock); + if (block_group->alloc_type == BTRFS_ALLOC_SEQ) { + spin_lock(&block_group->submit_lock); + if (block_group->expire_thread) + wake_up_process(block_group->expire_thread); + spin_unlock(&block_group->submit_lock); + flush_work(&block_group->work.normal_work); + } + mutex_lock(&fs_info->delete_unused_bgs_mutex); /* Don't want to race with allocators so take the groups_sem */ diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index cc812e459197..4d1d6cc7cd59 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -154,6 +154,25 @@ void __btrfs_handle_fs_error(struct btrfs_fs_info *fs_info, const char *function * completes. The next time when the filesystem is mounted writeable * again, the device replace operation continues. 
*/ + + /* expire pending bios in submit buffer */ + if (btrfs_fs_incompat(fs_info, HMZONED)) { + struct btrfs_block_group_cache *block_group; + struct rb_node *node; + + spin_lock(&fs_info->block_group_cache_lock); + for (node = rb_first(&fs_info->block_group_cache_tree); node; + node = rb_next(node)) { + block_group = rb_entry(node, + struct btrfs_block_group_cache, + cache_node); + spin_lock(&block_group->submit_lock); + if (block_group->expire_thread) + wake_up_process(block_group->expire_thread); + spin_unlock(&block_group->submit_lock); + } + spin_unlock(&fs_info->block_group_cache_lock); + } } #ifdef CONFIG_PRINTK @@ -1730,6 +1749,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, btrfs_workqueue_set_max(fs_info->readahead_workers, new_pool_size); btrfs_workqueue_set_max(fs_info->scrub_wr_completion_workers, new_pool_size); + btrfs_workqueue_set_max(fs_info->bio_expire_workers, new_pool_size); } static inline void btrfs_remount_prepare(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index ca03b7136892..0e68003a429d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6498,6 +6498,7 @@ static void __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info, u64 logical, struct btrfs_block_group_cache *cache = NULL; int sent; LIST_HEAD(submit_list); + int should_queue = 1; WARN_ON(bio_op(bbio->orig_bio) != REQ_OP_WRITE); @@ -6512,7 +6513,21 @@ static void __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info, u64 logical, bbio->need_seqwrite = 1; spin_lock(&cache->submit_lock); - if (cache->submit_offset == logical) + + if (cache->expired) { + int i, total_devs = bbio->num_stripes; + + spin_unlock(&cache->submit_lock); + btrfs_err(cache->fs_info, + "IO in expired block group %llu+%llu", + logical, length); + for (i = 0; i < total_devs; i++) + bbio_error(bbio, bbio->orig_bio, logical); + btrfs_put_block_group(cache); + return; + } + + if (cache->submit_offset == logical || cache->expired) goto send_bios; if 
(cache->submit_offset > logical) { @@ -6527,7 +6542,11 @@ static void __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info, u64 logical, /* buffer the unaligned bio */ list_add_tail(&bbio->list, &cache->submit_buffer); + should_queue = !cache->last_submit; + cache->last_submit = jiffies; spin_unlock(&cache->submit_lock); + if (should_queue) + btrfs_queue_work(fs_info->bio_expire_workers, &cache->work); btrfs_put_block_group(cache); return; @@ -6561,6 +6580,14 @@ static void __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info, u64 logical, } } } while (sent); + + if (list_empty(&cache->submit_buffer)) { + should_queue = 0; + cache->last_submit = 0; + } else { + should_queue = !cache->last_submit; + cache->last_submit = jiffies; + } spin_unlock(&cache->submit_lock); /* send the collected bios */ @@ -6572,6 +6599,8 @@ static void __btrfs_map_bio_zoned(struct btrfs_fs_info *fs_info, u64 logical, if (length) goto loop; + if (should_queue) + btrfs_queue_work(fs_info->bio_expire_workers, &cache->work); btrfs_put_block_group(cache); } @@ -6632,6 +6661,58 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, return BLK_STS_OK; } +void expire_bios_fn(struct btrfs_work *work) +{ + struct btrfs_block_group_cache *cache; + struct btrfs_bio *bbio, *next; + unsigned long expire_time, cur; + unsigned long expire = 90 * HZ; + LIST_HEAD(submit_list); + + cache = container_of(work, struct btrfs_block_group_cache, work); + btrfs_get_block_group(cache); +loop: + spin_lock(&cache->submit_lock); + cache->expire_thread = current; + if (list_empty(&cache->submit_buffer)) { + cache->last_submit = 0; + cache->expire_thread = NULL; + spin_unlock(&cache->submit_lock); + btrfs_put_block_group(cache); + return; + } + cur = jiffies; + expire_time = cache->last_submit + expire; + if (time_before(cur, expire_time) && !sb_rdonly(cache->fs_info->sb)) { + spin_unlock(&cache->submit_lock); + schedule_timeout_interruptible(expire_time - cur); + goto loop; + } + + 
list_splice_init(&cache->submit_buffer, &submit_list); + cache->expired = 1; + cache->expire_thread = NULL; + spin_unlock(&cache->submit_lock); + + btrfs_handle_fs_error(cache->fs_info, -EIO, + "bio submit buffer expired"); + btrfs_err(cache->fs_info, "block group %llu submit pos %llu", + cache->key.objectid, cache->submit_offset); + + list_for_each_entry_safe(bbio, next, &submit_list, list) { + u64 logical = (u64)bbio->orig_bio->bi_iter.bi_sector << 9; + int i, total_devs = bbio->num_stripes; + + btrfs_err(cache->fs_info, "expiring %llu", logical); + list_del_init(&bbio->list); + for (i = 0; i < total_devs; i++) + bbio_error(bbio, bbio->orig_bio, logical); + } + + cache->last_submit = 0; + btrfs_put_block_group(cache); +} + struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid, u8 *uuid, u8 *fsid) { diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 3db90f5395cd..2a3c046fa31b 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -415,6 +415,7 @@ void btrfs_mapping_init(struct btrfs_mapping_tree *tree); void btrfs_mapping_tree_free(struct btrfs_mapping_tree *tree); blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num, int async_submit); +void expire_bios_fn(struct btrfs_work *work); int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, fmode_t flags, void *holder); int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, From patchwork Thu Aug 9 18:04:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 10561629 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E0E1213BB for ; Thu, 9 Aug 2018 18:06:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CDCD52B6CF for ; Thu, 9 Aug 
2018 18:06:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C1BF12B7C5; Thu, 9 Aug 2018 18:06:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5C0D72B6CF for ; Thu, 9 Aug 2018 18:06:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727503AbeHIUcU (ORCPT ); Thu, 9 Aug 2018 16:32:20 -0400 Received: from mail-pl0-f67.google.com ([209.85.160.67]:42520 "EHLO mail-pl0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727407AbeHIUcT (ORCPT ); Thu, 9 Aug 2018 16:32:19 -0400 Received: by mail-pl0-f67.google.com with SMTP id g6-v6so2869211plq.9; Thu, 09 Aug 2018 11:06:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=/GURDllLYMHk2phIf3qJDu1l3AZY/16TJzzt0itEyGw=; b=cYnnVZaZzaefZs1xpQS72jbPduwS0XGILb6IFqCORgd7G3VXzqFjbDcfZtdCobEwdg wEv339LjyDL82Sy8qWRZrCP2BQBm/6HKh5dnZh7ItC9Zg9ziKc0vdCoqaaapclOC9Yl0 SGeP79vRax8ZlpZgPraimNptBIH+4vC+bEo1K2faJcmmYozmYnX62+cR6mKmgmu5go6/ JhhJNGYHaErjXU8KGiT+4FKCNeFT3UeHkEDjrIPNPU/GnLTa3S+XjuRF440ZWzYG5+5+ tGpsqHD3lOpsCjSXjwyMGV0DvsjA7t7Cw426olFrXO52gDyzYL9DuVwG+3pVcY8KsdK8 YY2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=/GURDllLYMHk2phIf3qJDu1l3AZY/16TJzzt0itEyGw=; b=h7p/tZkvshG1ZfJ1htN28ahOL1PgdJ/atpRkdgIAeu2mXG4lJnhIbn5Gme81XvidWV gfAPYCi+3RnqA0PQ1zDMFdtHF5/12BxGUWYQ7A2oUab5ewjugGwL/s9xCWOFVN5Te4Wx 
aafdvqdmip/SI86hnccKP8WkUgZedqHslw2NshaGogu5LeHzXjrCESAzcFF/MlNkFIL9 ijjuwzV31l1NitO6xUhxQPMZLcqowrVhCU4QSFkyvKOY/5ugYHOp3flL0otWju0ETrs1 2EPUV4cY7o+iFEyDzhRvs2eGrKmK3Axx6G2+mWXw+A5BHybMA8UKpkCK9RJfvbDjRob+ d01w== X-Gm-Message-State: AOUpUlEOY4DPX9azK7c3b6LdanGoy86hmcTYQ2Gv4/59Pq0mSxx5p+PP Z0xufkrZBr+GAV1D9tEcD+k= X-Google-Smtp-Source: AA+uWPzVbFgQwlPUIVEmy2bZoHjlwNCnRYywomp59rEHgkjb/Ek+4IE0Q9cRcVfH5ezqIqHXFhPXdw== X-Received: by 2002:a17:902:1703:: with SMTP id i3-v6mr2913894pli.263.1533837980310; Thu, 09 Aug 2018 11:06:20 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. [101.111.148.72]) by smtp.gmail.com with ESMTPSA id c66-v6sm17034649pfc.138.2018.08.09.11.06.19 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:06:19 -0700 (PDT) From: Naohiro Aota To: David Sterba , linux-btrfs@vger.kernel.org Cc: Chris Mason , Josef Bacik , linux-kernel@vger.kernel.org, Hannes Reinecke , Damien Le Moal , Bart Van Assche , Matias Bjorling , Naohiro Aota Subject: [RFC PATCH 13/17] btrfs: avoid sync IO prioritization on checksum in HMZONED mode Date: Fri, 10 Aug 2018 03:04:46 +0900 Message-Id: <20180809180450.5091-14-naota@elisp.net> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180809180450.5091-1-naota@elisp.net> References: <20180809180450.5091-1-naota@elisp.net> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When sync I/Os are prioritized, btrfs may call btrfs_map_block() for blocks allocated later before calling it for blocks allocated earlier. Because of this reordering, a sync on I/Os to larger LBAs sometimes has to wait for I/Os to smaller LBAs. Since the number of active checksum workers is limited, the I/Os being waited for may be ones on smaller LBAs whose checksumming has not started yet. In such a situation, transactions get stuck waiting for I/Os on smaller LBAs to finish, which never happens.
This situation can be reproduced by e.g. fstests btrfs/073. To avoid such reordering, disable sync IO prioritization for now. In the future, it will be reworked to finish checksumming of I/Os on smaller LBAs when committing a transaction. Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 00fa6aca9bb5..f79abd5e6b3a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -807,7 +807,7 @@ blk_status_t btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, async->status = 0; - if (op_is_sync(bio->bi_opf)) + if (op_is_sync(bio->bi_opf) && !btrfs_fs_incompat(fs_info, HMZONED)) btrfs_set_work_high_priority(&async->work); btrfs_queue_work(fs_info->workers, &async->work); From patchwork Thu Aug 9 18:04:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 10561627 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6665213BB for ; Thu, 9 Aug 2018 18:06:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 510CA2B6CF for ; Thu, 9 Aug 2018 18:06:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 453882B7A6; Thu, 9 Aug 2018 18:06:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AF91C2B6CF for ; Thu, 9 Aug 2018 18:06:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by
vger.kernel.org via listexpand id S1727527AbeHIUcW (ORCPT ); Thu, 9 Aug 2018 16:32:22 -0400 Received: from mail-pl0-f66.google.com ([209.85.160.66]:40406 "EHLO mail-pl0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727407AbeHIUcW (ORCPT ); Thu, 9 Aug 2018 16:32:22 -0400 Received: by mail-pl0-f66.google.com with SMTP id s17-v6so2867674plp.7; Thu, 09 Aug 2018 11:06:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=1DR/Zc6HEbAQi9Wid3WymID+sizARurLxZdyInZraZA=; b=Tvo3xxE7Sq034ulE2unm45ZYQsDUC8V93KfPrGo2RrujogESyiwvItf+AMdaZHyrkY t4FLALD1L8rQcHmRbZ0CLmQ5+FfryhZYSXILAz6I+aK5X/AsxkOFzo3VCmtB4Eh2Wn6j Fr5kElCcgL0YhjEQWR5gft2Ppi/RmyJUMQNV3EwDRfe9ZViD7b5xjv9BbTn9hXRQVCfC p9Mi5+R17kElqSAt+A070rLo4D2BAIZzRnTMeEleaeiR2xasVIWaZ+UxYcZBTb71RM6Y 4IYuXzOGKpCYdvF+BX6x42hWn6TngM0AfagJbOBKi/azgbe2rYNf3rrYdPIlDgNY76l1 8tpA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=1DR/Zc6HEbAQi9Wid3WymID+sizARurLxZdyInZraZA=; b=oDKXB3gOh9qIwvJA6PaJ0k1B3UVgVqAgxWxm3mUK2WaSMoSuHJJzeZeu1Fkqw8+DUQ N7N7mRczTPFoNy0Q9qzOwaGoWrobFuBjsv5pK9rGyJI0J9x0ow2LGCnJWOlTF+D+NTD4 sVg0Ljj6Tym2SgtJ6CXLwLJUENWbEtyKBAY8RL8dFiZckrGlBecTQfAOVQZe8TaJAzkn pGy6tmVSu6aTlMDHBZ6xvEdJf147JYTPIAMVRBpNong/2pnu6ff87ODuyTxp1vzT6U+Q yXhF5wveuuecp3i5q+BfndUCO+jpbqSW2mgwzdWGXDgMBTF6ragRNamMZwGdgR111xY3 Sgnw== X-Gm-Message-State: AOUpUlHObYxj8yqkssVzX/1yJFLT0uKBYvs7ATzyufiZJfNkZPVclHV2 MxEbquLQpP731DiOyMg19Js= X-Google-Smtp-Source: AA+uWPwEBKB3j2xDGHyq+7ak/hol1BNjm0bpX3+bUFO+TvqkFo2yjkFk04+Ar/8F2eTHi8StKLOfQw== X-Received: by 2002:a17:902:904c:: with SMTP id w12-v6mr2986365plz.95.1533837982672; Thu, 09 Aug 2018 11:06:22 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. 
[101.111.148.72]) by smtp.gmail.com with ESMTPSA id f18-v6sm16297855pff.29.2018.08.09.11.06.21 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:06:21 -0700 (PDT) From: Naohiro Aota To: David Sterba , linux-btrfs@vger.kernel.org Cc: Chris Mason , Josef Bacik , linux-kernel@vger.kernel.org, Hannes Reinecke , Damien Le Moal , Bart Van Assche , Matias Bjorling , Naohiro Aota Subject: [RFC PATCH 14/17] btrfs: redirty released extent buffers in sequential BGs Date: Fri, 10 Aug 2018 03:04:47 +0900 Message-Id: <20180809180450.5091-15-naota@elisp.net> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180809180450.5091-1-naota@elisp.net> References: <20180809180450.5091-1-naota@elisp.net> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Tree-manipulating operations such as merging nodes often release tree nodes that have already been allocated. Btrfs cleans such nodes so that their pages are not written out needlessly. On HMZONED drives, however, this optimization blocks subsequent IOs, because cancelling the write-out of the freed blocks breaks the sequential write order expected by the device. This patch introduces a list of clean extent buffers that have been released in a transaction. Btrfs consults the list before writing out and waiting for the IOs, and it redirties a buffer if 1) it is in a sequential BG, 2) it is in the not-yet-submitted range, and 3) it is not under IO. Such buffers are thus marked for IO in btrfs_write_and_wait_transaction() so that the proper bios are sent to the disk.
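The three redirty conditions can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the struct and function names below (block_group, released_eb, should_redirty) are hypothetical stand-ins, not the kernel structures, and locking is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum alloc_type { ALLOC_FIT, ALLOC_SEQ };

struct block_group {
	enum alloc_type alloc_type;
	uint64_t submit_offset;	/* next logical address expected for submission */
};

struct released_eb {
	uint64_t start;		/* logical address of the released buffer */
	bool under_io;		/* true if a write-out is already in flight */
};

/*
 * Return true when a released (cleaned) buffer has to be redirtied so
 * that the sequential write stream of its block group has no hole.
 */
static bool should_redirty(const struct block_group *bg,
			   const struct released_eb *eb)
{
	return bg->alloc_type == ALLOC_SEQ &&		/* 1) sequential BG */
	       bg->submit_offset <= eb->start &&	/* 2) not yet submitted */
	       !eb->under_io;				/* 3) not under IO */
}
```

A buffer below submit_offset has already been sent to the device, so skipping it cannot leave a gap; only buffers at or beyond that offset need to be written again.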
Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 23 +++++++++++++++++++++-- fs/btrfs/extent_io.c | 1 + fs/btrfs/extent_io.h | 1 + fs/btrfs/transaction.c | 32 ++++++++++++++++++++++++++++++++ fs/btrfs/transaction.h | 3 +++ 5 files changed, 58 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f79abd5e6b3a..aa69c167fd57 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1098,10 +1098,20 @@ struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr, void clean_tree_block(struct btrfs_fs_info *fs_info, struct extent_buffer *buf) { - if (btrfs_header_generation(buf) == - fs_info->running_transaction->transid) { + struct btrfs_transaction *cur_trans = fs_info->running_transaction; + + if (btrfs_header_generation(buf) == cur_trans->transid) { btrfs_assert_tree_locked(buf); + if (btrfs_fs_incompat(fs_info, HMZONED) && + list_empty(&buf->release_list)) { + atomic_inc(&buf->refs); + spin_lock(&cur_trans->releasing_ebs_lock); + list_add_tail(&buf->release_list, + &cur_trans->releasing_ebs); + spin_unlock(&cur_trans->releasing_ebs_lock); + } + if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) { percpu_counter_add_batch(&fs_info->dirty_metadata_bytes, -buf->len, @@ -4484,6 +4494,15 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans, btrfs_destroy_pinned_extent(fs_info, fs_info->pinned_extents); + while (!list_empty(&cur_trans->releasing_ebs)) { + struct extent_buffer *eb; + + eb = list_first_entry(&cur_trans->releasing_ebs, + struct extent_buffer, release_list); + list_del_init(&eb->release_list); + free_extent_buffer(eb); + } + cur_trans->state =TRANS_STATE_COMPLETED; wake_up(&cur_trans->commit_wait); } diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 736d097d2851..31996c6a5d46 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4825,6 +4825,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start, 
init_waitqueue_head(&eb->read_lock_wq); btrfs_leak_debug_add(&eb->leak_list, &buffers); + INIT_LIST_HEAD(&eb->release_list); spin_lock_init(&eb->refs_lock); atomic_set(&eb->refs, 1); diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index b4d03e677e1d..bcd9a068ed3b 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -192,6 +192,7 @@ struct extent_buffer { */ wait_queue_head_t read_lock_wq; struct page *pages[INLINE_EXTENT_BUFFER_PAGES]; + struct list_head release_list; #ifdef CONFIG_BTRFS_DEBUG struct list_head leak_list; #endif diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 3b84f5015029..5146e287917a 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -273,6 +273,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, spin_lock_init(&cur_trans->dirty_bgs_lock); INIT_LIST_HEAD(&cur_trans->deleted_bgs); spin_lock_init(&cur_trans->dropped_roots_lock); + INIT_LIST_HEAD(&cur_trans->releasing_ebs); + spin_lock_init(&cur_trans->releasing_ebs_lock); list_add_tail(&cur_trans->list, &fs_info->trans_list); extent_io_tree_init(&cur_trans->dirty_pages, fs_info->btree_inode); @@ -2230,7 +2232,28 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) wake_up(&fs_info->transaction_wait); + if (btrfs_fs_incompat(fs_info, HMZONED)) { + struct extent_buffer *eb; + + list_for_each_entry(eb, &cur_trans->releasing_ebs, + release_list) { + struct btrfs_block_group_cache *cache; + + cache = btrfs_lookup_block_group(fs_info, eb->start); + if (!cache) + continue; + spin_lock(&cache->submit_lock); + if (cache->alloc_type == BTRFS_ALLOC_SEQ && + cache->submit_offset <= eb->start && + !extent_buffer_under_io(eb)) + set_extent_buffer_dirty(eb); + spin_unlock(&cache->submit_lock); + btrfs_put_block_group(cache); + } + } + ret = btrfs_write_and_wait_transaction(trans); + if (ret) { btrfs_handle_fs_error(fs_info, ret, "Error while writing out transaction"); @@ -2238,6 +2261,15 @@ int btrfs_commit_transaction(struct 
btrfs_trans_handle *trans) goto scrub_continue; } + while (!list_empty(&cur_trans->releasing_ebs)) { + struct extent_buffer *eb; + + eb = list_first_entry(&cur_trans->releasing_ebs, + struct extent_buffer, release_list); + list_del_init(&eb->release_list); + free_extent_buffer(eb); + } + ret = write_all_supers(fs_info, 0); /* * the super is written, we can safely allow the tree-loggers diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 4cbb1b55387d..d88c335dd78c 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -88,6 +88,9 @@ struct btrfs_transaction { spinlock_t dropped_roots_lock; struct btrfs_delayed_ref_root delayed_refs; struct btrfs_fs_info *fs_info; + + spinlock_t releasing_ebs_lock; + struct list_head releasing_ebs; }; #define __TRANS_FREEZABLE (1U << 0) From patchwork Thu Aug 9 18:04:48 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 10561625 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 14A9D13B4 for ; Thu, 9 Aug 2018 18:06:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F3E122B6CF for ; Thu, 9 Aug 2018 18:06:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E82F92B7A6; Thu, 9 Aug 2018 18:06:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 89D8E2B6CF for ; Thu, 9 Aug 2018 18:06:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by 
vger.kernel.org via listexpand id S1727555AbeHIUcZ (ORCPT ); Thu, 9 Aug 2018 16:32:25 -0400 Received: from mail-pf1-f195.google.com ([209.85.210.195]:46215 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727407AbeHIUcY (ORCPT ); Thu, 9 Aug 2018 16:32:24 -0400 Received: by mail-pf1-f195.google.com with SMTP id u24-v6so3192953pfn.13; Thu, 09 Aug 2018 11:06:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=nIuHCF9683iMYU2rHMsdILJuGo9/J5CkhRVd0dD+FUY=; b=RDNCO6DAKkOyJV0udkYO90nl7ou3VLUqZt5cpOaugzrYSVVLB1ZKWOIi31/PRWbDJl ygqgo4QMZYokvyCMHgDAM3q5OTB8Lz0bn0Q2pWgUKe1X+c3Xqs8/bmFlHBFpoMngHnHu 4yVyGaoaducePqhIRQrz5vofy9whcy9SPuF6wmGRF3AkyzG1mEKI5bR7JwqCAqkibWRP 5+sMssFpNsAmsO+tRP71oBm4LAZz7Py6kIdz18n/Y6vyUT4g3eKkVLt0+tJmeLCkRB5g xnz28N8WVyC0iP9JQqXhLOcUmZaStL6aNwiGC2zaRPaumUK9W1dH4RD+RLeDcoo9aUUq NgaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=nIuHCF9683iMYU2rHMsdILJuGo9/J5CkhRVd0dD+FUY=; b=KWQ6FiBeA1Hrz4qbKMWQFqSjgx2FyYN04jPPxy13WTBhielf179z+Kfnh/hNOH4Vzc h3I6FE12yLoINFX2BIbcr861H9d18/pBLXy/EwZCPsMVQR0uCC1TC6aVh9tZ12v/L36n Y4ORH+qdyLgAm9spAu3VEVQwl6k1iQ9KCMdoyXPSTLXyljP9f8mBIL7CBPlC1/rNINUV BbsNlsC7CY8LDi97HKSnEgkNsHfGfT+pSj8yIEriS86gDd35i1oM/m2Gniect7aw82tW u30hAs45zR0ekb+L6sOWtPh5om59LZejwJEAoR2GxPymFs8D1cGWEXJaSg3QWer35GhG agiQ== X-Gm-Message-State: AOUpUlHL7/564ZGlgDdBTn4Z08cyFOJv+sT4IxfgRk0rr2Rup+g6dsEK CTksNkzqLBO0to+amAJ0kVQ= X-Google-Smtp-Source: AA+uWPxKsnV0AB7zVwcs3VzSoF1sItrzmf9cXWc96Vf4LkUQxBeXVMlPOkJy7sN0s+WtDMjxdAnNsw== X-Received: by 2002:a63:2dc1:: with SMTP id t184-v6mr3153420pgt.62.1533837985008; Thu, 09 Aug 2018 11:06:25 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. 
[101.111.148.72]) by smtp.gmail.com with ESMTPSA id l85-v6sm12880248pfk.34.2018.08.09.11.06.24 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:06:24 -0700 (PDT) From: Naohiro Aota To: David Sterba , linux-btrfs@vger.kernel.org Cc: Chris Mason , Josef Bacik , linux-kernel@vger.kernel.org, Hannes Reinecke , Damien Le Moal , Bart Van Assche , Matias Bjorling , Naohiro Aota Subject: [RFC PATCH 15/17] btrfs: reset zones of unused block groups Date: Fri, 10 Aug 2018 03:04:48 +0900 Message-Id: <20180809180450.5091-16-naota@elisp.net> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180809180450.5091-1-naota@elisp.net> References: <20180809180450.5091-1-naota@elisp.net> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP For an HMZONED volume, a block group maps to a zone of the device. When an unused block group is deleted, its zone can be reset to rewind the zone write pointer to the start of the zone.
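What a zone reset does to the write pointer can be shown with a minimal userspace model. This is a sketch, not the kernel code: struct zone and zone_reset_for_stripe() are hypothetical names, and the requirement that the stripe starts exactly at the zone start is an assumption of the model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one sequential-write-required zone. */
struct zone {
	uint64_t start;	/* first byte of the zone */
	uint64_t len;	/* zone size in bytes */
	uint64_t wp;	/* write pointer: next byte that may be written */
	bool empty;	/* true once the zone holds no data */
};

/*
 * Reset the zone if the stripe covers exactly the whole zone.
 * Returns 0 on success and -1 if the stripe does not match the zone,
 * in which case the zone is left untouched.
 */
static int zone_reset_for_stripe(struct zone *z, uint64_t physical,
				 uint64_t length)
{
	if (physical != z->start || length != z->len)
		return -1;

	z->wp = z->start;	/* rewind the write pointer */
	z->empty = true;	/* the zone can be reused from its start */
	return 0;
}
```

A partial reset is not possible on such a zone, which is why the model refuses any stripe that is shorter or longer than the zone itself.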
Signed-off-by: Naohiro Aota --- fs/btrfs/extent-tree.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a5f5935315c8..26989f6fe591 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2025,6 +2025,25 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, ASSERT(btrfs_test_opt(fs_info, DEGRADED)); continue; } + + if (clear == BTRFS_CLEAR_OP_DISCARD && + btrfs_dev_is_sequential(stripe->dev, + stripe->physical) && + stripe->length == stripe->dev->zone_size) { + ret = blkdev_reset_zones(stripe->dev->bdev, + stripe->physical >> 9, + stripe->length >> 9, + GFP_NOFS); + if (!ret) + discarded_bytes += stripe->length; + else + break; + set_bit(stripe->physical >> + stripe->dev->zone_size_shift, + stripe->dev->empty_zones); + continue; + } + req_q = bdev_get_queue(stripe->dev->bdev); if (clear == BTRFS_CLEAR_OP_DISCARD && !blk_queue_discard(req_q)) @@ -10958,7 +10977,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) spin_unlock(&space_info->lock); /* DISCARD can flip during remount */ - trimming = btrfs_test_opt(fs_info, DISCARD); + trimming = btrfs_test_opt(fs_info, DISCARD) || + btrfs_fs_incompat(fs_info, HMZONED); /* Implicit trim during transaction commit.
*/ if (trimming) From patchwork Thu Aug 9 18:04:49 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 10561623 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EFA5013B4 for ; Thu, 9 Aug 2018 18:06:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D87162B6CF for ; Thu, 9 Aug 2018 18:06:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CB09D2B7A6; Thu, 9 Aug 2018 18:06:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E7C382B6CF for ; Thu, 9 Aug 2018 18:06:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727580AbeHIUc1 (ORCPT ); Thu, 9 Aug 2018 16:32:27 -0400 Received: from mail-pf1-f195.google.com ([209.85.210.195]:37896 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727407AbeHIUc0 (ORCPT ); Thu, 9 Aug 2018 16:32:26 -0400 Received: by mail-pf1-f195.google.com with SMTP id x17-v6so3214218pfh.5; Thu, 09 Aug 2018 11:06:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=ChTqGWd0FnLHcIJZTe8utKIqzM+B07AfIKDdN92DwrY=; b=Mg6qQzJl/zTwwW9qoJFFSeY0HfPhnA/cXjCEYT13aSubDrS6HfICYuV1PRpP4ZLbpN wEnF5PyCOKFOZV1iFo4kmuWMbshLdJYhdGjknG4IhDqlVOnVxCkB2mcuVBhrtVbC3TTx 
T+gIztYProfNCwxbzpIGqYLX5Nd87kPbv2o900L/MooVN8ab/ZogVSLAaRkHaXCn6EJq uMUC7qBKjtrXTXSR4kiYn44b1R9pAQ3GjagfE+2dspxIQ8dMt8H/gfT35pI3dSIApKig SlZ47XIYpnlurBIsHZIcKGuxZCu30o8iQSa+Km7kCWZ334fRcnj2uYNj7Go70pVQni2c sU8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=ChTqGWd0FnLHcIJZTe8utKIqzM+B07AfIKDdN92DwrY=; b=jReKDLq2AyLvfpbC/6zjfF7KYjC3OxIy1fjXuopZ5MvxzydEzjR/qcGlsKlFvy6+mx ghtTahcixjDMK7hQp0WWqec93tYR+KZjtg2EBikf5gELnMwdnsM43Hq/wV/UrsqqE0w+ 5fus+y6QPS37aklTilLo5JxjhbHYt7aa1tY5Zmcq1qwRP0gwlxjKlCc9edWU22Kfe9Ln +AWPkxUYa8qnMv+adXQ4g176QvQTayVAIcyp0OasbSL2kWfM5i2rvaZuZhINI527fQvq CFJbitkhjhMYhCp4Vti3AuoayEv/osxPdMAWfWxXxptaIAlB4lZsRz2JD13FAsXb07SR 14SA== X-Gm-Message-State: AOUpUlEnZee2WSW6PGg26Rb/5i7x7bG6pG4s4S8OgxrNk128C2LH8SRi 6Mlv++846x3G16hn3PHNQA0= X-Google-Smtp-Source: AA+uWPxm631VIvu5pF2Vb2hKbDADufIA+SZp49DxfsUr5/PnRh5WMyShBrD86yv8lm5YLw7V3KU6Tw== X-Received: by 2002:a63:121a:: with SMTP id h26-v6mr3210726pgl.316.1533837987318; Thu, 09 Aug 2018 11:06:27 -0700 (PDT) Received: from localhost (h101-111-148-072.catv02.itscom.jp. 
[101.111.148.72]) by smtp.gmail.com with ESMTPSA id r1-v6sm22464908pfi.17.2018.08.09.11.06.26 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Aug 2018 11:06:26 -0700 (PDT) From: Naohiro Aota To: David Sterba , linux-btrfs@vger.kernel.org Cc: Chris Mason , Josef Bacik , linux-kernel@vger.kernel.org, Hannes Reinecke , Damien Le Moal , Bart Van Assche , Matias Bjorling , Naohiro Aota Subject: [RFC PATCH 16/17] btrfs: wait existing extents before truncating Date: Fri, 10 Aug 2018 03:04:49 +0900 Message-Id: <20180809180450.5091-17-naota@elisp.net> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180809180450.5091-1-naota@elisp.net> References: <20180809180450.5091-1-naota@elisp.net> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When truncating a file, file buffers which have already been allocated but not yet written may be truncated. Truncating these buffers could cause breakage of a sequential write pattern in a block group if the truncated blocks are for example followed by blocks allocated to another file. To avoid this problem, always wait for write out of all unwritten buffers before proceeding with the truncate execution. 
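The range to wait on starts at the sector that contains the new file size and extends to the end of the file. A minimal sketch of that rounding, assuming a power-of-two sector size (wait_range_start() is a hypothetical helper name used only for illustration):

```c
#include <stdint.h>

/*
 * Round newsize down to the start of the sector that contains it.
 * sectorsize must be a power of two, so that sectorsize - 1 is a mask
 * of the in-sector offset bits.
 */
static uint64_t wait_range_start(uint64_t newsize, uint32_t sectorsize)
{
	uint64_t sectormask = (uint64_t)sectorsize - 1;

	return newsize & ~sectormask;
}
```

Rounding down rather than up makes the wait also cover the sector that is only partially truncated.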
Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 05f5e05ccf37..d3f35f81834f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5193,6 +5193,17 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr) btrfs_end_write_no_snapshotting(root); btrfs_end_transaction(trans); } else { + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + + if (btrfs_fs_incompat(fs_info, HMZONED)) { + u64 sectormask = fs_info->sectorsize - 1; + + ret = btrfs_wait_ordered_range(inode, + newsize & (~sectormask), + (u64)-1); + if (ret) + return ret; + } /* * We're truncating a file that used to have good data down to From patchwork Thu Aug 9 18:04:50 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 10561621 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D6EB515A6 for ; Thu, 9 Aug 2018 18:06:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C22722B6CF for ; Thu, 9 Aug 2018 18:06:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B5B1D2B7B6; Thu, 9 Aug 2018 18:06:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 626F02B6CF for ; Thu, 9 Aug 2018 18:06:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727614AbeHIUc3 (ORCPT ); Thu, 9 Aug 2018 16:32:29 -0400 
Received: from mail-pf1-f193.google.com ([209.85.210.193]:36298 "EHLO
	mail-pf1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1727407AbeHIUc2 (ORCPT ); Thu, 9 Aug 2018 16:32:28 -0400
Received: by mail-pf1-f193.google.com with SMTP id b11-v6so3218029pfo.3;
	Thu, 09 Aug 2018 11:06:30 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gmail.com; s=20161025;
	h=sender:from:to:cc:subject:date:message-id:in-reply-to:references;
	bh=Wfgd7eS27CHVYb6s8wAx2iBTeRKXaiiHZGYV1m76CIY=;
	b=izS857VLce4+vKf6pqyvWHxsT8Bgz1Ctmlrp8CpmNt2rkmztvLECkl4cd2p3+KSs7o
	 Nb/QtcoPrjpBe2Y845qzvrez62xM9cjEfZR9w32QjxwsXQ3Oi/anXQUus1ihRzDH00ZA
	 Qehml7FvK9ujWmtqZ4qvdCrDDr486WE3/3fRAKOwE+hg8zkpFHVbFYwoaUZU5EV+dL7o
	 b138nnK4WsBeQJPGVCG1xgFX8TNLFIoF7BVob30PiUao0xnEqUxYNbu6d5mWq9s2CVIl
	 QU87rUOF6AEkf3Cn5hyAsAY76VvYdLZQNts4sGFZVIzfjt32AEk2hWA+56njtC0t39SR
	 uisQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:sender:from:to:cc:subject:date:message-id
	 :in-reply-to:references;
	bh=Wfgd7eS27CHVYb6s8wAx2iBTeRKXaiiHZGYV1m76CIY=;
	b=mmafWhlQNukbVf6JnWx9OXEdE1XmTVLjSb3vtNafLE8aBcjX9PeQWvEu74Kx60J4eY
	 hYl6CaEBwdhZYoOIv0hFTvHGPXgzZ/2MbwlZEtmofTicpafxJBHfo0SlkJ4wO+U2/gpm
	 9QrnAfyVfPu4uFZP+ygltbOpycKG6Lw9mSY0hP2CEpu2plDa6uB9yPFACSuSr7Viuyxr
	 NFyv9pH36dEZrNXZ9DcnrfazXt0ukHla6Uiz0MrlZdh2U5DXXUvbnFduo356SX9z3KFV
	 +qjkOV0Vb9LGuQYpMcT5HSshfVKnqhNrUCXO4BG9ZsLWntOw6yg5t/mvRmsgq0w9vUEr
	 d2KA==
X-Gm-Message-State: AOUpUlHqOV4xp1/rEf56RkWici6Dp5DOaml7Bk92YT1EVOjRhiJKPDIt
	SdHWrsa/uT840lN/s2QMFJs=
X-Google-Smtp-Source: 
 AA+uWPwrVxd4tn6bfhVEl1tX8DwkGv9HOAYehio+lN69HdFII4YgZNqD6anJMdnwLrXY9bW//25phw==
X-Received: by 2002:a63:951e:: with SMTP id
 p30-v6mr3131052pgd.318.1533837989628;
	Thu, 09 Aug 2018 11:06:29 -0700 (PDT)
Received: from localhost (h101-111-148-072.catv02.itscom.jp.
	[101.111.148.72]) by smtp.gmail.com with ESMTPSA id
	o21-v6sm12661852pfa.54.2018.08.09.11.06.28
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Thu, 09 Aug 2018 11:06:28 -0700 (PDT)
From: Naohiro Aota
To: David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org,
	Hannes Reinecke, Damien Le Moal, Bart Van Assche, Matias Bjorling,
	Naohiro Aota
Subject: [RFC PATCH 17/17] btrfs: enable to mount HMZONED incompat flag
Date: Fri, 10 Aug 2018 03:04:50 +0900
Message-Id: <20180809180450.5091-18-naota@elisp.net>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
References: <20180809180450.5091-1-naota@elisp.net>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

This final patch adds the HMZONED incompat flag to
BTRFS_FEATURE_INCOMPAT_SUPP, which enables btrfs to mount a file system
that has the HMZONED flag set.

Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8f85c96cd262..46a243b2f111 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -271,7 +271,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
-	 BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\
+	 BTRFS_FEATURE_INCOMPAT_HMZONED)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
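
For readers unfamiliar with why a one-line change to the supported mask is
enough to allow the mount, here is a minimal user-space sketch of the usual
incompat-flag gate: any flag set on disk that is missing from the supported
mask makes the mount fail. This is an illustration only, not the kernel code.
The FEATURE_* bit values and the helpers unsupported_incompat() and
check_incompat() are hypothetical names made up for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit values, for illustration only (not the kernel's). */
#define FEATURE_INCOMPAT_NO_HOLES	(1ULL << 0)
#define FEATURE_INCOMPAT_HMZONED	(1ULL << 1)

/* Supported mask before and after a change like the one in this patch. */
#define FEATURE_INCOMPAT_SUPP_OLD	(FEATURE_INCOMPAT_NO_HOLES)
#define FEATURE_INCOMPAT_SUPP_NEW	(FEATURE_INCOMPAT_NO_HOLES |	\
					 FEATURE_INCOMPAT_HMZONED)

/* Return the on-disk incompat bits that the supported mask does not cover. */
static uint64_t unsupported_incompat(uint64_t disk_flags, uint64_t supp_mask)
{
	return disk_flags & ~supp_mask;
}

/* Return 0 if the mount may proceed, -1 if an unknown incompat bit is set. */
static int check_incompat(uint64_t disk_flags, uint64_t supp_mask)
{
	uint64_t unknown = unsupported_incompat(disk_flags, supp_mask);

	if (unknown) {
		fprintf(stderr,
			"cannot mount: unsupported incompat features 0x%llx\n",
			(unsigned long long)unknown);
		return -1;
	}
	return 0;
}
```

With these definitions, check_incompat(FEATURE_INCOMPAT_HMZONED,
FEATURE_INCOMPAT_SUPP_OLD) is rejected, while the same flags checked against
FEATURE_INCOMPAT_SUPP_NEW are accepted. Keeping the flag out of the supported
mask until the last patch of a series means a partially applied series never
mounts a file system it cannot fully handle.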