From patchwork Sat Feb 9 05:24:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10804131 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C9EF5922 for ; Sat, 9 Feb 2019 05:26:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BB7AA2DD01 for ; Sat, 9 Feb 2019 05:26:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AF5242DD0F; Sat, 9 Feb 2019 05:26:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4AF192DD01 for ; Sat, 9 Feb 2019 05:26:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726118AbfBIFYo (ORCPT ); Sat, 9 Feb 2019 00:24:44 -0500 Received: from mx2.suse.de ([195.135.220.15]:60236 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725919AbfBIFYo (ORCPT ); Sat, 9 Feb 2019 00:24:44 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 7B1D0AF5B; Sat, 9 Feb 2019 05:24:43 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Nikolay Borisov , Anand Jain Subject: [PATCH v2 1/2] btrfs: Don't search devid for every verify_one_dev_extent() call Date: Sat, 9 Feb 2019 13:24:36 +0800 Message-Id: <20190209052437.24020-2-wqu@suse.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190209052437.24020-1-wqu@suse.com> References: <20190209052437.24020-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP verify_one_dev_extent() will call btrfs_find_device() for each dev extent, this waste some CPU time just searching the devices list. Move the search one level up, into the btrfs_verify_dev_extents(), so for each device we only call btrfs_find_device() once. Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov Reviewed-by: Anand Jain --- fs/btrfs/volumes.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 03f223aa7194..bae03111273e 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7772,13 +7772,14 @@ static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes) } static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, - u64 chunk_offset, u64 devid, - u64 physical_offset, u64 physical_len) + struct btrfs_device *dev, + u64 chunk_offset, u64 physical_offset, + u64 physical_len) { struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree; struct extent_map *em; struct map_lookup *map; - struct btrfs_device *dev; + u64 devid = dev->devid; u64 stripe_len; bool found = false; int ret = 0; @@ -7830,15 +7831,8 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, } /* Make sure no dev extent is beyond device bondary */ - dev = btrfs_find_device(fs_info->fs_devices, devid, NULL, NULL, true); - if (!dev) { - btrfs_err(fs_info, "failed to find devid %llu", devid); - ret = -EUCLEAN; - goto out; - } - - /* It's possible this device is a dummy for seed device */ if (dev->disk_total_bytes == 0) { + /* This device is a dummy for seed device */ dev = btrfs_find_device(fs_info->fs_devices->seed, devid, NULL, NULL, false); if (!dev) { @@ -7898,6 +7892,7 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info) { struct btrfs_path *path; struct btrfs_root *root = fs_info->dev_root; + struct btrfs_device *device = NULL; struct btrfs_key key; u64 prev_devid = 0; u64 prev_dev_ext_end = 0; @@ -7941,6 +7936,17 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info) devid = key.objectid; physical_offset = key.offset; + if (!device || devid != device->devid) { + device = btrfs_find_device(fs_info->fs_devices, devid, + NULL, NULL, true); + if (!device) { + btrfs_err(fs_info, "failed to find devid %llu", + devid); + ret = -EUCLEAN; + goto out; + } + } + dext = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent); chunk_offset = btrfs_dev_extent_chunk_offset(leaf, dext); physical_len = btrfs_dev_extent_length(leaf, dext); @@ -7954,7 +7960,7 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info) goto out; } - ret = verify_one_dev_extent(fs_info, chunk_offset, devid, + ret = verify_one_dev_extent(fs_info, device, chunk_offset, physical_offset, physical_len); if (ret < 0) goto out; From patchwork Sat Feb 9 05:24:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10804135 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3152117FB for ; Sat, 9 Feb 2019 05:26:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 211EB2DCFE for ; Sat, 9 Feb 2019 05:26:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1318A2DD20; Sat, 9 Feb 2019 05:26:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6B2342DD0A for ; Sat, 9 Feb 2019 05:26:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726268AbfBIFYr (ORCPT ); Sat, 9 Feb 2019 00:24:47 -0500 Received: from mx2.suse.de ([195.135.220.15]:60246 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725919AbfBIFYq (ORCPT ); Sat, 9 Feb 2019 00:24:46 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 575A7AFC0 for ; Sat, 9 Feb 2019 05:24:45 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 2/2] btrfs: Introduce free dev extent hint to speed up chunk allocation Date: Sat, 9 Feb 2019 13:24:37 +0800 Message-Id: <20190209052437.24020-3-wqu@suse.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190209052437.24020-1-wqu@suse.com> References: <20190209052437.24020-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP [PROBLEM] When allocating a 4G file, falloc() call is slower and slower if the fs is more and more filled. This is going to be super obvious when doing some crazy test. E.g. fallocating a 1PiB file TiB by TiB. The first 16T only takes 10 seconds to finish while the last 16TiB can take over 15min to finish, making the total time of fallocating 1PiB to astonishing 5 hours. [CAUSE] When allocating new dev extents for a chunk, btrfs doesn't have any free dev extent cache, thus it can only search device tree to find free slot. However it always search from device offset 0, and if there are a lot of dev extents already allocated, such search will be super time consuming. The following is function execution time for __btrfs_alloc_chunk() and find_free_dev_extent_start(), one is allocating 4G on an empty fs, the other one is allocating 4G on a 4T used fs. empty fs | 4T used fs | function ----------------------------------------------------------------------- 4) | 7) | __btrfs_alloc_chunk [btrfs]() { 4) 4.839 us | 7) ! 152.496 us | find_free_dev_extent_start [btrfs]() 4) + 64.692 us | 7) ! 185.488 us | } 4) | 7) | __btrfs_alloc_chunk [btrfs]() { 4) + 12.844 us | 7) ! 132.889 us | find_free_dev_extent_start [btrfs]() 4) ! 131.877 us | 7) ! 152.115 us | } 4) | 7) | __btrfs_alloc_chunk [btrfs]() { 4) 2.375 us | 7) ! 127.689 us | find_free_dev_extent_start [btrfs]() 4) + 16.992 us | 7) ! 146.595 us | } 4) | 7) | __btrfs_alloc_chunk [btrfs]() { 4) 2.144 us | 7) ! 126.657 us | find_free_dev_extent_start [btrfs]() 4) + 16.280 us | 7) ! 144.521 us | } It's pretty ovbious that find_free_free_dev_extent_start() takes over 20x more time for 4TB used fs, causing chunk allocation much slower. [ENHANCEMENT] This patch will introduce btrfs_device::hint_free_dev_extent member to give some hint for chunk allocator to find free dev extents. The hint itself is pretty simple, only tells where the first free slot could possibly be. It is not 100% correct, unlike free space cache, but since find_free_dev_extent_start() is already robust enough to handle search_hint, so there is not need to introduce a complex and fancy free dev extent cache. With this patch, allocating 4G on a 4T filled fs will be way more faster: v5.0-rc1 | patched | function --------------------------------------------------------------------- 7) | 7) | __btrfs_alloc_chunk [btrfs]() { 7) ! 152.496 us | 7) 7.885 us | find_free_dev_extent_start [btrfs](); 7) ! 185.488 us | 7) + 36.649 us | } 7) | 7) | __btrfs_alloc_chunk [btrfs]() { 7) ! 132.889 us | 7) 2.454 us | find_free_dev_extent_start [btrfs](); 7) ! 152.115 us | 7) + 24.145 us | } 7) | 7) | __btrfs_alloc_chunk [btrfs]() { 7) ! 127.689 us | 7) 2.245 us | find_free_dev_extent_start [btrfs](); 7) ! 146.595 us | 7) + 19.376 us | } 7) | 7) | __btrfs_alloc_chunk [btrfs]() { 7) ! 126.657 us | 7) 2.174 us | find_free_dev_extent_start [btrfs](); 7) ! 144.521 us | 7) + 16.321 us | } And for the crazy 1PiB fallocate, now it takes 15~20min other than 5 hours. Signed-off-by: Qu Wenruo --- fs/btrfs/volumes.c | 28 +++++++++++++++++++--- fs/btrfs/volumes.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 83 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index bae03111273e..a225101582c1 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -411,6 +411,7 @@ static struct btrfs_device *__alloc_device(void) btrfs_device_data_ordered_init(dev); INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_DIRECT_RECLAIM); INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_DIRECT_RECLAIM); + dev->hint_free_dev_extent = (u64)-1; return dev; } @@ -1746,9 +1747,14 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans, struct btrfs_device *device, u64 num_bytes, u64 *start, u64 *len) { - /* FIXME use last free of some kind */ - return find_free_dev_extent_start(trans->transaction, device, - num_bytes, 0, start, len); + u64 search_hint; + + if (device->hint_free_dev_extent == (u64)-1) + search_hint = 0; + else + search_hint = device->hint_free_dev_extent; + return find_free_dev_extent_start(trans->transaction, device, num_bytes, + search_hint, start, len); } static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans, @@ -1804,6 +1810,7 @@ static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans, "Failed to remove dev extent item"); } else { set_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags); + btrfs_device_hint_add_free(device, key.offset, *dev_extent_len); } out: btrfs_free_path(path); @@ -1846,6 +1853,7 @@ static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans, btrfs_set_dev_extent_chunk_offset(leaf, extent, chunk_offset); btrfs_set_dev_extent_length(leaf, extent, num_bytes); + btrfs_device_hint_del_free(device, key.offset, num_bytes); btrfs_mark_buffer_dirty(leaf); out: btrfs_free_path(path); @@ -7936,6 +7944,14 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info) devid = key.objectid; physical_offset = key.offset; + /* + * previous device verification is done, update its free dev + * extent hint + */ + if (device && devid != device->devid) + btrfs_device_hint_add_free(device, prev_dev_ext_end, + device->disk_total_bytes - prev_dev_ext_end); + if (!device || devid != device->devid) { device = btrfs_find_device(fs_info->fs_devices, devid, NULL, NULL, true); @@ -7964,6 +7980,10 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info) physical_offset, physical_len); if (ret < 0) goto out; + + btrfs_device_hint_add_free(device, prev_dev_ext_end, + physical_offset - prev_dev_ext_end); + prev_devid = devid; prev_dev_ext_end = physical_offset + physical_len; @@ -7975,6 +7995,8 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info) break; } } + btrfs_device_hint_add_free(device, prev_dev_ext_end, + device->disk_total_bytes - prev_dev_ext_end); /* Ensure all chunks have corresponding dev extents */ ret = verify_chunk_dev_extent_mapping(fs_info); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 3ad9d58d1b66..31f1e5f8d23a 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -108,6 +108,14 @@ struct btrfs_device { /* bytes used on the current transaction */ u64 commit_bytes_used; + + /* + * hint about where the first possible free dev extent is. + * + * u64(-1) means no hint. + */ + u64 hint_free_dev_extent; + /* * used to manage the device which is resized * @@ -570,4 +578,54 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info, int btrfs_bg_type_to_factor(u64 flags); int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info); +static inline void btrfs_device_hint_add_free(struct btrfs_device *dev, + u64 start, u64 len) +{ + if (dev->disk_total_bytes == 0 || start + len > dev->disk_total_bytes) + return; + if (len < SZ_16M) + return; + if (start > dev->hint_free_dev_extent) + return; + dev->hint_free_dev_extent = start; +} + +static inline void btrfs_device_hint_del_free(struct btrfs_device *dev, + u64 start, u64 len) +{ + u64 free_hint = dev->hint_free_dev_extent; + + if (dev->disk_total_bytes == 0 || start + len > dev->disk_total_bytes) + return; + /* + * |<- to be removed ->| + * | free hint + * Not affecting free hint + */ + if (start + len <= free_hint) + return; + /* + * |<- to be removed ->| + * | free hint + * Or + * |<- to be removed ->| + * | free hint + * |<-->| Less than 16M + * + * Move the hint to the range end + */ + if ((start <= free_hint && start + len > free_hint) || + (start > free_hint && free_hint - start < SZ_16M)) { + dev->hint_free_dev_extent = start + len; + return; + } + + /* + * |<- to be removed ->| + * | free hint + * + * We still have larger than 16M free space, no need to update + * free hint + */ +} #endif