From patchwork Wed Feb 15 20:59:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13142186 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 832F2C64ED9 for ; Wed, 15 Feb 2023 21:02:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230013AbjBOVCW (ORCPT ); Wed, 15 Feb 2023 16:02:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59112 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229964AbjBOVBw (ORCPT ); Wed, 15 Feb 2023 16:01:52 -0500 Received: from wout4-smtp.messagingengine.com (wout4-smtp.messagingengine.com [64.147.123.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 988B8234F9 for ; Wed, 15 Feb 2023 12:59:56 -0800 (PST) Received: from compute6.internal (compute6.nyi.internal [10.202.2.47]) by mailout.west.internal (Postfix) with ESMTP id E1B3A320098F; Wed, 15 Feb 2023 15:59:55 -0500 (EST) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Wed, 15 Feb 2023 15:59:56 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to; s=fm1; t=1676494795; x=1676581195; bh=kB t99LsxkdDNHk35DIiv7BtevojIx5IKtFuJHxTgYuw=; b=b2ebEzdrzK3MjN8lE8 M5oBRrPHToOUTrcSS7vZS3MJeeR1hTWw5Cx8+PdAh/pme+k1wOyDTTuxvZEX2PNZ dCdcw/UvChIPrk7QYSOJZONAfy5Gp27nrnDwze7U8KMJ+JYuOtgL31Ii9pesh8qM 82bd/7brZSdrxBVc1437SbzSw1oMr8viX44AAjK3wQZnX9n9xqnBftlrcB2lMA2j dnLu96cDaG4LMWJRDTosEvwXZv5j6LC6pUa9CuFX6EATzJt4DeQKqyznOEh8htK2 x2LgDGlDZJT9tCBCfUhVWfHBcseu8Y0Nqqx8SMm6FznGZYiwy3Q2l3QAv95aKeG5 U4ZA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm1; t=1676494795; x=1676581195; bh=kBt99LsxkdDNH k35DIiv7BtevojIx5IKtFuJHxTgYuw=; b=H6TRuWQU8pTfI265y36Yp4f5AY931 57zaMx5dH3POO50s2qIQ22HWqpan1wqWyv1kHtVJbIVF1DhN4EwJenntHLGr/IUb vTsETCo56Ooert7udsxdDb84Gfk7n/68bv9JryIxeTrYDY1Fc+cLZ8B3doWQ62pw FhHNy2YZC3rxWm5cIqJwQ6bV/1+o7z/KhNFQIen/trqIKuLQS/EQWRxNW9vvP7Z3 XKYXusJ0btrV0ECS3u30lLOdr50PVEEbwKH81UWHFbSdlyiKwz8JSptNkwlIqrjv aTShpZIiu8UJf+8yu7+QOl/rIWKEkP+Fyvk4hxG0TWS+iW1A1Ky0T60cA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrudeihedgudefjecutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtke ertdertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhr rdhioheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejff evvdehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghm pehmrghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 15 Feb 2023 15:59:54 -0500 (EST) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 1/2] btrfs: add size class stats to sysfs Date: Wed, 15 Feb 2023 12:59:49 -0800 Message-Id: <2fff5aa06edf1387677bdb1329359295c6d38506.1676494550.git.boris@bur.io> X-Mailer: git-send-email 2.38.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Make it possible to see the distribution of size classes for block groups. Helpful for testing and debugging the allocator w.r.t. to size classes. The new stats can be found at the path: /sys/fs/btrfs//allocation//size_class but they will only be non-zero for bg-type = data. Signed-off-by: Boris Burkov --- fs/btrfs/sysfs.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 8c5efa5813b3..4926cab2f507 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include "messages.h" #include "ctree.h" @@ -778,6 +779,47 @@ static ssize_t btrfs_chunk_size_store(struct kobject *kobj, return len; } +static ssize_t btrfs_size_classes_show(struct kobject *kobj, + struct kobj_attribute *a, char *buf) +{ + struct btrfs_space_info *sinfo = to_space_info(kobj); + struct btrfs_block_group *bg; + u32 none = 0; + u32 small = 0; + u32 medium = 0; + u32 large = 0; + + down_read(&sinfo->groups_sem); + for (int i = 0; i < BTRFS_NR_RAID_TYPES; ++i) { + list_for_each_entry(bg, &sinfo->block_groups[i], list) { + if (!btrfs_block_group_should_use_size_class(bg)) + continue; + switch (bg->size_class) { + case BTRFS_BG_SZ_NONE: + none++; + break; + case BTRFS_BG_SZ_SMALL: + small++; + break; + case BTRFS_BG_SZ_MEDIUM: + medium++; + break; + case BTRFS_BG_SZ_LARGE: + large++; + break; + } + } + if (rwsem_is_contended(&sinfo->groups_sem)) { + up_read(&sinfo->groups_sem); + cond_resched(); + down_read(&sinfo->groups_sem); + } + } + up_read(&sinfo->groups_sem); + return sysfs_emit(buf, "none %u\nsmall %u\nmedium %u\nlarge %u\n", + none, small, medium, large); +} + #ifdef CONFIG_BTRFS_DEBUG /* * Request chunk allocation with current chunk size. @@ -835,6 +877,7 @@ SPACE_INFO_ATTR(bytes_zone_unusable); SPACE_INFO_ATTR(disk_used); SPACE_INFO_ATTR(disk_total); BTRFS_ATTR_RW(space_info, chunk_size, btrfs_chunk_size_show, btrfs_chunk_size_store); +BTRFS_ATTR(space_info, size_classes, btrfs_size_classes_show); static ssize_t btrfs_sinfo_bg_reclaim_threshold_show(struct kobject *kobj, struct kobj_attribute *a, @@ -887,6 +930,7 @@ static struct attribute *space_info_attrs[] = { BTRFS_ATTR_PTR(space_info, disk_total), BTRFS_ATTR_PTR(space_info, bg_reclaim_threshold), BTRFS_ATTR_PTR(space_info, chunk_size), + BTRFS_ATTR_PTR(space_info, size_classes), #ifdef CONFIG_BTRFS_DEBUG BTRFS_ATTR_PTR(space_info, force_chunk_alloc), #endif From patchwork Wed Feb 15 20:59:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13142187 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5A5BC636CC for ; Wed, 15 Feb 2023 21:02:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230052AbjBOVCY (ORCPT ); Wed, 15 Feb 2023 16:02:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59256 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230010AbjBOVB4 (ORCPT ); Wed, 15 Feb 2023 16:01:56 -0500 Received: from wout4-smtp.messagingengine.com (wout4-smtp.messagingengine.com [64.147.123.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 495C5457E2 for ; Wed, 15 Feb 2023 12:59:59 -0800 (PST) Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.west.internal (Postfix) with ESMTP id 8844D320096F; Wed, 15 Feb 2023 15:59:58 -0500 (EST) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Wed, 15 Feb 2023 15:59:58 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to; s=fm1; t=1676494798; x=1676581198; bh=2t O+FB5ytAz8pksUZ+oOfDha2s2wAy6utpIBnygJwr0=; b=CqS/tR3gf9SF/McTcT Pm7uWGfigFVeVjH52lxZ9UKzobV8IxBVopwfx68mE4r5nIvoht9flRUiVWIA0SJb d8VEQ001jSvd+0dF3jjChAz95eIkIUSP+aPxsAfkLNunVZg1oVLdaZsB0fmmJDra SKCuABAB6iAk3rI+cp8VJIYzDcXjysdyAFAZ4oTo9/3Ghs5WEoB8T2FJ8yoCXNVh FZ1Hyz/fPF9utLRdDEZ13il8jU1rAek+ju1cGpueMDqQxXUIK30S+L/3WjSDX1Jg ON54ZDew/CR4qute+cNsRT2+OJcWQ6hCJKYb9Qx8pTPLE8XiHQWXTzgmg1tHImU2 ma9g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm1; t=1676494798; x=1676581198; bh=2tO+FB5ytAz8p ksUZ+oOfDha2s2wAy6utpIBnygJwr0=; b=nFP+58zGWH1ce0hml7hK5anQ5Bzd2 0v9KrLN6XLPt/nu7J7yL9gGPCLAJ31wdk5pBiDzRrgOiCBl4AojiA8pfcgugs0Hc ti2ko7UXOTsY6o2zCiqxHGKU0DqtQqQFEDI/5yasM24zdRKbodWh65Df1wBTZryz E2CCAwKWBXdCBFILH/wVdBmIfBIly89Ki+G+mIHNWF2yvJITLnO3izpvaLm4JWc3 y3wkZ1YuaPXowuQvsPw99M8kZI0Roo0hy6PU0RgLQjd1opG5ITZ0JQMC320l4nrZ ADAgv1gpvxqokRM2lWsi1Y/HQkrrHNQqNDtNyt6mO5S5y0mSs64GJN2tg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrudeihedgudefjecutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtke ertdertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhr rdhioheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejff evvdehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghm pehmrghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 15 Feb 2023 15:59:57 -0500 (EST) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v3 2/2] btrfs: fix size class loading logic Date: Wed, 15 Feb 2023 12:59:50 -0800 Message-Id: <19e54cb085684ba1825709ba0b4e3040950895e6.1676494550.git.boris@bur.io> X-Mailer: git-send-email 2.38.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This is another incremental patch fixing bugs in: btrfs: load block group size class when caching The use of search_forward was incorrect, and I have replaced it with btrfs_for_each_slot. Since we only consider five samples (five search slots), don't bother with the complexity of looking for commit_root_sem contention. If necessary, it can be added to the load function in between samples. The mistake was Reported-by: Filipe Manana The commit message should be: btrfs: load block group size class when caching Since the size class is an artifact of an arbitrary anti fragmentation strategy, it doesn't really make sense to persist it. Furthermore, most of the size class logic assumes fresh block groups. That is of course not a reasonable assumption -- we will be upgrading kernels with existing filesystems whose block groups are not classified. To work around those issues, implement logic to compute the size class of the block groups as we cache them in. To perfectly assess the state of a block group, we would have to read the entire extent tree (since the free space cache mashes together contiguous extent items) which would be prohibitively expensive for larger file systems with more extents. We can do it relatively cheaply by implementing a simple heuristic of sampling a handful of extents and picking the smallest one we see. In the happy case where the block group was classified, we will only see extents of the correct size. In the unhappy case, we will hopefully find one of the smaller extents, but there is no perfect answer anyway. Autorelocation will eventually churn up the block group if there is significant freeing anyway. There was no regression in mount performance at end state of the fsperf test suite, and the delay until the block group is marked cached is minimized by the constant number of extent samples. Signed-off-by: Boris Burkov --- fs/btrfs/block-group.c | 42 ++++++++++++++++-------------------------- 1 file changed, 16 insertions(+), 26 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 5b10401d803b..05102a55710c 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -558,14 +558,15 @@ u64 add_new_free_space(struct btrfs_block_group *block_group, u64 start, u64 end static int sample_block_group_extent_item(struct btrfs_caching_control *caching_ctl, struct btrfs_block_group *block_group, int index, int max_index, - struct btrfs_key *key) + struct btrfs_key *found_key) { struct btrfs_fs_info *fs_info = block_group->fs_info; struct btrfs_root *extent_root; - int ret = 0; u64 search_offset; u64 search_end = block_group->start + block_group->length; struct btrfs_path *path; + struct btrfs_key search_key; + int ret = 0; ASSERT(index >= 0); ASSERT(index <= max_index); @@ -585,37 +586,24 @@ static int sample_block_group_extent_item(struct btrfs_caching_control *caching_ path->reada = READA_FORWARD; search_offset = index * div_u64(block_group->length, max_index); - key->objectid = block_group->start + search_offset; - key->type = BTRFS_EXTENT_ITEM_KEY; - key->offset = 0; + search_key.objectid = block_group->start + search_offset; + search_key.type = BTRFS_EXTENT_ITEM_KEY; + search_key.offset = 0; - while (1) { - ret = btrfs_search_forward(extent_root, key, path, 0); - if (ret != 0) - goto out; + btrfs_for_each_slot(extent_root, &search_key, found_key, path, ret) { /* Success; sampled an extent item in the block group */ - if (key->type == BTRFS_EXTENT_ITEM_KEY && - key->objectid >= block_group->start && - key->objectid + key->offset <= search_end) - goto out; + if (found_key->type == BTRFS_EXTENT_ITEM_KEY && + found_key->objectid >= block_group->start && + found_key->objectid + found_key->offset <= search_end) + break; /* We can't possibly find a valid extent item anymore */ - if (key->objectid >= search_end) { + if (found_key->objectid >= search_end) { ret = 1; break; } - if (key->type < BTRFS_EXTENT_ITEM_KEY) - key->type = BTRFS_EXTENT_ITEM_KEY; - else - key->objectid++; - btrfs_release_path(path); - up_read(&fs_info->commit_root_sem); - mutex_unlock(&caching_ctl->mutex); - cond_resched(); - mutex_lock(&caching_ctl->mutex); - down_read(&fs_info->commit_root_sem); } -out: + lockdep_assert_held(&caching_ctl->mutex); lockdep_assert_held_read(&fs_info->commit_root_sem); btrfs_free_path(path); @@ -659,6 +647,7 @@ static int sample_block_group_extent_item(struct btrfs_caching_control *caching_ static int load_block_group_size_class(struct btrfs_caching_control *caching_ctl, struct btrfs_block_group *block_group) { + struct btrfs_fs_info *fs_info = block_group->fs_info; struct btrfs_key key; int i; u64 min_size = block_group->length; @@ -668,6 +657,8 @@ static int load_block_group_size_class(struct btrfs_caching_control *caching_ctl if (!btrfs_block_group_should_use_size_class(block_group)) return 0; + lockdep_assert_held(&caching_ctl->mutex); + lockdep_assert_held_read(&fs_info->commit_root_sem); for (i = 0; i < 5; ++i) { ret = sample_block_group_extent_item(caching_ctl, block_group, i, 5, &key); if (ret < 0) @@ -682,7 +673,6 @@ static int load_block_group_size_class(struct btrfs_caching_control *caching_ctl block_group->size_class = size_class; spin_unlock(&block_group->lock); } - out: return ret; }