From patchwork Thu Oct 13 22:52:09 2022
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13006543
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com, Filipe Manana
Subject: [PATCH v2 1/2] btrfs: skip reclaim if block_group is empty
Date: Thu, 13 Oct 2022 15:52:09 -0700
Message-Id: <977bdffbf57cca3ee6541efa1563167d4d282b08.1665701210.git.boris@bur.io>
X-Mailing-List: linux-btrfs@vger.kernel.org

As we delete extents from a block group, at some deletion we cross below
the reclaim threshold.
It is possible we are still in the middle of deleting more extents and
might soon hit 0. If the block group is empty by the time the reclaim
worker runs, we will still relocate it. This works just fine, as
relocating an empty block group ultimately results in properly deleting
it. However, we have more direct ways of removing empty block groups in
the cleaner thread. Those are either async discard or the unused_bgs
list. In fact, when we decide whether to relocate a block group during
extent deletion, we do check for emptiness and prefer the
discard/unused_bgs mechanisms when possible.

Not using relocation for this case reduces some modest overhead from
empty bg relocation:
- extra transactions
- extra metadata use/churn for creating relocation metadata
- trying to read the extent tree to look for extents (and in this case
  finding none)

Signed-off-by: Boris Burkov
---
 fs/btrfs/block-group.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 3f8b1cbbbc43..684401aa014a 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1606,6 +1606,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			up_write(&space_info->groups_sem);
 			goto next;
 		}
+		if (bg->used == 0) {
+			/*
+			 * It is possible that we trigger relocation on a block
+			 * group as its extents are deleted and it first goes
+			 * below the threshold, then shortly after goes empty.
+			 *
+			 * In this case, relocating it does delete it, but has
+			 * some overhead in relocation specific metadata, looking
+			 * for the non-existent extents and running some extra
+			 * transactions, which we can avoid by using one of the
+			 * other mechanisms for dealing with empty block groups.
+			 */
+			if (!btrfs_test_opt(fs_info, DISCARD_ASYNC))
+				btrfs_mark_bg_unused(bg);
+			spin_unlock(&bg->lock);
+			up_write(&space_info->groups_sem);
+			goto next;
+		}
 		spin_unlock(&bg->lock);
 
 		/* Get out fast, in case we're unmounting the filesystem */

From patchwork Thu Oct 13 22:52:10 2022
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13006545
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com, Filipe Manana
Subject: [PATCH v2 2/2] btrfs: re-check reclaim condition in reclaim worker
Date: Thu, 13 Oct 2022 15:52:10 -0700
Message-Id: <5f8c37f6ebc9024ef4351ae895f3e5fdb9c67baf.1665701210.git.boris@bur.io>
X-Mailing-List: linux-btrfs@vger.kernel.org

I have observed the following case play out and lead to unnecessary
relocations:
1. write a file across multiple block groups
2. delete the file
3. several block groups fall below the reclaim threshold
4. reclaim the first, moving extents into the others
5. reclaim the others, which are now actually very full, leading to
   poor reclaim behavior with lots of writing, allocating new block
   groups, etc.

To avoid this, re-check the reclaim condition in the reclaim worker
before relocating a block group, and skip block groups that no longer
meet it.

I believe the risk of missing some reasonable reclaims is worth it when
traded off against the savings of avoiding overfull reclaims.

Going forward, it could be interesting to make the check more advanced
(zoned aware, fragmentation aware, etc...) so that it can be a really
strong signal both at extent delete and reclaim time.

Signed-off-by: Boris Burkov
---
 fs/btrfs/block-group.c | 65 ++++++++++++++++++++++++++----------------
 1 file changed, 40 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 684401aa014a..b3e9b1bc566e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1539,6 +1539,30 @@ static inline bool btrfs_should_reclaim(struct btrfs_fs_info *fs_info)
 	return true;
 }
 
+static inline bool should_reclaim_block_group(struct btrfs_block_group *bg,
+					      u64 bytes_freed)
+{
+	const struct btrfs_space_info *space_info = bg->space_info;
+	const int reclaim_thresh = READ_ONCE(space_info->bg_reclaim_threshold);
+	const u64 new_val = bg->used;
+	const u64 old_val = new_val + bytes_freed;
+	u64 thresh;
+
+	if (reclaim_thresh == 0)
+		return false;
+
+	thresh = div_factor_fine(bg->length, reclaim_thresh);
+
+	/*
+	 * If we were below the threshold before don't reclaim, we are likely a
+	 * brand new block group and we don't want to relocate new block groups.
+	 */
+	if (old_val < thresh)
+		return false;
+	if (new_val >= thresh)
+		return false;
+	return true;
+}
+
 void btrfs_reclaim_bgs_work(struct work_struct *work)
 {
 	struct btrfs_fs_info *fs_info =
@@ -1623,6 +1647,22 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			spin_unlock(&bg->lock);
 			up_write(&space_info->groups_sem);
 			goto next;
+
 		}
+		/*
+		 * The block group might no longer meet the reclaim condition by
+		 * the time we get around to reclaiming it, so to avoid
+		 * reclaiming overly full block_groups, skip reclaiming them.
+		 *
+		 * Since the decision making process also depends on the amount
+		 * being freed, pass in a fake giant value to skip that extra
+		 * check, which is more meaningful when adding to the list in
+		 * the first place.
+		 */
+		if (!should_reclaim_block_group(bg, bg->length)) {
+			spin_unlock(&bg->lock);
+			up_write(&space_info->groups_sem);
+			goto next;
+		}
 		spin_unlock(&bg->lock);
 
@@ -3241,31 +3281,6 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
 	return ret;
 }
 
-static inline bool should_reclaim_block_group(struct btrfs_block_group *bg,
-					      u64 bytes_freed)
-{
-	const struct btrfs_space_info *space_info = bg->space_info;
-	const int reclaim_thresh = READ_ONCE(space_info->bg_reclaim_threshold);
-	const u64 new_val = bg->used;
-	const u64 old_val = new_val + bytes_freed;
-	u64 thresh;
-
-	if (reclaim_thresh == 0)
-		return false;
-
-	thresh = div_factor_fine(bg->length, reclaim_thresh);
-
-	/*
-	 * If we were below the threshold before don't reclaim, we are likely a
-	 * brand new block group and we don't want to relocate new block groups.
-	 */
-	if (old_val < thresh)
-		return false;
-	if (new_val >= thresh)
-		return false;
-	return true;
-}
-
 int btrfs_update_block_group(struct btrfs_trans_handle *trans, u64 bytenr,
 			     u64 num_bytes, bool alloc)
 {