From patchwork Mon Jun 17 23:11:13 2024
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13701546
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 1/6] btrfs: report reclaim stats in sysfs
Date: Mon, 17 Jun 2024 16:11:13 -0700
Message-ID: <65bfe2e6aae82f4d58f6592dcdbb827e20982a3f.1718665689.git.boris@bur.io>

When evaluating various reclaim strategies/thresholds against each
other, it is useful to collect data about the amount of reclaim
happening. Expose a count, error count, and byte count via sysfs per
space_info.
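
As a rough illustration of how the three counters relate, here is a
minimal userspace sketch. `struct reclaim_stats` and `record_reclaim()`
are hypothetical names used only for this example; they are not kernel
symbols, and the sketch leaves out the locking that the real counters
need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the three counters (not the kernel struct). */
struct reclaim_stats {
	uint64_t count;		/* reclaim attempts, successful or not */
	uint64_t bytes;		/* bytes relocated by successful attempts */
	uint64_t errors;	/* attempts whose relocation returned an error */
};

/*
 * Record one reclaim attempt.
 * bg_used: bytes in use in the block group, sampled before relocation.
 * ret:     relocation result, 0 on success, non-zero on failure.
 */
static void record_reclaim(struct reclaim_stats *s, uint64_t bg_used, int ret)
{
	assert(s != NULL);

	s->count++;
	if (ret)
		s->errors++;		/* a failed attempt contributes no bytes */
	else
		s->bytes += bg_used;
}
```

Under these assumptions, count - errors is the number of successful
reclaims, and bytes / (count - errors) is the average amount of data
moved per successful reclaim.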

Note that this is only for automatic reclaim, not manually invoked
balances or other codepaths that use "relocate_block_group"

Reviewed-by: Johannes Thumshirn
Signed-off-by: Boris Burkov
---
 fs/btrfs/block-group.c | 10 ++++++++++
 fs/btrfs/space-info.h  | 18 ++++++++++++++++++
 fs/btrfs/sysfs.c       |  6 ++++++
 3 files changed, 34 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 9f1d328b603e..824fd229d129 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1829,6 +1829,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 	list_sort(NULL, &fs_info->reclaim_bgs, reclaim_bgs_cmp);
 	while (!list_empty(&fs_info->reclaim_bgs)) {
 		u64 zone_unusable;
+		u64 reclaimed;
 		int ret = 0;
 
 		bg = list_first_entry(&fs_info->reclaim_bgs,
@@ -1921,12 +1922,21 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 				div64_u64(bg->used * 100, bg->length),
 				div64_u64(zone_unusable * 100, bg->length));
 		trace_btrfs_reclaim_block_group(bg);
+		reclaimed = bg->used;
 		ret = btrfs_relocate_chunk(fs_info, bg->start);
 		if (ret) {
 			btrfs_dec_block_group_ro(bg);
 			btrfs_err(fs_info, "error relocating chunk %llu",
 				  bg->start);
+			reclaimed = 0;
+			spin_lock(&space_info->lock);
+			space_info->reclaim_errors++;
+			spin_unlock(&space_info->lock);
 		}
+		spin_lock(&space_info->lock);
+		space_info->reclaim_count++;
+		space_info->reclaim_bytes += reclaimed;
+		spin_unlock(&space_info->lock);
 
 next:
 		if (ret) {
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index a733458fd13b..98ea35ae60fe 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -165,6 +165,24 @@ struct btrfs_space_info {
 
 	struct kobject kobj;
 	struct kobject *block_group_kobjs[BTRFS_NR_RAID_TYPES];
+
+	/*
+	 * Monotonically increasing counter of block group reclaim attempts
+	 * Exposed in /sys/fs//allocation//reclaim_count
+	 */
+	u64 reclaim_count;
+
+	/*
+	 * Monotonically increasing counter of reclaimed bytes
+	 * Exposed in /sys/fs//allocation//reclaim_bytes
+	 */
+	u64 reclaim_bytes;
+
+	/*
+	 * Monotonically increasing counter of reclaim errors
+	 * Exposed in /sys/fs//allocation//reclaim_errors
+	 */
+	u64 reclaim_errors;
 };
 
 struct reserve_ticket {
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index af545b6b1190..919c7ba45121 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -894,6 +894,9 @@ SPACE_INFO_ATTR(bytes_readonly);
 SPACE_INFO_ATTR(bytes_zone_unusable);
 SPACE_INFO_ATTR(disk_used);
 SPACE_INFO_ATTR(disk_total);
+SPACE_INFO_ATTR(reclaim_count);
+SPACE_INFO_ATTR(reclaim_bytes);
+SPACE_INFO_ATTR(reclaim_errors);
 BTRFS_ATTR_RW(space_info, chunk_size, btrfs_chunk_size_show, btrfs_chunk_size_store);
 BTRFS_ATTR(space_info, size_classes, btrfs_size_classes_show);
 
@@ -949,6 +952,9 @@ static struct attribute *space_info_attrs[] = {
 	BTRFS_ATTR_PTR(space_info, bg_reclaim_threshold),
 	BTRFS_ATTR_PTR(space_info, chunk_size),
 	BTRFS_ATTR_PTR(space_info, size_classes),
+	BTRFS_ATTR_PTR(space_info, reclaim_count),
+	BTRFS_ATTR_PTR(space_info, reclaim_bytes),
+	BTRFS_ATTR_PTR(space_info, reclaim_errors),
 #ifdef CONFIG_BTRFS_DEBUG
 	BTRFS_ATTR_PTR(space_info, force_chunk_alloc),
 #endif

From patchwork Mon Jun 17 23:11:14 2024
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13701547
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 2/6] btrfs: store fs_info on space_info
Date: Mon, 17 Jun 2024 16:11:14 -0700
Message-ID: <5ecb8fd320d2c0aca46570aaa985e83f1f59e63d.1718665689.git.boris@bur.io>

This is handy when computing space_info dynamic reclaim thresholds where
we do not have access to a block group. We could add it to the various
functions as a parameter, but it seems reasonable for space_info to have
an fs_info pointer.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Boris Burkov
---
 fs/btrfs/space-info.c | 1 +
 fs/btrfs/space-info.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 0283ee9bf813..7384286c5058 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -232,6 +232,7 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
 	if (!space_info)
 		return -ENOMEM;
 
+	space_info->fs_info = info;
 	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
 		INIT_LIST_HEAD(&space_info->block_groups[i]);
 	init_rwsem(&space_info->groups_sem);
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 98ea35ae60fe..25edfd453b27 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -94,6 +94,7 @@ enum btrfs_flush_state {
 };
 
 struct btrfs_space_info {
+	struct btrfs_fs_info *fs_info;
 	spinlock_t lock;
 
 	u64 total_bytes;	/* total bytes in the space,

From patchwork Mon Jun 17 23:11:15 2024
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13701548
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 3/6] btrfs: dynamic block_group reclaim threshold
Date: Mon, 17 Jun 2024 16:11:15 -0700
Message-ID: <13ee8fb749036b3aafb331417199625c5bd12b25.1718665689.git.boris@bur.io>

We can currently recover allocated block_groups by:
- explicitly starting balance operations
- "auto reclaim" via bg_reclaim_threshold

The latter works by checking against a fixed threshold on frees. If we
pass from above the threshold to below, relocation triggers and the
block group will get reclaimed by the cleaner thread (assuming it is
still eligible)

Picking a threshold is challenging. Too high, and you end up trying to
reclaim very full block_groups which is quite costly, and you don't do
reclaim on block_groups that don't get quite THAT full, but could still
be quite fragmented and stranding a lot of space. Too low, and you
similarly miss out on reclaim even if you badly need it to avoid running
out of unallocated space, if you have heavily fragmented block groups
living above the threshold.

No matter the threshold, it suffers from a workload that happens to
bounce around that threshold, which can introduce arbitrary amounts of
reclaim waste.

To improve this situation, introduce a dynamic threshold. The basic idea
behind this threshold is that it should be very lax when there is plenty
of unallocated space, and increasingly aggressive as we approach zero
unallocated space. To that end, it sets a target for unallocated space
(10 chunks) and then linearly increases the threshold as the amount of
space short of the target we are increases. The formula is:
(target - unalloc) / target

I tested this by running it on three interesting workloads:
1. bounce allocations around X% full.
2. fill up all the way and introduce full fragmentation.
3. write in a fragmented way until the filesystem is just about full.

1. and 2.
attack the weaknesses of a fixed threshold; fixed either works
perfectly or fully falls apart, depending on the threshold. Dynamic
always handles these cases well.

3. attacks dynamic by checking whether it is too zealous to reclaim in
conditions with low unallocated and low unused. It tends to claw back
1GiB of unallocated fairly aggressively, but not much more. Early
versions of dynamic threshold struggled on this test.

Additional work could be done to intelligently ratchet up the urgency of
reclaim in very low unallocated conditions. Existing mechanisms are
already useless in that case anyway.

Signed-off-by: Boris Burkov
---
 fs/btrfs/block-group.c |  18 ++++---
 fs/btrfs/space-info.c  | 115 +++++++++++++++++++++++++++++++++++++----
 fs/btrfs/space-info.h  |   8 +++
 fs/btrfs/sysfs.c       |  43 ++++++++++++++-
 4 files changed, 164 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 824fd229d129..c3313697475f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1764,24 +1764,21 @@ static inline bool btrfs_should_reclaim(struct btrfs_fs_info *fs_info)
 
 static bool should_reclaim_block_group(struct btrfs_block_group *bg, u64 bytes_freed)
 {
-	const struct btrfs_space_info *space_info = bg->space_info;
-	const int reclaim_thresh = READ_ONCE(space_info->bg_reclaim_threshold);
+	const int thresh_pct = btrfs_calc_reclaim_threshold(bg->space_info);
+	u64 thresh_bytes = mult_perc(bg->length, thresh_pct);
 	const u64 new_val = bg->used;
 	const u64 old_val = new_val + bytes_freed;
-	u64 thresh;
 
-	if (reclaim_thresh == 0)
+	if (thresh_bytes == 0)
 		return false;
 
-	thresh = mult_perc(bg->length, reclaim_thresh);
-
 	/*
 	 * If we were below the threshold before don't reclaim, we are likely a
 	 * brand new block group and we don't want to relocate new block groups.
 	 */
-	if (old_val < thresh)
+	if (old_val < thresh_bytes)
 		return false;
-	if (new_val >= thresh)
+	if (new_val >= thresh_bytes)
 		return false;
 	return true;
 }
@@ -1843,6 +1840,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		/* Don't race with allocators so take the groups_sem */
 		down_write(&space_info->groups_sem);
 
+		spin_lock(&space_info->lock);
 		spin_lock(&bg->lock);
 		if (bg->reserved || bg->pinned || bg->ro) {
 			/*
@@ -1852,6 +1850,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			 * this block group.
 			 */
 			spin_unlock(&bg->lock);
+			spin_unlock(&space_info->lock);
 			up_write(&space_info->groups_sem);
 			goto next;
 		}
@@ -1870,6 +1869,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			if (!btrfs_test_opt(fs_info, DISCARD_ASYNC))
 				btrfs_mark_bg_unused(bg);
 			spin_unlock(&bg->lock);
+			spin_unlock(&space_info->lock);
 			up_write(&space_info->groups_sem);
 			goto next;
 
@@ -1886,10 +1886,12 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		 */
 		if (!should_reclaim_block_group(bg, bg->length)) {
 			spin_unlock(&bg->lock);
+			spin_unlock(&space_info->lock);
 			up_write(&space_info->groups_sem);
 			goto next;
 		}
 		spin_unlock(&bg->lock);
+		spin_unlock(&space_info->lock);
 
 		/*
 		 * Get out fast, in case we're read-only or unmounting the
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 7384286c5058..0d13282dac05 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include
 #include "misc.h"
 #include "ctree.h"
 #include "space-info.h"
@@ -190,6 +191,8 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info *info)
  */
 #define BTRFS_DEFAULT_ZONED_RECLAIM_THRESH			(75)
 
+#define BTRFS_UNALLOC_BLOCK_GROUP_TARGET			(10ULL)
+
 /*
  * Calculate chunk size depending on volume type (regular or zoned).
  */
@@ -341,11 +344,27 @@ struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info,
 	return NULL;
 }
 
+static u64 calc_effective_data_chunk_size(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_space_info *data_sinfo;
+	u64 data_chunk_size;
+	/*
+	 * Calculate the data_chunk_size, space_info->chunk_size is the
+	 * "optimal" chunk size based on the fs size. However when we actually
+	 * allocate the chunk we will strip this down further, making it no more
+	 * than 10% of the disk or 1G, whichever is smaller.
+	 */
+	data_sinfo = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA);
+	data_chunk_size = min(data_sinfo->chunk_size,
+			      mult_perc(fs_info->fs_devices->total_rw_bytes, 10));
+	return min_t(u64, data_chunk_size, SZ_1G);
+
+}
+
 static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
 			  struct btrfs_space_info *space_info,
 			  enum btrfs_reserve_flush_enum flush)
 {
-	struct btrfs_space_info *data_sinfo;
 	u64 profile;
 	u64 avail;
 	u64 data_chunk_size;
@@ -369,16 +388,7 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
 	if (avail == 0)
 		return 0;
 
-	/*
-	 * Calculate the data_chunk_size, space_info->chunk_size is the
-	 * "optimal" chunk size based on the fs size. However when we actually
-	 * allocate the chunk we will strip this down further, making it no more
-	 * than 10% of the disk or 1G, whichever is smaller.
-	 */
-	data_sinfo = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA);
-	data_chunk_size = min(data_sinfo->chunk_size,
-			      mult_perc(fs_info->fs_devices->total_rw_bytes, 10));
-	data_chunk_size = min_t(u64, data_chunk_size, SZ_1G);
+	data_chunk_size = calc_effective_data_chunk_size(fs_info);
 
 	/*
 	 * Since data allocations immediately use block groups as part of the
@@ -1860,3 +1870,86 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
 
 	return free_bytes;
 }
+
+static u64 calc_pct_ratio(u64 x, u64 y)
+{
+	int err;
+
+	if (!y)
+		return 0;
+again:
+	err = check_mul_overflow(100, x, &x);
+	if (err)
+		goto lose_precision;
+	return div64_u64(x, y);
+lose_precision:
+	x >>= 10;
+	y >>= 10;
+	if (!y)
+		y = 1;
+	goto again;
+}
+
+/*
+ * A reasonable buffer for unallocated space is 10 data block_groups.
+ * If we claw this back repeatedly, we can still achieve efficient
+ * utilization when near full, and not do too much reclaim while
+ * always maintaining a solid buffer for workloads that quickly
+ * allocate and pressure the unallocated space.
+ */
+static u64 calc_unalloc_target(struct btrfs_fs_info *fs_info)
+{
+	return BTRFS_UNALLOC_BLOCK_GROUP_TARGET * calc_effective_data_chunk_size(fs_info);
+}
+
+/*
+ * The fundamental goal of automatic reclaim is to protect the filesystem's
+ * unallocated space and thus minimize the probability of the filesystem going
+ * read only when a metadata allocation failure causes a transaction abort.
+ *
+ * However, relocations happen into the space_info's unused space, therefore
+ * automatic reclaim must also back off as that space runs low. There is no
+ * value in doing trivial "relocations" of re-writing the same block group
+ * into a fresh one.
+ *
+ * Furthermore, we want to avoid doing too much reclaim even if there are good
+ * candidates. This is because the allocator is pretty good at filling up the
+ * holes with writes. So we want to do just enough reclaim to try and stay
+ * safe from running out of unallocated space but not be wasteful about it.
+ *
+ * Therefore, the dynamic reclaim threshold is calculated as follows:
+ * - calculate a target unallocated amount of 10 block group sized chunks
+ * - ratchet up the intensity of reclaim depending on how far we are from
+ *   that target, setting the threshold to (target - unalloc) / target.
+ *
+ * Typically with 10 block groups as the target, the discrete values this comes
+ * out to are 0, 10, 20, ... , 80, 90, and 99.
+ */
+static int calc_dynamic_reclaim_threshold(struct btrfs_space_info *space_info)
+{
+	struct btrfs_fs_info *fs_info = space_info->fs_info;
+	u64 unalloc = atomic64_read(&fs_info->free_chunk_space);
+	u64 target = calc_unalloc_target(fs_info);
+	u64 alloc = space_info->total_bytes;
+	u64 used = btrfs_space_info_used(space_info, false);
+	u64 unused = alloc - used;
+	u64 want = target > unalloc ? target - unalloc : 0;
+	u64 data_chunk_size = calc_effective_data_chunk_size(fs_info);
+	/* Cast to int is OK because want <= target */
+	int ratio = calc_pct_ratio(want, target);
+
+	/* If we have no unused space, don't bother, it won't work anyway */
+	if (unused < data_chunk_size)
+		return 0;
+
+	return ratio;
+}
+
+int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info)
+{
+	lockdep_assert_held(&space_info->lock);
+
+	if (READ_ONCE(space_info->dynamic_reclaim))
+		return calc_dynamic_reclaim_threshold(space_info);
+	return READ_ONCE(space_info->bg_reclaim_threshold);
+}
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 25edfd453b27..2cac771321c7 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -184,6 +184,12 @@ struct btrfs_space_info {
 	 * Exposed in /sys/fs//allocation//reclaim_errors
 	 */
 	u64 reclaim_errors;
+
+	/*
+	 * If true, use the dynamic relocation threshold, instead of the
+	 * fixed bg_reclaim_threshold.
+	 */
+	bool dynamic_reclaim;
 };
 
 struct reserve_ticket {
@@ -266,4 +272,6 @@ void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info);
 void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
 
+int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info);
+
 #endif /* BTRFS_SPACE_INFO_H */
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 919c7ba45121..360d6093476f 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -905,8 +905,12 @@ static ssize_t btrfs_sinfo_bg_reclaim_threshold_show(struct kobject *kobj,
 						     char *buf)
 {
 	struct btrfs_space_info *space_info = to_space_info(kobj);
+	ssize_t ret;
 
-	return sysfs_emit(buf, "%d\n", READ_ONCE(space_info->bg_reclaim_threshold));
+	spin_lock(&space_info->lock);
+	ret = sysfs_emit(buf, "%d\n", btrfs_calc_reclaim_threshold(space_info));
+	spin_unlock(&space_info->lock);
+	return ret;
 }
 
 static ssize_t btrfs_sinfo_bg_reclaim_threshold_store(struct kobject *kobj,
@@ -917,6 +921,9 @@ static ssize_t btrfs_sinfo_bg_reclaim_threshold_store(struct kobject *kobj,
 	int thresh;
 	int ret;
 
+	if (READ_ONCE(space_info->dynamic_reclaim))
+		return -EINVAL;
+
 	ret = kstrtoint(buf, 10, &thresh);
 	if (ret)
 		return ret;
@@ -933,6 +940,39 @@ BTRFS_ATTR_RW(space_info, bg_reclaim_threshold,
 	      btrfs_sinfo_bg_reclaim_threshold_show,
 	      btrfs_sinfo_bg_reclaim_threshold_store);
 
+static ssize_t btrfs_sinfo_dynamic_reclaim_show(struct kobject *kobj,
+						struct kobj_attribute *a,
+						char *buf)
+{
+	struct btrfs_space_info *space_info = to_space_info(kobj);
+
+	return sysfs_emit(buf, "%d\n", READ_ONCE(space_info->dynamic_reclaim));
+}
+
+static ssize_t btrfs_sinfo_dynamic_reclaim_store(struct kobject *kobj,
+						 struct kobj_attribute *a,
+						 const char *buf, size_t len)
+{
+	struct btrfs_space_info *space_info = to_space_info(kobj);
+	int dynamic_reclaim;
+	int ret;
+
+	ret = kstrtoint(buf, 10, &dynamic_reclaim);
+	if (ret)
+		return ret;
+
+	if (dynamic_reclaim < 0)
+		return -EINVAL;
+
+	WRITE_ONCE(space_info->dynamic_reclaim, dynamic_reclaim != 0);
+
+	return len;
+}
+
+BTRFS_ATTR_RW(space_info, dynamic_reclaim,
+	      btrfs_sinfo_dynamic_reclaim_show,
+	      btrfs_sinfo_dynamic_reclaim_store);
+
 /*
  * Allocation information about block group types.
  *
@@ -950,6 +990,7 @@ static struct attribute *space_info_attrs[] = {
 	BTRFS_ATTR_PTR(space_info, disk_used),
 	BTRFS_ATTR_PTR(space_info, disk_total),
 	BTRFS_ATTR_PTR(space_info, bg_reclaim_threshold),
+	BTRFS_ATTR_PTR(space_info, dynamic_reclaim),
 	BTRFS_ATTR_PTR(space_info, chunk_size),
 	BTRFS_ATTR_PTR(space_info, size_classes),
 	BTRFS_ATTR_PTR(space_info, reclaim_count),

From patchwork Mon Jun 17 23:11:16 2024
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13701549
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 4/6] btrfs: periodic block_group reclaim
Date: Mon, 17 Jun 2024 16:11:16 -0700

We currently employ an edge-triggered block group reclaim strategy which
marks block groups for reclaim as they free down past a threshold. With
a dynamic threshold, this is worse than doing it in a level-triggered
fashion periodically. That is because the reclaim itself happens
periodically, so the threshold at that point in time is what really
matters, not the threshold at freeing time.
If we mark the reclaim in a big pass, then sort by usage and do reclaim,
we also benefit from a negative feedback loop preventing unnecessary
reclaims as we crunch through the "best" candidates.

Since this is quite a different model, it requires some additional
support. The edge-triggered reclaim has a good heuristic for not
reclaiming fresh block groups, so we need to replace that with a typical
GC sweep mark which skips block groups that have seen an allocation
since the last sweep.

Signed-off-by: Boris Burkov
---
 fs/btrfs/block-group.c |  2 ++
 fs/btrfs/block-group.h |  1 +
 fs/btrfs/space-info.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/space-info.h  |  7 ++++++
 fs/btrfs/sysfs.c       | 34 ++++++++++++++++++++++++++++
 5 files changed, 95 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index c3313697475f..6bcf24f2ac79 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1974,6 +1974,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 
 void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info)
 {
+	btrfs_reclaim_sweep(fs_info);
 	spin_lock(&fs_info->unused_bgs_lock);
 	if (!list_empty(&fs_info->reclaim_bgs))
 		queue_work(system_unbound_wq, &fs_info->reclaim_bgs_work);
@@ -3672,6 +3673,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 		old_val += num_bytes;
 		cache->used = old_val;
 		cache->reserved -= num_bytes;
+		cache->reclaim_mark = 0;
 		space_info->bytes_reserved -= num_bytes;
 		space_info->bytes_used += num_bytes;
 		space_info->disk_used += num_bytes * factor;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 85e2d4cd12dc..8656b38f1fa5 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -263,6 +263,7 @@ struct btrfs_block_group {
 	struct work_struct zone_finish_work;
 	struct extent_buffer *last_eb;
 	enum btrfs_block_group_size_class size_class;
+	u64 reclaim_mark;
 };
 
 static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 0d13282dac05..ff92ad26ffa2 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1953,3 +1953,54 @@ int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info)
 		return calc_dynamic_reclaim_threshold(space_info);
 	return READ_ONCE(space_info->bg_reclaim_threshold);
 }
+
+static int do_reclaim_sweep(struct btrfs_fs_info *fs_info,
+			    struct btrfs_space_info *space_info, int raid)
+{
+	struct btrfs_block_group *bg;
+	int thresh_pct;
+
+	spin_lock(&space_info->lock);
+	thresh_pct = btrfs_calc_reclaim_threshold(space_info);
+	spin_unlock(&space_info->lock);
+
+	down_read(&space_info->groups_sem);
+	list_for_each_entry(bg, &space_info->block_groups[raid], list) {
+		u64 thresh;
+		bool reclaim = false;
+
+		btrfs_get_block_group(bg);
+		spin_lock(&bg->lock);
+		thresh = mult_perc(bg->length, thresh_pct);
+		if (bg->used < thresh && bg->reclaim_mark)
+			reclaim = true;
+		bg->reclaim_mark++;
+		spin_unlock(&bg->lock);
+		if (reclaim)
+			btrfs_mark_bg_to_reclaim(bg);
+		btrfs_put_block_group(bg);
+	}
+	up_read(&space_info->groups_sem);
+	return 0;
+}
+
+int btrfs_reclaim_sweep(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+	int raid;
+	struct btrfs_space_info *space_info;
+
+	list_for_each_entry(space_info, &fs_info->space_info, list) {
+		if (space_info->flags & BTRFS_BLOCK_GROUP_SYSTEM)
+			continue;
+		if (!READ_ONCE(space_info->periodic_reclaim))
+			continue;
+		for (raid = 0; raid < BTRFS_NR_RAID_TYPES; raid++) {
+			ret = do_reclaim_sweep(fs_info, space_info, raid);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return ret;
+}
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 2cac771321c7..ae4a1f7d5856 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -190,6 +190,12 @@ struct btrfs_space_info {
 	 * fixed bg_reclaim_threshold.
 	 */
 	bool dynamic_reclaim;
+
+	/*
+	 * Periodically check all block groups against the reclaim
+	 * threshold in the cleaner thread.
+	 */
+	bool periodic_reclaim;
 };
 
 struct reserve_ticket {
@@ -273,5 +279,6 @@ void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
 
 int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info);
+int btrfs_reclaim_sweep(struct btrfs_fs_info *fs_info);
 
 #endif /* BTRFS_SPACE_INFO_H */
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 360d6093476f..c58cea0da597 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -973,6 +973,39 @@ BTRFS_ATTR_RW(space_info, dynamic_reclaim,
 	      btrfs_sinfo_dynamic_reclaim_show,
 	      btrfs_sinfo_dynamic_reclaim_store);
 
+static ssize_t btrfs_sinfo_periodic_reclaim_show(struct kobject *kobj,
+						 struct kobj_attribute *a,
+						 char *buf)
+{
+	struct btrfs_space_info *space_info = to_space_info(kobj);
+
+	return sysfs_emit(buf, "%d\n", READ_ONCE(space_info->periodic_reclaim));
+}
+
+static ssize_t btrfs_sinfo_periodic_reclaim_store(struct kobject *kobj,
+						  struct kobj_attribute *a,
+						  const char *buf, size_t len)
+{
+	struct btrfs_space_info *space_info = to_space_info(kobj);
+	int periodic_reclaim;
+	int ret;
+
+	ret = kstrtoint(buf, 10, &periodic_reclaim);
+	if (ret)
+		return ret;
+
+	if (periodic_reclaim < 0)
+		return -EINVAL;
+
+	WRITE_ONCE(space_info->periodic_reclaim, periodic_reclaim != 0);
+
+	return len;
+}
+
+BTRFS_ATTR_RW(space_info, periodic_reclaim,
+	      btrfs_sinfo_periodic_reclaim_show,
+	      btrfs_sinfo_periodic_reclaim_store);
+
 /*
  * Allocation information about block group types.
  *
@@ -996,6 +1029,7 @@ static struct attribute *space_info_attrs[] = {
 	BTRFS_ATTR_PTR(space_info, reclaim_count),
 	BTRFS_ATTR_PTR(space_info, reclaim_bytes),
 	BTRFS_ATTR_PTR(space_info, reclaim_errors),
+	BTRFS_ATTR_PTR(space_info, periodic_reclaim),
 #ifdef CONFIG_BTRFS_DEBUG
 	BTRFS_ATTR_PTR(space_info, force_chunk_alloc),
 #endif

From patchwork Mon Jun 17 23:11:17 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13701550
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 5/6] btrfs: prevent pathological periodic reclaim loops
Date: Mon, 17 Jun 2024 16:11:17 -0700
Message-ID: <34fe3a28628bcd97e2b7c9659da73f43744f4bdf.1718665689.git.boris@bur.io>
X-Mailer: git-send-email 2.45.2

Periodic reclaim runs the risk of getting stuck in a state where it
keeps reclaiming the same block group over and over. This can happen if:
1. reclaiming that block_group fails
2. reclaiming that block_group fails to move any extents into existing
   block_groups and just allocates a fresh chunk and moves everything.

Currently, 1. is a very tight loop inside the reclaim worker. That is
critical for edge-triggered reclaim or else we risk forgetting about a
reclaimable group. On the other hand, with level-triggered reclaim we
can break out of that loop and get it later.

With that fixed, 2. applies to both failures and "successes" with no
progress.
If we have done a periodic reclaim on a space_info and nothing
has changed in that space_info, there is not much point in trying again,
so don't, until enough space has been freed, which we capture with a
heuristic of needing to net free one chunk.

Signed-off-by: Boris Burkov
---
 fs/btrfs/block-group.c | 12 ++++++---
 fs/btrfs/space-info.c  | 56 ++++++++++++++++++++++++++++++++++++------
 fs/btrfs/space-info.h  | 14 +++++++++++
 3 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 6bcf24f2ac79..ba9afb94e7ce 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1933,6 +1933,8 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 			reclaimed = 0;
 			spin_lock(&space_info->lock);
 			space_info->reclaim_errors++;
+			if (READ_ONCE(space_info->periodic_reclaim))
+				space_info->periodic_reclaim_ready = false;
 			spin_unlock(&space_info->lock);
 		}
 		spin_lock(&space_info->lock);
@@ -1941,7 +1943,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 		spin_unlock(&space_info->lock);
 
 next:
-		if (ret) {
+		if (ret && !READ_ONCE(space_info->periodic_reclaim)) {
 			/* Refcount held by the reclaim_bgs list after splice. */
 			btrfs_get_block_group(bg);
 			list_add_tail(&bg->bg_list, &retry_list);
@@ -3677,6 +3679,8 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 		space_info->bytes_reserved -= num_bytes;
 		space_info->bytes_used += num_bytes;
 		space_info->disk_used += num_bytes * factor;
+		if (READ_ONCE(space_info->periodic_reclaim))
+			btrfs_space_info_update_reclaimable(space_info, -num_bytes);
 		spin_unlock(&cache->lock);
 		spin_unlock(&space_info->lock);
 	} else {
@@ -3686,8 +3690,10 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 		btrfs_space_info_update_bytes_pinned(info, space_info, num_bytes);
 		space_info->bytes_used -= num_bytes;
 		space_info->disk_used -= num_bytes * factor;
-
-		reclaim = should_reclaim_block_group(cache, num_bytes);
+		if (READ_ONCE(space_info->periodic_reclaim))
+			btrfs_space_info_update_reclaimable(space_info, num_bytes);
+		else
+			reclaim = should_reclaim_block_group(cache, num_bytes);
 
 		spin_unlock(&cache->lock);
 		spin_unlock(&space_info->lock);
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index ff92ad26ffa2..e7a2aa751f8f 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include "linux/spinlock.h"
 #include
 #include "misc.h"
 #include "ctree.h"
@@ -1899,7 +1900,9 @@ static u64 calc_pct_ratio(u64 x, u64 y)
  */
 static u64 calc_unalloc_target(struct btrfs_fs_info *fs_info)
 {
-	return BTRFS_UNALLOC_BLOCK_GROUP_TARGET * calc_effective_data_chunk_size(fs_info);
+	u64 chunk_sz = calc_effective_data_chunk_size(fs_info);
+
+	return BTRFS_UNALLOC_BLOCK_GROUP_TARGET * chunk_sz;
 }
 
 /*
@@ -1935,14 +1938,13 @@ static int calc_dynamic_reclaim_threshold(struct btrfs_space_info *space_info)
 	u64 unused = alloc - used;
 	u64 want = target > unalloc ? target - unalloc : 0;
 	u64 data_chunk_size = calc_effective_data_chunk_size(fs_info);
-	/* Cast to int is OK because want <= target */
-	int ratio = calc_pct_ratio(want, target);
 
-	/* If we have no unused space, don't bother, it won't work anyway */
+	/* If we have no unused space, don't bother, it won't work anyway. */
 	if (unused < data_chunk_size)
 		return 0;
 
-	return ratio;
+	/* Cast to int is OK because want <= target. */
+	return calc_pct_ratio(want, target);
 }
 
 int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info)
@@ -1984,6 +1986,46 @@ static int do_reclaim_sweep(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+void btrfs_space_info_update_reclaimable(struct btrfs_space_info *space_info, s64 bytes)
+{
+	u64 chunk_sz = calc_effective_data_chunk_size(space_info->fs_info);
+
+	assert_spin_locked(&space_info->lock);
+	space_info->reclaimable_bytes += bytes;
+
+	if (space_info->reclaimable_bytes >= chunk_sz)
+		btrfs_set_periodic_reclaim_ready(space_info, true);
+}
+
+void btrfs_set_periodic_reclaim_ready(struct btrfs_space_info *space_info, bool ready)
+{
+	assert_spin_locked(&space_info->lock);
+	if (!READ_ONCE(space_info->periodic_reclaim))
+		return;
+	if (ready != space_info->periodic_reclaim_ready) {
+		space_info->periodic_reclaim_ready = ready;
+		if (!ready)
+			space_info->reclaimable_bytes = 0;
+	}
+}
+
+bool btrfs_should_periodic_reclaim(struct btrfs_space_info *space_info)
+{
+	bool ret;
+
+	if (space_info->flags & BTRFS_BLOCK_GROUP_SYSTEM)
+		return false;
+	if (!READ_ONCE(space_info->periodic_reclaim))
+		return false;
+
+	spin_lock(&space_info->lock);
+	ret = space_info->periodic_reclaim_ready;
+	btrfs_set_periodic_reclaim_ready(space_info, false);
+	spin_unlock(&space_info->lock);
+
+	return ret;
+}
+
 int btrfs_reclaim_sweep(struct btrfs_fs_info *fs_info)
 {
 	int ret;
@@ -1991,9 +2033,7 @@ int btrfs_reclaim_sweep(struct btrfs_fs_info *fs_info)
 	struct btrfs_space_info *space_info;
 
 	list_for_each_entry(space_info, &fs_info->space_info, list) {
-		if (space_info->flags & BTRFS_BLOCK_GROUP_SYSTEM)
-			continue;
-		if (!READ_ONCE(space_info->periodic_reclaim))
+		if (!btrfs_should_periodic_reclaim(space_info))
 			continue;
 		for (raid = 0; raid < BTRFS_NR_RAID_TYPES; raid++) {
 			ret = do_reclaim_sweep(fs_info, space_info, raid);
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index ae4a1f7d5856..4db8a0267c16 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -196,6 +196,17 @@ struct btrfs_space_info {
 	 * threshold in the cleaner thread.
 	 */
 	bool periodic_reclaim;
+
+	/*
+	 * Periodic reclaim should be a no-op if a space_info hasn't
+	 * freed any space since the last time we tried.
+	 */
+	bool periodic_reclaim_ready;
+
+	/*
+	 * Net bytes freed or allocated since the last reclaim pass.
+	 */
+	s64 reclaimable_bytes;
 };
 
 struct reserve_ticket {
@@ -278,6 +289,9 @@ void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info);
 void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
 
+void btrfs_space_info_update_reclaimable(struct btrfs_space_info *space_info, s64 bytes);
+void btrfs_set_periodic_reclaim_ready(struct btrfs_space_info *space_info, bool ready);
+bool btrfs_should_periodic_reclaim(struct btrfs_space_info *space_info);
 int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info);
 int btrfs_reclaim_sweep(struct btrfs_fs_info *fs_info);
 

From patchwork Mon Jun 17 23:11:18 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13701551
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 6/6] btrfs: urgent periodic reclaim pass
Date: Mon, 17 Jun 2024 16:11:18 -0700
Message-ID: <6bf9d464d1a1b73853cc4fa82e233ff5e007a14a.1718665689.git.boris@bur.io>
X-Mailer: git-send-email 2.45.2

Periodic reclaim attempts to avoid block_groups seeing active use with a
sweep mark that gets cleared on allocation and set on a sweep. In urgent
conditions where we have very little unallocated space (less than one
chunk used by the threshold calculation for the unallocated target), we
want to be able to override this mechanism.

Introduce a second pass that only happens if we fail to find a reclaim
candidate and reclaim is urgent. In that case, do a second pass where
all block groups are eligible.

Signed-off-by: Boris Burkov
---
 fs/btrfs/space-info.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index e7a2aa751f8f..95e65d5163ab 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1956,17 +1956,35 @@ int btrfs_calc_reclaim_threshold(struct btrfs_space_info *space_info)
 	return READ_ONCE(space_info->bg_reclaim_threshold);
 }
 
+/*
+ * Under "urgent" reclaim, we will reclaim even fresh block groups that have
+ * recently seen successful allocations, as we are desperate to reclaim
+ * whatever we can to avoid ENOSPC in a transaction leading to a readonly fs.
+ */
+static bool is_reclaim_urgent(struct btrfs_space_info *space_info)
+{
+	struct btrfs_fs_info *fs_info = space_info->fs_info;
+	u64 unalloc = atomic64_read(&fs_info->free_chunk_space);
+	u64 data_chunk_size = calc_effective_data_chunk_size(fs_info);
+
+	return unalloc < data_chunk_size;
+}
+
 static int do_reclaim_sweep(struct btrfs_fs_info *fs_info,
 			    struct btrfs_space_info *space_info, int raid)
 {
 	struct btrfs_block_group *bg;
 	int thresh_pct;
+	bool try_again = true;
+	bool urgent;
 
 	spin_lock(&space_info->lock);
+	urgent = is_reclaim_urgent(space_info);
 	thresh_pct = btrfs_calc_reclaim_threshold(space_info);
 	spin_unlock(&space_info->lock);
 
 	down_read(&space_info->groups_sem);
+again:
 	list_for_each_entry(bg, &space_info->block_groups[raid], list) {
 		u64 thresh;
 		bool reclaim = false;
@@ -1974,14 +1992,29 @@ static int do_reclaim_sweep(struct btrfs_fs_info *fs_info,
 		btrfs_get_block_group(bg);
 		spin_lock(&bg->lock);
 		thresh = mult_perc(bg->length, thresh_pct);
-		if (bg->used < thresh && bg->reclaim_mark)
+		if (bg->used < thresh && bg->reclaim_mark) {
+			try_again = false;
 			reclaim = true;
+		}
 		bg->reclaim_mark++;
 		spin_unlock(&bg->lock);
 		if (reclaim)
 			btrfs_mark_bg_to_reclaim(bg);
 		btrfs_put_block_group(bg);
 	}
+
+	/*
+	 * In situations where we are very motivated to reclaim (low unalloc)
+	 * use two passes to make the reclaim mark check best effort.
+	 *
+	 * If we have any staler groups, we don't touch the fresher ones, but if we
+	 * really need a block group, do take a fresh one.
+	 */
+	if (try_again && urgent) {
+		try_again = false;
+		goto again;
+	}
+
 	up_read(&space_info->groups_sem);
 	return 0;
 }
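
For readers following the sweep-mark logic in patches 4/6 and 6/6, the
interaction between reclaim_mark, the threshold and the urgent second
pass can be modelled in user space. This is only a sketch, not kernel
code: struct toy_bg, toy_alloc() and toy_sweep() are hypothetical names
made up for illustration, locking is omitted, and the plain percentage
arithmetic is an assumed stand-in for mult_perc().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct btrfs_block_group; not the kernel type. */
struct toy_bg {
	uint64_t length;
	uint64_t used;
	uint64_t reclaim_mark;	/* 0 means "allocated into since the last sweep" */
	bool to_reclaim;
};

/* An allocation makes the group fresh again by clearing its mark. */
static void toy_alloc(struct toy_bg *bg, uint64_t bytes)
{
	bg->used += bytes;
	bg->reclaim_mark = 0;
}

/*
 * Sweep all groups: select those under the threshold that already carry
 * a mark, then bump every mark. If nothing was selected and the caller
 * says reclaim is urgent, run one more pass; by then every mark is
 * nonzero, so fresh groups become eligible too.
 * Returns the number of groups newly selected for reclaim.
 */
static size_t toy_sweep(struct toy_bg *bgs, size_t n, int thresh_pct, bool urgent)
{
	size_t picked = 0;
	int pass;

	for (pass = 0; pass < 2; pass++) {
		size_t i;

		for (i = 0; i < n; i++) {
			struct toy_bg *bg = &bgs[i];
			/* Assumed equivalent of mult_perc(length, pct) for small values. */
			uint64_t thresh = bg->length * (uint64_t)thresh_pct / 100;

			if (bg->used < thresh && bg->reclaim_mark) {
				if (!bg->to_reclaim)
					picked++;
				bg->to_reclaim = true;
			}
			bg->reclaim_mark++;
		}
		/* The second pass only happens when urgent and nothing was found. */
		if (picked || !urgent)
			break;
	}
	return picked;
}
```

In this model a group that has just been allocated into survives one
non-urgent sweep untouched and only becomes a candidate on the following
sweep, while an urgent sweep that finds no stale candidate takes the
fresh group in its second pass.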