From patchwork Wed Sep 13 00:13:12 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13382331
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 01/18] btrfs: introduce quota mode
Date: Tue, 12 Sep 2023 17:13:12 -0700
X-Mailer: git-send-email 2.41.0
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

In preparation for introducing simple quotas, change from a binary
setting for quotas to an enum based mode. Initially, the possible modes
are disabled/full. Full quotas is normal btrfs qgroups.

Signed-off-by: Boris Burkov
Reviewed-by: Josef Bacik
---
 fs/btrfs/qgroup.c | 7 +++++++
 fs/btrfs/qgroup.h | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index a51f1ceb867a..8f8318e0509b 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -30,6 +30,13 @@
 #include "root-tree.h"
 #include "tree-checker.h"

+enum btrfs_qgroup_mode btrfs_qgroup_mode(struct btrfs_fs_info *fs_info)
+{
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+		return BTRFS_QGROUP_MODE_DISABLED;
+	return BTRFS_QGROUP_MODE_FULL;
+}
+
 /*
  * Helpers to access qgroup reservation
  *
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 12614bc1e70b..aed611774047 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -277,6 +277,12 @@ enum {
 };

 int btrfs_quota_enable(struct btrfs_fs_info *fs_info);
+enum btrfs_qgroup_mode {
+	BTRFS_QGROUP_MODE_DISABLED,
+	BTRFS_QGROUP_MODE_FULL,
+};
+
+enum btrfs_qgroup_mode btrfs_qgroup_mode(struct btrfs_fs_info *fs_info);
 int btrfs_quota_disable(struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
 void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info);

From patchwork Wed Sep 13 00:13:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13382333
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 02/18] btrfs: add new quota mode for simple quotas
Date: Tue, 12 Sep 2023 17:13:13 -0700
X-Mailer: git-send-email 2.41.0
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

Add a new quota mode called "simple quotas". It can be enabled by the
existing quota enable ioctl via a new command, and sets an incompat
bit, as the implementation of simple quotas will make backwards
incompatible changes to the disk format of the extent tree.

Signed-off-by: Boris Burkov
---
 fs/btrfs/delayed-ref.c          |   6 +-
 fs/btrfs/fs.h                   |   3 +-
 fs/btrfs/ioctl.c                |   3 +-
 fs/btrfs/qgroup.c               | 109 +++++++++++++++++++++++---------
 fs/btrfs/qgroup.h               |  17 ++++-
 fs/btrfs/root-tree.c            |   2 +-
 fs/btrfs/transaction.c          |   4 +-
 include/uapi/linux/btrfs.h      |   2 +
 include/uapi/linux/btrfs_tree.h |  10 ++-
 9 files changed, 114 insertions(+), 42 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 25d0cdf85a91..0a80224c8784 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -959,8 +959,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
 		return -ENOMEM;
 	}

-	if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) &&
-	    !generic_ref->skip_qgroup) {
+	if (btrfs_qgroup_enabled(fs_info) && !generic_ref->skip_qgroup) {
 		record = kzalloc(sizeof(*record), GFP_NOFS);
 		if (!record) {
 			kmem_cache_free(btrfs_delayed_tree_ref_cachep, ref);
@@ -1063,8 +1062,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
 		return -ENOMEM;
 	}

-	if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) &&
-	    !generic_ref->skip_qgroup) {
+	if (btrfs_qgroup_enabled(fs_info) && !generic_ref->skip_qgroup) {
 		record = kzalloc(sizeof(*record), GFP_NOFS);
 		if (!record) {
 			kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index d84a390336fc..49fdac7dfd07 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -214,7 +214,8 @@ enum {
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES	|	\
 	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID	|	\
 	 BTRFS_FEATURE_INCOMPAT_RAID1C34	|	\
-	 BTRFS_FEATURE_INCOMPAT_ZONED)
+	 BTRFS_FEATURE_INCOMPAT_ZONED		|	\
+	 BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA)

 #ifdef CONFIG_BTRFS_DEBUG
 /*
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 75ab766fe156..6211bce7f146 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3697,7 +3697,8 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg)

 	switch (sa->cmd) {
 	case BTRFS_QUOTA_CTL_ENABLE:
-		ret = btrfs_quota_enable(fs_info);
+	case BTRFS_QUOTA_CTL_ENABLE_SIMPLE_QUOTA:
+		ret = btrfs_quota_enable(fs_info, sa);
 		break;
 	case BTRFS_QUOTA_CTL_DISABLE:
 		ret = btrfs_quota_disable(fs_info);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 8f8318e0509b..9348529270bf 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -34,9 +34,21 @@ enum btrfs_qgroup_mode btrfs_qgroup_mode(struct btrfs_fs_info *fs_info)
 {
 	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
 		return BTRFS_QGROUP_MODE_DISABLED;
+	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE)
+		return BTRFS_QGROUP_MODE_SIMPLE;
 	return BTRFS_QGROUP_MODE_FULL;
 }

+bool btrfs_qgroup_enabled(struct btrfs_fs_info *fs_info)
+{
+	return btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_DISABLED;
+}
+
+bool btrfs_qgroup_full_accounting(struct btrfs_fs_info *fs_info)
+{
+	return btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL;
+}
+
 /*
  * Helpers to access qgroup reservation
  *
@@ -360,6 +372,8 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,

 static void qgroup_mark_inconsistent(struct btrfs_fs_info *fs_info)
 {
+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE)
+		return;
 	fs_info->qgroup_flags |= (BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT |
 				  BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN |
 				  BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING);
@@ -380,8 +394,9 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 	int ret = 0;
 	u64 flags = 0;
 	u64 rescan_progress = 0;
+	bool simple;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED)
 		return 0;

 	fs_info->qgroup_ulist = ulist_alloc(GFP_KERNEL);
@@ -431,14 +446,14 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 				"old qgroup version, quota disabled");
 				goto out;
 			}
+			fs_info->qgroup_flags = btrfs_qgroup_status_flags(l, ptr);
+			simple = (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE);
 			if (btrfs_qgroup_status_generation(l, ptr) !=
-			    fs_info->generation) {
+			    fs_info->generation && !simple) {
 				qgroup_mark_inconsistent(fs_info);
 				btrfs_err(fs_info,
 					"qgroup generation mismatch, marked as inconsistent");
 			}
-			fs_info->qgroup_flags = btrfs_qgroup_status_flags(l,
-									  ptr);
 			rescan_progress = btrfs_qgroup_status_rescan(l, ptr);
 			goto next1;
 		}
@@ -581,7 +596,7 @@ bool btrfs_check_quota_leak(struct btrfs_fs_info *fs_info)
 	struct rb_node *node;
 	bool ret = false;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED)
 		return ret;
 	/*
 	 * Since we're unmounting, there is no race and no need to grab qgroup
@@ -980,7 +995,8 @@ static int btrfs_clean_quota_tree(struct btrfs_trans_handle *trans,
 	return ret;
 }

-int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
+int btrfs_quota_enable(struct btrfs_fs_info *fs_info,
+		       struct btrfs_ioctl_quota_ctl_args *quota_ctl_args)
 {
 	struct btrfs_root *quota_root;
 	struct btrfs_root *tree_root = fs_info->tree_root;
@@ -993,6 +1009,7 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 	struct btrfs_qgroup *prealloc = NULL;
 	struct btrfs_trans_handle *trans = NULL;
 	struct ulist *ulist = NULL;
+	const bool simple = (quota_ctl_args->cmd == BTRFS_QUOTA_CTL_ENABLE_SIMPLE_QUOTA);
 	int ret = 0;
 	int slot;

@@ -1095,8 +1112,11 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 				 struct btrfs_qgroup_status_item);
 	btrfs_set_qgroup_status_generation(leaf, ptr, trans->transid);
 	btrfs_set_qgroup_status_version(leaf, ptr, BTRFS_QGROUP_STATUS_VERSION);
-	fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON |
-				BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
+	fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON;
+	if (simple)
+		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE;
+	else
+		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 	btrfs_set_qgroup_status_flags(leaf, ptr, fs_info->qgroup_flags &
 				      BTRFS_QGROUP_STATUS_FLAGS_MASK);
 	btrfs_set_qgroup_status_rescan(leaf, ptr, 0);
@@ -1224,8 +1244,14 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 	spin_lock(&fs_info->qgroup_lock);
 	fs_info->quota_root = quota_root;
 	set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
+	if (simple)
+		btrfs_set_fs_incompat(fs_info, SIMPLE_QUOTA);
 	spin_unlock(&fs_info->qgroup_lock);

+	/* Skip rescan for simple qgroups */
+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE)
+		goto out_free_path;
+
 	ret = qgroup_rescan_init(fs_info, 0, 1);
 	if (!ret) {
 		qgroup_rescan_zero_tracking(fs_info);
@@ -1340,6 +1366,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
 	quota_root = fs_info->quota_root;
 	fs_info->quota_root = NULL;
 	fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_ON;
+	fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE;
 	fs_info->qgroup_drop_subtree_thres = BTRFS_MAX_LEVEL;
 	spin_unlock(&fs_info->qgroup_lock);

@@ -1820,6 +1847,9 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 	struct btrfs_qgroup_extent_record *entry;
 	u64 bytenr = record->bytenr;

+	if (!btrfs_qgroup_full_accounting(fs_info))
+		return 0;
+
 	lockdep_assert_held(&delayed_refs->lock);
 	trace_btrfs_qgroup_trace_extent(fs_info, record);

@@ -1873,6 +1903,8 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans,
 	struct btrfs_backref_walk_ctx ctx = { 0 };
 	int ret;

+	if (!btrfs_qgroup_full_accounting(trans->fs_info))
+		return 0;
 	/*
 	 * We are always called in a context where we are already holding a
 	 * transaction handle. Often we are called when adding a data delayed
@@ -1941,7 +1973,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	struct btrfs_delayed_ref_root *delayed_refs;
 	int ret;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)
+	if (!btrfs_qgroup_full_accounting(fs_info)
 	    || bytenr == 0 || num_bytes == 0)
 		return 0;
 	record = kzalloc(sizeof(*record), GFP_NOFS);
@@ -1980,7 +2012,7 @@ int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
 	u64 bytenr, num_bytes;

 	/* We can be called directly from walk_up_proc() */
-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 0;

 	for (i = 0; i < nr; i++) {
@@ -2356,7 +2388,7 @@ static int qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans,
 	int level;
 	int ret;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 0;

 	/* Wrong parameter order */
@@ -2423,7 +2455,7 @@ int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans,
 	BUG_ON(root_level < 0 || root_level >= BTRFS_MAX_LEVEL);
 	BUG_ON(root_eb == NULL);

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 0;

 	spin_lock(&fs_info->qgroup_lock);
@@ -2757,7 +2789,7 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	 * If quotas get disabled meanwhile, the resources need to be freed and
 	 * we can't just exit here.
 	 */
-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
+	if (!btrfs_qgroup_full_accounting(fs_info) ||
 	    fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING)
 		goto out_free;

@@ -2826,6 +2858,9 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
 	u64 qgroup_to_skip;
 	int ret = 0;

+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE)
+		return 0;
+
 	delayed_refs = &trans->transaction->delayed_refs;
 	qgroup_to_skip = delayed_refs->qgroup_to_skip;
 	while ((node = rb_first(&delayed_refs->dirty_extent_root))) {
@@ -2941,7 +2976,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans)
 			qgroup_mark_inconsistent(fs_info);
 		spin_lock(&fs_info->qgroup_lock);
 	}
-	if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+	if (btrfs_qgroup_enabled(fs_info))
 		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_ON;
 	else
 		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_ON;
@@ -3000,7 +3035,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,

 	if (!committing)
 		mutex_lock(&fs_info->qgroup_ioctl_lock);
-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+	if (!btrfs_qgroup_enabled(fs_info))
 		goto out;

 	quota_root = fs_info->quota_root;
@@ -3086,7 +3121,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 		qgroup_dirty(fs_info, dstgroup);
 	}

-	if (srcid) {
+	if (srcid && btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL) {
 		srcgroup = find_qgroup_rb(fs_info, srcid);
 		if (!srcgroup)
 			goto unlock;
@@ -3349,6 +3384,9 @@ static int qgroup_rescan_leaf(struct btrfs_trans_handle *trans,
 	int slot;
 	int ret;

+	if (!btrfs_qgroup_full_accounting(fs_info))
+		return 1;
+
 	mutex_lock(&fs_info->qgroup_rescan_lock);
 	extent_root = btrfs_extent_root(fs_info,
 				fs_info->qgroup_rescan_progress.objectid);
@@ -3429,10 +3467,15 @@ static int qgroup_rescan_leaf(struct btrfs_trans_handle *trans,

 static bool rescan_should_stop(struct btrfs_fs_info *fs_info)
 {
-	return btrfs_fs_closing(fs_info) ||
-		test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state) ||
-		!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
-		fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN;
+	if (btrfs_fs_closing(fs_info))
+		return true;
+	if (test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state))
+		return true;
+	if (!btrfs_qgroup_enabled(fs_info))
+		return true;
+	if (fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN)
+		return true;
+	return false;
 }

 static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
@@ -3446,6 +3489,9 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 	bool stopped = false;
 	bool did_leaf_rescans = false;

+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE)
+		return;
+
 	path = btrfs_alloc_path();
 	if (!path)
 		goto out;
@@ -3549,6 +3595,11 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 {
 	int ret = 0;

+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) {
+		btrfs_warn(fs_info, "qgroup rescan init failed, running in simple mode");
+		return -EINVAL;
+	}
+
 	if (!init_flags) {
 		/* we're resuming qgroup rescan at mount time */
 		if (!(fs_info->qgroup_flags &
@@ -3579,7 +3630,7 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 		btrfs_warn(fs_info,
 			   "qgroup rescan init failed, qgroup is not enabled");
 		ret = -EINVAL;
-	} else if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
+	} else if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED) {
 		/* Quota disable is in progress */
 		ret = -EBUSY;
 	}
@@ -3838,7 +3889,7 @@ static int qgroup_reserve_data(struct btrfs_inode *inode,
 	u64 to_reserve;
 	int ret;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags) ||
+	if (btrfs_qgroup_mode(root->fs_info) == BTRFS_QGROUP_MODE_DISABLED ||
 	    !is_fstree(root->root_key.objectid) || len == 0)
 		return 0;

@@ -3970,7 +4021,7 @@ static int __btrfs_qgroup_release_data(struct btrfs_inode *inode,
 	int trace_op = QGROUP_RELEASE;
 	int ret;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &inode->root->fs_info->flags))
+	if (btrfs_qgroup_mode(inode->root->fs_info) == BTRFS_QGROUP_MODE_DISABLED)
 		return 0;

 	/* In release case, we shouldn't have @reserved */
@@ -4081,7 +4132,7 @@ int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	int ret;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED ||
 	    !is_fstree(root->root_key.objectid) || num_bytes == 0)
 		return 0;

@@ -4126,7 +4177,7 @@ void btrfs_qgroup_free_meta_all_pertrans(struct btrfs_root *root)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED ||
 	    !is_fstree(root->root_key.objectid))
 		return;

@@ -4142,7 +4193,7 @@ void __btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes,
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED ||
 	    !is_fstree(root->root_key.objectid))
 		return;

@@ -4201,7 +4252,7 @@ void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED ||
 	    !is_fstree(root->root_key.objectid))
 		return;
 	/* Same as btrfs_qgroup_free_meta_prealloc() */
@@ -4309,7 +4360,7 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
 	int level = btrfs_header_level(subvol_parent) - 1;
 	int ret = 0;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 0;

 	if (btrfs_node_ptr_generation(subvol_parent, subvol_slot) >
@@ -4419,7 +4470,7 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
 	int ret = 0;
 	int i;

-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 0;
 	if (!is_fstree(root->root_key.objectid) || !root->reloc_root)
 		return 0;
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index aed611774047..7fc64d665353 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -101,8 +101,15 @@
  * subtree rescan for them.
  */

-#define BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN		(1UL << 3)
-#define BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING		(1UL << 4)
+/*
+ * These flags share the flags field of the btrfs_qgroup_status_item
+ * with the persisted flags defined in btrfs_tree.h
+ *
+ * To minimize the chance of collision with new persisted status flags,
+ * these count backwards from the MSB.
+ */
+#define BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN		(1ULL << 63)
+#define BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING		(1ULL << 62)

 /*
  * Record a dirty extent, and info qgroup to update quota on it
@@ -276,13 +283,17 @@ enum {
 	ENUM_BIT(QGROUP_FREE),
 };

-int btrfs_quota_enable(struct btrfs_fs_info *fs_info);
 enum btrfs_qgroup_mode {
 	BTRFS_QGROUP_MODE_DISABLED,
 	BTRFS_QGROUP_MODE_FULL,
+	BTRFS_QGROUP_MODE_SIMPLE
 };

 enum btrfs_qgroup_mode btrfs_qgroup_mode(struct btrfs_fs_info *fs_info);
+bool btrfs_qgroup_enabled(struct btrfs_fs_info *fs_info);
+bool btrfs_qgroup_full_accounting(struct btrfs_fs_info *fs_info);
+int btrfs_quota_enable(struct btrfs_fs_info *fs_info,
+		       struct btrfs_ioctl_quota_ctl_args *quota_ctl_args);
 int btrfs_quota_disable(struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
 void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index db992f7a5d38..8b5942cc13cd 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -510,7 +510,7 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;

-	if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
+	if (btrfs_qgroup_enabled(fs_info)) {
 		/* One for parent inode, two for dir entries */
 		qgroup_num_bytes = 3 * fs_info->nodesize;
 		ret = btrfs_qgroup_reserve_meta_prealloc(root,
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3f9f933039c6..12876517c29e 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1620,11 +1620,11 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 	int ret;

 	/*
-	 * Save some performance in the case that qgroups are not
+	 * Save some performance in the case that full qgroups are not
 	 * enabled. If this check races with the ioctl, rescan will
 	 * kick in anyway.
 	 */
-	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
+	if (!btrfs_qgroup_full_accounting(fs_info))
 		return 0;

 	/*
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index dbb8b96da50d..0e42f4a2121d 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -333,6 +333,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_RAID1C34		(1ULL << 11)
 #define BTRFS_FEATURE_INCOMPAT_ZONED		(1ULL << 12)
 #define BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2	(1ULL << 13)
+#define BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA	(1ULL << 14)

 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;
@@ -753,6 +754,7 @@ struct btrfs_ioctl_get_dev_stats {
 #define BTRFS_QUOTA_CTL_ENABLE	1
 #define BTRFS_QUOTA_CTL_DISABLE	2
 #define BTRFS_QUOTA_CTL_RESCAN__NOTUSED	3
+#define BTRFS_QUOTA_CTL_ENABLE_SIMPLE_QUOTA 4
 struct btrfs_ioctl_quota_ctl_args {
 	__u64 cmd;
 	__u64 status;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index fc3c32186d7e..a4b44b13fff5 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -1204,9 +1204,17 @@ static inline __u16 btrfs_qgroup_level(__u64 qgroupid)
  */
 #define BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT	(1ULL << 2)

+/*
+ * Whether or not this filesystem is using simple quotas.
+ * Not exactly the incompat bit, because we support using simple quotas, + * disabling it, then going back to full qgroup quotas. + */ +#define BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE (1ULL << 3) + #define BTRFS_QGROUP_STATUS_FLAGS_MASK (BTRFS_QGROUP_STATUS_FLAG_ON | \ BTRFS_QGROUP_STATUS_FLAG_RESCAN | \ - BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) + BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT | \ + BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE) #define BTRFS_QGROUP_STATUS_VERSION 1 From patchwork Wed Sep 13 00:13:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13382334 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7A7A1EE3F3F for ; Wed, 13 Sep 2023 00:12:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235846AbjIMAM4 (ORCPT ); Tue, 12 Sep 2023 20:12:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59668 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231704AbjIMAMz (ORCPT ); Tue, 12 Sep 2023 20:12:55 -0400 Received: from wout4-smtp.messagingengine.com (wout4-smtp.messagingengine.com [64.147.123.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 405C810F2 for ; Tue, 12 Sep 2023 17:12:51 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id 8464F320094A; Tue, 12 Sep 2023 20:12:50 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Tue, 12 Sep 2023 20:12:50 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references 
:reply-to:sender:subject:subject:to:to; s=fm2; t=1694563970; x= 1694650370; bh=8ZKqJFpf2MCiacH4RTQm2DLEUGqAaL/59FLXKRhsXME=; b=B Z8ueHear6fPXrg9Q8j994+3xlqcyAyhPiT/lbGraL+fZh9fQ2QBfJRxLMKU4LOZ1 sv7exCyLhph6RlQFDFUppYuatW8PXB3nkJ6UA0a7PkwJfxkoi+/EyjqUk5f99qfN aj64gCzB1ODrDstEb1xPu5xytm6FQz6oc2EUk2PZIGL8LXnAFsaGPVh8lK1+XhZU 0cSF2RYIHrMT23ddZB6Dq5MWCI0yjfRa+/7YuHo+shDu785GKdSR95yZVWigMSRk NS1BhtYbYg7/qNFORAClD0CUoS0Sjbm6M6DLU5sL2//OUAg+ebkCdKJ6RzNAuIGE HMrrgjNTIbHzKU7Tg4Hcw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; t=1694563970; x=1694650370; bh=8 ZKqJFpf2MCiacH4RTQm2DLEUGqAaL/59FLXKRhsXME=; b=ugiHmt0dkQWK7KlrR CPypfalt7W2s2ouxPZHiluuAmjsld9ReLZQ2/cb0olMvdl3UBJ9JaCorv8B8kH63 L0wUGxXDWvzVrrVlUvfOZOJ5sRGeD+4Py1Q3IRHu/4OEuQz3Grye6qRzsZ1MLTU1 n7Ux1H9NjEhVha4kbMBcX6Ijybf0v0Ukh3VRfUYUJITGdSIMKoSWc+rg9kg2c4/s cGOfUpoDqciYbK+J0aBApxUf6+eXbhdNYvYCQME+iqzEdDGPIggt6okWtvAXzpcE IadRxcpTa/JSh7nxxssNOx6qJ4UQG2glW7wbP7wSedsMBE7iQYrpklrMvF0/McWY QoYng== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudeijedgfeefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 12 Sep 2023 20:12:49 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v6 03/18] 
btrfs: expose quota mode via sysfs Date: Tue, 12 Sep 2023 17:13:14 -0700 Message-ID: <4446e7c0594be81d56f70184b12504a553d824af.1694563454.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Add a new sysfs file /sys/fs/btrfs//qgroups/mode which prints out the mode qgroups is running in. The possible modes are qgroup, and squota. If quotas are not enabled, then the qgroups directory will not exist, so don't handle that mode. Signed-off-by: Boris Burkov --- fs/btrfs/sysfs.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index b1d1ac25237b..98d935bc1ee4 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -2086,6 +2086,33 @@ static ssize_t qgroup_enabled_show(struct kobject *qgroups_kobj, } BTRFS_ATTR(qgroups, enabled, qgroup_enabled_show); +static ssize_t qgroup_mode_show(struct kobject *qgroups_kobj, + struct kobj_attribute *a, + char *buf) +{ + struct btrfs_fs_info *fs_info = to_fs_info(qgroups_kobj->parent); + ssize_t ret = 0; + + spin_lock(&fs_info->qgroup_lock); + ASSERT(btrfs_qgroup_enabled(fs_info)); + switch (btrfs_qgroup_mode(fs_info)) { + case BTRFS_QGROUP_MODE_FULL: + ret = sysfs_emit(buf, "qgroup\n"); + break; + case BTRFS_QGROUP_MODE_SIMPLE: + ret = sysfs_emit(buf, "squota\n"); + break; + default: + btrfs_warn(fs_info, "unexpected qgroup mode %d\n", + btrfs_qgroup_mode(fs_info)); + break; + } + spin_unlock(&fs_info->qgroup_lock); + + return ret; +} +BTRFS_ATTR(qgroups, mode, qgroup_mode_show); + static ssize_t qgroup_inconsistent_show(struct kobject *qgroups_kobj, struct kobj_attribute *a, char *buf) @@ -2148,6 +2175,7 @@ static struct attribute *qgroups_attrs[] = { BTRFS_ATTR_PTR(qgroups, enabled), BTRFS_ATTR_PTR(qgroups, inconsistent), BTRFS_ATTR_PTR(qgroups, drop_subtree_threshold), + BTRFS_ATTR_PTR(qgroups, mode), NULL }; ATTRIBUTE_GROUPS(qgroups); From patchwork Wed Sep 
13 00:13:15 2023
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13382335
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 04/18] btrfs: add simple_quota incompat feature to sysfs
Date: Tue, 12 Sep 2023 17:13:15 -0700
Message-ID: <88ef63d3b102acd61920290d2ceca16f87cc3d99.1694563454.git.boris@bur.io>

Add an entry in the features directory for the new incompat flag.

Signed-off-by: Boris Burkov
---
 fs/btrfs/sysfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 98d935bc1ee4..6f1cf1c9aaec 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -291,6 +291,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
 BTRFS_FEAT_ATTR_COMPAT_RO(block_group_tree, BLOCK_GROUP_TREE);
 BTRFS_FEAT_ATTR_INCOMPAT(raid1c34, RAID1C34);
+BTRFS_FEAT_ATTR_INCOMPAT(simple_quota, SIMPLE_QUOTA);
 #ifdef CONFIG_BLK_DEV_ZONED
 BTRFS_FEAT_ATTR_INCOMPAT(zoned, ZONED);
 #endif
@@ -322,6 +323,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(free_space_tree),
 	BTRFS_FEAT_ATTR_PTR(raid1c34),
 	BTRFS_FEAT_ATTR_PTR(block_group_tree),
+	BTRFS_FEAT_ATTR_PTR(simple_quota),
 #ifdef CONFIG_BLK_DEV_ZONED
 	BTRFS_FEAT_ATTR_PTR(zoned),
 #endif

From patchwork Wed Sep 13 00:13:16 2023
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13382336
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 05/18] btrfs: flush reservations during quota disable
Date: Tue, 12 Sep 2023 17:13:16 -0700
Message-ID: <6de26f3c99ed89cb41500cc28e1d797a8819636b.1694563454.git.boris@bur.io>

The following sequence:
  enable simple quotas
  do some writes
  reserve space
  create ordered_extent
  release rsv (store rsv_bytes in OE, mark QGROUP_RESERVED bits)
  disable quotas
  enable simple quotas
  set qgroup rsv to 0 on all subvols
  ordered_extent finishes
  create delayed ref with rsv_bytes from before
  run delayed ref
  record_simple_quota_delta
  free rsv_bytes (0 -> -rsv_delta)

results in us reliably underflowing the subvolume's qgroup rsv counter,
because disabling/re-enabling quotas toggles reservation counters down
to 0, but does not remove other file system state which represents
successful acquisition of qgroup rsv space. Specifically, that state is
the metadata rsv counters on the root object and the rsv_bytes on
ordered_extent objects that have released their reservation, as well as
the corresponding QGROUP_RESERVED extent bits.

Normal qgroups gets away with this, I believe because it forces more
work to happen on transaction commit, but I am not certain it is totally
safe from the ordered_extent/leaked extent bit variant. Simple quotas
hits this reliably.

The intent of the fix is to make disable take the time to clear that
state external to qgroups as well: after flipping off the quota bit on
fs_info, flush delalloc and ordered extents, clearing the extent bits
along the way. This makes it so there are no ordered extents or meta
prealloc hanging around from the first enablement period during the
second.

Signed-off-by: Boris Burkov
Reviewed-by: Josef Bacik
---
 fs/btrfs/qgroup.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9348529270bf..84030f81a1d8 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1296,6 +1296,40 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
+/*
+ * It is possible to have outstanding ordered extents
+ * which reserved bytes before we disabled. We need to fully flush
+ * delalloc, ordered extents, and a commit to ensure that
+ * we don't leak such reservations, only to have them come back
+ * if we re-enable.
+ *
+ * i.e.:
+ * enable simple quotas
+ * reserve space
+ * release it, store rsv_bytes in OE
+ * disable quotas
+ * enable simple quotas (qgroup rsv are all 0)
+ * OE finishes
+ * run delayed refs
+ * free rsv_bytes, resulting in miscounting or even underflow
+ */
+static int flush_reservations(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_trans_handle *trans;
+	int ret;
+
+	ret = btrfs_start_delalloc_roots(fs_info, LONG_MAX, false);
+	if (ret)
+		return ret;
+	btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1);
+	trans = btrfs_join_transaction(fs_info->tree_root);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+	btrfs_commit_transaction(trans);
+
+	return ret;
+}
+
 int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *quota_root;
@@ -1340,6 +1374,10 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
 	clear_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
 	btrfs_qgroup_wait_for_completion(fs_info, false);
 
+	ret = flush_reservations(fs_info);
+	if (ret)
+		goto out;
+
 	/*
 	 * 1 For the root item
 	 *
@@ -1401,7 +1439,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
 	if (ret && trans)
 		btrfs_end_transaction(trans);
 	else if (trans)
-		ret = btrfs_end_transaction(trans);
+		ret = btrfs_commit_transaction(trans);
 	mutex_unlock(&fs_info->cleaner_mutex);
 
 	return ret;
@@ -4021,8 +4059,11 @@ static int __btrfs_qgroup_release_data(struct btrfs_inode *inode,
 	int trace_op = QGROUP_RELEASE;
 	int ret;
 
-	if (btrfs_qgroup_mode(inode->root->fs_info) == BTRFS_QGROUP_MODE_DISABLED)
-		return 0;
+	if (btrfs_qgroup_mode(inode->root->fs_info) == BTRFS_QGROUP_MODE_DISABLED) {
+		extent_changeset_init(&changeset);
+		return clear_record_extent_bits(&inode->io_tree, start, start + len - 1,
+						EXTENT_QGROUP_RESERVED, &changeset);
+	}
 
 	/* In release case, we shouldn't have @reserved */
 	WARN_ON(!free && reserved);

From patchwork Wed Sep 13 00:13:17 2023
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13382337
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 06/18] btrfs: create qgroup earlier in snapshot creation
Date: Tue, 12 Sep 2023 17:13:17 -0700
Message-ID: <6355e2bf2ef93b5c998659f59e2b576ece7b7e1b.1694563454.git.boris@bur.io>

Pull qgroup creation earlier in snapshot creation. This allows simple
quotas qgroups to see all the metadata writes related to the snapshot
being created and to be born with the root node accounted.

Signed-off-by: Boris Burkov
---
 fs/btrfs/qgroup.c      | 3 +++
 fs/btrfs/transaction.c | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 84030f81a1d8..ae6ccf1eb0e4 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1700,6 +1700,9 @@ int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
 	struct btrfs_qgroup *prealloc = NULL;
 	int ret = 0;
 
+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED)
+		return 0;
+
 	mutex_lock(&fs_info->qgroup_ioctl_lock);
 	if (!fs_info->quota_root) {
 		ret = -ENOTCONN;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 12876517c29e..eb649860ecfb 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1813,6 +1813,12 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	}
 	btrfs_release_path(path);
 
+	ret = btrfs_create_qgroup(trans, objectid);
+	if (ret) {
+		btrfs_abort_transaction(trans, ret);
+		goto fail;
+	}
+
 	/*
 	 * pull in the delayed directory update
 	 * and the delayed inode item

From patchwork Wed Sep 13 00:13:18 2023
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13382338
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 07/18] btrfs: function for recording simple quota deltas
Date: Tue, 12 Sep 2023 17:13:18 -0700
Message-ID: <98756e1ee7ffe6acf2743e88e9c53fd7836a8f6d.1694563454.git.boris@bur.io>

Rather than re-computing shared/exclusive ownership based on backrefs
and walking roots for implicit backrefs, simple quotas does an increment
when creating an extent and a decrement when deleting it. Add the API
for the extent item code to use to track those events.
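
For illustration only, the increment/decrement model described above can
be sketched in userspace C. This is not the kernel code: toy_qgroup,
toy_delta and toy_record_delta are hypothetical names, the sketch assumes
a single parent per group (the kernel walks a list of parent groups), and
the underflow assert is an addition of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a qgroup, not the kernel struct. */
struct toy_qgroup {
	uint64_t rfer;			/* referenced bytes */
	uint64_t excl;			/* exclusive bytes */
	struct toy_qgroup *parent;	/* assumption: one parent only */
};

/* Hypothetical delta: one extent being created or deleted. */
struct toy_delta {
	uint64_t num_bytes;
	bool is_inc;
};

/*
 * Apply the delta to the owning group and to every ancestor.
 * No backref walking: each event is a plain add or subtract.
 */
static void toy_record_delta(struct toy_qgroup *qg, const struct toy_delta *d)
{
	for (; qg; qg = qg->parent) {
		if (d->is_inc) {
			qg->rfer += d->num_bytes;
			qg->excl += d->num_bytes;
		} else {
			/* Sketch-only invariant: never free more than counted. */
			assert(qg->rfer >= d->num_bytes);
			assert(qg->excl >= d->num_bytes);
			qg->rfer -= d->num_bytes;
			qg->excl -= d->num_bytes;
		}
	}
}
```

In this model rfer and excl always move together, which mirrors how the
patch updates both counters by the same signed amount for each group.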

Signed-off-by: Boris Burkov
---
 fs/btrfs/qgroup.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h | 15 +++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index ae6ccf1eb0e4..86d3a46ee33f 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -4597,3 +4597,48 @@ void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans)
 	}
 	*root = RB_ROOT;
 }
+
+int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
+			      struct btrfs_squota_delta *delta)
+{
+	int ret;
+	struct btrfs_qgroup *qgroup;
+	struct btrfs_qgroup *qg;
+	LIST_HEAD(qgroup_list);
+	u64 root = delta->root;
+	u64 num_bytes = delta->num_bytes;
+	int sign = (delta->is_inc ? 1 : -1);
+
+	if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_SIMPLE)
+		return 0;
+
+	if (!is_fstree(root))
+		return 0;
+
+	spin_lock(&fs_info->qgroup_lock);
+	qgroup = find_qgroup_rb(fs_info, root);
+	if (!qgroup) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	ret = 0;
+	qgroup_iterator_add(&qgroup_list, qgroup);
+	list_for_each_entry(qg, &qgroup_list, iterator) {
+		struct btrfs_qgroup_list *glist;
+
+		qg->excl += num_bytes * sign;
+		qg->rfer += num_bytes * sign;
+		qgroup_dirty(fs_info, qg);
+
+		list_for_each_entry(glist, &qg->groups, next_group)
+			qgroup_iterator_add(&qgroup_list, glist->group);
+	}
+	qgroup_iterator_clean(&qgroup_list);
+
+out:
+	spin_unlock(&fs_info->qgroup_lock);
+	if (!ret && delta->rsv_bytes)
+		btrfs_qgroup_free_refroot(fs_info, root, delta->rsv_bytes, BTRFS_QGROUP_RSV_DATA);
+	return ret;
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 7fc64d665353..5eabd944c782 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -269,6 +269,19 @@ struct btrfs_qgroup {
 	struct kobject kobj;
 };
 
+struct btrfs_squota_delta {
+	/* The fstree root this delta counts against */
+	u64 root;
+	/* The number of bytes in the extent being counted */
+	u64 num_bytes;
+	/* The number of bytes reserved for this extent */
+	u64 rsv_bytes;
+	/* Whether we are using or freeing the extent */
+	bool is_inc;
+	/* Whether the extent is data or metadata */
+	bool is_data;
+};
+
 static inline u64 btrfs_qgroup_subvolid(u64 qgroupid)
 {
 	return (qgroupid & ((1ULL << BTRFS_QGROUP_LEVEL_SHIFT) - 1));
@@ -407,5 +420,7 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
 		struct btrfs_root *root, struct extent_buffer *eb);
 void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans);
 bool btrfs_check_quota_leak(struct btrfs_fs_info *fs_info);
+int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info,
+			      struct btrfs_squota_delta *delta);
 
 #endif

From patchwork Wed Sep 13 00:13:19 2023
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13382339
2023 20:13:02 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v6 08/18] btrfs: rename tree_ref and data_ref owning_root Date: Tue, 12 Sep 2023 17:13:19 -0700 Message-ID: <41364ddd9f0def9ddd73f7a560e1a9ccb33939d7.1694563454.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org commit 113479d5b8eb ("btrfs: rename root fields in delayed refs structs") changed these from ref_root to owning_root. However, there are many circumstances where that name is not really accurate and the root on the ref struct _is_ the referring root. In general, these are not the owning root, though it does happen in some ref merging cases involving overwrites during snapshots and similar. Simple quotas cares quite a bit about tracking the original owner of an extent through delayed refs, so rename these back to free up the name for the real owning root (which will live on the generic btrfs_ref and the head ref) Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/delayed-ref.c | 10 +++++----- fs/btrfs/delayed-ref.h | 12 ++++++------ fs/btrfs/extent-tree.c | 10 +++++----- fs/btrfs/ref-verify.c | 4 ++-- 4 files changed, 18 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 0a80224c8784..9d6a5dafd9b8 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -946,7 +946,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans, u64 parent = generic_ref->parent; u8 ref_type; - is_system = (generic_ref->tree_ref.owning_root == BTRFS_CHUNK_TREE_OBJECTID); + is_system = (generic_ref->tree_ref.ref_root == BTRFS_CHUNK_TREE_OBJECTID); ASSERT(generic_ref->type == BTRFS_REF_METADATA && generic_ref->action); ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS); @@ -974,14 +974,14 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans, ref_type = 
BTRFS_TREE_BLOCK_REF_KEY; init_delayed_ref_common(fs_info, &ref->node, bytenr, num_bytes, - generic_ref->tree_ref.owning_root, action, + generic_ref->tree_ref.ref_root, action, ref_type); - ref->root = generic_ref->tree_ref.owning_root; + ref->root = generic_ref->tree_ref.ref_root; ref->parent = parent; ref->level = level; init_delayed_ref_head(head_ref, record, bytenr, num_bytes, - generic_ref->tree_ref.owning_root, 0, action, + generic_ref->tree_ref.ref_root, 0, action, false, is_system); head_ref->extent_op = extent_op; @@ -1034,7 +1034,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans, u64 bytenr = generic_ref->bytenr; u64 num_bytes = generic_ref->len; u64 parent = generic_ref->parent; - u64 ref_root = generic_ref->data_ref.owning_root; + u64 ref_root = generic_ref->data_ref.ref_root; u64 owner = generic_ref->data_ref.ino; u64 offset = generic_ref->data_ref.offset; u8 ref_type; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index 783f84c9f2f4..1b4d16864d97 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -194,8 +194,8 @@ enum btrfs_ref_type { struct btrfs_data_ref { /* For EXTENT_DATA_REF */ - /* Original root this data extent belongs to */ - u64 owning_root; + /* Root which owns this data reference */ + u64 ref_root; /* Inode which refers to this data extent */ u64 ino; @@ -218,11 +218,11 @@ struct btrfs_tree_ref { int level; /* - * Root which owns this tree block. + * Root which owns this tree block reference. 
* * For TREE_BLOCK_REF (skinny metadata, either inline or keyed) */ - u64 owning_root; + u64 ref_root; /* For non-skinny metadata, no special member needed */ }; @@ -311,7 +311,7 @@ static inline void btrfs_init_tree_ref(struct btrfs_ref *generic_ref, generic_ref->real_root = mod_root ?: root; #endif generic_ref->tree_ref.level = level; - generic_ref->tree_ref.owning_root = root; + generic_ref->tree_ref.ref_root = root; generic_ref->type = BTRFS_REF_METADATA; if (skip_qgroup || !(is_fstree(root) && (!mod_root || is_fstree(mod_root)))) @@ -329,7 +329,7 @@ static inline void btrfs_init_data_ref(struct btrfs_ref *generic_ref, /* If @real_root not set, use @root as fallback */ generic_ref->real_root = mod_root ?: ref_root; #endif - generic_ref->data_ref.owning_root = ref_root; + generic_ref->data_ref.ref_root = ref_root; generic_ref->data_ref.ino = ino; generic_ref->data_ref.offset = offset; generic_ref->type = BTRFS_REF_DATA; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 4135e6ec3d7c..ab13d2cd0ed5 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1419,7 +1419,7 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, ASSERT(generic_ref->type != BTRFS_REF_NOT_SET && generic_ref->action); BUG_ON(generic_ref->type == BTRFS_REF_METADATA && - generic_ref->tree_ref.owning_root == BTRFS_TREE_LOG_OBJECTID); + generic_ref->tree_ref.ref_root == BTRFS_TREE_LOG_OBJECTID); if (generic_ref->type == BTRFS_REF_METADATA) ret = btrfs_add_delayed_tree_ref(trans, generic_ref, NULL); @@ -3368,9 +3368,9 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_ref *ref) * tree, just update pinning info and exit early. 
 */ if ((ref->type == BTRFS_REF_METADATA && - ref->tree_ref.owning_root == BTRFS_TREE_LOG_OBJECTID) || + ref->tree_ref.ref_root == BTRFS_TREE_LOG_OBJECTID) || (ref->type == BTRFS_REF_DATA && - ref->data_ref.owning_root == BTRFS_TREE_LOG_OBJECTID)) { + ref->data_ref.ref_root == BTRFS_TREE_LOG_OBJECTID)) { /* unlocks the pinned mutex */ btrfs_pin_extent(trans, ref->bytenr, ref->len, 1); ret = 0; @@ -3381,9 +3381,9 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_ref *ref) } if (!((ref->type == BTRFS_REF_METADATA && - ref->tree_ref.owning_root == BTRFS_TREE_LOG_OBJECTID) || + ref->tree_ref.ref_root == BTRFS_TREE_LOG_OBJECTID) || (ref->type == BTRFS_REF_DATA && - ref->data_ref.owning_root == BTRFS_TREE_LOG_OBJECTID))) + ref->data_ref.ref_root == BTRFS_TREE_LOG_OBJECTID))) btrfs_ref_tree_mod(fs_info, ref); return ret; diff --git a/fs/btrfs/ref-verify.c b/fs/btrfs/ref-verify.c index 26a7fb655f71..e9e1ebd8dd6a 100644 --- a/fs/btrfs/ref-verify.c +++ b/fs/btrfs/ref-verify.c @@ -681,10 +681,10 @@ int btrfs_ref_tree_mod(struct btrfs_fs_info *fs_info, if (generic_ref->type == BTRFS_REF_METADATA) { if (!parent) - ref_root = generic_ref->tree_ref.owning_root; + ref_root = generic_ref->tree_ref.ref_root; owner = generic_ref->tree_ref.level; } else if (!parent) { - ref_root = generic_ref->data_ref.owning_root; + ref_root = generic_ref->data_ref.ref_root; owner = generic_ref->data_ref.ino; offset = generic_ref->data_ref.offset; }
From patchwork Wed Sep 13 00:13:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13382340
From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v6 09/18] btrfs: track owning root in btrfs_ref Date: Tue, 12 Sep 2023 17:13:20 -0700 Message-ID: <054dd41cf8c0f9ac01c4ed0b1a8faef34c0d3691.1694563454.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org
While data extents require us to store additional inline refs to track the original owner on free, this information is available implicitly for metadata. It is found in the owner field of the header of the tree block. Even if other trees refer to this block and the original ref goes away, we will not rewrite that header field, so it will reliably give the original owner. In addition, there is a relocation case where a new data extent needs to have an owning root separate from the referring root wired through delayed refs. To use it for recording simple quota deltas, we need to wire this root id through from when we create the delayed ref until we fully process it. Store it in the generic btrfs_ref struct of the delayed ref.
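As an illustration only, the wiring described above can be pictured with a small userspace model. It assumes a cut-down descriptor: struct sketch_ref and the sketch_init_*() helpers are hypothetical stand-ins, not the kernel's struct btrfs_ref or its init helpers, and the root ids used with them are made up. The point it shows is that the owning root is recorded once, when the generic ref is initialized, and is kept separate from the root that holds the reference.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the delayed ref descriptor. The field
 * names follow the patch, but this is not the kernel's struct btrfs_ref.
 */
struct sketch_ref {
	int action;
	uint64_t bytenr;
	uint64_t len;
	uint64_t parent;
	uint64_t owning_root;	/* root that first allocated the extent, 0 if unknown */
	uint64_t ref_root;	/* root that holds this particular reference */
};

/* Mirrors the shape of the new init call: owning_root is supplied up front. */
void sketch_init_generic_ref(struct sketch_ref *ref, int action,
			     uint64_t bytenr, uint64_t len,
			     uint64_t parent, uint64_t owning_root)
{
	ref->action = action;
	ref->bytenr = bytenr;
	ref->len = len;
	ref->parent = parent;
	ref->owning_root = owning_root;
}

/* Stand-in for the data ref init step: records only the referring root. */
void sketch_init_data_ref(struct sketch_ref *ref, uint64_t ref_root)
{
	ref->ref_root = ref_root;
}
```

In the common case a caller would pass the same id for both roots. In a relocation-like case the two differ, and passing 0 as owning_root stands for "not known at this call site", as the comments added by the diff describe.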
Signed-off-by: Boris Burkov --- fs/btrfs/delayed-ref.h | 7 +++++-- fs/btrfs/extent-tree.c | 19 +++++++++++-------- fs/btrfs/file.c | 10 +++++----- fs/btrfs/inode-item.c | 2 +- fs/btrfs/relocation.c | 17 ++++++++++------- fs/btrfs/tree-log.c | 3 ++- 6 files changed, 34 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index 1b4d16864d97..2c384862110e 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -245,6 +245,7 @@ struct btrfs_ref { #endif u64 bytenr; u64 len; + u64 owning_root; /* Bytenr of the parent tree block */ u64 parent; @@ -295,16 +296,18 @@ static inline u64 btrfs_calc_delayed_ref_csum_bytes(const struct btrfs_fs_info * } static inline void btrfs_init_generic_ref(struct btrfs_ref *generic_ref, - int action, u64 bytenr, u64 len, u64 parent) + int action, u64 bytenr, u64 len, u64 parent, u64 owning_root) { generic_ref->action = action; generic_ref->bytenr = bytenr; generic_ref->len = len; generic_ref->parent = parent; + generic_ref->owning_root = owning_root; } static inline void btrfs_init_tree_ref(struct btrfs_ref *generic_ref, - int level, u64 root, u64 mod_root, bool skip_qgroup) + int level, u64 root, u64 mod_root, + bool skip_qgroup) { #ifdef CONFIG_BTRFS_FS_REF_VERIFY /* If @real_root not set, use @root as fallback */ diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ab13d2cd0ed5..336957737c6c 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2447,7 +2447,7 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans, num_bytes = btrfs_file_extent_disk_num_bytes(buf, fi); key.offset -= btrfs_file_extent_offset(buf, fi); btrfs_init_generic_ref(&generic_ref, action, bytenr, - num_bytes, parent); + num_bytes, parent, ref_root); btrfs_init_data_ref(&generic_ref, ref_root, key.objectid, key.offset, root->root_key.objectid, for_reloc); @@ -2460,8 +2460,9 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans, } else { bytenr = btrfs_node_blockptr(buf, 
i); num_bytes = fs_info->nodesize; + /* We don't know the owning_root, use 0 */ btrfs_init_generic_ref(&generic_ref, action, bytenr, - num_bytes, parent); + num_bytes, parent, 0); btrfs_init_tree_ref(&generic_ref, level - 1, ref_root, root->root_key.objectid, for_reloc); if (inc) @@ -3281,7 +3282,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, int ret; btrfs_init_generic_ref(&generic_ref, BTRFS_DROP_DELAYED_REF, - buf->start, buf->len, parent); + buf->start, buf->len, parent, btrfs_header_owner(buf)); btrfs_init_tree_ref(&generic_ref, btrfs_header_level(buf), root_id, 0, false); @@ -4744,12 +4745,14 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans, struct btrfs_key *ins) { struct btrfs_ref generic_ref = { 0 }; + u64 root_objectid = root->root_key.objectid; + u64 owning_root = root_objectid; - BUG_ON(root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID); + BUG_ON(root_objectid == BTRFS_TREE_LOG_OBJECTID); btrfs_init_generic_ref(&generic_ref, BTRFS_ADD_DELAYED_EXTENT, - ins->objectid, ins->offset, 0); - btrfs_init_data_ref(&generic_ref, root->root_key.objectid, owner, + ins->objectid, ins->offset, 0, owning_root); + btrfs_init_data_ref(&generic_ref, root_objectid, owner, offset, 0, false); btrfs_ref_tree_mod(root->fs_info, &generic_ref); @@ -4975,7 +4978,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, extent_op->level = level; btrfs_init_generic_ref(&generic_ref, BTRFS_ADD_DELAYED_EXTENT, - ins.objectid, ins.offset, parent); + ins.objectid, ins.offset, parent, btrfs_header_owner(buf)); btrfs_init_tree_ref(&generic_ref, level, root_objectid, root->root_key.objectid, false); btrfs_ref_tree_mod(fs_info, &generic_ref); @@ -5396,7 +5399,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans, find_next_key(path, level, &wc->drop_progress); btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, bytenr, - fs_info->nodesize, parent); + fs_info->nodesize, parent, 
btrfs_header_owner(next)); btrfs_init_tree_ref(&ref, level - 1, root->root_key.objectid, 0, false); ret = btrfs_free_extent(trans, &ref); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index fd2f8fec115f..59c2478d644b 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -374,7 +374,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, if (update_refs && disk_bytenr > 0) { btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, - disk_bytenr, num_bytes, 0); + disk_bytenr, num_bytes, 0, root->root_key.objectid); btrfs_init_data_ref(&ref, root->root_key.objectid, new_key.objectid, @@ -464,7 +464,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, } else if (update_refs && disk_bytenr > 0) { btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, - disk_bytenr, num_bytes, 0); + disk_bytenr, num_bytes, 0, root->root_key.objectid); btrfs_init_data_ref(&ref, root->root_key.objectid, key.objectid, @@ -746,7 +746,7 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans, btrfs_mark_buffer_dirty(leaf); btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, bytenr, - num_bytes, 0); + num_bytes, 0, root->root_key.objectid); btrfs_init_data_ref(&ref, root->root_key.objectid, ino, orig_offset, 0, false); ret = btrfs_inc_extent_ref(trans, &ref); @@ -772,7 +772,7 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans, other_start = end; other_end = 0; btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, bytenr, - num_bytes, 0); + num_bytes, 0, root->root_key.objectid); btrfs_init_data_ref(&ref, root->root_key.objectid, ino, orig_offset, 0, false); if (extent_mergeable(leaf, path->slots[0] + 1, @@ -2288,7 +2288,7 @@ static int btrfs_insert_replace_extent(struct btrfs_trans_handle *trans, btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, extent_info->disk_offset, - extent_info->disk_len, 0); + extent_info->disk_len, 0, root->root_key.objectid); ref_offset = extent_info->file_offset - extent_info->data_offset; btrfs_init_data_ref(&ref, 
root->root_key.objectid, btrfs_ino(inode), ref_offset, 0, false); diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c index c19c0f10f0e2..d4af1dd4f5ba 100644 --- a/fs/btrfs/inode-item.c +++ b/fs/btrfs/inode-item.c @@ -676,7 +676,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, bytes_deleted += extent_num_bytes; btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, - extent_start, extent_num_bytes, 0); + extent_start, extent_num_bytes, 0, root->root_key.objectid); btrfs_init_data_ref(&ref, btrfs_header_owner(leaf), control->ino, extent_offset, root->root_key.objectid, false); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index ad67a88f2bbf..bd07d5322c61 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1158,7 +1158,7 @@ int replace_file_extents(struct btrfs_trans_handle *trans, key.offset -= btrfs_file_extent_offset(leaf, fi); btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, new_bytenr, - num_bytes, parent); + num_bytes, parent, root->root_key.objectid); btrfs_init_data_ref(&ref, btrfs_header_owner(leaf), key.objectid, key.offset, root->root_key.objectid, false); @@ -1169,7 +1169,7 @@ int replace_file_extents(struct btrfs_trans_handle *trans, } btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, bytenr, - num_bytes, parent); + num_bytes, parent, root->root_key.objectid); btrfs_init_data_ref(&ref, btrfs_header_owner(leaf), key.objectid, key.offset, root->root_key.objectid, false); @@ -1382,7 +1382,8 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, btrfs_mark_buffer_dirty(path->nodes[level]); btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, old_bytenr, - blocksize, path->nodes[level]->start); + blocksize, path->nodes[level]->start, + src->root_key.objectid); btrfs_init_tree_ref(&ref, level - 1, src->root_key.objectid, 0, true); ret = btrfs_inc_extent_ref(trans, &ref); @@ -1391,7 +1392,7 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, break; } 
btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, new_bytenr, - blocksize, 0); + blocksize, 0, dest->root_key.objectid); btrfs_init_tree_ref(&ref, level - 1, dest->root_key.objectid, 0, true); ret = btrfs_inc_extent_ref(trans, &ref); @@ -1400,8 +1401,9 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, break; } + /* We don't know the real owning_root, use 0 */ btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, new_bytenr, - blocksize, path->nodes[level]->start); + blocksize, path->nodes[level]->start, 0); btrfs_init_tree_ref(&ref, level - 1, src->root_key.objectid, 0, true); ret = btrfs_free_extent(trans, &ref); @@ -1410,8 +1412,9 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, break; } + /* We don't know the real owning_root, use 0 */ btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, old_bytenr, - blocksize, 0); + blocksize, 0, 0); btrfs_init_tree_ref(&ref, level - 1, dest->root_key.objectid, 0, true); ret = btrfs_free_extent(trans, &ref); @@ -2520,7 +2523,7 @@ static int do_relocation(struct btrfs_trans_handle *trans, btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, node->eb->start, blocksize, - upper->eb->start); + upper->eb->start, btrfs_header_owner(upper->eb)); btrfs_init_tree_ref(&ref, node->level, btrfs_header_owner(upper->eb), root->root_key.objectid, false); diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 15c8cb6627fe..07e7fdaf8eb7 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -766,7 +766,8 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans, } else if (ret == 0) { btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, - ins.objectid, ins.offset, 0); + ins.objectid, ins.offset, 0, + root->root_key.objectid); btrfs_init_data_ref(&ref, root->root_key.objectid, key->objectid, offset, 0, false); From patchwork Wed Sep 13 00:13:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13382341
From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v6 10/18] btrfs: track original extent owner in head_ref Date: Tue, 12 Sep 2023 17:13:21 -0700 Message-ID: X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org
Simple quotas requires tracking the original creating root of any given extent. This gets complicated when multiple subvolumes create overlapping/contradictory refs in the same transaction. For example, due to modifying or deleting an extent while also snapshotting it.
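To make the scenario concrete, the following userspace sketch models how a head ref could settle on an owner when several updates for the same extent are merged. It follows the update_existing_head_ref() hunk in this patch, but struct sketch_head and sketch_merge_head() are hypothetical names, 0 is assumed to mean "owner not known yet", and locking and every other head field are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down head ref: only the two fields this patch relates. */
struct sketch_head {
	uint64_t owning_root;		/* 0 means the owner is not known yet */
	bool must_insert_reserved;	/* this update carries the allocation */
};

/*
 * Merge an update into an existing head:
 *  - if no owner has been recorded yet, adopt whatever the update knows;
 *  - if the update is the one that allocates the extent, it sets
 *    must_insert_reserved and its owner replaces any earlier value.
 */
void sketch_merge_head(struct sketch_head *existing,
		       const struct sketch_head *update)
{
	if (!existing->owning_root)
		existing->owning_root = update->owning_root;

	if (update->must_insert_reserved) {
		existing->must_insert_reserved = true;
		existing->owning_root = update->owning_root;
	}
}
```

Under this rule an update that merely adds or drops a ref never displaces an owner that is already recorded; only the allocating update does.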
To resolve this in a general way, take advantage of the fact that we are essentially already tracking this for handling releasing reservations. The head ref coalesces the various refs and uses must_insert_reserved to check if it needs to create an extent/free reservation. Store the ref that set must_insert_reserved as the owning ref on the head ref. Note that this can result in writing an extent for the very first time with an owner different from its only ref, but it will look the same as if you first created it with the original owning ref, then added the other ref, then removed the owning ref. Signed-off-by: Boris Burkov --- fs/btrfs/delayed-ref.c | 20 ++++++++++++++++---- fs/btrfs/delayed-ref.h | 7 +++++++ 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 9d6a5dafd9b8..b8ae48e8a63b 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -678,6 +678,16 @@ static noinline void update_existing_head_ref(struct btrfs_trans_handle *trans, BUG_ON(existing->is_data != update->is_data); spin_lock(&existing->lock); + + /* + * When freeing an extent, we may not know the owning root + * when we first create the head_ref. However, some deref before the + * last deref will know it, so we just need to update the head_ref + * accordingly. 
+ */ + if (!existing->owning_root) + existing->owning_root = update->owning_root; + if (update->must_insert_reserved) { /* if the extent was freed and then * reallocated before the delayed ref @@ -687,6 +697,7 @@ static noinline void update_existing_head_ref(struct btrfs_trans_handle *trans, * Set it again here */ existing->must_insert_reserved = update->must_insert_reserved; + existing->owning_root = update->owning_root; /* * update the num_bytes so we make sure the accounting @@ -751,7 +762,7 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref, struct btrfs_qgroup_extent_record *qrecord, u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved, int action, bool is_data, - bool is_system) + bool is_system, u64 owning_root) { int count_mod = 1; bool must_insert_reserved = false; @@ -792,6 +803,7 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref, head_ref->num_bytes = num_bytes; head_ref->ref_mod = count_mod; head_ref->must_insert_reserved = must_insert_reserved; + head_ref->owning_root = owning_root; head_ref->is_data = is_data; head_ref->is_system = is_system; head_ref->ref_tree = RB_ROOT_CACHED; @@ -982,7 +994,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans, init_delayed_ref_head(head_ref, record, bytenr, num_bytes, generic_ref->tree_ref.ref_root, 0, action, - false, is_system); + false, is_system, generic_ref->owning_root); head_ref->extent_op = extent_op; delayed_refs = &trans->transaction->delayed_refs; @@ -1073,7 +1085,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans, } init_delayed_ref_head(head_ref, record, bytenr, num_bytes, ref_root, - reserved, action, true, false); + reserved, action, true, false, generic_ref->owning_root); head_ref->extent_op = NULL; delayed_refs = &trans->transaction->delayed_refs; @@ -1119,7 +1131,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_trans_handle *trans, return -ENOMEM; init_delayed_ref_head(head_ref, NULL, bytenr, num_bytes, 0, 0, 
- BTRFS_UPDATE_DELAYED_HEAD, false, false); + BTRFS_UPDATE_DELAYED_HEAD, false, false, 0); head_ref->extent_op = extent_op; delayed_refs = &trans->transaction->delayed_refs; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index 2c384862110e..faa2e000dadc 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -110,6 +110,12 @@ struct btrfs_delayed_ref_head { */ int ref_mod; + /* + * The root that triggered the allocation when must_insert_reserved is + * set to true. + */ + u64 owning_root; + /* * when a new extent is allocated, it is just reserved in memory * The actual extent isn't inserted into the extent allocation tree @@ -123,6 +129,7 @@ struct btrfs_delayed_ref_head { * the free has happened. */ bool must_insert_reserved; + bool is_data; bool is_system; bool processing;
From patchwork Wed Sep 13 00:13:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13382342
From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v6 11/18] btrfs: new inline ref storing owning subvol of data extents Date: Tue, 12 Sep 2023 17:13:22 -0700 Message-ID: <8256c4f8cfd25f7179e988ea3b35135739319d39.1694563454.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

In order to implement simple quota groups, we need to be able to associate a data extent with the subvolume that created it. Once you account for reflink, this information cannot be recovered without explicitly storing it. Options for storing it are:

- a new key/item
- a new extent inline ref item

The former is backwards compatible but wastes space. The latter is incompat, but it is space-efficient and reuses the existing inline ref machinery while abusing it only slightly: the new item is not a ref per se.
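A rough userspace sketch of the second option, for illustration: the owner is stored in an inline-ref-sized slot (a 1-byte type followed by 8 bytes), with the root id taking the place of the usual offset field and written little-endian as btrfs does on disk. The names sketch_put_owner_ref() and sketch_skip_owner_ref() and the SKETCH_OWNER_REF_TYPE value are placeholders invented for this sketch, not the kernel's accessors or key value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OWNER_REF_TYPE	1	/* placeholder tag, not the kernel's key value */
#define SKETCH_INLINE_REF_SIZE	9	/* 1-byte type + 8-byte payload, packed */

/*
 * Write an owner ref at buf: the type byte, then the root id in
 * little-endian order. Returns the number of bytes consumed so the
 * caller knows where the first real inline ref starts.
 */
size_t sketch_put_owner_ref(uint8_t *buf, uint64_t root_id)
{
	buf[0] = SKETCH_OWNER_REF_TYPE;
	for (int i = 0; i < 8; i++)
		buf[1 + i] = (uint8_t)(root_id >> (8 * i));
	return SKETCH_INLINE_REF_SIZE;
}

/*
 * If buf starts with an owner ref, decode the root id and return the
 * number of bytes to skip; otherwise return 0 and leave *root_id alone.
 */
size_t sketch_skip_owner_ref(const uint8_t *buf, uint64_t *root_id)
{
	uint64_t v = 0;

	if (buf[0] != SKETCH_OWNER_REF_TYPE)
		return 0;
	for (int i = 0; i < 8; i++)
		v |= (uint64_t)buf[1 + i] << (8 * i);
	*root_id = v;
	return SKETCH_INLINE_REF_SIZE;
}
```

A reader that walks inline refs would call the skip helper first and continue with the ordinary ref types from the returned offset, which is roughly what the lookup and check_committed_ref() hunks in the diff do with the real accessors.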
Signed-off-by: Boris Burkov --- fs/btrfs/accessors.h | 4 +++ fs/btrfs/backref.c | 3 ++ fs/btrfs/extent-tree.c | 57 ++++++++++++++++++++++++++------- fs/btrfs/print-tree.c | 12 +++++++ fs/btrfs/ref-verify.c | 3 ++ fs/btrfs/tree-checker.c | 3 ++ include/uapi/linux/btrfs_tree.h | 12 +++++++ 7 files changed, 83 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h index f958eccff477..ad8aa1ae5c0c 100644 --- a/fs/btrfs/accessors.h +++ b/fs/btrfs/accessors.h @@ -350,6 +350,8 @@ BTRFS_SETGET_FUNCS(extent_data_ref_count, struct btrfs_extent_data_ref, count, 3 BTRFS_SETGET_FUNCS(shared_data_ref_count, struct btrfs_shared_data_ref, count, 32); +BTRFS_SETGET_FUNCS(extent_owner_ref_root_id, struct btrfs_extent_owner_ref, root_id, 64); + BTRFS_SETGET_FUNCS(extent_inline_ref_type, struct btrfs_extent_inline_ref, type, 8); BTRFS_SETGET_FUNCS(extent_inline_ref_offset, struct btrfs_extent_inline_ref, @@ -366,6 +368,8 @@ static inline u32 btrfs_extent_inline_ref_size(int type) if (type == BTRFS_EXTENT_DATA_REF_KEY) return sizeof(struct btrfs_extent_data_ref) + offsetof(struct btrfs_extent_inline_ref, offset); + if (type == BTRFS_EXTENT_OWNER_REF_KEY) + return sizeof(struct btrfs_extent_inline_ref); return 0; } diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index b7d54efb4728..0cde873bdee2 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1129,6 +1129,9 @@ static int add_inline_refs(struct btrfs_backref_walk_ctx *ctx, count, sc, GFP_NOFS); break; } + case BTRFS_EXTENT_OWNER_REF_KEY: + ASSERT(btrfs_fs_incompat(ctx->fs_info, SIMPLE_QUOTA)); + break; default: WARN_ON(1); } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 336957737c6c..4fb8fd9d9e40 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -344,9 +344,15 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, struct btrfs_extent_inline_ref *iref, enum btrfs_inline_ref_type is_data) { + struct btrfs_fs_info *fs_info = 
eb->fs_info; int type = btrfs_extent_inline_ref_type(eb, iref); u64 offset = btrfs_extent_inline_ref_offset(eb, iref); + if (type == BTRFS_EXTENT_OWNER_REF_KEY) { + ASSERT(btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); + return type; + } + if (type == BTRFS_TREE_BLOCK_REF_KEY || type == BTRFS_SHARED_BLOCK_REF_KEY || type == BTRFS_SHARED_DATA_REF_KEY || @@ -355,26 +361,25 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, if (type == BTRFS_TREE_BLOCK_REF_KEY) return type; if (type == BTRFS_SHARED_BLOCK_REF_KEY) { - ASSERT(eb->fs_info); + ASSERT(fs_info); /* * Every shared one has parent tree block, * which must be aligned to sector size. */ - if (offset && - IS_ALIGNED(offset, eb->fs_info->sectorsize)) + if (offset && IS_ALIGNED(offset, fs_info->sectorsize)) return type; } } else if (is_data == BTRFS_REF_TYPE_DATA) { if (type == BTRFS_EXTENT_DATA_REF_KEY) return type; if (type == BTRFS_SHARED_DATA_REF_KEY) { - ASSERT(eb->fs_info); + ASSERT(fs_info); /* * Every shared one has parent tree block, * which must be aligned to sector size. 
*/ if (offset && - IS_ALIGNED(offset, eb->fs_info->sectorsize)) + IS_ALIGNED(offset, fs_info->sectorsize)) return type; } } else { @@ -385,7 +390,7 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, WARN_ON(1); btrfs_print_leaf(eb); - btrfs_err(eb->fs_info, + btrfs_err(fs_info, "eb %llu iref 0x%lx invalid extent inline ref type %d", eb->start, (unsigned long)iref, type); @@ -886,6 +891,11 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, while (ptr < end) { iref = (struct btrfs_extent_inline_ref *)ptr; type = btrfs_get_extent_inline_ref_type(leaf, iref, needed); + if (type == BTRFS_EXTENT_OWNER_REF_KEY) { + ASSERT(btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); + ptr += btrfs_extent_inline_ref_size(type); + continue; + } if (type == BTRFS_REF_TYPE_INVALID) { ret = -EUCLEAN; goto out; @@ -1737,6 +1747,8 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, node->type == BTRFS_SHARED_DATA_REF_KEY) ret = run_delayed_data_ref(trans, node, extent_op, insert_reserved); + else if (node->type == BTRFS_EXTENT_OWNER_REF_KEY) + ret = 0; else BUG(); if (ret && insert_reserved) @@ -2308,6 +2320,7 @@ static noinline int check_committed_ref(struct btrfs_root *root, struct btrfs_extent_item *ei; struct btrfs_key key; u32 item_size; + u32 expected_size; int type; int ret; @@ -2334,10 +2347,22 @@ static noinline int check_committed_ref(struct btrfs_root *root, ret = 1; item_size = btrfs_item_size(leaf, path->slots[0]); ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_extent_item); + expected_size = sizeof(*ei) + btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY); + + /* No inline refs; we need to bail before checking for owner ref */ + if (item_size == sizeof(*ei)) + goto out; + + /* Check for an owner ref; skip over it to the real inline refs */ + iref = (struct btrfs_extent_inline_ref *)(ei + 1); + type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_DATA); + if (btrfs_fs_incompat(fs_info, SIMPLE_QUOTA) && 
type == BTRFS_EXTENT_OWNER_REF_KEY) { + expected_size += btrfs_extent_inline_ref_size(BTRFS_EXTENT_OWNER_REF_KEY); + iref = (struct btrfs_extent_inline_ref *)(iref + 1); + } /* If extent item has more than 1 inline ref then it's shared */ - if (item_size != sizeof(*ei) + - btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY)) + if (item_size != expected_size) goto out; /* @@ -2349,8 +2374,6 @@ static noinline int check_committed_ref(struct btrfs_root *root, btrfs_root_last_snapshot(&root->root_item))) goto out; - iref = (struct btrfs_extent_inline_ref *)(ei + 1); - /* If this extent has SHARED_DATA_REF then it's shared */ type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_DATA); if (type != BTRFS_EXTENT_DATA_REF_KEY) @@ -4610,18 +4633,23 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans, struct btrfs_root *extent_root; int ret; struct btrfs_extent_item *extent_item; + struct btrfs_extent_owner_ref *oref; struct btrfs_extent_inline_ref *iref; struct btrfs_path *path; struct extent_buffer *leaf; int type; u32 size; + const bool simple_quota = (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE); if (parent > 0) type = BTRFS_SHARED_DATA_REF_KEY; else type = BTRFS_EXTENT_DATA_REF_KEY; - size = sizeof(*extent_item) + btrfs_extent_inline_ref_size(type); + size = sizeof(*extent_item); + if (simple_quota) + size += btrfs_extent_inline_ref_size(BTRFS_EXTENT_OWNER_REF_KEY); + size += btrfs_extent_inline_ref_size(type); path = btrfs_alloc_path(); if (!path) @@ -4643,7 +4671,14 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans, flags | BTRFS_EXTENT_FLAG_DATA); iref = (struct btrfs_extent_inline_ref *)(extent_item + 1); + if (simple_quota) { + btrfs_set_extent_inline_ref_type(leaf, iref, BTRFS_EXTENT_OWNER_REF_KEY); + oref = (struct btrfs_extent_owner_ref *)(&iref->offset); + btrfs_set_extent_owner_ref_root_id(leaf, oref, root_objectid); + iref = (struct btrfs_extent_inline_ref *)(oref + 1); + } 
btrfs_set_extent_inline_ref_type(leaf, iref, type); + if (parent > 0) { struct btrfs_shared_data_ref *ref; ref = (struct btrfs_shared_data_ref *)(iref + 1); diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c index 0c93439e929f..8860075ab25e 100644 --- a/fs/btrfs/print-tree.c +++ b/fs/btrfs/print-tree.c @@ -80,12 +80,20 @@ static void print_extent_data_ref(const struct extent_buffer *eb, btrfs_extent_data_ref_count(eb, ref)); } +static void print_extent_owner_ref(const struct extent_buffer *eb, + const struct btrfs_extent_owner_ref *ref) +{ + ASSERT(btrfs_fs_incompat(eb->fs_info, SIMPLE_QUOTA)); + pr_cont("extent data owner root %llu\n", btrfs_extent_owner_ref_root_id(eb, ref)); +} + static void print_extent_item(const struct extent_buffer *eb, int slot, int type) { struct btrfs_extent_item *ei; struct btrfs_extent_inline_ref *iref; struct btrfs_extent_data_ref *dref; struct btrfs_shared_data_ref *sref; + struct btrfs_extent_owner_ref *oref; struct btrfs_disk_key key; unsigned long end; unsigned long ptr; @@ -161,6 +169,10 @@ static void print_extent_item(const struct extent_buffer *eb, int slot, int type "\t\t\t(parent %llu not aligned to sectorsize %u)\n", offset, eb->fs_info->sectorsize); break; + case BTRFS_EXTENT_OWNER_REF_KEY: + oref = (struct btrfs_extent_owner_ref *)(&iref->offset); + print_extent_owner_ref(eb, oref); + break; default: pr_cont("(extent %llu has INVALID ref type %d)\n", eb->start, type); diff --git a/fs/btrfs/ref-verify.c b/fs/btrfs/ref-verify.c index e9e1ebd8dd6a..1f62976bee82 100644 --- a/fs/btrfs/ref-verify.c +++ b/fs/btrfs/ref-verify.c @@ -485,6 +485,9 @@ static int process_extent_item(struct btrfs_fs_info *fs_info, ret = add_shared_data_ref(fs_info, offset, count, key->objectid, key->offset); break; + case BTRFS_EXTENT_OWNER_REF_KEY: + WARN_ON(!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); + break; default: btrfs_err(fs_info, "invalid key type in iref"); ret = -EINVAL; diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c 
index 8ad92aa43924..01bba79165e7 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -1466,6 +1466,9 @@ static int check_extent_item(struct extent_buffer *leaf, } inline_refs += btrfs_shared_data_ref_count(leaf, sref); break; + case BTRFS_EXTENT_OWNER_REF_KEY: + WARN_ON(!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); + break; default: extent_err(leaf, slot, "unknown inline ref type: %u", inline_type); diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index a4b44b13fff5..e60b62c1627e 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -230,6 +230,14 @@ #define BTRFS_SHARED_DATA_REF_KEY 184 +/* + * Special inline ref key which stores the id of the subvolume which originally + * created the extent. This subvolume owns the extent permanently from the + * perspective of simple quotas. Needed to know which subvolume to free quota + * usage from when the extent is deleted. + */ +#define BTRFS_EXTENT_OWNER_REF_KEY 188 + /* * block groups give us hints into the extent allocation trees. 
Which * blocks are free etc etc @@ -787,6 +795,10 @@ struct btrfs_shared_data_ref { __le32 count; } __attribute__ ((__packed__)); +struct btrfs_extent_owner_ref { + __le64 root_id; +} __attribute__ ((__packed__)); + struct btrfs_extent_inline_ref { __u8 type; __le64 offset; From patchwork Wed Sep 13 00:13:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13382343 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73089EE3F39 for ; Wed, 13 Sep 2023 00:13:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237900AbjIMANU (ORCPT ); Tue, 12 Sep 2023 20:13:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237899AbjIMANT (ORCPT ); Tue, 12 Sep 2023 20:13:19 -0400 Received: from wout4-smtp.messagingengine.com (wout4-smtp.messagingengine.com [64.147.123.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E1001706 for ; Tue, 12 Sep 2023 17:13:15 -0700 (PDT) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id 54F52320093C; Tue, 12 Sep 2023 20:13:14 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute1.internal (MEProxy); Tue, 12 Sep 2023 20:13:14 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm2; t=1694563993; x= 1694650393; bh=h1MeA0in5gQ9ykVn6wpcdrwY1FhiRV4SiNAZTQ3HGqw=; b=k omiVqFsHuH8o0Wmn/Ly5WHzXkyYRxycP7shyQvPahUpjIkvMXg9Mshn5A6odUWFp 
6ixrgi1JdYNx/QgHYUrf7kbk85u/NPL8t+jby+ULkNiBIvThIf/fRCfhrLYeSMoU s7GRXpJxF7NY6Sd4kVxwm/LqRFam1UhsmL6ZNmtMh7X3p+126LHsj27MmiBn0RBW L4MiYXEykWD7YCyZOlr2PglTSvYvV3sr9Kbf/iAyh7mFxwWacL9iEUPiMeyyNazz 6d8YQNofnhtlTGMFUX7km+7B0st3HiWcdAp7R6FxuPOlVsWBiohw8dtwzyRD/wye w9t4XVybPGp3LMn3dLAxQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; t=1694563993; x=1694650393; bh=h 1MeA0in5gQ9ykVn6wpcdrwY1FhiRV4SiNAZTQ3HGqw=; b=coQ2S+hfmtn9DQ0Y2 Rh1oSaOxoODcliIMiz2eYhhv4l6z8I9gnYtYZZ0uJa2d2TDrANXdv8bg0u4rrwhm QJwrfs8Tbv1t+xLrPrBRdgQ/HLu7X9nLry8f0gFqaz2MzB1UmPR7PtSe9C5CZrPc km6AGpN/zJ/nvYeAUxWRKxAes7Ip3XV9l3B6HRR9WguQl0hoTxWexWl7iZT3xrLO kPg6GR80vKzQg0zsif4tJQD1xPs443fD2JO5Y/rnwNTst0xXnTVLfzDZBvg3BWn3 6nYqr+PaKNY+Hou19utEbabeVH233f9LwLbVb0Z8tHG07H/+8vpoH36vCv/P1nyw VlG8w== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudeijedgfeefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpedvnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 12 Sep 2023 20:13:13 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v6 12/18] btrfs: inline owner ref lookup helper Date: Tue, 12 Sep 2023 17:13:23 -0700 Message-ID: <8c92388560eace362e182f6416141a45957fe41d.1694563454.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 
In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Inline ref parsing is a bit tricky and relies on a decent amount of implicit information, so I think it is beneficial to have a helper function for reading the owner ref, if only to "document" the format, along with the write path. The main subtlety of note which I was missing by open-coding this was that it is important to check whether or not inline refs are present *at all*. That is, if we are writing out a new extent under squotas, we will always use a big enough item for the inline ref and have it. However, it is possible that some random item predating squotas will not have any inline refs. In that case, trying to read the "type" field of the first inline ref will just be reading garbage in the form of whatever is in the next item. This will be used by the extent freeing path, which looks up data extent owners, as well as a relocation path which needs to grab the owner before relocating an extent. Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 50 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/extent-tree.h | 3 +++ 2 files changed, 53 insertions(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 4fb8fd9d9e40..249b3bfe181a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2865,6 +2865,56 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans) return 0; } +/* + * Helper to parse an extent item's inline extents looking for a simple + * quotas owner ref. + * + * @fs_info: the btrfs_fs_info for this mount + * @leaf: a leaf in the extent tree containing the extent item + * @slot: the slot in the leaf where the extent item is found + * + * Returns the objectid of the root that originally allocated the extent item + * if the inline owner ref is expected and present, otherwise 0. + * + * If an extent item has an owner ref item, it will be the first inline ref + * item.
Therefore the logic is to check whether there are any inline ref + * items, then check the type of the first one. + */ +u64 btrfs_get_extent_owner_root(struct btrfs_fs_info *fs_info, + struct extent_buffer *leaf, + int slot) +{ + struct btrfs_extent_item *ei; + struct btrfs_extent_inline_ref *iref; + struct btrfs_extent_owner_ref *oref; + unsigned long ptr; + unsigned long end; + int type; + + if (!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)) + return 0; + + ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item); + ptr = (unsigned long)(ei + 1); + end = (unsigned long)ei + btrfs_item_size(leaf, slot); + + /* No inline ref items of any kind, can't check type */ + if (ptr == end) + return 0; + + iref = (struct btrfs_extent_inline_ref *)ptr; + type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_ANY); + + /* We found an owner ref, get the root out of it */ + if (type == BTRFS_EXTENT_OWNER_REF_KEY) { + oref = (struct btrfs_extent_owner_ref *)(&iref->offset); + return btrfs_extent_owner_ref_root_id(leaf, oref); + } + + /* We have inline refs, but not an owner ref */ + return 0; +} + static int do_free_extent_accounting(struct btrfs_trans_handle *trans, u64 bytenr, u64 num_bytes, bool is_data) { diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h index 397cccafc885..5a02faa05464 100644 --- a/fs/btrfs/extent-tree.h +++ b/fs/btrfs/extent-tree.h @@ -137,6 +137,9 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans, struct extent_buffer *eb, u64 flags); int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_ref *ref); +u64 btrfs_get_extent_owner_root(struct btrfs_fs_info *fs_info, + struct extent_buffer *leaf, + int slot); int btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info, u64 start, u64 len, int delalloc); int btrfs_pin_reserved_extent(struct btrfs_trans_handle *trans, From patchwork Wed Sep 13 00:13:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13382344 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E997BEE49A4 for ; Wed, 13 Sep 2023 00:13:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237912AbjIMANW (ORCPT ); Tue, 12 Sep 2023 20:13:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48428 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237899AbjIMANV (ORCPT ); Tue, 12 Sep 2023 20:13:21 -0400 Received: from wout4-smtp.messagingengine.com (wout4-smtp.messagingengine.com [64.147.123.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AB77610F2 for ; Tue, 12 Sep 2023 17:13:17 -0700 (PDT) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id F23BC32005D8; Tue, 12 Sep 2023 20:13:16 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute1.internal (MEProxy); Tue, 12 Sep 2023 20:13:17 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm2; t=1694563996; x= 1694650396; bh=A1CND227lvobr1FHOClaSn0NRI2uCHMnUjt3w/a2+e0=; b=b mog5byj/4Ua9PuwFMG4v3ROtGGa8VdXMb+9oKBBfqjHEy8UNkH6dvSoLp4cyTP7S WKsKW3KEdMAxkQDKY+Eadl5KYDiyOncdXcJgosK+ybGTj+RSFi24kuXAqreN1BpC qoZg42h8Ok/ub/UpczKHy2JgoHA4fmkhbqy3ADaD97gvNjcpIqANi0S5RsXuWkYW Pd2CQffm1Ta9Bmn6z8XSoD/ABAWdfc+hgS+AMOWIqHuwUHRKGvjtGN/nwqYbSjOI b5DkPnNNBqRt8QZSRVifcCkAUdJnISFrmUexOdPgAmJUw6l/kTdUtLLAi4HFTWVV O3sDNZJ38NSFipvV3JrXg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type 
:date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; t=1694563996; x=1694650396; bh=A 1CND227lvobr1FHOClaSn0NRI2uCHMnUjt3w/a2+e0=; b=uf+TDI1pg3CEC69Mr 8TXfL3XFsiVF4ggwHCErV1iQdYhX2198TrT1vpRSRpANmcm+Z6Ki0e0Q2J+jmgIr Jtzs7Nmfxlfl6Non/nvJHM4wsyIsC7ldZ1ppkZvdI0T0nRcwhYKIYVHlfqWE/D5j NsXkCSduO57IrgnhIkXpyzaFEuSLqD7ikNy9Y5EK5gfJ0gjx2Pw5T8/rBnqnStTv +Ob0R6g9L/QcShCWWLulLIa52EbBQJEiLDtVZz4cabFw3DkZl2yPTSxn35er25KJ hHcXbb9rNHzvMYmlVREMrrSMbWZg/pczAu4fzoB+V3hBPQbFTBajWQv1HU6uZ8Pv UFKfQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudeijedgfeefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpeefnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 12 Sep 2023 20:13:15 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v6 13/18] btrfs: record simple quota deltas Date: Tue, 12 Sep 2023 17:13:24 -0700 Message-ID: <41b5b59ce38d17f9cbe3b77257a9716a3b8f12ea.1694563454.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org At the moment that we run delayed refs, we make the final ref-count based decision on creating/removing extent (and metadata) items. Therefore, it is exactly the spot to hook up simple quotas. 
There are a few important subtleties to the fields we must collect to accurately track simple quotas, particularly when removing an extent. When removing a data extent, the ref could be in any tree (due to reflink, for example) and so we need to recover the owning root id from the owner ref item. When removing a metadata extent, we know the owning root from the owner field in the header when we create the delayed ref, so we can recover it from there. We must also be careful to handle reservations properly so that we do not leak reserved space. The happy path is freeing the reservation when the simple quota delta runs on a data extent. If that doesn't happen, due to refs canceling out or some error, the ref head already has the must_insert_reserved machinery to handle this, so we piggyback on that and use it to clean up the reserved data. Signed-off-by: Boris Burkov --- fs/btrfs/delayed-ref.c | 1 + fs/btrfs/delayed-ref.h | 6 +++ fs/btrfs/extent-tree.c | 85 +++++++++++++++++++++++++++++++++++++----- 3 files changed, 82 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index b8ae48e8a63b..c6c3a4f67db7 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -802,6 +802,7 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref, head_ref->bytenr = bytenr; head_ref->num_bytes = num_bytes; head_ref->ref_mod = count_mod; + head_ref->reserved_bytes = reserved; head_ref->must_insert_reserved = must_insert_reserved; head_ref->owning_root = owning_root; head_ref->is_data = is_data; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index faa2e000dadc..d2cc779e10a6 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -116,6 +116,12 @@ struct btrfs_delayed_ref_head { */ u64 owning_root; + /* + * Track reserved bytes when setting must_insert_reserved. + * On success or cleanup, we will need to free the reservation.
+ */ + u64 reserved_bytes; + /* * when a new extent is allocated, it is just reserved in memory * The actual extent isn't inserted into the extent allocation tree diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 249b3bfe181a..c399127f9918 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -47,6 +47,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, + struct btrfs_delayed_ref_head *href, struct btrfs_delayed_ref_node *node, u64 parent, u64 root_objectid, u64 owner_objectid, u64 owner_offset, @@ -1536,6 +1537,7 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, } static int run_delayed_data_ref(struct btrfs_trans_handle *trans, + struct btrfs_delayed_ref_head *href, struct btrfs_delayed_ref_node *node, struct btrfs_delayed_extent_op *extent_op, bool insert_reserved) @@ -1553,6 +1555,13 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) { struct btrfs_key key; + struct btrfs_squota_delta delta = { + .root = href->owning_root, + .num_bytes = node->num_bytes, + .rsv_bytes = href->reserved_bytes, + .is_data = true, + .is_inc = true, + }; if (extent_op) flags |= extent_op->flags_to_set; @@ -1565,12 +1574,14 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, flags, ref->objectid, ref->offset, &key, node->ref_mod); + if (!ret) + ret = btrfs_record_squota_delta(trans->fs_info, &delta); } else if (node->action == BTRFS_ADD_DELAYED_REF) { ret = __btrfs_inc_extent_ref(trans, node, parent, ref->root, ref->objectid, ref->offset, extent_op); } else if (node->action == BTRFS_DROP_DELAYED_REF) { - ret = __btrfs_free_extent(trans, node, parent, + ret = __btrfs_free_extent(trans, href, node, parent, ref->root, ref->objectid, ref->offset, extent_op); } else { @@ -1687,11 +1698,13 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans, } static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, + 
struct btrfs_delayed_ref_head *href, struct btrfs_delayed_ref_node *node, struct btrfs_delayed_extent_op *extent_op, bool insert_reserved) { int ret = 0; + struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_delayed_tree_ref *ref; u64 parent = 0; u64 ref_root = 0; @@ -1711,13 +1724,23 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, return -EUCLEAN; } if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) { + struct btrfs_squota_delta delta = { + .root = href->owning_root, + .num_bytes = fs_info->nodesize, + .rsv_bytes = 0, + .is_data = false, + .is_inc = true, + }; + BUG_ON(!extent_op || !extent_op->update_flags); ret = alloc_reserved_tree_block(trans, node, extent_op); + if (!ret) + btrfs_record_squota_delta(fs_info, &delta); } else if (node->action == BTRFS_ADD_DELAYED_REF) { ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root, ref->level, 0, extent_op); } else if (node->action == BTRFS_DROP_DELAYED_REF) { - ret = __btrfs_free_extent(trans, node, parent, ref_root, + ret = __btrfs_free_extent(trans, href, node, parent, ref_root, ref->level, 0, extent_op); } else { BUG(); @@ -1727,6 +1750,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, /* helper function to actually process a single delayed ref entry */ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, + struct btrfs_delayed_ref_head *href, struct btrfs_delayed_ref_node *node, struct btrfs_delayed_extent_op *extent_op, bool insert_reserved) @@ -1741,12 +1765,12 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, if (node->type == BTRFS_TREE_BLOCK_REF_KEY || node->type == BTRFS_SHARED_BLOCK_REF_KEY) - ret = run_delayed_tree_ref(trans, node, extent_op, + ret = run_delayed_tree_ref(trans, href, node, extent_op, insert_reserved); else if (node->type == BTRFS_EXTENT_DATA_REF_KEY || node->type == BTRFS_SHARED_DATA_REF_KEY) - ret = run_delayed_data_ref(trans, node, extent_op, - insert_reserved); + ret = 
run_delayed_data_ref(trans, href, node, + extent_op, insert_reserved); else if (node->type == BTRFS_EXTENT_OWNER_REF_KEY) ret = 0; else @@ -1847,6 +1871,11 @@ u64 btrfs_cleanup_ref_head_accounting(struct btrfs_fs_info *fs_info, return btrfs_calc_delayed_ref_csum_bytes(fs_info, nr_csums); } + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE && + head->must_insert_reserved && head->is_data) + btrfs_qgroup_free_refroot(fs_info, head->owning_root, + head->reserved_bytes, + BTRFS_QGROUP_RSV_DATA); return 0; } @@ -1995,10 +2024,11 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans, locked_ref->extent_op = NULL; spin_unlock(&locked_ref->lock); - ret = run_one_delayed_ref(trans, ref, extent_op, - must_insert_reserved); + ret = run_one_delayed_ref(trans, locked_ref, ref, + extent_op, must_insert_reserved); btrfs_delayed_refs_rsv_release(fs_info, 1, 0); *bytes_released += btrfs_calc_delayed_ref_bytes(fs_info, 1); + btrfs_free_delayed_extent_op(extent_op); if (ret) { unselect_delayed_ref_head(delayed_refs, locked_ref); @@ -2916,11 +2946,12 @@ u64 btrfs_get_extent_owner_root(struct btrfs_fs_info *fs_info, } static int do_free_extent_accounting(struct btrfs_trans_handle *trans, - u64 bytenr, u64 num_bytes, bool is_data) + u64 bytenr, struct btrfs_squota_delta *delta) { int ret; + u64 num_bytes = delta->num_bytes; - if (is_data) { + if (delta->is_data) { struct btrfs_root *csum_root; csum_root = btrfs_csum_root(trans->fs_info, bytenr); @@ -2931,6 +2962,12 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans, } } + ret = btrfs_record_squota_delta(trans->fs_info, delta); + if (ret) { + btrfs_abort_transaction(trans, ret); + return ret; + } + ret = add_to_free_space_tree(trans, bytenr, num_bytes); if (ret) { btrfs_abort_transaction(trans, ret); @@ -3011,6 +3048,7 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans, * And that (13631488 EXTENT_DATA_REF ) gets removed. 
*/ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, + struct btrfs_delayed_ref_head *href, struct btrfs_delayed_ref_node *node, u64 parent, u64 root_objectid, u64 owner_objectid, u64 owner_offset, @@ -3034,6 +3072,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, u64 bytenr = node->bytenr; u64 num_bytes = node->num_bytes; bool skinny_metadata = btrfs_fs_incompat(info, SKINNY_METADATA); + u64 delayed_ref_root = href->owning_root; extent_root = btrfs_extent_root(info, bytenr); ASSERT(extent_root); @@ -3234,6 +3273,14 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, } } } else { + struct btrfs_squota_delta delta = { + .root = delayed_ref_root, + .num_bytes = num_bytes, + .rsv_bytes = 0, + .is_data = is_data, + .is_inc = false, + }; + /* In this branch refs == 1 */ if (found_extent) { if (is_data && refs_to_drop != @@ -3272,6 +3319,16 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, num_to_del = 2; } } + /* + * We can't infer the data owner from the delayed ref, so we + * need to try to get it from the owning ref item. + * + * If it is not present, then that extent was not written under + * simple quotas mode, so we don't need to account for its + * deletion. 
+ */ + if (is_data) + delta.root = btrfs_get_extent_owner_root(trans->fs_info, leaf, extent_slot); ret = btrfs_del_items(trans, extent_root, path, path->slots[0], num_to_del); @@ -3281,7 +3338,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, } btrfs_release_path(path); - ret = do_free_extent_accounting(trans, bytenr, num_bytes, is_data); + ret = do_free_extent_accounting(trans, bytenr, &delta); } btrfs_release_path(path); @@ -4857,6 +4914,13 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans, int ret; struct btrfs_block_group *block_group; struct btrfs_space_info *space_info; + struct btrfs_squota_delta delta = { + .root = root_objectid, + .num_bytes = ins->offset, + .rsv_bytes = 0, + .is_data = true, + .is_inc = true, + }; /* * Mixed block groups will exclude before processing the log so we only @@ -4885,6 +4949,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans, offset, ins, 1); if (ret) btrfs_pin_extent(trans, ins->objectid, ins->offset, 1); + ret = btrfs_record_squota_delta(fs_info, &delta); btrfs_put_block_group(block_group); return ret; } From patchwork Wed Sep 13 00:13:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13382345 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01CEFEE3F3F for ; Wed, 13 Sep 2023 00:13:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237919AbjIMANY (ORCPT ); Tue, 12 Sep 2023 20:13:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44022 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237899AbjIMANY (ORCPT ); Tue, 12 Sep 2023 20:13:24 -0400 Received: from wout4-smtp.messagingengine.com 
(wout4-smtp.messagingengine.com [64.147.123.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 554A710F2 for ; Tue, 12 Sep 2023 17:13:20 -0700 (PDT) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.west.internal (Postfix) with ESMTP id 9A6B532005D8; Tue, 12 Sep 2023 20:13:19 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute4.internal (MEProxy); Tue, 12 Sep 2023 20:13:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm2; t=1694563999; x= 1694650399; bh=gJ6Hd1hTQfz73DLfp6p3BrC9gFTFYIBvVY6l7VnWDH4=; b=I yI8WtQBJxDPTjKjGyzZGCqk3wLGDvbB22x0sLMcqStKqsAIvrHHuv4K7/u8Ej9uw cs9IIApNKQjFYyZwAzebHwfn4hJp/Q5uwituNe6bpyiTK9DD7J9949/N5Ce931eC yRxud3vzANxvaoovmG+DbZvAFnmn1ZB+wicOnWVOcTAWGyqdwwaUlyIZLHnwOrcj Q25ZLqK5ykvSXL769swt+l8AkSdPACB0txfTrLVTJiE4+mqDjM+dVAiqwmFQdKSJ c2ggcDmenpXbwlLGSFK9gPSU6LyWCNMxrTIrOCLD6qCxfejeyg8+DR+JAYQ8E1rd nJOsOY4dUlukH8LG+eaVA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; t=1694563999; x=1694650399; bh=g J6Hd1hTQfz73DLfp6p3BrC9gFTFYIBvVY6l7VnWDH4=; b=hWk9dRkFgxFSyhA6w eUPEe6lGTjXVBeDC5wo6SPRHaWzaYkZhGZnZoQtG/JsVRiU0HPBCDzX1Fz2n9/4P cGrKQwIxAemBklxp1Y8HfaWAGVGnDhHgPfgtgfYqcTYhtluu1eSLkXaNZ2fi3JGR a60mPNTqqsFfAM7LV0AVz7Ehn3l+bBmG+Npp2S4snLJ8NW6kdyUfw1J6Tpghpy1B gJ0bBJ6AcaAgk33ZEcXQcqDyqCDdWJ9odSoZJI3SPpLvu0T7eD5ZKnQ6QalPM55q SUo0au4df5hRPRcWYadBO/MzPcMNzXXR2NQ6JwwyfN9Jx5bFakVF2c+hQqVH+T86 QAvPA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudeijedgfeefucetufdoteggodetrfdotf 
fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpeefnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 12 Sep 2023 20:13:18 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v6 14/18] btrfs: simple quota auto hierarchy for nested subvols Date: Tue, 12 Sep 2023 17:13:25 -0700 Message-ID: <76f7eec4fb236001c18c9d5263ef141d70bdaabd.1694563454.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Consider the following sequence: - enable quotas - create subvol S id 256 at dir outer/ - create a qgroup 1/100 - add 0/256 (S's auto qgroup) to 1/100 - create subvol T id 257 at dir outer/inner/ With full qgroups, there is no relationship between 0/257 and either of 0/256 or 1/100. There is an inherit feature that the creator of inner/ can use to specify it ought to be in 1/100. Simple quotas are targeted at container isolation, where such automatic inheritance for not necessarily trusted/controlled nested subvol creation would be quite helpful. Therefore, add a new default behavior for simple quotas: when you create a nested subvol, automatically inherit as parents any parents of the qgroup of the subvol the new inode is going in. In our example, 0/257 would also be under 1/100, allowing easy control of a total quota over an arbitrary hierarchy of subvolumes.
I think this _might_ be a generally useful behavior, so it could be interesting to put it behind a new inheritance flag that simple quotas always use while traditional quotas let the user specify, but this is a minimally intrusive change to start. Signed-off-by: Boris Burkov --- fs/btrfs/ioctl.c | 2 +- fs/btrfs/qgroup.c | 57 +++++++++++++++++++++++++++++++++++++++--- fs/btrfs/qgroup.h | 6 ++--- fs/btrfs/transaction.c | 12 ++++++--- 4 files changed, 66 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 6211bce7f146..59fcb7424d93 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -652,7 +652,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap, /* Tree log can't currently deal with an inode which is a new root. */ btrfs_set_log_full_commit(trans); - ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit); + ret = btrfs_qgroup_inherit(trans, 0, objectid, root->root_key.objectid, inherit); if (ret) goto out; diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 86d3a46ee33f..1b41686f934f 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1560,8 +1560,7 @@ static int quick_update_accounting(struct btrfs_fs_info *fs_info, return ret; } -int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, - u64 dst) +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_qgroup *parent; @@ -3030,6 +3029,47 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans) return ret; } +static int qgroup_auto_inherit(struct btrfs_fs_info *fs_info, + u64 inode_rootid, + struct btrfs_qgroup_inherit **inherit) +{ + int i = 0; + u64 num_qgroups = 0; + struct btrfs_qgroup *inode_qg; + struct btrfs_qgroup_list *qg_list; + struct btrfs_qgroup_inherit *res; + size_t struct_sz; + u64 *qgids; + + if (*inherit) + return -EEXIST; + + inode_qg = find_qgroup_rb(fs_info, inode_rootid); + if (!inode_qg) + return -ENOENT; + + num_qgroups = 
list_count_nodes(&inode_qg->groups);
+
+	if (!num_qgroups)
+		return 0;
+
+	struct_sz = struct_size(res, qgroups, num_qgroups);
+	if (struct_sz == SIZE_MAX)
+		return -ERANGE;
+
+	res = kzalloc(struct_sz, GFP_NOFS);
+	if (!res)
+		return -ENOMEM;
+	res->num_qgroups = num_qgroups;
+	qgids = res->qgroups;
+
+	list_for_each_entry(qg_list, &inode_qg->groups, next_group)
+		qgids[i++] = qg_list->group->qgroupid;
+
+	*inherit = res;
+	return 0;
+}
+
 /*
  * Copy the accounting information between qgroups. This is necessary
  * when a snapshot or a subvolume is created. Throwing an error will
@@ -3037,7 +3077,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans)
  * when a readonly fs is a reasonable outcome.
  */
 int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
-			 u64 objectid, struct btrfs_qgroup_inherit *inherit)
+			 u64 objectid, u64 inode_rootid,
+			 struct btrfs_qgroup_inherit *inherit)
 {
 	int ret = 0;
 	int i;
@@ -3049,6 +3090,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 	struct btrfs_qgroup *dstgroup;
 	struct btrfs_qgroup *prealloc;
 	struct btrfs_qgroup_list **qlist_prealloc = NULL;
+	bool free_inherit = false;
 	bool need_rescan = false;
 	u32 level_size = 0;
 	u64 nums;
@@ -3085,6 +3127,13 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 		goto out;
 	}

+	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE && !inherit) {
+		ret = qgroup_auto_inherit(fs_info, inode_rootid, &inherit);
+		if (ret)
+			goto out;
+		free_inherit = true;
+	}
+
 	if (inherit) {
 		i_qgroups = (u64 *)(inherit + 1);
 		nums = inherit->num_qgroups + 2 * inherit->num_ref_copies +
@@ -3263,6 +3312,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 	mutex_unlock(&fs_info->qgroup_ioctl_lock);
 	if (need_rescan)
 		qgroup_mark_inconsistent(fs_info);
+	if (free_inherit)
+		kfree(inherit);
 	if (qlist_prealloc) {
 		for (int i = 0; i < inherit->num_qgroups; i++)
 			kfree(qlist_prealloc[i]);
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index
5eabd944c782..126a599c7349 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -312,8 +312,7 @@ int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info); int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info, bool interruptible); -int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, - u64 dst); +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst); int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst); int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid); @@ -343,7 +342,8 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr, int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans); int btrfs_run_qgroups(struct btrfs_trans_handle *trans); int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, - u64 objectid, struct btrfs_qgroup_inherit *inherit); + u64 objectid, u64 inode_rootid, + struct btrfs_qgroup_inherit *inherit); void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info, u64 ref_root, u64 num_bytes, enum btrfs_qgroup_rsv_type type); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index eb649860ecfb..645646e587d5 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1620,7 +1620,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, int ret; /* - * Save some performance in the case that full qgroups are not + * Save some performance in the case that qgroups are not * enabled. If this check races with the ioctl, rescan will * kick in anyway. 
*/ @@ -1663,7 +1663,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, /* Now qgroup are all updated, we can inherit it to new qgroups */ ret = btrfs_qgroup_inherit(trans, src->root_key.objectid, dst_objectid, - inherit); + parent->root_key.objectid, inherit); if (ret < 0) goto out; @@ -1930,8 +1930,12 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, * To co-operate with that hack, we do hack again. * Or snapshot will be greatly slowed down by a subtree qgroup rescan */ - ret = qgroup_account_snapshot(trans, root, parent_root, - pending->inherit, objectid); + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL) + ret = qgroup_account_snapshot(trans, root, parent_root, + pending->inherit, objectid); + else if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + ret = btrfs_qgroup_inherit(trans, root->root_key.objectid, objectid, + parent_root->root_key.objectid, pending->inherit); if (ret < 0) goto fail; From patchwork Wed Sep 13 00:13:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13382346 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C58B2EE3F3F for ; Wed, 13 Sep 2023 00:13:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237927AbjIMAN2 (ORCPT ); Tue, 12 Sep 2023 20:13:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44026 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237899AbjIMAN1 (ORCPT ); Tue, 12 Sep 2023 20:13:27 -0400 Received: from wout4-smtp.messagingengine.com (wout4-smtp.messagingengine.com [64.147.123.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 46CAD10F2 for ; Tue, 12 Sep 2023 17:13:23 
-0700 (PDT) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id 4C5A432005D8; Tue, 12 Sep 2023 20:13:22 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute1.internal (MEProxy); Tue, 12 Sep 2023 20:13:22 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm2; t=1694564001; x= 1694650401; bh=ii2WWIkNvtrK1HXOr7zA7xTNGGHgx33IdLKX8NKBrC8=; b=b HOgY54boPSk7A+ixcy7v7EP7ZfjeQNA0zhilHjweDF/QySj7F5W94CKdrl/4/SN0 6XlrvrtTLxK67+88WzL8j05ris4MZX9kI0r2Su5D5b0i0zRYCi129xgFEWVkVKh9 5B7M1UP0jzwtvuaqMniC93a7zn6Wt4L6YYIKkwzxgPt+2SoSAiS2pxPkQEi5sSpI 24T7tkMOgoaVJEEZC/yqZPZmTMCbQ084wVrMcVVjeVNJ9RUJMXBVFmMNvqXpUacE RvqTBq/RGcRc83Um1hMWKoAOmlV7Ib1EYmn5qcNvpLxNnBTlrkCvfImtI4+7yqG/ I3iUnI+hoQc8J7hXiNIkA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; t=1694564001; x=1694650401; bh=i i2WWIkNvtrK1HXOr7zA7xTNGGHgx33IdLKX8NKBrC8=; b=d3Fs2FzngTltnW76R vyuvTa5nZXH44WVpcMPUL5z248BQH95x9MsGhwDnqo2UbjiGqgnxWi7Rhu2wJluD W7fQ7Is2aTrjrM+CfcDBildnw7QWNOG9K7zbJBTggHfswpvKgBKDEVswc3o8re/i BiSrv1B9rJMefg810js0+ffE2JGNFonB0xciuYUjy7MRDI0oCUYTdiuJEgc3f8OI JwLvjZ35z/+c3hvEWvKlDyNgPkW3949S4XpawqM9/iE762BPUgsLj0xZMCMIsEsR mQUuA0AgiacLVxebO1TKmi/uSqwGVpcZULz4hZW+tmxdU6xrUzkhFmQCKFT9SAxW RdNvw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudeijedgfeefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre 
dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpeegnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh
X-ME-Proxy:
Feedback-ID: i083147f8:Fastmail
Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 12 Sep 2023 20:13:21 -0400 (EDT)
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 15/18] btrfs: check generation when recording simple quota delta
Date: Tue, 12 Sep 2023 17:13:26 -0700
Message-ID: <5d8f271d9040cf02fca6dbfce348c7edc9e66dd5.1694563454.git.boris@bur.io>
X-Mailer: git-send-email 2.41.0
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

Simple quotas count extents only from the moment the feature is enabled. Therefore, if we do something like:
1. create subvol S
2. write F in S
3. enable quotas
4. remove F
5. write G in S

then after 3. and 4. we would expect the simple quota usage of S to be 0 (putting aside some metadata extents that might be written) and after 5., it should be the size of G plus metadata. Therefore, we need to be able to determine whether a particular quota delta we are processing predates simple quota enablement.

To do this, store the transaction id when quotas were enabled, both in fs_info for immediate use and in the quota status item to make it recoverable on mount. When we see a delta, check if the generation of the extent item is less than that of quota enablement. If so, we should ignore the delta from this extent.
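The check described above can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel implementation: `toy_squota_state`, `toy_delta` and `toy_record_delta` are hypothetical names, usage is tracked for a single subvol only, and the caller is assumed never to free more bytes than were charged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy per-subvol accounting state. */
struct toy_squota_state {
	uint64_t enable_gen;	/* transaction id at which quotas were enabled */
	uint64_t usage;		/* bytes currently charged to the subvol */
};

/* Toy quota delta for one extent. */
struct toy_delta {
	uint64_t num_bytes;	/* size of the extent */
	uint64_t generation;	/* generation the extent was created in */
	bool is_inc;		/* true when allocating, false when freeing */
};

/*
 * Apply a delta unless the extent predates quota enablement.
 * Returns true if the delta was counted, false if it was ignored.
 */
static bool toy_record_delta(struct toy_squota_state *st,
			     const struct toy_delta *d)
{
	assert(st && d);

	/* Extents older than enablement were never charged, so skip them. */
	if (d->generation < st->enable_gen)
		return false;

	if (d->is_inc)
		st->usage += d->num_bytes;
	else
		st->usage -= d->num_bytes;
	return true;
}
```

With quotas enabled at generation 10, freeing F (created in generation 5) leaves usage at 0, while writing G in generation 12 charges the size of G.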
Signed-off-by: Boris Burkov --- fs/btrfs/accessors.h | 2 ++ fs/btrfs/extent-tree.c | 4 ++++ fs/btrfs/fs.h | 1 + fs/btrfs/qgroup.c | 28 ++++++++++++++++++++++------ fs/btrfs/qgroup.h | 2 ++ include/uapi/linux/btrfs_tree.h | 9 +++++++++ 6 files changed, 40 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h index ad8aa1ae5c0c..5c5f89079b9a 100644 --- a/fs/btrfs/accessors.h +++ b/fs/btrfs/accessors.h @@ -971,6 +971,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item, flags, 64); BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item, rescan, 64); +BTRFS_SETGET_FUNCS(qgroup_status_enable_gen, struct btrfs_qgroup_status_item, + enable_gen, 64); /* btrfs_qgroup_info_item */ BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index c399127f9918..8d16f7b4786d 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1561,6 +1561,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, .rsv_bytes = href->reserved_bytes, .is_data = true, .is_inc = true, + .generation = trans->transid, }; if (extent_op) @@ -1730,6 +1731,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, .rsv_bytes = 0, .is_data = false, .is_inc = true, + .generation = trans->transid, }; BUG_ON(!extent_op || !extent_op->update_flags); @@ -3279,6 +3281,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, .rsv_bytes = 0, .is_data = is_data, .is_inc = false, + .generation = btrfs_extent_generation(leaf, ei), }; /* In this branch refs == 1 */ @@ -4917,6 +4920,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans, struct btrfs_squota_delta delta = { .root = root_objectid, .num_bytes = ins->offset, + .generation = trans->transid, .rsv_bytes = 0, .is_data = true, .is_inc = true, diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h index 49fdac7dfd07..ea72240c0554 100644 --- a/fs/btrfs/fs.h 
+++ b/fs/btrfs/fs.h @@ -679,6 +679,7 @@ struct btrfs_fs_info { /* Protected by qgroup_rescan_lock */ bool qgroup_rescan_running; u8 qgroup_drop_subtree_thres; + u64 qgroup_enable_gen; /* * If this is not 0, then it indicates a serious filesystem error has diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 1b41686f934f..759395e83f9e 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -379,6 +379,15 @@ static void qgroup_mark_inconsistent(struct btrfs_fs_info *fs_info) BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING); } +static void qgroup_read_enable_gen(struct btrfs_fs_info *fs_info, + struct extent_buffer *leaf, int slot, + struct btrfs_qgroup_status_item *ptr) +{ + ASSERT(btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); + ASSERT(btrfs_item_size(leaf, slot) >= sizeof(*ptr)); + fs_info->qgroup_enable_gen = btrfs_qgroup_status_enable_gen(leaf, ptr); +} + /* * The full config is read in one go, only called from open_ctree() * It doesn't use any locking, as at this point we're still single-threaded @@ -394,7 +403,6 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info) int ret = 0; u64 flags = 0; u64 rescan_progress = 0; - bool simple; if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED) return 0; @@ -447,9 +455,9 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info) goto out; } fs_info->qgroup_flags = btrfs_qgroup_status_flags(l, ptr); - simple = (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE); - if (btrfs_qgroup_status_generation(l, ptr) != - fs_info->generation && !simple) { + if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE) { + qgroup_read_enable_gen(fs_info, l, slot, ptr); + } else if (btrfs_qgroup_status_generation(l, ptr) != fs_info->generation) { qgroup_mark_inconsistent(fs_info); btrfs_err(fs_info, "qgroup generation mismatch, marked as inconsistent"); @@ -1113,10 +1121,12 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info, btrfs_set_qgroup_status_generation(leaf, ptr, trans->transid); 
btrfs_set_qgroup_status_version(leaf, ptr, BTRFS_QGROUP_STATUS_VERSION); fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON; - if (simple) + if (simple) { fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE; - else + btrfs_set_qgroup_status_enable_gen(leaf, ptr, trans->transid); + } else { fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; + } btrfs_set_qgroup_status_flags(leaf, ptr, fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAGS_MASK); btrfs_set_qgroup_status_rescan(leaf, ptr, 0); @@ -1220,6 +1230,8 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info, goto out_free_path; } + fs_info->qgroup_enable_gen = trans->transid; + mutex_unlock(&fs_info->qgroup_ioctl_lock); /* * Commit the transaction while not holding qgroup_ioctl_lock, to avoid @@ -4666,6 +4678,10 @@ int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info, if (!is_fstree(root)) return 0; + /* If the extent predates enabling quotas, don't count it. */ + if (delta->generation < fs_info->qgroup_enable_gen) + return 0; + spin_lock(&fs_info->qgroup_lock); qgroup = find_qgroup_rb(fs_info, root); if (!qgroup) { diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 126a599c7349..25200b5f6ed3 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -276,6 +276,8 @@ struct btrfs_squota_delta { u64 num_bytes; /* The number of bytes reserved for this extent */ u64 rsv_bytes; + /* The generation the extent was created in */ + u64 generation; /* Whether we are using or freeing the extent */ bool is_inc; /* Whether the extent is data or metadata */ diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index e60b62c1627e..7844c28846d1 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -1248,6 +1248,15 @@ struct btrfs_qgroup_status_item { * of the scan. It contains a logical address */ __le64 rescan; + + /* + * the generation when quotas were last enabled. 
Used by simple quotas to + * avoid decrementing when freeing an extent that was written before + * enable. + * + * Set iff flags contains BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE. + */ + __le64 enable_gen; } __attribute__ ((__packed__)); struct btrfs_qgroup_info_item { From patchwork Wed Sep 13 00:13:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13382347 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F8F8EE3F39 for ; Wed, 13 Sep 2023 00:13:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237928AbjIMANa (ORCPT ); Tue, 12 Sep 2023 20:13:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44052 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237899AbjIMAN3 (ORCPT ); Tue, 12 Sep 2023 20:13:29 -0400 Received: from wout4-smtp.messagingengine.com (wout4-smtp.messagingengine.com [64.147.123.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A32D710F2 for ; Tue, 12 Sep 2023 17:13:25 -0700 (PDT) Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.west.internal (Postfix) with ESMTP id EABC73200940; Tue, 12 Sep 2023 20:13:24 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Tue, 12 Sep 2023 20:13:25 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm2; t=1694564004; x= 1694650404; bh=0wuwNH+gwG/Qf/Im17An/wI+rzir670mY57X20iUBNE=; b=r oZNMnVdEv1CDUTPyu6tJdLlGwTUM+2mYZwjDMQFiCsJvGRsylbzF6EiOih8SK7Gp 
r3HwMfrvB+e8dsA2D7fyTtUpxtaX4AVB9mersCgmU095Y1B+IROHHcyQ5yb3U8cB 5oUh4lplY9bcpQy2Tx6tOI+VZtYmuPegBuJd2dk5UvqCn/smnLSkDliwxZeFTK19 76h5iR23qMV1VJc5rgG+5+VNQ9hZmg5mz7xn8NYGSyRhTqog2DR7Illvxkrl74xG LfR2/OA2STsIIuu/FPgD/aXQgjTq8SiLMSpwn8qCuhJy28yS7xR0xEETZppC+DF+ tCKWrv/MH9ZlgcAYWJpTA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; t=1694564004; x=1694650404; bh=0 wuwNH+gwG/Qf/Im17An/wI+rzir670mY57X20iUBNE=; b=VUpSno0GColyoLkNQ AO8mZ1Qdr0J0nA1kCI/w4peBpFyFLntQv4+Ja0UotHJ9fqYJt0266EVHSVDvxm/Y 6Vw2he72pad8qgca1y79XAt+U6oTSbFk6g/C3Zv8SNMKU/UecFgNqWkfUZbKwWkV gCYnO0Vr+XjTT9LvtpAKvWFn3UFRvMx0FTLv07f+M33o+elXOCPtiNLT0C3fVdaq KW7tOvasJV0sHaE7210BH69cZJY/KWTRdzFbhJNn2P1sO28CQxuHNFv6WM9US3n3 /Gui81OXoGlT8rnIDwfPiO1UzFuJLVf4PdGQaMebdQqsBq7ff6woI29lI2f1qe0y Ytzlw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudeijedgfeefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpedvnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 12 Sep 2023 20:13:23 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v6 16/18] btrfs: track metadata relocation cow with simple quota Date: Tue, 12 Sep 2023 17:13:27 -0700 Message-ID: <5485556ee458ca5e1bfac429956cb49b4d9e799a.1694563454.git.boris@bur.io> X-Mailer: 
git-send-email 2.41.0
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

Relocation cows metadata blocks in two cases for the reloc root:
- copying the subvol root item when creating the reloc root
- copying a btree node when there is a cow during relocation

In both cases, the resulting btree node hits an abnormal code path with respect to the owner field in its btrfs_header. It first creates the root item for the new objectid, which populates the reloc root id, and it is at this point that delayed refs are created. Later, it fully copies the old node into the new node (including the original owner field) which overwrites it. This results in a simple quotas mismatch where we run the delayed ref for the reloc root which has no simple quota effect (reloc root is not an fstree) but when we ultimately delete the node, the owner is the real original fstree and we do free the space.

To work around this without tampering with the behavior of relocation, add a parameter to btrfs_alloc_tree_block that lets the relocation code path specify a different owning root than the "operating" root (in this case, owning root is the real root and the operating root is the reloc root). These can naturally be plumbed into delayed refs that have the same concept.

Note that this is a double count in some sense, but a relatively natural one, as there are really two extents, and the old one will be deleted soon. This is consistent with how data relocation extents are accounted by simple quotas.
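The owning root versus operating root distinction can be illustrated with a small user-space sketch. Everything here is a hypothetical model for illustration: `TOY_TREE_RELOC_OBJECTID`, `toy_owning_root` and `toy_is_fstree` are made-up names, the reloc tree id value is an assumption, and the fstree test is a simplified stand-in for the kernel's check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed id of the relocation tree for this sketch. */
#define TOY_TREE_RELOC_OBJECTID	((uint64_t)-8)
/* Assumed ids: the default fs tree and the first regular subvol id. */
#define TOY_FS_TREE_OBJECTID	((uint64_t)5)
#define TOY_FIRST_FREE_OBJECTID	((int64_t)256)

/*
 * Pick the root that simple quotas should charge for a new tree block.
 * For ordinary allocations this is the operating root. For relocation
 * it is the root that owned the block being copied.
 */
static uint64_t toy_owning_root(uint64_t operating_root, uint64_t reloc_src_root)
{
	if (operating_root == TOY_TREE_RELOC_OBJECTID) {
		/* Relocation must always name the original owner. */
		assert(reloc_src_root != 0);
		return reloc_src_root;
	}
	return operating_root;
}

/* Simplified check for "is this root a subvol tree that quotas track?" */
static bool toy_is_fstree(uint64_t rootid)
{
	return rootid == TOY_FS_TREE_OBJECTID ||
	       (int64_t)rootid >= TOY_FIRST_FREE_OBJECTID;
}
```

Under this model, a block allocated through the reloc root on behalf of subvol 256 is charged to 256, so the later free against 256 is balanced instead of being the only accounted half of the pair.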
Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/ctree.c | 22 ++++++++++++++-------- fs/btrfs/disk-io.c | 4 ++-- fs/btrfs/extent-tree.c | 6 +++++- fs/btrfs/extent-tree.h | 1 + fs/btrfs/ioctl.c | 2 +- 5 files changed, 23 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index c362472a112f..e3818e451778 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -316,6 +316,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, int ret = 0; int level; struct btrfs_disk_key disk_key; + u64 reloc_src_root = 0; WARN_ON(test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && trans->transid != fs_info->running_transaction->transid); @@ -328,9 +329,11 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, else btrfs_node_key(buf, &disk_key, 0); + if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID) + reloc_src_root = btrfs_header_owner(buf); cow = btrfs_alloc_tree_block(trans, root, 0, new_root_objectid, &disk_key, level, buf->start, 0, - BTRFS_NESTING_NEW_ROOT); + reloc_src_root, BTRFS_NESTING_NEW_ROOT); if (IS_ERR(cow)) return PTR_ERR(cow); @@ -522,6 +525,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, int last_ref = 0; int unlock_orig = 0; u64 parent_start = 0; + u64 reloc_src_root = 0; if (*cow_ret == buf) unlock_orig = 1; @@ -540,12 +544,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, else btrfs_node_key(buf, &disk_key, 0); - if ((root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) && parent) - parent_start = parent->start; - + if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) { + if (parent) + parent_start = parent->start; + reloc_src_root = btrfs_header_owner(buf); + } cow = btrfs_alloc_tree_block(trans, root, parent_start, root->root_key.objectid, &disk_key, level, - search_start, empty_size, nest); + search_start, empty_size, reloc_src_root, nest); if (IS_ERR(cow)) return PTR_ERR(cow); @@ -2955,7 +2961,7 @@ static noinline int insert_new_root(struct 
btrfs_trans_handle *trans, c = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid, &lower_key, level, root->node->start, 0, - BTRFS_NESTING_NEW_ROOT); + 0, BTRFS_NESTING_NEW_ROOT); if (IS_ERR(c)) return PTR_ERR(c); @@ -3099,7 +3105,7 @@ static noinline int split_node(struct btrfs_trans_handle *trans, split = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid, &disk_key, level, c->start, 0, - BTRFS_NESTING_SPLIT); + 0, BTRFS_NESTING_SPLIT); if (IS_ERR(split)) return PTR_ERR(split); @@ -3850,7 +3856,7 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans, * use BTRFS_NESTING_NEW_ROOT. */ right = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid, - &disk_key, 0, l->start, 0, + &disk_key, 0, l->start, 0, 0, num_doubles ? BTRFS_NESTING_NEW_ROOT : BTRFS_NESTING_SPLIT); if (IS_ERR(right)) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 43a6ca726879..e9413325c1d1 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -859,7 +859,7 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans, root->root_key.offset = 0; leaf = btrfs_alloc_tree_block(trans, root, 0, objectid, NULL, 0, 0, 0, - BTRFS_NESTING_NORMAL); + 0, BTRFS_NESTING_NORMAL); if (IS_ERR(leaf)) { ret = PTR_ERR(leaf); leaf = NULL; @@ -936,7 +936,7 @@ int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, */ leaf = btrfs_alloc_tree_block(trans, root, 0, BTRFS_TREE_LOG_OBJECTID, - NULL, 0, 0, 0, BTRFS_NESTING_NORMAL); + NULL, 0, 0, 0, 0, BTRFS_NESTING_NORMAL); if (IS_ERR(leaf)) return PTR_ERR(leaf); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 8d16f7b4786d..7b9c58fc9fa8 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5070,6 +5070,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, const struct btrfs_disk_key *key, int level, u64 hint, u64 empty_size, + u64 reloc_src_root, enum btrfs_lock_nesting nest) { struct btrfs_fs_info *fs_info = root->fs_info; @@ 
-5082,6 +5083,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, int ret; u32 blocksize = fs_info->nodesize; bool skinny_metadata = btrfs_fs_incompat(fs_info, SKINNY_METADATA); + u64 owning_root; #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS if (btrfs_is_testing(fs_info)) { @@ -5108,11 +5110,13 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, ret = PTR_ERR(buf); goto out_free_reserved; } + owning_root = btrfs_header_owner(buf); if (root_objectid == BTRFS_TREE_RELOC_OBJECTID) { if (parent == 0) parent = ins.objectid; flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF; + owning_root = reloc_src_root; } else BUG_ON(parent > 0); @@ -5132,7 +5136,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, extent_op->level = level; btrfs_init_generic_ref(&generic_ref, BTRFS_ADD_DELAYED_EXTENT, - ins.objectid, ins.offset, parent, btrfs_header_owner(buf)); + ins.objectid, ins.offset, parent, owning_root); btrfs_init_tree_ref(&generic_ref, level, root_objectid, root->root_key.objectid, false); btrfs_ref_tree_mod(fs_info, &generic_ref); diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h index 5a02faa05464..9038d797aab2 100644 --- a/fs/btrfs/extent-tree.h +++ b/fs/btrfs/extent-tree.h @@ -114,6 +114,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, const struct btrfs_disk_key *key, int level, u64 hint, u64 empty_size, + u64 reloc_src_root, enum btrfs_lock_nesting nest); void btrfs_free_tree_block(struct btrfs_trans_handle *trans, u64 root_id, diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 59fcb7424d93..663709fd6dab 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -657,7 +657,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap, goto out; leaf = btrfs_alloc_tree_block(trans, root, 0, objectid, NULL, 0, 0, 0, - BTRFS_NESTING_NORMAL); + 0, BTRFS_NESTING_NORMAL); if (IS_ERR(leaf)) { ret = PTR_ERR(leaf); goto out; From patchwork Wed Sep 13 
00:13:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13382348 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48B61EE3F39 for ; Wed, 13 Sep 2023 00:13:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237931AbjIMANd (ORCPT ); Tue, 12 Sep 2023 20:13:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44058 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237899AbjIMANc (ORCPT ); Tue, 12 Sep 2023 20:13:32 -0400 Received: from wout4-smtp.messagingengine.com (wout4-smtp.messagingengine.com [64.147.123.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4CA7C10F2 for ; Tue, 12 Sep 2023 17:13:28 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id 9397B3200940; Tue, 12 Sep 2023 20:13:27 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Tue, 12 Sep 2023 20:13:27 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm2; t=1694564007; x= 1694650407; bh=uNLa0yJI7FL8SDPOk2qnY3XMMXGy5QsZPprxRdWfo0s=; b=r W0ghKFUsm5g6dzryQ+DlGLUp25sRZPLNqIWgIRDHqUpKdxTKFlvErDNcysyMLOQv 92b+IqTcdnDYqXNrkMoGkgvhxj2ZettY82XSdYE0JhJBHBbaSRwRzI43CzhrDl/W 0A88PIaKOeeIyGzJod87+Y3ID3AH2TgRKd6YdtDVP6iaFRrJD9xWiQYS8VMew4q6 G+gPtWtDRetehzUImM8lE2aln1yw6I8JoeKWNFUGHtpw4LiOrzxpfTqT0wr3iZpN eeEQ0W922Ex+pdwstSq1/jRAsnwpkS4ImH1746n1sdnOVX4Q2O097dWeTWYI0Lye OWHw3eJkXCWV6pYqltjQA== DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; t=1694564007; x=1694650407; bh=u NLa0yJI7FL8SDPOk2qnY3XMMXGy5QsZPprxRdWfo0s=; b=oM8Km0zgPyUTL+e1J 3A8a6bwQsaQMfT56yMoDTATt3lNWaCmKl1X9QKno5xYJKjU5vfj/bJEMUf2UUQON QRDRGGFMiwDb/WP5SNRzffsrRKIdKrJfJU/Gk0i9eR9bFDbq5XBfbEjtsQ1B6Fwo 4Kv6kZLIhKwdu2+VUJU3BZvFkL4OGeGpwYxXs33o2qZsoGc74tuMP41+Gw+c9NhA U5hgF0WUWZKPajTMbTcP9C4hACiahEhWp5HKvV3jyZvknlD503byCcEOdX0jpYTx B9ZQH59R+Em6l2DAan2SymH9QXzpJM7Px3gG1y+lpLlicw7x3Ok8aHtAGlvDGQt5 T1kpA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudeijedgfeegucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 12 Sep 2023 20:13:26 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v6 17/18] btrfs: track data relocation with simple quota Date: Tue, 12 Sep 2023 17:13:28 -0700 Message-ID: <4c5b76256e13ab4e7123b17cd63553020077c261.1694563454.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Relocation data allocations are quite tricky for simple quotas. 
The basic data relocation sequence is (ignoring details that aren't relevant to this fix):
- create a fake relocation data fs root
- create a fake relocation inode in that root
- foreach data extent:
  - preallocate a data extent on behalf of the fake inode
  - copy over the data
- foreach extent
  - swap the refs so that the original file extent now refers to the new extent item
- drop the fake root, dropping its refs on the old extents, which lets us delete them.

Done naively, this results in storing an extent item in the extent tree whose owner_ref points at the relocation data root and a no-op squota recording, since the reloc root is not a legit fstree. So far, that's OK. The problem comes when you do the swap, and leave an extent item owned by this bogus root as the real permanent extents of the file. If the file then drops that ref, we free it and no-op account that against the fake relocation root. Essentially, this means that relocation is simple quota "extent laundering", since we re-own the extents into a fake root.

Simple quotas very intentionally don't have a mechanism for transferring ownership of extents, as that is exactly the complicated thing we are trying to avoid with the new design. Further, it cannot be correctly done in this case, since at the time you create the new "real" refs, there is no way to know which was the original owner before relocation unless we track it.

Therefore, it makes more sense to trick the preallocation to handle relocation as a special case and note the proper owner ref from the beginning. That way, we never write out an extent item without the correct owner ref that it will eventually have.

This could be done by wiring a special root parameter all the way through the allocation code path, but to avoid that special case touching all the code, take advantage of the serial nature of relocation to store the src root on the relocation root object.
Then when we finish the prealloc, if it happens to be this case, prepare
the delayed ref appropriately.

We must also add logic to handle relocating adjacent extents with
different owning roots. Those cannot be preallocated together in a
cluster as it would lose the separate ownership information.

This is obviously a smelly bit of code, but I think it is the best
solution to the problem, given the relocation implementation.

Signed-off-by: Boris Burkov
---
 fs/btrfs/ctree.h       |  1 +
 fs/btrfs/extent-tree.c | 13 ++++++-----
 fs/btrfs/relocation.c  | 49 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index da9e07bf76ea..d8b23936ff54 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -305,6 +305,7 @@ struct btrfs_root {
 #ifdef CONFIG_BTRFS_DEBUG
 	struct list_head leak_list;
 #endif
+	u64 relocation_src_root;
 };
 
 static inline bool btrfs_root_readonly(const struct btrfs_root *root)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7b9c58fc9fa8..a1fa114196e5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -58,7 +58,7 @@ static void __run_delayed_extent_op(struct btrfs_delayed_extent_op *extent_op,
 static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 				      u64 parent, u64 root_objectid,
 				      u64 flags, u64 owner, u64 offset,
-				      struct btrfs_key *ins, int ref_mod);
+				      struct btrfs_key *ins, int ref_mod, u64 oref_root);
 static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
 				     struct btrfs_delayed_ref_node *node,
 				     struct btrfs_delayed_extent_op *extent_op);
@@ -1574,7 +1574,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
 		ret = alloc_reserved_file_extent(trans, parent, ref->root,
 						 flags, ref->objectid,
 						 ref->offset, &key,
-						 node->ref_mod);
+						 node->ref_mod, href->owning_root);
 		if (!ret)
 			ret = btrfs_record_squota_delta(trans->fs_info, &delta);
 	} else if (node->action == BTRFS_ADD_DELAYED_REF) {
@@ -4737,7 +4737,7 @@ static int alloc_reserved_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 				      u64 parent, u64 root_objectid,
 				      u64 flags, u64 owner, u64 offset,
-				      struct btrfs_key *ins, int ref_mod)
+				      struct btrfs_key *ins, int ref_mod, u64 oref_root)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_root *extent_root;
@@ -4784,7 +4784,7 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 	if (simple_quota) {
 		btrfs_set_extent_inline_ref_type(leaf, iref, BTRFS_EXTENT_OWNER_REF_KEY);
 		oref = (struct btrfs_extent_owner_ref *)(&iref->offset);
-		btrfs_set_extent_owner_ref_root_id(leaf, oref, root_objectid);
+		btrfs_set_extent_owner_ref_root_id(leaf, oref, oref_root);
 		iref = (struct btrfs_extent_inline_ref *)(oref + 1);
 	}
 	btrfs_set_extent_inline_ref_type(leaf, iref, type);
@@ -4895,6 +4895,9 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 
 	BUG_ON(root_objectid == BTRFS_TREE_LOG_OBJECTID);
 
+	if (btrfs_is_data_reloc_root(root) && is_fstree(root->relocation_src_root))
+		owning_root = root->relocation_src_root;
+
 	btrfs_init_generic_ref(&generic_ref, BTRFS_ADD_DELAYED_EXTENT,
 			       ins->objectid, ins->offset, 0, owning_root);
 	btrfs_init_data_ref(&generic_ref, root_objectid, owner,
@@ -4950,7 +4953,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 	spin_unlock(&space_info->lock);
 
 	ret = alloc_reserved_file_extent(trans, 0, root_objectid, 0, owner,
-					 offset, ins, 1);
+					 offset, ins, 1, root_objectid);
 	if (ret)
 		btrfs_pin_extent(trans, ins->objectid, ins->offset, 1);
 	ret = btrfs_record_squota_delta(fs_info, &delta);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index bd07d5322c61..58cf831e1cac 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -122,6 +122,7 @@ struct file_extent_cluster {
 	u64 end;
 	u64 boundary[MAX_EXTENTS];
 	unsigned int nr;
+	u64 owning_root;
 };
 
 struct reloc_control {
@@ -3165,6 +3166,7 @@ int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key,
 			 struct file_extent_cluster *cluster)
 {
 	int ret;
+	struct btrfs_root *root = BTRFS_I(inode)->root;
 
 	if (cluster->nr > 0 && extent_key->objectid != cluster->end + 1) {
 		ret = relocate_file_extent_cluster(inode, cluster);
@@ -3173,8 +3175,38 @@ int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key,
 		cluster->nr = 0;
 	}
 
-	if (!cluster->nr)
+	/*
+	 * Under simple quotas, we set root->relocation_src_root when we find
+	 * the extent. If adjacent extents have different owners, we can't merge
+	 * them while relocating. Handle this by storing the owning root that
+	 * started a cluster and if we see an extent from a different root break
+	 * cluster formation (just like the above case of non-adjacent extents).
+	 *
+	 * Absent simple quotas, relocation_src_root is always 0, so we should
+	 * never see a mismatch, and it should have no effect on relocation
+	 * clusters.
+	 */
+	if (cluster->nr > 0 && cluster->owning_root != root->relocation_src_root) {
+		u64 tmp = root->relocation_src_root;
+
+		/*
+		 * root->relocation_src_root is the state that actually
+		 * affects the preallocation we do here, so set it to the
+		 * root owning the cluster we need to relocate.
+		 */
+		root->relocation_src_root = cluster->owning_root;
+		ret = relocate_file_extent_cluster(inode, cluster);
+		if (ret)
+			return ret;
+		cluster->nr = 0;
+		/* And reset it back for the current extent's owning root */
+		root->relocation_src_root = tmp;
+	}
+
+	if (!cluster->nr) {
 		cluster->start = extent_key->objectid;
+		cluster->owning_root = root->relocation_src_root;
+	}
 	else
 		BUG_ON(cluster->nr >= MAX_EXTENTS);
 	cluster->end = extent_key->objectid + extent_key->offset - 1;
@@ -3704,6 +3736,21 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
 				    struct btrfs_extent_item);
 		flags = btrfs_extent_flags(path->nodes[0], ei);
 
+		/*
+		 * If we are relocating a simple quota owned extent item, we need
+		 * to note the owner on the reloc data root so that when we
+		 * allocate the replacement item, we can attribute it to the
+		 * correct eventual owner (rather than the reloc data root)
+		 */
+		if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) {
+			struct btrfs_root *root = BTRFS_I(rc->data_inode)->root;
+			u64 owning_root_id = btrfs_get_extent_owner_root(fs_info,
+								 path->nodes[0],
+								 path->slots[0]);
+
+			root->relocation_src_root = owning_root_id;
+		}
+
 		if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
 			ret = add_tree_block(rc, &key, path, &blocks);
 		} else if (rc->stage == UPDATE_DATA_PTRS &&

From patchwork Wed Sep 13 00:13:29 2023
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13382349
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 18/18] btrfs: only set QUOTA_ENABLED when done reading qgroups
Date: Tue, 12 Sep 2023 17:13:29 -0700
Message-ID: <425f0ca332862491b4ae9499fe42c38be88feba3.1694563454.git.boris@bur.io>

In open_ctree, we set BTRFS_FS_QUOTA_ENABLED as soon as we see a
quota_root, as opposed to after we are done setting up the qgroup
structures. In the quota_enable path, we wait until after the structures
are set up. Likewise, in disable, we clear the bit before tearing down
the structures. I feel that this organization is less surprising for the
open_ctree path.

I don't believe this fixes any actual bug, but it avoids potential
confusion when using btrfs_qgroup_mode in an intermediate state where we
are enabled but haven't yet set up the qgroup status flags. It also
avoids any risk of calling a qgroup function and attempting to use the
qgroup rbtrees before they exist/are set up.

This all occurs before we do rw setup, so I believe it should be mostly
a no-op.
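
The ordering described above can be illustrated with a small standalone
model. This is only a sketch under stated assumptions, not the kernel
code: struct fs_model, read_roots_model(), read_qgroup_config_model()
and the MODEL_STATUS_FLAG_* constants are hypothetical stand-ins
invented for this illustration, and the rescan setup is left out. It
shows the enabled bit being set only after the qgroup structures were
read successfully and the status flags say quotas are on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for on-disk status flags; not the kernel values. */
#define MODEL_STATUS_FLAG_ON     (1u << 0)
#define MODEL_STATUS_FLAG_RESCAN (1u << 1)

/* Hypothetical, heavily reduced model of the relevant fs_info state. */
struct fs_model {
	bool has_quota_root;       /* models fs_info->quota_root != NULL */
	bool quota_enabled;        /* models the BTRFS_FS_QUOTA_ENABLED bit */
	bool structures_ready;     /* models the qgroup ulist/rbtrees existing */
	unsigned int qgroup_flags; /* models fs_info->qgroup_flags */
};

/* Models the root-reading step: it records the quota root and nothing else. */
static void read_roots_model(struct fs_model *fs, bool quota_root_on_disk)
{
	assert(fs != NULL);
	fs->has_quota_root = quota_root_on_disk;
}

/*
 * Models reading the qgroup config. read_ret simulates the result of
 * loading the on-disk items (negative means failure) and disk_flags
 * simulates the flags found in the status item.
 */
static int read_qgroup_config_model(struct fs_model *fs, int read_ret,
				    unsigned int disk_flags)
{
	assert(fs != NULL);

	/* No quota root means quotas are off; nothing to set up. */
	if (!fs->has_quota_root)
		return 0;

	fs->structures_ready = true;
	fs->qgroup_flags |= disk_flags;

	if (read_ret >= 0) {
		/* Structures are populated; only now may the bit be set. */
		if (fs->qgroup_flags & MODEL_STATUS_FLAG_ON)
			fs->quota_enabled = true;
	} else {
		/* Tear down on failure; the enabled bit was never set. */
		fs->structures_ready = false;
		fs->qgroup_flags &= ~MODEL_STATUS_FLAG_RESCAN;
	}
	return read_ret;
}
```

In this model, a caller that checks quota_enabled between
read_roots_model() and read_qgroup_config_model() always sees it
clear, which is the intermediate state the change is meant to make
unambiguous.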

Signed-off-by: Boris Burkov
---
 fs/btrfs/disk-io.c |  1 -
 fs/btrfs/qgroup.c  | 15 +++++++--------
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e9413325c1d1..ccc40c2400ca 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2263,7 +2263,6 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
 	root = btrfs_read_tree_root(tree_root, &location);
 	if (!IS_ERR(root)) {
 		set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
-		set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
 		fs_info->quota_root = root;
 	}
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 759395e83f9e..5f989cf2b01d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -404,7 +404,7 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 	u64 flags = 0;
 	u64 rescan_progress = 0;
 
-	if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED)
+	if (!fs_info->quota_root)
 		return 0;
 
 	fs_info->qgroup_ulist = ulist_alloc(GFP_KERNEL);
@@ -576,13 +576,12 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 out:
 	btrfs_free_path(path);
 	fs_info->qgroup_flags |= flags;
-	if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
-		clear_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
-	else if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN &&
-		 ret >= 0)
-		ret = qgroup_rescan_init(fs_info, rescan_progress, 0);
-
-	if (ret < 0) {
+	if (ret >= 0) {
+		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON)
+			set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
+		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
+			ret = qgroup_rescan_init(fs_info, rescan_progress, 0);
+	} else {
 		ulist_free(fs_info->qgroup_ulist);
 		fs_info->qgroup_ulist = NULL;
 		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;