From patchwork Wed May 3 00:59:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13229418 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 049A9C7EE26 for ; Wed, 3 May 2023 00:59:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229475AbjECA7x (ORCPT ); Tue, 2 May 2023 20:59:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52748 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229455AbjECA7w (ORCPT ); Tue, 2 May 2023 20:59:52 -0400 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 60AB52D5E for ; Tue, 2 May 2023 17:59:48 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 8597B5C02F7; Tue, 2 May 2023 20:59:47 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Tue, 02 May 2023 20:59:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1683075587; x= 1683161987; bh=gWmI9gjUAqUqMrdHoFIIO04TEXXoG/G4C+hdZCLSIj4=; b=A RC/HdrKhIRWe38rtbpkEVQgVeMaMjg4FO3zcEO8mEUvLCEhRJBgKo9GkpzYP3oTY 65m7pJaLlgO6ErrqIyPk5tbpT9l7vyZ9Nb/s0FHD/y54dQiU0KalB0hMAkUAU/GL v+Y20TNWGtojBxs4IYDQdr5fDboWJIeQ0VBeB5OKIPsxW7HsJi1m0wvTZnP7V/eq j5Qsm2F94AVLQz9nRbXT2kwrYmtlvvKnomAl+7ggVTo8m0QKAqH4UsKJ+NlI/9e0 0D11Aw/idhHpfFO59+Je8daOft8uXfZH0BZCS1dYqz71Qvk86vl0GTH8xqL2ugQz RzEAP/oE7Eu5B/9YKw9gg== DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm3; t=1683075587; x=1683161987; bh=g WmI9gjUAqUqMrdHoFIIO04TEXXoG/G4C+hdZCLSIj4=; b=c0I3n2FYvHMjeeIst crcMhlMS8TOEUmsMxejBiEuJAjOPZpcyO/paKs6C54XNMlbeUvnMGxtIbD0o3T+8 5C2rpLHnZxxno7II2WjspnKEhTfO3INvg/lvVTt8ucK2TLAvIp2UrgohmEwGoIs9 2V7+FvBm+ZzNMPh6hUjWUCq+UWELLD1s5AOnaKPZWqHJo7vh0nRx+Xu5Iv2h3WDs y2rJWH7WHjdhJrHFzxdcaEscgF934kndzGa3tvnPmnCMoJVIe84/rzvocyWxUGTy Dzp2R0a6SWqhPOFnR98mR2dkwrnNUZ/DMwmHhTEkLreXgqaP4V8TZQtnHWWHRC8z m8hxg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrfedvjedggedtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 2 May 2023 20:59:47 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 1/9] btrfs: simple quotas mode Date: Tue, 2 May 2023 17:59:22 -0700 Message-Id: <099f2eee2855da6989c29c34e48aaf4b706f047e.1683075170.git.boris@bur.io> X-Mailer: git-send-email 2.40.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Allow the quota enable ioctl to specify simple quotas. Set an INCOMPAT bit when simple quotas are enabled, as it will result in several breaking changes to the on-disk format. Introduce an enum for capturing the current quota mode. 
Rather than just enabled/disabled, the possible settings are now disabled/simple/full. Signed-off-by: Boris Burkov --- fs/btrfs/fs.h | 5 ++- fs/btrfs/ioctl.c | 2 +- fs/btrfs/qgroup.c | 68 ++++++++++++++++++++++++++++++-------- fs/btrfs/qgroup.h | 10 +++++- fs/btrfs/transaction.c | 4 +-- include/uapi/linux/btrfs.h | 1 + 6 files changed, 69 insertions(+), 21 deletions(-) diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h index 0d98fc5f6f44..6c989d87768c 100644 --- a/fs/btrfs/fs.h +++ b/fs/btrfs/fs.h @@ -218,7 +218,8 @@ enum { BTRFS_FEATURE_INCOMPAT_NO_HOLES | \ BTRFS_FEATURE_INCOMPAT_METADATA_UUID | \ BTRFS_FEATURE_INCOMPAT_RAID1C34 | \ - BTRFS_FEATURE_INCOMPAT_ZONED) + BTRFS_FEATURE_INCOMPAT_ZONED | \ + BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA) #ifdef CONFIG_BTRFS_DEBUG /* @@ -233,7 +234,6 @@ enum { #define BTRFS_FEATURE_INCOMPAT_SUPP \ (BTRFS_FEATURE_INCOMPAT_SUPP_STABLE) - #endif #define BTRFS_FEATURE_INCOMPAT_SAFE_SET \ @@ -791,7 +791,6 @@ struct btrfs_fs_info { struct lockdep_map btrfs_state_change_map[4]; struct lockdep_map btrfs_trans_pending_ordered_map; struct lockdep_map btrfs_ordered_extent_map; - #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; struct rb_root block_tree; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 9522669000a7..ca7d2ef739c8 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3685,7 +3685,7 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg) switch (sa->cmd) { case BTRFS_QUOTA_CTL_ENABLE: - ret = btrfs_quota_enable(fs_info); + ret = btrfs_quota_enable(fs_info, sa); break; case BTRFS_QUOTA_CTL_DISABLE: ret = btrfs_quota_disable(fs_info); diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index f41da7ac360d..3c8b296215ee 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -3,6 +3,7 @@ * Copyright (C) 2011 STRATO. All rights reserved. 
*/ +#include #include #include #include @@ -10,7 +11,6 @@ #include #include #include -#include #include #include "ctree.h" @@ -30,6 +30,15 @@ #include "root-tree.h" #include "tree-checker.h" +enum btrfs_qgroup_mode btrfs_qgroup_mode(struct btrfs_fs_info *fs_info) +{ + if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + return BTRFS_QGROUP_MODE_DISABLED; + if (btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)) + return BTRFS_QGROUP_MODE_SIMPLE; + return BTRFS_QGROUP_MODE_FULL; +} + /* * Helpers to access qgroup reservation * @@ -340,6 +349,8 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid, static void qgroup_mark_inconsistent(struct btrfs_fs_info *fs_info) { + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + return; fs_info->qgroup_flags |= (BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT | BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN | BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING); @@ -412,7 +423,8 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info) goto out; } if (btrfs_qgroup_status_generation(l, ptr) != - fs_info->generation) { + fs_info->generation && + !btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)) { qgroup_mark_inconsistent(fs_info); btrfs_err(fs_info, "qgroup generation mismatch, marked as inconsistent"); @@ -949,7 +961,8 @@ static int btrfs_clean_quota_tree(struct btrfs_trans_handle *trans, return ret; } -int btrfs_quota_enable(struct btrfs_fs_info *fs_info) +int btrfs_quota_enable(struct btrfs_fs_info *fs_info, + struct btrfs_ioctl_quota_ctl_args *quota_ctl_args) { struct btrfs_root *quota_root; struct btrfs_root *tree_root = fs_info->tree_root; @@ -961,6 +974,7 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info) struct btrfs_qgroup *qgroup = NULL; struct btrfs_trans_handle *trans = NULL; struct ulist *ulist = NULL; + bool simple_qgroups = quota_ctl_args->status == 42; int ret = 0; int slot; @@ -1063,8 +1077,9 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info) struct btrfs_qgroup_status_item); 
btrfs_set_qgroup_status_generation(leaf, ptr, trans->transid); btrfs_set_qgroup_status_version(leaf, ptr, BTRFS_QGROUP_STATUS_VERSION); - fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON | - BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; + fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON; + if (!simple_qgroups) + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; btrfs_set_qgroup_status_flags(leaf, ptr, fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAGS_MASK); btrfs_set_qgroup_status_rescan(leaf, ptr, 0); @@ -1180,8 +1195,14 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info) spin_lock(&fs_info->qgroup_lock); fs_info->quota_root = quota_root; set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags); + if (simple_qgroups) + btrfs_set_fs_incompat(fs_info, SIMPLE_QUOTA); spin_unlock(&fs_info->qgroup_lock); + /* Skip rescan for simple qgroups */ + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + goto out_free_path; + ret = qgroup_rescan_init(fs_info, 0, 1); if (!ret) { qgroup_rescan_zero_tracking(fs_info); @@ -1766,6 +1787,9 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info, struct btrfs_qgroup_extent_record *entry; u64 bytenr = record->bytenr; + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) + return 0; + lockdep_assert_held(&delayed_refs->lock); trace_btrfs_qgroup_trace_extent(fs_info, record); @@ -1798,6 +1822,8 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans, struct btrfs_backref_walk_ctx ctx = { 0 }; int ret; + if (btrfs_qgroup_mode(trans->fs_info) != BTRFS_QGROUP_MODE_FULL) + return 0; /* * We are always called in a context where we are already holding a * transaction handle. 
Often we are called when adding a data delayed @@ -1853,7 +1879,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr, struct btrfs_delayed_ref_root *delayed_refs; int ret; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL || bytenr == 0 || num_bytes == 0) return 0; record = kzalloc(sizeof(*record), GFP_NOFS); @@ -1886,7 +1912,7 @@ int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans, u64 bytenr, num_bytes; /* We can be called directly from walk_up_proc() */ - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; for (i = 0; i < nr; i++) { @@ -2262,7 +2288,7 @@ static int qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans, int level; int ret; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; /* Wrong parameter order */ @@ -2319,7 +2345,7 @@ int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans, BUG_ON(root_level < 0 || root_level >= BTRFS_MAX_LEVEL); BUG_ON(root_eb == NULL); - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; spin_lock(&fs_info->qgroup_lock); @@ -2659,7 +2685,7 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr, * If quotas get disabled meanwhile, the resources need to be freed and * we can't just exit here. 
*/ - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) || + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL || fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING) goto out_free; @@ -2747,6 +2773,9 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans) u64 qgroup_to_skip; int ret = 0; + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) + return 0; + delayed_refs = &trans->transaction->delayed_refs; qgroup_to_skip = delayed_refs->qgroup_to_skip; while ((node = rb_first(&delayed_refs->dirty_extent_root))) { @@ -2989,11 +3018,10 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, qgroup_dirty(fs_info, dstgroup); } - if (srcid) { + if (srcid && btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL) { srcgroup = find_qgroup_rb(fs_info, srcid); if (!srcgroup) goto unlock; - /* * We call inherit after we clone the root in order to make sure * our counts don't go crazy, so at this point the only @@ -3281,6 +3309,9 @@ static int qgroup_rescan_leaf(struct btrfs_trans_handle *trans, int slot; int ret; + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) + return 1; + mutex_lock(&fs_info->qgroup_rescan_lock); extent_root = btrfs_extent_root(fs_info, fs_info->qgroup_rescan_progress.objectid); @@ -3378,6 +3409,9 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) bool stopped = false; bool did_leaf_rescans = false; + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + return; + path = btrfs_alloc_path(); if (!path) goto out; @@ -3481,6 +3515,12 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid, { int ret = 0; + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) { + btrfs_warn(fs_info, "qgroup rescan init failed, running in simple mode. 
mode: %d\n", + btrfs_qgroup_mode(fs_info)); + return -EINVAL; + } + if (!init_flags) { /* we're resuming qgroup rescan at mount time */ if (!(fs_info->qgroup_flags & @@ -4240,7 +4280,7 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans, int level = btrfs_header_level(subvol_parent) - 1; int ret = 0; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; if (btrfs_node_ptr_generation(subvol_parent, subvol_slot) > @@ -4350,7 +4390,7 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans, int ret = 0; int i; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; if (!is_fstree(root->root_key.objectid) || !root->reloc_root) return 0; diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 7bffa10589d6..d4c4d039585f 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -249,7 +249,15 @@ enum { ENUM_BIT(QGROUP_FREE), }; -int btrfs_quota_enable(struct btrfs_fs_info *fs_info); +enum btrfs_qgroup_mode { + BTRFS_QGROUP_MODE_DISABLED, + BTRFS_QGROUP_MODE_FULL, + BTRFS_QGROUP_MODE_SIMPLE +}; + +enum btrfs_qgroup_mode btrfs_qgroup_mode(struct btrfs_fs_info *fs_info); +int btrfs_quota_enable(struct btrfs_fs_info *fs_info, + struct btrfs_ioctl_quota_ctl_args *quota_ctl_args); int btrfs_quota_disable(struct btrfs_fs_info *fs_info); int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 8b6a99b8d7f6..e6d6752c2fca 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1514,11 +1514,11 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, int ret; /* - * Save some performance in the case that qgroups are not + * Save some performance in the case that full qgroups are not * enabled. 
If this check races with the ioctl, rescan will * kick in anyway. */ - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; /* diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index dbb8b96da50d..957ca4037974 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -333,6 +333,7 @@ struct btrfs_ioctl_fs_info_args { #define BTRFS_FEATURE_INCOMPAT_RAID1C34 (1ULL << 11) #define BTRFS_FEATURE_INCOMPAT_ZONED (1ULL << 12) #define BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2 (1ULL << 13) +#define BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA (1ULL << 14) struct btrfs_ioctl_feature_flags { __u64 compat_flags; From patchwork Wed May 3 00:59:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13229417 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19B0BC7EE24 for ; Wed, 3 May 2023 00:59:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229481AbjECA7y (ORCPT ); Tue, 2 May 2023 20:59:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52750 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229457AbjECA7w (ORCPT ); Tue, 2 May 2023 20:59:52 -0400 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C62312D61 for ; Tue, 2 May 2023 17:59:49 -0700 (PDT) Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 306825C030C; Tue, 2 May 2023 20:59:49 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Tue, 02 May 2023 20:59:49 -0400 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1683075589; x= 1683161989; bh=gOBp4vpA3+LJF4Cto8NWnS6ULz/n7HkBuk3VUY0eR+E=; b=r XMKOPVluq+soPU/cyshT3EIfb+xlYWMEjX/gmxk8gEqT8/Z+J00BAS39ACoQg4A9 VdtJovQG+17WToDdctR/A3Tte7FvBOjEozYWjd0oSAq0m17f1Rckoe9LfWtmk4n1 nYmw3nIuQxp749sAr5Vr3QHAGhM9H/A7EEQVY8mWrFCerzPsD3IyffwMz2dilDZq PhuQeFv6AEQbDRk9eKKPO95OBsqm02752g2366NFhFGdMReGWdY3Rr9i0yhH62vQ WdwAamDegEMbLdOy6MzeXsn8JqIm8CmE9f0Yjy0JHOv+Fo4jjGJlPEB9PNhnwbkr cknjEzUTmC3kjqnt8Oyrw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm3; t=1683075589; x=1683161989; bh=g OBp4vpA3+LJF4Cto8NWnS6ULz/n7HkBuk3VUY0eR+E=; b=akHAuBiSb8IS9p58J 3fc663t1aIBryr+LSJfIinVWASNfZXpmRBkPNFgQLRBg2I/w3hYPsddRyfrkrR5a dyT6ygL/k7YRnZUKG5BUxXDsdEsyfr6DnGjHDexDrbO4BGUWB4CKvfZGnKoTRVfz PJ0rvw6rKv6mO6QkByw94gE9KhqvX2nAACGRnVtkyGfAtmz4mxi5ezNIzj3DeS3N 88PaOmdOV8FGbwrMM0ZFJCJMftjqy0mY+unfjxkMr0FNyJW+56M8EkMvXhiLRwlj gwEGgPcMJwwvyVLui5h8TH08TCITTziyn4DHkTfck/Jbjc5lLFcI3MsiTocZzel3 ly/ug== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrfedvjedgfeelucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: 
by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 2 May 2023 20:59:48 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 2/9] btrfs: new function for recording simple quota usage Date: Tue, 2 May 2023 17:59:23 -0700 Message-Id: <33e6475ff008fb21ece6eb288c8b78fcacb4d478.1683075170.git.boris@bur.io> X-Mailer: git-send-email 2.40.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Rather than re-computing shared/exclusive ownership based on backrefs and walking roots for implicit backrefs, simple quotas does an increment when creating an extent and a decrement when deleting it. Add the API for the extent item code to use to track those events. Also add a helper function to make collecting parent qgroups in a ulist easier for functions like this. Signed-off-by: Boris Burkov --- fs/btrfs/qgroup.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/qgroup.h | 10 +++++- 2 files changed, 94 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 3c8b296215ee..8982b76ae9e5 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -332,6 +332,44 @@ static int del_relation_rb(struct btrfs_fs_info *fs_info, return -ENOENT; } +static int qgroup_collect_parents(struct btrfs_qgroup *qgroup, + struct ulist *ul) +{ + struct ulist_iterator uiter; + struct ulist_node *unode; + struct btrfs_qgroup_list *glist; + struct btrfs_qgroup *qg; + bool err_free = false; + int ret = 0; + + if (!ul) { + ul = ulist_alloc(GFP_KERNEL); + err_free = true; + } else { + ulist_reinit(ul); + } + + ret = ulist_add(ul, qgroup->qgroupid, + qgroup_to_aux(qgroup), GFP_ATOMIC); + if (ret < 0) + goto out; + ULIST_ITER_INIT(&uiter); + while ((unode = ulist_next(ul, &uiter))) { + qg = unode_aux_to_qgroup(unode); + list_for_each_entry(glist, &qg->groups, next_group) { + ret = ulist_add(ul, glist->group->qgroupid, + qgroup_to_aux(glist->group), GFP_ATOMIC); + 
if (ret < 0) + goto out; + } + } + ret = 0; +out: + if (ret && err_free) + ulist_free(ul); + return ret; +} + #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid, u64 rfer, u64 excl) @@ -4472,3 +4510,50 @@ void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans) kfree(entry); } } + +int btrfs_record_simple_quota_delta(struct btrfs_fs_info *fs_info, + struct btrfs_simple_quota_delta *delta) +{ + int ret; + struct ulist *ul = fs_info->qgroup_ulist; + struct btrfs_qgroup *qgroup; + struct ulist_iterator uiter; + struct ulist_node *unode; + struct btrfs_qgroup *qg; + bool drop_rsv = false; + u64 root = delta->root; + u64 num_bytes = delta->num_bytes; + int sign = delta->is_inc ? 1 : -1; + + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_SIMPLE) + return 0; + + if (!is_fstree(root)) + return 0; + + spin_lock(&fs_info->qgroup_lock); + qgroup = find_qgroup_rb(fs_info, root); + if (!qgroup) { + ret = -ENOENT; + goto out; + } + ret = qgroup_collect_parents(qgroup, ul); + if (ret) + goto out; + + ULIST_ITER_INIT(&uiter); + while ((unode = ulist_next(ul, &uiter))) { + qg = unode_aux_to_qgroup(unode); + qg->excl += num_bytes * sign; + qg->rfer += num_bytes * sign; + if (delta->is_inc && delta->is_data) + drop_rsv = true; + qgroup_dirty(fs_info, qg); + } + +out: + spin_unlock(&fs_info->qgroup_lock); + if (!ret && drop_rsv) + btrfs_qgroup_free_refroot(fs_info, root, num_bytes, BTRFS_QGROUP_RSV_DATA); + return ret; +} diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index d4c4d039585f..0d627a871900 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -235,6 +235,13 @@ struct btrfs_qgroup { struct kobject kobj; }; +struct btrfs_simple_quota_delta { + u64 root; /* The fstree root this delta counts against */ + u64 num_bytes; /* The number of bytes in the extent being counted */ + bool is_inc; /* Whether we are using or freeing the extent */ + bool is_data; /* Whether the extent is data or 
metadata */ +}; + static inline u64 btrfs_qgroup_subvolid(u64 qgroupid) { return (qgroupid & ((1ULL << BTRFS_QGROUP_LEVEL_SHIFT) - 1)); @@ -447,5 +454,6 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *eb); void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans); bool btrfs_check_quota_leak(struct btrfs_fs_info *fs_info); - +int btrfs_record_simple_quota_delta(struct btrfs_fs_info *fs_info, + struct btrfs_simple_quota_delta *delta); #endif From patchwork Wed May 3 00:59:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13229420 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23AB1C77B78 for ; Wed, 3 May 2023 01:00:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229487AbjECA7z (ORCPT ); Tue, 2 May 2023 20:59:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52754 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229464AbjECA7x (ORCPT ); Tue, 2 May 2023 20:59:53 -0400 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7AC0A2D6D for ; Tue, 2 May 2023 17:59:51 -0700 (PDT) Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id DCED55C0313; Tue, 2 May 2023 20:59:50 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Tue, 02 May 2023 20:59:50 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from 
:in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1683075590; x= 1683161990; bh=PCORDII6Wiz2/v0fesGnD8bC3KEQqaeyyEhdGdw8v1g=; b=h oFFLLx29ydj9lD1hYcZFqFQFCmQdjL27KtA7izumzig5k8CCQcOoXQM3HizFz3aJ VhSoXz+yws9gWbIbX4+KEqt4aV5hTZuMcpMS10FbPiMH1Zwi4UFQeN/ne4gPcbLT ZefOnJa/FOrOGSTcYQKv2r9V0Zg1U1lkH7jyUavYBaOadcWDk2LLRM6Idj3ql6IX UwBpLOQ4q+mZIyflxv/puOaDuCod3VimFJ8bud9VFnGnWyfqnYuBW7NiWXb4E+8T MRea8xbxSWeUvmEIqxWfMlcLVG2IyP+ymWbjHdGxS0iZAzjP4ZxVEfy/GiLDTPZW WhjWsRIFjEjxiZFozuz9A== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm3; t=1683075590; x=1683161990; bh=P CORDII6Wiz2/v0fesGnD8bC3KEQqaeyyEhdGdw8v1g=; b=A0SVk0UkUTS6kcHVb 6HFkaT0u5nD/8YbPahNKQLMizieQ7OJZexoEF/sCJZHqfw6khLG1NthwAqnG8kN5 JKOMeUcWK+D/eov+1/q8r/vaQ4n3K9vQ5jsH9C2FUs8IusBewg+B0QSIk32lpVZv EaKYCyAbtDm0TnJmcZKuVtLSOdiD0vsfPPgj2MWHTriDPpfclP5mw5UkrPYzYrYv Jsu1lc++vfltPkZXPkgBakuFR6r0vJv0fYCq+u+zMW/5fIyopy4PsAIKPRXMUJ5L f/3vg2RdCCk8/5VyTahEzZs4IJu7Au/eEqTEf/uoAWfZaCED3zhOs6LuJuDGm3TW c8+xA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrfedvjedgfeelucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 2 May 2023 20:59:50 -0400 (EDT) From: Boris Burkov To: 
linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 3/9] btrfs: track original extent subvol in a new inline ref Date: Tue, 2 May 2023 17:59:24 -0700 Message-Id: <7a4b78e240d2f26eb3d7be82d4c0b8ddaa409519.1683075170.git.boris@bur.io> X-Mailer: git-send-email 2.40.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In order to implement simple quota groups, we need to be able to associate a data extent with the subvolume that created it. Once you account for reflink, this information cannot be recovered without explicitly storing it. Options for storing it are: - a new key/item - a new extent inline ref item The former is backwards compatible but wastes space; the latter is incompat but is space-efficient and reuses the existing inline ref machinery, while only abusing it a tiny amount -- specifically, the new item is not a ref per se. Signed-off-by: Boris Burkov --- fs/btrfs/accessors.h | 4 +++ fs/btrfs/backref.c | 3 ++ fs/btrfs/extent-tree.c | 50 +++++++++++++++++++++++++-------- fs/btrfs/print-tree.c | 12 ++++++++ fs/btrfs/ref-verify.c | 3 ++ fs/btrfs/tree-checker.c | 3 ++ include/uapi/linux/btrfs_tree.h | 6 ++++ 7 files changed, 70 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h index ceadfc5d6c66..aab61312e4e8 100644 --- a/fs/btrfs/accessors.h +++ b/fs/btrfs/accessors.h @@ -350,6 +350,8 @@ BTRFS_SETGET_FUNCS(extent_data_ref_count, struct btrfs_extent_data_ref, count, 3 BTRFS_SETGET_FUNCS(shared_data_ref_count, struct btrfs_shared_data_ref, count, 32); +BTRFS_SETGET_FUNCS(extent_owner_ref_root_id, struct btrfs_extent_owner_ref, root_id, 64); + BTRFS_SETGET_FUNCS(extent_inline_ref_type, struct btrfs_extent_inline_ref, type, 8); BTRFS_SETGET_FUNCS(extent_inline_ref_offset, struct btrfs_extent_inline_ref, @@ -366,6 +368,8 @@ static inline u32 btrfs_extent_inline_ref_size(int type) if (type == BTRFS_EXTENT_DATA_REF_KEY) return sizeof(struct 
btrfs_extent_data_ref) + offsetof(struct btrfs_extent_inline_ref, offset); + if (type == BTRFS_EXTENT_OWNER_REF_KEY) + return sizeof(struct btrfs_extent_inline_ref); return 0; } diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index e54f0884802a..8cd8ed6c572f 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1128,6 +1128,9 @@ static int add_inline_refs(struct btrfs_backref_walk_ctx *ctx, count, sc, GFP_NOFS); break; } + case BTRFS_EXTENT_OWNER_REF_KEY: + WARN_ON(!btrfs_fs_incompat(ctx->fs_info, SIMPLE_QUOTA)); + break; default: WARN_ON(1); } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 5cd289de4e92..b9a2f1e355b7 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -363,9 +363,13 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, struct btrfs_extent_inline_ref *iref, enum btrfs_inline_ref_type is_data) { + struct btrfs_fs_info *fs_info = eb->fs_info; int type = btrfs_extent_inline_ref_type(eb, iref); u64 offset = btrfs_extent_inline_ref_offset(eb, iref); + if (type == BTRFS_EXTENT_OWNER_REF_KEY && btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)) + return type; + if (type == BTRFS_TREE_BLOCK_REF_KEY || type == BTRFS_SHARED_BLOCK_REF_KEY || type == BTRFS_SHARED_DATA_REF_KEY || @@ -374,26 +378,25 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, if (type == BTRFS_TREE_BLOCK_REF_KEY) return type; if (type == BTRFS_SHARED_BLOCK_REF_KEY) { - ASSERT(eb->fs_info); + ASSERT(fs_info); /* * Every shared one has parent tree block, * which must be aligned to sector size. */ - if (offset && - IS_ALIGNED(offset, eb->fs_info->sectorsize)) + if (offset && IS_ALIGNED(offset, fs_info->sectorsize)) return type; } } else if (is_data == BTRFS_REF_TYPE_DATA) { if (type == BTRFS_EXTENT_DATA_REF_KEY) return type; if (type == BTRFS_SHARED_DATA_REF_KEY) { - ASSERT(eb->fs_info); + ASSERT(fs_info); /* * Every shared one has parent tree block, * which must be aligned to sector size. 
*/ if (offset && - IS_ALIGNED(offset, eb->fs_info->sectorsize)) + IS_ALIGNED(offset, fs_info->sectorsize)) return type; } } else { @@ -403,7 +406,7 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, } btrfs_print_leaf((struct extent_buffer *)eb); - btrfs_err(eb->fs_info, + btrfs_err(fs_info, "eb %llu iref 0x%lx invalid extent inline ref type %d", eb->start, (unsigned long)iref, type); WARN_ON(1); @@ -912,6 +915,11 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, } iref = (struct btrfs_extent_inline_ref *)ptr; type = btrfs_get_extent_inline_ref_type(leaf, iref, needed); + if (type == BTRFS_EXTENT_OWNER_REF_KEY) { + WARN_ON(!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); + ptr += btrfs_extent_inline_ref_size(type); + continue; + } if (type == BTRFS_REF_TYPE_INVALID) { err = -EUCLEAN; goto out; @@ -1708,6 +1716,8 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, node->type == BTRFS_SHARED_DATA_REF_KEY) ret = run_delayed_data_ref(trans, node, extent_op, insert_reserved); + else if (node->type == BTRFS_EXTENT_OWNER_REF_KEY) + ret = 0; else BUG(); if (ret && insert_reserved) @@ -2275,6 +2285,7 @@ static noinline int check_committed_ref(struct btrfs_root *root, struct btrfs_extent_item *ei; struct btrfs_key key; u32 item_size; + u32 expected_size; int type; int ret; @@ -2301,10 +2312,17 @@ static noinline int check_committed_ref(struct btrfs_root *root, ret = 1; item_size = btrfs_item_size(leaf, path->slots[0]); ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_extent_item); + expected_size = sizeof(*ei) + btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY); + + iref = (struct btrfs_extent_inline_ref *)(ei + 1); + type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_DATA); + if (btrfs_fs_incompat(fs_info, SIMPLE_QUOTA) && type == BTRFS_EXTENT_OWNER_REF_KEY) { + expected_size += btrfs_extent_inline_ref_size(BTRFS_EXTENT_OWNER_REF_KEY); + iref = (struct btrfs_extent_inline_ref *)(iref + 1); + } 
/* If extent item has more than 1 inline ref then it's shared */ - if (item_size != sizeof(*ei) + - btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY)) + if (item_size != expected_size) goto out; /* @@ -2316,8 +2334,6 @@ static noinline int check_committed_ref(struct btrfs_root *root, btrfs_root_last_snapshot(&root->root_item))) goto out; - iref = (struct btrfs_extent_inline_ref *)(ei + 1); - /* If this extent has SHARED_DATA_REF then it's shared */ type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_DATA); if (type != BTRFS_EXTENT_DATA_REF_KEY) @@ -4572,6 +4588,7 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans, struct btrfs_root *extent_root; int ret; struct btrfs_extent_item *extent_item; + struct btrfs_extent_owner_ref *oref; struct btrfs_extent_inline_ref *iref; struct btrfs_path *path; struct extent_buffer *leaf; @@ -4583,7 +4600,10 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans, else type = BTRFS_EXTENT_DATA_REF_KEY; - size = sizeof(*extent_item) + btrfs_extent_inline_ref_size(type); + size = sizeof(*extent_item); + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + size += btrfs_extent_inline_ref_size(BTRFS_EXTENT_OWNER_REF_KEY); + size += btrfs_extent_inline_ref_size(type); path = btrfs_alloc_path(); if (!path) @@ -4604,8 +4624,16 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans, btrfs_set_extent_flags(leaf, extent_item, flags | BTRFS_EXTENT_FLAG_DATA); + iref = (struct btrfs_extent_inline_ref *)(extent_item + 1); + if (btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)) { + btrfs_set_extent_inline_ref_type(leaf, iref, BTRFS_EXTENT_OWNER_REF_KEY); + oref = (struct btrfs_extent_owner_ref *)(&iref->offset); + btrfs_set_extent_owner_ref_root_id(leaf, oref, root_objectid); + iref = (struct btrfs_extent_inline_ref *)(oref + 1); + } btrfs_set_extent_inline_ref_type(leaf, iref, type); + if (parent > 0) { struct btrfs_shared_data_ref *ref; ref = (struct 
btrfs_shared_data_ref *)(iref + 1); diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c index b93c96213304..1114cd915bd8 100644 --- a/fs/btrfs/print-tree.c +++ b/fs/btrfs/print-tree.c @@ -80,12 +80,20 @@ static void print_extent_data_ref(struct extent_buffer *eb, btrfs_extent_data_ref_count(eb, ref)); } +static void print_extent_owner_ref(struct extent_buffer *eb, + struct btrfs_extent_owner_ref *ref) +{ + WARN_ON(!btrfs_fs_incompat(eb->fs_info, SIMPLE_QUOTA)); + pr_cont("extent data owner root %llu\n", btrfs_extent_owner_ref_root_id(eb, ref)); +} + static void print_extent_item(struct extent_buffer *eb, int slot, int type) { struct btrfs_extent_item *ei; struct btrfs_extent_inline_ref *iref; struct btrfs_extent_data_ref *dref; struct btrfs_shared_data_ref *sref; + struct btrfs_extent_owner_ref *oref; struct btrfs_disk_key key; unsigned long end; unsigned long ptr; @@ -159,6 +167,10 @@ static void print_extent_item(struct extent_buffer *eb, int slot, int type) "\t\t\t(parent %llu not aligned to sectorsize %u)\n", offset, eb->fs_info->sectorsize); break; + case BTRFS_EXTENT_OWNER_REF_KEY: + oref = (struct btrfs_extent_owner_ref *)(&iref->offset); + print_extent_owner_ref(eb, oref); + break; default: pr_cont("(extent %llu has INVALID ref type %d)\n", eb->start, type); diff --git a/fs/btrfs/ref-verify.c b/fs/btrfs/ref-verify.c index 95d28497de7c..9edc87eaff1f 100644 --- a/fs/btrfs/ref-verify.c +++ b/fs/btrfs/ref-verify.c @@ -485,6 +485,9 @@ static int process_extent_item(struct btrfs_fs_info *fs_info, ret = add_shared_data_ref(fs_info, offset, count, key->objectid, key->offset); break; + case BTRFS_EXTENT_OWNER_REF_KEY: + WARN_ON(!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); + break; default: btrfs_err(fs_info, "invalid key type in iref"); ret = -EINVAL; diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index e2b54793bf0c..27d4230a38a8 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -1451,6 +1451,9 @@ static int 
check_extent_item(struct extent_buffer *leaf, } inline_refs += btrfs_shared_data_ref_count(leaf, sref); break; + case BTRFS_EXTENT_OWNER_REF_KEY: + WARN_ON(!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); + break; default: extent_err(leaf, slot, "unknown inline ref type: %u", inline_type); diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index ab38d0f411fa..424c7f342712 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -226,6 +226,8 @@ #define BTRFS_SHARED_DATA_REF_KEY 184 +#define BTRFS_EXTENT_OWNER_REF_KEY 190 + /* * block groups give us hints into the extent allocation trees. Which * blocks are free etc etc @@ -783,6 +785,10 @@ struct btrfs_shared_data_ref { __le32 count; } __attribute__ ((__packed__)); +struct btrfs_extent_owner_ref { + __le64 root_id; +} __attribute__ ((__packed__)); + struct btrfs_extent_inline_ref { __u8 type; __le64 offset; From patchwork Wed May 3 00:59:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13229419 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4BC29C77B75 for ; Wed, 3 May 2023 01:00:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229488AbjECA74 (ORCPT ); Tue, 2 May 2023 20:59:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52772 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229457AbjECA7z (ORCPT ); Tue, 2 May 2023 20:59:55 -0400 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 25B8010E5 for ; Tue, 2 May 2023 17:59:53 -0700 (PDT) Received: from compute2.internal (compute2.nyi.internal
[10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 9159F5C02C4; Tue, 2 May 2023 20:59:52 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Tue, 02 May 2023 20:59:52 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1683075592; x= 1683161992; bh=Fyg3Qhxh4i1zVTniLzT7tblQqhRAZo85Wa6g2FIgu5w=; b=U VDgfhacwag2ugY1ZlJNM4Hbk7/u0XdC5XEdP/4oPlZUR7oUMLTu54fIZfBR6JQIf zyn0J3magaUU/akvMdLvTgN+ed2C09I3snUWAhxWy5V8dPYT1shcJdJhIyWY61AN ZwbFUSmn3pDmGzh0yDSsRyXLSHtsInw84Ja7K1E9iftvOPTW7YuSz3qazxJ6r6Gf ShW5AlzSMR+Ear6psk/mcjhREBSQP2ykuUgBFY7oi8dX50cEQ/98MPF9vBMksn8y xty+LkmMUWIiMQ7CKHYK0d9aO2HjfjPGBF0HOhdmzRJ4QfmS3hlLtYAFldKx9gj0 tDA6fX3VaSS3ABXO/6bcA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm3; t=1683075592; x=1683161992; bh=F yg3Qhxh4i1zVTniLzT7tblQqhRAZo85Wa6g2FIgu5w=; b=CdPvKDPznFJiCWnLS IcWyvQePiA5/cXw2uX9d02L2Qyk4/zscE2EvjVVD9qoWrTkb21OsLJ52gJL1lBqS OgwRgHc1IVCQYR0vS/50ms9U/UHo8RRgC8QGESzA+yWGqbPXJhHYb8rLaxuKtolM z+A6WggDKCkPxWOLz6v6RFiB25busMZljQ6W9/f/LnMCuo94No+qUwYf7KNOcpUo CggHYa9xbqOCQVeFmcFILT5GoIDyTuPzYCj8gy1xwAIDH4nJVX3yIUizGXajcw4v lvzwNgmEViBirEvaVPepel0uGHOxd+L0zn9x7ZWQBIz6IY923kt2AB3knv0/aQ64 YxHBA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrfedvjedgfeelucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr 
ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 2 May 2023 20:59:52 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 4/9] btrfs: track metadata owning root in delayed refs Date: Tue, 2 May 2023 17:59:25 -0700 Message-Id: X-Mailer: git-send-email 2.40.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org While data extents require us to store additional inline refs to track the original owner on free, this information is available implicitly for metadata. It is found in the owner field of the header of the tree block. Even if other trees refer to this block and the original ref goes away, we will not rewrite that header field, so it will reliably give the original owner. To use it for recording simple quota deltas, we need to wire this root id through from when we create the delayed ref until we fully process it and know that we are deleting a metadata extent. Store it in the btrfs_tree_ref struct of the delayed tree ref.
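The idea above can be sketched in userspace as follows. This is only an illustration, not kernel code: every name prefixed with model_ is hypothetical, and only the split between owning_root and allocation_owning_root is taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the header of a tree block. */
struct model_tree_block {
	uint64_t header_owner;	/* root that allocated the block; not rewritten later */
};

/* Hypothetical stand-in for the two root fields carried by a tree ref. */
struct model_tree_ref {
	uint64_t owning_root;			/* root holding the ref being modified */
	uint64_t allocation_owning_root;	/* root charged for the allocation */
};

/*
 * Build the ref used to drop one reference to @blk held by @ref_root.
 * The quota owner comes from the block header, not from @ref_root, so it
 * stays correct even when the dropping root is a snapshot.
 */
static struct model_tree_ref model_drop_ref(const struct model_tree_block *blk,
					    uint64_t ref_root)
{
	struct model_tree_ref ref;

	assert(blk != NULL);
	ref.owning_root = ref_root;
	ref.allocation_owning_root = blk->header_owner;
	return ref;
}
```

In this model, a block allocated by root 256 and later dropped through root 257 still reports 256 as the root to uncharge.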
Signed-off-by: Boris Burkov --- fs/btrfs/delayed-ref.c | 7 ++++--- fs/btrfs/delayed-ref.h | 16 +++++++++++++++- fs/btrfs/extent-tree.c | 8 ++++---- fs/btrfs/relocation.c | 11 ++++++----- 4 files changed, 29 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 0b32432d7d56..f4c01965e1ea 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -838,7 +838,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans, static void init_delayed_ref_common(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_node *ref, u64 bytenr, u64 num_bytes, u64 ref_root, - int action, u8 ref_type) + int action, u8 ref_type, u64 allocation_owning_root) { u64 seq = 0; @@ -857,6 +857,7 @@ static void init_delayed_ref_common(struct btrfs_fs_info *fs_info, ref->in_tree = 1; ref->seq = seq; ref->type = ref_type; + ref->allocation_owning_root = allocation_owning_root; RB_CLEAR_NODE(&ref->ref_node); INIT_LIST_HEAD(&ref->add_list); } @@ -915,7 +916,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans, init_delayed_ref_common(fs_info, &ref->node, bytenr, num_bytes, generic_ref->tree_ref.owning_root, action, - ref_type); + ref_type, generic_ref->tree_ref.allocation_owning_root); ref->root = generic_ref->tree_ref.owning_root; ref->parent = parent; ref->level = level; @@ -989,7 +990,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans, else ref_type = BTRFS_EXTENT_DATA_REF_KEY; init_delayed_ref_common(fs_info, &ref->node, bytenr, num_bytes, - ref_root, action, ref_type); + ref_root, action, ref_type, ref_root); ref->root = ref_root; ref->parent = parent; ref->objectid = owner; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index b54261fe509b..427de8a25eb2 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -32,6 +32,12 @@ struct btrfs_delayed_ref_node { /* seq number to keep track of insertion order */ u64 seq; + /* + * root which originally allocated this extent and owns it for 
+ * simple quota accounting purposes. + */ + u64 allocation_owning_root; + /* ref count on this data structure */ refcount_t refs; @@ -216,6 +222,12 @@ struct btrfs_tree_ref { u64 owning_root; /* For non-skinny metadata, no special member needed */ + + /* + * Root that originally allocated this block and owns the allocation for + * simple quota accounting purposes. + */ + u64 allocation_owning_root; }; struct btrfs_ref { @@ -284,7 +296,8 @@ static inline void btrfs_init_generic_ref(struct btrfs_ref *generic_ref, } static inline void btrfs_init_tree_ref(struct btrfs_ref *generic_ref, - int level, u64 root, u64 mod_root, bool skip_qgroup) + int level, u64 root, u64 mod_root, + bool skip_qgroup, u64 allocation_owning_root) { #ifdef CONFIG_BTRFS_FS_REF_VERIFY /* If @real_root not set, use @root as fallback */ @@ -292,6 +305,7 @@ static inline void btrfs_init_tree_ref(struct btrfs_ref *generic_ref, #endif generic_ref->tree_ref.level = level; generic_ref->tree_ref.owning_root = root; + generic_ref->tree_ref.allocation_owning_root = allocation_owning_root; generic_ref->type = BTRFS_REF_METADATA; if (skip_qgroup || !(is_fstree(root) && (!mod_root || is_fstree(mod_root)))) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index b9a2f1e355b7..c821d22be2ca 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2446,7 +2446,7 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans, btrfs_init_generic_ref(&generic_ref, action, bytenr, num_bytes, parent); btrfs_init_tree_ref(&generic_ref, level - 1, ref_root, - root->root_key.objectid, for_reloc); + root->root_key.objectid, for_reloc, ref_root); if (inc) ret = btrfs_inc_extent_ref(trans, &generic_ref); else @@ -3268,7 +3268,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, btrfs_init_generic_ref(&generic_ref, BTRFS_DROP_DELAYED_REF, buf->start, buf->len, parent); btrfs_init_tree_ref(&generic_ref, btrfs_header_level(buf), - root_id, 0, false); + root_id, 0, false, 
btrfs_header_owner(buf)); if (root_id != BTRFS_TREE_LOG_OBJECTID) { btrfs_ref_tree_mod(fs_info, &generic_ref); @@ -4952,7 +4952,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, btrfs_init_generic_ref(&generic_ref, BTRFS_ADD_DELAYED_EXTENT, ins.objectid, ins.offset, parent); btrfs_init_tree_ref(&generic_ref, level, root_objectid, - root->root_key.objectid, false); + root->root_key.objectid, false, btrfs_header_owner(buf)); btrfs_ref_tree_mod(fs_info, &generic_ref); ret = btrfs_add_delayed_tree_ref(trans, &generic_ref, extent_op); if (ret) @@ -5374,7 +5374,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans, btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, bytenr, fs_info->nodesize, parent); btrfs_init_tree_ref(&ref, level - 1, root->root_key.objectid, - 0, false); + 0, false, btrfs_header_owner(next)); ret = btrfs_free_extent(trans, &ref); if (ret) goto out_unlock; diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 09b1988d1791..0e981e8a374e 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1384,7 +1384,7 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, old_bytenr, blocksize, path->nodes[level]->start); btrfs_init_tree_ref(&ref, level - 1, src->root_key.objectid, - 0, true); + 0, true, src->root_key.objectid); ret = btrfs_inc_extent_ref(trans, &ref); if (ret) { btrfs_abort_transaction(trans, ret); @@ -1393,7 +1393,7 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, new_bytenr, blocksize, 0); btrfs_init_tree_ref(&ref, level - 1, dest->root_key.objectid, 0, - true); + true, dest->root_key.objectid); ret = btrfs_inc_extent_ref(trans, &ref); if (ret) { btrfs_abort_transaction(trans, ret); @@ -1403,7 +1403,7 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, btrfs_init_generic_ref(&ref, 
BTRFS_DROP_DELAYED_REF, new_bytenr, blocksize, path->nodes[level]->start); btrfs_init_tree_ref(&ref, level - 1, src->root_key.objectid, - 0, true); + 0, true, src->root_key.objectid); ret = btrfs_free_extent(trans, &ref); if (ret) { btrfs_abort_transaction(trans, ret); @@ -1413,7 +1413,7 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, old_bytenr, blocksize, 0); btrfs_init_tree_ref(&ref, level - 1, dest->root_key.objectid, - 0, true); + 0, true, src->root_key.objectid); ret = btrfs_free_extent(trans, &ref); if (ret) { btrfs_abort_transaction(trans, ret); @@ -2494,7 +2494,8 @@ static int do_relocation(struct btrfs_trans_handle *trans, upper->eb->start); btrfs_init_tree_ref(&ref, node->level, btrfs_header_owner(upper->eb), - root->root_key.objectid, false); + root->root_key.objectid, false, + btrfs_header_owner(upper->eb)); ret = btrfs_inc_extent_ref(trans, &ref); if (!ret) ret = btrfs_drop_subtree(trans, root, eb, From patchwork Wed May 3 00:59:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13229422 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A5F4C7EE24 for ; Wed, 3 May 2023 01:00:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229496AbjECA77 (ORCPT ); Tue, 2 May 2023 20:59:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229457AbjECA75 (ORCPT ); Tue, 2 May 2023 20:59:57 -0400 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C895F2D69 for ; Tue, 2 
May 2023 17:59:54 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 35F185C0310; Tue, 2 May 2023 20:59:54 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Tue, 02 May 2023 20:59:54 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1683075594; x= 1683161994; bh=StCgSaAGnttN7o4lvZdFJjTNKXiMfeCw1hb8kRfr0+w=; b=T xt9JsqOndzWBGw1q9WC8fNemriZZxuO6N8UaULNgNhZCli8qETgwzVZBdt2GcHIg iOVXPGpy4Zo0OozzzJNA0u69zsw48Uv4bAPJ7JUpOOnyVQEVRVhwu6XTK6CqB8+w R30Of4AlBycbEUur7Jl2E5Z9WJbGlV9dD/YlgOQsFucieuEdCfZz7nJxD46Ygn30 giN3Nfq+T1nfcGu/mxeWo2oW3iETXiU/YGER8s48mVf9EY8J9HsWpFcYUWlgCG0o 8Gl9/V711i8LdtD0fE488BXZ3zoPquqPQGsUuyGlGQe8Grrut9TVIbLxc08bFI4z rmv6tHDTxnOaKGSRa03Qg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm3; t=1683075594; x=1683161994; bh=S tCgSaAGnttN7o4lvZdFJjTNKXiMfeCw1hb8kRfr0+w=; b=HCwjsS8+Jvh1Je1jo 1r9s4SVZdHBZ/IbDdPMwCldarho4fimDWcLJk/oQrWuq5y7BhCHMsOj/ZxA2PwBz OsX9vlyGO8eWpeWVN3smX1rfXs4VJPnP2NBSdpJcqzvZOk/qfzXv7xt+vrY+fVDz blVitLGblwEhH2bgtPVqi0i/UOq5o49J/19rGqth5vKv2Ljz/7/9wp1BNaJiz192 A5giYAN3t04JLA8ukm6UNqA9KE1RHfQvTa6+ggHq42WyCOecyDr7cc91n3EDMWlu oXoMP3/lcf7JSbX0DARxJnRxbiI8ijwx0/kFtaRD2HtBL9mBBEYMkp/98Bdy8LxP Ck05w== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrfedvjedggedtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre 
dtredttdenucfhrhhomhepuehorhhishcuuehurhhkohhvuceosghorhhishessghurhdr ihhoqeenucggtffrrghtthgvrhhnpeeiueffuedvieeujefhheeigfekvedujeejjeffve dvhedtudefiefhkeegueehleenucevlhhushhtvghrufhiiigvpedunecurfgrrhgrmhep mhgrihhlfhhrohhmpegsohhrihhssegsuhhrrdhioh X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 2 May 2023 20:59:53 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 5/9] btrfs: record simple quota deltas Date: Tue, 2 May 2023 17:59:26 -0700 Message-Id: X-Mailer: git-send-email 2.40.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org At the moment that we run delayed refs, we make the final ref-count based decision on creating/removing extent (and metadata) items. Therefore, it is exactly the spot to hook up simple quotas. There are a few important subtleties to the fields we must collect to accurately track simple quotas, particularly when removing an extent. When removing a data extent, the ref could be in any tree (due to reflink, for example) and so we need to recover the owning root id from the owner ref item. When removing a metadata extent, we know the owning root from the owner field in the header when we create the delayed ref, so we can recover it from there. 
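As a rough userspace model of the accounting described above (a sketch under stated assumptions, not the kernel implementation): the model_ structs and functions are hypothetical, the delta fields mirror struct btrfs_simple_quota_delta from this series, and the fallback argument stands for the root carried by the delayed ref.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a qgroup's byte counters. */
struct model_qgroup {
	uint64_t rfer;
	uint64_t excl;
};

/* Mirrors the root/num_bytes/is_inc/is_data fields of the delta struct. */
struct model_delta {
	uint64_t root;
	uint64_t num_bytes;
	bool is_inc;
	bool is_data;
};

/* Apply one delta: allocation adds to both counters, free subtracts. */
static void model_apply_delta(struct model_qgroup *qg,
			      const struct model_delta *d)
{
	if (d->is_inc) {
		qg->rfer += d->num_bytes;
		qg->excl += d->num_bytes;
	} else {
		/* A free must never exceed what was charged. */
		assert(qg->rfer >= d->num_bytes && qg->excl >= d->num_bytes);
		qg->rfer -= d->num_bytes;
		qg->excl -= d->num_bytes;
	}
}

/*
 * Pick the root to uncharge when an extent is freed. Data extents use the
 * inline owner ref when one exists; everything else uses the root that was
 * wired through the delayed ref (@fallback_root).
 */
static uint64_t model_free_owner(bool is_data, bool has_owner_ref,
				 uint64_t owner_ref_root,
				 uint64_t fallback_root)
{
	if (is_data && has_owner_ref)
		return owner_ref_root;
	return fallback_root;
}
```

The point of the split is that a reflinked data extent may be freed through a root other than the one that allocated it, so the owner ref, when present, takes precedence over the delayed ref's root.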
Signed-off-by: Boris Burkov --- fs/btrfs/delayed-ref.c | 6 ++++ fs/btrfs/delayed-ref.h | 12 +++++++ fs/btrfs/extent-tree.c | 82 ++++++++++++++++++++++++++++++++++++++---- fs/btrfs/qgroup.c | 11 +++--- fs/btrfs/qgroup.h | 2 ++ fs/btrfs/transaction.c | 5 +++ 6 files changed, 106 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index f4c01965e1ea..c06e5c8bcdc1 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -732,6 +732,12 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref, head_ref->bytenr = bytenr; head_ref->num_bytes = num_bytes; head_ref->ref_mod = count_mod; + head_ref->ref_root = 0; + head_ref->reserved_bytes = 0; + if (ref_root && reserved) { + head_ref->ref_root = ref_root; + head_ref->reserved_bytes = reserved; + } head_ref->must_insert_reserved = must_insert_reserved; head_ref->is_data = is_data; head_ref->is_system = is_system; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index 427de8a25eb2..64aba4601de4 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -107,6 +107,18 @@ struct btrfs_delayed_ref_head { */ int ref_mod; + /* + * Track the root which accounted for the reservation relevant to + * must_insert_reserved. In simple quota mode, we need to drop + * reservations for such head refs when we drop delayed refs. + */ + u64 ref_root; + /* + * Track reserved bytes when setting must_insert_reserved. + * On success or cleanup, we will need to free the reservation. 
+ */ + u64 reserved_bytes; + /* * when a new extent is allocated, it is just reserved in memory * The actual extent isn't inserted into the extent allocation tree diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index c821d22be2ca..7379ee04018d 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1506,6 +1506,7 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, } static int run_delayed_data_ref(struct btrfs_trans_handle *trans, + struct btrfs_delayed_ref_head *href, struct btrfs_delayed_ref_node *node, struct btrfs_delayed_extent_op *extent_op, int insert_reserved) @@ -1529,12 +1530,23 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, ref_root = ref->root; if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) { + struct btrfs_simple_quota_delta delta = { + .root = ref_root, + .num_bytes = node->num_bytes, + .rsv_root = href->ref_root, + .rsv_bytes = href->reserved_bytes, + .is_data = true, + .is_inc = true, + }; + if (extent_op) flags |= extent_op->flags_to_set; ret = alloc_reserved_file_extent(trans, parent, ref_root, flags, ref->objectid, ref->offset, &ins, node->ref_mod); + if (!ret) + ret = btrfs_record_simple_quota_delta(trans->fs_info, &delta); } else if (node->action == BTRFS_ADD_DELAYED_REF) { ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root, ref->objectid, ref->offset, @@ -1661,6 +1673,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, int insert_reserved) { int ret = 0; + struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_delayed_tree_ref *ref; u64 parent = 0; u64 ref_root = 0; @@ -1680,8 +1693,19 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, return -EIO; } if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) { + struct btrfs_simple_quota_delta delta = { + .root = ref_root, + .num_bytes = fs_info->nodesize, + .rsv_root = 0, + .rsv_bytes = 0, + .is_data = false, + .is_inc = true, + }; + BUG_ON(!extent_op || 
!extent_op->update_flags); ret = alloc_reserved_tree_block(trans, node, extent_op); + if (!ret) + btrfs_record_simple_quota_delta(fs_info, &delta); } else if (node->action == BTRFS_ADD_DELAYED_REF) { ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root, ref->level, 0, 1, extent_op); @@ -1696,6 +1720,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, /* helper function to actually process a single delayed ref entry */ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, + struct btrfs_delayed_ref_head *href, struct btrfs_delayed_ref_node *node, struct btrfs_delayed_extent_op *extent_op, int insert_reserved) @@ -1714,8 +1739,8 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, insert_reserved); else if (node->type == BTRFS_EXTENT_DATA_REF_KEY || node->type == BTRFS_SHARED_DATA_REF_KEY) - ret = run_delayed_data_ref(trans, node, extent_op, - insert_reserved); + ret = run_delayed_data_ref(trans, href, node, + extent_op, insert_reserved); else if (node->type == BTRFS_EXTENT_OWNER_REF_KEY) ret = 0; else @@ -1812,6 +1837,11 @@ void btrfs_cleanup_ref_head_accounting(struct btrfs_fs_info *fs_info, spin_unlock(&delayed_refs->lock); nr_items += btrfs_csum_bytes_to_leaves(fs_info, head->num_bytes); } + if (head->must_insert_reserved && head->is_data && + btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + btrfs_qgroup_free_refroot(fs_info, head->ref_root, + head->reserved_bytes, + BTRFS_QGROUP_RSV_DATA); btrfs_delayed_refs_rsv_release(fs_info, nr_items); } @@ -1959,8 +1989,8 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans, locked_ref->extent_op = NULL; spin_unlock(&locked_ref->lock); - ret = run_one_delayed_ref(trans, ref, extent_op, - must_insert_reserved); + ret = run_one_delayed_ref(trans, locked_ref, ref, + extent_op, must_insert_reserved); btrfs_free_delayed_extent_op(extent_op); if (ret) { @@ -2826,11 +2856,12 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans) } 
static int do_free_extent_accounting(struct btrfs_trans_handle *trans, - u64 bytenr, u64 num_bytes, bool is_data) + u64 bytenr, struct btrfs_simple_quota_delta *delta) { int ret; + u64 num_bytes = delta->num_bytes; - if (is_data) { + if (delta->is_data) { struct btrfs_root *csum_root; csum_root = btrfs_csum_root(trans->fs_info, bytenr); @@ -2841,6 +2872,12 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans, } } + ret = btrfs_record_simple_quota_delta(trans->fs_info, delta); + if (ret) { + btrfs_abort_transaction(trans, ret); + return ret; + } + ret = add_to_free_space_tree(trans, bytenr, num_bytes); if (ret) { btrfs_abort_transaction(trans, ret); @@ -2936,6 +2973,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, u64 bytenr = node->bytenr; u64 num_bytes = node->num_bytes; bool skinny_metadata = btrfs_fs_incompat(info, SKINNY_METADATA); + u64 delayed_ref_root = node->allocation_owning_root; extent_root = btrfs_extent_root(info, bytenr); ASSERT(extent_root); @@ -3133,6 +3171,16 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, } } } else { + struct btrfs_extent_owner_ref *oref; + struct btrfs_simple_quota_delta delta = { + .root = delayed_ref_root, + .num_bytes = num_bytes, + .rsv_root = 0, + .rsv_bytes = 0, + .is_data = is_data, + .is_inc = false, + }; + /* In this branch refs == 1 */ if (found_extent) { if (is_data && refs_to_drop != @@ -3170,6 +3218,26 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, num_to_del = 2; } } + /* + * We can't infer the data owner from the delayed ref, so we + * need to try to get it from the owning ref item. + * + * If it is not present, then that extent was not written under + * simple quotas mode, so we don't need to account for its + * deletion. 
+ */ + if (is_data && btrfs_fs_incompat(trans->fs_info, SIMPLE_QUOTA)) { + struct btrfs_extent_inline_ref *owning_iref; + int type; + + owning_iref = (struct btrfs_extent_inline_ref *)(ei + 1); + type = btrfs_get_extent_inline_ref_type(leaf, owning_iref, + BTRFS_REF_TYPE_ANY); + if (type == BTRFS_EXTENT_OWNER_REF_KEY) { + oref = (struct btrfs_extent_owner_ref *)(&owning_iref->offset); + delta.root = btrfs_extent_owner_ref_root_id(leaf, oref); + } + } ret = btrfs_del_items(trans, extent_root, path, path->slots[0], num_to_del); @@ -3179,7 +3247,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, } btrfs_release_path(path); - ret = do_free_extent_accounting(trans, bytenr, num_bytes, is_data); + ret = do_free_extent_accounting(trans, bytenr, &delta); } btrfs_release_path(path); diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 8982b76ae9e5..e3d0630fef0c 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1655,6 +1655,9 @@ int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid) struct btrfs_qgroup *qgroup; int ret = 0; + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED) + return 0; + mutex_lock(&fs_info->qgroup_ioctl_lock); if (!fs_info->quota_root) { ret = -ENOTCONN; @@ -4520,7 +4523,6 @@ int btrfs_record_simple_quota_delta(struct btrfs_fs_info *fs_info, struct ulist_iterator uiter; struct ulist_node *unode; struct btrfs_qgroup *qg; - bool drop_rsv = false; u64 root = delta->root; u64 num_bytes = delta->num_bytes; int sign = delta->is_inc ? 
1 : -1; @@ -4546,14 +4548,13 @@ int btrfs_record_simple_quota_delta(struct btrfs_fs_info *fs_info, qg = unode_aux_to_qgroup(unode); qg->excl += num_bytes * sign; qg->rfer += num_bytes * sign; - if (delta->is_inc && delta->is_data) - drop_rsv = true; qgroup_dirty(fs_info, qg); } out: spin_unlock(&fs_info->qgroup_lock); - if (!ret && drop_rsv) - btrfs_qgroup_free_refroot(fs_info, root, num_bytes, BTRFS_QGROUP_RSV_DATA); + if (!ret && delta->rsv_bytes && delta->rsv_root) + btrfs_qgroup_free_refroot(fs_info, delta->rsv_root, + delta->rsv_bytes, BTRFS_QGROUP_RSV_DATA); return ret; } diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 0d627a871900..b300998dcbc7 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -238,6 +238,8 @@ struct btrfs_qgroup { struct btrfs_simple_quota_delta { u64 root; /* The fstree root this delta counts against */ u64 num_bytes; /* The number of bytes in the extent being counted */ + u64 rsv_root; /* The fstree root this delta should free rsv for */ + u64 rsv_bytes; /* The number of bytes reserved for this extent */ bool is_inc; /* Whether we are using or freeing the extent */ bool is_data; /* Whether the extent is data or metadata */ }; diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index e6d6752c2fca..0edfb58afd80 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1704,6 +1704,11 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, } btrfs_release_path(path); + ret = btrfs_create_qgroup(trans, objectid); + if (ret) { + btrfs_abort_transaction(trans, ret); + goto fail; + } /* * pull in the delayed directory update * and the delayed inode item From patchwork Wed May 3 00:59:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13229421 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D9A1C7EE29 for ; Wed, 3 May 2023 01:00:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229498AbjECA77 (ORCPT ); Tue, 2 May 2023 20:59:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52816 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229492AbjECA75 (ORCPT ); Tue, 2 May 2023 20:59:57 -0400 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6692D358E for ; Tue, 2 May 2023 17:59:56 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id D6C3C5C0311; Tue, 2 May 2023 20:59:55 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Tue, 02 May 2023 20:59:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1683075595; x= 1683161995; bh=B4QVcOh1n1iRTg3zT022aoBauHMxFlmaxBHP1x45Oqk=; b=h L0VigQbHODUY13nrzpDaS+jzaesqNaZkBV9zbrtC4mgCVUYiEGEO2FS4mW2ZZcMb WSoSEMQvrUHxvEgmFnYpgu3KINGng4/BCrOS2dDi9Xj1fbpMY8xpj9wWwYKWJyAE Cx6EqIYnHDNMbIUCZUQScqA4JYMdewSy/SbCAXcRzFwPMTTcJtXzA6fOPO9CUZyF wlbyd87GOoOkKORGW4c5RXFcM4pJ7LLRmNUm+//Ax5yFYYvbSWIhWgi7AWkSwBm7 fh3ugqdAnMJMq0mmuD22+AwZV3aoFmnOnY1uRihbloJapH32ChrRJ3lbRGR0+Fxz a3ZGPmdwESvo/IAKdjOpw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm3; t=1683075595; x=1683161995; 
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 6/9] btrfs: auto hierarchy for simple qgroups of nested subvols
Date: Tue, 2 May 2023 17:59:27 -0700
X-Mailer: git-send-email 2.40.0
X-Mailing-List: linux-btrfs@vger.kernel.org

Consider the following sequence:
- enable quotas
- create subvol S id 256 at dir outer/
- create a qgroup 1/100
- add 0/256 (S's auto qgroup) to 1/100
- create subvol T id 257 at dir outer/inner/

With full qgroups, there is no relationship between 0/257 and either of
0/256 or 1/100. There is an inherit feature that the creator of inner/
can use to specify it ought to be in 1/100.

Simple quotas are targeted at container isolation, where such automatic
inheritance for not necessarily trusted/controlled nested subvol
creation would be quite helpful.
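To make the sequence above concrete, here is a minimal userspace sketch of
what automatic inheritance would mean for S and T: the qgroup of the new
subvolume copies the parent list of the qgroup of the subvolume that
contains it. The names model_qgroup, model_qgroupid and model_auto_inherit
are hypothetical and exist only for this illustration; the one detail taken
from btrfs itself is that a qgroup id keeps the level in its upper 16 bits
and the subvolume id in the lower 48.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_PARENTS 8	/* arbitrary limit for the sketch */

/* Hypothetical stand-in for a qgroup and its list of parent qgroups. */
struct model_qgroup {
	uint64_t id;
	size_t nr_parents;
	uint64_t parents[MODEL_MAX_PARENTS];
};

/* Build a "level/subvolid" qgroup id, e.g. 1/100 or 0/257. */
uint64_t model_qgroupid(uint16_t level, uint64_t subvolid)
{
	return ((uint64_t)level << 48) | subvolid;
}

/*
 * Give the new subvolume's qgroup the same parents as the qgroup of the
 * subvolume it is created in. Returns the number of parents copied.
 */
size_t model_auto_inherit(struct model_qgroup *child,
			  const struct model_qgroup *container)
{
	size_t i;

	for (i = 0; i < container->nr_parents && i < MODEL_MAX_PARENTS; i++)
		child->parents[i] = container->parents[i];
	child->nr_parents = i;
	return i;
}
```

With S (0/256) already a member of 1/100, calling model_auto_inherit() for
T (0/257) leaves T with the single parent 1/100, so a limit on 1/100 covers
both subvolumes.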
Therefore, add a new default behavior for simple quotas: when you create
a nested subvol, automatically inherit as parents any parents of the
qgroup of the subvol the new inode is going in.

In our example, 0/257 would also be under 1/100, allowing easy control
of a total quota over an arbitrary hierarchy of subvolumes.

I think this _might_ be a generally useful behavior, so it could be
interesting to put it behind a new inheritance flag that simple quotas
always use while traditional quotas let the user specify, but this is a
minimally intrusive change to start.

Signed-off-by: Boris Burkov
---
 fs/btrfs/ioctl.c       |  2 +-
 fs/btrfs/qgroup.c      | 46 +++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/qgroup.h      |  6 +++---
 fs/btrfs/transaction.c |  2 +-
 4 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ca7d2ef739c8..4d6d28feb5c6 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -652,7 +652,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
 	/* Tree log can't currently deal with an inode which is a new root. */
 	btrfs_set_log_full_commit(trans);
 
-	ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
+	ret = btrfs_qgroup_inherit(trans, 0, objectid, root->root_key.objectid, inherit);
 	if (ret)
 		goto out;
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index e3d0630fef0c..6816e01f00b5 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1504,8 +1504,7 @@ static int quick_update_accounting(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
-int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src,
-			      u64 dst)
+int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_qgroup *parent;
@@ -2945,6 +2944,42 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans)
 	return ret;
 }
 
+static int qgroup_auto_inherit(struct btrfs_fs_info *fs_info,
+			       u64 inode_rootid,
+			       struct btrfs_qgroup_inherit **inherit)
+{
+	int i = 0;
+	u64 num_qgroups = 0;
+	struct btrfs_qgroup *inode_qg;
+	struct btrfs_qgroup_list *qg_list;
+
+	if (*inherit)
+		return -EEXIST;
+
+	inode_qg = find_qgroup_rb(fs_info, inode_rootid);
+	if (!inode_qg)
+		return -ENOENT;
+
+	list_for_each_entry(qg_list, &inode_qg->groups, next_group) {
+		++num_qgroups;
+	}
+
+	if (!num_qgroups)
+		return 0;
+
+	*inherit = kzalloc(sizeof(**inherit) + num_qgroups * sizeof(u64), GFP_NOFS);
+	if (!*inherit)
+		return -ENOMEM;
+	(*inherit)->num_qgroups = num_qgroups;
+
+	list_for_each_entry(qg_list, &inode_qg->groups, next_group) {
+		u64 qg_id = qg_list->group->qgroupid;
+		*((u64 *)((*inherit)+1) + i++) = qg_id;
+	}
+
+	return 0;
+}
+
 /*
  * Copy the accounting information between qgroups. This is necessary
  * when a snapshot or a subvolume is created. Throwing an error will
@@ -2952,7 +2987,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans)
  * when a readonly fs is a reasonable outcome.
  */
 int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
-			 u64 objectid, struct btrfs_qgroup_inherit *inherit)
+			 u64 objectid, u64 inode_rootid,
+			 struct btrfs_qgroup_inherit *inherit)
 {
 	int ret = 0;
 	int i;
@@ -2994,6 +3030,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 		goto out;
 	}
 
+	if (!inherit && btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE)
+		qgroup_auto_inherit(fs_info, inode_rootid, &inherit);
+
 	if (inherit) {
 		i_qgroups = (u64 *)(inherit + 1);
 		nums = inherit->num_qgroups + 2 * inherit->num_ref_copies +
@@ -3020,6 +3059,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
 	if (ret)
 		goto out;
 
+
 	/*
 	 * add qgroup to all inherited groups
 	 */
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index b300998dcbc7..aecebe9d0d62 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -272,8 +272,7 @@ int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
 void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
 				     bool interruptible);
-int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src,
-			      u64 dst);
+int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst);
 int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, u64 src,
 			      u64 dst);
 int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid);
@@ -367,7 +366,8 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans);
 int btrfs_run_qgroups(struct btrfs_trans_handle *trans);
 int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
-			 u64 objectid, struct btrfs_qgroup_inherit *inherit);
+			 u64 objectid, u64 inode_rootid,
+			 struct btrfs_qgroup_inherit *inherit);
 void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
 			       u64 ref_root, u64 num_bytes,
 			       enum btrfs_qgroup_rsv_type type);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 0edfb58afd80..6befcf1b4b1f 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1557,7 +1557,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 
 	/* Now qgroup are all updated, we can inherit it to new qgroups */
 	ret = btrfs_qgroup_inherit(trans, src->root_key.objectid, dst_objectid,
-				   inherit);
+				   parent->root_key.objectid, inherit);
 	if (ret < 0)
 		goto out;
 

From patchwork Wed May 3 00:59:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13229424
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 7/9] btrfs: check generation when recording simple quota delta
Date: Tue, 2 May 2023 17:59:28 -0700
Message-Id: <13e4e8d1d9479423c135fa9d192e074c8036bcf4.1683075170.git.boris@bur.io>
X-Mailer: git-send-email 2.40.0
X-Mailing-List: linux-btrfs@vger.kernel.org

Simple quotas count extents only from the moment the feature is enabled.
Therefore, if we do something like:
1. create subvol S
2. write F in S
3. enable quotas
4. remove F
5. write G in S

then after 3. and 4. we would expect the simple quota usage of S to be 0
(putting aside some metadata extents that might be written) and after
5., it should be the size of G plus metadata. Therefore, we need to be
able to determine whether a particular quota delta we are processing
predates simple quota enablement.

To do this, store the transaction id when quotas were enabled, both in
fs_info for immediate use and in the quota status item to make it
recoverable on mount. When we see a delta, check if the generation of
the extent item is less than that of quota enablement. If so, we should
ignore the delta from this extent.
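The check described above can be sketched in userspace as follows. This is
a simplified model under stated assumptions, not the kernel code:
squota_state, squota_delta and squota_record_delta are hypothetical names,
and a single counter stands in for the usage of one subvolume's qgroup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one extent being allocated or freed. */
struct squota_delta {
	uint64_t num_bytes;	/* size of the extent */
	uint64_t generation;	/* transaction id the extent was created in */
	bool is_inc;		/* true when allocating, false when freeing */
};

/* Hypothetical stand-in for the quota state of one subvolume. */
struct squota_state {
	uint64_t enable_gen;	/* transaction id in which quotas were enabled */
	uint64_t usage;		/* bytes currently charged */
};

/*
 * Apply a delta unless the extent predates quota enablement.
 * Returns true if the delta was counted, false if it was ignored.
 */
bool squota_record_delta(struct squota_state *st,
			 const struct squota_delta *d)
{
	/* The extent was never charged, so there is nothing to undo. */
	if (d->generation < st->enable_gen)
		return false;

	if (d->is_inc)
		st->usage += d->num_bytes;
	else
		st->usage -= d->num_bytes;
	return true;
}
```

Mapped onto the five steps: if quotas are enabled in generation 10, freeing
F (created in generation 7) is ignored and usage stays 0, while writing G
in generation 11 is charged in full.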
Signed-off-by: Boris Burkov
---
 fs/btrfs/accessors.h            |  2 ++
 fs/btrfs/extent-tree.c          |  3 +++
 fs/btrfs/fs.h                   |  2 ++
 fs/btrfs/qgroup.c               | 20 ++++++++++++++++----
 fs/btrfs/qgroup.h               |  1 +
 include/uapi/linux/btrfs_tree.h |  7 +++++++
 6 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
index aab61312e4e8..8d122cc42cb7 100644
--- a/fs/btrfs/accessors.h
+++ b/fs/btrfs/accessors.h
@@ -971,6 +971,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
 		   flags, 64);
 BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item,
 		   rescan, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_enable_gen, struct btrfs_qgroup_status_item,
+		   enable_gen, 64);
 
 /* btrfs_qgroup_info_item */
 BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7379ee04018d..7f48e7d34b09 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1537,6 +1537,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
 			.rsv_bytes = href->reserved_bytes,
 			.is_data = true,
 			.is_inc = true,
+			.generation = trans->transid,
 		};
 
 		if (extent_op)
@@ -1700,6 +1701,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
 			.rsv_bytes = 0,
 			.is_data = false,
 			.is_inc = true,
+			.generation = trans->transid,
 		};
 
 		BUG_ON(!extent_op || !extent_op->update_flags);
@@ -3179,6 +3181,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			.rsv_bytes = 0,
 			.is_data = is_data,
 			.is_inc = false,
+			.generation = btrfs_extent_generation(leaf, ei),
 		};
 
 		/* In this branch refs == 1 */
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 6c989d87768c..d67ee2652c87 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -803,6 +803,8 @@ struct btrfs_fs_info {
 	spinlock_t eb_leak_lock;
 	struct list_head allocated_ebs;
 #endif
+
+	u64 quota_enable_gen;
 };
 
 static inline void btrfs_set_last_root_drop_gen(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 6816e01f00b5..cd28e92d8a37 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -467,8 +467,9 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 				btrfs_err(fs_info,
 				"qgroup generation mismatch, marked as inconsistent");
 			}
-			fs_info->qgroup_flags = btrfs_qgroup_status_flags(l,
-									  ptr);
+			if (btrfs_fs_incompat(fs_info, SIMPLE_QUOTA))
+				fs_info->quota_enable_gen = btrfs_qgroup_status_enable_gen(l, ptr);
+			fs_info->qgroup_flags = btrfs_qgroup_status_flags(l, ptr);
 			rescan_progress = btrfs_qgroup_status_rescan(l, ptr);
 			goto next1;
 		}
@@ -1114,6 +1115,10 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info,
 	ptr = btrfs_item_ptr(leaf, path->slots[0],
 			     struct btrfs_qgroup_status_item);
 	btrfs_set_qgroup_status_generation(leaf, ptr, trans->transid);
+	if (simple_qgroups) {
+		btrfs_set_fs_incompat(fs_info, SIMPLE_QUOTA);
+		btrfs_set_qgroup_status_enable_gen(leaf, ptr, trans->transid);
+	}
 	btrfs_set_qgroup_status_version(leaf, ptr, BTRFS_QGROUP_STATUS_VERSION);
 	fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON;
 	if (!simple_qgroups)
@@ -1209,6 +1214,8 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info,
 		goto out_free_path;
 	}
 
+	fs_info->quota_enable_gen = trans->transid;
+
 	mutex_unlock(&fs_info->qgroup_ioctl_lock);
 	/*
 	 * Commit the transaction while not holding qgroup_ioctl_lock, to avoid
@@ -1233,8 +1240,6 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info,
 	spin_lock(&fs_info->qgroup_lock);
 	fs_info->quota_root = quota_root;
 	set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
-	if (simple_qgroups)
-		btrfs_set_fs_incompat(fs_info, SIMPLE_QUOTA);
 	spin_unlock(&fs_info->qgroup_lock);
 
 	/* Skip rescan for simple qgroups */
@@ -4573,6 +4578,13 @@ int btrfs_record_simple_quota_delta(struct btrfs_fs_info *fs_info,
 	if (!is_fstree(root))
 		return 0;
 
+	/*
+	 * If the extent predates enabling quotas, don't count it.
+	 * This is particularly likely when freeing old extents.
+	 */
+	if (delta->generation < fs_info->quota_enable_gen)
+		return 0;
+
 	spin_lock(&fs_info->qgroup_lock);
 	qgroup = find_qgroup_rb(fs_info, root);
 	if (!qgroup) {
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index aecebe9d0d62..9f3c397d51fb 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -242,6 +242,7 @@ struct btrfs_simple_quota_delta {
 	u64 rsv_bytes; /* The number of bytes reserved for this extent */
 	bool is_inc; /* Whether we are using or freeing the extent */
 	bool is_data; /* Whether the extent is data or metadata */
+	u64 generation; /* The generation the extent was created in */
 };
 
 static inline u64 btrfs_qgroup_subvolid(u64 qgroupid)
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 424c7f342712..7797560f0215 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -1230,6 +1230,13 @@ struct btrfs_qgroup_status_item {
 	 * of the scan. It contains a logical address
 	 */
 	__le64 rescan;
+
+	/*
+	 * the generation when quotas are enabled. Used by simple quotas to
+	 * avoid decrementing when freeing an extent that was written before
+	 * enable.
+	 */
+	__le64 enable_gen;
 } __attribute__ ((__packed__));
 
 struct btrfs_qgroup_info_item {

From patchwork Wed May 3 00:59:29 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13229423
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 8/9] btrfs: expose the qgroup mode via sysfs
Date: Tue, 2 May 2023 17:59:29 -0700
X-Mailer: git-send-email 2.40.0
X-Mailing-List: linux-btrfs@vger.kernel.org

Add a new sysfs file
/sys/fs/btrfs//qgroups/mode
which prints out the mode qgroups is running in.
The possible modes are disabled, qgroup, and squota.

Signed-off-by: Boris Burkov
---
 fs/btrfs/sysfs.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 25294e624851..d2de2207120a 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -2079,6 +2079,31 @@ static ssize_t qgroup_enabled_show(struct kobject *qgroups_kobj,
 }
 BTRFS_ATTR(qgroups, enabled, qgroup_enabled_show);
 
+static ssize_t qgroup_mode_show(struct kobject *qgroups_kobj,
+				struct kobj_attribute *a,
+				char *buf)
+{
+	struct btrfs_fs_info *fs_info = to_fs_info(qgroups_kobj->parent);
+	char *mode = "";
+
+	spin_lock(&fs_info->qgroup_lock);
+	switch (btrfs_qgroup_mode(fs_info)) {
+	case BTRFS_QGROUP_MODE_DISABLED:
+		mode = "disabled";
+		break;
+	case BTRFS_QGROUP_MODE_FULL:
+		mode = "qgroup";
+		break;
+	case BTRFS_QGROUP_MODE_SIMPLE:
+		mode = "squota";
+		break;
+	}
+	spin_unlock(&fs_info->qgroup_lock);
+
+	return sysfs_emit(buf, "%s\n", mode);
+}
+BTRFS_ATTR(qgroups, mode, qgroup_mode_show);
+
 static ssize_t qgroup_inconsistent_show(struct kobject *qgroups_kobj,
 					struct kobj_attribute *a,
 					char *buf)
@@ -2141,6 +2166,7 @@ static struct attribute *qgroups_attrs[] = {
 	BTRFS_ATTR_PTR(qgroups, enabled),
 	BTRFS_ATTR_PTR(qgroups, inconsistent),
 	BTRFS_ATTR_PTR(qgroups, drop_subtree_threshold),
+	BTRFS_ATTR_PTR(qgroups, mode),
 	NULL
 };
 ATTRIBUTE_GROUPS(qgroups);

From patchwork Wed May 3 00:59:30 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Boris Burkov
X-Patchwork-Id: 13229425
From: Boris Burkov
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 9/9] btrfs: free qgroup rsv on io failure
Date: Tue, 2 May 2023 17:59:30 -0700
Message-Id: <185066f6b569a51c99b26046d1e33ea46385c083.1683075170.git.boris@bur.io>
X-Mailer: git-send-email 2.40.0
X-Mailing-List: linux-btrfs@vger.kernel.org

If we do a write whose bio suffers an error, we will never reclaim the
qgroup reserved space for it. We allocate the space in the write_iter
codepath, then release the reservation as we allocate the ordered
extent, but we only create a delayed ref if the ordered extent finishes.
If it has an error, we simply leak the rsv. This is apparent when
running any error-injecting (dmerror) fstests like btrfs/146 or
btrfs/160. Such tests fail due to dmesg on umount complaining about the
leaked qgroup data space.
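The leak and its fix can be illustrated with a small userspace model. This
is a sketch under simplifying assumptions rather than the kernel's
accounting: qgroup_model, ordered_model, model_write_reserve and
model_io_finished are hypothetical names, and the delayed ref that
normally consumes the reservation on success is collapsed into a single
step.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-qgroup counters. */
struct qgroup_model {
	uint64_t reserved;	/* bytes reserved but not yet accounted */
	uint64_t used;		/* bytes accounted to the qgroup */
};

/* Hypothetical ordered extent carrying the reservation it owns. */
struct ordered_model {
	uint64_t qgroup_rsv;
	bool io_error;
};

/* Write path: reserve space and hand the reservation to the ordered extent. */
void model_write_reserve(struct qgroup_model *qg, struct ordered_model *oe,
			 uint64_t bytes)
{
	qg->reserved += bytes;
	oe->qgroup_rsv = bytes;
	oe->io_error = false;
}

/* IO completion: on error give the reservation back instead of leaking it. */
void model_io_finished(struct qgroup_model *qg, struct ordered_model *oe,
		       bool uptodate)
{
	if (!uptodate) {
		oe->io_error = true;
		qg->reserved -= oe->qgroup_rsv;
		oe->qgroup_rsv = 0;
		return;
	}

	/* Success: the reservation turns into accounted usage. */
	qg->reserved -= oe->qgroup_rsv;
	qg->used += oe->qgroup_rsv;
	oe->qgroup_rsv = 0;
}
```

Without the error branch, a failed write would leave reserved non-zero
forever, which is the condition the umount-time warning reports.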
Signed-off-by: Boris Burkov
---
 fs/btrfs/ordered-data.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index a9778a91511e..e6803587a13c 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -426,8 +426,12 @@ void btrfs_mark_ordered_io_finished(struct btrfs_inode *inode,
 			entry->bytes_left -= len;
 		}
 
-		if (!uptodate)
+		if (!uptodate) {
 			set_bit(BTRFS_ORDERED_IOERR, &entry->flags);
+			btrfs_qgroup_free_refroot(fs_info, inode->root->root_key.objectid,
+						  entry->qgroup_rsv, BTRFS_QGROUP_RSV_DATA);
+			entry->qgroup_rsv = 0;
+		}
 
 		/*
 		 * All the IO of the ordered extent is finished, we need to queue