From patchwork Thu May 27 04:51:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Chinner <david@fromorbit.com>
X-Patchwork-Id: 12283331
Return-Path: <linux-xfs-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 20A2DC4708B
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:11 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 0277F613CC
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:10 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232616AbhE0Exm (ORCPT <rfc822;linux-xfs@archiver.kernel.org>);
        Thu, 27 May 2021 00:53:42 -0400
Received: from mail107.syd.optusnet.com.au ([211.29.132.53]:35272 "EHLO
        mail107.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S234365AbhE0Exk (ORCPT
        <rfc822;linux-xfs@vger.kernel.org>); Thu, 27 May 2021 00:53:40 -0400
Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au
 [49.180.230.185])
        by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 3E3DE11411D4
        for <linux-xfs@vger.kernel.org>;
 Thu, 27 May 2021 14:52:06 +1000 (AEST)
Received: from discord.disaster.area ([192.168.253.110])
        by dread.disaster.area with esmtp (Exim 4.92.3)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-005h19-Ne
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-004qgP-EA
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 1/6] xfs: btree format inode forks can have zero extents
Date: Thu, 27 May 2021 14:51:57 +1000
Message-Id: <20210527045202.1155628-2-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=F8MpiZpN c=1 sm=1 tr=0
        a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17
        a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=RfcaFHgcWMp4lnFpylIA:9
Precedence: bulk
List-ID: <linux-xfs.vger.kernel.org>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner <dchinner@redhat.com>

xfs/538 is assert failing with this trace when testing with
directory block sizes of 64kB:

XFS: Assertion failed: !xfs_need_iread_extents(ifp), file: fs/xfs/libxfs/xfs_bmap.c, line: 608
....
Call Trace:
 xfs_bmap_btree_to_extents+0x2a9/0x470
 ? kmem_cache_alloc+0xe7/0x220
 __xfs_bunmapi+0x4ca/0xdf0
 xfs_bunmapi+0x1a/0x30
 xfs_dir2_shrink_inode+0x71/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_remove+0x244/0x460
 xfs_vn_unlink+0x53/0xa0
 ? selinux_inode_unlink+0x13/0x20
 vfs_unlink+0x117/0x220
 do_unlinkat+0x1a2/0x2d0
 __x64_sys_unlink+0x42/0x60
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

This is a check to ensure that the extents have been read into
memory before we are doing a ifork btree manipulation. This assert
is bogus in the above case.

We have a fragmented directory block that has more extents in it
than can fit in extent format, so the inode data fork is in btree
format. xfs_dir2_shrink_inode() asks to remove all remaining 16
filesystem blocks from the inode so it can convert to short form,
and __xfs_bunmapi() removes all the extents. We now have a data fork
in btree format but have zero extents in the fork. This incorrectly
trips the xfs_need_iread_extents() assert because it assumes that an
empty extent btree means the extent tree has not been read into
memory yet. This is clearly not the case with xfs_bunmapi(), as it
has an explicit call to xfs_iread_extents() in it to pull the
extents into memory before it starts unmapping.

Also, the assert directly after this bogus one is:

	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);

Which covers the context in which it is legal to call
xfs_bmap_btree_to_extents just fine. Hence we should just remove the
bogus assert as it is clearly wrong and causes a regression.

The returns the test behaviour to the pre-existing assert failure in
xfs_dir2_shrink_inode() that indicates xfs_bunmapi() has failed to
remove all the extents in the range it was asked to unmap.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7e3b9b01431e..3f8b6da09261 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -605,7 +605,6 @@ xfs_bmap_btree_to_extents(
 
 	ASSERT(cur);
 	ASSERT(whichfork != XFS_COW_FORK);
-	ASSERT(!xfs_need_iread_extents(ifp));
 	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);
 	ASSERT(be16_to_cpu(rblock->bb_level) == 1);
 	ASSERT(be16_to_cpu(rblock->bb_numrecs) == 1);

From patchwork Thu May 27 04:51:58 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Chinner <david@fromorbit.com>
X-Patchwork-Id: 12283333
Return-Path: <linux-xfs-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id F17EFC4708C
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:11 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id CE90D61073
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:11 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S234387AbhE0Exn (ORCPT <rfc822;linux-xfs@archiver.kernel.org>);
        Thu, 27 May 2021 00:53:43 -0400
Received: from mail108.syd.optusnet.com.au ([211.29.132.59]:51001 "EHLO
        mail108.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S234363AbhE0Exl (ORCPT
        <rfc822;linux-xfs@vger.kernel.org>); Thu, 27 May 2021 00:53:41 -0400
Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au
 [49.180.230.185])
        by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id 3D38B1AFF11
        for <linux-xfs@vger.kernel.org>;
 Thu, 27 May 2021 14:52:06 +1000 (AEST)
Received: from discord.disaster.area ([192.168.253.110])
        by dread.disaster.area with esmtp (Exim 4.92.3)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-005h17-Nd
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-004qgR-F0
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues
Date: Thu, 27 May 2021 14:51:58 +1000
Message-Id: <20210527045202.1155628-3-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=Tu+Yewfh c=1 sm=1 tr=0
        a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17
        a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=VwQbUJbxAAAA:8 a=pGLkceISAAAA:8
        a=jemw3zrVzxxEioCUW9kA:9 a=AjGcO6oz07-iQ99wixmX:22
Precedence: bulk
List-ID: <linux-xfs.vger.kernel.org>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner <dchinner@redhat.com>

large directory block size operations are assert failing because
xfs_bunmapi() is not completely removing fragmented directory blocks
like so:

XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677
....
Call Trace:
 xfs_dir2_shrink_inode+0x1a8/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_rename+0xb79/0xc50
 ? avc_has_perm+0x8d/0x1a0
 ? avc_has_perm_noaudit+0x9a/0x120
 xfs_vn_rename+0xdb/0x150
 vfs_rename+0x719/0xb50
 ? __lookup_hash+0x6a/0xa0
 do_renameat2+0x413/0x5e0
 __x64_sys_rename+0x45/0x50
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

We are aborting the bunmapi() pass because of this specific chunk of
code:

                /*
                 * Make sure we don't touch multiple AGF headers out of order
                 * in a single transaction, as that could cause AB-BA deadlocks.
                 */
                if (!wasdel && !isrt) {
                        agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
                        if (prev_agno != NULLAGNUMBER && prev_agno > agno)
                                break;
                        prev_agno = agno;
                }

This is designed to prevent deadlocks in AGF locking when freeing
multiple extents by ensuring that we only ever lock in increasing
AG number order. Unfortunately, this also violates the "bunmapi will
always succeed" semantic that some high level callers depend on,
such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and
xfs_inactive_symlink_rmt().

This AG lock ordering was introduced back in 2017 to fix deadlocks
triggered by generic/299 as reported here:

https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/

This codebase is old enough that it was before we were defering all
AG based extent freeing from within xfs_bunmapi(). THat is, we never
actually lock AGs in xfs_bunmapi() any more - every non-rt based
extent free is added to the defer ops list, as is all BMBT block
freeing. And RT extents are not RT based, so there's no lock
ordering issues associated with them.

Hence this AGF lock ordering code is both broken and dead. Let's
just remove it so that the large directory block code works reliably
again.

Tested against xfs/538 and generic/299 which is the original test
that exposed the deadlocks that this code fixed.

Fixes: 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 3f8b6da09261..a3e0e6f672d6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5349,7 +5349,6 @@ __xfs_bunmapi(
 	xfs_fsblock_t		sum;
 	xfs_filblks_t		len = *rlen;	/* length to unmap in file */
 	xfs_fileoff_t		max_len;
-	xfs_agnumber_t		prev_agno = NULLAGNUMBER, agno;
 	xfs_fileoff_t		end;
 	struct xfs_iext_cursor	icur;
 	bool			done = false;
@@ -5441,16 +5440,6 @@ __xfs_bunmapi(
 		del = got;
 		wasdel = isnullstartblock(del.br_startblock);
 
-		/*
-		 * Make sure we don't touch multiple AGF headers out of order
-		 * in a single transaction, as that could cause AB-BA deadlocks.
-		 */
-		if (!wasdel && !isrt) {
-			agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
-			if (prev_agno != NULLAGNUMBER && prev_agno > agno)
-				break;
-			prev_agno = agno;
-		}
 		if (got.br_startoff < start) {
 			del.br_startoff = start;
 			del.br_blockcount -= start - got.br_startoff;

From patchwork Thu May 27 04:51:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Chinner <david@fromorbit.com>
X-Patchwork-Id: 12283337
Return-Path: <linux-xfs-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 5DC18C4708E
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:12 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 4300061073
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:12 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S234363AbhE0Exn (ORCPT <rfc822;linux-xfs@archiver.kernel.org>);
        Thu, 27 May 2021 00:53:43 -0400
Received: from mail105.syd.optusnet.com.au ([211.29.132.249]:47000 "EHLO
        mail105.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S234405AbhE0Exl (ORCPT
        <rfc822;linux-xfs@vger.kernel.org>); Thu, 27 May 2021 00:53:41 -0400
Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au
 [49.180.230.185])
        by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 292351043897
        for <linux-xfs@vger.kernel.org>;
 Thu, 27 May 2021 14:52:06 +1000 (AEST)
Received: from discord.disaster.area ([192.168.253.110])
        by dread.disaster.area with esmtp (Exim 4.92.3)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-005h1A-O9
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-004qgT-GT
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation
Date: Thu, 27 May 2021 14:51:59 +1000
Message-Id: <20210527045202.1155628-4-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=Tu+Yewfh c=1 sm=1 tr=0
        a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17
        a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=m78L4NlQDRkRyoOWfzwA:9
Precedence: bulk
List-ID: <linux-xfs.vger.kernel.org>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner <dchinner@redhat.com>

Ever since we moved to freeing of extents by deferred operations,
we've already freed extents via individual transactions. Hence the
only limitation of how many extents we can mark for freeing in a
single xfs_bunmapi() call bound only by how many deferrals we want
to queue.

That is xfs_bunmapi() doesn't actually do any AG based extent
freeing, so there's no actually transaction reservation used up by
calling bunmapi with a large count of extents to be freed. RT
extents have always been freed directly by bunmapi, but that doesn't
require modification of large number of blocks as there are no
btrees to split.

Some callers of xfs_bunmapi assume that the extent count being freed
is bound by geometry (e.g. directories) and these can ask bunmapi to
free up to 64 extents in a single call. These functions just work as
tehy stand, so there's no reason for truncate to have a limit of
just two extents per bunmapi call anymore.

Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
that can be deferred in a single loop to match what the directory
code already uses.

For realtime inodes, where xfs_bunmapi() directly frees extents,
leave the limit at 2 extents per loop as this is all the space that
the transaction reservation will cover.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_inode.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 0369eb22c1bb..db220eaa34b8 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
 
 /*
  * Used in xfs_itruncate_extents().  This is the maximum number of extents
- * freed from a file in a single transaction.
+ * we will unmap and defer for freeing in a single call to xfs_bunmapi().
+ * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
+ * by transaction reservation size to 2 extents.
  */
-#define	XFS_ITRUNC_MAX_EXTENTS	2
+static inline int
+xfs_itrunc_max_extents(
+	struct xfs_inode	*ip)
+{
+	if (XFS_IS_REALTIME_INODE(ip))
+		return 2;
+	return 64;
+}
 
 STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
 STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
@@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
 	while (unmap_len > 0) {
 		ASSERT(tp->t_firstblock == NULLFSBLOCK);
 		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
-				flags, XFS_ITRUNC_MAX_EXTENTS);
+				flags, xfs_itrunc_max_extents(ip));
 		if (error)
 			goto out;
 

From patchwork Thu May 27 04:52:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Chinner <david@fromorbit.com>
X-Patchwork-Id: 12283335
Return-Path: <linux-xfs-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 83357C4708F
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:12 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 68DD161073
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:12 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S234365AbhE0Exo (ORCPT <rfc822;linux-xfs@archiver.kernel.org>);
        Thu, 27 May 2021 00:53:44 -0400
Received: from mail110.syd.optusnet.com.au ([211.29.132.97]:52463 "EHLO
        mail110.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S234414AbhE0Exl (ORCPT
        <rfc822;linux-xfs@vger.kernel.org>); Thu, 27 May 2021 00:53:41 -0400
Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au
 [49.180.230.185])
        by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 3E1A0105F88
        for <linux-xfs@vger.kernel.org>;
 Thu, 27 May 2021 14:52:06 +1000 (AEST)
Received: from discord.disaster.area ([192.168.253.110])
        by dread.disaster.area with esmtp (Exim 4.92.3)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-005h1F-P7
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-004qgY-HO
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 4/6] xfs: add a free space extent change reservation
Date: Thu, 27 May 2021 14:52:00 +1000
Message-Id: <20210527045202.1155628-5-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=Tu+Yewfh c=1 sm=1 tr=0
        a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17
        a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=rZi3jfzWYBuMxqL7aZQA:9
Precedence: bulk
List-ID: <linux-xfs.vger.kernel.org>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner <dchinner@redhat.com>

Lots of the transaction reservation code reserves space for an
extent allocation. It is inconsistently implemented, and many of
them get it wrong. Introduce a new function to calculate the log
space reservation for adding or removing an extent from the free
space btrees.

This function reserves space for logging the AGF, the AGFL and the
free space btrees, avoiding the need to account for them seperately
in every reservation that manipulates free space.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index d1a0848cb52e..6363cacb790f 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -79,6 +79,23 @@ xfs_allocfree_log_count(
 	return blocks;
 }
 
+/*
+ * Log reservation required to add or remove a single extent to the free space
+ * btrees.  This requires modifying:
+ *
+ * the agf header: 1 sector
+ * the agfl header: 1 sector
+ * the allocation btrees: 2 trees * (max depth - 1) * block size
+ */
+uint
+xfs_allocfree_extent_res(
+	struct xfs_mount *mp)
+{
+	return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+	       xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
+				XFS_FSB_TO_B(mp, 1));
+}
+
 /*
  * Logging inodes is really tricksy. They are logged in memory format,
  * which means that what we write into the log doesn't directly translate into

From patchwork Thu May 27 04:52:01 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Chinner <david@fromorbit.com>
X-Patchwork-Id: 12283327
Return-Path: <linux-xfs-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id EF105C4707F
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:09 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id CC50C613CC
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:09 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S234416AbhE0Exk (ORCPT <rfc822;linux-xfs@archiver.kernel.org>);
        Thu, 27 May 2021 00:53:40 -0400
Received: from mail109.syd.optusnet.com.au ([211.29.132.80]:33889 "EHLO
        mail109.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S229579AbhE0Exj (ORCPT
        <rfc822;linux-xfs@vger.kernel.org>); Thu, 27 May 2021 00:53:39 -0400
Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au
 [49.180.230.185])
        by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 2985667CED
        for <linux-xfs@vger.kernel.org>;
 Thu, 27 May 2021 14:52:06 +1000 (AEST)
Received: from discord.disaster.area ([192.168.253.110])
        by dread.disaster.area with esmtp (Exim 4.92.3)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-005h1H-Q9
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-004qgb-IL
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 5/6] xfs: factor free space tree transaciton reservations
Date: Thu, 27 May 2021 14:52:01 +1000
Message-Id: <20210527045202.1155628-6-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=F8MpiZpN c=1 sm=1 tr=0
        a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17
        a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=nuMuv_KYr93gYNIyU0IA:9
Precedence: bulk
List-ID: <linux-xfs.vger.kernel.org>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner <dchinner@redhat.com>

Convert all the open coded free space tree modification reservations
to use the new xfs_allocfree_extent_res() function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c | 122 ++++++++++++---------------------
 1 file changed, 45 insertions(+), 77 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 6363cacb790f..02079f55ef20 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -143,18 +143,16 @@ xfs_calc_inode_res(
  * reservation:
  *
  * the inode btree: max depth * blocksize
- * the allocation btrees: 2 trees * (max depth - 1) * block size
+ * one extent allocfree reservation for the AG.
  *
- * The caller must account for SB and AG header modifications, etc.
  */
 STATIC uint
 xfs_calc_inobt_res(
 	struct xfs_mount	*mp)
 {
 	return xfs_calc_buf_res(M_IGEO(mp)->inobt_maxlevels,
-			XFS_FSB_TO_B(mp, 1)) +
-				xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-			XFS_FSB_TO_B(mp, 1));
+				XFS_FSB_TO_B(mp, 1)) +
+		xfs_allocfree_extent_res(mp);
 }
 
 /*
@@ -182,7 +180,7 @@ xfs_calc_finobt_res(
  * Calculate the reservation required to allocate or free an inode chunk. This
  * includes:
  *
- * the allocation btrees: 2 trees * (max depth - 1) * block size
+ * one extent allocfree reservation for the AG.
  * the inode chunk: m_ino_geo.ialloc_blks * N
  *
  * The size N of the inode chunk reservation depends on whether it is for
@@ -200,8 +198,7 @@ xfs_calc_inode_chunk_res(
 {
 	uint			res, size = 0;
 
-	res = xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-			       XFS_FSB_TO_B(mp, 1));
+	res = xfs_allocfree_extent_res(mp);
 	if (alloc) {
 		/* icreate tx uses ordered buffers */
 		if (xfs_sb_version_has_v3inode(&mp->m_sb))
@@ -256,22 +253,18 @@ xfs_rtalloc_log_count(
  * extents.  This gives (t1):
  *    the inode getting the new extents: inode size
  *    the inode's bmap btree: max depth * block size
- *    the agfs of the ags from which the extents are allocated: 2 * sector
  *    the superblock free block counter: sector size
- *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ *    two extent allocfree reservations for the AG.
  * Or, if we're writing to a realtime file (t2):
  *    the inode getting the new extents: inode size
  *    the inode's bmap btree: max depth * block size
- *    the agfs of the ags from which the extents are allocated: 2 * sector
+ *    one extent allocfree reservation for the AG.
  *    the superblock free block counter: sector size
  *    the realtime bitmap: ((MAXEXTLEN / rtextsize) / NBBY) bytes
  *    the realtime summary: 1 block
- *    the allocation btrees: 2 trees * (2 * max depth - 1) * block size
  * And the bmap_finish transaction can free bmap blocks in a join (t3):
- *    the agfs of the ags containing the blocks: 2 * sector size
- *    the agfls of the ags containing the blocks: 2 * sector size
  *    the super block free block counter: sector size
- *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ *    two extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_write_reservation(
@@ -282,8 +275,8 @@ xfs_calc_write_reservation(
 
 	t1 = xfs_calc_inode_res(mp, 1) +
 	     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), blksz) +
-	     xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
+	     xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	     xfs_allocfree_extent_res(mp) * 2;
 
 	if (xfs_sb_version_hasrealtime(&mp->m_sb)) {
 		t2 = xfs_calc_inode_res(mp, 1) +
@@ -291,13 +284,13 @@ xfs_calc_write_reservation(
 				     blksz) +
 		     xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
 		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 1), blksz) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1), blksz);
+		     xfs_allocfree_extent_res(mp);
 	} else {
 		t2 = 0;
 	}
 
-	t3 = xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
-	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
+	t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	     xfs_allocfree_extent_res(mp) * 2;
 
 	return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
 }
@@ -307,19 +300,13 @@ xfs_calc_write_reservation(
  *    the inode being truncated: inode size
  *    the inode's bmap btree: (max depth + 1) * block size
  * And the bmap_finish transaction can free the blocks and bmap blocks (t2):
- *    the agf for each of the ags: 4 * sector size
- *    the agfl for each of the ags: 4 * sector size
  *    the super block to reflect the freed blocks: sector size
- *    worst case split in allocation btrees per extent assuming 4 extents:
- *		4 exts * 2 trees * (2 * max depth - 1) * block size
+ *    four extent allocfree reservations for the AG.
  * Or, if it's a realtime file (t3):
- *    the agf for each of the ags: 2 * sector size
- *    the agfl for each of the ags: 2 * sector size
  *    the super block to reflect the freed blocks: sector size
  *    the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes
  *    the realtime summary: 2 exts * 1 block
- *    worst case split in allocation btrees per extent assuming 2 extents:
- *		2 exts * 2 trees * (2 * max depth - 1) * block size
+ *    two extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_itruncate_reservation(
@@ -331,13 +318,13 @@ xfs_calc_itruncate_reservation(
 	t1 = xfs_calc_inode_res(mp, 1) +
 	     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1, blksz);
 
-	t2 = xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
-	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4), blksz);
+	t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	     xfs_allocfree_extent_res(mp) * 4;
 
 	if (xfs_sb_version_hasrealtime(&mp->m_sb)) {
-		t3 = xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
+		t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
 		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
+		     xfs_allocfree_extent_res(mp) * 2;
 	} else {
 		t3 = 0;
 	}
@@ -352,10 +339,8 @@ xfs_calc_itruncate_reservation(
  *    the two directory bmap btrees: 2 * max depth * block size
  * And the bmap_finish transaction can free dir and bmap blocks (two sets
  *	of bmap blocks) giving:
- *    the agf for the ags in which the blocks live: 3 * sector size
- *    the agfl for the ags in which the blocks live: 3 * sector size
  *    the superblock for the free block count: sector size
- *    the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
+ *    three extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_rename_reservation(
@@ -365,9 +350,8 @@ xfs_calc_rename_reservation(
 		max((xfs_calc_inode_res(mp, 4) +
 		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 3),
-				      XFS_FSB_TO_B(mp, 1))));
+		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_allocfree_extent_res(mp) * 3));
 }
 
 /*
@@ -381,20 +365,19 @@ xfs_calc_iunlink_remove_reservation(
 	struct xfs_mount        *mp)
 {
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-	       2 * M_IGEO(mp)->inode_cluster_size;
+	       xfs_calc_buf_res(2, M_IGEO(mp)->inode_cluster_size);
 }
 
 /*
  * For creating a link to an inode:
+ *    the inode is removed from the iunlink list (O_TMPFILE)
  *    the parent directory inode: inode size
  *    the linked inode: inode size
  *    the directory btree could split: (max depth + v2) * dir block size
  *    the directory bmap btree could join or split: (max depth + v2) * blocksize
  * And the bmap_finish transaction can free some bmap blocks giving:
- *    the agf for the ag in which the blocks live: sector size
- *    the agfl for the ag in which the blocks live: sector size
  *    the superblock for the free block count: sector size
- *    the allocation btrees: 2 trees * (2 * max depth - 1) * block size
+ *    one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_link_reservation(
@@ -405,9 +388,8 @@ xfs_calc_link_reservation(
 		max((xfs_calc_inode_res(mp, 2) +
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				      XFS_FSB_TO_B(mp, 1))));
+		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_allocfree_extent_res(mp)));
 }
 
 /*
@@ -419,20 +401,19 @@ STATIC uint
 xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
 {
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-			M_IGEO(mp)->inode_cluster_size;
+	       xfs_calc_buf_res(1, M_IGEO(mp)->inode_cluster_size);
 }
 
 /*
  * For removing a directory entry we can modify:
+ *    the inode is added to the agi unlinked list
  *    the parent directory inode: inode size
  *    the removed inode: inode size
  *    the directory btree could join: (max depth + v2) * dir block size
  *    the directory bmap btree could join or split: (max depth + v2) * blocksize
  * And the bmap_finish transaction can free the dir and bmap blocks giving:
- *    the agf for the ag in which the blocks live: 2 * sector size
- *    the agfl for the ag in which the blocks live: 2 * sector size
  *    the superblock for the free block count: sector size
- *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ *    two extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_remove_reservation(
@@ -443,9 +424,8 @@ xfs_calc_remove_reservation(
 		max((xfs_calc_inode_res(mp, 1) +
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
-				      XFS_FSB_TO_B(mp, 1))));
+		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_allocfree_extent_res(mp) * 2));
 }
 
 /*
@@ -581,16 +561,14 @@ xfs_calc_ichange_reservation(
 /*
  * Growing the data section of the filesystem.
  *	superblock
- *	agi and agf
- *	allocation btrees
+ *      one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_growdata_reservation(
 	struct xfs_mount	*mp)
 {
-	return xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				 XFS_FSB_TO_B(mp, 1));
+	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	       xfs_allocfree_extent_res(mp);
 }
 
 /*
@@ -598,10 +576,9 @@ xfs_calc_growdata_reservation(
  * In the first set of transactions (ALLOC) we allocate space to the
  * bitmap or summary files.
  *	superblock: sector size
- *	agf of the ag from which the extent is allocated: sector size
  *	bmap btree for bitmap/summary inode: max depth * blocksize
  *	bitmap/summary inode: inode size
- *	allocation btrees for 1 block alloc: 2 * (2 * maxdepth - 1) * blocksize
+ *      one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_growrtalloc_reservation(
@@ -611,8 +588,7 @@ xfs_calc_growrtalloc_reservation(
 		xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_inode_res(mp, 1) +
-		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				 XFS_FSB_TO_B(mp, 1));
+		xfs_allocfree_extent_res(mp);
 }
 
 /*
@@ -675,7 +651,7 @@ xfs_calc_writeid_reservation(
  *	agf block and superblock (for block allocation)
  *	the new block (directory sized)
  *	bmap blocks for the new directory block
- *	allocation btrees
+ *      one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_addafork_reservation(
@@ -687,8 +663,7 @@ xfs_calc_addafork_reservation(
 		xfs_calc_buf_res(1, mp->m_dir_geo->blksize) +
 		xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1,
 				 XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				 XFS_FSB_TO_B(mp, 1));
+		xfs_allocfree_extent_res(mp);
 }
 
 /*
@@ -696,11 +671,8 @@ xfs_calc_addafork_reservation(
  *    the inode being truncated: inode size
  *    the inode's bmap btree: max depth * block size
  * And the bmap_finish transaction can free the blocks and bmap blocks:
- *    the agf for each of the ags: 4 * sector size
- *    the agfl for each of the ags: 4 * sector size
  *    the super block to reflect the freed blocks: sector size
- *    worst case split in allocation btrees per extent assuming 4 extents:
- *		4 exts * 2 trees * (2 * max depth - 1) * block size
+ *    four extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_attrinval_reservation(
@@ -709,9 +681,8 @@ xfs_calc_attrinval_reservation(
 	return max((xfs_calc_inode_res(mp, 1) +
 		    xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
 				     XFS_FSB_TO_B(mp, 1))),
-		   (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
-		    xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
-				     XFS_FSB_TO_B(mp, 1))));
+		   (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		    xfs_allocfree_extent_res(mp) * 4));
 }
 
 /*
@@ -760,10 +731,8 @@ xfs_calc_attrsetrt_reservation(
  *    the attribute btree could join: max depth * block size
  *    the inode bmap btree could join or split: max depth * block size
  * And the bmap_finish transaction can free the attr blocks freed giving:
- *    the agf for the ag in which the blocks live: 2 * sector size
- *    the agfl for the ag in which the blocks live: 2 * sector size
  *    the superblock for the free block count: sector size
- *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ *    two extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_attrrm_reservation(
@@ -776,9 +745,8 @@ xfs_calc_attrrm_reservation(
 		     (uint)XFS_FSB_TO_B(mp,
 					XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
-		    (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
-				      XFS_FSB_TO_B(mp, 1))));
+		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_allocfree_extent_res(mp) * 2));
 }
 
 /*

From patchwork Thu May 27 04:52:02 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Chinner <david@fromorbit.com>
X-Patchwork-Id: 12283325
Return-Path: <linux-xfs-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 13677C47089
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:10 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id E5182613DD
	for <linux-xfs@archiver.kernel.org>; Thu, 27 May 2021 04:52:09 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229579AbhE0Exl (ORCPT <rfc822;linux-xfs@archiver.kernel.org>);
        Thu, 27 May 2021 00:53:41 -0400
Received: from mail104.syd.optusnet.com.au ([211.29.132.246]:48782 "EHLO
        mail104.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S232616AbhE0Exk (ORCPT
        <rfc822;linux-xfs@vger.kernel.org>); Thu, 27 May 2021 00:53:40 -0400
Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au
 [49.180.230.185])
        by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 44531862700
        for <linux-xfs@vger.kernel.org>;
 Thu, 27 May 2021 14:52:06 +1000 (AEST)
Received: from discord.disaster.area ([192.168.253.110])
        by dread.disaster.area with esmtp (Exim 4.92.3)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-005h1J-RE
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94)
        (envelope-from <david@fromorbit.com>)
        id 1lm804-004qge-JX
        for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 6/6] xfs: reduce transaction reservation for freeing extents
Date: Thu, 27 May 2021 14:52:02 +1000
Message-Id: <20210527045202.1155628-7-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=F8MpiZpN c=1 sm=1 tr=0
        a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17
        a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=7un6xMnLzUUjxZIj67YA:9
Precedence: bulk
List-ID: <linux-xfs.vger.kernel.org>
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner <dchinner@redhat.com>

Ever since we moved to deferred freeing of extents, we only every
free one extent per transaction. We separated the bulk unmapping of
extents from the submission of EFI/free/EFD transactions, and hence
while we unmap extents in bulk, we only every free one per
transaction.

Our transaction reservations still live in the era from before
deferred freeing of extents, so still refer to "xfs_bmap_finish"
and it needing to free multiple extents per transaction. These
freeing reservations can now all be reduced to a single extent to
reflect how we currently free extents.

This significantly reduces the reservation sizes for operations like
truncate and directory operations where they currently reserve space
for freeing up to 4 extents per transaction.

For a 4kB block size filesytsem with reflink=1,rmapbt=1, the
reservation sizes change like this:

Reservation		Before			After
(index)			logres	logcount	logres	logcount
 0	write		314104	    8		314104	    8
 1	itruncate	579456	    8           148608	    8
 2	rename		435840	    2           307936	    2
 3	link		191600	    2           191600	    2
 4	remove		312960	    2           174328	    2
 5	symlink		470656	    3           470656	    3
 6	create		469504	    2           469504	    2
 7	create_tmpfile	490240	    2           490240	    2
 8	mkdir		469504	    3           469504	    3
 9	ifree		508664	    2           508664	    2
 10	ichange		  5752	    0             5752	    0
 11	growdata	147840	    2           147840	    2
 12	addafork	178936	    2           178936	    2
 13	writeid		   760	    0              760	    0
 14	attrinval	578688	    1           147840	    1
 15	attrsetm	 26872	    3            26872	    3
 16	attrsetrt	 16896	    0            16896	    0
 17	attrrm		292224	    3           148608	    3
 18	clearagi	  4224	    0             4224	    0
 19	growrtalloc	173944	    2           173944	    2
 20	growrtzero	  4224	    0             4224	    0
 21	growrtfree	 10096	    0            10096	    0
 22	qm_setqlim	   232	    1              232	    1
 23	qm_dqalloc	318327	    8           318327	    8
 24	qm_quotaoff	  4544	    1             4544	    1
 25	qm_equotaoff	   320	    1              320	    1
 26	sb		  4224	    1             4224	    1
 27	fsyncts		   760	    0              760	    0
MAX			579456	    8           318327	    8

So we can see that many of the reservations have gone substantially
down in size. itruncate, rename, remove, attrinval and attrrm are
much smaller now. The maximum reservation size has gone from being
attrinval at 579456*8 bytes to dqalloc at 318327*8 bytes. This is a
substantial improvement for common operations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c | 63 +++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 02079f55ef20..f5e76eeae281 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -232,8 +232,7 @@ xfs_rtalloc_log_count(
  * Various log reservation values.
  *
  * These are based on the size of the file system block because that is what
- * most transactions manipulate.  Each adds in an additional 128 bytes per
- * item logged to try to account for the overhead of the transaction mechanism.
+ * most transactions manipulate.
  *
  * Note:  Most of the reservations underestimate the number of allocation
  * groups into which they could free extents in the xfs_defer_finish() call.
@@ -262,9 +261,9 @@ xfs_rtalloc_log_count(
  *    the superblock free block counter: sector size
  *    the realtime bitmap: ((MAXEXTLEN / rtextsize) / NBBY) bytes
  *    the realtime summary: 1 block
- * And the bmap_finish transaction can free bmap blocks in a join (t3):
+ * And the deferred freeing can free bmap blocks in a join (t3):
  *    the super block free block counter: sector size
- *    two extent allocfree reservations for the AG.
+ *    one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_write_reservation(
@@ -290,23 +289,25 @@ xfs_calc_write_reservation(
 	}
 
 	t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-	     xfs_allocfree_extent_res(mp) * 2;
+	     xfs_allocfree_extent_res(mp);
 
 	return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
 }
 
 /*
- * In truncating a file we free up to two extents at once.  We can modify (t1):
+ * In truncating a file we defer freeing so we only free one extent per
+ * transaction for normal files. For rt files we limit to 2 extents per
+ * transaction.
+ * We can modify (t1):
  *    the inode being truncated: inode size
  *    the inode's bmap btree: (max depth + 1) * block size
- * And the bmap_finish transaction can free the blocks and bmap blocks (t2):
- *    the super block to reflect the freed blocks: sector size
- *    four extent allocfree reservations for the AG.
- * Or, if it's a realtime file (t3):
+ * Or, if it's a realtime file (t2):
  *    the super block to reflect the freed blocks: sector size
  *    the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes
  *    the realtime summary: 2 exts * 1 block
- *    two extent allocfree reservations for the AG.
+ * And the deferred freeing can free the blocks and bmap blocks (t3):
+ *    the super block to reflect the freed blocks: sector size
+ *    one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_itruncate_reservation(
@@ -318,17 +319,16 @@ xfs_calc_itruncate_reservation(
 	t1 = xfs_calc_inode_res(mp, 1) +
 	     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1, blksz);
 
-	t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-	     xfs_allocfree_extent_res(mp) * 4;
-
 	if (xfs_sb_version_hasrealtime(&mp->m_sb)) {
-		t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz) +
-		     xfs_allocfree_extent_res(mp) * 2;
+		t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz);
 	} else {
-		t3 = 0;
+		t2 = 0;
 	}
 
+	t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	     xfs_allocfree_extent_res(mp);
+
 	return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
 }
 
@@ -337,10 +337,9 @@ xfs_calc_itruncate_reservation(
  *    the four inodes involved: 4 * inode size
  *    the two directory btrees: 2 * (max depth + v2) * dir block size
  *    the two directory bmap btrees: 2 * max depth * block size
- * And the bmap_finish transaction can free dir and bmap blocks (two sets
- *	of bmap blocks) giving:
+ * And the deferred freeing can free dir and bmap blocks giving:
  *    the superblock for the free block count: sector size
- *    three extent allocfree reservations for the AG.
+ *    one extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_rename_reservation(
@@ -351,7 +350,7 @@ xfs_calc_rename_reservation(
 		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		     xfs_allocfree_extent_res(mp) * 3));
+		     xfs_allocfree_extent_res(mp)));
 }
 
 /*
@@ -375,7 +374,7 @@ xfs_calc_iunlink_remove_reservation(
  *    the linked inode: inode size
  *    the directory btree could split: (max depth + v2) * dir block size
  *    the directory bmap btree could join or split: (max depth + v2) * blocksize
- * And the bmap_finish transaction can free some bmap blocks giving:
+ * And the deferred freeing can free bmap blocks giving:
  *    the superblock for the free block count: sector size
  *    one extent allocfree reservation for the AG.
  */
@@ -411,9 +410,9 @@ xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
  *    the removed inode: inode size
  *    the directory btree could join: (max depth + v2) * dir block size
  *    the directory bmap btree could join or split: (max depth + v2) * blocksize
- * And the bmap_finish transaction can free the dir and bmap blocks giving:
+ * And the deferred freeing can free the dir and bmap blocks giving:
  *    the superblock for the free block count: sector size
- *    two extent allocfree reservations for the AG.
+ *    one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_remove_reservation(
@@ -425,7 +424,7 @@ xfs_calc_remove_reservation(
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		     xfs_allocfree_extent_res(mp) * 2));
+		     xfs_allocfree_extent_res(mp)));
 }
 
 /*
@@ -670,9 +669,9 @@ xfs_calc_addafork_reservation(
  * Removing the attribute fork of a file
  *    the inode being truncated: inode size
  *    the inode's bmap btree: max depth * block size
- * And the bmap_finish transaction can free the blocks and bmap blocks:
+ * And the deferred freeing can free the blocks and bmap blocks:
  *    the super block to reflect the freed blocks: sector size
- *    four extent allocfree reservations for the AG.
+ *    one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_attrinval_reservation(
@@ -682,7 +681,7 @@ xfs_calc_attrinval_reservation(
 		    xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
 				     XFS_FSB_TO_B(mp, 1))),
 		   (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		    xfs_allocfree_extent_res(mp) * 4));
+		    xfs_allocfree_extent_res(mp)));
 }
 
 /*
@@ -730,9 +729,9 @@ xfs_calc_attrsetrt_reservation(
  *    the inode: inode size
  *    the attribute btree could join: max depth * block size
  *    the inode bmap btree could join or split: max depth * block size
- * And the bmap_finish transaction can free the attr blocks freed giving:
+ * And the deferred freeing can free the attr blocks freed giving:
  *    the superblock for the free block count: sector size
- *    two extent allocfree reservations for the AG.
+ *    one extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_attrrm_reservation(
@@ -746,7 +745,7 @@ xfs_calc_attrrm_reservation(
 					XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
 		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		     xfs_allocfree_extent_res(mp) * 2));
+		     xfs_allocfree_extent_res(mp)));
 }
 
 /*