From patchwork Thu May 27 04:51:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 12283331
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 20A2DC4708B
	for ; Thu, 27 May 2021 04:52:11 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 0277F613CC
	for ; Thu, 27 May 2021 04:52:10 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232616AbhE0Exm (ORCPT ); Thu, 27 May 2021 00:53:42 -0400
Received: from mail107.syd.optusnet.com.au ([211.29.132.53]:35272 "EHLO
	mail107.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S234365AbhE0Exk (ORCPT ); Thu, 27 May 2021 00:53:40 -0400
Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au [49.180.230.185])
	by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 3E3DE11411D4
	for ; Thu, 27 May 2021 14:52:06 +1000 (AEST)
Received: from discord.disaster.area ([192.168.253.110])
	by dread.disaster.area with esmtp (Exim 4.92.3)
	(envelope-from )
	id 1lm804-005h19-Ne
	for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94)
	(envelope-from )
	id 1lm804-004qgP-EA
	for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 1/6] xfs: btree format inode forks can have zero extents
Date: Thu, 27 May 2021 14:51:57 +1000
Message-Id:
 <20210527045202.1155628-2-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=F8MpiZpN c=1 sm=1 tr=0
	a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17
	a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=RfcaFHgcWMp4lnFpylIA:9
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

xfs/538 is assert failing with this trace when testing with
directory block sizes of 64kB:

XFS: Assertion failed: !xfs_need_iread_extents(ifp), file: fs/xfs/libxfs/xfs_bmap.c, line: 608
....
Call Trace:
 xfs_bmap_btree_to_extents+0x2a9/0x470
 ? kmem_cache_alloc+0xe7/0x220
 __xfs_bunmapi+0x4ca/0xdf0
 xfs_bunmapi+0x1a/0x30
 xfs_dir2_shrink_inode+0x71/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_remove+0x244/0x460
 xfs_vn_unlink+0x53/0xa0
 ? selinux_inode_unlink+0x13/0x20
 vfs_unlink+0x117/0x220
 do_unlinkat+0x1a2/0x2d0
 __x64_sys_unlink+0x42/0x60
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

This is a check to ensure that the extents have been read into
memory before we do an ifork btree manipulation. This assert
is bogus in the above case.

We have a fragmented directory block that has more extents in it
than can fit in extent format, so the inode data fork is in btree
format. xfs_dir2_shrink_inode() asks to remove all remaining 16
filesystem blocks from the inode so it can convert to short form,
and __xfs_bunmapi() removes all the extents. We now have a data fork
in btree format but have zero extents in the fork. This incorrectly
trips the xfs_need_iread_extents() assert because it assumes that an
empty extent btree means the extent tree has not been read into
memory yet.
This is clearly not the case with xfs_bunmapi(), as it has an
explicit call to xfs_iread_extents() in it to pull the extents into
memory before it starts unmapping. Also, the assert directly after
this bogus one is:

	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);

Which covers the context in which it is legal to call
xfs_bmap_btree_to_extents just fine. Hence we should just remove the
bogus assert as it is clearly wrong and causes a regression.

This returns the test behaviour to the pre-existing assert failure in
xfs_dir2_shrink_inode() that indicates xfs_bunmapi() has failed to
remove all the extents in the range it was asked to unmap.

Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
---
 fs/xfs/libxfs/xfs_bmap.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7e3b9b01431e..3f8b6da09261 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -605,7 +605,6 @@ xfs_bmap_btree_to_extents(

 	ASSERT(cur);
 	ASSERT(whichfork != XFS_COW_FORK);
-	ASSERT(!xfs_need_iread_extents(ifp));
 	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);
 	ASSERT(be16_to_cpu(rblock->bb_level) == 1);
 	ASSERT(be16_to_cpu(rblock->bb_numrecs) == 1);

From patchwork Thu May 27 04:51:58 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 12283333
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id F17EFC4708C
	for ; Thu, 27 May 2021 04:52:11 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by
	mail.kernel.org (Postfix) with ESMTP id CE90D61073
	for ; Thu, 27 May 2021 04:52:11 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S234387AbhE0Exn (ORCPT ); Thu, 27 May 2021 00:53:43 -0400
Received: from mail108.syd.optusnet.com.au ([211.29.132.59]:51001 "EHLO
	mail108.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S234363AbhE0Exl (ORCPT ); Thu, 27 May 2021 00:53:41 -0400
Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au [49.180.230.185])
	by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id 3D38B1AFF11
	for ; Thu, 27 May 2021 14:52:06 +1000 (AEST)
Received: from discord.disaster.area ([192.168.253.110])
	by dread.disaster.area with esmtp (Exim 4.92.3)
	(envelope-from )
	id 1lm804-005h17-Nd
	for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94)
	(envelope-from )
	id 1lm804-004qgR-F0
	for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues
Date: Thu, 27 May 2021 14:51:58 +1000
Message-Id: <20210527045202.1155628-3-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=Tu+Yewfh c=1 sm=1 tr=0
	a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17
	a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=VwQbUJbxAAAA:8 a=pGLkceISAAAA:8
	a=jemw3zrVzxxEioCUW9kA:9 a=AjGcO6oz07-iQ99wixmX:22
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

Large directory block size operations are assert failing because
xfs_bunmapi() is not completely removing fragmented directory blocks
like so:

XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677
....
Call Trace:
 xfs_dir2_shrink_inode+0x1a8/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_rename+0xb79/0xc50
 ? avc_has_perm+0x8d/0x1a0
 ? avc_has_perm_noaudit+0x9a/0x120
 xfs_vn_rename+0xdb/0x150
 vfs_rename+0x719/0xb50
 ? __lookup_hash+0x6a/0xa0
 do_renameat2+0x413/0x5e0
 __x64_sys_rename+0x45/0x50
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

We are aborting the bunmapi() pass because of this specific chunk of
code:

	/*
	 * Make sure we don't touch multiple AGF headers out of order
	 * in a single transaction, as that could cause AB-BA deadlocks.
	 */
	if (!wasdel && !isrt) {
		agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
		if (prev_agno != NULLAGNUMBER && prev_agno > agno)
			break;
		prev_agno = agno;
	}

This is designed to prevent deadlocks in AGF locking when freeing
multiple extents by ensuring that we only ever lock in increasing
AG number order. Unfortunately, this also violates the "bunmapi will
always succeed" semantic that some high level callers depend on,
such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and
xfs_inactive_symlink_rmt().

This AG lock ordering was introduced back in 2017 to fix deadlocks
triggered by generic/299 as reported here:

https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/

This code is old enough that it predates us deferring all AG based
extent freeing from within xfs_bunmapi(). That is, we never actually
lock AGs in xfs_bunmapi() any more - every non-rt based extent free
is added to the defer ops list, as is all BMBT block freeing. And RT
extents are not AG based, so there are no lock ordering issues
associated with them.

Hence this AGF lock ordering code is both broken and dead. Let's
just remove it so that the large directory block code works reliably
again.

Tested against xfs/538 and generic/299 which is the original test
that exposed the deadlocks that this code fixed.
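
A minimal userspace sketch of the early exit described above, under
stated assumptions: model_unmap_pass() and its parameters are
hypothetical names made up for illustration, the AG numbers are
invented, and the model ignores the wasdel and realtime cases. It is
not kernel code; it only shows how the removed ordering check could
stop a pass before every extent had been visited.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "No AG seen yet" sentinel, modelled on the kernel's NULLAGNUMBER. */
#define NULLAGNUMBER	((uint32_t)-1)

/*
 * Hypothetical model: return how many extents a single unmap pass visits.
 * agnos[] holds the AG number of each extent in the order the pass reaches
 * them. With enforce_order set, the pass stops as soon as an extent lives
 * in a lower-numbered AG than the previous one, as the removed check did.
 */
static size_t
model_unmap_pass(const uint32_t *agnos, size_t nr, int enforce_order)
{
	uint32_t	prev_agno = NULLAGNUMBER;
	size_t		i;

	assert(agnos != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (enforce_order) {
			if (prev_agno != NULLAGNUMBER && prev_agno > agnos[i])
				break;	/* pass aborted, range not fully unmapped */
			prev_agno = agnos[i];
		}
		/* the extent would be unmapped and its free deferred here */
	}
	return i;
}
```

With extents visited in AG order 3, 5, 2, 7, the ordered pass stops
after two extents while the unordered pass visits all four, which is
the "bunmapi did not finish" condition the callers assert on.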

Fixes: 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
---
 fs/xfs/libxfs/xfs_bmap.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 3f8b6da09261..a3e0e6f672d6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5349,7 +5349,6 @@ __xfs_bunmapi(
 	xfs_fsblock_t		sum;
 	xfs_filblks_t		len = *rlen;	/* length to unmap in file */
 	xfs_fileoff_t		max_len;
-	xfs_agnumber_t		prev_agno = NULLAGNUMBER, agno;
 	xfs_fileoff_t		end;
 	struct xfs_iext_cursor	icur;
 	bool			done = false;
@@ -5441,16 +5440,6 @@ __xfs_bunmapi(
 		del = got;
 		wasdel = isnullstartblock(del.br_startblock);

-		/*
-		 * Make sure we don't touch multiple AGF headers out of order
-		 * in a single transaction, as that could cause AB-BA deadlocks.
-		 */
-		if (!wasdel && !isrt) {
-			agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
-			if (prev_agno != NULLAGNUMBER && prev_agno > agno)
-				break;
-			prev_agno = agno;
-		}
 		if (got.br_startoff < start) {
 			del.br_startoff = start;
 			del.br_blockcount -= start - got.br_startoff;

From patchwork Thu May 27 04:51:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 12283337
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 5DC18C4708E
	for ; Thu, 27 May 2021 04:52:12 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 4300061073
	for ; Thu, 27 May 2021 04:52:12 +0000
	(UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S234363AbhE0Exn (ORCPT ); Thu, 27 May 2021 00:53:43 -0400
Received: from mail105.syd.optusnet.com.au ([211.29.132.249]:47000 "EHLO
	mail105.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S234405AbhE0Exl (ORCPT ); Thu, 27 May 2021 00:53:41 -0400
Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au [49.180.230.185])
	by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 292351043897
	for ; Thu, 27 May 2021 14:52:06 +1000 (AEST)
Received: from discord.disaster.area ([192.168.253.110])
	by dread.disaster.area with esmtp (Exim 4.92.3)
	(envelope-from )
	id 1lm804-005h1A-O9
	for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94)
	(envelope-from )
	id 1lm804-004qgT-GT
	for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation
Date: Thu, 27 May 2021 14:51:59 +1000
Message-Id: <20210527045202.1155628-4-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=Tu+Yewfh c=1 sm=1 tr=0
	a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17
	a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=m78L4NlQDRkRyoOWfzwA:9
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

Ever since we moved to freeing of extents by deferred operations,
we've already freed extents via individual transactions. Hence the
only limitation on how many extents we can mark for freeing in a
single xfs_bunmapi() call is how many deferrals we want to queue.
That is, xfs_bunmapi() doesn't actually do any AG based extent
freeing, so there's no actual transaction reservation used up by
calling bunmapi with a large count of extents to be freed. RT
extents have always been freed directly by bunmapi, but that doesn't
require modification of a large number of blocks as there are no
btrees to split.

Some callers of xfs_bunmapi assume that the extent count being freed
is bound by geometry (e.g. directories) and these can ask bunmapi to
free up to 64 extents in a single call. These functions just work as
they stand, so there's no reason for truncate to have a limit of just
two extents per bunmapi call anymore.

Increase XFS_ITRUNC_MAX_EXTENTS to 64 so that the number of extents
that can be deferred in a single loop matches what the directory
code already uses.

For realtime inodes, where xfs_bunmapi() directly frees extents,
leave the limit at 2 extents per loop as this is all the space that
the transaction reservation will cover.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_inode.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 0369eb22c1bb..db220eaa34b8 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
 /*
  * Used in xfs_itruncate_extents(). This is the maximum number of extents
- * freed from a file in a single transaction.
+ * we will unmap and defer for freeing in a single call to xfs_bunmapi().
+ * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
+ * by transaction reservation size to 2 extents.
  */
-#define	XFS_ITRUNC_MAX_EXTENTS	2
+static inline int
+xfs_itrunc_max_extents(
+	struct xfs_inode	*ip)
+{
+	if (XFS_IS_REALTIME_INODE(ip))
+		return 2;
+	return 64;
+}

 STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
 STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
@@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
 	while (unmap_len > 0) {
 		ASSERT(tp->t_firstblock == NULLFSBLOCK);
 		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
-				flags, XFS_ITRUNC_MAX_EXTENTS);
+				flags, xfs_itrunc_max_extents(ip));
 		if (error)
 			goto out;

From patchwork Thu May 27 04:52:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 12283335
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 83357C4708F
	for ; Thu, 27 May 2021 04:52:12 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 68DD161073
	for ; Thu, 27 May 2021 04:52:12 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S234365AbhE0Exo (ORCPT ); Thu, 27 May 2021 00:53:44 -0400
Received: from mail110.syd.optusnet.com.au ([211.29.132.97]:52463 "EHLO
	mail110.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S234414AbhE0Exl (ORCPT ); Thu, 27 May 2021 00:53:41 -0400
Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au [49.180.230.185])
	by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 3E1A0105F88
	for ; Thu, 27 May 2021
	14:52:06 +1000 (AEST)
Received: from discord.disaster.area ([192.168.253.110])
	by dread.disaster.area with esmtp (Exim 4.92.3)
	(envelope-from )
	id 1lm804-005h1F-P7
	for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94)
	(envelope-from )
	id 1lm804-004qgY-HO
	for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 4/6] xfs: add a free space extent change reservation
Date: Thu, 27 May 2021 14:52:00 +1000
Message-Id: <20210527045202.1155628-5-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=Tu+Yewfh c=1 sm=1 tr=0
	a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17
	a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=rZi3jfzWYBuMxqL7aZQA:9
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

Lots of the transaction reservation code reserves space for an
extent allocation. It is inconsistently implemented, and many of
these reservations get it wrong. Introduce a new function to
calculate the log space reservation for adding or removing an extent
from the free space btrees.

This function reserves space for logging the AGF, the AGFL and the
free space btrees, avoiding the need to account for them separately
in every reservation that manipulates free space.
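
As a rough userspace illustration of the calculation (not the kernel
implementation), the sketch below models the reservation as two
sectors for the AGF and AGFL plus the free space btree blocks for one
extent operation. Everything named model_* is hypothetical, the
128 byte per-buffer overhead is an assumed placeholder, the btree
block count uses the "2 trees * (2 * max depth - 1)" form quoted in
the existing reservation comments, and any other btrees the real
helper may account for are ignored.

```c
#include <assert.h>

/* Assumed per-buffer logging overhead in bytes; a placeholder value. */
#define MODEL_BUF_LOG_OVERHEAD	128u

/* Hypothetical stand-in for the few mount geometry fields used here. */
struct model_geom {
	unsigned int	sectsize;	/* sector size in bytes */
	unsigned int	blocksize;	/* filesystem block size in bytes */
	unsigned int	ag_maxlevels;	/* max depth of a free space btree */
};

/* Log space for nbufs buffers of the given size plus fixed overhead each. */
static unsigned int
model_buf_res(unsigned int nbufs, unsigned int size)
{
	return nbufs * (size + MODEL_BUF_LOG_OVERHEAD);
}

/* Blocks logged for num_ops extent operations across the two btrees. */
static unsigned int
model_allocfree_log_count(const struct model_geom *g, unsigned int num_ops)
{
	assert(g->ag_maxlevels >= 1);
	return num_ops * 2 * (2 * g->ag_maxlevels - 1);
}

/* One extent add/remove: AGF and AGFL sectors plus the btree blocks. */
static unsigned int
model_allocfree_extent_res(const struct model_geom *g)
{
	return model_buf_res(2, g->sectsize) +
	       model_buf_res(model_allocfree_log_count(g, 1), g->blocksize);
}
```

For a made-up geometry of 512 byte sectors, 4096 byte blocks and a
maximum btree depth of 5, this model logs 18 btree blocks and two
sectors per extent operation.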

Signed-off-by: Dave Chinner
Reported-by: kernel test robot
Reported-by: kernel test robot
Reported-by: kernel test robot
---
 fs/xfs/libxfs/xfs_trans_resv.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index d1a0848cb52e..6363cacb790f 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -79,6 +79,23 @@ xfs_allocfree_log_count(
 	return blocks;
 }

+/*
+ * Log reservation required to add or remove a single extent to the free space
+ * btrees. This requires modifying:
+ *
+ * the agf header: 1 sector
+ * the agfl header: 1 sector
+ * the allocation btrees: 2 trees * (max depth - 1) * block size
+ */
+uint
+xfs_allocfree_extent_res(
+	struct xfs_mount *mp)
+{
+	return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+	       xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
+				XFS_FSB_TO_B(mp, 1));
+}
+
 /*
  * Logging inodes is really tricksy. They are logged in memory format,
  * which means that what we write into the log doesn't directly translate into

From patchwork Thu May 27 04:52:01 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 12283327
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id EF105C4707F
	for ; Thu, 27 May 2021 04:52:09 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id CC50C613CC
	for ; Thu, 27 May 2021 04:52:09 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by
	vger.kernel.org via listexpand
	id S234416AbhE0Exk (ORCPT ); Thu, 27 May 2021 00:53:40 -0400
Received: from mail109.syd.optusnet.com.au ([211.29.132.80]:33889 "EHLO
	mail109.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S229579AbhE0Exj (ORCPT ); Thu, 27 May 2021 00:53:39 -0400
Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au [49.180.230.185])
	by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 2985667CED
	for ; Thu, 27 May 2021 14:52:06 +1000 (AEST)
Received: from discord.disaster.area ([192.168.253.110])
	by dread.disaster.area with esmtp (Exim 4.92.3)
	(envelope-from )
	id 1lm804-005h1H-Q9
	for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94)
	(envelope-from )
	id 1lm804-004qgb-IL
	for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 5/6] xfs: factor free space tree transaciton reservations
Date: Thu, 27 May 2021 14:52:01 +1000
Message-Id: <20210527045202.1155628-6-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=F8MpiZpN c=1 sm=1 tr=0
	a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17
	a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=nuMuv_KYr93gYNIyU0IA:9
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

Convert all the open coded free space tree modification reservations
to use the new xfs_allocfree_extent_res() function.
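
To make the effect of the conversion concrete, here is a small
userspace sketch comparing the freeing side of two reservations
before and after, using the buffer counts visible in the diff for
this patch. It is an illustration under assumptions, not kernel code:
the model_* names are hypothetical, the 128 byte per-buffer overhead
is a placeholder, and btree_blocks stands in for whatever block count
the allocfree log count helper returns for a single extent operation
(assumed to scale linearly with the number of operations).

```c
#include <assert.h>

/* Assumed per-buffer logging overhead in bytes; a placeholder value. */
#define MODEL_BUF_LOG_OVERHEAD	128u

/* Log space for nbufs buffers of the given size plus fixed overhead each. */
static unsigned int
model_buf_res(unsigned int nbufs, unsigned int size)
{
	return nbufs * (size + MODEL_BUF_LOG_OVERHEAD);
}

/* Per-extent reservation: AGF + AGFL sectors plus the btree blocks. */
static unsigned int
model_extent_res(unsigned int sect, unsigned int blk, unsigned int btree_blocks)
{
	assert(btree_blocks > 0);
	return model_buf_res(2, sect) + model_buf_res(btree_blocks, blk);
}

/* Link, freeing side: 3 sectors + 1 op before; 1 sector + 1 extent res after. */
static unsigned int
model_link_old(unsigned int sect, unsigned int blk, unsigned int btree_blocks)
{
	return model_buf_res(3, sect) + model_buf_res(btree_blocks, blk);
}

static unsigned int
model_link_new(unsigned int sect, unsigned int blk, unsigned int btree_blocks)
{
	return model_buf_res(1, sect) + model_extent_res(sect, blk, btree_blocks);
}

/* Remove, freeing side: 4 sectors + 2 ops before; 1 sector + 2 extent res after. */
static unsigned int
model_remove_old(unsigned int sect, unsigned int blk, unsigned int btree_blocks)
{
	return model_buf_res(4, sect) + model_buf_res(2 * btree_blocks, blk);
}

static unsigned int
model_remove_new(unsigned int sect, unsigned int blk, unsigned int btree_blocks)
{
	return model_buf_res(1, sect) +
	       model_extent_res(sect, blk, btree_blocks) * 2;
}
```

In this model the link case comes out identical before and after,
while the remove case grows by one sector's worth of reservation,
because two per-extent reservations carry four header sectors where
the open coded version counted three plus the superblock.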

Signed-off-by: Dave Chinner
---
 fs/xfs/libxfs/xfs_trans_resv.c | 122 ++++++++++++---------------------
 1 file changed, 45 insertions(+), 77 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 6363cacb790f..02079f55ef20 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -143,18 +143,16 @@ xfs_calc_inode_res(
  * reservation:
  *
  * the inode btree: max depth * blocksize
- * the allocation btrees: 2 trees * (max depth - 1) * block size
+ * one extent allocfree reservation for the AG.
  *
- * The caller must account for SB and AG header modifications, etc.
  */
 STATIC uint
 xfs_calc_inobt_res(
 	struct xfs_mount	*mp)
 {
 	return xfs_calc_buf_res(M_IGEO(mp)->inobt_maxlevels,
-			XFS_FSB_TO_B(mp, 1)) +
-				xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-			XFS_FSB_TO_B(mp, 1));
+				XFS_FSB_TO_B(mp, 1)) +
+	       xfs_allocfree_extent_res(mp);
 }

 /*
@@ -182,7 +180,7 @@ xfs_calc_finobt_res(
  * Calculate the reservation required to allocate or free an inode chunk. This
  * includes:
  *
- * the allocation btrees: 2 trees * (max depth - 1) * block size
+ * one extent allocfree reservation for the AG.
  * the inode chunk: m_ino_geo.ialloc_blks * N
  *
  * The size N of the inode chunk reservation depends on whether it is for
@@ -200,8 +198,7 @@ xfs_calc_inode_chunk_res(
 {
 	uint			res, size = 0;

-	res = xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-			       XFS_FSB_TO_B(mp, 1));
+	res = xfs_allocfree_extent_res(mp);
 	if (alloc) {
 		/* icreate tx uses ordered buffers */
 		if (xfs_sb_version_has_v3inode(&mp->m_sb))
@@ -256,22 +253,18 @@ xfs_rtalloc_log_count(
  * extents. This gives (t1):
  * the inode getting the new extents: inode size
  * the inode's bmap btree: max depth * block size
- * the agfs of the ags from which the extents are allocated: 2 * sector
  * the superblock free block counter: sector size
- * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ * two extent allocfree reservations for the AG.
  * Or, if we're writing to a realtime file (t2):
  * the inode getting the new extents: inode size
  * the inode's bmap btree: max depth * block size
- * the agfs of the ags from which the extents are allocated: 2 * sector
+ * one extent allocfree reservation for the AG.
  * the superblock free block counter: sector size
  * the realtime bitmap: ((MAXEXTLEN / rtextsize) / NBBY) bytes
  * the realtime summary: 1 block
- * the allocation btrees: 2 trees * (2 * max depth - 1) * block size
  * And the bmap_finish transaction can free bmap blocks in a join (t3):
- * the agfs of the ags containing the blocks: 2 * sector size
- * the agfls of the ags containing the blocks: 2 * sector size
  * the super block free block counter: sector size
- * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ * two extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_write_reservation(
@@ -282,8 +275,8 @@ xfs_calc_write_reservation(
 	t1 = xfs_calc_inode_res(mp, 1) +
 	     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), blksz) +
-	     xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
+	     xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	     xfs_allocfree_extent_res(mp) * 2;

 	if (xfs_sb_version_hasrealtime(&mp->m_sb)) {
 		t2 = xfs_calc_inode_res(mp, 1) +
@@ -291,13 +284,13 @@ xfs_calc_write_reservation(
 				     blksz) +
 		     xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
 		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 1), blksz) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1), blksz);
+		     xfs_allocfree_extent_res(mp);
 	} else {
 		t2 = 0;
 	}

-	t3 = xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
-	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
+	t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	     xfs_allocfree_extent_res(mp) * 2;

 	return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
 }
@@ -307,19 +300,13 @@ xfs_calc_write_reservation(
  * the inode being truncated: inode size
  * the inode's bmap btree: (max depth + 1) * block size
  * And the bmap_finish
transaction can free the blocks and bmap blocks (t2):
- * the agf for each of the ags: 4 * sector size
- * the agfl for each of the ags: 4 * sector size
  * the super block to reflect the freed blocks: sector size
- * worst case split in allocation btrees per extent assuming 4 extents:
- * 4 exts * 2 trees * (2 * max depth - 1) * block size
+ * four extent allocfree reservations for the AG.
  * Or, if it's a realtime file (t3):
- * the agf for each of the ags: 2 * sector size
- * the agfl for each of the ags: 2 * sector size
  * the super block to reflect the freed blocks: sector size
  * the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes
  * the realtime summary: 2 exts * 1 block
- * worst case split in allocation btrees per extent assuming 2 extents:
- * 2 exts * 2 trees * (2 * max depth - 1) * block size
+ * two extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_itruncate_reservation(
@@ -331,13 +318,13 @@ xfs_calc_itruncate_reservation(
 	t1 = xfs_calc_inode_res(mp, 1) +
 	     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1, blksz);

-	t2 = xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
-	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4), blksz);
+	t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	     xfs_allocfree_extent_res(mp) * 4;

 	if (xfs_sb_version_hasrealtime(&mp->m_sb)) {
-		t3 = xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
+		t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
 		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
+		     xfs_allocfree_extent_res(mp) * 2;
 	} else {
 		t3 = 0;
 	}
@@ -352,10 +339,8 @@ xfs_calc_itruncate_reservation(
  * the two directory bmap btrees: 2 * max depth * block size
  * And the bmap_finish transaction can free dir and bmap blocks (two sets
  * of bmap blocks) giving:
- * the agf for the ags in which the blocks live: 3 * sector size
- * the agfl for the ags in which the blocks live: 3 * sector size
  * the superblock for the free block count: sector size
- * the
allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
+ * three extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_rename_reservation(
@@ -365,9 +350,8 @@ xfs_calc_rename_reservation(
 		max((xfs_calc_inode_res(mp, 4) +
 		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 3),
-				      XFS_FSB_TO_B(mp, 1))));
+		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_allocfree_extent_res(mp) * 3));
 }

 /*
@@ -381,20 +365,19 @@ xfs_calc_iunlink_remove_reservation(
 	struct xfs_mount	*mp)
 {
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-	       2 * M_IGEO(mp)->inode_cluster_size;
+	       xfs_calc_buf_res(2, M_IGEO(mp)->inode_cluster_size);
 }

 /*
  * For creating a link to an inode:
+ * the inode is removed from the iunlink list (O_TMPFILE)
  * the parent directory inode: inode size
  * the linked inode: inode size
  * the directory btree could split: (max depth + v2) * dir block size
  * the directory bmap btree could join or split: (max depth + v2) * blocksize
  * And the bmap_finish transaction can free some bmap blocks giving:
- * the agf for the ag in which the blocks live: sector size
- * the agfl for the ag in which the blocks live: sector size
  * the superblock for the free block count: sector size
- * the allocation btrees: 2 trees * (2 * max depth - 1) * block size
+ * one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_link_reservation(
@@ -405,9 +388,8 @@ xfs_calc_link_reservation(
 		max((xfs_calc_inode_res(mp, 2) +
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				      XFS_FSB_TO_B(mp, 1))));
+		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_allocfree_extent_res(mp)));
 }

 /*
@@ -419,20 +401,19 @@ STATIC uint
 xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
 {
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		M_IGEO(mp)->inode_cluster_size;
+		xfs_calc_buf_res(1, M_IGEO(mp)->inode_cluster_size);
 }

 /*
  * For removing a directory entry we can modify:
+ * the inode is added to the agi unlinked list
  * the parent directory inode: inode size
  * the removed inode: inode size
  * the directory btree could join: (max depth + v2) * dir block size
  * the directory bmap btree could join or split: (max depth + v2) * blocksize
  * And the bmap_finish transaction can free the dir and bmap blocks giving:
- * the agf for the ag in which the blocks live: 2 * sector size
- * the agfl for the ag in which the blocks live: 2 * sector size
  * the superblock for the free block count: sector size
- * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ * two extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_remove_reservation(
@@ -443,9 +424,8 @@ xfs_calc_remove_reservation(
 		max((xfs_calc_inode_res(mp, 1) +
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
-				      XFS_FSB_TO_B(mp, 1))));
+		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_allocfree_extent_res(mp) * 2));
 }

 /*
@@ -581,16 +561,14 @@ xfs_calc_ichange_reservation(
 /*
  * Growing the data section of the filesystem.
  * superblock
- * agi and agf
- * allocation btrees
+ * one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_growdata_reservation(
 	struct xfs_mount	*mp)
 {
-	return xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				 XFS_FSB_TO_B(mp, 1));
+	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		xfs_allocfree_extent_res(mp);
 }

 /*
@@ -598,10 +576,9 @@ xfs_calc_growdata_reservation(
  * In the first set of transactions (ALLOC) we allocate space to the
  * bitmap or summary files.
  * superblock: sector size
- * agf of the ag from which the extent is allocated: sector size
  * bmap btree for bitmap/summary inode: max depth * blocksize
  * bitmap/summary inode: inode size
- * allocation btrees for 1 block alloc: 2 * (2 * maxdepth - 1) * blocksize
+ * one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_growrtalloc_reservation(
@@ -611,8 +588,7 @@ xfs_calc_growrtalloc_reservation(
 		xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_inode_res(mp, 1) +
-		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				 XFS_FSB_TO_B(mp, 1));
+		xfs_allocfree_extent_res(mp);
 }

 /*
@@ -675,7 +651,7 @@ xfs_calc_writeid_reservation(
  * agf block and superblock (for block allocation)
  * the new block (directory sized)
  * bmap blocks for the new directory block
- * allocation btrees
+ * one extent allocfree reservation for the AG.
*/ STATIC uint xfs_calc_addafork_reservation( @@ -687,8 +663,7 @@ xfs_calc_addafork_reservation( xfs_calc_buf_res(1, mp->m_dir_geo->blksize) + xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1, XFS_FSB_TO_B(mp, 1)) + - xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1), - XFS_FSB_TO_B(mp, 1)); + xfs_allocfree_extent_res(mp); } /* @@ -696,11 +671,8 @@ xfs_calc_addafork_reservation( * the inode being truncated: inode size * the inode's bmap btree: max depth * block size * And the bmap_finish transaction can free the blocks and bmap blocks: - * the agf for each of the ags: 4 * sector size - * the agfl for each of the ags: 4 * sector size * the super block to reflect the freed blocks: sector size - * worst case split in allocation btrees per extent assuming 4 extents: - * 4 exts * 2 trees * (2 * max depth - 1) * block size + * four extent allocfree reservations for the AG. */ STATIC uint xfs_calc_attrinval_reservation( @@ -709,9 +681,8 @@ xfs_calc_attrinval_reservation( return max((xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK), XFS_FSB_TO_B(mp, 1))), - (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) + - xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4), - XFS_FSB_TO_B(mp, 1)))); + (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + + xfs_allocfree_extent_res(mp) * 4)); } /* @@ -760,10 +731,8 @@ xfs_calc_attrsetrt_reservation( * the attribute btree could join: max depth * block size * the inode bmap btree could join or split: max depth * block size * And the bmap_finish transaction can free the attr blocks freed giving: - * the agf for the ag in which the blocks live: 2 * sector size - * the agfl for the ag in which the blocks live: 2 * sector size * the superblock for the free block count: sector size - * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size + * two extent allocfree reservations for the AG. 
*/ STATIC uint xfs_calc_attrrm_reservation( @@ -776,9 +745,8 @@ xfs_calc_attrrm_reservation( (uint)XFS_FSB_TO_B(mp, XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) + xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)), - (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) + - xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), - XFS_FSB_TO_B(mp, 1)))); + (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + + xfs_allocfree_extent_res(mp) * 2)); } /* From patchwork Thu May 27 04:52:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12283325 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13677C47089 for ; Thu, 27 May 2021 04:52:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E5182613DD for ; Thu, 27 May 2021 04:52:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229579AbhE0Exl (ORCPT ); Thu, 27 May 2021 00:53:41 -0400 Received: from mail104.syd.optusnet.com.au ([211.29.132.246]:48782 "EHLO mail104.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232616AbhE0Exk (ORCPT ); Thu, 27 May 2021 00:53:40 -0400 Received: from dread.disaster.area (pa49-180-230-185.pa.nsw.optusnet.com.au [49.180.230.185]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 44531862700 for ; Thu, 27 May 2021 14:52:06 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 
1lm804-005h1J-RE for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
Received: from dave by discord.disaster.area with local (Exim 4.94) (envelope-from ) id 1lm804-004qge-JX for linux-xfs@vger.kernel.org; Thu, 27 May 2021 14:52:04 +1000
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Subject: [PATCH 6/6] xfs: reduce transaction reservation for freeing extents
Date: Thu, 27 May 2021 14:52:02 +1000
Message-Id: <20210527045202.1155628-7-david@fromorbit.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
References: <20210527045202.1155628-1-david@fromorbit.com>
MIME-Version: 1.0
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.3 cv=F8MpiZpN c=1 sm=1 tr=0 a=dUIOjvib2kB+GiIc1vUx8g==:117 a=dUIOjvib2kB+GiIc1vUx8g==:17 a=5FLXtPjwQuUA:10 a=20KFwNOVAAAA:8 a=7un6xMnLzUUjxZIj67YA:9
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Dave Chinner

Ever since we moved to deferred freeing of extents, we only ever free
one extent per transaction. We separated the bulk unmapping of extents
from the submission of EFI/free/EFD transactions, and hence while we
unmap extents in bulk, we only ever free one per transaction.

Our transaction reservations still date from the era before deferred
freeing of extents, so they still refer to "xfs_bmap_finish" and its
need to free multiple extents per transaction. These freeing
reservations can now all be reduced to a single extent to reflect how
we currently free extents.

This significantly reduces the reservation sizes for operations like
truncate and directory operations, which currently reserve space for
freeing up to 4 extents per transaction.
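
As a rough illustration of the arithmetic behind this change, the
following C sketch models the freeing half of a reservation as one
superblock buffer plus some number of per-extent allocfree
reservations, then takes the larger of that and the metadata
modification half, mirroring the max() pattern used in
xfs_trans_resv.c. The names sketch_free_res() and sketch_resv() and all
byte values are hypothetical; they are not the kernel's
xfs_calc_buf_res() or xfs_allocfree_extent_res(), and real values
depend on the filesystem geometry.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: hypothetical helpers, not kernel code. All sizes
 * are in bytes and are supplied by the caller.
 */

/* Log space for the extent-freeing half of a reservation. */
static inline uint32_t
sketch_free_res(uint32_t sb_res, uint32_t extent_res, uint32_t nr_extents)
{
	/* A transaction that frees extents frees at least one. */
	assert(nr_extents >= 1);

	/* One superblock buffer plus one allocfree reservation per extent. */
	return sb_res + extent_res * nr_extents;
}

/* A reservation is sized for whichever half of the operation is larger. */
static inline uint32_t
sketch_resv(uint32_t modify_res, uint32_t free_res)
{
	return modify_res > free_res ? modify_res : free_res;
}
```

With these made-up inputs, sketch_free_res(512, 1000, 4) is 4512 bytes
and sketch_free_res(512, 1000, 1) is 1512 bytes, so if the modification
half needs 2000 bytes the overall reservation drops from 4512 to 2000.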

For a 4kB block size filesystem with reflink=1,rmapbt=1, the
reservation sizes change like this:

                              Before                After
(index) Reservation       logres  logcount      logres  logcount
      0 write             314104         8      314104         8
      1 itruncate         579456         8      148608         8
      2 rename            435840         2      307936         2
      3 link              191600         2      191600         2
      4 remove            312960         2      174328         2
      5 symlink           470656         3      470656         3
      6 create            469504         2      469504         2
      7 create_tmpfile    490240         2      490240         2
      8 mkdir             469504         3      469504         3
      9 ifree             508664         2      508664         2
     10 ichange             5752         0        5752         0
     11 growdata          147840         2      147840         2
     12 addafork          178936         2      178936         2
     13 writeid              760         0         760         0
     14 attrinval         578688         1      147840         1
     15 attrsetm           26872         3       26872         3
     16 attrsetrt          16896         0       16896         0
     17 attrrm            292224         3      148608         3
     18 clearagi            4224         0        4224         0
     19 growrtalloc       173944         2      173944         2
     20 growrtzero          4224         0        4224         0
     21 growrtfree         10096         0       10096         0
     22 qm_setqlim           232         1         232         1
     23 qm_dqalloc        318327         8      318327         8
     24 qm_quotaoff         4544         1        4544         1
     25 qm_equotaoff         320         1         320         1
     26 sb                  4224         1        4224         1
     27 fsyncts              760         0         760         0
        MAX               579456         8      318327         8

So we can see that several of the reservations have shrunk
substantially: itruncate, rename, remove, attrinval and attrrm are all
much smaller now. The maximum reservation size has gone from itruncate
at 579456*8 bytes to qm_dqalloc at 318327*8 bytes. This is a
substantial improvement for common operations.

Signed-off-by: Dave Chinner
---
 fs/xfs/libxfs/xfs_trans_resv.c | 63 +++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 02079f55ef20..f5e76eeae281 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -232,8 +232,7 @@ xfs_rtalloc_log_count(
  * Various log reservation values.
  *
  * These are based on the size of the file system block because that is what
- * most transactions manipulate. Each adds in an additional 128 bytes per
- * item logged to try to account for the overhead of the transaction mechanism.
+ * most transactions manipulate.
* * Note: Most of the reservations underestimate the number of allocation * groups into which they could free extents in the xfs_defer_finish() call. @@ -262,9 +261,9 @@ xfs_rtalloc_log_count( * the superblock free block counter: sector size * the realtime bitmap: ((MAXEXTLEN / rtextsize) / NBBY) bytes * the realtime summary: 1 block - * And the bmap_finish transaction can free bmap blocks in a join (t3): + * And the deferred freeing can free bmap blocks in a join (t3): * the super block free block counter: sector size - * two extent allocfree reservations for the AG. + * one extent allocfree reservation for the AG. */ STATIC uint xfs_calc_write_reservation( @@ -290,23 +289,25 @@ xfs_calc_write_reservation( } t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + - xfs_allocfree_extent_res(mp) * 2; + xfs_allocfree_extent_res(mp); return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3); } /* - * In truncating a file we free up to two extents at once. We can modify (t1): + * In truncating a file we defer freeing so we only free one extent per + * transaction for normal files. For rt files we limit to 2 extents per + * transaction. + * We can modify (t1): * the inode being truncated: inode size * the inode's bmap btree: (max depth + 1) * block size - * And the bmap_finish transaction can free the blocks and bmap blocks (t2): - * the super block to reflect the freed blocks: sector size - * four extent allocfree reservations for the AG. - * Or, if it's a realtime file (t3): + * Or, if it's a realtime file (t2): * the super block to reflect the freed blocks: sector size * the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes * the realtime summary: 2 exts * 1 block - * two extent allocfree reservations for the AG. + * And the deferred freeing can free the blocks and bmap blocks (t3): + * the super block to reflect the freed blocks: sector size + * one extent allocfree reservation for the AG. 
*/ STATIC uint xfs_calc_itruncate_reservation( @@ -318,17 +319,16 @@ xfs_calc_itruncate_reservation( t1 = xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1, blksz); - t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + - xfs_allocfree_extent_res(mp) * 4; - if (xfs_sb_version_hasrealtime(&mp->m_sb)) { - t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + - xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz) + - xfs_allocfree_extent_res(mp) * 2; + t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + + xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz); } else { - t3 = 0; + t2 = 0; } + t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + + xfs_allocfree_extent_res(mp); + return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3); } @@ -337,10 +337,9 @@ xfs_calc_itruncate_reservation( * the four inodes involved: 4 * inode size * the two directory btrees: 2 * (max depth + v2) * dir block size * the two directory bmap btrees: 2 * max depth * block size - * And the bmap_finish transaction can free dir and bmap blocks (two sets - * of bmap blocks) giving: + * And the deferred freeing can free dir and bmap blocks giving: * the superblock for the free block count: sector size - * three extent allocfree reservations for the AG. + * one extent allocfree reservations for the AG. 
*/ STATIC uint xfs_calc_rename_reservation( @@ -351,7 +350,7 @@ xfs_calc_rename_reservation( xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1))), (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + - xfs_allocfree_extent_res(mp) * 3)); + xfs_allocfree_extent_res(mp))); } /* @@ -375,7 +374,7 @@ xfs_calc_iunlink_remove_reservation( * the linked inode: inode size * the directory btree could split: (max depth + v2) * dir block size * the directory bmap btree could join or split: (max depth + v2) * blocksize - * And the bmap_finish transaction can free some bmap blocks giving: + * And the deferred freeing can free bmap blocks giving: * the superblock for the free block count: sector size * one extent allocfree reservation for the AG. */ @@ -411,9 +410,9 @@ xfs_calc_iunlink_add_reservation(xfs_mount_t *mp) * the removed inode: inode size * the directory btree could join: (max depth + v2) * dir block size * the directory bmap btree could join or split: (max depth + v2) * blocksize - * And the bmap_finish transaction can free the dir and bmap blocks giving: + * And the deferred freeing can free the dir and bmap blocks giving: * the superblock for the free block count: sector size - * two extent allocfree reservations for the AG. + * one extent allocfree reservation for the AG. 
*/ STATIC uint xfs_calc_remove_reservation( @@ -425,7 +424,7 @@ xfs_calc_remove_reservation( xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1))), (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + - xfs_allocfree_extent_res(mp) * 2)); + xfs_allocfree_extent_res(mp))); } /* @@ -670,9 +669,9 @@ xfs_calc_addafork_reservation( * Removing the attribute fork of a file * the inode being truncated: inode size * the inode's bmap btree: max depth * block size - * And the bmap_finish transaction can free the blocks and bmap blocks: + * And the deferred freeing can free the blocks and bmap blocks: * the super block to reflect the freed blocks: sector size - * four extent allocfree reservations for the AG. + * one extent allocfree reservation for the AG. */ STATIC uint xfs_calc_attrinval_reservation( @@ -682,7 +681,7 @@ xfs_calc_attrinval_reservation( xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK), XFS_FSB_TO_B(mp, 1))), (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + - xfs_allocfree_extent_res(mp) * 4)); + xfs_allocfree_extent_res(mp))); } /* @@ -730,9 +729,9 @@ xfs_calc_attrsetrt_reservation( * the inode: inode size * the attribute btree could join: max depth * block size * the inode bmap btree could join or split: max depth * block size - * And the bmap_finish transaction can free the attr blocks freed giving: + * And the deferred freeing can free the attr blocks freed giving: * the superblock for the free block count: sector size - * two extent allocfree reservations for the AG. + * one extent allocfree reservations for the AG. */ STATIC uint xfs_calc_attrrm_reservation( @@ -746,7 +745,7 @@ xfs_calc_attrrm_reservation( XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) + xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)), (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + - xfs_allocfree_extent_res(mp) * 2)); + xfs_allocfree_extent_res(mp))); } /*