From patchwork Wed May 22 03:17:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13670328 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E6CEE282E5 for ; Wed, 22 May 2024 03:17:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716347850; cv=none; b=IOqPMTiieJP80tJ8GvHSmu0/OthnIYl41OH3G2ujsyA7LONtddK7EQwoMdh46IesvP54KBAPre51ITvxDWasAFYP3iW6Eg6irQu3Mo7Q+ok/N7rIHuvgC/RQrZIU8aY34qxGbN8EydNokOPEcGGssQJx4KKQjvBBa83mHECiV4k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716347850; c=relaxed/simple; bh=nLZ+k+ZeOqIxQEgjz3IEoxQAwMvntLpCHtRoWSA3kmY=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=X/92nekp9CWdi7QDrAJlr/U4f4fmo/5P2feYzCaolrSEoP8dLoFAShxhpDRIFKBEME3SujFyZDPYpMjUSHyazwzgO3GdR/uUhCQ0P+seiu/Hm/nnUu1+ogLRpf2ajo5XmhbTG02hhC4YnCMR4ZNWg/xVpKnPRlPbWFuxYL2XhEo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=o9v7Eqg1; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="o9v7Eqg1" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 76ED7C2BD11; Wed, 22 May 2024 03:17:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1716347848; bh=nLZ+k+ZeOqIxQEgjz3IEoxQAwMvntLpCHtRoWSA3kmY=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=o9v7Eqg1IgjfBpEyQ+fs2zjsoEiMgiABPdQ7nt870nqUFujeDCUBta8klEoEb20Y3 UwsUIWSQpZ5GuToQJB4rP+Y/aHsa1cVmLD7Ws7idZZ3h/Jac/0ovS1o7NKlH2c3eqX 2Ksdiusly9tyD7Vya90Vusc7AHxb03Y6HEAeosYD70GfviROVLhpkidmyE2b1mu5ck n+0C2ScVboyhR21D0XGuya+3BjaxV0OYSeyW5c0suOnxamY95nIp2DKLKAHa68GzTO n9nY6M7H9WYt13SHEFthMx9I+uIh5hCLO5HAsETBTe5C7mNhLgjUX9rqH9VifR6rSl UnAMav+UbFMxw== Date: Tue, 21 May 2024 20:17:27 -0700 Subject: [PATCH 110/111] xfs: shrink failure needs to hold AGI buffer From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: Chandan Babu R , Dave Chinner , Gao Xiang , Christoph Hellwig , Chandan Babu R , linux-xfs@vger.kernel.org Message-ID: <171634533353.2478931.3296707388319648696.stgit@frogsfrogsfrogs> In-Reply-To: <171634531590.2478931.8474978645585392776.stgit@frogsfrogsfrogs> References: <171634531590.2478931.8474978645585392776.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner Source kernel commit: 75bcffbb9e7563259b7aed0fa77459d6a3a35627 Chandan reported a AGI/AGF lock order hang on xfs/168 during recent testing. The cause of the problem was the task running xfs_growfs to shrink the filesystem. A failure occurred trying to remove the free space from the btrees that the shrink would make disappear, and that meant it ran the error handling for a partial failure. This error path involves restoring the per-ag block reservations, and that requires calculating the amount of space needed to be reserved for the free inode btree. The growfs operation hung here: [18679.536829] down+0x71/0xa0 [18679.537657] xfs_buf_lock+0xa4/0x290 [xfs] [18679.538731] xfs_buf_find_lock+0xf7/0x4d0 [xfs] [18679.539920] xfs_buf_lookup.constprop.0+0x289/0x500 [xfs] [18679.542628] xfs_buf_get_map+0x2b3/0xe40 [xfs] [18679.547076] xfs_buf_read_map+0xbb/0x900 [xfs] [18679.562616] xfs_trans_read_buf_map+0x449/0xb10 [xfs] [18679.569778] xfs_read_agi+0x1cd/0x500 [xfs] [18679.573126] xfs_ialloc_read_agi+0xc2/0x5b0 [xfs] [18679.578708] xfs_finobt_calc_reserves+0xe7/0x4d0 [xfs] [18679.582480] xfs_ag_resv_init+0x2c5/0x490 [xfs] [18679.586023] xfs_ag_shrink_space+0x736/0xd30 [xfs] [18679.590730] xfs_growfs_data_private.isra.0+0x55e/0x990 [xfs] [18679.599764] xfs_growfs_data+0x2f1/0x410 [xfs] [18679.602212] xfs_file_ioctl+0xd1e/0x1370 [xfs] trying to get the AGI lock. The AGI lock was held by a fstress task trying to do an inode allocation, and it was waiting on the AGF lock to allocate a new inode chunk on disk. Hence deadlock. The fix for this is for the growfs code to hold the AGI over the transaction roll it does in the error path. It already holds the AGF locked across this, and that is what causes the lock order inversion in the xfs_ag_resv_init() call. Reported-by: Chandan Babu R Fixes: 46141dc891f7 ("xfs: introduce xfs_ag_shrink_space()") Signed-off-by: Dave Chinner Reviewed-by: Gao Xiang Reviewed-by: Christoph Hellwig Signed-off-by: Chandan Babu R --- libxfs/xfs_ag.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c index e2fc3e882..a9aae0990 100644 --- a/libxfs/xfs_ag.c +++ b/libxfs/xfs_ag.c @@ -973,14 +973,23 @@ xfs_ag_shrink_space( if (error) { /* - * if extent allocation fails, need to roll the transaction to + * If extent allocation fails, need to roll the transaction to * ensure that the AGFL fixup has been committed anyway. + * + * We need to hold the AGF across the roll to ensure nothing can + * access the AG for allocation until the shrink is fully + * cleaned up. And due to the resetting of the AG block + * reservation space needing to lock the AGI, we also have to + * hold that so we don't get AGI/AGF lock order inversions in + * the error handling path. */ xfs_trans_bhold(*tpp, agfbp); + xfs_trans_bhold(*tpp, agibp); err2 = xfs_trans_roll(tpp); if (err2) return err2; xfs_trans_bjoin(*tpp, agfbp); + xfs_trans_bjoin(*tpp, agibp); goto resv_init_out; }