From patchwork Thu Nov 16 02:28:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13457546
From: Leah Rumancik
To: stable@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com,
 fred@cloudflare.com, "Darrick J. Wong", Dave Chinner, Leah Rumancik,
 Chandan Babu R
Subject: [PATCH 5.15 10/17] xfs: fix intermittent hang during quotacheck
Date: Wed, 15 Nov 2023 18:28:26 -0800
Message-ID: <20231116022833.121551-10-leah.rumancik@gmail.com>
X-Mailer: git-send-email 2.43.0.rc0.421.g78406f8d94-goog
In-Reply-To: <20231116022833.121551-1-leah.rumancik@gmail.com>
References: <20231116022833.121551-1-leah.rumancik@gmail.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: "Darrick J. Wong"

[ Upstream commit f0c2d7d2abca24d19831c99edea458704fac8087 ]

Every now and then, I see the following hang during mount time
quotacheck when running fstests.  Turning on KASAN seems to make it
happen somewhat more frequently.  I've edited the backtrace for brevity.

XFS (sdd): Quotacheck needed: Please wait.
XFS: Assertion failed: bp->b_flags & _XBF_DELWRI_Q, file: fs/xfs/xfs_buf.c, line: 2411
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1831409 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
CPU: 0 PID: 1831409 Comm: mount Tainted: G W 5.19.0-rc6-xfsx #rc6 09911566947b9f737b036b4af85e399e4b9aef64
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:assfail+0x46/0x4a [xfs]
Code: a0 8f 41 a0 e8 45 fe ff ff 8a 1d 2c 36 10 00 80 fb 01 76 0f 0f b6 f3 48 c7 c7 c0 f0 4f a0 e8 10 f0 02 e1 80 e3 01 74 02 0f 0b <0f> 0b 5b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24
RSP: 0018:ffffc900078c7b30 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880099ac000 RCX: 000000007fffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0418fa0
RBP: ffff8880197bc1c0 R08: 0000000000000000 R09: 000000000000000a
R10: 000000000000000a R11: f000000000000000 R12: ffffc900078c7d20
R13: 00000000fffffff5 R14: ffffc900078c7d20 R15: 0000000000000000
FS: 00007f0449903800(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005610ada631f0 CR3: 0000000014dd8002 CR4: 00000000001706f0
Call Trace:
 xfs_buf_delwri_pushbuf+0x150/0x160 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_flush_one+0xd6/0x130 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_dquot_walk.isra.0+0x109/0x1e0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_quotacheck+0x319/0x490 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_mount_quotas+0x65/0x2c0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_mountfs+0x6b5/0xab0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_fs_fill_super+0x781/0x990 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 get_tree_bdev+0x175/0x280
 vfs_get_tree+0x1a/0x80
 path_mount+0x6f5/0xaa0
 __x64_sys_mount+0x103/0x140
 do_syscall_64+0x2b/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

I /think/ this can happen if xfs_qm_flush_one is racing with
xfs_qm_dquot_isolate (i.e. dquot reclaim) when the second function has
taken the dquot flush lock but xfs_qm_dqflush hasn't yet locked the
dquot buffer, let alone queued it to the delwri list.  In this case,
flush_one will fail to get the dquot flush lock but can still lock the
incore buffer; xfs_buf_delwri_pushbuf will then trip over this ASSERT,
which checks that the buffer is on a delwri list.  The hang results
because _delwri_submit_buffers ignores non-DELWRI_Q buffers, which
means that xfs_buf_iowait waits forever for an IO that has not yet been
scheduled.

AFAICT, a reasonable solution here is to detect a dquot buffer that is
not on a DELWRI list, drop it, and return -EAGAIN to try the flush
again.  It's not /that/ big of a deal if quotacheck writes the dquot
buffer repeatedly before we even set QUOTA_CHKD.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Leah Rumancik
Acked-by: Chandan Babu R
---
 fs/xfs/xfs_qm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 623244650a2f..792736e29a37 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1244,6 +1244,13 @@ xfs_qm_flush_one(
 			error = -EINVAL;
 			goto out_unlock;
 		}
+
+		if (!(bp->b_flags & _XBF_DELWRI_Q)) {
+			error = -EAGAIN;
+			xfs_buf_relse(bp);
+			goto out_unlock;
+		}
+
 		xfs_buf_unlock(bp);
 
 		xfs_buf_delwri_pushbuf(bp, buffer_list);
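
For anyone who wants to reason about the check outside the kernel, here is a rough user-space sketch of the pattern the hunk above adds: if the buffer is not on a delwri queue, return -EAGAIN instead of pushing it, and let the caller try again later. Everything named model_* or MODEL_* is hypothetical and only stands in for the kernel structures. The retry loop is an assumption about how a caller might react to -EAGAIN; it is not a copy of the real dquot walk code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the kernel's _XBF_DELWRI_Q buffer flag. */
#define MODEL_DELWRI_Q 0x1u

/* Hypothetical, heavily reduced model of a buffer. */
struct model_buf {
	unsigned int flags;	/* MODEL_DELWRI_Q set once the buffer is queued */
	int push_count;		/* how many times the buffer was pushed */
};

/*
 * Models the patched step: a buffer that is not yet on a delwri queue
 * is not pushed; the caller is told to retry with -EAGAIN.
 */
static int model_flush_one(struct model_buf *bp)
{
	if (!(bp->flags & MODEL_DELWRI_Q))
		return -EAGAIN;

	bp->push_count++;	/* stands in for the delwri push */
	return 0;
}

/*
 * Assumed retry policy: try up to max_tries times. The buffer becomes
 * queued just before attempt number queued_after, which simulates the
 * racing reclaim path finishing its work.
 */
static int model_flush_with_retry(struct model_buf *bp, int queued_after,
				  int max_tries)
{
	for (int attempt = 0; attempt < max_tries; attempt++) {
		if (attempt == queued_after)
			bp->flags |= MODEL_DELWRI_Q;

		int error = model_flush_one(bp);
		if (error != -EAGAIN)
			return error;
	}
	return -EAGAIN;
}
```

Calling model_flush_with_retry() on a zeroed struct model_buf with queued_after = 2 and max_tries = 5 gets -EAGAIN from the first two attempts and succeeds on the third, with the buffer pushed exactly once. The model never pushes an unqueued buffer, which is the state the ASSERT in the trace above complains about.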