From patchwork Tue Nov 14 01:53:31 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13454711
From: Leah Rumancik
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, chandan.babu@oracle.com, fred@cloudflare.com, "Darrick J. Wong", Dave Chinner, Leah Rumancik
Subject: [PATCH 5.15 CANDIDATE 10/17] xfs: fix intermittent hang during quotacheck
Date: Mon, 13 Nov 2023 17:53:31 -0800
Message-ID: <20231114015339.3922119-11-leah.rumancik@gmail.com>
X-Mailer: git-send-email 2.43.0.rc0.421.g78406f8d94-goog
In-Reply-To: <20231114015339.3922119-1-leah.rumancik@gmail.com>
References: <20231114015339.3922119-1-leah.rumancik@gmail.com>
X-Mailing-List: linux-xfs@vger.kernel.org

From: "Darrick J. Wong"

[ Upstream commit f0c2d7d2abca24d19831c99edea458704fac8087 ]

Every now and then, I see the following hang during mount time
quotacheck when running fstests. Turning on KASAN seems to make it
happen somewhat more frequently. I've edited the backtrace for brevity.

XFS (sdd): Quotacheck needed: Please wait.
XFS: Assertion failed: bp->b_flags & _XBF_DELWRI_Q, file: fs/xfs/xfs_buf.c, line: 2411
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1831409 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
CPU: 0 PID: 1831409 Comm: mount Tainted: G W 5.19.0-rc6-xfsx #rc6 09911566947b9f737b036b4af85e399e4b9aef64
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:assfail+0x46/0x4a [xfs]
Code: a0 8f 41 a0 e8 45 fe ff ff 8a 1d 2c 36 10 00 80 fb 01 76 0f 0f b6 f3 48 c7 c7 c0 f0 4f a0 e8 10 f0 02 e1 80 e3 01 74 02 0f 0b <0f> 0b 5b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24
RSP: 0018:ffffc900078c7b30 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880099ac000 RCX: 000000007fffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0418fa0
RBP: ffff8880197bc1c0 R08: 0000000000000000 R09: 000000000000000a
R10: 000000000000000a R11: f000000000000000 R12: ffffc900078c7d20
R13: 00000000fffffff5 R14: ffffc900078c7d20 R15: 0000000000000000
FS: 00007f0449903800(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005610ada631f0 CR3: 0000000014dd8002 CR4: 00000000001706f0
Call Trace:
 xfs_buf_delwri_pushbuf+0x150/0x160 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_flush_one+0xd6/0x130 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_dquot_walk.isra.0+0x109/0x1e0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_quotacheck+0x319/0x490 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_mount_quotas+0x65/0x2c0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_mountfs+0x6b5/0xab0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_fs_fill_super+0x781/0x990 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 get_tree_bdev+0x175/0x280
 vfs_get_tree+0x1a/0x80
 path_mount+0x6f5/0xaa0
 __x64_sys_mount+0x103/0x140
 do_syscall_64+0x2b/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

I /think/ this can happen if xfs_qm_flush_one is racing with
xfs_qm_dquot_isolate (i.e. dquot reclaim) when the second function has
taken the dquot flush lock but xfs_qm_dqflush hasn't yet locked the
dquot buffer, let alone queued it to the delwri list. In this case,
flush_one will fail to get the dquot flush lock but can still lock the
incore buffer, and xfs_buf_delwri_pushbuf will then trip over this
ASSERT, which checks that the buffer is on a delwri list. The hang
results because _delwri_submit_buffers ignores non-DELWRI_Q buffers,
which means that xfs_buf_iowait waits forever for an IO that has not
yet been scheduled.

AFAICT, a reasonable solution here is to detect a dquot buffer that is
not on a DELWRI list, drop it, and return -EAGAIN to try the flush
again. It's not /that/ big of a deal if quotacheck writes the dquot
buffer repeatedly before we even set QUOTA_CHKD.

Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Leah Rumancik
---
 fs/xfs/xfs_qm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 623244650a2f..792736e29a37 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1244,6 +1244,13 @@ xfs_qm_flush_one(
 			error = -EINVAL;
 			goto out_unlock;
 		}
+
+		if (!(bp->b_flags & _XBF_DELWRI_Q)) {
+			error = -EAGAIN;
+			xfs_buf_relse(bp);
+			goto out_unlock;
+		}
+
 		xfs_buf_unlock(bp);
 
 		xfs_buf_delwri_pushbuf(bp, buffer_list);
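
For readers who want to poke at the idea outside the kernel, here is a
rough userspace sketch of the "not queued yet, back off with -EAGAIN and
retry" pattern that the hunk above adds. Everything in it is a
hypothetical stand-in (struct model_buf, MODEL_DELWRI_Q, model_pushbuf,
model_flush_one, model_flush_with_retry); none of it is real XFS code,
and it simplifies the contended path to "return -EAGAIN if the buffer is
not queued, otherwise push it and return 0". The queue_after parameter
simply pretends that the racing reclaim thread queues the buffer after
that many attempts.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for _XBF_DELWRI_Q; not the kernel's value. */
#define MODEL_DELWRI_Q 0x1u

/* Hypothetical stand-in for a dquot buffer; not struct xfs_buf. */
struct model_buf {
	unsigned int flags;	/* MODEL_DELWRI_Q set once queued for delayed write */
	int writes;		/* how many times the buffer was pushed out */
};

/* Models the pushbuf precondition: only a queued buffer may be pushed. */
void model_pushbuf(struct model_buf *bp)
{
	assert(bp->flags & MODEL_DELWRI_Q);
	bp->flags &= ~MODEL_DELWRI_Q;
	bp->writes++;
}

/*
 * Models the contended flush path with the new check: if the buffer has
 * not been queued yet, back off instead of tripping the assertion.
 */
int model_flush_one(struct model_buf *bp)
{
	if (!(bp->flags & MODEL_DELWRI_Q))
		return -EAGAIN;
	model_pushbuf(bp);
	return 0;
}

/*
 * Hypothetical caller: retry while the flush reports -EAGAIN. Returns the
 * number of attempts used, or -EAGAIN if max_tries ran out first.
 */
int model_flush_with_retry(struct model_buf *bp, int queue_after, int max_tries)
{
	for (int attempt = 0; attempt < max_tries; attempt++) {
		/* Simulate the racing thread finally queueing the buffer. */
		if (attempt == queue_after)
			bp->flags |= MODEL_DELWRI_Q;
		if (model_flush_one(bp) != -EAGAIN)
			return attempt + 1;
	}
	return -EAGAIN;
}
```

Calling model_flush_with_retry() on a zeroed struct model_buf with
queue_after set to 2 backs off twice and pushes the buffer on the third
attempt. Without the flag check in model_flush_one(), the first call
would hit the assert in model_pushbuf(), which is the userspace analogue
of the assertion failure in the backtrace.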