From patchwork Thu Sep 5 18:21:18 2024
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13792786
From: Leah Rumancik
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner,
    syzbot+912776840162c13db1a3@syzkaller.appspotmail.com,
    "Darrick J. Wong", Leah Rumancik
Subject: [PATCH 6.1 CANDIDATE 01/26] xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
Date: Thu, 5 Sep 2024 11:21:18 -0700
Message-ID: <20240905182144.2691920-2-leah.rumancik@gmail.com>
In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com>
References: <20240905182144.2691920-1-leah.rumancik@gmail.com>

From: Dave Chinner

[ Upstream commit 52f31ed228212ba572c44e15e818a3a5c74122c0 ]

Resulting in a UAF if the shrinker races with some other dquot freeing
mechanism that sets XFS_DQFLAG_FREEING before the dquot is removed from
the LRU. This can occur if a dquot purge races with drop_caches.

Reported-by: syzbot+912776840162c13db1a3@syzkaller.appspotmail.com
Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Leah Rumancik
---
 fs/xfs/xfs_qm.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 18bb4ec4d7c9..ff53d40a2dae 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -422,6 +422,14 @@ xfs_qm_dquot_isolate(
 	if (!xfs_dqlock_nowait(dqp))
 		goto out_miss_busy;
 
+	/*
+	 * If something else is freeing this dquot and hasn't yet removed it
+	 * from the LRU, leave it for the freeing task to complete the freeing
+	 * process rather than risk it being free from under us here.
+	 */
+	if (dqp->q_flags & XFS_DQFLAG_FREEING)
+		goto out_miss_unlock;
+
 	/*
	 * This dquot has acquired a reference in the meantime remove it from
	 * the freelist and try again.
@@ -441,10 +449,8 @@ xfs_qm_dquot_isolate(
	 * skip it so there is time for the IO to complete before we try to
	 * reclaim it again on the next LRU pass.
	 */
-	if (!xfs_dqflock_nowait(dqp)) {
-		xfs_dqunlock(dqp);
-		goto out_miss_busy;
-	}
+	if (!xfs_dqflock_nowait(dqp))
+		goto out_miss_unlock;
 
 	if (XFS_DQ_IS_DIRTY(dqp)) {
 		struct xfs_buf	*bp = NULL;
@@ -478,6 +484,8 @@ xfs_qm_dquot_isolate(
 	XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaims);
 	return LRU_REMOVED;
 
+out_miss_unlock:
+	xfs_dqunlock(dqp);
 out_miss_busy:
 	trace_xfs_dqreclaim_busy(dqp);
 	XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses);
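The change above encodes a simple ordering rule for LRU isolate callbacks:
after taking the object lock without blocking, re-check whether another task
has already marked the object for freeing, and back off if so. Below is a
minimal user-space sketch of that rule; the types and names are hypothetical
stand-ins, not the XFS implementation.

#include <pthread.h>

enum lru_status { LRU_REMOVED, LRU_SKIP };

/* Hypothetical stand-in for a dquot sitting on an LRU list. */
struct object {
	pthread_mutex_t	lock;
	unsigned int	flags;
#define OBJ_FREEING	(1u << 0)	/* another task is freeing this object */
};

/*
 * Shrinker-side isolate callback: trylock the object, then check the
 * "freeing" flag under the lock. Reclaiming an object that is already
 * being torn down elsewhere would be a use-after-free, so skip it and
 * let the freeing task finish.
 */
static enum lru_status isolate(struct object *obj)
{
	if (pthread_mutex_trylock(&obj->lock) != 0)
		return LRU_SKIP;		/* busy, try again next pass */

	if (obj->flags & OBJ_FREEING) {
		pthread_mutex_unlock(&obj->lock);
		return LRU_SKIP;		/* leave it to the freeing task */
	}

	/* ... safe to detach from the LRU and reclaim the object here ... */
	pthread_mutex_unlock(&obj->lock);
	return LRU_REMOVED;
}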
From patchwork Thu Sep 5 18:21:19 2024
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13792787

From: Leah Rumancik
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, chandan.babu@oracle.com, Wu Guanghao,
    "Darrick J. Wong", Leah Rumancik
Subject: [PATCH 6.1 CANDIDATE 02/26] xfs: Fix deadlock on xfs_inodegc_worker
Date: Thu, 5 Sep 2024 11:21:19 -0700
Message-ID: <20240905182144.2691920-3-leah.rumancik@gmail.com>
In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com>
References: <20240905182144.2691920-1-leah.rumancik@gmail.com>

From: Wu Guanghao

[ Upstream commit 4da112513c01d7d0acf1025b8764349d46e177d6 ]

While testing deletion of a large number of files under low memory, we
hit the following deadlock:

[ 1240.279183] -> #1 (fs_reclaim){+.+.}-{0:0}:
[ 1240.280450]        lock_acquire+0x197/0x460
[ 1240.281548]        fs_reclaim_acquire.part.0+0x20/0x30
[ 1240.282625]        kmem_cache_alloc+0x2b/0x940
[ 1240.283816]        xfs_trans_alloc+0x8a/0x8b0
[ 1240.284757]        xfs_inactive_ifree+0xe4/0x4e0
[ 1240.285935]        xfs_inactive+0x4e9/0x8a0
[ 1240.286836]        xfs_inodegc_worker+0x160/0x5e0
[ 1240.287969]        process_one_work+0xa19/0x16b0
[ 1240.289030]        worker_thread+0x9e/0x1050
[ 1240.290131]        kthread+0x34f/0x460
[ 1240.290999]        ret_from_fork+0x22/0x30
[ 1240.291905]
[ 1240.291905] -> #0 ((work_completion)(&gc->work)){+.+.}-{0:0}:
[ 1240.293569]        check_prev_add+0x160/0x2490
[ 1240.294473]        __lock_acquire+0x2c4d/0x5160
[ 1240.295544]        lock_acquire+0x197/0x460
[ 1240.296403]        __flush_work+0x6bc/0xa20
[ 1240.297522]        xfs_inode_mark_reclaimable+0x6f0/0xdc0
[ 1240.298649]        destroy_inode+0xc6/0x1b0
[ 1240.299677]        dispose_list+0xe1/0x1d0
[ 1240.300567]        prune_icache_sb+0xec/0x150
[ 1240.301794]        super_cache_scan+0x2c9/0x480
[ 1240.302776]        do_shrink_slab+0x3f0/0xaa0
[ 1240.303671]        shrink_slab+0x170/0x660
[ 1240.304601]        shrink_node+0x7f7/0x1df0
[ 1240.305515]        balance_pgdat+0x766/0xf50
[ 1240.306657]        kswapd+0x5bd/0xd20
[ 1240.307551]        kthread+0x34f/0x460
[ 1240.308346]        ret_from_fork+0x22/0x30
[ 1240.309247]
[ 1240.309247] other info that might help us debug this:
[ 1240.309247]
[ 1240.310944]  Possible unsafe locking scenario:
[ 1240.310944]
[ 1240.312379]        CPU0                    CPU1
[ 1240.313363]        ----                    ----
[ 1240.314433]   lock(fs_reclaim);
[ 1240.315107]                                lock((work_completion)(&gc->work));
[ 1240.316828]                                lock(fs_reclaim);
[ 1240.318088]   lock((work_completion)(&gc->work));
[ 1240.319203]
[ 1240.319203]  *** DEADLOCK ***
...
[ 2438.431081] Workqueue: xfs-inodegc/sda xfs_inodegc_worker
[ 2438.432089] Call Trace:
[ 2438.432562]  __schedule+0xa94/0x1d20
[ 2438.435787]  schedule+0xbf/0x270
[ 2438.436397]  schedule_timeout+0x6f8/0x8b0
[ 2438.445126]  wait_for_completion+0x163/0x260
[ 2438.448610]  __flush_work+0x4c4/0xa40
[ 2438.455011]  xfs_inode_mark_reclaimable+0x6ef/0xda0
[ 2438.456695]  destroy_inode+0xc6/0x1b0
[ 2438.457375]  dispose_list+0xe1/0x1d0
[ 2438.458834]  prune_icache_sb+0xe8/0x150
[ 2438.461181]  super_cache_scan+0x2b3/0x470
[ 2438.461950]  do_shrink_slab+0x3cf/0xa50
[ 2438.462687]  shrink_slab+0x17d/0x660
[ 2438.466392]  shrink_node+0x87e/0x1d40
[ 2438.467894]  do_try_to_free_pages+0x364/0x1300
[ 2438.471188]  try_to_free_pages+0x26c/0x5b0
[ 2438.473567]  __alloc_pages_slowpath.constprop.136+0x7aa/0x2100
[ 2438.482577]  __alloc_pages+0x5db/0x710
[ 2438.485231]  alloc_pages+0x100/0x200
[ 2438.485923]  allocate_slab+0x2c0/0x380
[ 2438.486623]  ___slab_alloc+0x41f/0x690
[ 2438.490254]  __slab_alloc+0x54/0x70
[ 2438.491692]  kmem_cache_alloc+0x23e/0x270
[ 2438.492437]  xfs_trans_alloc+0x88/0x880
[ 2438.493168]  xfs_inactive_ifree+0xe2/0x4e0
[ 2438.496419]  xfs_inactive+0x4eb/0x8b0
[ 2438.497123]  xfs_inodegc_worker+0x16b/0x5e0
[ 2438.497918]  process_one_work+0xbf7/0x1a20
[ 2438.500316]  worker_thread+0x8c/0x1060
[ 2438.504938]  ret_from_fork+0x22/0x30

When memory is insufficient, xfs_inodegc_worker triggers memory reclaim
when it allocates memory; reclaim may then call flush_work() and wait for
the same inodegc work to complete, which deadlocks. So use
memalloc_nofs_save() to avoid triggering memory reclamation in
xfs_inodegc_worker.

Signed-off-by: Wu Guanghao
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Leah Rumancik
---
 fs/xfs/xfs_icache.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index dd5a664c294f..f5568fa54039 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1858,6 +1858,7 @@ xfs_inodegc_worker(
 			struct xfs_inodegc, work);
 	struct llist_node	*node = llist_del_all(&gc->list);
 	struct xfs_inode	*ip, *n;
+	unsigned int		nofs_flag;
 
 	ASSERT(gc->cpu == smp_processor_id());
 
@@ -1866,6 +1867,13 @@ xfs_inodegc_worker(
 	if (!node)
 		return;
 
+	/*
+	 * We can allocate memory here while doing writeback on behalf of
+	 * memory reclaim. To avoid memory allocation deadlocks set the
+	 * task-wide nofs context for the following operations.
+	 */
+	nofs_flag = memalloc_nofs_save();
+
 	ip = llist_entry(node, struct xfs_inode, i_gclist);
 	trace_xfs_inodegc_worker(ip->i_mount, READ_ONCE(gc->shrinker_hits));
 
@@ -1874,6 +1882,8 @@ xfs_inodegc_worker(
 		xfs_iflags_set(ip, XFS_INACTIVATING);
 		xfs_inodegc_inactivate(ip);
 	}
+
+	memalloc_nofs_restore(nofs_flag);
 }
 
 /*
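The pattern being applied here is the standard scoped-NOFS bracket from
<linux/sched/mm.h>: any allocation made between save and restore implicitly
behaves as GFP_NOFS, so reclaim cannot recurse back into the filesystem and
flush the very work item that is currently running. A sketch of the shape,
using a hypothetical worker rather than the real xfs_inodegc_worker body:

#include <linux/sched/mm.h>
#include <linux/workqueue.h>

/* Hypothetical worker, used only to illustrate the save/restore bracket. */
static void example_gc_worker(struct work_struct *work)
{
	unsigned int nofs_flag;

	/* Everything allocated below is treated as GFP_NOFS. */
	nofs_flag = memalloc_nofs_save();

	/* ... perform inactivation work that may allocate memory ... */

	memalloc_nofs_restore(nofs_flag);
}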
From patchwork Thu Sep 5 18:21:20 2024
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13792788

From: Leah Rumancik
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, chandan.babu@oracle.com, Wengang Wang,
    "Darrick J. Wong", Leah Rumancik
Subject: [PATCH 6.1 CANDIDATE 03/26] xfs: fix extent busy updating
Date: Thu, 5 Sep 2024 11:21:20 -0700
Message-ID: <20240905182144.2691920-4-leah.rumancik@gmail.com>
In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com>
References: <20240905182144.2691920-1-leah.rumancik@gmail.com>

From: Wengang Wang

[ Upstream commit 601a27ea09a317d0fe2895df7d875381fb393041 ]

In xfs_extent_busy_update_extent() cases 6 and 7, whenever bno is
modified on the busy extent, the relevant length has to be modified
accordingly.

Signed-off-by: Wengang Wang
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Leah Rumancik
---
 fs/xfs/xfs_extent_busy.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
index ad22a003f959..f3d328e4a440 100644
--- a/fs/xfs/xfs_extent_busy.c
+++ b/fs/xfs/xfs_extent_busy.c
@@ -236,6 +236,7 @@ xfs_extent_busy_update_extent(
 		 *
 		 */
 		busyp->bno = fend;
+		busyp->length = bend - fend;
 	} else if (bbno < fbno) {
 		/*
 		 * Case 8:
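The one-line fix is easiest to see as interval arithmetic: when the freed
range covers the front of a busy extent, moving bno forward to fend must
shrink length so that the extent's end stays put. A small, self-contained
sketch with a hypothetical struct (not the XFS one):

#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified busy-extent record, for illustration only. */
struct busy_extent {
	uint64_t bno;		/* start block */
	uint64_t length;	/* length in blocks */
};

/*
 * The freed range [fbno, fend) overlaps the front of the busy extent.
 * Moving the start forward to fend must also shrink the length, i.e.
 * length becomes the old end (bend) minus the new start (fend).
 */
static void trim_front(struct busy_extent *busyp, uint64_t fend)
{
	uint64_t bend = busyp->bno + busyp->length;

	assert(fend > busyp->bno && fend < bend);
	busyp->bno = fend;
	busyp->length = bend - fend;	/* the adjustment the patch adds */
}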
From patchwork Thu Sep 5 18:21:21 2024
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13792789

From: Leah Rumancik
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner,
    "Darrick J. Wong", Leah Rumancik
Subject: [PATCH 6.1 CANDIDATE 04/26] xfs: don't use BMBT btree split workers for IO completion
Date: Thu, 5 Sep 2024 11:21:21 -0700
Message-ID: <20240905182144.2691920-5-leah.rumancik@gmail.com>
In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com>
References: <20240905182144.2691920-1-leah.rumancik@gmail.com>

From: Dave Chinner

[ Upstream commit c85007e2e3942da1f9361e4b5a9388ea3a8dcc5b ]

When we split a BMBT due to record insertion, we offload it to a worker
thread because we can be deep in the stack when we try to allocate a new
block for the BMBT.
Allocation can use several kilobytes of stack (full memory reclaim, swap
and/or IO path can end up on the stack during allocation) and we can
already be several kilobytes deep in the stack when we need to split the
BMBT.

A recent workload demonstrated a deadlock in this BMBT split offload. It
requires several things to happen at once:

1. two inodes need a BMBT split at the same time, one must be unwritten
   extent conversion from IO completion, the other must be from extent
   allocation.

2. there must be no available xfs_alloc_wq worker threads in the worker
   pool.

3. There must be sustained severe memory shortages such that new kworker
   threads cannot be allocated to the xfs_alloc_wq pool for both threads
   that need split work to be run.

4. The split work from the unwritten extent conversion must run first.

5. when the BMBT block allocation runs from the split work, it must loop
   over all AGs and not be able to either trylock an AGF successfully, or
   each AGF it is able to lock has no space available for a single block
   allocation.

6. The BMBT allocation must then attempt to lock the AGF that the second
   task queued to the rescuer thread already has locked before it finds an
   AGF it can allocate from.

At this point, we have an ABBA deadlock between tasks queued on the
xfs_alloc_wq rescuer thread and a locked AGF. i.e. The queued task holding
the AGF lock can't be run by the rescuer thread until the task the rescuer
thread is running gets the AGF lock....

This is a highly improbable series of events, but there it is.

There's a couple of ways to fix this, but the easiest way is to ensure
that we only punt tasks with a locked AGF that holds enough space for the
BMBT block allocations to the worker thread.

This works for unwritten extent conversion in IO completion (which doesn't
have a locked AGF and space reservations) because we have tight control
over the IO completion stack. It is typically only 6 functions deep when
xfs_btree_split() is called because we've already offloaded the IO
completion work to a worker thread and hence we don't need to worry about
stack overruns here.

The other place we can be called for a BMBT split without a preceding
allocation is __xfs_bunmapi() when punching out the center of an existing
extent. We don't remove extents in the IO path, so these operations don't
tend to be called with a lot of stack consumed. Hence we don't really need
to ship the split off to a worker thread in these cases, either.

Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
Signed-off-by: Leah Rumancik
---
 fs/xfs/libxfs/xfs_btree.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 4c16c8c31fcb..6b084b3cac83 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -2913,9 +2913,22 @@ xfs_btree_split_worker(
 }
 
 /*
- * BMBT split requests often come in with little stack to work on. Push
+ * BMBT split requests often come in with little stack to work on so we push
  * them off to a worker thread so there is lots of stack to use. For the other
  * btree types, just call directly to avoid the context switch overhead here.
+ *
+ * Care must be taken here - the work queue rescuer thread introduces potential
+ * AGF <> worker queue deadlocks if the BMBT block allocation has to lock new
+ * AGFs to allocate blocks. A task being run by the rescuer could attempt to
+ * lock an AGF that is already locked by a task queued to run by the rescuer,
+ * resulting in an ABBA deadlock as the rescuer cannot run the lock holder to
+ * release it until the current thread it is running gains the lock.
+ *
+ * To avoid this issue, we only ever queue BMBT splits that don't have an AGF
+ * already locked to allocate from. The only place that doesn't hold an AGF
+ * locked is unwritten extent conversion at IO completion, but that has already
+ * been offloaded to a worker thread and hence has no stack consumption issues
+ * we have to worry about.
 */
 STATIC int					/* error */
 xfs_btree_split(
@@ -2929,7 +2942,8 @@ xfs_btree_split(
 	struct xfs_btree_split_args	args;
 	DECLARE_COMPLETION_ONSTACK(done);
 
-	if (cur->bc_btnum != XFS_BTNUM_BMAP)
+	if (cur->bc_btnum != XFS_BTNUM_BMAP ||
+	    cur->bc_tp->t_firstblock == NULLFSBLOCK)
 		return __xfs_btree_split(cur, level, ptrp, key, curp, stat);
 
 	args.cur = cur;
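Read together with the commit message, the new condition in xfs_btree_split()
acts as a small gate: only BMBT splits whose transaction has already allocated
a data extent (and therefore holds an AGF with reserved space) are punted to
the worker; everything else runs inline on the caller's stack. A hypothetical
distillation of that decision, with made-up types rather than the kernel code:

#include <stdbool.h>

/* Hypothetical split context, for illustration only. */
struct split_ctx {
	bool	is_bmbt;		/* splitting the inode extent btree? */
	bool	have_first_block;	/* data extent already allocated? */
};

static bool should_offload_split(const struct split_ctx *ctx)
{
	if (!ctx->is_bmbt)
		return false;	/* other btrees: call directly */
	if (!ctx->have_first_block)
		return false;	/* no AGF held (e.g. IO completion): run inline */
	return true;		/* deep stack and AGF already held: offload */
}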
From patchwork Thu Sep 5 18:21:22 2024
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13792790

From: Leah Rumancik
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner,
    Allison Henderson, "Darrick J. Wong", Leah Rumancik
Subject: [PATCH 6.1 CANDIDATE 05/26] xfs: fix low space alloc deadlock
Date: Thu, 5 Sep 2024 11:21:22 -0700
Message-ID: <20240905182144.2691920-6-leah.rumancik@gmail.com>
In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com>
References: <20240905182144.2691920-1-leah.rumancik@gmail.com>

From: Dave Chinner

[ Upstream commit 1dd0510f6d4b85616a36aabb9be38389467122d9 ]

I've recently encountered an ABBA deadlock with g/476. The upcoming
changes seem to make this much easier to hit, but the underlying problem
is a pre-existing one.

Essentially, if we select an AG for allocation, then lock the AGF and
then fail to allocate for some reason (e.g. minimum length requirements
cannot be satisfied), then we drop out of the allocation with the AGF
still locked.

The caller then modifies the allocation constraints - usually loosening
them up - and tries again. This can result in trying to access AGFs that
are lower than the AGF we already have locked from the failed attempt.
e.g. the failed attempt skipped several AGs before failing, so we hold a
lock on an AG higher than the start AG. Retrying the allocation from the
start AG then causes us to violate AGF lock ordering and this can lead to
deadlocks.

The deadlock exists even if allocation succeeds - we can do followup
allocations in the same transaction for BMBT blocks that aren't
guaranteed to be in the same AG as the original, and can move into higher
AGs. Hence we really need to move the tp->t_firstblock tracking down into
xfs_alloc_vextent() where it can be set when we exit with a locked AG.

xfs_alloc_vextent() can also check there if the requested allocation
falls within the allowed range of AGs set by tp->t_firstblock. If we
can't allocate within the range set, we have to fail the allocation.
If we are allowed to do non-blocking AGF locking, we can ignore the AG
locking order limitations as we can use try-locks for the first iteration
over the requested AG range.

This invalidates a set of post allocation asserts that check that the
allocation is always above tp->t_firstblock if it is set. Because we can
use try-locks to avoid the deadlock in some circumstances, having a
pre-existing locked AGF doesn't always prevent allocation from lower
order AGFs. Hence those ASSERTs need to be removed.

Signed-off-by: Dave Chinner
Reviewed-by: Allison Henderson
Reviewed-by: Darrick J. Wong
Signed-off-by: Leah Rumancik
---
 fs/xfs/libxfs/xfs_alloc.c | 69 ++++++++++++++++++++++++++++++++-------
 fs/xfs/libxfs/xfs_bmap.c  | 14 --------
 fs/xfs/xfs_trace.h        |  1 +
 3 files changed, 58 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index de79f5d07f65..8bb024b06b95 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3164,10 +3164,13 @@ xfs_alloc_vextent(
 	xfs_alloctype_t	type;	/* input allocation type */
 	int		bump_rotor = 0;
 	xfs_agnumber_t	rotorstep = xfs_rotorstep; /* inode32 agf stepper */
+	xfs_agnumber_t	minimum_agno = 0;
 
 	mp = args->mp;
 	type = args->otype = args->type;
 	args->agbno = NULLAGBLOCK;
+	if (args->tp->t_firstblock != NULLFSBLOCK)
+		minimum_agno = XFS_FSB_TO_AGNO(mp, args->tp->t_firstblock);
 	/*
 	 * Just fix this up, for the case where the last a.g. is shorter
 	 * (or there's only one a.g.) and the caller couldn't easily figure
@@ -3201,6 +3204,13 @@ xfs_alloc_vextent(
 		 */
 		args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
 		args->pag = xfs_perag_get(mp, args->agno);
+
+		if (minimum_agno > args->agno) {
+			trace_xfs_alloc_vextent_skip_deadlock(args);
+			error = 0;
+			break;
+		}
+
 		error = xfs_alloc_fix_freelist(args, 0);
 		if (error) {
 			trace_xfs_alloc_vextent_nofix(args);
@@ -3232,6 +3242,8 @@ xfs_alloc_vextent(
 	case XFS_ALLOCTYPE_FIRST_AG:
 		/*
 		 * Rotate through the allocation groups looking for a winner.
+		 * If we are blocking, we must obey minimum_agno contraints for
+		 * avoiding ABBA deadlocks on AGF locking.
 		 */
 		if (type == XFS_ALLOCTYPE_FIRST_AG) {
 			/*
@@ -3239,7 +3251,7 @@
 			 */
 			args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
 			args->type = XFS_ALLOCTYPE_THIS_AG;
-			sagno = 0;
+			sagno = minimum_agno;
 			flags = 0;
 		} else {
 			/*
@@ -3248,6 +3260,7 @@
 			args->agno = sagno = XFS_FSB_TO_AGNO(mp, args->fsbno);
 			flags = XFS_ALLOC_FLAG_TRYLOCK;
 		}
+
 		/*
 		 * Loop over allocation groups twice; first time with
 		 * trylock set, second time without.
@@ -3276,19 +3289,21 @@
 			if (args->agno == sagno &&
 			    type == XFS_ALLOCTYPE_START_BNO)
 				args->type = XFS_ALLOCTYPE_THIS_AG;
+
 			/*
-			 * For the first allocation, we can try any AG to get
-			 * space. However, if we already have allocated a
-			 * block, we don't want to try AGs whose number is below
-			 * sagno. Otherwise, we may end up with out-of-order
-			 * locking of AGF, which might cause deadlock.
-			 */
+			 * If we are try-locking, we can't deadlock on AGF
+			 * locks, so we can wrap all the way back to the first
+			 * AG. Otherwise, wrap back to the start AG so we can't
+			 * deadlock, and let the end of scan handler decide what
+			 * to do next.
+			 */
 			if (++(args->agno) == mp->m_sb.sb_agcount) {
-				if (args->tp->t_firstblock != NULLFSBLOCK)
-					args->agno = sagno;
-				else
+				if (flags & XFS_ALLOC_FLAG_TRYLOCK)
 					args->agno = 0;
+				else
+					args->agno = sagno;
 			}
+
 			/*
 			 * Reached the starting a.g., must either be done
 			 * or switch to non-trylock mode.
@@ -3300,7 +3315,14 @@
 				break;
 			}
 
+			/*
+			 * Blocking pass next, so we must obey minimum
+			 * agno constraints to avoid ABBA AGF deadlocks.
+			 */
 			flags = 0;
+			if (minimum_agno > sagno)
+				sagno = minimum_agno;
+
 			if (type == XFS_ALLOCTYPE_START_BNO) {
 				args->agbno = XFS_FSB_TO_AGBNO(mp,
 						args->fsbno);
@@ -3322,9 +3344,9 @@
 		ASSERT(0);
 		/* NOTREACHED */
 	}
-	if (args->agbno == NULLAGBLOCK)
+	if (args->agbno == NULLAGBLOCK) {
 		args->fsbno = NULLFSBLOCK;
-	else {
+	} else {
 		args->fsbno = XFS_AGB_TO_FSB(mp, args->agno, args->agbno);
 #ifdef DEBUG
 		ASSERT(args->len >= args->minlen);
@@ -3335,6 +3357,29 @@
 #endif
 	}
+
+	/*
+	 * We end up here with a locked AGF. If we failed, the caller is likely
+	 * going to try to allocate again with different parameters, and that
+	 * can widen the AGs that are searched for free space. If we have to do
+	 * BMBT block allocation, we have to do a new allocation.
+	 *
+	 * Hence leaving this function with the AGF locked opens up potential
+	 * ABBA AGF deadlocks because a future allocation attempt in this
+	 * transaction may attempt to lock a lower number AGF.
+	 *
+	 * We can't release the AGF until the transaction is commited, so at
+	 * this point we must update the "firstblock" tracker to point at this
+	 * AG if the tracker is empty or points to a lower AG. This allows the
+	 * next allocation attempt to be modified appropriately to avoid
+	 * deadlocks.
+	 */
+	if (args->agbp &&
+	    (args->tp->t_firstblock == NULLFSBLOCK ||
+	     args->pag->pag_agno > minimum_agno)) {
+		args->tp->t_firstblock = XFS_AGB_TO_FSB(mp,
+					args->pag->pag_agno, 0);
+	}
 	xfs_perag_put(args->pag);
 	return 0;
 error0:
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 0d56a8d862e8..018837bd72c8 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3413,21 +3413,7 @@ xfs_bmap_process_allocated_extent(
 	xfs_fileoff_t		orig_offset,
 	xfs_extlen_t		orig_length)
 {
-	int			nullfb;
-
-	nullfb = ap->tp->t_firstblock == NULLFSBLOCK;
-
-	/*
-	 * check the allocation happened at the same or higher AG than
-	 * the first block that was allocated.
-	 */
-	ASSERT(nullfb ||
-		XFS_FSB_TO_AGNO(args->mp, ap->tp->t_firstblock) <=
-		XFS_FSB_TO_AGNO(args->mp, args->fsbno));
-
 	ap->blkno = args->fsbno;
-	if (nullfb)
-		ap->tp->t_firstblock = args->fsbno;
 	ap->length = args->len;
 	/*
 	 * If the extent size hint is active, we tried to round the
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 372d871bccc5..5587108d5678 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1877,6 +1877,7 @@ DEFINE_ALLOC_EVENT(xfs_alloc_small_notenough);
 DEFINE_ALLOC_EVENT(xfs_alloc_small_done);
 DEFINE_ALLOC_EVENT(xfs_alloc_small_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_badargs);
+DEFINE_ALLOC_EVENT(xfs_alloc_vextent_skip_deadlock);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_nofix);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
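The core of the patch is the minimum_agno rule: once a transaction has exited
an allocation with an AGF locked, any later blocking allocation in that
transaction may only lock AGs at or above that AG number. A simplified,
user-space sketch of the rule follows; the types and constant are hypothetical
stand-ins, not the kernel code.

#include <stdbool.h>
#include <stdint.h>

#define NO_FIRST_BLOCK	UINT32_MAX	/* no AGF locked yet in this transaction */

/* Lowest AG number a blocking allocation may lock next. */
static uint32_t minimum_agno(uint32_t first_locked_agno)
{
	if (first_locked_agno == NO_FIRST_BLOCK)
		return 0;		/* nothing locked: any AG is fair game */
	return first_locked_agno;	/* otherwise never go below the held AG */
}

/* Blocking AGF locks must respect the floor to keep AGF lock ordering. */
static bool may_lock_agf_blocking(uint32_t agno, uint32_t first_locked_agno)
{
	return agno >= minimum_agno(first_locked_agno);
}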
From patchwork Thu Sep 5 18:21:23 2024
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13792791

From: Leah Rumancik
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner,
    Allison Henderson, "Darrick J. Wong", Leah Rumancik
Subject: [PATCH 6.1 CANDIDATE 06/26] xfs: prefer free inodes at ENOSPC over chunk allocation
Date: Thu, 5 Sep 2024 11:21:23 -0700
Message-ID: <20240905182144.2691920-7-leah.rumancik@gmail.com>
In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com>
References: <20240905182144.2691920-1-leah.rumancik@gmail.com>

From: Dave Chinner

[ Upstream commit f08f984c63e9980614ae3a0a574b31eaaef284b2 ]

When an XFS filesystem has free inodes in chunks already allocated on
disk, it will still allocate new inode chunks if the target AG has no
free inodes in it. Normally, this is a good idea as it preserves locality
of all the inodes in a given directory.

However, at ENOSPC this can lead to using the last few remaining free
filesystem blocks to allocate a new chunk when there are many, many free
inodes that could be allocated without consuming free space. This results
in speeding up the consumption of the last few blocks and inode create
operations then returning ENOSPC when there are free inodes available,
because we don't have enough blocks left in the filesystem for directory
creation reservations to proceed.

Hence when we are near ENOSPC, we should be attempting to preserve the
remaining blocks for directory block allocation rather than using them
for unnecessary inode chunk creation.

This particular behaviour is exposed by xfs/294, when it drives to ENOSPC
on empty file creation whilst there are still thousands of free inodes
available for allocation in other AGs in the filesystem.

Hence, when we are within 1% of ENOSPC, change the inode allocation
behaviour to prefer to use existing free inodes over allocating new inode
chunks, even though it results in poorer locality of the data set. It is
more important for the allocations to be space efficient near ENOSPC than
to have optimal locality for performance, so let's modify the inode AG
selection code to reflect that fact.
This allows generic/294 to not only pass with this allocator rework
patchset, but to increase the number of post-ENOSPC empty inode
allocations from ~600 to ~9080 before we hit ENOSPC on the directory
create transaction reservation.

Signed-off-by: Dave Chinner
Reviewed-by: Allison Henderson
Reviewed-by: Darrick J. Wong
Signed-off-by: Leah Rumancik
---
 fs/xfs/libxfs/xfs_ialloc.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 94db50eb706a..120dbec16f5c 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1737,6 +1737,7 @@ xfs_dialloc(
 	struct xfs_perag	*pag;
 	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
 	bool			ok_alloc = true;
+	bool			low_space = false;
 	int			flags;
 	xfs_ino_t		ino;
 
@@ -1767,6 +1768,20 @@ xfs_dialloc(
 		ok_alloc = false;
 	}
 
+	/*
+	 * If we are near to ENOSPC, we want to prefer allocation from AGs that
+	 * have free inodes in them rather than use up free space allocating new
+	 * inode chunks. Hence we turn off allocation for the first non-blocking
+	 * pass through the AGs if we are near ENOSPC to consume free inodes
+	 * that we can immediately allocate, but then we allow allocation on the
+	 * second pass if we fail to find an AG with free inodes in it.
+	 */
+	if (percpu_counter_read_positive(&mp->m_fdblocks) <
+			mp->m_low_space[XFS_LOWSP_1_PCNT]) {
+		ok_alloc = false;
+		low_space = true;
+	}
+
 	/*
 	 * Loop until we find an allocation group that either has free inodes
 	 * or in which we can allocate some inodes. Iterate through the
@@ -1795,6 +1810,8 @@ xfs_dialloc(
 			break;
 		}
 		flags = 0;
+		if (low_space)
+			ok_alloc = true;
 	}
 	xfs_perag_put(pag);
 }
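The policy added to xfs_dialloc() is a two-pass preference: when free space is
below the 1% low-space threshold, the first pass over the AGs only consumes
inodes from existing chunks, and chunk allocation is re-enabled for the second
pass if no AG had free inodes. A rough user-space sketch of that policy, with
hypothetical types and field names:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filesystem counters, for illustration only. */
struct fs_state {
	uint64_t free_blocks;
	uint64_t low_space_1pct;	/* roughly 1% of the filesystem */
};

static bool allow_chunk_alloc(const struct fs_state *fs, int pass)
{
	bool low_space = fs->free_blocks < fs->low_space_1pct;

	if (low_space && pass == 0)
		return false;	/* first pass: use existing free inodes only */
	return true;		/* second pass: new inode chunks allowed again */
}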
From patchwork Thu Sep 5 18:21:24 2024
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13792792

From: Leah Rumancik
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner,
    Allison Henderson, "Darrick J. Wong", Leah Rumancik
Subject: [PATCH 6.1 CANDIDATE 07/26] xfs: block reservation too large for minleft allocation
Date: Thu, 5 Sep 2024 11:21:24 -0700
Message-ID: <20240905182144.2691920-8-leah.rumancik@gmail.com>
In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com>
References: <20240905182144.2691920-1-leah.rumancik@gmail.com>

From: Dave Chinner

[ Upstream commit d5753847b216db0e553e8065aa825cfe497ad143 ]

When we enter xfs_bmbt_alloc_block() without having first allocated a
data extent (i.e. tp->t_firstblock == NULLFSBLOCK) because we are doing
something like unwritten extent conversion, the transaction block
reservation is used as the minleft value.

This works for operations like unwritten extent conversion, but it
assumes that the block reservation is only for a BMBT split. This is not
always true, and sometimes results in larger than necessary minleft
values being set. We only actually need enough space for a btree split,
something we already handle correctly in xfs_bmapi_write() via the
xfs_bmapi_minleft() calculation.
We should use xfs_bmapi_minleft() in xfs_bmbt_alloc_block() to calculate
the number of blocks a BMBT split on this inode is going to require, not
use the transaction block reservation that contains the maximum number of
blocks this transaction may consume in it...

Signed-off-by: Dave Chinner
Reviewed-by: Allison Henderson
Reviewed-by: Darrick J. Wong
Signed-off-by: Leah Rumancik
---
 fs/xfs/libxfs/xfs_bmap.c       |  2 +-
 fs/xfs/libxfs/xfs_bmap.h       |  2 ++
 fs/xfs/libxfs/xfs_bmap_btree.c | 19 +++++++++----------
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 018837bd72c8..9dc33cdc2ab9 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4242,7 +4242,7 @@ xfs_bmapi_convert_unwritten(
 	return 0;
 }
 
-static inline xfs_extlen_t
+xfs_extlen_t
 xfs_bmapi_minleft(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*ip,
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 16db95b11589..08c16e4edc0f 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -220,6 +220,8 @@ int	xfs_bmap_add_extent_unwritten_real(struct xfs_trans *tp,
 		struct xfs_inode *ip, int whichfork,
 		struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp,
 		struct xfs_bmbt_irec *new, int *logflagsp);
+xfs_extlen_t	xfs_bmapi_minleft(struct xfs_trans *tp, struct xfs_inode *ip,
+		int fork);
 
 enum xfs_bmap_intent_type {
 	XFS_BMAP_MAP = 1,
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index cfa052d40105..18de4fbfef4e 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -213,18 +213,16 @@ xfs_bmbt_alloc_block(
 	if (args.fsbno == NULLFSBLOCK) {
 		args.fsbno = be64_to_cpu(start->l);
 		args.type = XFS_ALLOCTYPE_START_BNO;
+
 		/*
-		 * Make sure there is sufficient room left in the AG to
-		 * complete a full tree split for an extent insert. If
-		 * we are converting the middle part of an extent then
-		 * we may need space for two tree splits.
-		 *
-		 * We are relying on the caller to make the correct block
-		 * reservation for this operation to succeed. If the
-		 * reservation amount is insufficient then we may fail a
-		 * block allocation here and corrupt the filesystem.
+		 * If we are coming here from something like unwritten extent
+		 * conversion, there has been no data extent allocation already
+		 * done, so we have to ensure that we attempt to locate the
+		 * entire set of bmbt allocations in the same AG, as
+		 * xfs_bmapi_write() would have reserved.
 		 */
-		args.minleft = args.tp->t_blk_res;
+		args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur->bc_ino.ip,
+					cur->bc_ino.whichfork);
 	} else if (cur->bc_tp->t_flags & XFS_TRANS_LOWMODE) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
 	} else {
@@ -248,6 +246,7 @@ xfs_bmbt_alloc_block(
 		 * successful activate the lowspace algorithm.
*/ args.fsbno = 0; + args.minleft = 0; args.type = XFS_ALLOCTYPE_FIRST_AG; error = xfs_alloc_vextent(&args); if (error) From patchwork Thu Sep 5 18:21:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792793 Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 724C2187357 for ; Thu, 5 Sep 2024 18:22:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560521; cv=none; b=VyTKOJYvJ4de0kwNheA73PK2SBssA7M0dyzMjF6NZVzXcVnC8ey0vOD9DU2/WU6XwWz0DsT7yXfHaxWF5wZ0JUOCo/4BkvRr+xGobJJfKa90MEyZ2IKQnuRh9r/X06xTcxprgA0keOcQThYnV4ElKfc3eHZkuZvzp5QPi/blSxo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560521; c=relaxed/simple; bh=mS0Nk9qedVsrVvzemMpUVzwX/mfMQF94WPpvXDLExhE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=qnUXNS4kMqswGy7RlSKEqsXmx3lAnZeNap+V5oJUdAm2PqlZzeRzkEMERVUOKkJwRedEyBSNFGNXdmLKEAzf43MCx4nTZPl/itzPpOFs41jFl0oyojTskQc3uhjzs8BaayG8vXeWnuiOfTsittDvezaPDV8yDRYtpxiJzKULrqg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=FuD69d/Y; arc=none smtp.client-ip=209.85.214.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="FuD69d/Y" Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-201d5af11a4so10861285ad.3 for ; Thu, 05 Sep 2024 11:22:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560519; x=1726165319; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=olcB0F8DHK4CDFU/f2CEsuVv4NsXmBIozmjDSFuNxmg=; b=FuD69d/YVtXqE+zpBMKCrgUr6+ujig2unJDjO5btpCmt0kEEBhDctVVXKb7paF4UNI PS+Eh34r/L1IF3wY404DdHC/s+ZDWrsX0rUl47lBQ69IBDXt4YAblCHZxgpfM2d96ipp kocSqw5ZRwkPoM54uYKdPLS2IDx9mMt5A/8zrg0akTobYes7jCwO1K9Iea6i8s1JPHrk y/E9ewmQ30Yfyuvn3A7hXDdnVmjuRiQs7SDoW5geEQF+SwLe7G3aloVEHwAR2aIUfiCs t3BJzDhjYKwG/CEVWHubTDE4gN08DmLhX+5FMclSmKHSVMnleNw7YQz54Zz8tvPDGe7/ Xt1w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560519; x=1726165319; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=olcB0F8DHK4CDFU/f2CEsuVv4NsXmBIozmjDSFuNxmg=; b=mfF2ac5XzSnqSPAw2UY1FnJyQddBd3saA9s5TrQeMDfHKTmdDLpg6U6dyJ1GZoKlab zA/YX0TdsCh37TP++QP7nHCmxUtPOPTqTChg1DlUeNixsGOEjG0YoxZor+fZEaMdsu+R gYAjGmDZf7DCYPUOkIS+Y37lP7w3GsAFsBqBgEXcJlsP2g6o0aVFv70GAWpDdwNyI3vF 51XaacVOJI6wosNz7EJ1/tQ0dxfdWQfc+Z6fjuNpJEqwPzC89I/AGVFfUjowH2lAXYgs LfTBuCCogCCkCQmJWmAbfGi8dP4dPAKr38L6p2fHvsOEhuR4ZWUkKuLi9FTecPjhbd8A zTlg== X-Gm-Message-State: AOJu0YxnMeVbc7lbJo4hLeryajs0THwr6wu0qJponRitfWhL/smU/qHP 
9GIihKEvwjpV5rsFLL9opVQxVytnX06zDwqN1s1JYX9rSdAapXY383WtH2q3 X-Google-Smtp-Source: AGHT+IE6q94kM5GcVu/dON7eY0k90Yy3KKVlZ6Zgu4uvfzKm9M+5M25RYwN8aWIX9Z7niJx5BD6bEQ== X-Received: by 2002:a17:902:eccd:b0:202:51ca:9823 with SMTP id d9443c01a7336-20546b35b37mr202720455ad.46.1725560519622; Thu, 05 Sep 2024 11:21:59 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.21.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:21:59 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, "Darrick J. Wong" , syzbot+090ae72d552e6bd93cfe@syzkaller.appspotmail.com, Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 08/26] xfs: fix uninitialized variable access Date: Thu, 5 Sep 2024 11:21:25 -0700 Message-ID: <20240905182144.2691920-9-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 60b730a40c43fbcc034970d3e77eb0f25b8cc1cf ] If the end position of a GETFSMAP query overlaps an allocated space and we're using the free space info to generate fsmap info, the akeys information gets fed into the fsmap formatter with bad results. Zero-init the space. Reported-by: syzbot+090ae72d552e6bd93cfe@syzkaller.appspotmail.com Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik --- fs/xfs/xfs_fsmap.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index d8337274c74d..062e5dc5db9f 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -761,6 +761,7 @@ xfs_getfsmap_datadev_bnobt( { struct xfs_alloc_rec_incore akeys[2]; + memset(akeys, 0, sizeof(akeys)); info->missing_owner = XFS_FMR_OWN_UNKNOWN; return __xfs_getfsmap_datadev(tp, keys, info, xfs_getfsmap_datadev_bnobt_query, &akeys[0]); From patchwork Thu Sep 5 18:21:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792794 Received: from mail-pg1-f169.google.com (mail-pg1-f169.google.com [209.85.215.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A4CC018754D for ; Thu, 5 Sep 2024 18:22:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560523; cv=none; b=m8unn4gD8bBYkxNY2aljmMD86GeJMnRj1aulY73tLn/3ojKsLjkfvXan7t/a61fYKb1uBcwkM+orGOe6gCDoyseASijYFDhajrHQCAjcqlirFE4PZTGGAN/tpIk+OYdVH8Q2p01Ul3B9GQRBnr4PUWYMKZJmw9O2xpZa4OrG+Hw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560523; c=relaxed/simple; bh=ex+Z0N7afk0rOd6e7S5yztxz371+3w1tO7GgrvQk9Kw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dxVRh3DfbjE/BtflkT66bKGr46ulBgbAWFhgi8C725lQTUW/t+2hTshF7OjKASTq85sE/QpcM4cdlDHMbhg18271xGnfYkhD6YpdUXv9912FnLo9YKkEzb7cXsRdPKZloGxmBiIUYZWUEUCjHoiLDDarMlboOaRhl1sS7Co+Cwk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass 
smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=SEj51Wcb; arc=none smtp.client-ip=209.85.215.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="SEj51Wcb" Received: by mail-pg1-f169.google.com with SMTP id 41be03b00d2f7-7d50e865b7aso549019a12.0 for ; Thu, 05 Sep 2024 11:22:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560521; x=1726165321; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=hZU7Bfd5lbjk+/RmdOLvGCY4sbjakLLt4Vz3JSDRaUc=; b=SEj51WcbdgPyz8Q8Q0Bfx9TdGxMgmyG07KTAy08dWgPMl/MB7FxII/XD2fSe3i90qK dc0b7U8yavRxJhZRFzAG45v0JHQPgQStSt5NN0ott5YIUw9vqJZk0xN7BEE4fqyuOsWe pRfOXTThi008rw1edSJTHRSCcXD9AiwHANWS2n4QJdCZF2stnoRKcSctn/1do8upC6OA NH/km6fxUEOZ2pVsWTXmygm9MlOxdIJNd5kfObZf3GQE6RJv75ZPtx/574HC2cOosTbR QDvWwykxqP5Pb+JftGzcHHs61xpOSJAh4+KnGdGYQCGqrnAnvk4wPJN4HzpFvwgku7wX RLJA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560521; x=1726165321; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=hZU7Bfd5lbjk+/RmdOLvGCY4sbjakLLt4Vz3JSDRaUc=; b=SqtQPJPC/kP5oXiZ0W8+OKG64i9kkDUPpA3VOI++eGwVrCgJ5nLEzITt53GzvTXLC0 679JXgV/B/6jYEDBFZYylZJ8DSJxvNvf2kMFIk9qBT6Hkv/6nbf76HjixvWVo0w+xrK6 2kYvj9+yxbCA1uxpIxblBG4M2nJuR1KtlIFVbUgWyEk6U4jXqwWMIJAH4Tvz0e9QCbkQ HUsOdB/DcEzrq82F9FNfFtp8xcrSmgSVsOdVZVLF8LnC9mlth+z7XSYOkY/+2DUkYxgt 316LJ2wX1t6C6m0elMguQaKB5MWaIkeV1MoRLgxXk6o4tqEY3b9RtKGvXS+qRwczCdbi HLqQ== X-Gm-Message-State: AOJu0Yx3nk8LIxOLlq4EVUVfwWVvmgPETvX6VCL/hp25CcdxK18H9Wr5 +s46o1wrkB7Mf6nggDu+d6uAM2xSbnbxtaSFETIqo+eL0jPMvpGfV9Enwm23 X-Google-Smtp-Source: AGHT+IHRpFiZVEo7/K7PgHPQrfJ5NsN0ylYQoOjTFWcy608rBctTB3D9jbsBOaY5j+ZCMyITx+wXZQ== X-Received: by 2002:a17:902:e804:b0:202:4b65:65af with SMTP id d9443c01a7336-20546b35a4cmr232258135ad.52.1725560520735; Thu, 05 Sep 2024 11:22:00 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.21.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:00 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner , Pengfei Xu , "Darrick J. 
Wong" , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 09/26] xfs: quotacheck failure can race with background inode inactivation Date: Thu, 5 Sep 2024 11:21:26 -0700 Message-ID: <20240905182144.2691920-10-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 0c7273e494dd5121e20e160cb2f047a593ee14a8 ] The background inode inactivation can attached dquots to inodes, but this can race with a foreground quotacheck failure that leads to disabling quotas and freeing the mp->m_quotainfo structure. The background inode inactivation then tries to allocate a quota, tries to dereference mp->m_quotainfo, and crashes like so: XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas. xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff) BUG: kernel NULL pointer dereference, address: 00000000000002a8 .... CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker RIP: 0010:xfs_dquot_alloc+0x95/0x1e0 .... Call Trace: xfs_qm_dqread+0x46/0x440 xfs_qm_dqget_inode+0x154/0x500 xfs_qm_dqattach_one+0x142/0x3c0 xfs_qm_dqattach_locked+0x14a/0x170 xfs_qm_dqattach+0x52/0x80 xfs_inactive+0x186/0x340 xfs_inodegc_worker+0xd3/0x430 process_one_work+0x3b1/0x960 worker_thread+0x52/0x660 kthread+0x161/0x1a0 ret_from_fork+0x29/0x50 .... Prevent this race by flushing all the queued background inode inactivations pending before purging all the cached dquots when quotacheck fails. Reported-by: Pengfei Xu Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik --- fs/xfs/xfs_qm.c | 40 ++++++++++++++++++++++++++-------------- 1 file changed, 26 insertions(+), 14 deletions(-) diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index ff53d40a2dae..f51960d7dcbd 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1321,15 +1321,14 @@ xfs_qm_quotacheck( error = xfs_iwalk_threaded(mp, 0, 0, xfs_qm_dqusage_adjust, 0, true, NULL); - if (error) { - /* - * The inode walk may have partially populated the dquot - * caches. We must purge them before disabling quota and - * tearing down the quotainfo, or else the dquots will leak. - */ - xfs_qm_dqpurge_all(mp); - goto error_return; - } + + /* + * On error, the inode walk may have partially populated the dquot + * caches. We must purge them before disabling quota and tearing down + * the quotainfo, or else the dquots will leak. + */ + if (error) + goto error_purge; /* * We've made all the changes that we need to make incore. Flush them @@ -1363,10 +1362,8 @@ xfs_qm_quotacheck( * and turn quotaoff. The dquots won't be attached to any of the inodes * at this point (because we intentionally didn't in dqget_noattach). 
*/ - if (error) { - xfs_qm_dqpurge_all(mp); - goto error_return; - } + if (error) + goto error_purge; /* * If one type of quotas is off, then it will lose its @@ -1376,7 +1373,7 @@ xfs_qm_quotacheck( mp->m_qflags &= ~XFS_ALL_QUOTA_CHKD; mp->m_qflags |= flags; - error_return: +error_return: xfs_buf_delwri_cancel(&buffer_list); if (error) { @@ -1395,6 +1392,21 @@ xfs_qm_quotacheck( } else xfs_notice(mp, "Quotacheck: Done."); return error; + +error_purge: + /* + * On error, we may have inodes queued for inactivation. This may try + * to attach dquots to the inode before running cleanup operations on + * the inode and this can race with the xfs_qm_destroy_quotainfo() call + * below that frees mp->m_quotainfo. To avoid this race, flush all the + * pending inodegc operations before we purge the dquots from memory, + * ensuring that background inactivation is idle whilst we turn off + * quotas. + */ + xfs_inodegc_flush(mp); + xfs_qm_dqpurge_all(mp); + goto error_return; + } /* From patchwork Thu Sep 5 18:21:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792795 Received: from mail-pl1-f171.google.com (mail-pl1-f171.google.com [209.85.214.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 858B1187357 for ; Thu, 5 Sep 2024 18:22:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560524; cv=none; b=A8CHk0GAhCQIQJRZ4FEurZSMdapWeB7Q7B/Mn3S+mJL2jVS+Z0eibU0fKiAH88ojEixXaP+pzTtq42VfunCQFegV2l6WMgowl3m8SZc0DVGpRR5h0+n5ixpCgvGfrXnVAcZLonbKthe773Sl98sf57Dl7D51/1K0nAQDFFfR7NE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560524; c=relaxed/simple; bh=fNxF1A85GQ3tX99uYdvNeatQYgz/N0TrtfWfvLV0184=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ux/GeDiLYbotSG15yTGrXTSyVFzVEFxPQHMWHy7PiVX7UguAmqTiYkkt3YQRTMMNZHADId8tYIKEzy6hhUQYHpz00W4GAt51rcz2b/bHOEIEMX9MQpn6DNvXA3ZnUVJrxPIIe7QpHKQerTKeGDIRjzeiZFc7ZaUFxV8uHJghgP4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=WRxHNjKG; arc=none smtp.client-ip=209.85.214.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="WRxHNjKG" Received: by mail-pl1-f171.google.com with SMTP id d9443c01a7336-2055136b612so12351855ad.0 for ; Thu, 05 Sep 2024 11:22:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560522; x=1726165322; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=41dGirSTLjJ1lRlnioWURooeGcIA7fc6MXk3B/N6dEA=; b=WRxHNjKGH2w/pEmlhwxpNmLpQpd5ReNDuj6iecXeQdCb1bk3UnksldRJyf/TejJhaA KQxh5mljVkt5bAlYFtzsujbzjlKP2Ua1gXvF/dxb7Go9llYWSjz31pbLHUZmNFKRpeAC 0lgtuShHLPYQRM1lYgTl3ugIlH3NDzzv+Pi+H7u7w03EYM7InRwSVQjtrbpKTSDKXqh2 
G4wMFQAg9Se9nMTJilLfUTPTMz7e4VjSKYbae0TXG76Ke53GAJ7S6kMQ2uRF9EfDeQCu l+CSQynec1R8+hPATwOsCVBGtaoBiWvAxTh9GeZBBZQYKkW3zkiJa2bKlm4GbIuH6clI wjXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560522; x=1726165322; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=41dGirSTLjJ1lRlnioWURooeGcIA7fc6MXk3B/N6dEA=; b=mmsHgViOOZpjRptMaPWINvep+6fRDkWjeV5ZN99DbVA/twIkLewLDIdKYGAkk4L1sM ZWbCgUnK+73b1suBkbkL+8/5SREWhB1Kg2Qg+YYjlACm3lX5ZsisHGLa9LCZ5rx0NXgw 9AIuiUIzShf2jRdBAUdtPYaTRK/bz8ZyqEAMi4br9RCCfjnC+JMyux0ytDzxTSDyoo5T bG8cIpFBal9wiGsXdM59aAOM1OIfuVfIn2uRURRa4FMqOFb8F2CEKBaLouLxTzoZ8Fus gYmf3+D2+LLXY5TvyxJ2hMfU+FWEDIMCgShZwVyGX8mtuP1vupHpkeXqUWSWcTQ7EOSR jIvw== X-Gm-Message-State: AOJu0YxJPGHhOxHVADmz0TzFBqtYqKzmTnUEn73mFwcYpPe0M7RdCyDK bd8ulqUA+KqCtv4JVOLGwj60kkneONTSRemvxz3IwCKiEUmJysjBnsLT81ap X-Google-Smtp-Source: AGHT+IE+l+2mPRSkVQVc3j74Eny1N8vw5GLGknU28tvyZwWV9Gh6cvdass/Rn5m+Q5PT0nGT8E8S3g== X-Received: by 2002:a17:902:da82:b0:206:9ab3:2ebc with SMTP id d9443c01a7336-2069ab33684mr131494255ad.47.1725560521704; Thu, 05 Sep 2024 11:22:01 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:01 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Ye Bin , "Darrick J. Wong" , Dave Chinner , Dave Chinner , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 10/26] xfs: fix BUG_ON in xfs_getbmap() Date: Thu, 5 Sep 2024 11:21:27 -0700 Message-ID: <20240905182144.2691920-11-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Ye Bin [ Upstream commit 8ee81ed581ff35882b006a5205100db0b57bf070 ] There's issue as follows: XFS: Assertion failed: (bmv->bmv_iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c, line: 329 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:102! 
invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 14612 Comm: xfs_io Not tainted 6.3.0-rc2-next-20230315-00006-g2729d23ddb3b-dirty #422 RIP: 0010:assfail+0x96/0xa0 RSP: 0018:ffffc9000fa178c0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff888179a18000 RDX: 0000000000000000 RSI: ffff888179a18000 RDI: 0000000000000002 RBP: 0000000000000000 R08: ffffffff8321aab6 R09: 0000000000000000 R10: 0000000000000001 R11: ffffed1105f85139 R12: ffffffff8aacc4c0 R13: 0000000000000149 R14: ffff888269f58000 R15: 000000000000000c FS: 00007f42f27a4740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000b92388 CR3: 000000024f006000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: xfs_getbmap+0x1a5b/0x1e40 xfs_ioc_getbmap+0x1fd/0x5b0 xfs_file_ioctl+0x2cb/0x1d50 __x64_sys_ioctl+0x197/0x210 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Above issue may happen as follows: ThreadA ThreadB do_shared_fault __do_fault xfs_filemap_fault __xfs_filemap_fault filemap_fault xfs_ioc_getbmap -> Without BMV_IF_DELALLOC flag xfs_getbmap xfs_ilock(ip, XFS_IOLOCK_SHARED); filemap_write_and_wait do_page_mkwrite xfs_filemap_page_mkwrite __xfs_filemap_fault xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); iomap_page_mkwrite ... xfs_buffered_write_iomap_begin xfs_bmapi_reserve_delalloc -> Allocate delay extent xfs_ilock_data_map_shared(ip) xfs_getbmap_report_one ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0) -> trigger BUG_ON As xfs_filemap_page_mkwrite() only hold XFS_MMAPLOCK_SHARED lock, there's small window mkwrite can produce delay extent after file write in xfs_getbmap(). To solve above issue, just skip delalloc extents. Signed-off-by: Ye Bin Reviewed-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik --- fs/xfs/xfs_bmap_util.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 867645b74d88..351087cde27e 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -314,15 +314,13 @@ xfs_getbmap_report_one( if (isnullstartblock(got->br_startblock) || got->br_startblock == DELAYSTARTBLOCK) { /* - * Delalloc extents that start beyond EOF can occur due to - * speculative EOF allocation when the delalloc extent is larger - * than the largest freespace extent at conversion time. These - * extents cannot be converted by data writeback, so can exist - * here even if we are not supposed to be finding delalloc - * extents. + * Take the flush completion as being a point-in-time snapshot + * where there are no delalloc extents, and if any new ones + * have been created racily, just skip them as being 'after' + * the flush and so don't get reported. 
*/ - if (got->br_startoff < XFS_B_TO_FSB(ip->i_mount, XFS_ISIZE(ip))) - ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0); + if (!(bmv->bmv_iflags & BMV_IF_DELALLOC)) + return 0; p->bmv_oflags |= BMV_OF_DELALLOC; p->bmv_block = -2; From patchwork Thu Sep 5 18:21:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792796 Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9412718859E for ; Thu, 5 Sep 2024 18:22:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560525; cv=none; b=B/mx7aChsMhnW8Zs8GUjiTTZAphdjXQtoUyWebv2tHLC8qMqQj3Bv2BVV8eg0IFS1EWg5yt0kcqtkxFrwdo6WzKAMGv9MCaqaEk8tkp2YqffEnlPN66xHhUyWmGcAdbvTIpPMKpvTOLUDs1DhQsWoaupUUFhwnWrmP7p3v6w5m8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560525; c=relaxed/simple; bh=pfPrTiSwv55fb6SWEnAa3GIdTeUVnZOqfCPfS1nLuwQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=eceJgC1Yl2+GgrOFeZDpKYEIdvtyYb8OTydX6nnlNI95zyJkfXvE4pZ/FY840SOm0+NTPnFhUdndvMj/MH+Bym+zdjCcQmbV/pzaIyyQxpVnD75/A++nD7vS3Z+QfxSIX6KHxwQtojMr2ktyb3vPS7bTmfUHM+aXXaSu3DDapjk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Nt6OCuQr; arc=none smtp.client-ip=209.85.214.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Nt6OCuQr" Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-2068acc8b98so10813825ad.3 for ; Thu, 05 Sep 2024 11:22:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560523; x=1726165323; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Y9xN5bPMgo2WHZh2fICHFeHRggcCb1gTrZfcU7FHyI4=; b=Nt6OCuQruQqrcZvcy+yya1AUH+orsRmA6kPQ9R11f6qdu0LhJXI/1Zyn/91ENDjLo8 xlQbiEWvy9l6wAuTnfsSQtZLklZ2+3hTsQJSoIKaO4xkqOFU33jGyVJcmfHrRATEtA5S ssdJ+Vkp89ah7GeXgRGHofqNfneJACPJyrQO8FRfdrP4CA92kQDoHmFhvgUA1CTfaiTt 1XWVO1DD5BBrzB6vcq95HJpK0cHtHoqxQMPfBkH9q3uWpbfWc6MYmTMbDG4Op5XfwqzW rdyjR9mSCz9AuT5gv0N9BJ4neGztLVABLZdtImEwSNTVZl/gx6Apzb54eVVZL+SrXHt5 z/LA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560523; x=1726165323; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Y9xN5bPMgo2WHZh2fICHFeHRggcCb1gTrZfcU7FHyI4=; b=oMzAzuCr5Mm0bMwjdd5u5qvqSfXa3eHZ/5fSnnIuHJR1YmN8iqCUYk28611TKfAHVI 4pjqGZXPgobPHdHz3IMFK/ysgVkjLNEBQ57Qo2H7h+gpJavJlk5h0990BUmfy4m7mDZ7 K29DF7D+gwN84XcvKAuttmfcT+Gc4V31mz0WXo3ojVn29luc/SMtX/MATaZ2HoXWzgpM CfG/8iwZP8/QznQEubpWSVHiJu9r8WO0iGlyPcaAa89x4qrBa2g2xAhqZ6HLU+0q6AvO 
fIzJhodWdxPVXzqVKPawiOnpSxgIaNK/XjBzIVblUS7DSBIsjsMSIXZldaAH9xDu0sS0 Yf3Q== X-Gm-Message-State: AOJu0Yz/KXYE016rtsyP8CDvOemKUD7pPDt8hlpQ3w3Q8LU2rF79kkHG l2VJ4eGngqU003uyrs1RyeNlm3qbIzGVf52axjktOtT97hazuYVhd+jRrDR0 X-Google-Smtp-Source: AGHT+IFAA9xU2PTcqlGCVHC+KPCOecu8nc2XYoy2daipixbC9UqoJF/ih2P3KswXioliWXWaneT/7Q== X-Received: by 2002:a17:902:e544:b0:205:5bfd:13ae with SMTP id d9443c01a7336-2055bfd226cmr212915475ad.35.1725560522596; Thu, 05 Sep 2024 11:22:02 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:02 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner , yangerkun , "Darrick J. Wong" , Christoph Hellwig , Dave Chinner , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 11/26] xfs: buffer pins need to hold a buffer reference Date: Thu, 5 Sep 2024 11:21:28 -0700 Message-ID: <20240905182144.2691920-12-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 89a4bf0dc3857569a77061d3d5ea2ac85f7e13c6 ] When a buffer is unpinned by xfs_buf_item_unpin(), we need to access the buffer after we've dropped the buffer log item reference count. This opens a window where we can have two racing unpins for the buffer item (e.g. shutdown checkpoint context callback processing racing with journal IO iclog completion processing) and both attempt to access the buffer after dropping the BLI reference count. If we are unlucky, the "BLI freed" context wins the race and frees the buffer before the "BLI still active" case checks the buffer pin count. This results in a use after free that can only be triggered in active filesystem shutdown situations. To fix this, we need to ensure that buffer existence extends beyond the BLI reference count checks and until the unpin processing is complete. This implies that a buffer pin operation must also take a buffer reference to ensure that the buffer cannot be freed until the buffer unpin processing is complete. Reported-by: yangerkun Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik --- fs/xfs/xfs_buf_item.c | 88 ++++++++++++++++++++++++++++++++----------- 1 file changed, 65 insertions(+), 23 deletions(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index df7322ed73fa..023d4e0385dd 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -452,10 +452,18 @@ xfs_buf_item_format( * This is called to pin the buffer associated with the buf log item in memory * so it cannot be written out. * - * We also always take a reference to the buffer log item here so that the bli - * is held while the item is pinned in memory. This means that we can - * unconditionally drop the reference count a transaction holds when the - * transaction is completed. + * We take a reference to the buffer log item here so that the BLI life cycle + * extends at least until the buffer is unpinned via xfs_buf_item_unpin() and + * inserted into the AIL. 
+ * + * We also need to take a reference to the buffer itself as the BLI unpin + * processing requires accessing the buffer after the BLI has dropped the final + * BLI reference. See xfs_buf_item_unpin() for an explanation. + * If unpins race to drop the final BLI reference and only the + * BLI owns a reference to the buffer, then the loser of the race can have the + * buffer freed from under it (e.g. on shutdown). Taking a buffer reference per + * pin count ensures the life cycle of the buffer extends for as + * long as we hold the buffer pin reference in xfs_buf_item_unpin(). */ STATIC void xfs_buf_item_pin( @@ -470,13 +478,30 @@ xfs_buf_item_pin( trace_xfs_buf_item_pin(bip); + xfs_buf_hold(bip->bli_buf); atomic_inc(&bip->bli_refcount); atomic_inc(&bip->bli_buf->b_pin_count); } /* - * This is called to unpin the buffer associated with the buf log item which - * was previously pinned with a call to xfs_buf_item_pin(). + * This is called to unpin the buffer associated with the buf log item which was + * previously pinned with a call to xfs_buf_item_pin(). We enter this function + * with a buffer pin count, a buffer reference and a BLI reference. + * + * We must drop the BLI reference before we unpin the buffer because the AIL + * doesn't acquire a BLI reference whenever it accesses it. Therefore if the + * refcount drops to zero, the bli could still be AIL resident and the buffer + * submitted for I/O at any point before we return. This can result in IO + * completion freeing the buffer while we are still trying to access it here. + * This race condition can also occur in shutdown situations where we abort and + * unpin buffers from contexts other than journal IO completion. + * + * Hence we have to hold a buffer reference per pin count to ensure that the + * buffer cannot be freed until we have finished processing the unpin operation. + * The reference is taken in xfs_buf_item_pin(), and we must hold it until we + * are done processing the buffer state. In the case of an abort (remove = + * true) then we re-use the current pin reference as the IO reference we hand + * off to IO failure handling. */ STATIC void xfs_buf_item_unpin( @@ -493,24 +518,18 @@ xfs_buf_item_unpin( trace_xfs_buf_item_unpin(bip); - /* - * Drop the bli ref associated with the pin and grab the hold required - * for the I/O simulation failure in the abort case. We have to do this - * before the pin count drops because the AIL doesn't acquire a bli - * reference. Therefore if the refcount drops to zero, the bli could - * still be AIL resident and the buffer submitted for I/O (and freed on - * completion) at any point before we return. This can be removed once - * the AIL properly holds a reference on the bli. - */ freed = atomic_dec_and_test(&bip->bli_refcount); - if (freed && !stale && remove) - xfs_buf_hold(bp); if (atomic_dec_and_test(&bp->b_pin_count)) wake_up_all(&bp->b_waiters); - /* nothing to do but drop the pin count if the bli is active */ - if (!freed) + /* + * Nothing to do but drop the buffer pin reference if the BLI is + * still active. + */ + if (!freed) { + xfs_buf_rele(bp); return; + } if (stale) { ASSERT(bip->bli_flags & XFS_BLI_STALE); @@ -522,6 +541,15 @@ xfs_buf_item_unpin( trace_xfs_buf_item_unpin_stale(bip); + /* + * The buffer has been locked and referenced since it was marked + * stale so we own both lock and reference exclusively here. We + * do not need the pin reference any more, so drop it now so + * that we only have one reference to drop once item completion + * processing is complete.
+ */ + xfs_buf_rele(bp); + /* * If we get called here because of an IO error, we may or may * not have the item on the AIL. xfs_trans_ail_delete() will @@ -538,16 +566,30 @@ xfs_buf_item_unpin( ASSERT(bp->b_log_item == NULL); } xfs_buf_relse(bp); - } else if (remove) { + return; + } + + if (remove) { /* - * The buffer must be locked and held by the caller to simulate - * an async I/O failure. We acquired the hold for this case - * before the buffer was unpinned. + * We need to simulate an async IO failures here to ensure that + * the correct error completion is run on this buffer. This + * requires a reference to the buffer and for the buffer to be + * locked. We can safely pass ownership of the pin reference to + * the IO to ensure that nothing can free the buffer while we + * wait for the lock and then run the IO failure completion. */ xfs_buf_lock(bp); bp->b_flags |= XBF_ASYNC; xfs_buf_ioend_fail(bp); + return; } + + /* + * BLI has no more active references - it will be moved to the AIL to + * manage the remaining BLI/buffer life cycle. There is nothing left for + * us to do here so drop the pin reference to the buffer. + */ + xfs_buf_rele(bp); } STATIC uint From patchwork Thu Sep 5 18:21:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792797 Received: from mail-pf1-f175.google.com (mail-pf1-f175.google.com [209.85.210.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E225F14F125 for ; Thu, 5 Sep 2024 18:22:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.175 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560526; cv=none; b=jY/hiwUHXhDQ96ddYK95ztllAFl4RANAWEbjczcjRwvJBgZUdkjfU5H24vmxQkS+KcnnGaNx9nbITC5RRVnojFbgsKm/EwFWV7SWrdwUn3utLJLgpQihkjnTAT0aiSHd23uqNRsyd7OouEouTqXgqvh1UnWbtokDK/ZIfoqmmgo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560526; c=relaxed/simple; bh=ifnYNt/+jlhTAFtZBjhQEqo4IuNDS0hZz0y07syNYcY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hqwGfhcBCclx8gQgK6lVApMyQYFS7NqtMVPFDcsR5k75cyyLZQtzk26QDK4H5k5iEF3PPeMjw+gGhXzkPdwD4V0yLBQRWQqcwEPc61QKWprnbPcC2P1aRTqLWBX5vlD8HrGyEjQmyASwFyBiqTI0wDy1nPMun20mhawWQvWFolQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=OI8s9UG/; arc=none smtp.client-ip=209.85.210.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="OI8s9UG/" Received: by mail-pf1-f175.google.com with SMTP id d2e1a72fcca58-718a3b8a2dcso138504b3a.2 for ; Thu, 05 Sep 2024 11:22:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560524; x=1726165324; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=1y5ANoM1rABh/Nr7pq0cLc0YXYWTafjj08fvWhSPka8=; b=OI8s9UG/QP9JxWjzuXhDmMzlo0yV8ne30pzgAB7+XejkWegotMKsyfOdcXRAsHvKZ3 
tX+6Jbrh2J5MaDbVP9hUmOg20IUF3RFIZXW7YuZj/RhkOIa3hE39apliinV3K44+Nk8V W2b11kxIK/J8Q+rHzYMC/IFoIVtGDcEIpw46CgUNHgMEY9nNQDEB7m+Vp8I3qbbp7YWf Kyd8cds/LoKAdNUsQABTAh9KOeJlmAP3OJw52hzQXevgY1Lsz9dtPfeFPxxem5tphct1 uVORp2pAWfG85cxlD0WKdX8dlLccYOBRD3kaJnxZF7cxHRwGELY/vYbtLTDyWxnU5tM4 F1Yg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560524; x=1726165324; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1y5ANoM1rABh/Nr7pq0cLc0YXYWTafjj08fvWhSPka8=; b=Nrq4xl8oU85DxHoSh8o3YLZlAJAj0Ev3gKFUMDIHOvQN1wwkTu2t1SFgQee6JT8zEq x8+5h3hPnLnM3ympWJC6vqJswtGE6OATI5X0aPBulMkX+4SNFRU0DTIHquD//sY6Jb6/ Q24EMABZ92FP5vRrvLquTqBC+gTlN3RDntO2hyJhkKQgO0AxRnMU1buPrGgnlQQTT5QT XtdSe6F8CFjzQsLfP7io81e+9SnZnTG7NyYvyskCVKLcbg+0X85vB1fP4V1UCpDApH/C yBioNY1DYoXovaFVEkh+NwaWdmFwaJYAjZdWgDW9pcOuW/xrbosNR/ixtxruSxmWKk5y AeXA== X-Gm-Message-State: AOJu0Ywj7segNklTHBFrnYIvrmVfcgNZvynEaH1mclfkC6rGM/q8owyq 7gGGytLb5IZET/257HJGr3w2reFvy3Jcqr4Tw4I9dXtGf6KSWHHBdMwwfGCX X-Google-Smtp-Source: AGHT+IEK4TSX1oJzat2uLwusqmJRR1FSR0PpJdno+g65ghyjFjjBDaWUC2jX24RBi9bPjvasYQ0DLQ== X-Received: by 2002:a17:902:d50d:b0:205:4a37:b2ac with SMTP id d9443c01a7336-2054a37b550mr160046115ad.34.1725560524043; Thu, 05 Sep 2024 11:22:04 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:03 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner , "Darrick J. Wong" , Christoph Hellwig , Dave Chinner , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 12/26] xfs: defered work could create precommits Date: Thu, 5 Sep 2024 11:21:29 -0700 Message-ID: <20240905182144.2691920-13-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit cb042117488dbf0b3b38b05771639890fada9a52 ] To fix a AGI-AGF-inode cluster buffer deadlock, we need to move inode cluster buffer operations to the ->iop_precommit() method. However, this means that deferred operations can require precommits to be run on the final transaction that the deferred ops pass back to xfs_trans_commit() context. This will be exposed by attribute handling, in that the last changes to the inode in the attr set state machine "disappear" because the precommit operation is not run. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik --- fs/xfs/xfs_trans.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 7bd16fbff534..a772f60de4a2 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -970,6 +970,11 @@ __xfs_trans_commit( error = xfs_defer_finish_noroll(&tp); if (error) goto out_unreserve; + + /* Run precommits from final tx in defer chain. 
*/ + error = xfs_trans_run_precommits(tp); + if (error) + goto out_unreserve; } /* From patchwork Thu Sep 5 18:21:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792798 Received: from mail-pg1-f172.google.com (mail-pg1-f172.google.com [209.85.215.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 35064183CB8 for ; Thu, 5 Sep 2024 18:22:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560528; cv=none; b=Z0KINtmhIeUXcga2FEMBy2/GrnMoxFNp574tB7CI6a6AV3Ur7ztuIQTTN3C2SrBFIdcQk33SmLaoqxNnxMZuqy1nvzCl9YQlCjGCbtXAeFCgxhVlzixtAU0vHmJmLoZaaqbCCFMqRL60VztexaZGYBwlUAfVpBvySqLUFYOMmnU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560528; c=relaxed/simple; bh=cXbR2BrL28+Nv5fe5fo1wP7SCaWROq27w01fNqV/b28=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=bvZTwX2+yncyCkNLBenFlek3n8mXuRqPHoC+Jqu7LQWISwUDVA/zaUSMFdSdoPWZojx5cEKHQaWQRFKsZPkEQZAjq52jrrT4RvPP+RJEehDE9WXECf8wx8m+wywlgePDlJhgJywdpKINrQv9otC8iNNFRKjLPha3hFXnRw+NJmg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=nHB4Cki4; arc=none smtp.client-ip=209.85.215.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="nHB4Cki4" Received: by mail-pg1-f172.google.com with SMTP id 41be03b00d2f7-7cb3db0932cso953768a12.1 for ; Thu, 05 Sep 2024 11:22:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560525; x=1726165325; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=QVqUYPIWdP5eCegjpjBL0SsLMXeNqCOKWFLK7lHVYpk=; b=nHB4Cki4GKviNt1BBWS755WFsxqm8aUFv4c9lYaSoenhDJPN5zl1fn0LD4MQq38dqZ POxevwxO9gNvETmysNzs+c4kR7UkUBpH1TQrLxTQbgfg/zy7YeIaLJ8C/4JfxpJoNZnm VEOqjBfw7xoPXEVAltT8Y5vEA0ZDei/oC6N1Oin9aAQ8OL6mUNVjM1ouuO/cnjP3LqaU Igauj90iGQEUeZWIgj80BPwc1hYU9TUMndgFib7o3F0n9OkXdtLWk8Nq3xqA46EJlEgr H7iHX3S4Ok2zJwrhuFTHj9fPIbPdkdldYm0zGBeV2jpXYvOvYKduSW07ApZ5Wngy0qrT l7Hw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560525; x=1726165325; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QVqUYPIWdP5eCegjpjBL0SsLMXeNqCOKWFLK7lHVYpk=; b=NRkM0gz80fhEjWuCczA/bD6rnoAVXbwHV8B4E/VQu68vIpQqIoHJPclqIrDIEXp7lk +IVlalFy6Pd9L0fBebSRdoceL6g3Rqg1eCuUTh/3XTR9ALjy4nsX+/DCpMQTPB73mh4P lSh1MaJrxB3EDfjGN+sGbPd7eVxHORLbddPHAwxGun+kJUVFcZOq0f8bN/gPPiBvtL1s FIrltZz7wh2JwaB+au63B4PAIclRQnNA9dV1D53EqJmoCabcDKvWc7/rskATzxtt4+a4 Tv/nGFOz6AIwJlEWuQT+71mZKAoAH8k9Yynge6AOQYUsJB681VKgxaZd9BGWmq0+4GKb SLcw== X-Gm-Message-State: AOJu0Yyqg3lLoTxoSDQW1cqa0twYvdJE+bbqPFRzge6/+iRQ/WljFv6i Lb7zSDNAJQ2BcJjSekHRE5YoEqxjauH+NufuJT51mhf48btTQP7Dobhh/7yQ 
X-Google-Smtp-Source: AGHT+IEUw1u4CC8nyi7kPw7lftEm1b2BjZqxP5CesqF5AqG84einUTjo34ycRlipAgSpAJ4j4WdS4g== X-Received: by 2002:a17:903:244e:b0:205:5548:9a28 with SMTP id d9443c01a7336-20555489bacmr200370405ad.49.1725560525030; Thu, 05 Sep 2024 11:22:05 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:04 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner , Christoph Hellwig , "Darrick J. Wong" , Dave Chinner , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 13/26] xfs: fix AGF vs inode cluster buffer deadlock Date: Thu, 5 Sep 2024 11:21:30 -0700 Message-ID: <20240905182144.2691920-14-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 82842fee6e5979ca7e2bf4d839ef890c22ffb7aa ] Lock order in XFS is AGI -> AGF, hence for operations involving inode unlinked list operations we always lock the AGI first. Inode unlinked list operations operate on the inode cluster buffer, so the lock order there is AGI -> inode cluster buffer. For O_TMPFILE operations, this now means the lock order set down in xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the unlinked ops are done before the directory modifications that may allocate space and lock the AGF. Unfortunately, we also now lock the inode cluster buffer when logging an inode so that we can attach the inode to the cluster buffer and pin it in memory. This creates a lock order of AGF -> inode cluster buffer in directory operations as we have to log the inode after we've allocated new space for it. This creates a lock inversion between the AGF and the inode cluster buffer. Because the inode cluster buffer is shared across multiple inodes, the inversion is not specific to individual inodes but can occur when inodes in the same cluster buffer are accessed in different orders. To fix this we need move all the inode log item cluster buffer interactions to the end of the current transaction. Unfortunately, xfs_trans_log_inode() calls are littered throughout the transactions with no thought to ordering against other items or locking. This makes it difficult to do anything that involves changing the call sites of xfs_trans_log_inode() to change locking orders. However, we do now have a mechanism that allows is to postpone dirty item processing to just before we commit the transaction: the ->iop_precommit method. This will be called after all the modifications are done and high level objects like AGI and AGF buffers have been locked and modified, thereby providing a mechanism that guarantees we don't lock the inode cluster buffer before those high level objects are locked. This change is largely moving the guts of xfs_trans_log_inode() to xfs_inode_item_precommit() and providing an extra flag context in the inode log item to track the dirty state of the inode in the current transaction. This also means we do a lot less repeated work in xfs_trans_log_inode() by only doing it once per transaction when all the work is done. 
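To make the ordering idea concrete, here is a minimal standalone sketch (not the XFS implementation; struct item, demo_precommit() and the sort keys are invented for illustration) of running per-item precommit callbacks in ascending key order, so a shared resource is always touched last and in the same order in every transaction:

#include <stdio.h>
#include <stdlib.h>

/* Invented stand-in for a log item that carries a precommit hook. */
struct item {
	unsigned long	sort_key;                 /* e.g. inode number */
	void		(*precommit)(struct item *);
};

static void demo_precommit(struct item *ip)
{
	/*
	 * The shared cluster buffer would be locked here, after every other
	 * object in the transaction has already been locked and modified.
	 */
	printf("precommit item %lu: lock shared buffer last\n", ip->sort_key);
}

static int cmp_key(const void *a, const void *b)
{
	const struct item *x = a, *y = b;

	return (x->sort_key > y->sort_key) - (x->sort_key < y->sort_key);
}

int main(void)
{
	struct item items[] = {
		{ .sort_key = 133, .precommit = demo_precommit },
		{ .sort_key = 42,  .precommit = demo_precommit },
		{ .sort_key = 7,   .precommit = demo_precommit },
	};
	size_t i, n = sizeof(items) / sizeof(items[0]);

	/*
	 * Sort by key before running the hooks so every transaction acquires
	 * the same locks in the same order; this is the property that removes
	 * the lock inversion described above.
	 */
	qsort(items, n, sizeof(items[0]), cmp_key);
	for (i = 0; i < n; i++)
		items[i].precommit(&items[i]);
	return 0;
}

Sorting by a stable key and deferring the shared-buffer work to the last step is the whole trick; the sketch only models that shape, not the real log item machinery.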
Fixes: 298f7bec503f ("xfs: pin inode backing buffer to the inode log item") Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik --- fs/xfs/libxfs/xfs_log_format.h | 9 +- fs/xfs/libxfs/xfs_trans_inode.c | 113 ++---------------------- fs/xfs/xfs_inode_item.c | 149 ++++++++++++++++++++++++++++++++ fs/xfs/xfs_inode_item.h | 1 + 4 files changed, 166 insertions(+), 106 deletions(-) diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index f13e0809dc63..269573c82808 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -324,7 +324,6 @@ struct xfs_inode_log_format_32 { #define XFS_ILOG_DOWNER 0x200 /* change the data fork owner on replay */ #define XFS_ILOG_AOWNER 0x400 /* change the attr fork owner on replay */ - /* * The timestamps are dirty, but not necessarily anything else in the inode * core. Unlike the other fields above this one must never make it to disk @@ -333,6 +332,14 @@ struct xfs_inode_log_format_32 { */ #define XFS_ILOG_TIMESTAMP 0x4000 +/* + * The version field has been changed, but not necessarily anything else of + * interest. This must never make it to disk - it is used purely to ensure that + * the inode item ->precommit operation can update the fsync flag triggers + * in the inode item correctly. + */ +#define XFS_ILOG_IVERSION 0x8000 + #define XFS_ILOG_NONCORE (XFS_ILOG_DDATA | XFS_ILOG_DEXT | \ XFS_ILOG_DBROOT | XFS_ILOG_DEV | \ XFS_ILOG_ADATA | XFS_ILOG_AEXT | \ diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c index 8b5547073379..cb4796b6e693 100644 --- a/fs/xfs/libxfs/xfs_trans_inode.c +++ b/fs/xfs/libxfs/xfs_trans_inode.c @@ -40,9 +40,8 @@ xfs_trans_ijoin( iip->ili_lock_flags = lock_flags; ASSERT(!xfs_iflags_test(ip, XFS_ISTALE)); - /* - * Get a log_item_desc to point at the new item. - */ + /* Reset the per-tx dirty context and add the item to the tx. */ + iip->ili_dirty_flags = 0; xfs_trans_add_item(tp, &iip->ili_item); } @@ -76,17 +75,10 @@ xfs_trans_ichgtime( /* * This is called to mark the fields indicated in fieldmask as needing to be * logged when the transaction is committed. The inode must already be - * associated with the given transaction. - * - * The values for fieldmask are defined in xfs_inode_item.h. We always log all - * of the core inode if any of it has changed, and we always log all of the - * inline data/extents/b-tree root if any of them has changed. - * - * Grab and pin the cluster buffer associated with this inode to avoid RMW - * cycles at inode writeback time. Avoid the need to add error handling to every - * xfs_trans_log_inode() call by shutting down on read error. This will cause - * transactions to fail and everything to error out, just like if we return a - * read error in a dirty transaction and cancel it. + * associated with the given transaction. All we do here is record where the + * inode was dirtied and mark the transaction and inode log item dirty; + * everything else is done in the ->precommit log item operation after the + * changes in the transaction have been completed. 
*/ void xfs_trans_log_inode( @@ -96,7 +88,6 @@ xfs_trans_log_inode( { struct xfs_inode_log_item *iip = ip->i_itemp; struct inode *inode = VFS_I(ip); - uint iversion_flags = 0; ASSERT(iip); ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); @@ -104,18 +95,6 @@ xfs_trans_log_inode( tp->t_flags |= XFS_TRANS_DIRTY; - /* - * Don't bother with i_lock for the I_DIRTY_TIME check here, as races - * don't matter - we either will need an extra transaction in 24 hours - * to log the timestamps, or will clear already cleared fields in the - * worst case. - */ - if (inode->i_state & I_DIRTY_TIME) { - spin_lock(&inode->i_lock); - inode->i_state &= ~I_DIRTY_TIME; - spin_unlock(&inode->i_lock); - } - /* * First time we log the inode in a transaction, bump the inode change * counter if it is configured for this to occur. While we have the @@ -128,86 +107,10 @@ xfs_trans_log_inode( if (!test_and_set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags)) { if (IS_I_VERSION(inode) && inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE)) - iversion_flags = XFS_ILOG_CORE; - } - - /* - * If we're updating the inode core or the timestamps and it's possible - * to upgrade this inode to bigtime format, do so now. - */ - if ((flags & (XFS_ILOG_CORE | XFS_ILOG_TIMESTAMP)) && - xfs_has_bigtime(ip->i_mount) && - !xfs_inode_has_bigtime(ip)) { - ip->i_diflags2 |= XFS_DIFLAG2_BIGTIME; - flags |= XFS_ILOG_CORE; - } - - /* - * Inode verifiers do not check that the extent size hint is an integer - * multiple of the rt extent size on a directory with both rtinherit - * and extszinherit flags set. If we're logging a directory that is - * misconfigured in this way, clear the hint. - */ - if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && - (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && - (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) { - ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | - XFS_DIFLAG_EXTSZINHERIT); - ip->i_extsize = 0; - flags |= XFS_ILOG_CORE; + flags |= XFS_ILOG_IVERSION; } - /* - * Record the specific change for fdatasync optimisation. This allows - * fdatasync to skip log forces for inodes that are only timestamp - * dirty. - */ - spin_lock(&iip->ili_lock); - iip->ili_fsync_fields |= flags; - - if (!iip->ili_item.li_buf) { - struct xfs_buf *bp; - int error; - - /* - * We hold the ILOCK here, so this inode is not going to be - * flushed while we are here. Further, because there is no - * buffer attached to the item, we know that there is no IO in - * progress, so nothing will clear the ili_fields while we read - * in the buffer. Hence we can safely drop the spin lock and - * read the buffer knowing that the state will not change from - * here. - */ - spin_unlock(&iip->ili_lock); - error = xfs_imap_to_bp(ip->i_mount, tp, &ip->i_imap, &bp); - if (error) { - xfs_force_shutdown(ip->i_mount, SHUTDOWN_META_IO_ERROR); - return; - } - - /* - * We need an explicit buffer reference for the log item but - * don't want the buffer to remain attached to the transaction. - * Hold the buffer but release the transaction reference once - * we've attached the inode log item to the buffer log item - * list. - */ - xfs_buf_hold(bp); - spin_lock(&iip->ili_lock); - iip->ili_item.li_buf = bp; - bp->b_flags |= _XBF_INODES; - list_add_tail(&iip->ili_item.li_bio_list, &bp->b_li_list); - xfs_trans_brelse(tp, bp); - } - - /* - * Always OR in the bits from the ili_last_fields field. This is to - * coordinate with the xfs_iflush() and xfs_buf_inode_iodone() routines - * in the eventual clearing of the ili_fields bits. 
See the big comment - * in xfs_iflush() for an explanation of this coordination mechanism. - */ - iip->ili_fields |= (flags | iip->ili_last_fields | iversion_flags); - spin_unlock(&iip->ili_lock); + iip->ili_dirty_flags |= flags; } int diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index ca2941ab6cbc..91c847a84e10 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -29,6 +29,153 @@ static inline struct xfs_inode_log_item *INODE_ITEM(struct xfs_log_item *lip) return container_of(lip, struct xfs_inode_log_item, ili_item); } +static uint64_t +xfs_inode_item_sort( + struct xfs_log_item *lip) +{ + return INODE_ITEM(lip)->ili_inode->i_ino; +} + +/* + * Prior to finally logging the inode, we have to ensure that all the + * per-modification inode state changes are applied. This includes VFS inode + * state updates, format conversions, verifier state synchronisation and + * ensuring the inode buffer remains in memory whilst the inode is dirty. + * + * We have to be careful when we grab the inode cluster buffer due to lock + * ordering constraints. The unlinked inode modifications (xfs_iunlink_item) + * require AGI -> inode cluster buffer lock order. The inode cluster buffer is + * not locked until ->precommit, so it happens after everything else has been + * modified. + * + * Further, we have AGI -> AGF lock ordering, and with O_TMPFILE handling we + * have AGI -> AGF -> iunlink item -> inode cluster buffer lock order. Hence we + * cannot safely lock the inode cluster buffer in xfs_trans_log_inode() because + * it can be called on a inode (e.g. via bumplink/droplink) before we take the + * AGF lock modifying directory blocks. + * + * Rather than force a complete rework of all the transactions to call + * xfs_trans_log_inode() once and once only at the end of every transaction, we + * move the pinning of the inode cluster buffer to a ->precommit operation. This + * matches how the xfs_iunlink_item locks the inode cluster buffer, and it + * ensures that the inode cluster buffer locking is always done last in a + * transaction. i.e. we ensure the lock order is always AGI -> AGF -> inode + * cluster buffer. + * + * If we return the inode number as the precommit sort key then we'll also + * guarantee that the order all inode cluster buffer locking is the same all the + * inodes and unlink items in the transaction. + */ +static int +xfs_inode_item_precommit( + struct xfs_trans *tp, + struct xfs_log_item *lip) +{ + struct xfs_inode_log_item *iip = INODE_ITEM(lip); + struct xfs_inode *ip = iip->ili_inode; + struct inode *inode = VFS_I(ip); + unsigned int flags = iip->ili_dirty_flags; + + /* + * Don't bother with i_lock for the I_DIRTY_TIME check here, as races + * don't matter - we either will need an extra transaction in 24 hours + * to log the timestamps, or will clear already cleared fields in the + * worst case. + */ + if (inode->i_state & I_DIRTY_TIME) { + spin_lock(&inode->i_lock); + inode->i_state &= ~I_DIRTY_TIME; + spin_unlock(&inode->i_lock); + } + + /* + * If we're updating the inode core or the timestamps and it's possible + * to upgrade this inode to bigtime format, do so now. + */ + if ((flags & (XFS_ILOG_CORE | XFS_ILOG_TIMESTAMP)) && + xfs_has_bigtime(ip->i_mount) && + !xfs_inode_has_bigtime(ip)) { + ip->i_diflags2 |= XFS_DIFLAG2_BIGTIME; + flags |= XFS_ILOG_CORE; + } + + /* + * Inode verifiers do not check that the extent size hint is an integer + * multiple of the rt extent size on a directory with both rtinherit + * and extszinherit flags set. 
If we're logging a directory that is + * misconfigured in this way, clear the hint. + */ + if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && + (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && + (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) { + ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | + XFS_DIFLAG_EXTSZINHERIT); + ip->i_extsize = 0; + flags |= XFS_ILOG_CORE; + } + + /* + * Record the specific change for fdatasync optimisation. This allows + * fdatasync to skip log forces for inodes that are only timestamp + * dirty. Once we've processed the XFS_ILOG_IVERSION flag, convert it + * to XFS_ILOG_CORE so that the actual on-disk dirty tracking + * (ili_fields) correctly tracks that the version has changed. + */ + spin_lock(&iip->ili_lock); + iip->ili_fsync_fields |= (flags & ~XFS_ILOG_IVERSION); + if (flags & XFS_ILOG_IVERSION) + flags = ((flags & ~XFS_ILOG_IVERSION) | XFS_ILOG_CORE); + + if (!iip->ili_item.li_buf) { + struct xfs_buf *bp; + int error; + + /* + * We hold the ILOCK here, so this inode is not going to be + * flushed while we are here. Further, because there is no + * buffer attached to the item, we know that there is no IO in + * progress, so nothing will clear the ili_fields while we read + * in the buffer. Hence we can safely drop the spin lock and + * read the buffer knowing that the state will not change from + * here. + */ + spin_unlock(&iip->ili_lock); + error = xfs_imap_to_bp(ip->i_mount, tp, &ip->i_imap, &bp); + if (error) + return error; + + /* + * We need an explicit buffer reference for the log item but + * don't want the buffer to remain attached to the transaction. + * Hold the buffer but release the transaction reference once + * we've attached the inode log item to the buffer log item + * list. + */ + xfs_buf_hold(bp); + spin_lock(&iip->ili_lock); + iip->ili_item.li_buf = bp; + bp->b_flags |= _XBF_INODES; + list_add_tail(&iip->ili_item.li_bio_list, &bp->b_li_list); + xfs_trans_brelse(tp, bp); + } + + /* + * Always OR in the bits from the ili_last_fields field. This is to + * coordinate with the xfs_iflush() and xfs_buf_inode_iodone() routines + * in the eventual clearing of the ili_fields bits. See the big comment + * in xfs_iflush() for an explanation of this coordination mechanism. + */ + iip->ili_fields |= (flags | iip->ili_last_fields); + spin_unlock(&iip->ili_lock); + + /* + * We are done with the log item transaction dirty state, so clear it so + * that it doesn't pollute future transactions. + */ + iip->ili_dirty_flags = 0; + return 0; +} + /* * The logged size of an inode fork is always the current size of the inode * fork. This means that when an inode fork is relogged, the size of the logged @@ -662,6 +809,8 @@ xfs_inode_item_committing( } static const struct xfs_item_ops xfs_inode_item_ops = { + .iop_sort = xfs_inode_item_sort, + .iop_precommit = xfs_inode_item_precommit, .iop_size = xfs_inode_item_size, .iop_format = xfs_inode_item_format, .iop_pin = xfs_inode_item_pin, diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h index bbd836a44ff0..377e06007804 100644 --- a/fs/xfs/xfs_inode_item.h +++ b/fs/xfs/xfs_inode_item.h @@ -17,6 +17,7 @@ struct xfs_inode_log_item { struct xfs_log_item ili_item; /* common portion */ struct xfs_inode *ili_inode; /* inode ptr */ unsigned short ili_lock_flags; /* inode lock flags */ + unsigned int ili_dirty_flags; /* dirty in current tx */ /* * The ili_lock protects the interactions between the dirty state and * the flush state of the inode log item. 
This allows us to do atomic From patchwork Thu Sep 5 18:21:31 2024 X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792799
From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner , "Darrick J. Wong" , Dave Chinner , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 14/26] xfs: collect errors from inodegc for unlinked inode recovery Date: Thu, 5 Sep 2024 11:21:31 -0700 Message-ID: <20240905182144.2691920-15-leah.rumancik@gmail.com> In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> From: Dave Chinner [ Upstream commit d4d12c02bf5f768f1b423c7ae2909c5afdfe0d5f ] Unlinked list recovery requires that errors removing the inode from the unlinked list get fed back to the main recovery loop. Now that we offload the unlinking to the inodegc work, we don't get errors fed back when we trip over a corruption that prevents the inode from being removed from the unlinked list. This means we never clear the corrupt unlinked list bucket, resulting in runtime operations eventually tripping over it and shutting down. Fix this by collecting inodegc worker errors and feeding them back to the flush caller. This is largely best effort - the only context that really cares is log recovery, and it only flushes a single inode at a time so we don't need complex synchronised handling. Essentially the inodegc workers will capture the first error that occurs and the next flush will gather them and clear them. The flush itself will only report the first gathered error. In the cases where callers can return errors, propagate the collected inodegc flush error up the error handling chain. In the case of inode unlinked list recovery, there are several superfluous calls to flush queued unlinked inodes - xlog_recover_iunlink_bucket() guarantees that it has flushed the inodegc and collected errors before it returns. Hence nothing in the calling path needs to run a flush, even when an error is returned. Signed-off-by: Dave Chinner Reviewed-by: Darrick J.
Wong Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik --- fs/xfs/xfs_icache.c | 46 ++++++++++++++++++++++++++++++++-------- fs/xfs/xfs_icache.h | 4 ++-- fs/xfs/xfs_inode.c | 20 ++++++----------- fs/xfs/xfs_inode.h | 2 +- fs/xfs/xfs_log_recover.c | 19 ++++++++--------- fs/xfs/xfs_mount.h | 1 + fs/xfs/xfs_super.c | 1 + fs/xfs/xfs_trans.c | 4 +++- 8 files changed, 60 insertions(+), 37 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index f5568fa54039..4b040740678c 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -454,6 +454,27 @@ xfs_inodegc_queue_all( return ret; } +/* Wait for all queued work and collect errors */ +static int +xfs_inodegc_wait_all( + struct xfs_mount *mp) +{ + int cpu; + int error = 0; + + flush_workqueue(mp->m_inodegc_wq); + for_each_online_cpu(cpu) { + struct xfs_inodegc *gc; + + gc = per_cpu_ptr(mp->m_inodegc, cpu); + if (gc->error && !error) + error = gc->error; + gc->error = 0; + } + + return error; +} + /* * Check the validity of the inode we just found it the cache */ @@ -1490,15 +1511,14 @@ xfs_blockgc_free_space( if (error) return error; - xfs_inodegc_flush(mp); - return 0; + return xfs_inodegc_flush(mp); } /* * Reclaim all the free space that we can by scheduling the background blockgc * and inodegc workers immediately and waiting for them all to clear. */ -void +int xfs_blockgc_flush_all( struct xfs_mount *mp) { @@ -1519,7 +1539,7 @@ xfs_blockgc_flush_all( for_each_perag_tag(mp, agno, pag, XFS_ICI_BLOCKGC_TAG) flush_delayed_work(&pag->pag_blockgc_work); - xfs_inodegc_flush(mp); + return xfs_inodegc_flush(mp); } /* @@ -1841,13 +1861,17 @@ xfs_inodegc_set_reclaimable( * This is the last chance to make changes to an otherwise unreferenced file * before incore reclamation happens. */ -static void +static int xfs_inodegc_inactivate( struct xfs_inode *ip) { + int error; + trace_xfs_inode_inactivating(ip); - xfs_inactive(ip); + error = xfs_inactive(ip); xfs_inodegc_set_reclaimable(ip); + return error; + } void @@ -1879,8 +1903,12 @@ xfs_inodegc_worker( WRITE_ONCE(gc->shrinker_hits, 0); llist_for_each_entry_safe(ip, n, node, i_gclist) { + int error; + xfs_iflags_set(ip, XFS_INACTIVATING); - xfs_inodegc_inactivate(ip); + error = xfs_inodegc_inactivate(ip); + if (error && !gc->error) + gc->error = error; } memalloc_nofs_restore(nofs_flag); @@ -1904,13 +1932,13 @@ xfs_inodegc_push( * Force all currently queued inode inactivation work to run immediately and * wait for the work to finish. 
*/ -void +int xfs_inodegc_flush( struct xfs_mount *mp) { xfs_inodegc_push(mp); trace_xfs_inodegc_flush(mp, __return_address); - flush_workqueue(mp->m_inodegc_wq); + return xfs_inodegc_wait_all(mp); } /* diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h index 6cd180721659..da58984b80d2 100644 --- a/fs/xfs/xfs_icache.h +++ b/fs/xfs/xfs_icache.h @@ -59,7 +59,7 @@ int xfs_blockgc_free_dquots(struct xfs_mount *mp, struct xfs_dquot *udqp, unsigned int iwalk_flags); int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int iwalk_flags); int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_icwalk *icm); -void xfs_blockgc_flush_all(struct xfs_mount *mp); +int xfs_blockgc_flush_all(struct xfs_mount *mp); void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip); void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip); @@ -77,7 +77,7 @@ void xfs_blockgc_start(struct xfs_mount *mp); void xfs_inodegc_worker(struct work_struct *work); void xfs_inodegc_push(struct xfs_mount *mp); -void xfs_inodegc_flush(struct xfs_mount *mp); +int xfs_inodegc_flush(struct xfs_mount *mp); void xfs_inodegc_stop(struct xfs_mount *mp); void xfs_inodegc_start(struct xfs_mount *mp); void xfs_inodegc_cpu_dead(struct xfs_mount *mp, unsigned int cpu); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 54b707787f90..b0b4f6ac2397 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1620,16 +1620,7 @@ xfs_inactive_ifree( */ xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_ICOUNT, -1); - /* - * Just ignore errors at this point. There is nothing we can do except - * to try to keep going. Make sure it's not a silent error. - */ - error = xfs_trans_commit(tp); - if (error) - xfs_notice(mp, "%s: xfs_trans_commit returned error %d", - __func__, error); - - return 0; + return xfs_trans_commit(tp); } /* @@ -1696,12 +1687,12 @@ xfs_inode_needs_inactive( * now be truncated. Also, we clear all of the read-ahead state * kept for the inode here since the file is now closed. */ -void +int xfs_inactive( xfs_inode_t *ip) { struct xfs_mount *mp; - int error; + int error = 0; int truncate = 0; /* @@ -1742,7 +1733,7 @@ xfs_inactive( * reference to the inode at this point anyways. */ if (xfs_can_free_eofblocks(ip, true)) - xfs_free_eofblocks(ip); + error = xfs_free_eofblocks(ip); goto out; } @@ -1779,7 +1770,7 @@ xfs_inactive( /* * Free the inode. */ - xfs_inactive_ifree(ip); + error = xfs_inactive_ifree(ip); out: /* @@ -1787,6 +1778,7 @@ xfs_inactive( * the attached dquots. */ xfs_qm_dqdetach(ip); + return error; } /* diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index fa780f08dc89..225f6f93c2fa 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -470,7 +470,7 @@ enum layout_break_reason { (xfs_has_grpid((pip)->i_mount) || (VFS_I(pip)->i_mode & S_ISGID)) int xfs_release(struct xfs_inode *ip); -void xfs_inactive(struct xfs_inode *ip); +int xfs_inactive(struct xfs_inode *ip); int xfs_lookup(struct xfs_inode *dp, const struct xfs_name *name, struct xfs_inode **ipp, struct xfs_name *ci_name); int xfs_create(struct user_namespace *mnt_userns, diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 05e48523ea40..affe94356ed1 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -2711,7 +2711,9 @@ xlog_recover_iunlink_bucket( * just to flush the inodegc queue and wait for it to * complete. 
*/ - xfs_inodegc_flush(mp); + error = xfs_inodegc_flush(mp); + if (error) + break; } prev_agino = agino; @@ -2719,10 +2721,15 @@ xlog_recover_iunlink_bucket( } if (prev_ip) { + int error2; + ip->i_prev_unlinked = prev_agino; xfs_irele(prev_ip); + + error2 = xfs_inodegc_flush(mp); + if (error2 && !error) + return error2; } - xfs_inodegc_flush(mp); return error; } @@ -2789,7 +2796,6 @@ xlog_recover_iunlink_ag( * bucket and remaining inodes on it unreferenced and * unfreeable. */ - xfs_inodegc_flush(pag->pag_mount); xlog_recover_clear_agi_bucket(pag, bucket); } } @@ -2806,13 +2812,6 @@ xlog_recover_process_iunlinks( for_each_perag(log->l_mp, agno, pag) xlog_recover_iunlink_ag(pag); - - /* - * Flush the pending unlinked inodes to ensure that the inactivations - * are fully completed on disk and the incore inodes can be reclaimed - * before we signal that recovery is complete. - */ - xfs_inodegc_flush(log->l_mp); } STATIC void diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 69ddd5319634..c8e72f0d3965 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -62,6 +62,7 @@ struct xfs_error_cfg { struct xfs_inodegc { struct llist_head list; struct delayed_work work; + int error; /* approximate count of inodes in the list */ unsigned int items; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 12662b169b71..1c143c69da6e 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1089,6 +1089,7 @@ xfs_inodegc_init_percpu( #endif init_llist_head(&gc->list); gc->items = 0; + gc->error = 0; INIT_DELAYED_WORK(&gc->work, xfs_inodegc_worker); } return 0; diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index a772f60de4a2..b45879868f90 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -290,7 +290,9 @@ xfs_trans_alloc( * Do not perform a synchronous scan because callers can hold * other locks. 
*/ - xfs_blockgc_flush_all(mp); + error = xfs_blockgc_flush_all(mp); + if (error) + return error; want_retry = false; goto retry; } From patchwork Thu Sep 5 18:21:32 2024 X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792800
From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Long Li , Long Li , "Darrick J. Wong" , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 15/26] xfs: fix ag count overflow during growfs Date: Thu, 5 Sep 2024 11:21:32 -0700 Message-ID: <20240905182144.2691920-16-leah.rumancik@gmail.com> In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> From: Long Li [ Upstream commit c3b880acadc95d6e019eae5d669e072afda24f1b ] I found a corruption during growfs: XFS (loop0): Internal error agbno >= mp->m_sb.sb_agblocks at line 3661 of file fs/xfs/libxfs/xfs_alloc.c. Caller __xfs_free_extent+0x28e/0x3c0 CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257 Call Trace: dump_stack_lvl+0x50/0x70 xfs_corruption_error+0x134/0x150 __xfs_free_extent+0x2c1/0x3c0 xfs_ag_extend_space+0x291/0x3e0 xfs_growfs_data+0xd72/0xe90 xfs_file_ioctl+0x5f9/0x14a0 __x64_sys_ioctl+0x13e/0x1c0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd XFS (loop0): Corruption detected. Unmount and run xfs_repair XFS (loop0): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c. Caller xfs_growfs_data+0x691/0xe90 CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257 Call Trace: dump_stack_lvl+0x50/0x70 xfs_error_report+0x93/0xc0 xfs_trans_cancel+0x2c0/0x350 xfs_growfs_data+0x691/0xe90 xfs_file_ioctl+0x5f9/0x14a0 __x64_sys_ioctl+0x13e/0x1c0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f2d86706577 The bug can be reproduced with the following sequence: # truncate -s 1073741824 xfs_test.img # mkfs.xfs -f -b size=1024 -d agcount=4 xfs_test.img # truncate -s 2305843009213693952 xfs_test.img # mount -o loop xfs_test.img /mnt/test # xfs_growfs -D 1125899907891200 /mnt/test The root cause is that during growfs, user space passed a large value of newblocks to xfs_growfs_data_private(), and because the current sb_agblocks is too small, the new AG count will exceed UINT_MAX. Since the AG number type is unsigned int, the count overflows, which makes nagcount much smaller than the actual value. When extending the AG space, the delta blocks in xfs_resizefs_init_new_ags() will then be much larger than the actual value due to the incorrect nagcount, and may even exceed UINT_MAX. This causes corruption that is detected in __xfs_free_extent. Fix it by growing the filesystem up to the maximum allowed number of AGs instead of returning EINVAL when the new AG count overflows. Signed-off-by: Long Li Reviewed-by: Darrick J. Wong Signed-off-by: Leah Rumancik ---
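To make the overflow concrete, here is a minimal userspace sketch (not kernel code and not part of this patch) that replays the old nagcount arithmetic with the geometry from the reproducer above: sb_agblocks is 262144 one-kilobyte blocks and growfs is asked for 1125899907891200 blocks, so the true AG count needs 33 bits and silently wraps when stored in the 32-bit xfs_agnumber_t:

/*
 * Illustrative sketch only, built from the reproducer's numbers; the
 * kernel's xfs_agnumber_t is approximated here with uint32_t.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t sb_agblocks = 262144;          /* 1 GiB fs, 1k blocks, agcount=4 */
	uint64_t nb = 1125899907891200ULL;      /* block count passed to xfs_growfs -D */
	uint64_t nb_div = nb / sb_agblocks;     /* 4294967300: does not fit in 32 bits */
	uint64_t nb_mod = nb % sb_agblocks;     /* 0 for these numbers */

	/* old code: nagcount = nb_div + (nb_mod != 0), stored in xfs_agnumber_t */
	uint32_t nagcount = (uint32_t)(nb_div + (nb_mod != 0));

	printf("full AG count:      %llu\n", (unsigned long long)nb_div);
	printf("truncated nagcount: %u\n", nagcount);   /* prints 4 */
	return 0;
}

With the truncated nagcount the grow path computes a huge delta for far too few AGs, which matches the agbno >= sb_agblocks failure shown in the trace above.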
fs/xfs/libxfs/xfs_fs.h | 2 ++ fs/xfs/xfs_fsops.c | 13 +++++++++---- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 1cfd5bc6520a..9c60ebb328b4 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -257,6 +257,8 @@ typedef struct xfs_fsop_resblks { #define XFS_MAX_AG_BLOCKS (XFS_MAX_AG_BYTES / XFS_MIN_BLOCKSIZE) #define XFS_MAX_CRC_AG_BLOCKS (XFS_MAX_AG_BYTES / XFS_MIN_CRC_BLOCKSIZE) +#define XFS_MAX_AGNUMBER ((xfs_agnumber_t)(NULLAGNUMBER - 1)) + /* keep the maximum size under 2^31 by a small amount */ #define XFS_MAX_LOG_BYTES \ ((2 * 1024 * 1024 * 1024ULL) - XFS_MIN_LOG_BYTES) diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 332da0d7b85c..77b14f788214 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -115,11 +115,16 @@ xfs_growfs_data_private( nb_div = nb; nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks); - nagcount = nb_div + (nb_mod != 0); - if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) { - nagcount--; - nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks; + if (nb_mod && nb_mod >= XFS_MIN_AG_BLOCKS) + nb_div++; + else if (nb_mod) + nb = nb_div * mp->m_sb.sb_agblocks; + + if (nb_div > XFS_MAX_AGNUMBER + 1) { + nb_div = XFS_MAX_AGNUMBER + 1; + nb = nb_div * mp->m_sb.sb_agblocks; } + nagcount = nb_div; delta = nb - mp->m_sb.sb_dblocks; /* * Reject filesystems with a single AG because they are not From patchwork Thu Sep 5 18:21:33 2024 X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792801
From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner , syzbot+6ae213503fb12e87934f@syzkaller.appspotmail.com, "Darrick J. Wong" , Dave Chinner , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 16/26] xfs: remove WARN when dquot cache insertion fails Date: Thu, 5 Sep 2024 11:21:33 -0700 Message-ID: <20240905182144.2691920-17-leah.rumancik@gmail.com> In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> From: Dave Chinner [ Upstream commit 4b827b3f305d1fcf837265f1e12acc22ee84327c ] It just creates unnecessary bot noise these days. Reported-by: syzbot+6ae213503fb12e87934f@syzkaller.appspotmail.com Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik --- fs/xfs/xfs_dquot.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index 8fb90da89787..7f071757f278 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -798,7 +798,6 @@ xfs_qm_dqget_cache_insert( error = radix_tree_insert(tree, id, dqp); if (unlikely(error)) { /* Duplicate found! Caller must try again.
*/ - WARN_ON(error != -EEXIST); mutex_unlock(&qi->qi_tree_lock); trace_xfs_dqget_dup(dqp); return error; From patchwork Thu Sep 5 18:21:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792802 Received: from mail-pl1-f171.google.com (mail-pl1-f171.google.com [209.85.214.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EDD59183CDB for ; Thu, 5 Sep 2024 18:22:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560531; cv=none; b=MJCxWtleEhDJyM4RFO7wzRB1YSeHrJ0W3a7WHekRTE35u8GXr4ZBGiEok0JAqYAs7iGZALv0mxtuN7duVvchmovoF6MEzMyDvsyBIluKLdpEYWhrAd8pza2VFq2QXWYw/4GciO6guzfDSJfbeZNlHql0FrNrzPhai7TX7pB11xk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560531; c=relaxed/simple; bh=tvVjVyX4Co3HHAUVk7G8k6na1MHpoiE1Fsg97Fi+MMk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=FJzmnOTj4zIyi7IHbwkYjF+853zCBdfoV05x8kN4Ee0+v8wlItOP1sowIZI9ApbaNzZ5GgsnPo10hai6rkfk+7b6JpnVIHMkaH3VwwgpEXjrc2NANNGqEhJKkAAbhu9lubAHYb4tDhpzLnaypO6stusuvR/3olBn/FVfFOt5rqQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=eJqdmftA; arc=none smtp.client-ip=209.85.214.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="eJqdmftA" Received: by mail-pl1-f171.google.com with SMTP id d9443c01a7336-2054e22ce3fso11514255ad.2 for ; Thu, 05 Sep 2024 11:22:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560529; x=1726165329; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=dxjIPvBEzZwS3nqnnAjs8rwzUoANMhBnXbOCMyUPgQ4=; b=eJqdmftAXLga2LYtToDCCaopsEKa7bh2xaIgeX/ocuvwJCmrdynJczBqdGhtI4Nlmr WQvRiHU2jFwj9SyicVFVJDIwlEMVz9fBzNUTB4S/ULJ2vBvCjFiOtFHfqUIGuVUyiFfO 9YTwLghufovyDCrcjzdDfHsDkchbmh9a49XGrULeLlkMblo0ungZ/CtQ/NaCrxqOia95 WJI4TCov5NhYN1Hp/claOUjht1EjBxFwn/LCuOmU3RHHrBGtDwBzFZYtNWKvdixoVKXF wbpC6CKPfIYLF1cDkDXgR9FFNGt2l4endCbR962UdtLT96G/BSukIY1NJRGr/G8pThek 5sOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560529; x=1726165329; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dxjIPvBEzZwS3nqnnAjs8rwzUoANMhBnXbOCMyUPgQ4=; b=g8BL0ReyQbmkvsPrjrSWC7gpd854S6AqRtUic/uzqGIZ+Ao/U9UeXt9L6eKFfxlswq avBT3NcE95TrhPgtHQ5TTRnnlCdlKFxxAJ2zcL+ecdodtudS5q1mMV3IMV9WMdwDH4jg OfKJVczuGWqHyPONRXo9bkdn4LKqaeFpeYIhrtFMNyoypgHEYAruHxtlWzASdxZh4YqD 7izZcAezRRlckYCorT/i/UA42J7ZxZAfOYl2TPGly9fEWe18hq54aKIJMJ2aQ1Dt6Ley BFPuYCEUNWITaguDl6ocY6avvAAXJhxos100KA9ve0iZ7VNG05MsDyOmSfWXku3OlXrE Sm7A== X-Gm-Message-State: AOJu0Yz1wRhTY13ZULH6Pzi3psO03CSzuKabmz+XuyMraMpQrGgZ1wXo 
nU8mUWNtP70UIKYpc4DXe+biJ1VhNpQg03/9YhJZ7TZsoHZ46g4stscQiInw X-Google-Smtp-Source: AGHT+IFn0FJMh5Q9o13moSx2rPZQ+o1giCnVncqRWCspIIddgw7+vh36EjLeU7bzeDBwxEfbnilfHA== X-Received: by 2002:a17:902:d487:b0:205:6d54:1ce6 with SMTP id d9443c01a7336-20699b36edcmr108563135ad.57.1725560529066; Thu, 05 Sep 2024 11:22:09 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:08 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Shiyang Ruan , "Darrick J. Wong" , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 17/26] xfs: fix the calculation for "end" and "length" Date: Thu, 5 Sep 2024 11:21:34 -0700 Message-ID: <20240905182144.2691920-18-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Shiyang Ruan [ Upstream commit 5cf32f63b0f4c520460c1a5dd915dc4f09085f29 ] The value of "end" should be "start + length - 1". Signed-off-by: Shiyang Ruan Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik --- fs/xfs/xfs_notify_failure.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c index c4078d0ec108..4a9bbd3fe120 100644 --- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -114,7 +114,8 @@ xfs_dax_notify_ddev_failure( int error = 0; xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, daddr); xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, fsbno); - xfs_fsblock_t end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen); + xfs_fsblock_t end_fsbno = XFS_DADDR_TO_FSB(mp, + daddr + bblen - 1); xfs_agnumber_t end_agno = XFS_FSB_TO_AGNO(mp, end_fsbno); error = xfs_trans_alloc_empty(mp, &tp); @@ -210,7 +211,7 @@ xfs_dax_notify_failure( ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1; /* Ignore the range out of filesystem area */ - if (offset + len < ddev_start) + if (offset + len - 1 < ddev_start) return -ENXIO; if (offset > ddev_end) return -ENXIO; @@ -222,8 +223,8 @@ xfs_dax_notify_failure( len -= ddev_start - offset; offset = 0; } - if (offset + len > ddev_end) - len -= ddev_end - offset; + if (offset + len - 1 > ddev_end) + len = ddev_end - offset + 1; return xfs_dax_notify_ddev_failure(mp, BTOBB(offset), BTOBB(len), mf_flags); From patchwork Thu Sep 5 18:21:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792803 Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C8DD214F125 for ; Thu, 5 Sep 2024 18:22:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560532; cv=none; 
From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, "Darrick J.
Wong" , shrikanth hegde , Ritesh Harjani , Dave Chinner , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 18/26] xfs: load uncached unlinked inodes into memory on demand Date: Thu, 5 Sep 2024 11:21:35 -0700 Message-ID: <20240905182144.2691920-19-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 68b957f64fca1930164bfc6d6d379acdccd547d7 ] shrikanth hegde reports that filesystems fail shortly after mount with the following failure: WARNING: CPU: 56 PID: 12450 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs] This of course is the WARN_ON_ONCE in xfs_iunlink_lookup: ip = radix_tree_lookup(&pag->pag_ici_root, agino); if (WARN_ON_ONCE(!ip || !ip->i_ino)) { ... } From diagnostic data collected by the bug reporters, it would appear that we cleanly mounted a filesystem that contained unlinked inodes. Unlinked inodes are only processed as a final step of log recovery, which means that clean mounts do not process the unlinked list at all. Prior to the introduction of the incore unlinked lists, this wasn't a problem because the unlink code would (very expensively) traverse the entire ondisk metadata iunlink chain to keep things up to date. However, the incore unlinked list code complains when it realizes that it is out of sync with the ondisk metadata and shuts down the fs, which is bad. Ritesh proposed to solve this problem by unconditionally parsing the unlinked lists at mount time, but this imposes a mount time cost for every filesystem to catch something that should be very infrequent. Instead, let's target the places where we can encounter a next_unlinked pointer that refers to an inode that is not in cache, and load it into cache. Note: This patch does not address the problem of iget loading an inode from the middle of the iunlink list and needing to set i_prev_unlinked correctly. Reported-by: shrikanth hegde Triaged-by: Ritesh Harjani Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Leah Rumancik --- fs/xfs/xfs_inode.c | 80 +++++++++++++++++++++++++++++++++++++++++++--- fs/xfs/xfs_trace.h | 25 +++++++++++++++ 2 files changed, 100 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index b0b4f6ac2397..4e73dd4a4d82 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1829,12 +1829,17 @@ xfs_iunlink_lookup( rcu_read_lock(); ip = radix_tree_lookup(&pag->pag_ici_root, agino); + if (!ip) { + /* Caller can handle inode not being in memory. */ + rcu_read_unlock(); + return NULL; + } /* - * Inode not in memory or in RCU freeing limbo should not happen. - * Warn about this and let the caller handle the failure. + * Inode in RCU freeing limbo should not happen. Warn about this and + * let the caller handle the failure. */ - if (WARN_ON_ONCE(!ip || !ip->i_ino)) { + if (WARN_ON_ONCE(!ip->i_ino)) { rcu_read_unlock(); return NULL; } @@ -1843,7 +1848,10 @@ xfs_iunlink_lookup( return ip; } -/* Update the prev pointer of the next agino. */ +/* + * Update the prev pointer of the next agino. Returns -ENOLINK if the inode + * is not in cache. 
+ */ static int xfs_iunlink_update_backref( struct xfs_perag *pag, @@ -1858,7 +1866,8 @@ xfs_iunlink_update_backref( ip = xfs_iunlink_lookup(pag, next_agino); if (!ip) - return -EFSCORRUPTED; + return -ENOLINK; + ip->i_prev_unlinked = prev_agino; return 0; } @@ -1902,6 +1911,62 @@ xfs_iunlink_update_bucket( return 0; } +/* + * Load the inode @next_agino into the cache and set its prev_unlinked pointer + * to @prev_agino. Caller must hold the AGI to synchronize with other changes + * to the unlinked list. + */ +STATIC int +xfs_iunlink_reload_next( + struct xfs_trans *tp, + struct xfs_buf *agibp, + xfs_agino_t prev_agino, + xfs_agino_t next_agino) +{ + struct xfs_perag *pag = agibp->b_pag; + struct xfs_mount *mp = pag->pag_mount; + struct xfs_inode *next_ip = NULL; + xfs_ino_t ino; + int error; + + ASSERT(next_agino != NULLAGINO); + +#ifdef DEBUG + rcu_read_lock(); + next_ip = radix_tree_lookup(&pag->pag_ici_root, next_agino); + ASSERT(next_ip == NULL); + rcu_read_unlock(); +#endif + + xfs_info_ratelimited(mp, + "Found unrecovered unlinked inode 0x%x in AG 0x%x. Initiating recovery.", + next_agino, pag->pag_agno); + + /* + * Use an untrusted lookup just to be cautious in case the AGI has been + * corrupted and now points at a free inode. That shouldn't happen, + * but we'd rather shut down now since we're already running in a weird + * situation. + */ + ino = XFS_AGINO_TO_INO(mp, pag->pag_agno, next_agino); + error = xfs_iget(mp, tp, ino, XFS_IGET_UNTRUSTED, 0, &next_ip); + if (error) + return error; + + /* If this is not an unlinked inode, something is very wrong. */ + if (VFS_I(next_ip)->i_nlink != 0) { + error = -EFSCORRUPTED; + goto rele; + } + + next_ip->i_prev_unlinked = prev_agino; + trace_xfs_iunlink_reload_next(next_ip); +rele: + ASSERT(!(VFS_I(next_ip)->i_state & I_DONTCACHE)); + xfs_irele(next_ip); + return error; +} + static int xfs_iunlink_insert_inode( struct xfs_trans *tp, @@ -1933,6 +1998,8 @@ xfs_iunlink_insert_inode( * inode. 
*/ error = xfs_iunlink_update_backref(pag, agino, next_agino); + if (error == -ENOLINK) + error = xfs_iunlink_reload_next(tp, agibp, agino, next_agino); if (error) return error; @@ -2027,6 +2094,9 @@ xfs_iunlink_remove_inode( */ error = xfs_iunlink_update_backref(pag, ip->i_prev_unlinked, ip->i_next_unlinked); + if (error == -ENOLINK) + error = xfs_iunlink_reload_next(tp, agibp, ip->i_prev_unlinked, + ip->i_next_unlinked); if (error) return error; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 5587108d5678..d713e10dff8a 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3679,6 +3679,31 @@ TRACE_EVENT(xfs_iunlink_update_dinode, __entry->new_ptr) ); +TRACE_EVENT(xfs_iunlink_reload_next, + TP_PROTO(struct xfs_inode *ip), + TP_ARGS(ip), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_agnumber_t, agno) + __field(xfs_agino_t, agino) + __field(xfs_agino_t, prev_agino) + __field(xfs_agino_t, next_agino) + ), + TP_fast_assign( + __entry->dev = ip->i_mount->m_super->s_dev; + __entry->agno = XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino); + __entry->agino = XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino); + __entry->prev_agino = ip->i_prev_unlinked; + __entry->next_agino = ip->i_next_unlinked; + ), + TP_printk("dev %d:%d agno 0x%x agino 0x%x prev_unlinked 0x%x next_unlinked 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->agno, + __entry->agino, + __entry->prev_agino, + __entry->next_agino) +); + DECLARE_EVENT_CLASS(xfs_ag_inode_class, TP_PROTO(struct xfs_inode *ip), TP_ARGS(ip), From patchwork Thu Sep 5 18:21:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792804 Received: from mail-pl1-f174.google.com (mail-pl1-f174.google.com [209.85.214.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B50B1183CDB for ; Thu, 5 Sep 2024 18:22:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560533; cv=none; b=M0Jse0b5JZToNU46MMoU17cxeODTFle7UuDxqfVcRJy3dZthnTdq2usVVnBQ0MWI9Gdx3m86ZEhGH0oNdK0V20EoG4xAs/w+BgUYinm3boD+zGi0CBbCvIacGd5Nhb/iO8123L8wciOXPSQB4b1RdT4dBjg4ban3aZRpGFqBgX0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560533; c=relaxed/simple; bh=NeOssh5ymZtU9jtpZMoN7p+YAqNkBque6S5t017xNpY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=u/cRrNkFVoOL2/HNIEYCMbGMIn7tIFQ1umxyAwCnFhS+cSjJiFqhxcrU6YAqB379bqd9vWPsuQfvHAmnVhYCEeeSTrsd7X3cJXIXiWk7tYJmjXbjCN3hTFqp+Yk4uw0N2URrlUrFW/wlBLpjo4SCgo3p2ILm06RCAmVSj4qvUfc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=MDk2Bo47; arc=none smtp.client-ip=209.85.214.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="MDk2Bo47" Received: by mail-pl1-f174.google.com with SMTP id d9443c01a7336-201d5af11a4so10863005ad.3 for ; Thu, 05 Sep 2024 11:22:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=gmail.com; s=20230601; t=1725560531; x=1726165331; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=IRWzOc3+/EMvTO9f5DEXYWHVnsLlTM8gbmfkV3xHmBc=; b=MDk2Bo47tnheNSICC8Abq9mnSk0Hk/yWdloN6CG7F9oKwZqfaji6yCwOsmlG9F5hak mH9VMe480NAbQ+9jbsIh+MlciEXRlrvlhGtNoyi0OBsWTU4rcKy+Ham9v+gn23M1ebad fN+e5RqAmghI27wKoRQA5MDRgZ9oqsxxyXNg1MePXqje2AD/yUuK6ljTpUUdB82ix7S+ dnq0UAgWzQv4IgJFzQbVhzkPO/TboGfJ/0Urg4sJ+NRVdFVxp2LvU8MhdPsPzkImuMfT 2nTFfEOx0wAhU+Xc2JQytG8Jg7p6luPvaNK7aRhYI4y4Zsutc3d4DSWvp9CU6WLEKq+J IENg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560531; x=1726165331; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=IRWzOc3+/EMvTO9f5DEXYWHVnsLlTM8gbmfkV3xHmBc=; b=P36KOFzYbH2BHH6fGoTt0W//KCLTxAIShbhjC+vGfKC25+6wxMG1wCQSB6gq9RCpoo /p3OwIMbGf4N667gKuGPyR1j+6dDRzqCgm5T9wscKYdQIcGI15Y/sm1z4fWpWrAL6wa3 K7mx3/6mbC6URhCjodgx6Q3HhH0XqAXbny7FWUcQN9oWlXl+Z6umm8jbw9cNAayE5SGa aCECVTdf7cYvpFMHZd5J+dFIeofdtxqrZiHW725F1mztgYU73yPpO+hF+KcFaYrKW6Ag TzMfwNNiZHE+HPtFVuSNlrHoW4i/VD982jnynRPR8gKOf3x7nu1w7IVAsIWAWqnjO2q0 U7cw== X-Gm-Message-State: AOJu0YzambJUy/26tuELynUJOKKdSReb+F3DMNYbKmbQrOl4KukOwSyT ANUtRtXCLDXcZIzUs/dQ/BBuMJ6GU9LTWfbVrNLwYJG5RDdf3uXjNTEtuBFe X-Google-Smtp-Source: AGHT+IFlZXYwKKzfSVCvQAD/F7EYjBHt8mO4L4o/4IGfR/VxKkT251PJXIXN5RAsC86eMXDmLpAM5A== X-Received: by 2002:a17:903:32cf:b0:205:753e:b46d with SMTP id d9443c01a7336-205753eb5aamr221051195ad.40.1725560531001; Thu, 05 Sep 2024 11:22:11 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:10 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, "Darrick J. Wong" , syzbot+c103d3808a0de5faaf80@syzkaller.appspotmail.com, Dave Chinner , Dave Chinner , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 19/26] xfs: fix negative array access in xfs_getbmap Date: Thu, 5 Sep 2024 11:21:36 -0700 Message-ID: <20240905182144.2691920-20-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 1bba82fe1afac69c85c1f5ea137c8e73de3c8032 ] In commit 8ee81ed581ff, Ye Bin complained about an ASSERT in the bmapx code that trips if we encounter a delalloc extent after flushing the pagecache to disk. The ioctl code does not hold MMAPLOCK so it's entirely possible that a racing write page fault can create a delalloc extent after the file has been flushed. The proposed solution was to replace the assertion with an early return that avoids filling out the bmap recordset with a delalloc entry if the caller didn't ask for it. At the time, I recall thinking that the forward logic sounded ok, but felt hesitant because I suspected that changing this code would cause something /else/ to burst loose due to some other subtlety. syzbot of course found that subtlety. 
If all the extent mappings found after the flush are delalloc mappings, we'll reach the end of the data fork without ever incrementing bmv->bmv_entries. This is new, since before we'd have emitted the delalloc mappings even though the caller didn't ask for them. Once we reach the end, we'll try to set BMV_OF_LAST on the -1st entry (because bmv_entries is zero) and go corrupt something else in memory. Yay. I really dislike all these stupid patches that fiddle around with debug code and break things that otherwise worked well enough. Nobody was complaining that calling XFS_IOC_BMAPX without BMV_IF_DELALLOC would return BMV_OF_DELALLOC records, and now we've gone from "weird behavior that nobody cared about" to "bad behavior that must be addressed immediately". Maybe I'll just ignore anything from Huawei from now on for my own sake. Reported-by: syzbot+c103d3808a0de5faaf80@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-xfs/20230412024907.GP360889@frogsfrogsfrogs/ Fixes: 8ee81ed581ff ("xfs: fix BUG_ON in xfs_getbmap()") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik --- fs/xfs/xfs_bmap_util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 351087cde27e..ce8e17ab5434 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -558,7 +558,9 @@ xfs_getbmap( if (!xfs_iext_next_extent(ifp, &icur, &got)) { xfs_fileoff_t end = XFS_B_TO_FSB(mp, XFS_ISIZE(ip)); - out[bmv->bmv_entries - 1].bmv_oflags |= BMV_OF_LAST; + if (bmv->bmv_entries > 0) + out[bmv->bmv_entries - 1].bmv_oflags |= + BMV_OF_LAST; if (whichfork != XFS_ATTR_FORK && bno < end && !xfs_getbmap_full(bmv)) { From patchwork Thu Sep 5 18:21:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792805 Received: from mail-pl1-f170.google.com (mail-pl1-f170.google.com [209.85.214.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1F83914F125 for ; Thu, 5 Sep 2024 18:22:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560534; cv=none; b=lpp7YavCyT/D0OXaA0t7JXjKD6AWl5R8t123ne5cV01VDUU5eox329Mp99Es8PWWYGueIRVKYw2LG/vWqx7PUna66Catl+MF+qf5dzR2dGePxvbMlC1p+SlGX35yfO0dUb50EhJ8Q+B8kiATRTBEMsleuiYs5WEHy6VNZgch3+o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560534; c=relaxed/simple; bh=pA7IeciRDZnn/fhIKUcxpC/KgCMoi6mYBXsq0+FE4oU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ys0vNOwJuR2uCN3jncdBHzLCqsivEIk7U+0OcRckNcTgPZRsiC+QJS/jppl2zBAz7xdqTSNy8shF0XsO6oFwHiGKFR2n0kJR24HtQVlXIjmiDWgpdJiEH1UGRLl3U9BbRowLF8k+4SU1Fachm9ooKIcB7ZlPFQaqpucophGE3RE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=HR+L5HAc; arc=none smtp.client-ip=209.85.214.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com 
header.i=@gmail.com header.b="HR+L5HAc" Received: by mail-pl1-f170.google.com with SMTP id d9443c01a7336-2057835395aso11026135ad.3 for ; Thu, 05 Sep 2024 11:22:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560532; x=1726165332; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=p774WjHxeJNubvCQIBGY2WST1GkRuIocSWLCfzH0nJQ=; b=HR+L5HAcXQd5pOk3rXt91zpcwT1M549rCKyVapJUjkxGrt5INAJKIPSuNve6wbSRdJ RUJVPxl3gAT8dq7dbZG66EtP3gMp7uqFM6b8Zc8rkJ/nfyBmNS6N5Bx/Ek04ofwnfQs6 B053mDwQIIGWL1gdLeo3AbFA//PGzUzDOzC7qfC9uPBRvYHosS7hTkazg1+DneywAFgQ AlrEmi4UAm/01QhNCdcTGoyHj3j7EFoTYmDG4OAmvjBGW/HwrlBP9dCcEmM3z5HME80Z FX2xmPSLtmez7Y4WG3EBf51r6MXDl6R56wwiBzG8YhAFkL9T5xyRrjfHunsFv6QcrRYA CeSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560532; x=1726165332; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=p774WjHxeJNubvCQIBGY2WST1GkRuIocSWLCfzH0nJQ=; b=Pb1DSfqTGQBX7/YMxlp7xI4egzDC81frLvK80kj/+6V+VAHZli4lB5wZWi0qyjaot5 2a7P0lZygeKtDkHjMgTn7zZZRY1cwZ4dBw99JyoI2xUsx/iZVQ97tYJPNM961p30a05E He3AD9Br2siCIQYspmFU3UEcHLTRfzWfnal39TV6heXcswKERq5T9kb0UqYDJkh+Rj7M T2mNZ6lyWAGs9dJDD5nRHRCDUNB6L0AmrznS0dc1nXrOWJCheCc2ymDYM4wSHJWEfHK4 TR0kUP0JopOUim7RgBkz8R/KMucrkz14Hb0GzSK1FXjfedjyXkUuw9x4Xb+rK6nGHX+i FZQA== X-Gm-Message-State: AOJu0YyEBV0EDzQxB1PDs0+NgJ8XW+imEASeNtpl86Htj8sJn51L8zso 62RbFhxgWj9qwoU61tK1nuQ81byHtkYzTKAUCuWzV3sdEFgOYHY2QhEjQjUU X-Google-Smtp-Source: AGHT+IHjez96/OK32GaQAzuoFz3Iqjxit/oYRi3aUhZS/tQxlBmvRxcY1xD6K1ms12mPQpjyfvyiCQ== X-Received: by 2002:a17:902:d511:b0:206:bb29:31a3 with SMTP id d9443c01a7336-206bb293493mr61262625ad.62.1725560532334; Thu, 05 Sep 2024 11:22:12 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:11 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Dave Chinner , Luis Chamberlain , Christoph Hellwig , "Darrick J. Wong" , Chandan Babu R , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 20/26] xfs: fix unlink vs cluster buffer instantiation race Date: Thu, 5 Sep 2024 11:21:37 -0700 Message-ID: <20240905182144.2691920-21-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 348a1983cf4cf5099fc398438a968443af4c9f65 ] Luis has been reporting an assert failure when freeing an inode cluster during inode inactivation for a while. The assert looks like: XFS: Assertion failed: bp->b_flags & XBF_DONE, file: fs/xfs/xfs_trans_buf.c, line: 241 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:102! 
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 4 PID: 73 Comm: kworker/4:1 Not tainted 6.10.0-rc1 #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Workqueue: xfs-inodegc/loop5 xfs_inodegc_worker [xfs] RIP: 0010:assfail (fs/xfs/xfs_message.c:102) xfs RSP: 0018:ffff88810188f7f0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88816e748250 RCX: 1ffffffff844b0e7 RDX: 0000000000000004 RSI: ffff88810188f558 RDI: ffffffffc2431fa0 RBP: 1ffff11020311f01 R08: 0000000042431f9f R09: ffffed1020311e9b R10: ffff88810188f4df R11: ffffffffac725d70 R12: ffff88817a3f4000 R13: ffff88812182f000 R14: ffff88810188f998 R15: ffffffffc2423f80 FS: 0000000000000000(0000) GS:ffff8881c8400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055fe9d0f109c CR3: 000000014426c002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:241 (discriminator 1)) xfs xfs_imap_to_bp (fs/xfs/xfs_trans.h:210 fs/xfs/libxfs/xfs_inode_buf.c:138) xfs xfs_inode_item_precommit (fs/xfs/xfs_inode_item.c:145) xfs xfs_trans_run_precommits (fs/xfs/xfs_trans.c:931) xfs __xfs_trans_commit (fs/xfs/xfs_trans.c:966) xfs xfs_inactive_ifree (fs/xfs/xfs_inode.c:1811) xfs xfs_inactive (fs/xfs/xfs_inode.c:2013) xfs xfs_inodegc_worker (fs/xfs/xfs_icache.c:1841 fs/xfs/xfs_icache.c:1886) xfs process_one_work (kernel/workqueue.c:3231) worker_thread (kernel/workqueue.c:3306 (discriminator 2) kernel/workqueue.c:3393 (discriminator 2)) kthread (kernel/kthread.c:389) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) And occurs when the the inode precommit handlers is attempt to look up the inode cluster buffer to attach the inode for writeback. The trail of logic that I can reconstruct is as follows. 1. the inode is clean when inodegc runs, so it is not attached to a cluster buffer when precommit runs. 2. #1 implies the inode cluster buffer may be clean and not pinned by dirty inodes when inodegc runs. 3. #2 implies that the inode cluster buffer can be reclaimed by memory pressure at any time. 4. The assert failure implies that the cluster buffer was attached to the transaction, but not marked done. It had been accessed earlier in the transaction, but not marked done. 5. #4 implies the cluster buffer has been invalidated (i.e. marked stale). 6. #5 implies that the inode cluster buffer was instantiated uninitialised in the transaction in xfs_ifree_cluster(), which only instantiates the buffers to invalidate them and never marks them as done. Given factors 1-3, this issue is highly dependent on timing and environmental factors. Hence the issue can be very difficult to reproduce in some situations, but highly reliable in others. Luis has an environment where it can be reproduced easily by g/531 but, OTOH, I've reproduced it only once in ~2000 cycles of g/531. I think the fix is to have xfs_ifree_cluster() set the XBF_DONE flag on the cluster buffers, even though they may not be initialised. The reasons why I think this is safe are: 1. A buffer cache lookup hit on a XBF_STALE buffer will clear the XBF_DONE flag. Hence all future users of the buffer know they have to re-initialise the contents before use and mark it done themselves. 2. 
xfs_trans_binval() sets the XFS_BLI_STALE flag, which means the buffer remains locked until the journal commit completes and the buffer is unpinned. Hence once marked XBF_STALE/XFS_BLI_STALE by xfs_ifree_cluster(), the only context that can access the freed buffer is the currently running transaction. 3. #2 implies that future buffer lookups in the currently running transaction will hit the transaction match code and not the buffer cache. Hence XBF_STALE and XFS_BLI_STALE will not be cleared unless the transaction initialises and logs the buffer with valid contents again. At which point, the buffer will be marked marked XBF_DONE again, so having XBF_DONE already set on the stale buffer is a moot point. 4. #2 also implies that any concurrent access to that cluster buffer will block waiting on the buffer lock until the inode cluster has been fully freed and is no longer an active inode cluster buffer. 5. #4 + #1 means that any future user of the disk range of that buffer will always see the range of disk blocks covered by the cluster buffer as not done, and hence must initialise the contents themselves. 6. Setting XBF_DONE in xfs_ifree_cluster() then means the unlinked inode precommit code will see a XBF_DONE buffer from the transaction match as it expects. It can then attach the stale but newly dirtied inode to the stale but newly dirtied cluster buffer without unexpected failures. The stale buffer will then sail through the journal and do the right thing with the attached stale inode during unpin. Hence the fix is just one line of extra code. The explanation of why we have to set XBF_DONE in xfs_ifree_cluster, OTOH, is long and complex.... Fixes: 82842fee6e59 ("xfs: fix AGF vs inode cluster buffer deadlock") Signed-off-by: Dave Chinner Tested-by: Luis Chamberlain Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Leah Rumancik --- fs/xfs/xfs_inode.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 4e73dd4a4d82..8c7cbe7f47ef 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2297,11 +2297,26 @@ xfs_ifree_cluster( * This buffer may not have been correctly initialised as we * didn't read it from disk. That's not important because we are * only using to mark the buffer as stale in the log, and to - * attach stale cached inodes on it. That means it will never be - * dispatched for IO. If it is, we want to know about it, and we - * want it to fail. We can acheive this by adding a write - * verifier to the buffer. + * attach stale cached inodes on it. + * + * For the inode that triggered the cluster freeing, this + * attachment may occur in xfs_inode_item_precommit() after we + * have marked this buffer stale. If this buffer was not in + * memory before xfs_ifree_cluster() started, it will not be + * marked XBF_DONE and this will cause problems later in + * xfs_inode_item_precommit() when we trip over a (stale, !done) + * buffer to attached to the transaction. + * + * Hence we have to mark the buffer as XFS_DONE here. This is + * safe because we are also marking the buffer as XBF_STALE and + * XFS_BLI_STALE. That means it will never be dispatched for + * IO and it won't be unlocked until the cluster freeing has + * been committed to the journal and the buffer unpinned. If it + * is written, we want to know about it, and we want it to + * fail. We can acheive this by adding a write verifier to the + * buffer. 
*/ + bp->b_flags |= XBF_DONE; bp->b_ops = &xfs_inode_buf_ops; /* From patchwork Thu Sep 5 18:21:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792806 Received: from mail-pl1-f175.google.com (mail-pl1-f175.google.com [209.85.214.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0CF711862B8 for ; Thu, 5 Sep 2024 18:22:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.175 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560535; cv=none; b=PW6jweFYgtMW4u47Ise2uHGkYb0TIV1J5u9QosfHpxlfpR0A7BE9lcMXHnZ9CB1Hf/Z92kSVdEUVz19CmM6TXweRijS00+NeZrMKiAC9Qdg8GrhuoD2gpt5lJ0hJBGg1uHP8mwkatQNJiFgGdGOsHoJN//KWAdY5S9VIn1WLTgw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560535; c=relaxed/simple; bh=uvE4HzrMKfxh6m+ETmTZoE2Jqwp/Ub9SrGoT3o2Uoyo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OcLWpWjLflSgtJdOjnFoF+A7TyDNRtFM68dYXXZ8+fWvBzcuIdAmhF+0OUGthc6tCCjhmDOkmsPLRoCDdd/ELLXKEMn0xHbNKp/y47PXr/IZwPJn+vAqhmBTzpuZZehI3qu25WGYQKIVri1ILMTnDjuc3ae0pw+16wT5ZM7vphY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=YcgwJSgy; arc=none smtp.client-ip=209.85.214.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="YcgwJSgy" Received: by mail-pl1-f175.google.com with SMTP id d9443c01a7336-2054e22ce3fso11515015ad.2 for ; Thu, 05 Sep 2024 11:22:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560533; x=1726165333; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=wl6kbP3NCHwsQRnmoY7uhvBICT8XJepGXHRJAGbl0Vw=; b=YcgwJSgyXAjd4flKIJIG6dog6Hnf+qmMsSOEMZMT/uUfXGH0Ze+OgjNqqyEJxPxHFd K+q5R4WtH41SojFpMy+baY09iHn2qUiR+ecBHMLY2bAiA+We924XNZbf2NlOCcC0BYsr Q947sllFhb/Sa4TGz569A9p/HJwF0fqPuLIW2O0EItU8FrDzOnRXRbVzKod02W2E+Hvb jJTCv/w9QHmKNavbF5Jc05JEVGSGS1L0SeDwtQlTH0IpNgfNRDslelyQaNLyuNvh5tJM Uw+LeF+C+dsmWKjtLGKo6i0S/WDHiyw1+xH3IvxcF8hVYsZQC+2UV0KcwxeVr4oeGgXQ 1Q+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560533; x=1726165333; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=wl6kbP3NCHwsQRnmoY7uhvBICT8XJepGXHRJAGbl0Vw=; b=MoGxiqNb87rLOFeSfKdHf7EV8XK3DxSwO5ZYZP4pi0EzyBcA0rik5L4UJnNLGIwDZo QTvyhQXoBOwkVa8zzEJC2joirGNWBD8averLROKE3LaV3WgYNYdaHwsthTKIbWo/YS2i 4VdHEYOTXJszjFlB9ftHPlbKyOZ188vmMHltanZUs6VpnTFWbpVfHwu4IsxP9M3OjTw0 wVMuAk9NTfYppfefHMzCYObLaPBj0+1eiUIlVpaL9B5kTCmJEgHAI8AKYrk9WOMxsUES PTGvpN/FcOO0IjN/51cl+Kcj6S48kc5SggtFBpIp1TYJnlHybwKBvLe4OswZE5Ak8X2c UhEg== X-Gm-Message-State: AOJu0YyPh0+9u1lEx5k/xusrQmgKPuesZsi5EkQLH4A7WtZqvkKjH+On V5YkcMGG5V9aeJjxlwz1zQTtIE1zZwF1Of74gT9aCUlp1SccoexIVAxmNSpS 
X-Google-Smtp-Source: AGHT+IGPJpGa80cSZzb8/TQUGsIqmAWNmT044iqk0HEYN/a/GeBqocxJuEY0Fb0767pfxI9K9KA9Xg== X-Received: by 2002:a17:903:2287:b0:205:68a4:b2d8 with SMTP id d9443c01a7336-20699acb7d5mr121888335ad.11.1725560533251; Thu, 05 Sep 2024 11:22:13 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:13 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, Shiyang Ruan , "Darrick J. Wong" , Chandan Babu R , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 21/26] xfs: correct calculation for agend and blockcount Date: Thu, 5 Sep 2024 11:21:38 -0700 Message-ID: <20240905182144.2691920-22-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Shiyang Ruan [ Upstream commit 3c90c01e49342b166e5c90ec2c85b220be15a20e ] The agend should be "start + length - 1", then, blockcount should be "end + 1 - start". Correct 2 calculation mistakes. Also, rename "agend" to "range_agend" because it's not the end of the AG per se; it's the end of the dead region within an AG's agblock space. Fixes: 5cf32f63b0f4 ("xfs: fix the calculation for "end" and "length"") Signed-off-by: Shiyang Ruan Reviewed-by: "Darrick J. Wong" Signed-off-by: Chandan Babu R Signed-off-by: Leah Rumancik --- fs/xfs/xfs_notify_failure.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c index 4a9bbd3fe120..a7daa522e00f 100644 --- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -126,8 +126,8 @@ xfs_dax_notify_ddev_failure( struct xfs_rmap_irec ri_low = { }; struct xfs_rmap_irec ri_high; struct xfs_agf *agf; - xfs_agblock_t agend; struct xfs_perag *pag; + xfs_agblock_t range_agend; pag = xfs_perag_get(mp, agno); error = xfs_alloc_read_agf(pag, tp, 0, &agf_bp); @@ -148,10 +148,10 @@ xfs_dax_notify_ddev_failure( ri_high.rm_startblock = XFS_FSB_TO_AGBNO(mp, end_fsbno); agf = agf_bp->b_addr; - agend = min(be32_to_cpu(agf->agf_length), + range_agend = min(be32_to_cpu(agf->agf_length) - 1, ri_high.rm_startblock); notify.startblock = ri_low.rm_startblock; - notify.blockcount = agend - ri_low.rm_startblock; + notify.blockcount = range_agend + 1 - ri_low.rm_startblock; error = xfs_rmap_query_range(cur, &ri_low, &ri_high, xfs_dax_failure_fn, ¬ify); From patchwork Thu Sep 5 18:21:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792807 Received: from mail-pl1-f177.google.com (mail-pl1-f177.google.com [209.85.214.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E2CF718892D for ; Thu, 5 Sep 2024 18:22:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560536; cv=none; 
b=hPrE8Iv5ZVpm7H0+K0k2B41b6ylDEiPen+NZgDqCU74nBtFsJJHv4n3Tal0IXZ8UzHRHNZBe9v1sjfWvGsxa5TdxeNLnJpnGGQOvjtBLELI8/0LeBj1KjqqQw8lsBWyXcMtpk9rX3K8R5RkuTS654lScGCk8rqm6vE7WMIcBeKY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560536; c=relaxed/simple; bh=q6qq7169Hq/Hf2cGfojdRbHgLExeOMKheLI3HiW/Yo8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NJAuOdN9pT4g/cJSKCC/VjpNDddKkG+OqL8WX0wvXgLkNAnnPlV0LO49ShwEfk6B95FG5Z5vnHEEOiNedUsnD9hI4rLac5DqKLAk9qYtvJjox4KQsIii7XP3yTKVAmUv6jWCWrRSsHykSBfkTWdvFnXa1YVRA6h54aAHluDnN7M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Qf82lU5R; arc=none smtp.client-ip=209.85.214.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Qf82lU5R" Received: by mail-pl1-f177.google.com with SMTP id d9443c01a7336-20696938f86so11309395ad.3 for ; Thu, 05 Sep 2024 11:22:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560534; x=1726165334; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=v5/3DIxSIUXkMvmWDVCYDFtNq6m7PQRik1oF/pZe10U=; b=Qf82lU5RNzhrrJljOqG0FRf4EqTiEZIA+hvzlQ8X+gfOWiySWtXTCcw3vM+86HTLaE t72WCzO62ZjbtSBjgiMCvod6a3O3eo1JCB7tkKKayG0/uV01hfhIwM/SeKuZpyHkeh0S V5ZhdJinQ72zBO87Sj6jnvU8/SW3ucv3tvAvcsodT42IaWJlMBvBPG3ZeDsNX1PdwH8+ qHerBX4cRE5BMstmZ05rmQnwDeIoDeLnqY2Ubh0rSxNYe1cf5NxUIwFTEuoLWhqQttA7 ryEH9x/o+zMkjD1O+r56ea5aDUSRInA91aAeZhaNF+JjvGwykeSI/oNkVRmtLk5Izaps ybEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560534; x=1726165334; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=v5/3DIxSIUXkMvmWDVCYDFtNq6m7PQRik1oF/pZe10U=; b=oAzbWTsQi2BQOh7g1Bt1G2XXd4eFm/LepKzXvwQIjLerZaH4U9sMwBMGEvI6H4oFbN QaMmR2tkzsM4iuoAvwNwBI/Fk2g5tg3cty4iPmnsWW1WpLyp9fLOmMLsIyqMaSy+3mCy /nj+r1wedJYCwrOAuLJl2cIE0jN7BQL+M7dwjNzl/XuMAdImb0nrH4rRJe3W47giuQwj CUrgQZb60uL5Y2tnJiuG9Z+lvB3838fXh3/nDKCFqeav4wU8iIum1fT0ehT8mOWSTKMj h881Kuma0F6ogG+molnk6YNF9SeZo3SPENaJ5xmuKcFFZpXEFmdzv9XxYSa6CrpJBT62 2reA== X-Gm-Message-State: AOJu0Yx78djlBJnH9PRRHb3/6w6tzbJ9CQZmf9ON+DAGIRxJdlCR0pO4 3ZXa+jM3R3zLS53edIZY2kPh7bIRYDh7PpbnohxOXiWffM3FH7WOY5sMhwoM X-Google-Smtp-Source: AGHT+IHXU+DWAX2XZfMGSDQOFShwSzczl6sa382Wgp2ZwMGWu/OlFErxaGuefhnawU2I4KeNm6fHEA== X-Received: by 2002:a17:902:ce05:b0:206:5104:a21c with SMTP id d9443c01a7336-2065104a474mr144360285ad.20.1725560534140; Thu, 05 Sep 2024 11:22:14 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:13 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, "Darrick J. 
Wong" , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 22/26] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list Date: Thu, 5 Sep 2024 11:21:39 -0700 Message-ID: <20240905182144.2691920-23-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit f12b96683d6976a3a07fdf3323277c79dbe8f6ab ] Alter the definition of i_prev_unlinked slightly to make it more obvious when an inode with 0 link count is not part of the iunlink bucket lists rooted in the AGI. This distinction is necessary because it is not sufficient to check inode.i_nlink to decide if an inode is on the unlinked list. Updates to i_nlink can happen while holding only ILOCK_EXCL, but updates to an inode's position in the AGI unlinked list (which happen after the nlink update) requires both ILOCK_EXCL and the AGI buffer lock. The next few patches will make it possible to reload an entire unlinked bucket list when we're walking the inode table or performing handle operations and need more than the ability to iget the last inode in the chain. The upcoming directory repair code also needs to be able to make this distinction to decide if a zero link count directory should be moved to the orphanage or allowed to inactivate. An upcoming enhancement to the online AGI fsck code will need this distinction to check and rebuild the AGI unlinked buckets. Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik --- fs/xfs/xfs_icache.c | 2 +- fs/xfs/xfs_inode.c | 3 ++- fs/xfs/xfs_inode.h | 20 +++++++++++++++++++- 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 4b040740678c..6df826fc787c 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -113,7 +113,7 @@ xfs_inode_alloc( INIT_LIST_HEAD(&ip->i_ioend_list); spin_lock_init(&ip->i_ioend_lock); ip->i_next_unlinked = NULLAGINO; - ip->i_prev_unlinked = NULLAGINO; + ip->i_prev_unlinked = 0; return ip; } diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 8c7cbe7f47ef..8c1782a72487 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2015,6 +2015,7 @@ xfs_iunlink_insert_inode( } /* Point the head of the list to point to this inode. */ + ip->i_prev_unlinked = NULLAGINO; return xfs_iunlink_update_bucket(tp, pag, agibp, bucket_index, agino); } @@ -2117,7 +2118,7 @@ xfs_iunlink_remove_inode( } ip->i_next_unlinked = NULLAGINO; - ip->i_prev_unlinked = NULLAGINO; + ip->i_prev_unlinked = 0; return error; } diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 225f6f93c2fa..c0211ff2874e 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -68,8 +68,21 @@ typedef struct xfs_inode { uint64_t i_diflags2; /* XFS_DIFLAG2_... */ struct timespec64 i_crtime; /* time created */ - /* unlinked list pointers */ + /* + * Unlinked list pointers. These point to the next and previous inodes + * in the AGI unlinked bucket list, respectively. These fields can + * only be updated with the AGI locked. + * + * i_next_unlinked caches di_next_unlinked. + */ xfs_agino_t i_next_unlinked; + + /* + * If the inode is not on an unlinked list, this field is zero. If the + * inode is the first element in an unlinked list, this field is + * NULLAGINO. 
Otherwise, i_prev_unlinked points to the previous inode + * in the unlinked list. + */ xfs_agino_t i_prev_unlinked; /* VFS inode */ @@ -81,6 +94,11 @@ typedef struct xfs_inode { struct list_head i_ioend_list; } xfs_inode_t; +static inline bool xfs_inode_on_unlinked_list(const struct xfs_inode *ip) +{ + return ip->i_prev_unlinked != 0; +} + static inline bool xfs_inode_has_attr_fork(struct xfs_inode *ip) { return ip->i_forkoff > 0; From patchwork Thu Sep 5 18:21:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792808 Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 14E581862B8 for ; Thu, 5 Sep 2024 18:22:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560537; cv=none; b=U8tUGI9cm8k7efDx+w8Dbq0Nhhn3JcaZiyXA6QutmA5ybdK3PH+Y/cRbrkzsTUxbR8b3DP6oPTD4CNh8cssgdHbgRP7lpAMSa8bJDhBjm9jcOIERUOC5jX93SwwnKo/pOVOQpDA429Uk1CNWA9nRMCYDstwHtoPlI3jJquCV60A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560537; c=relaxed/simple; bh=Uwg+nboRYd5HhEqNSkSx+n/tsd7X8Tm3rR6D0OB37vc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XvpUU/7EefFlYMeJ6fMhpfEADAhWjJ23/OghT3CmBOdcLDX3EuXWM0Oz7fTefE8gQnGemsqI07Puf6zdYGqQqCIYL0eZiEE2Z8P9KOzNWppxTD/B10vufGK+ALmWQufN4dMbHV3+IzByEH4txysmB4J+nVfbObNmJMck/ap9E/4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=XSpR0tLj; arc=none smtp.client-ip=209.85.214.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="XSpR0tLj" Received: by mail-pl1-f182.google.com with SMTP id d9443c01a7336-205909afad3so12158945ad.2 for ; Thu, 05 Sep 2024 11:22:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560535; x=1726165335; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=crAqPX1XFEmY5qcCaw3wc4s67VJ1NonixVSmxVtWDKc=; b=XSpR0tLjf/XkWS35ba0lVBeQ1JgcHYeASJcJO4Pn/V9N3GXXGzzYT78bCsCi1w60qP 7VbN4QkJi/8flUR7LI2ASLBDdUKEljt2/xLVGkYmacDiKDYXCkic6JoA4Tib0IdIgWxV ba4+l1w+Aq3AvJC6FXKeltJcnjfBM/EAb7TlixlWwBlzhHAToWs3NqoeODsHvxDJt5yD JkL4GRUOIqmxF/dkMwVbVFzBgqrTOHQoxyByfC6qAi16NBfqa0HRVRqA9rCcYIGF8er7 5RHx6z0Wmtfo2Z7JugDmjxLo6ICAOyQsyNHevipXmsiAMnl7wtCkRziaaHJuzz6s/JG3 NRhg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560535; x=1726165335; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=crAqPX1XFEmY5qcCaw3wc4s67VJ1NonixVSmxVtWDKc=; b=AKR3c3xNMdqgKJ12Z3j8l770/11PMk4KOHguk1iR9YgdcYDFTXkGQYWPrdQ00C0Lf4 
v0qhZuDDbDkclJGs8cj7uBTZL8sFm5NeiczNz8flCfG2lnJVo7YrxTl5N540q44xQKqB /E2jMdQH+0b8OW+5WWX3l1Lh2L4lrbxaJUSADInfk6Z8U2b01KuJinFBF8Oqd6Jahziw yi8gOTZsnhoETDj3mqLFNdS8o9k0RsH2eZKLVQkRaAG9ik4dsZDe/BwFtdo0B7r2hucF 52WBjtI20hHZfPjQIB43YIrDPtm7TZzwAf/eEM4AkdJp5DJyODhQQxoT7qTpd/12xESA Olzg== X-Gm-Message-State: AOJu0YwX8cvJVH0nNyWlkNIvUKEXn/47ngMWDv49hdpuXakVFEdksqhI wLXNfnPg7luHF4cNbQ0omiI7xM6SgsAiFY9QhexkP/tdWNhIsduSiWsAFI7V X-Google-Smtp-Source: AGHT+IELJDgjrsiadlMcFnuhGE91Hekqu+Q0dFXUklMiZt25j6Yt6ZzEoxN5RMIlkdDJZLLPkmYVqg== X-Received: by 2002:a17:903:41ca:b0:206:dc2a:232c with SMTP id d9443c01a7336-206dc2a24c7mr19535185ad.15.1725560535119; Thu, 05 Sep 2024 11:22:15 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:14 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, "Darrick J. Wong" , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 23/26] xfs: reload entire unlinked bucket lists Date: Thu, 5 Sep 2024 11:21:40 -0700 Message-ID: <20240905182144.2691920-24-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 83771c50e42b92de6740a63e152c96c052d37736 ] The previous patch to reload unrecovered unlinked inodes when adding a newly created inode to the unlinked list is missing a key piece of functionality. It doesn't handle the case that someone calls xfs_iget on an inode that is not the last item in the incore list. For example, if at mount time the ondisk iunlink bucket looks like this: AGI -> 7 -> 22 -> 3 -> NULL None of these three inodes are cached in memory. Now let's say that someone tries to open inode 3 by handle. We need to walk the list to make sure that inodes 7 and 22 get loaded cold, and that the i_prev_unlinked of inode 3 gets set to 22. Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik --- fs/xfs/xfs_export.c | 6 +++ fs/xfs/xfs_inode.c | 100 ++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_inode.h | 9 ++++ fs/xfs/xfs_itable.c | 9 ++++ fs/xfs/xfs_trace.h | 20 +++++++++ 5 files changed, 144 insertions(+) diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c index 1064c2342876..f71ea786a6d2 100644 --- a/fs/xfs/xfs_export.c +++ b/fs/xfs/xfs_export.c @@ -146,6 +146,12 @@ xfs_nfs_get_inode( return ERR_PTR(error); } + error = xfs_inode_reload_unlinked(ip); + if (error) { + xfs_irele(ip); + return ERR_PTR(error); + } + if (VFS_I(ip)->i_generation != generation) { xfs_irele(ip); return ERR_PTR(-ESTALE); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 8c1782a72487..06cdf5dd88af 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3622,3 +3622,103 @@ xfs_iunlock2_io_mmap( if (ip1 != ip2) inode_unlock(VFS_I(ip1)); } + +/* + * Reload the incore inode list for this inode. Caller should ensure that + * the link count cannot change, either by taking ILOCK_SHARED or otherwise + * preventing other threads from executing. 
+ */ +int +xfs_inode_reload_unlinked_bucket( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_buf *agibp; + struct xfs_agi *agi; + struct xfs_perag *pag; + xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino); + xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino); + xfs_agino_t prev_agino, next_agino; + unsigned int bucket; + bool foundit = false; + int error; + + /* Grab the first inode in the list */ + pag = xfs_perag_get(mp, agno); + error = xfs_ialloc_read_agi(pag, tp, &agibp); + xfs_perag_put(pag); + if (error) + return error; + + bucket = agino % XFS_AGI_UNLINKED_BUCKETS; + agi = agibp->b_addr; + + trace_xfs_inode_reload_unlinked_bucket(ip); + + xfs_info_ratelimited(mp, + "Found unrecovered unlinked inode 0x%x in AG 0x%x. Initiating list recovery.", + agino, agno); + + prev_agino = NULLAGINO; + next_agino = be32_to_cpu(agi->agi_unlinked[bucket]); + while (next_agino != NULLAGINO) { + struct xfs_inode *next_ip = NULL; + + if (next_agino == agino) { + /* Found this inode, set its backlink. */ + next_ip = ip; + next_ip->i_prev_unlinked = prev_agino; + foundit = true; + } + if (!next_ip) { + /* Inode already in memory. */ + next_ip = xfs_iunlink_lookup(pag, next_agino); + } + if (!next_ip) { + /* Inode not in memory, reload. */ + error = xfs_iunlink_reload_next(tp, agibp, prev_agino, + next_agino); + if (error) + break; + + next_ip = xfs_iunlink_lookup(pag, next_agino); + } + if (!next_ip) { + /* No incore inode at all? We reloaded it... */ + ASSERT(next_ip != NULL); + error = -EFSCORRUPTED; + break; + } + + prev_agino = next_agino; + next_agino = next_ip->i_next_unlinked; + } + + xfs_trans_brelse(tp, agibp); + /* Should have found this inode somewhere in the iunlinked bucket. */ + if (!error && !foundit) + error = -EFSCORRUPTED; + return error; +} + +/* Decide if this inode is missing its unlinked list and reload it. 
*/ +int +xfs_inode_reload_unlinked( + struct xfs_inode *ip) +{ + struct xfs_trans *tp; + int error; + + error = xfs_trans_alloc_empty(ip->i_mount, &tp); + if (error) + return error; + + xfs_ilock(ip, XFS_ILOCK_SHARED); + if (xfs_inode_unlinked_incomplete(ip)) + error = xfs_inode_reload_unlinked_bucket(tp, ip); + xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_trans_cancel(tp); + + return error; +} diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index c0211ff2874e..0467d297531e 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -593,4 +593,13 @@ void xfs_end_io(struct work_struct *work); int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2); void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2); +static inline bool +xfs_inode_unlinked_incomplete( + struct xfs_inode *ip) +{ + return VFS_I(ip)->i_nlink == 0 && !xfs_inode_on_unlinked_list(ip); +} +int xfs_inode_reload_unlinked_bucket(struct xfs_trans *tp, struct xfs_inode *ip); +int xfs_inode_reload_unlinked(struct xfs_inode *ip); + #endif /* __XFS_INODE_H__ */ diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index a1c2bcf65d37..ee3eb3181e3e 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -80,6 +80,15 @@ xfs_bulkstat_one_int( if (error) goto out; + if (xfs_inode_unlinked_incomplete(ip)) { + error = xfs_inode_reload_unlinked_bucket(tp, ip); + if (error) { + xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_irele(ip); + return error; + } + } + ASSERT(ip != NULL); ASSERT(ip->i_imap.im_blkno != 0); inode = VFS_I(ip); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index d713e10dff8a..0cd62031e53f 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3704,6 +3704,26 @@ TRACE_EVENT(xfs_iunlink_reload_next, __entry->next_agino) ); +TRACE_EVENT(xfs_inode_reload_unlinked_bucket, + TP_PROTO(struct xfs_inode *ip), + TP_ARGS(ip), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_agnumber_t, agno) + __field(xfs_agino_t, agino) + ), + TP_fast_assign( + __entry->dev = ip->i_mount->m_super->s_dev; + __entry->agno = XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino); + __entry->agino = XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino); + ), + TP_printk("dev %d:%d agno 0x%x agino 0x%x bucket %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->agno, + __entry->agino, + __entry->agino % XFS_AGI_UNLINKED_BUCKETS) +); + DECLARE_EVENT_CLASS(xfs_ag_inode_class, TP_PROTO(struct xfs_inode *ip), TP_ARGS(ip), From patchwork Thu Sep 5 18:21:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792809 Received: from mail-pg1-f174.google.com (mail-pg1-f174.google.com [209.85.215.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFCC718892D for ; Thu, 5 Sep 2024 18:22:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560538; cv=none; b=FR1PB/gWh/RIqAnlTaWAjDY5/kaGhPlXqlEpq02nG3bBqvbIeiWD2ZZpRzqEkND/De+m2eAq+6KLfyG48AF88+haiAHzEGd1owxgMx+wZFt1bJVY4MvTst+YntJaOhawkePcGNfksa+PZeRoaPmLgqPj2DiJg+yHKoFT2WTrk4Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560538; c=relaxed/simple; bh=HrhF2RdocuWG5Lwf+Z0wFqbLpsdOrAh3uIOG9AOLI/s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=srjuXd0j9wN/eilpFtFcCL/XnO8ej9zmcntjQjF8mZEH9iC0DKBKGJkLkhh+3DDpKL+AIIbxkEkzuYjb9QEgJdHn0OPlXlVoNa7krUEM+A7iaMUb/NP9N4X81i4DOzc3GzC6CxLaHkR6iRaW2TQHhaJ2oUctmj/hAq1vpJEJmAk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=V0MTufMx; arc=none smtp.client-ip=209.85.215.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="V0MTufMx" Received: by mail-pg1-f174.google.com with SMTP id 41be03b00d2f7-7b0c9bbddb4so851190a12.3 for ; Thu, 05 Sep 2024 11:22:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560536; x=1726165336; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+kNHJ80eZPdzthSlv5Lf6HBlIG2tAlDhqhvDn22hO1k=; b=V0MTufMx02mP8T8AtA7NDPqDJ7+EgZWCxoShYL7AYfhakqz4ZsqZIwuyUmFoGUA2bP oQC/gIL42fkHC152LItABkuadsNOsL4Pjlq+tikAHqgwLAMR+H2+5wpHYQ+9voGcuWLN S4JhVw/XBTqJjK3wxNTLVNk56PdKBGGMmo6Qu9m+qfHkrpPloD/bsJq6tinP/3OWJsmA V3sK1q8z3ypaLGiLvt/2ScGYmkvLXK8Um83+FjKMvdhahyDzDnwbhRinvInYx8xIVvM3 m6aAk4A1UEpvJQGeUq+O7tBzydfEG6++Fq/2gELuTDwucnYhsEDjHHwaOMVYVT1J/3Ew NZHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560536; x=1726165336; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+kNHJ80eZPdzthSlv5Lf6HBlIG2tAlDhqhvDn22hO1k=; b=S3hzm7pk5sFwKldIcRN6UCYygHvlF9cD0jAbk24EJ3IjbPGBUFFDQpLCBBoZWS1s4B wBb3ntidm1y5cPj9HUdans26t6fNPaPkw0jUyjImymwoe5VW9OuRJBIRjmOFWDCuX7v1 +wSNpeGrk5oe4BixC0+8swtgZlBhV80ZA9QM40NzMcV2hDCEWCbOsBDoDB6XPl39u2DY +gxJtmY52JBOdoBAR/8yrSCrRvw4eheH3ocuNCkeL9mPkwGvUkrhIw4KIuzVXVDWBJar HO824JwsT3C0GedIanR5AbcMgoKzcxxyyNmK5Jf0mS6UY/QZvIp9xwoN3yeK4XOsTL9s RaqA== X-Gm-Message-State: AOJu0Yx1EUJXnDki8f5O9il7zAKTXZz9FBqqcnP43gvfUiWKtX9JVgj8 EYljWyANoLb7ceAQEEyKaF2/hpMxbTht6FS0em7qXdgIKrdIE84acOWrrOvv X-Google-Smtp-Source: AGHT+IGumzvb+BIkQ4+cFxHHl/WEWZMVsVfEoKa/I4wZGds7TRHkVpEon+rtHU1zBGWPc3gTLXtagg== X-Received: by 2002:a17:902:d2c6:b0:205:5417:ddda with SMTP id d9443c01a7336-2055417e04emr202055815ad.45.1725560536034; Thu, 05 Sep 2024 11:22:16 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:15 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, "Darrick J. 
Wong" , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 24/26] xfs: make inode unlinked bucket recovery work with quotacheck Date: Thu, 5 Sep 2024 11:21:41 -0700 Message-ID: <20240905182144.2691920-25-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 49813a21ed57895b73ec4ed3b99d4beec931496f ] Teach quotacheck to reload the unlinked inode lists when walking the inode table. This requires extra state handling, since it's possible that a reloaded inode will get inactivated before quotacheck tries to scan it; in this case, we need to ensure that the reloaded inode does not have dquots attached when it is freed. Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik --- fs/xfs/xfs_attr_inactive.c | 1 - fs/xfs/xfs_inode.c | 12 +++++++++--- fs/xfs/xfs_inode.h | 5 ++++- fs/xfs/xfs_mount.h | 10 +++++++++- fs/xfs/xfs_qm.c | 7 +++++++ 5 files changed, 29 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c index 5db87b34fb6e..89c7a9f4f930 100644 --- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -333,7 +333,6 @@ xfs_attr_inactive( int error = 0; mp = dp->i_mount; - ASSERT(! XFS_NOT_DQATTACHED(mp, dp)); xfs_ilock(dp, lock_mode); if (!xfs_inode_has_attr_fork(dp)) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 06cdf5dd88af..00f41bc76bd7 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1743,9 +1743,13 @@ xfs_inactive( ip->i_df.if_nextents > 0 || ip->i_delayed_blks > 0)) truncate = 1; - error = xfs_qm_dqattach(ip); - if (error) - goto out; + if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) { + xfs_qm_dqdetach(ip); + } else { + error = xfs_qm_dqattach(ip); + if (error) + goto out; + } if (S_ISLNK(VFS_I(ip)->i_mode)) error = xfs_inactive_symlink(ip); @@ -1963,6 +1967,8 @@ xfs_iunlink_reload_next( trace_xfs_iunlink_reload_next(next_ip); rele: ASSERT(!(VFS_I(next_ip)->i_state & I_DONTCACHE)); + if (xfs_is_quotacheck_running(mp) && next_ip) + xfs_iflags_set(next_ip, XFS_IQUOTAUNCHECKED); xfs_irele(next_ip); return error; } diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 0467d297531e..85395ad2859c 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -344,6 +344,9 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip) */ #define XFS_INACTIVATING (1 << 13) +/* Quotacheck is running but inode has not been added to quota counts. */ +#define XFS_IQUOTAUNCHECKED (1 << 14) + /* All inode state flags related to inode reclaim. */ #define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \ XFS_IRECLAIM | \ @@ -358,7 +361,7 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip) #define XFS_IRECLAIM_RESET_FLAGS \ (XFS_IRECLAIMABLE | XFS_IRECLAIM | \ XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \ - XFS_INACTIVATING) + XFS_INACTIVATING | XFS_IQUOTAUNCHECKED) /* * Flags for inode locking. diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index c8e72f0d3965..9dc0acf7314f 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -401,6 +401,8 @@ __XFS_HAS_FEAT(nouuid, NOUUID) #define XFS_OPSTATE_WARNED_SHRINK 8 /* Kernel has logged a warning about logged xattr updates being used. 
*/ #define XFS_OPSTATE_WARNED_LARP 9 +/* Mount time quotacheck is running */ +#define XFS_OPSTATE_QUOTACHECK_RUNNING 10 #define __XFS_IS_OPSTATE(name, NAME) \ static inline bool xfs_is_ ## name (struct xfs_mount *mp) \ @@ -423,6 +425,11 @@ __XFS_IS_OPSTATE(inode32, INODE32) __XFS_IS_OPSTATE(readonly, READONLY) __XFS_IS_OPSTATE(inodegc_enabled, INODEGC_ENABLED) __XFS_IS_OPSTATE(blockgc_enabled, BLOCKGC_ENABLED) +#ifdef CONFIG_XFS_QUOTA +__XFS_IS_OPSTATE(quotacheck_running, QUOTACHECK_RUNNING) +#else +# define xfs_is_quotacheck_running(mp) (false) +#endif static inline bool xfs_should_warn(struct xfs_mount *mp, long nr) @@ -440,7 +447,8 @@ xfs_should_warn(struct xfs_mount *mp, long nr) { (1UL << XFS_OPSTATE_BLOCKGC_ENABLED), "blockgc" }, \ { (1UL << XFS_OPSTATE_WARNED_SCRUB), "wscrub" }, \ { (1UL << XFS_OPSTATE_WARNED_SHRINK), "wshrink" }, \ - { (1UL << XFS_OPSTATE_WARNED_LARP), "wlarp" } + { (1UL << XFS_OPSTATE_WARNED_LARP), "wlarp" }, \ + { (1UL << XFS_OPSTATE_QUOTACHECK_RUNNING), "quotacheck" } /* * Max and min values for mount-option defined I/O diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index f51960d7dcbd..bbd0805fa94e 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1160,6 +1160,10 @@ xfs_qm_dqusage_adjust( if (error) return error; + error = xfs_inode_reload_unlinked(ip); + if (error) + goto error0; + ASSERT(ip->i_delayed_blks == 0); if (XFS_IS_REALTIME_INODE(ip)) { @@ -1173,6 +1177,7 @@ xfs_qm_dqusage_adjust( } nblks = (xfs_qcnt_t)ip->i_nblocks - rtblks; + xfs_iflags_clear(ip, XFS_IQUOTAUNCHECKED); /* * Add the (disk blocks and inode) resources occupied by this @@ -1319,8 +1324,10 @@ xfs_qm_quotacheck( flags |= XFS_PQUOTA_CHKD; } + xfs_set_quotacheck_running(mp); error = xfs_iwalk_threaded(mp, 0, 0, xfs_qm_dqusage_adjust, 0, true, NULL); + xfs_clear_quotacheck_running(mp); /* * On error, the inode walk may have partially populated the dquot From patchwork Thu Sep 5 18:21:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792810 Received: from mail-pf1-f177.google.com (mail-pf1-f177.google.com [209.85.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EEA271865FD for ; Thu, 5 Sep 2024 18:22:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560539; cv=none; b=ho0ILJ5BV+Nx5vEA21J5qnuya3/sYijdPz5W9HBSwVZ1hmeK5tTEFWBpGG6l1S0zjOPyj/DxAIhVuoNMuG7QJQXKSKLAT7CZHFKKDkeFxKjPFcPzZLO9TCHiWoewrE1flFkrskyB23MGggVbOF1087XGx4TBiY/zGVfyF/d3Ces= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560539; c=relaxed/simple; bh=DE33W2lACSLqnBxC0KOo+jaY7ChgXD+BTwIU/dYGiq8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NBNtn5qygJlqlR7o5L/sR9OhVkwL4U9ZhfEDKdZkC3+9MXJWWDfrz3qql5uCAeenXbmS2v3LF417KgdSzlyhr4b/dyVm2Z5aktr2QYa7t84WBQtMPnX/GJsvj4/eJfzHQ2/WBWEHn9QpxwNkUN2wxK0pfqwuI19Nqe/GtiH0tVk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=JFgGI+Vk; arc=none smtp.client-ip=209.85.210.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="JFgGI+Vk" Received: by mail-pf1-f177.google.com with SMTP id d2e1a72fcca58-7142e002aceso882265b3a.2 for ; Thu, 05 Sep 2024 11:22:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560537; x=1726165337; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=XJXUjYs4PDv5CRJo9yYsxX66rUy+LfcpfqB8jNGeelo=; b=JFgGI+VkqgVoHm6Uj3+Rn/Z0X+SRS8oGQX5R6O7leQFfjxdAiJCkM4ziHZAF5AxO4y 8+J2kHbt3owC4wHI5hS+zAxy1VRV7WTno9QqZJyvg8ihDf/o0FzultWkyNeVGYwIkAxd 9SHJKjPdkg64Wiez4BNtH1sNnGX7FPA9QKdY78DvQtg54AaZLTLFtdHGL0B5gcguTwl9 RHwTNrMWjTsQoZt70jZc3kz7dl7gMkNWHj82sSUMPqn6GOZdT1zTgvmPsBV32i4xIuIA JjldFiP2pSyND7xf1ZsU4EypqSPSDLwF91v4YmInDASKCkch47ETgIH3RxSLgx8HzHKU jERw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560537; x=1726165337; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=XJXUjYs4PDv5CRJo9yYsxX66rUy+LfcpfqB8jNGeelo=; b=YdHgo86VwMOLe8nAMDfKxKrw/sdyNyawqehE256j3wZ7rerG8WWFtTBTLihix+MFup 13iRzxNmtaHiw9lKWZqUnIgQK+5oMhA/4TEjSD2wT+InXNOvthBa6v4mRsId7waN+f4J lHXgLFUkhEx4C2jpnOYWlmZdxlX9+GiLmK+DlVF/CrxJu1YpFCa08k0LFwsKC4jSWnjg 5vBHCP9o8KIBNoHeyQyPfBvYrbC6hxFLSJh6LXpm3CCWx9wvPM9S+7S92cEivMEH5L1I J1JAUX5z3MoWPmSR6hLVDUEAKdZTGT9hVP29US/2U0lTjvaGVMwPX6FTe+buaHI752H2 bpyA== X-Gm-Message-State: AOJu0Yxm+RRdu9OaIyQgwS018sK0VofvIsAaExMiZN3xySAfGfLg98ac Ej3CiLQfv6++3kHHqRBr6GSQJJcTMXUJ0hwgP8ZM0MP7E+gxN7jJDk1JG16B X-Google-Smtp-Source: AGHT+IFW8foc47K1zsuAbH99wQRiayPg1z1xYVPi7qEb4zbOX+EixvofL6JoHJCjmpJS+mIPxxFUEg== X-Received: by 2002:a17:902:e74e:b0:206:bbaa:84e7 with SMTP id d9443c01a7336-206bbaa92afmr77695215ad.40.1725560537156; Thu, 05 Sep 2024 11:22:17 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:16 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, "Darrick J. Wong" , Dave Chinner , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 25/26] xfs: fix reloading entire unlinked bucket lists Date: Thu, 5 Sep 2024 11:21:42 -0700 Message-ID: <20240905182144.2691920-26-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 537c013b140d373d1ffe6290b841dc00e67effaa ] During review of the patcheset that provided reloading of the incore iunlink list, Dave made a few suggestions, and I updated the copy in my dev tree. Unfortunately, I then got distracted by ... who even knows what ... and forgot to backport those changes from my dev tree to my release candidate branch. I then sent multiple pull requests with stale patches, and that's what was merged into -rc3. So. 
This patch re-adds the use of an unlocked iunlink list check to determine if we want to allocate the resources to recreate the incore list. Since lost iunlinked inodes are supposed to be rare, this change helps us avoid paying the transaction and AGF locking costs every time we open any inode. This also re-adds the shutdowns on failure, and re-applies the restructuring of the inner loop in xfs_inode_reload_unlinked_bucket, and re-adds a requested comment about the quotachecking code. Retain the original RVB tag from Dave since there's no code change from the last submission. Fixes: 68b957f64fca1 ("xfs: load uncached unlinked inodes into memory on demand") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Leah Rumancik --- fs/xfs/xfs_export.c | 16 +++++++++++---- fs/xfs/xfs_inode.c | 48 +++++++++++++++++++++++++++++++++------------ fs/xfs/xfs_itable.c | 2 ++ fs/xfs/xfs_qm.c | 15 +++++++++++--- 4 files changed, 61 insertions(+), 20 deletions(-) diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c index f71ea786a6d2..7cd09c3a82cb 100644 --- a/fs/xfs/xfs_export.c +++ b/fs/xfs/xfs_export.c @@ -146,10 +146,18 @@ xfs_nfs_get_inode( return ERR_PTR(error); } - error = xfs_inode_reload_unlinked(ip); - if (error) { - xfs_irele(ip); - return ERR_PTR(error); + /* + * Reload the incore unlinked list to avoid failure in inodegc. + * Use an unlocked check here because unrecovered unlinked inodes + * should be somewhat rare. + */ + if (xfs_inode_unlinked_incomplete(ip)) { + error = xfs_inode_reload_unlinked(ip); + if (error) { + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + xfs_irele(ip); + return ERR_PTR(error); + } } if (VFS_I(ip)->i_generation != generation) { diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 00f41bc76bd7..909085269227 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1744,6 +1744,14 @@ xfs_inactive( truncate = 1; if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) { + /* + * If this inode is being inactivated during a quotacheck and + * has not yet been scanned by quotacheck, we /must/ remove + * the dquots from the inode before inactivation changes the + * block and inode counts. Most probably this is a result of + * reloading the incore iunlinked list to purge unrecovered + * unlinked inodes. + */ xfs_qm_dqdetach(ip); } else { error = xfs_qm_dqattach(ip); @@ -3657,6 +3665,16 @@ xfs_inode_reload_unlinked_bucket( if (error) return error; + /* + * We've taken ILOCK_SHARED and the AGI buffer lock to stabilize the + * incore unlinked list pointers for this inode. Check once more to + * see if we raced with anyone else to reload the unlinked list. + */ + if (!xfs_inode_unlinked_incomplete(ip)) { + foundit = true; + goto out_agibp; + } + bucket = agino % XFS_AGI_UNLINKED_BUCKETS; agi = agibp->b_addr; @@ -3671,25 +3689,27 @@ xfs_inode_reload_unlinked_bucket( while (next_agino != NULLAGINO) { struct xfs_inode *next_ip = NULL; + /* Found this caller's inode, set its backlink. */ if (next_agino == agino) { - /* Found this inode, set its backlink. */ next_ip = ip; next_ip->i_prev_unlinked = prev_agino; foundit = true; + goto next_inode; } - if (!next_ip) { - /* Inode already in memory. */ - next_ip = xfs_iunlink_lookup(pag, next_agino); - } - if (!next_ip) { - /* Inode not in memory, reload. */ - error = xfs_iunlink_reload_next(tp, agibp, prev_agino, - next_agino); - if (error) - break; - next_ip = xfs_iunlink_lookup(pag, next_agino); - } + /* Try in-memory lookup first. 
*/ + next_ip = xfs_iunlink_lookup(pag, next_agino); + if (next_ip) + goto next_inode; + + /* Inode not in memory, try reloading it. */ + error = xfs_iunlink_reload_next(tp, agibp, prev_agino, + next_agino); + if (error) + break; + + /* Grab the reloaded inode. */ + next_ip = xfs_iunlink_lookup(pag, next_agino); if (!next_ip) { /* No incore inode at all? We reloaded it... */ ASSERT(next_ip != NULL); @@ -3697,10 +3717,12 @@ xfs_inode_reload_unlinked_bucket( break; } +next_inode: prev_agino = next_agino; next_agino = next_ip->i_next_unlinked; } +out_agibp: xfs_trans_brelse(tp, agibp); /* Should have found this inode somewhere in the iunlinked bucket. */ if (!error && !foundit) diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index ee3eb3181e3e..44d603364d5a 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -80,10 +80,12 @@ xfs_bulkstat_one_int( if (error) goto out; + /* Reload the incore unlinked list to avoid failure in inodegc. */ if (xfs_inode_unlinked_incomplete(ip)) { error = xfs_inode_reload_unlinked_bucket(tp, ip); if (error) { xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); xfs_irele(ip); return error; } diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index bbd0805fa94e..bd907bbc389c 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1160,9 +1160,18 @@ xfs_qm_dqusage_adjust( if (error) return error; - error = xfs_inode_reload_unlinked(ip); - if (error) - goto error0; + /* + * Reload the incore unlinked list to avoid failure in inodegc. + * Use an unlocked check here because unrecovered unlinked inodes + * should be somewhat rare. + */ + if (xfs_inode_unlinked_incomplete(ip)) { + error = xfs_inode_reload_unlinked(ip); + if (error) { + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + goto error0; + } + } ASSERT(ip->i_delayed_blks == 0); From patchwork Thu Sep 5 18:21:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13792811 Received: from mail-pf1-f179.google.com (mail-pf1-f179.google.com [209.85.210.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DA1EC14F125 for ; Thu, 5 Sep 2024 18:22:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560540; cv=none; b=lvUzNNF2qjrDe2sHbR0l2qK1Ai9VpKUqiTpFCz4Z+QFrr6IQFHTyD2TIiGr6841kOZtZC29ckHaTCiCJ1qKm3cz7UzQjlBtF8MNtdTkVvokzkY4sNcwgKQ9qIC/4oFC/DQ6U0sqoXQfRb9Mf/XsNmFdTj9MLeg6BfzQwtBmsuP0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725560540; c=relaxed/simple; bh=PPNdc3z3NqwEjYCLYKW/QfYOgzNgiCuta5kiBo65KeI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aJ2icomH4ZdmtoAd4lohHpRibd3iFADbRPVuulCqFwf44/LVaMQK0QjvvWIjdQYjv4aCC8IpCgFslUQEs4lTnw2Cthx60WP6EFMeRlrtJ/d8On3dxYwHBTu4uGy98p/CUDYTBmKSfJLY6cUQ9/5seeMVqwcScsXsaRFWFmqT5Oo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=LXaMdfzC; arc=none smtp.client-ip=209.85.210.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="LXaMdfzC" Received: by mail-pf1-f179.google.com with SMTP id d2e1a72fcca58-71434174201so866237b3a.0 for ; Thu, 05 Sep 2024 11:22:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725560538; x=1726165338; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=X0L1twsXbiKjyUC98PoGnv6lrlvreY6QLXRsV9mKZ4U=; b=LXaMdfzCWPEjt6Sk9VYlxlZLhKEAXdGVYG6fVuJe0f1jZda9KFSohhZBw+Ej46yqsh DCURihDq3lTjkQ9oMyY3jg3lkjYvEvhxx/jHT69/wBBNlO3y3yVCdCN7m5upvkWEyrxU zedTGOr1Y71w/FRkY2zMJgOuyrI7K/BsgbUEhJ8TIpFc+3AnlgVlp7HxlrUugg0kmvFD 4YYFdEbNESTBAFa/UwOBX2upo8728hVG8PslN1m9NcpEgTSbnk3YHQwMZ/YYo+CKN4Eu Uf2OCoQuonJhfEizdaOKpYBtVEa6ry5gWwp6qRFX8xWRKZkUfyCRreatVMncsH0KG5Vb n+jg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725560538; x=1726165338; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=X0L1twsXbiKjyUC98PoGnv6lrlvreY6QLXRsV9mKZ4U=; b=xI4Mout2ihGjLd0k7ROHPtPrOA2fcs2teL0bqjwb2N26cm3Md8aOiZDd4du9V7PX/c emsIsD+32ZWjRZVygr1kk6DH5m1sLcrqCsx6MLX4ToWC1uknyU6tFZwvUz3ZomS3Ewaf PQmeLHQ6TPziAdQee9FqPqnQyg8tNsupkQGgFL1V1aEjl3ShFJQGQdKM2YI9pWmS9Lyh hyGjMDoTwA54kB8kPAScnlWLj8S1ofbtbZfsl8AZ5Rxo0M6TYPii8guCu8tOaWhUZTuL nGnDkwbQTMd4in39opoN4FiPO8EmwdAobDsDlYp3ChwVDNwh1zp8JeuTyvvlYeFq5vlt XuQw== X-Gm-Message-State: AOJu0YzJpI3otvAxf1rBVSHi/95OF0yUCv2NWjqP6WwltKe+GiOdVXK8 gPfuPyZFk/KxyXBUxwFtOT8Tn2lEeQcj6V+Z+UXe0WcZbKyawLytSCeXVv3t X-Google-Smtp-Source: AGHT+IHNsr7Io7SKqY0PAnRH284SxCVtjzf8mH5VCvkupsFQz5VEKWU8Qhe/2X96sTGcWmKxNemKCQ== X-Received: by 2002:a17:903:1cf:b0:1fd:8eaf:ea73 with SMTP id d9443c01a7336-20546705a4fmr212233295ad.35.1725560538098; Thu, 05 Sep 2024 11:22:18 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:2da2:d734:ef56:7ccf]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-206aea684f0sm31374395ad.271.2024.09.05.11.22.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Sep 2024 11:22:17 -0700 (PDT) From: Leah Rumancik To: linux-xfs@vger.kernel.org Cc: amir73il@gmail.com, chandan.babu@oracle.com, "Darrick J. Wong" , Dave Chinner , Dave Chinner , Leah Rumancik Subject: [PATCH 6.1 CANDIDATE 26/26] xfs: set bnobt/cntbt numrecs correctly when formatting new AGs Date: Thu, 5 Sep 2024 11:21:43 -0700 Message-ID: <20240905182144.2691920-27-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog In-Reply-To: <20240905182144.2691920-1-leah.rumancik@gmail.com> References: <20240905182144.2691920-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 8e698ee72c4ecbbf18264568eb310875839fd601 ] Through generic/300, I discovered that mkfs.xfs creates corrupt filesystems when given these parameters: # mkfs.xfs -d size=512M /dev/sda -f -d su=128k,sw=4 --unsupported Filesystems formatted with --unsupported are not supported!! 
meta-data=/dev/sda               isize=512    agcount=8, agsize=16352 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
data     =                       bsize=4096   blocks=130816, imaxpct=25
         =                       sunit=32     swidth=128 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=8192, version=2
         =                       sectsz=512   sunit=32 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
         =                       rgcount=0    rgsize=0 blks
Discarding blocks...Done.

# xfs_repair -n /dev/sda
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - 16:30:50: zeroing log - 16320 of 16320 blocks done
        - scan filesystem freespace and inode maps...
agf_freeblks 25, counted 0 in ag 4
sb_fdblocks 8823, counted 8798

The root cause of this problem is the numrecs handling in
xfs_freesp_init_recs, which is used to initialize a new AG.  Prior to
calling the function, we set up the new bnobt block with numrecs == 1
and rely on _freesp_init_recs to format that new record.  If the last
record created has a blockcount of zero, then it sets numrecs = 0.

That last bit isn't correct if the AG contains the log, the start of the
log is not immediately after the initial blocks due to stripe alignment,
and the end of the log is perfectly aligned with the end of the AG.  For
this case, we actually formatted a single bnobt record to handle the
free space before the start of the (stripe aligned) log, and incremented
arec to try to format a second record.  That second record turned out to
be unnecessary, so what we really want is to leave numrecs at 1.

The numrecs handling itself is overly complicated because a different
function sets numrecs == 1.  Change the bnobt creation code to start
with numrecs set to zero and only increment it after successfully
formatting a free space extent into the btree block.

Fixes: f327a00745ff ("xfs: account for log space when formatting new AGs")
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Signed-off-by: Leah Rumancik
---
 fs/xfs/libxfs/xfs_ag.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index bb0c700afe3c..bf47efe08a58 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -415,10 +415,12 @@ xfs_freesp_init_recs(
 		ASSERT(start >= mp->m_ag_prealloc_blocks);
 		if (start != mp->m_ag_prealloc_blocks) {
 			/*
-			 * Modify first record to pad stripe align of log
+			 * Modify first record to pad stripe align of log and
+			 * bump the record count.
 			 */
 			arec->ar_blockcount = cpu_to_be32(start -
 					mp->m_ag_prealloc_blocks);
+			be16_add_cpu(&block->bb_numrecs, 1);
 			nrec = arec + 1;

 			/*
@@ -429,7 +431,6 @@ xfs_freesp_init_recs(
 				be32_to_cpu(arec->ar_startblock) +
 				be32_to_cpu(arec->ar_blockcount));
 			arec = nrec;
-			be16_add_cpu(&block->bb_numrecs, 1);
 		}
 		/*
 		 * Change record start to after the internal log
@@ -438,15 +439,13 @@ xfs_freesp_init_recs(
 	}

 	/*
-	 * Calculate the record block count and check for the case where
-	 * the log might have consumed all available space in the AG. If
-	 * so, reset the record count to 0 to avoid exposure of an invalid
-	 * record start block.
+	 * Calculate the block count of this record; if it is nonzero,
+	 * increment the record count.
 	 */
 	arec->ar_blockcount = cpu_to_be32(id->agsize -
 					  be32_to_cpu(arec->ar_startblock));
-	if (!arec->ar_blockcount)
-		block->bb_numrecs = 0;
+	if (arec->ar_blockcount)
+		be16_add_cpu(&block->bb_numrecs, 1);
 }

 /*
@@ -458,7 +457,7 @@ xfs_bnoroot_init(
 	struct xfs_buf		*bp,
 	struct aghdr_init_data	*id)
 {
-	xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, id->agno);
+	xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 0, id->agno);
 	xfs_freesp_init_recs(mp, bp, id);
 }

@@ -468,7 +467,7 @@ xfs_cntroot_init(
 	struct xfs_buf		*bp,
 	struct aghdr_init_data	*id)
 {
-	xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 1, id->agno);
+	xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 0, id->agno);
 	xfs_freesp_init_recs(mp, bp, id);
 }
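As an illustration of the counting rule the patch adopts, here is a small
userspace sketch that starts the record count at zero and bumps it only when
a free-space extent is actually formatted. init_free_recs() is a made-up
model of the layout logic, not xfs_freesp_init_recs(); the numbers are
illustrative, with only agsize borrowed from the mkfs output above. In the
modeled case the stripe-aligned log ends exactly at the end of the AG, so the
pre-log gap yields the only record and numrecs stays at 1 instead of being
reset to 0.

/*
 * Sketch of the fixed numrecs accounting (assumed names and layout).
 */
#include <stdio.h>

struct rec {
	unsigned int	start;
	unsigned int	len;
};

static int
init_free_recs(
	unsigned int	prealloc,	/* first block after the AG headers */
	unsigned int	log_start,	/* stripe-aligned start of the log */
	unsigned int	log_end,	/* first block after the log */
	unsigned int	agsize,		/* AG size in blocks */
	struct rec	*recs)
{
	int		numrecs = 0;	/* count up from zero, as in the fix */
	unsigned int	start = prealloc;

	if (log_start > start) {
		/* Free space between the AG headers and the log. */
		recs[numrecs].start = start;
		recs[numrecs].len = log_start - start;
		numrecs++;
	}
	start = log_end;	/* free space resumes after the internal log */

	/* Free space after the log; skipped entirely when it is empty. */
	if (agsize > start) {
		recs[numrecs].start = start;
		recs[numrecs].len = agsize - start;
		numrecs++;
	}
	return numrecs;
}

int main(void)
{
	struct rec	recs[2];
	int		n = init_free_recs(16, 32, 16352, 16352, recs);

	/* Prints "numrecs = 1"; the old reset-to-zero logic lost this record. */
	printf("numrecs = %d\n", n);
	return 0;
}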