From patchwork Tue Sep 24 18:38:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811096 Received: from mail-pj1-f50.google.com (mail-pj1-f50.google.com [209.85.216.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A4E61AC45A; Tue, 24 Sep 2024 18:38:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.50 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203141; cv=none; b=Oc/7CJhBFSG05bZTf9UOKYEDV1QNVOsSaNFt2wjLXs9RzWHCELZrr/w7wKzvaCqbGycK4PRVZ7PbaPdeVPcWGj/OwBsB9a8vJjFqOyh6VvC7yymEsQi5x2RNKzd4dlQrv6Q8ChRyhvKjuyaJGrIcWMY1s2VDXrZWKqnPbHRK7cg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203141; c=relaxed/simple; bh=oobLH3ROj0eZGQvh1Gbz4aAcohO6kDihZ3oNiNCAMVQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Hy9JAfhWtPLh48nVcl+hwe/26dppfkhyJErlsOZef/7FHTX0DxYxLhuR8DZjLg61GatDMJMtv+crA0AkDZhmajwigXrB21sccMFtJ3t+SscsygIjyme7N0x/MZq51nVv9KMHs7QtDhrL3zNPcqjuSDStVu26hVcZr4g5VtyFXO4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=X1XXRThg; arc=none smtp.client-ip=209.85.216.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="X1XXRThg" Received: by mail-pj1-f50.google.com with SMTP id 98e67ed59e1d1-2daaa9706a9so4789674a91.1; Tue, 24 Sep 2024 11:38:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203139; x=1727807939; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3/NRkptMRzWHndvNssccbQUiqHKP0aF5n/yC1Yx0/88=; b=X1XXRThg0b3O+7WKe2w7GIqjtB9tPCoE1Qx/6PwDuJ3RGTPEOdRqG2Xq9G4+3mCrQ3 A1Uh8KLm+gIeDXb+jsyUbVT6FD3SveaZekYtBBZ2PVsCFGsOzSKzjmbVMlMyEoKmfESR mgHg79TaWWcnT3HcuYMjRz4XTR2EQW7RagXxh69KUJfSE0Oy58lkK/t+zYZKTgShbOBG Fy6KTbTBI6bL382ZsOMcJ4saiGG4cuZZRSIpdKzsftXQ+BE8vfS+AOZKQBlRODhMeRFY 64MBo2EJ5CH0U17ZYmsoJTNF/iOnmAEmJ8z2efP4mYZg0CfjZVDEAQ3HJ33W51Fvvk8D E1Xg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203139; x=1727807939; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3/NRkptMRzWHndvNssccbQUiqHKP0aF5n/yC1Yx0/88=; b=LbXcqAQC+ttlNTRhXoFFyp6ske50PSgyUHpHSCyq70CE/rvulGSagi9sloQbDTqfH6 ZH4qPrOT6kuKLYigPFEkAR+qm5gwyisktUei/5Wb+YgIKGR+PQyhFqz4hzgfKhwVjCBQ DuodSvTqh5Ntc5zX/EBeIMbzIyg5UsSz+a0fFHDt3n6G0HcFRAC+Rm7kJ22+MpU3cNDB 79gFh9dU1tWecZnwx5vNGeNEsiCww0JfsqBfP4G//cBlM9SNUAnadgFSbeuu0FCFnMvl qD8jWdaiZo8NqNMW0JwDqqzRAFSXLhEkQpOv66NeNGHanY++Dxm0f4Er2490HwXciSkm r9bw== X-Gm-Message-State: AOJu0Yx4sRG1s83+t0ImrO+1VcunNvkgair1nhkyAEBCzJ7v/O8Z4Vzb dn9f1/DP406S5r9DQD8F7yY9AIoBbHixpOfAzc7TMGAWU0qmyLrL9grgj4jm X-Google-Smtp-Source: 
AGHT+IHGkfvpQO1te1+j79Og1/h/bMVvfriLvfp+nDnk+Y0DucEEBSedFsTytKmhzt1ofHfdzUAWxg== X-Received: by 2002:a17:90a:51c4:b0:2c9:9f50:3f9d with SMTP id 98e67ed59e1d1-2e06ae2cae8mr115047a91.5.1727203138804; Tue, 24 Sep 2024 11:38:58 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.38.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:38:58 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner , syzbot+912776840162c13db1a3@syzkaller.appspotmail.com, "Darrick J. Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 01/26] xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING Date: Tue, 24 Sep 2024 11:38:26 -0700 Message-ID: <20240924183851.1901667-2-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 52f31ed228212ba572c44e15e818a3a5c74122c0 ] Resulting in a UAF if the shrinker races with some other dquot freeing mechanism that sets XFS_DQFLAG_FREEING before the dquot is removed from the LRU. This can occur if a dquot purge races with drop_caches. Reported-by: syzbot+912776840162c13db1a3@syzkaller.appspotmail.com Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_qm.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 18bb4ec4d7c9..ff53d40a2dae 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -422,6 +422,14 @@ xfs_qm_dquot_isolate( if (!xfs_dqlock_nowait(dqp)) goto out_miss_busy; + /* + * If something else is freeing this dquot and hasn't yet removed it + * from the LRU, leave it for the freeing task to complete the freeing + * process rather than risk it being free from under us here. + */ + if (dqp->q_flags & XFS_DQFLAG_FREEING) + goto out_miss_unlock; + /* * This dquot has acquired a reference in the meantime remove it from * the freelist and try again. @@ -441,10 +449,8 @@ xfs_qm_dquot_isolate( * skip it so there is time for the IO to complete before we try to * reclaim it again on the next LRU pass. 
*/ - if (!xfs_dqflock_nowait(dqp)) { - xfs_dqunlock(dqp); - goto out_miss_busy; - } + if (!xfs_dqflock_nowait(dqp)) + goto out_miss_unlock; if (XFS_DQ_IS_DIRTY(dqp)) { struct xfs_buf *bp = NULL; @@ -478,6 +484,8 @@ xfs_qm_dquot_isolate( XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaims); return LRU_REMOVED; +out_miss_unlock: + xfs_dqunlock(dqp); out_miss_busy: trace_xfs_dqreclaim_busy(dqp); XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses); From patchwork Tue Sep 24 18:38:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811097 Received: from mail-pj1-f43.google.com (mail-pj1-f43.google.com [209.85.216.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 93B3E1AC898; Tue, 24 Sep 2024 18:39:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.43 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203143; cv=none; b=J55C13Bs5NgGM50JeBP1DtbKmDW0YBM/7iSi5krpMB89ANavT9Ff1m3Au60pFtnDgvzLWvxhJhit8ny6HmDiNGLXh2zthjmVQK8AgTvZpZ4+J+VL+P2tWkBK/d2LOB2EQ2g21oFmm8/Wr7Ve/Fgl/lC0wRphK2FBZBZ1hi670+U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203143; c=relaxed/simple; bh=5/evbgZJpBDHqo8Z6QJmTNsvi1Z9bsk7VzcU+ReHC5I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=F/rWUFCb45uNUape9dmL4KYQBNqTb8p5melRBonCGyviU0YmO1TIf3ThaZj0K8NU99qKwJOyT6UyPKbeKj26Jjx7p4QVsodOIabp+lTrVKO0mzc1Rkm3Dd0+Jyy6e1zxa8W54kum9ByURM7rTy2ryZM3hX4v9/yVwrP9fjzrIjw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=d42aQIN5; arc=none smtp.client-ip=209.85.216.43 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="d42aQIN5" Received: by mail-pj1-f43.google.com with SMTP id 98e67ed59e1d1-2d87176316eso126793a91.0; Tue, 24 Sep 2024 11:39:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203141; x=1727807941; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=4ZRLOpJJYBLWAfm8yXjQKXJS6QubtWuMo+I+IUyXbVQ=; b=d42aQIN5BwKJLoZUMOTOBrUhBzUq9ZOoppW8fG3iD61ryPHqN4hjll+Ygyf6am21Ut WclhAd6KMTmv3K+U4ZF30A0aYguxWIPPRNF7MQc5kXlnq7UPR1dMrq4B8ffH+0UMzVmE 0cSoNogT/ZR5wb9OJCxeevpVBZyMgrz4JvKm/alNwwpzU3XyXXLWUUO31vVy1ZeggXS/ RtKbMvsHkLpfLRdhKvDUY475IH9raRfcIMRVhGWLWUTBNgjpOjlpkdCIqsUPbX5zqoyn lgIP+SOpBUIsYKCfIJEfAKAfqi+O4smDVL/lxzBEK8GSjXokmOwcdcB3PwnSxu8JeTQ7 xdNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203141; x=1727807941; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4ZRLOpJJYBLWAfm8yXjQKXJS6QubtWuMo+I+IUyXbVQ=; b=jXWp7GEm/wO1krDuttg/MjxHZpM15zpPb2C20nddIEDKS7QvV+DRed+9AAe94IkL+j I0JI8KrvHlnMeocFghBGT3+6pSCADugQy5BRa6iTIx9cEMe5UnqZi8JoQp1IaytGQ0oY 
qT8pDG2NgAdAPo8CO3+GwPVJKm2QZ7dd5tMWFbuEMUkKRoEe5D5K75mk3icpqpqIIsb9 aYmwL/WUWLK4+FHY/MhwB8CWU+w1o001y717bOm/EiA1Sa18YbEXKVWt96us/5R6N65Q tyzebNNkSclzq7N3FjJ1sukeK7lOCjBziSUGeqVv7ygVmiSWWOkFa3dNg9kuckrUJ+WX 5gcA== X-Gm-Message-State: AOJu0YwOYoPhJABZn1g2sCHwyyGnttyiMFbgFZZwqTV7kYBRgFYJBhPJ ScOUJluj33p3/+Vr/dEw6G44HNhRnCl2cnQRIVgg9moU6lRmLeRux/bOND7Q X-Google-Smtp-Source: AGHT+IGgs00Oed+XnEr+N2n8jAufX9x4DAAPkzn/VHK4zxoyzLNGLvMa9biK7kRvu1grdzmc5emOZQ== X-Received: by 2002:a17:90b:17c5:b0:2d8:e6d8:14c8 with SMTP id 98e67ed59e1d1-2e06ac38666mr232684a91.15.1727203140375; Tue, 24 Sep 2024 11:39:00 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.38.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:38:59 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Wu Guanghao , "Darrick J. Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 02/26] xfs: Fix deadlock on xfs_inodegc_worker Date: Tue, 24 Sep 2024 11:38:27 -0700 Message-ID: <20240924183851.1901667-3-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Wu Guanghao [ Upstream commit 4da112513c01d7d0acf1025b8764349d46e177d6 ] We are doing a test about deleting a large number of files when memory is low. A deadlock problem was found. [ 1240.279183] -> #1 (fs_reclaim){+.+.}-{0:0}: [ 1240.280450] lock_acquire+0x197/0x460 [ 1240.281548] fs_reclaim_acquire.part.0+0x20/0x30 [ 1240.282625] kmem_cache_alloc+0x2b/0x940 [ 1240.283816] xfs_trans_alloc+0x8a/0x8b0 [ 1240.284757] xfs_inactive_ifree+0xe4/0x4e0 [ 1240.285935] xfs_inactive+0x4e9/0x8a0 [ 1240.286836] xfs_inodegc_worker+0x160/0x5e0 [ 1240.287969] process_one_work+0xa19/0x16b0 [ 1240.289030] worker_thread+0x9e/0x1050 [ 1240.290131] kthread+0x34f/0x460 [ 1240.290999] ret_from_fork+0x22/0x30 [ 1240.291905] [ 1240.291905] -> #0 ((work_completion)(&gc->work)){+.+.}-{0:0}: [ 1240.293569] check_prev_add+0x160/0x2490 [ 1240.294473] __lock_acquire+0x2c4d/0x5160 [ 1240.295544] lock_acquire+0x197/0x460 [ 1240.296403] __flush_work+0x6bc/0xa20 [ 1240.297522] xfs_inode_mark_reclaimable+0x6f0/0xdc0 [ 1240.298649] destroy_inode+0xc6/0x1b0 [ 1240.299677] dispose_list+0xe1/0x1d0 [ 1240.300567] prune_icache_sb+0xec/0x150 [ 1240.301794] super_cache_scan+0x2c9/0x480 [ 1240.302776] do_shrink_slab+0x3f0/0xaa0 [ 1240.303671] shrink_slab+0x170/0x660 [ 1240.304601] shrink_node+0x7f7/0x1df0 [ 1240.305515] balance_pgdat+0x766/0xf50 [ 1240.306657] kswapd+0x5bd/0xd20 [ 1240.307551] kthread+0x34f/0x460 [ 1240.308346] ret_from_fork+0x22/0x30 [ 1240.309247] [ 1240.309247] other info that might help us debug this: [ 1240.309247] [ 1240.310944] Possible unsafe locking scenario: [ 1240.310944] [ 1240.312379] CPU0 CPU1 [ 1240.313363] ---- ---- [ 1240.314433] lock(fs_reclaim); [ 1240.315107] lock((work_completion)(&gc->work)); [ 1240.316828] lock(fs_reclaim); [ 1240.318088] lock((work_completion)(&gc->work)); [ 1240.319203] [ 1240.319203] *** DEADLOCK *** ... 
[ 2438.431081] Workqueue: xfs-inodegc/sda xfs_inodegc_worker [ 2438.432089] Call Trace: [ 2438.432562] __schedule+0xa94/0x1d20 [ 2438.435787] schedule+0xbf/0x270 [ 2438.436397] schedule_timeout+0x6f8/0x8b0 [ 2438.445126] wait_for_completion+0x163/0x260 [ 2438.448610] __flush_work+0x4c4/0xa40 [ 2438.455011] xfs_inode_mark_reclaimable+0x6ef/0xda0 [ 2438.456695] destroy_inode+0xc6/0x1b0 [ 2438.457375] dispose_list+0xe1/0x1d0 [ 2438.458834] prune_icache_sb+0xe8/0x150 [ 2438.461181] super_cache_scan+0x2b3/0x470 [ 2438.461950] do_shrink_slab+0x3cf/0xa50 [ 2438.462687] shrink_slab+0x17d/0x660 [ 2438.466392] shrink_node+0x87e/0x1d40 [ 2438.467894] do_try_to_free_pages+0x364/0x1300 [ 2438.471188] try_to_free_pages+0x26c/0x5b0 [ 2438.473567] __alloc_pages_slowpath.constprop.136+0x7aa/0x2100 [ 2438.482577] __alloc_pages+0x5db/0x710 [ 2438.485231] alloc_pages+0x100/0x200 [ 2438.485923] allocate_slab+0x2c0/0x380 [ 2438.486623] ___slab_alloc+0x41f/0x690 [ 2438.490254] __slab_alloc+0x54/0x70 [ 2438.491692] kmem_cache_alloc+0x23e/0x270 [ 2438.492437] xfs_trans_alloc+0x88/0x880 [ 2438.493168] xfs_inactive_ifree+0xe2/0x4e0 [ 2438.496419] xfs_inactive+0x4eb/0x8b0 [ 2438.497123] xfs_inodegc_worker+0x16b/0x5e0 [ 2438.497918] process_one_work+0xbf7/0x1a20 [ 2438.500316] worker_thread+0x8c/0x1060 [ 2438.504938] ret_from_fork+0x22/0x30 When the memory is insufficient, xfs_inonodegc_worker will trigger memory reclamation when memory is allocated, then flush_work() may be called to wait for the work to complete. This causes a deadlock. So use memalloc_nofs_save() to avoid triggering memory reclamation in xfs_inodegc_worker. Signed-off-by: Wu Guanghao Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_icache.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index dd5a664c294f..f5568fa54039 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -1858,6 +1858,7 @@ xfs_inodegc_worker( struct xfs_inodegc, work); struct llist_node *node = llist_del_all(&gc->list); struct xfs_inode *ip, *n; + unsigned int nofs_flag; ASSERT(gc->cpu == smp_processor_id()); @@ -1866,6 +1867,13 @@ xfs_inodegc_worker( if (!node) return; + /* + * We can allocate memory here while doing writeback on behalf of + * memory reclaim. To avoid memory allocation deadlocks set the + * task-wide nofs context for the following operations. 
+ */ + nofs_flag = memalloc_nofs_save(); + ip = llist_entry(node, struct xfs_inode, i_gclist); trace_xfs_inodegc_worker(ip->i_mount, READ_ONCE(gc->shrinker_hits)); @@ -1874,6 +1882,8 @@ xfs_inodegc_worker( xfs_iflags_set(ip, XFS_INACTIVATING); xfs_inodegc_inactivate(ip); } + + memalloc_nofs_restore(nofs_flag); } /* From patchwork Tue Sep 24 18:38:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811098 Received: from mail-pj1-f53.google.com (mail-pj1-f53.google.com [209.85.216.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 86BE61AC45A; Tue, 24 Sep 2024 18:39:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.53 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203143; cv=none; b=NQVWomNW34H3uyIWrOEdvv9o+WdtERcP1aQ/8PHpGtW7gBsXQGO/Sv11pgbrOUgt6XNcMLtW9JVQMqZLS0jiSiA/H8LXP2W87uk2c880J0ejX2nyGrgoSsK3mIlWfUukHrsn4TVHHkMtyHR4DDNNZ43ojmyCSQDaEhlSKWljLu4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203143; c=relaxed/simple; bh=itnYuLbEbb5RX7almTbQe35WhKRcAs4sDhG3S42WrpE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dlIE8sq7YweyTuzrqP83x0pSvYN8VgIsxqvBNwORDRjKkhCPS5Sbly1vT8Mn4/L7oOXAzibnLB3OiioGYNGfK+peQCfhyd9o3Q9J4Mz2KUVmPvZ2wb1AljZyZMQC94HBWcdaPNrDtClgzwLCqJ8rv4XILEzndYPo3hmSKoqRCl8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=STQbNhuh; arc=none smtp.client-ip=209.85.216.53 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="STQbNhuh" Received: by mail-pj1-f53.google.com with SMTP id 98e67ed59e1d1-2db928dbd53so4569754a91.2; Tue, 24 Sep 2024 11:39:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203142; x=1727807942; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=omUh0GNRrQniy5dGCLOuTd2ASeKAQ32meQjaH4FZVtQ=; b=STQbNhuhWM2uPbeK5IqdgjTceZfREZ9/+Nh2pc7hXMWsS3AClnXaFBETeafRijjdHd wxpnDYZisr9Amjpdb97j5BVFgAp9vUvdq6FJACYrJOQd1Jx/AlGPIQUaQaR3/UNU1rDd UZqXcjlSaIcRXoQnDYiHnvELuyP96sOnbbmdvKCq1OJKOw9z4VgHY5l7r+t87bmm/hJr okkxL3p1D8kIqYr5kZvpAPboMksw3Q+sSfjzJkqRaLcxWyBGrPVn0+WCyxcjqknVhaQz TVL4l+UgaPdX1knh1qgSaaA/dn/v6v51KLgXgU/zgKCEABBZ9C+tYIKmJGGqPQe1uuYj Sq/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203142; x=1727807942; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=omUh0GNRrQniy5dGCLOuTd2ASeKAQ32meQjaH4FZVtQ=; b=CwvVqp2FULhHPsz95yx64OeEBVS8O87YYad4A1AW5x0dmQmZ8ZcYjIUAbFFxx9uaxL ZAQ/iwchBHRfkmxDh9WVE3qcG7hxAF5IwSXiT3SMOp7kYbegqlcWr6HGf0vzXwF1igki /s9ACMXvi48PS/BzmU46lR3R5UdlKTCr8Whjc4NMx12GSqYuNcX1t060JaAJ9mR2QGZ3 
kFPzqd4Eh/lYXEh/8ZBb7nkRiLycwm4ja9pU8rhv/liYIsL1K8U5iRD0GKeFIT4w4DkV KQuPre6zHahoWDyaV/7EITbOvn8A6ON2nGTqjSNBoKG1qmcoXDD4r1z40q1qqEZpbQD3 X1Sw== X-Gm-Message-State: AOJu0Yx1VgA9Z0BZBzti6BmQtX0YmLdm/qQeIG+kh9+rMKHEzRbx6VZ7 7Tl88G2KBlYJNhSX2GRBfLBPgR+qBS9psz4jo7AbhFDRlItGJKdj1ujvUKJ1 X-Google-Smtp-Source: AGHT+IEzwDLqtzlbPDo0gZlPVFDqnW+RDWBCsz3TVjBQ1r4DcsmZ2ocsDJd0UokBE9gHwfiWcgRcJw== X-Received: by 2002:a17:90a:ea97:b0:2d8:83ce:d4c0 with SMTP id 98e67ed59e1d1-2e06ae5fc6dmr127869a91.13.1727203141624; Tue, 24 Sep 2024 11:39:01 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:01 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Wengang Wang , "Darrick J. Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 03/26] xfs: fix extent busy updating Date: Tue, 24 Sep 2024 11:38:28 -0700 Message-ID: <20240924183851.1901667-4-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Wengang Wang [ Upstream commit 601a27ea09a317d0fe2895df7d875381fb393041 ] In xfs_extent_busy_update_extent() case 6 and 7, whenever bno is modified on extent busy, the relavent length has to be modified accordingly. Signed-off-by: Wengang Wang Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. 
Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_extent_busy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c index ad22a003f959..f3d328e4a440 100644 --- a/fs/xfs/xfs_extent_busy.c +++ b/fs/xfs/xfs_extent_busy.c @@ -236,6 +236,7 @@ xfs_extent_busy_update_extent( * */ busyp->bno = fend; + busyp->length = bend - fend; } else if (bbno < fbno) { /* * Case 8: From patchwork Tue Sep 24 18:38:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811099 Received: from mail-pj1-f43.google.com (mail-pj1-f43.google.com [209.85.216.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 195631AC8BD; Tue, 24 Sep 2024 18:39:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.43 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203145; cv=none; b=i6+YFutje5aQ2rndfww7NE4fMTuaXQMKKhLXlvLIsd+42rg8zLlefgX4b6PaRVJIWNebIKfherPQytmawbRyqZRbrYRNiuoD/NKM5lcWNQL2oEao3PECImzMLPUd2s7pgacuMaj0x7lWH9jq94+7Gt4E2nqg0oC3Dr97r6OSt8Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203145; c=relaxed/simple; bh=DoKK0T6owU2oCFgEuUA7DpIrCpu5ek5m34Ax1vWHHVo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Hq2U/NS/eyZHQd3/h43KZeg2MLfyinTN06mtk8p9Jh6Vshvq1YK0xweY3y9/QGcc8mo/Hq71HejvPNo1ALTu0aY6vALi+Xr9picnds8GqDlVAN0jnIUE5zUWGd6gH+ZMx7DGld3Z4EXT0aIPO4dhXn6yfbuM5giYfWC5b2LoTws= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=NI75MJJW; arc=none smtp.client-ip=209.85.216.43 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="NI75MJJW" Received: by mail-pj1-f43.google.com with SMTP id 98e67ed59e1d1-2d86f71353dso4109338a91.2; Tue, 24 Sep 2024 11:39:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203143; x=1727807943; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=0TvNSLk2c8FU/nJcg36ArGm0aCSGcQH9Hwo8obidMwk=; b=NI75MJJWG555oACsTW/WkJ0L/pT5Mht/H2W6EftM9pwWpENVS2XzYAP2dic8dyVOex 6ue9XieHvy6stZ/RJ7+3nzdH5fvkvCAVtbJi7E8QJQ12sBjaKi1wEiQu4yfu+5k/xDHI mDGqQuV7ew1hY10o7p5ieDgp1vX8Drf/xk7AE0qEXrCOqYBzM3a4Ib91jMBW3tj6W2eT RmD6Npe3OILgCXB37tWmvr+5JuOgjNOK8I10eBWLXAmOPHqgaIZji1OHL9+O6FymNkby viAoTL94eR8JkW4dR+Lft6ZmDtji+HGRmd2X01ENOxPBm+rvk98rUBw8cr4vM0nPhlmW EIJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203143; x=1727807943; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0TvNSLk2c8FU/nJcg36ArGm0aCSGcQH9Hwo8obidMwk=; b=gdZFLKy4Z5Ys3w5fTo9GyCGDYu8COUiISOZ4+HMcXcSTnm+ji87rKUxvOX0m2LQnBf 5l60RnBcmgZrxaZcCTFFhPs7tEJ+5asnzpeY2fulju/rLanjEgLbN6M9d3DJEyu2Tv7s 
Xy+HXIc0EO//kKreTUlCmQ0ctPb1mAq2OLs30KSfbA/M8B7h2FAxzFzflQfPo0tirprk f9p5pn9J92fkcqOpKjTKLtrt1x52+Aww465VbjkvKDm2FOs6UBEP+S3rtN4QayP7cUEL gYRFePNIUT888xDOqZRhbELoVpGWmRBMGEbWu1TJLAXAMbW6kcI2MU3XJPqTfjvi9395 l8Ww== X-Gm-Message-State: AOJu0Yz1brYjn4gOl3oKOY+jx5XdZTSqG2AVR7/xLouYA6so/2wNg+tv WuYBazmH7tbQ04m2wYaT99KTV1UKM07MOoCKg61j7fMJNgWR6VZjCA78U716 X-Google-Smtp-Source: AGHT+IE+1UDUXSi+l7BAe/WJjLhmbdlM445IYyHfEjRi+Nk+hFeqOLmcq7TghTTGbOl7d+ZYGCHVKQ== X-Received: by 2002:a17:90a:8c14:b0:2d8:adea:9940 with SMTP id 98e67ed59e1d1-2e06ae5ec4fmr121821a91.16.1727203143081; Tue, 24 Sep 2024 11:39:03 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:02 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner , "Darrick J. Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 04/26] xfs: don't use BMBT btree split workers for IO completion Date: Tue, 24 Sep 2024 11:38:29 -0700 Message-ID: <20240924183851.1901667-5-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit c85007e2e3942da1f9361e4b5a9388ea3a8dcc5b ] When we split a BMBT due to record insertion, we offload it to a worker thread because we can be deep in the stack when we try to allocate a new block for the BMBT. Allocation can use several kilobytes of stack (full memory reclaim, swap and/or IO path can end up on the stack during allocation) and we can already be several kilobytes deep in the stack when we need to split the BMBT. A recent workload demonstrated a deadlock in this BMBT split offload. It requires several things to happen at once: 1. Two inodes need a BMBT split at the same time, one must be unwritten extent conversion from IO completion, the other must be from extent allocation. 2. There must be no available xfs_alloc_wq worker threads in the worker pool. 3. There must be sustained severe memory shortages such that new kworker threads cannot be allocated to the xfs_alloc_wq pool for both threads that need split work to be run. 4. The split work from the unwritten extent conversion must run first. 5. When the BMBT block allocation runs from the split work, it must loop over all AGs and not be able to either trylock an AGF successfully, or each AGF it is able to lock has no space available for a single block allocation. 6. The BMBT allocation must then attempt to lock the AGF that the second task queued to the rescuer thread already has locked before it finds an AGF it can allocate from. At this point, we have an ABBA deadlock between tasks queued on the xfs_alloc_wq rescuer thread and a locked AGF. i.e. The queued task holding the AGF lock can't be run by the rescuer thread until the task the rescuer thread is running gets the AGF lock.... This is a highly improbable series of events, but there it is.
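As an aside, the general shape of such an ABBA deadlock can be sketched with a tiny standalone pthreads program. This is a minimal illustration only, not the XFS code: here lock_a stands in for the AGF buffer lock and lock_b for the queued split work that only the rescuer thread can run, and the program intentionally deadlocks when executed (build with cc -pthread).

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;	/* plays the "AGF lock" */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;	/* plays the "rescuer slot" */

static void *task1(void *arg)
{
	pthread_mutex_lock(&lock_a);	/* holds A ... */
	sleep(1);
	pthread_mutex_lock(&lock_b);	/* ... then waits for B */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

static void *task2(void *arg)
{
	pthread_mutex_lock(&lock_b);	/* holds B ... */
	sleep(1);
	pthread_mutex_lock(&lock_a);	/* ... then waits for A */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, task1, NULL);
	pthread_create(&t2, NULL, task2, NULL);
	pthread_join(t1, NULL);		/* never returns: classic ABBA deadlock */
	pthread_join(t2, NULL);
	printf("not reached\n");
	return 0;
}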
There are a couple of ways to fix this, but the easiest way is to ensure that we only punt tasks with a locked AGF that holds enough space for the BMBT block allocations to the worker thread. This works for unwritten extent conversion in IO completion (which doesn't have a locked AGF and space reservations) because we have tight control over the IO completion stack. It is typically only 6 functions deep when xfs_btree_split() is called because we've already offloaded the IO completion work to a worker thread and hence we don't need to worry about stack overruns here. The other place we can be called for a BMBT split without a preceding allocation is __xfs_bunmapi() when punching out the center of an existing extent. We don't remove extents in the IO path, so these operations don't tend to be called with a lot of stack consumed. Hence we don't really need to ship the split off to a worker thread in these cases, either. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/libxfs/xfs_btree.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 4c16c8c31fcb..6b084b3cac83 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -2913,9 +2913,22 @@ xfs_btree_split_worker( } /* - * BMBT split requests often come in with little stack to work on. Push + * BMBT split requests often come in with little stack to work on so we push * them off to a worker thread so there is lots of stack to use. For the other * btree types, just call directly to avoid the context switch overhead here. + * + * Care must be taken here - the work queue rescuer thread introduces potential + * AGF <> worker queue deadlocks if the BMBT block allocation has to lock new + * AGFs to allocate blocks. A task being run by the rescuer could attempt to + * lock an AGF that is already locked by a task queued to run by the rescuer, + * resulting in an ABBA deadlock as the rescuer cannot run the lock holder to + * release it until the current thread it is running gains the lock. + * + * To avoid this issue, we only ever queue BMBT splits that don't have an AGF + * already locked to allocate from. The only place that doesn't hold an AGF + * locked is unwritten extent conversion at IO completion, but that has already + * been offloaded to a worker thread and hence has no stack consumption issues + * we have to worry about. 
*/ STATIC int /* error */ xfs_btree_split( @@ -2929,7 +2942,8 @@ xfs_btree_split( struct xfs_btree_split_args args; DECLARE_COMPLETION_ONSTACK(done); - if (cur->bc_btnum != XFS_BTNUM_BMAP) + if (cur->bc_btnum != XFS_BTNUM_BMAP || + cur->bc_tp->t_firstblock == NULLFSBLOCK) return __xfs_btree_split(cur, level, ptrp, key, curp, stat); args.cur = cur; From patchwork Tue Sep 24 18:38:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811100 Received: from mail-pl1-f171.google.com (mail-pl1-f171.google.com [209.85.214.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 71A881ACE00; Tue, 24 Sep 2024 18:39:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203147; cv=none; b=ZEnsXqhISdqBSNbK1XlpTHzLS6egkWV3qMXO407SujYYIzyDCxhZHj/CxSPU+HAm3QEVc/1u0xSRmijk2+sWJ/JfcX7Ge1y3RDFFuT31PgO1Q3G7hAdIOz/6TeuvpshbQy/eoG1I5rlV+Ci+5FPPwAK46M7cbie0IaKRPDb8oKw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203147; c=relaxed/simple; bh=vHC7hp8kOa1EbTDhpw6Ze6OISmQ5dHHPQS17ISvY1RQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=FAypyzSz842hkqTqYz/8IeaSbho8EnIJBx0tFWoDadhsrSgXibsRWI9d5pjh0RLiAKn50c29UNdAdaHEmK8gJJnlixR0jP3NG48F6CzTzmJwlEJiOP//OfznUOHcxptv1GdONWEzwgPvQ/qBE/XhpEiw1CKF26VJHQA60H+X0rQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=VHGcePfY; arc=none smtp.client-ip=209.85.214.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="VHGcePfY" Received: by mail-pl1-f171.google.com with SMTP id d9443c01a7336-20573eb852aso1029985ad.1; Tue, 24 Sep 2024 11:39:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203144; x=1727807944; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=KN6aynWIfGmMF6yj54h8N6+mGM/wmoW9+W4pkaU71NQ=; b=VHGcePfYUFXDQ8WRaTJvtj0B6iUuiXHXd53JdAGXdXO949mrSwt9J1dVg+ZFknrsAT GYJ/H0a0HkjFOmT7ViV0PLqcvwmiD3CsJ5xIB5X7vHmqM9zeFmK72HN1LGRGFc26Lmma wYFWisCqDigE5W90K/0bACgVh0qWnSwQd/YtiQ7l6CYn4xIEo95UtXiDWe6JebkuLkGR zcS6qYdIvCco5Uz21otcIAATo0n0kHfbXKNU/4N5hH2S6+Ci+XdiNc18wbbgtwTkFRaI 2Qd4WRZ89w9QLWrW0E44qMoBI4aaYV3Bw9FYLHEfmPawjZZ3Dg5qn9CazQC77hSkahW/ M22Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203144; x=1727807944; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KN6aynWIfGmMF6yj54h8N6+mGM/wmoW9+W4pkaU71NQ=; b=Wn7DccggqgyD9K81OMINERLcR+4bzcpxuZDfb38uKmlh4+hWdiY36NiuuiHZEKgXpY cCJpfFqs7rxKzfxjyixaKVgiv+qYREv/prXpErUqgjTJnRyAkD/Eev4spbdBJEZffgLm 9Dx/aT8Og18BjyiSJ+qhmJwIVWfhDqo84FrtXVIQGBkReGzOO2yskDd24QPCSR3oXemq 
nx+wqdnUU/9WrZ6BmNHq68snlRsQAzmRJsjYNNL5pNQfZ2RteFxZV52T+QB7x65emKMG aEbLIBZfNI1YnnjwvNBxi4UTlBq/juLYk/BO5LdmAgrYPzYEI3t3typDBmVtJmnEpNXQ 6hRQ== X-Gm-Message-State: AOJu0YyeMqQPwPwWS2DZyKFl7NKJ55WtUjY+Ruq6T/E+8HWxVwWjV9a1 knAvvMt7dEB7IHrjcQFh+0uwyPbLkf7aIjoT8eoWS3BX3m9cP7+dNuGLjZFD X-Google-Smtp-Source: AGHT+IGpEc/a44ag2d+L0Bm0vEbZShl3CA3fIU+mB6iXTtn/xxOFVo1Y3+x8ufJ8naoueU7tuy/6Rw== X-Received: by 2002:a17:90a:12ca:b0:2d3:acbd:307b with SMTP id 98e67ed59e1d1-2e05682b809mr6231549a91.10.1727203144377; Tue, 24 Sep 2024 11:39:04 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:03 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner , Allison Henderson , "Darrick J. Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 05/26] xfs: fix low space alloc deadlock Date: Tue, 24 Sep 2024 11:38:30 -0700 Message-ID: <20240924183851.1901667-6-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 1dd0510f6d4b85616a36aabb9be38389467122d9 ] I've recently encountered an ABBA deadlock with g/476. The upcoming changes seem to make this much easier to hit, but the underlying problem is a pre-existing one. Essentially, if we select an AG for allocation, then lock the AGF and then fail to allocate for some reason (e.g. minimum length requirements cannot be satisfied), then we drop out of the allocation with the AGF still locked. The caller then modifies the allocation constraints - usually loosening them up - and tries again. This can result in trying to access AGFs that are lower than the AGF we already have locked from the failed attempt. e.g. the failed attempt skipped several AGs before failing, so we have locked an AG higher than the start AG. Retrying the allocation from the start AG then causes us to violate AGF lock ordering and this can lead to deadlocks. The deadlock exists even if allocation succeeds - we can do followup allocations in the same transaction for BMBT blocks that aren't guaranteed to be in the same AG as the original, and can move into higher AGs. Hence we really need to move the tp->t_firstblock tracking down into xfs_alloc_vextent() where it can be set when we exit with a locked AG. xfs_alloc_vextent() can also check there if the requested allocation falls within the allowed range of AGs set by tp->t_firstblock. If we can't allocate within the range set, we have to fail the allocation. If we are allowed to do non-blocking AGF locking, we can ignore the AG locking order limitations as we can use try-locks for the first iteration over the requested AG range. This invalidates a set of post allocation asserts that check that the allocation is always above tp->t_firstblock if it is set. Because we can use try-locks to avoid the deadlock in some circumstances, having a pre-existing locked AGF doesn't always prevent allocation from lower order AGFs. Hence those ASSERTs need to be removed. 
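As a rough standalone sketch of the resulting AG iteration policy (illustrative only, not the kernel code; names and structure are simplified from the hunks below): the minimum AG number is derived from the AG of tp->t_firstblock, the try-lock pass may wrap all the way back to AG 0, and the blocking pass may never visit an AG below that minimum.

#include <stdbool.h>
#include <stdio.h>

/* Simplified model of the wrap-around rule added to xfs_alloc_vextent() below. */
static unsigned int next_ag(unsigned int agno, unsigned int agcount,
			    unsigned int start_agno, bool trylock)
{
	if (++agno == agcount)
		return trylock ? 0 : start_agno;	/* blocking pass never wraps below its start */
	return agno;
}

int main(void)
{
	unsigned int agcount = 4;
	unsigned int minimum_agno = 2;		/* AG of tp->t_firstblock, hypothetical value */
	unsigned int start = minimum_agno;	/* blocking pass start is clamped to the minimum */
	unsigned int agno = start;

	do {
		printf("blocking pass visits AG %u\n", agno);	/* prints 2 then 3, never 0 or 1 */
		agno = next_ag(agno, agcount, start, false);
	} while (agno != start);
	return 0;
}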
Signed-off-by: Dave Chinner Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/libxfs/xfs_alloc.c | 69 ++++++++++++++++++++++++++++++++------- fs/xfs/libxfs/xfs_bmap.c | 14 -------- fs/xfs/xfs_trace.h | 1 + 3 files changed, 58 insertions(+), 26 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index de79f5d07f65..8bb024b06b95 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -3164,10 +3164,13 @@ xfs_alloc_vextent( xfs_alloctype_t type; /* input allocation type */ int bump_rotor = 0; xfs_agnumber_t rotorstep = xfs_rotorstep; /* inode32 agf stepper */ + xfs_agnumber_t minimum_agno = 0; mp = args->mp; type = args->otype = args->type; args->agbno = NULLAGBLOCK; + if (args->tp->t_firstblock != NULLFSBLOCK) + minimum_agno = XFS_FSB_TO_AGNO(mp, args->tp->t_firstblock); /* * Just fix this up, for the case where the last a.g. is shorter * (or there's only one a.g.) and the caller couldn't easily figure @@ -3201,6 +3204,13 @@ xfs_alloc_vextent( */ args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno); args->pag = xfs_perag_get(mp, args->agno); + + if (minimum_agno > args->agno) { + trace_xfs_alloc_vextent_skip_deadlock(args); + error = 0; + break; + } + error = xfs_alloc_fix_freelist(args, 0); if (error) { trace_xfs_alloc_vextent_nofix(args); @@ -3232,6 +3242,8 @@ xfs_alloc_vextent( case XFS_ALLOCTYPE_FIRST_AG: /* * Rotate through the allocation groups looking for a winner. + * If we are blocking, we must obey minimum_agno contraints for + * avoiding ABBA deadlocks on AGF locking. */ if (type == XFS_ALLOCTYPE_FIRST_AG) { /* @@ -3239,7 +3251,7 @@ xfs_alloc_vextent( */ args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno); args->type = XFS_ALLOCTYPE_THIS_AG; - sagno = 0; + sagno = minimum_agno; flags = 0; } else { /* @@ -3248,6 +3260,7 @@ xfs_alloc_vextent( args->agno = sagno = XFS_FSB_TO_AGNO(mp, args->fsbno); flags = XFS_ALLOC_FLAG_TRYLOCK; } + /* * Loop over allocation groups twice; first time with * trylock set, second time without. @@ -3276,19 +3289,21 @@ xfs_alloc_vextent( if (args->agno == sagno && type == XFS_ALLOCTYPE_START_BNO) args->type = XFS_ALLOCTYPE_THIS_AG; + /* - * For the first allocation, we can try any AG to get - * space. However, if we already have allocated a - * block, we don't want to try AGs whose number is below - * sagno. Otherwise, we may end up with out-of-order - * locking of AGF, which might cause deadlock. - */ + * If we are try-locking, we can't deadlock on AGF + * locks, so we can wrap all the way back to the first + * AG. Otherwise, wrap back to the start AG so we can't + * deadlock, and let the end of scan handler decide what + * to do next. + */ if (++(args->agno) == mp->m_sb.sb_agcount) { - if (args->tp->t_firstblock != NULLFSBLOCK) - args->agno = sagno; - else + if (flags & XFS_ALLOC_FLAG_TRYLOCK) args->agno = 0; + else + args->agno = sagno; } + /* * Reached the starting a.g., must either be done * or switch to non-trylock mode. @@ -3300,7 +3315,14 @@ xfs_alloc_vextent( break; } + /* + * Blocking pass next, so we must obey minimum + * agno constraints to avoid ABBA AGF deadlocks. 
+ */ flags = 0; + if (minimum_agno > sagno) + sagno = minimum_agno; + if (type == XFS_ALLOCTYPE_START_BNO) { args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno); @@ -3322,9 +3344,9 @@ xfs_alloc_vextent( ASSERT(0); /* NOTREACHED */ } - if (args->agbno == NULLAGBLOCK) + if (args->agbno == NULLAGBLOCK) { args->fsbno = NULLFSBLOCK; - else { + } else { args->fsbno = XFS_AGB_TO_FSB(mp, args->agno, args->agbno); #ifdef DEBUG ASSERT(args->len >= args->minlen); @@ -3335,6 +3357,29 @@ xfs_alloc_vextent( #endif } + + /* + * We end up here with a locked AGF. If we failed, the caller is likely + * going to try to allocate again with different parameters, and that + * can widen the AGs that are searched for free space. If we have to do + * BMBT block allocation, we have to do a new allocation. + * + * Hence leaving this function with the AGF locked opens up potential + * ABBA AGF deadlocks because a future allocation attempt in this + * transaction may attempt to lock a lower number AGF. + * + * We can't release the AGF until the transaction is commited, so at + * this point we must update the "firstblock" tracker to point at this + * AG if the tracker is empty or points to a lower AG. This allows the + * next allocation attempt to be modified appropriately to avoid + * deadlocks. + */ + if (args->agbp && + (args->tp->t_firstblock == NULLFSBLOCK || + args->pag->pag_agno > minimum_agno)) { + args->tp->t_firstblock = XFS_AGB_TO_FSB(mp, + args->pag->pag_agno, 0); + } xfs_perag_put(args->pag); return 0; error0: diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 0d56a8d862e8..018837bd72c8 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3413,21 +3413,7 @@ xfs_bmap_process_allocated_extent( xfs_fileoff_t orig_offset, xfs_extlen_t orig_length) { - int nullfb; - - nullfb = ap->tp->t_firstblock == NULLFSBLOCK; - - /* - * check the allocation happened at the same or higher AG than - * the first block that was allocated. 
- */ - ASSERT(nullfb || - XFS_FSB_TO_AGNO(args->mp, ap->tp->t_firstblock) <= - XFS_FSB_TO_AGNO(args->mp, args->fsbno)); - ap->blkno = args->fsbno; - if (nullfb) - ap->tp->t_firstblock = args->fsbno; ap->length = args->len; /* * If the extent size hint is active, we tried to round the diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 372d871bccc5..5587108d5678 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1877,6 +1877,7 @@ DEFINE_ALLOC_EVENT(xfs_alloc_small_notenough); DEFINE_ALLOC_EVENT(xfs_alloc_small_done); DEFINE_ALLOC_EVENT(xfs_alloc_small_error); DEFINE_ALLOC_EVENT(xfs_alloc_vextent_badargs); +DEFINE_ALLOC_EVENT(xfs_alloc_vextent_skip_deadlock); DEFINE_ALLOC_EVENT(xfs_alloc_vextent_nofix); DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp); DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed); From patchwork Tue Sep 24 18:38:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811101 Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com [209.85.214.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6AF2C1AD3F6; Tue, 24 Sep 2024 18:39:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203147; cv=none; b=L0Sy2W0fJmke62o3ZlCqhIoxuj5yras5X8nRw5+nDfrXUrhpTsTII2YV2niYhqmf3EJyOdjbAsSqe/SIizwBcOeEtQ5RdS9sFQH08x+zQAMoEQmm24N64CDnpw1gWPdrBvCbm2ZgHJxyO0Un/h8KP0vL7bxtrpCWx3sZuEv0+cg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203147; c=relaxed/simple; bh=am+Y4T1NQl/tlZCwmeXR0o5BHsCaaofEOZ6qz64MIOM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oXfr2gBxzL3ZyoaMMKegpbdp/eLuJjh5UxJt8od3D9UzBPzZWZgeNcfi72d2JyiLtNlUU4LX9TG1vUPVpCc9YVkr6YWDX25rMoG81mcrtGuiAJki9rY6vrlTV+mDGv3J2DCG669Nj6a8CRTdPP9FAnvKnjzVUKm/3nkF4sAeEWM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=aUvObVI+; arc=none smtp.client-ip=209.85.214.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="aUvObVI+" Received: by mail-pl1-f179.google.com with SMTP id d9443c01a7336-2059112f0a7so52892805ad.3; Tue, 24 Sep 2024 11:39:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203145; x=1727807945; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=q843SYhjoS7VY8SW3D2zo4cn2xBljLaNd63UP9CYYI4=; b=aUvObVI+01Jzdegb2ywu5lYPNanes9RH5bA979PExz75cgrpyFh4DyOSe7cJDER/rT RT45XRN/OP0snFDuqfvnyMP2o4rT+0i98lNTjupkyFuB6bW/iTkxvWCENXi7yKSlOgVG z6Fbje3tKtphdO8ScezfHDwENJ9Q+RRCBE0+VYwVgUJoacfFm01N6tA7BUFP3fX5nkVO poSeYPQcKVYn2vGh2hMLnZOAOsmIQFH0UBoq3oJ0iHgAA5iwwh3NYE2i79blBh6b7UG8 MpBfRcLqR1J/3LJQiCPAeRz0LGbnx2F/Ix8Ub+XRR9x6CQopfE/xKkjsvj/V9ASjBJDw It7g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; 
s=20230601; t=1727203145; x=1727807945; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=q843SYhjoS7VY8SW3D2zo4cn2xBljLaNd63UP9CYYI4=; b=Wqpgz8cUboqqNPx1kPFV9mZ3Wh+/IJBxeJ8bU1vwstypQQQZKB7tWo8QNT2nbWPhxf 88kpYs2uJg2uDcoJ7ZljWQ12xmDpDtTV/YhhUZg/qTMzjVNllAmmDgzk+NdK2/6Wb8Dh kwnn4O/LaqDOZoLcY9A8/WUb9a+huG2itRN69rFqSuWKYh1T7zCLqv9eDcm7uox+ASK9 he86xeHBwvaSbyCvfLm1TtqZwSCWe3ZWF9VDJpAQ++IHVgHkkHwk7UgJpmp51VGn2oNb QWbyXwwSFtNzWFj7IyUSzzv98fyC9SBb1y0qvguSkIHouLtvbIg2QgJzpTRh4m/FvcUk 5iYw== X-Gm-Message-State: AOJu0YxlAmTkrF3OuhcMi+QxhOavYz/Y8dwvIG7tqGqUjxnzI4WIolsA F8BRd+dkKA71fjQPIzUwiqlhWgpCRT1Db1MHEO8gDC7yKsKpltOpe+s7Vhqk X-Google-Smtp-Source: AGHT+IGJp5XbQG4YLviq+JW6fMywZOA3Vf6miLROyjqvTb70vBxmvGpub9SViLki3uyR6bvecgscmQ== X-Received: by 2002:a17:90a:8a14:b0:2d3:b748:96dd with SMTP id 98e67ed59e1d1-2e06afb8f02mr83366a91.25.1727203145525; Tue, 24 Sep 2024 11:39:05 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:05 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner , Allison Henderson , "Darrick J. Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 06/26] xfs: prefer free inodes at ENOSPC over chunk allocation Date: Tue, 24 Sep 2024 11:38:31 -0700 Message-ID: <20240924183851.1901667-7-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit f08f984c63e9980614ae3a0a574b31eaaef284b2 ] When an XFS filesystem has free inodes in chunks already allocated on disk, it will still allocate new inode chunks if the target AG has no free inodes in it. Normally, this is a good idea as it preserves locality of all the inodes in a given directory. However, at ENOSPC this can lead to using the last few remaining free filesystem blocks to allocate a new chunk when there are many, many free inodes that could be allocated without consuming free space. This results in speeding up the consumption of the last few blocks and inode create operations then returning ENOSPC when there are free inodes available because we don't have enough blocks left in the filesystem for directory creation reservations to proceed. Hence when we are near ENOSPC, we should be attempting to preserve the remaining blocks for directory block allocation rather than using them for unnecessary inode chunk creation. This particular behaviour is exposed by xfs/294, when it drives to ENOSPC on empty file creation whilst there are still thousands of free inodes available for allocation in other AGs in the filesystem. Hence, when we are within 1% of ENOSPC, change the inode allocation behaviour to prefer to use existing free inodes over allocating new inode chunks, even though it results in poorer locality of the data set. 
It is more important for the allocations to be space efficient near ENOSPC than to have optimal locality for performance, so lets modify the inode AG selection code to reflect that fact. This allows generic/294 to not only pass with this allocator rework patchset, but to increase the number of post-ENOSPC empty inode allocations to from ~600 to ~9080 before we hit ENOSPC on the directory create transaction reservation. Signed-off-by: Dave Chinner Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/libxfs/xfs_ialloc.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index 94db50eb706a..120dbec16f5c 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -1737,6 +1737,7 @@ xfs_dialloc( struct xfs_perag *pag; struct xfs_ino_geometry *igeo = M_IGEO(mp); bool ok_alloc = true; + bool low_space = false; int flags; xfs_ino_t ino; @@ -1767,6 +1768,20 @@ xfs_dialloc( ok_alloc = false; } + /* + * If we are near to ENOSPC, we want to prefer allocation from AGs that + * have free inodes in them rather than use up free space allocating new + * inode chunks. Hence we turn off allocation for the first non-blocking + * pass through the AGs if we are near ENOSPC to consume free inodes + * that we can immediately allocate, but then we allow allocation on the + * second pass if we fail to find an AG with free inodes in it. + */ + if (percpu_counter_read_positive(&mp->m_fdblocks) < + mp->m_low_space[XFS_LOWSP_1_PCNT]) { + ok_alloc = false; + low_space = true; + } + /* * Loop until we find an allocation group that either has free inodes * or in which we can allocate some inodes. Iterate through the @@ -1795,6 +1810,8 @@ xfs_dialloc( break; } flags = 0; + if (low_space) + ok_alloc = true; } xfs_perag_put(pag); } From patchwork Tue Sep 24 18:38:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811102 Received: from mail-pj1-f45.google.com (mail-pj1-f45.google.com [209.85.216.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E8E7D1AD409; Tue, 24 Sep 2024 18:39:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.45 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203149; cv=none; b=CLF/E5Q40YyyMUZrG9tvSFLnxmN+cDV7T4ZOwm3c42rxXejzg776qNJzHEEooOVBqV4rbi+WVoWL0JCGLIORJPqblmRQSysrobwQMqm5UANFBPzO4gTMKoQ9nIr0K2xHKCY0G3pUzxUfpGheXHV3eku2AgBxOuj51PLZ/gOLVS4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203149; c=relaxed/simple; bh=nVkPEga5SASg04QlCwOXOWsDhCoXJMVkpFru9hMKEOU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=meJnL+Vkeo2+75rGUlqngkj610nrzcPTgzkJrzX1Kc7zVLbIWDAwOQtv7JfILZBxNAXkpvhdpKqOMqapU/Zv5cTf69DW7UNA+4qhyXmBcy2NLZ5lX/DUWjr4mr9jKJThUX+L823ANd3cOUiCee9ZIN8wAu8bNp34S1EqNasPxhM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=CgXUYbqI; arc=none smtp.client-ip=209.85.216.45 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; 
spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="CgXUYbqI" Received: by mail-pj1-f45.google.com with SMTP id 98e67ed59e1d1-2db928dbd53so4569800a91.2; Tue, 24 Sep 2024 11:39:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203147; x=1727807947; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Siw42Xfz6n/3KOogFrORz0GkFjImJ3UQ18BTeV0byfk=; b=CgXUYbqIJv8LEhuCmVLJsR/iwRAi/a9vOqWVGvfbZRYqbfkRi2b0EzO80cFRIvgBOj OqAZ2mZfF9nBJGbV6eAtw6xbQxwZv3sWyB6JxNTr4m/8dTrHy83FWBTarJdJu0FeyS7N wUCSmA6TA46KmsKMH4NAYo8PPkfS+Y25BWVlOInyJETWPLbVPCvhZMs4IQTp5/8+hp88 vOUw/20zolJBooWjrHNNiBcX7FijYT/kd6OYWGnJDsom/1pyLMYXkD4LY/yiQAjnicIH mg0lj6+Uo2fpuDdMLxmTwQVEuq5BZvq/s1ZsEhX+mAMnpxsIdSpBtKwmbumZb/ZfkLX8 wdRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203147; x=1727807947; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Siw42Xfz6n/3KOogFrORz0GkFjImJ3UQ18BTeV0byfk=; b=WDeCY6rZJ/SPRT/zyenm1mTdEaVB2n5ueskzQwmWop39a8kM4YZnyqXT8RzQKtqCa9 qCbHaHv9I4hxpKZIoxZcayuL0no3xPW/9OOoihAcXh1mpOXpKzSJdHFmrYWsWP7YJImX f3C4i0RVRDIUE2Na4cEXd9+5H0n86R9uoEEi2KvgayEchvUH0c6QpsmaOATmjnqljMLJ 1G9yu8cOaJNMHP8fdIvb2gGnGws/hCr8JtQAAe8axHTD8L6HCYXiNaHHqs3D20Ng+vA2 jN/zxqlMK17nHipvfCd9Sz86PA1a81e5aM1o/+ywMfk72EEwj+G2OHd4VcJXrb+a/oUK RTCw== X-Gm-Message-State: AOJu0YwYArYyyKNhvS5dTNrXl0PIsxNCat1PZlRCplXuA7t/dvYdEtiI 59KLSnCEg1gx6HCotEtMYmIL17QzfJOqMemOANFtIm3Fuy4PpJLsnzVahtAR X-Google-Smtp-Source: AGHT+IEQE1LCTSG9DGhQpN8V2ChKBNUATpWLXunAXQ0dvoFO5Azu/FxXwhAuJZasAbzS2UTElLLasg== X-Received: by 2002:a17:90b:234e:b0:2c9:7219:1db0 with SMTP id 98e67ed59e1d1-2e06ae282d8mr100897a91.3.1727203147035; Tue, 24 Sep 2024 11:39:07 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:06 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner , Allison Henderson , "Darrick J. Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 07/26] xfs: block reservation too large for minleft allocation Date: Tue, 24 Sep 2024 11:38:32 -0700 Message-ID: <20240924183851.1901667-8-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit d5753847b216db0e553e8065aa825cfe497ad143 ] When we enter xfs_bmbt_alloc_block() without having first allocated a data extent (i.e. tp->t_firstblock == NULLFSBLOCK) because we are doing something like unwritten extent conversion, the transaction block reservation is used as the minleft value. This works for operations like unwritten extent conversion, but it assumes that the block reservation is only for a BMBT split. 
THis is not always true, and sometimes results in larger than necessary minleft values being set. We only actually need enough space for a btree split, something we already handle correctly in xfs_bmapi_write() via the xfs_bmapi_minleft() calculation. We should use xfs_bmapi_minleft() in xfs_bmbt_alloc_block() to calculate the number of blocks a BMBT split on this inode is going to require, not use the transaction block reservation that contains the maximum number of blocks this transaction may consume in it... Signed-off-by: Dave Chinner Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/libxfs/xfs_bmap.c | 2 +- fs/xfs/libxfs/xfs_bmap.h | 2 ++ fs/xfs/libxfs/xfs_bmap_btree.c | 19 +++++++++---------- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 018837bd72c8..9dc33cdc2ab9 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4242,7 +4242,7 @@ xfs_bmapi_convert_unwritten( return 0; } -static inline xfs_extlen_t +xfs_extlen_t xfs_bmapi_minleft( struct xfs_trans *tp, struct xfs_inode *ip, diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index 16db95b11589..08c16e4edc0f 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -220,6 +220,8 @@ int xfs_bmap_add_extent_unwritten_real(struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp, struct xfs_bmbt_irec *new, int *logflagsp); +xfs_extlen_t xfs_bmapi_minleft(struct xfs_trans *tp, struct xfs_inode *ip, + int fork); enum xfs_bmap_intent_type { XFS_BMAP_MAP = 1, diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index cfa052d40105..18de4fbfef4e 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -213,18 +213,16 @@ xfs_bmbt_alloc_block( if (args.fsbno == NULLFSBLOCK) { args.fsbno = be64_to_cpu(start->l); args.type = XFS_ALLOCTYPE_START_BNO; + /* - * Make sure there is sufficient room left in the AG to - * complete a full tree split for an extent insert. If - * we are converting the middle part of an extent then - * we may need space for two tree splits. - * - * We are relying on the caller to make the correct block - * reservation for this operation to succeed. If the - * reservation amount is insufficient then we may fail a - * block allocation here and corrupt the filesystem. + * If we are coming here from something like unwritten extent + * conversion, there has been no data extent allocation already + * done, so we have to ensure that we attempt to locate the + * entire set of bmbt allocations in the same AG, as + * xfs_bmapi_write() would have reserved. */ - args.minleft = args.tp->t_blk_res; + args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur->bc_ino.ip, + cur->bc_ino.whichfork); } else if (cur->bc_tp->t_flags & XFS_TRANS_LOWMODE) { args.type = XFS_ALLOCTYPE_START_BNO; } else { @@ -248,6 +246,7 @@ xfs_bmbt_alloc_block( * successful activate the lowspace algorithm. 
*/ args.fsbno = 0; + args.minleft = 0; args.type = XFS_ALLOCTYPE_FIRST_AG; error = xfs_alloc_vextent(&args); if (error) From patchwork Tue Sep 24 18:38:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811103 Received: from mail-pj1-f53.google.com (mail-pj1-f53.google.com [209.85.216.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 42CC01AC45A; Tue, 24 Sep 2024 18:39:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.53 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203150; cv=none; b=n/4faGirtcc7bCN1qySVwAW3buV3B+Y5EdqVbuANhFktQ6/W+a5xs8H7SDE1SH8PI+IauhhMbkFKiv+DOJlR4UJdxv6sK/4NNp/myEvgKXZJnROEENSkHi7+IYec4czQGlCbX0vbAy82yxwmxbtXAWBu87KXkCSUf/rlJtbYhUs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203150; c=relaxed/simple; bh=rmhGXfE3TQJAGDtwyY0FjUMILnUmNJ2ubfAfeNwHjYY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Plc/mIsi3MQyybFRfuZY8U9ZP7pt/nwIjzwPb3a36QWzCbGEH3eogB2U8qK6PlK1D6+4X5gmDfZyu61gAhWUDDpweCDtZDxfT3zXdLSl7dbSREVgn1hz9cCLTkRUAbObhLz66IodS9qslAjLLRH/QjBl3gQsBNGfWnCCPacAW1A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=J5AJCctY; arc=none smtp.client-ip=209.85.216.53 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="J5AJCctY" Received: by mail-pj1-f53.google.com with SMTP id 98e67ed59e1d1-2d88edf1340so3911147a91.1; Tue, 24 Sep 2024 11:39:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203148; x=1727807948; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=TfsYzXfCH6bQpdZPV3lPQevwKiuUgBW7XGe7rqea52I=; b=J5AJCctY1T5M4h8kXg8+Rj0WtUbJr4UI1xQ8EcVh19b2Bfj5ZHxv5q/Z7s4GsZxWT6 7dOAh3JSx4EJWnOl6y9MNHt1SSTW1qWOL1cr5FnztYeBUc4mQ3344WMz2+gAhdWU3czn MBla+luPpXJ9yvpyER5dZhe/8ajP+QM2hwlpQxLhZkjKY5LpMeEvGDSqTZgDytG2AMFT edPjF8FJcr7DJvPMRToGSOPzrrnryGnmPuXvNF+phaU6t2SX2q9vBmYHKLXv6CzaFcjH 6rxgJT+EjxRrX2AKHdaAHMctoOTAnfaYADDErJBVLlt4seK8cPGKfSmgqwCqvXhUmzOU x5WA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203148; x=1727807948; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TfsYzXfCH6bQpdZPV3lPQevwKiuUgBW7XGe7rqea52I=; b=WTf2XWOjPvdQdf4/YD8LJoWkQbxKazuEORfdIndGyOf43e8q5CDlxR6JFHhdsvFLxF Am2HBQF0i475+u+vCH3a1uIFeLl8thjgy5nVD1sDmgrG2FEnHETCdrunGxZQ+7NmLToB RaB/KOyczxXTRBBiv8b1hKJROHPzR2p4ZPW1iZBf0Z189a4+ce3r4cnXMkdOlbtoR64L h4cme+1A8dHSDSlgic8WIFOhPij9GTgDvHTn9Hje2Ojb5YpZTLQoXEfbNKhz6+YokyFZ Gi0WKZuy0k1RSGONif0+qseL34nGvJN5vuf3QWrLp1U1uBZPeS1/dVqIDdS/MJFbyjFu F/gQ== X-Gm-Message-State: AOJu0YzIIRvYW1/zRpLZxDgmi1nJaGnT+mzghXMMp8A9NyzVvlLPT7Lz 
vA7x63X6YpSDIbcXdPjgd+MNHOdgs4UIgAxEQfAJmK7DTnArokST8xymlpqA X-Google-Smtp-Source: AGHT+IFY4v4cUQuBzkFDB8f8lnlwErJEM6mUTvtRej1GwnPIlBhG+lodRZjTmA+qjDHx6axbHEoimA== X-Received: by 2002:a17:90a:b308:b0:2d3:c6dd:4383 with SMTP id 98e67ed59e1d1-2e06ae4ca6amr96094a91.16.1727203148380; Tue, 24 Sep 2024 11:39:08 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:07 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, "Darrick J. Wong" , syzbot+090ae72d552e6bd93cfe@syzkaller.appspotmail.com, Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 08/26] xfs: fix uninitialized variable access Date: Tue, 24 Sep 2024 11:38:33 -0700 Message-ID: <20240924183851.1901667-9-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 60b730a40c43fbcc034970d3e77eb0f25b8cc1cf ] If the end position of a GETFSMAP query overlaps an allocated space and we're using the free space info to generate fsmap info, the akeys information gets fed into the fsmap formatter with bad results. Zero-init the space. Reported-by: syzbot+090ae72d552e6bd93cfe@syzkaller.appspotmail.com Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_fsmap.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index d8337274c74d..062e5dc5db9f 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -761,6 +761,7 @@ xfs_getfsmap_datadev_bnobt( { struct xfs_alloc_rec_incore akeys[2]; + memset(akeys, 0, sizeof(akeys)); info->missing_owner = XFS_FMR_OWN_UNKNOWN; return __xfs_getfsmap_datadev(tp, keys, info, xfs_getfsmap_datadev_bnobt_query, &akeys[0]); From patchwork Tue Sep 24 18:38:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811104 Received: from mail-pj1-f50.google.com (mail-pj1-f50.google.com [209.85.216.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C824C1AC8A7; Tue, 24 Sep 2024 18:39:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.50 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203152; cv=none; b=SARsqzO4JK0ZQo4ZSDMfMcPeToZHI22/Mx4oZirNTD0RaV2ExnJQ+X/1Hpa4OFvHhBx+IAvFGB6XumbxsftjOWji3356mlbglr/DiPmx+XnwIUlWDKypgsYVgiSpKd4HwWFnOYENmai/UeV3H9zVmVaPu7bkzk7L/zxEXUzSDBQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203152; c=relaxed/simple; bh=6NBdnA7+RA8YF9keQxng4kuQgB6muPfTAlauOj+vvww=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WfWidPY71DtI+67psMPivhZ2sO6l/2AFBIeyZem4FIQo+0zdsqiZ24Z09bi6D3OFV6P9CpEg4fGnsmVL3wmnqORwrwS2OeesOCGQY+aH4kCStk+BXPuUOjIgf0RMkd9OO6/0qn+0nbAb122GZFgGSkjCdBdOChlqYq8gmmpU1DA= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=GXF0olw1; arc=none smtp.client-ip=209.85.216.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="GXF0olw1" Received: by mail-pj1-f50.google.com with SMTP id 98e67ed59e1d1-2d889ba25f7so4010564a91.0; Tue, 24 Sep 2024 11:39:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203150; x=1727807950; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=phA2n2FlmmGiuLSoXgdaGz08N8wCi6ROxDoyreR8NGU=; b=GXF0olw1NCYww3ZTl7Myoj2aspe/K+9nenMlB+CfhPl3y2VFNsO1aIEu324f93naGZ mPy/q/e1ifcuXR3adeeucr/EI1FcDSXkm0kUz3qKF7CVfuyuQmmYSYXkWeaSsI+LSV17 PPo8t2Zg4yypahuFqfuNm7lbU73LfibcXzBGzVMI9vlEkxKF3GZ/4v3Rildpw4cGJPqW OuH8LFqPkH6Wtm5pa7rXmca8gkAIzPwue7b2LYzY8JW05trL4VSF0PAC5I3vNC1TeQwY FOaBLajoHq9FYqn0ow8RiawNrA97A+y+HmPOeEybmK2H2zuAeUbEMaLXqwBq+gx3v5Mj bs0g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203150; x=1727807950; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=phA2n2FlmmGiuLSoXgdaGz08N8wCi6ROxDoyreR8NGU=; b=doPTLbVdQEVJRYFGb7Dhv86Zk1x0CcyAtiLv3YvV9dgkj5oTutA3d2n2c3dq7XG/QN xSoslTwvuGUujwvQpPjWqxvEPAbtseo7kFGFVgJfvwcIEu1cx44uofiAo6gZVf90AAOz 7gK9MGEbRX786FIMgpv6jxpWPzFv7f5Mf5G39XWXXrCM+JI5TRzkjP+KVFuusLjGrUq6 /Oe25E/X+Sn/ZGMlteHQfFRzrU4oIICe0LTHPvevAefEYTLskcJhoF2CoTahb7w7lkcb ZIbwX6+Cl3uTDtKA7oBvwRX9lvSwaltwL2w2AL0kSjb0wSSTFLDtSILsbWMvuvuUmfLz PoOg== X-Gm-Message-State: AOJu0YwknhfglnvMVNIBHHqJdnApA5uPhlarKTGVUoovm+jtjI5IZXG6 NlGF4BNVWyPnjHiU7F986jjvm7ldLqi52lN4koIcejOPGgA7KjYdTrL4wHcy X-Google-Smtp-Source: AGHT+IHlOVr9A/RaqRigDxamW+bSnRSqNf/iOSlH/EtNDwQ7fxgFuNiUCpiXSBG6eVc9TGdU/hoFxA== X-Received: by 2002:a17:90a:2dc6:b0:2d3:bd32:fc7d with SMTP id 98e67ed59e1d1-2e06aff48f6mr76668a91.39.1727203149753; Tue, 24 Sep 2024 11:39:09 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:09 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner , Pengfei Xu , "Darrick J. 
Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 09/26] xfs: quotacheck failure can race with background inode inactivation Date: Tue, 24 Sep 2024 11:38:34 -0700 Message-ID: <20240924183851.1901667-10-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 0c7273e494dd5121e20e160cb2f047a593ee14a8 ] The background inode inactivation can attached dquots to inodes, but this can race with a foreground quotacheck failure that leads to disabling quotas and freeing the mp->m_quotainfo structure. The background inode inactivation then tries to allocate a quota, tries to dereference mp->m_quotainfo, and crashes like so: XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas. xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff) BUG: kernel NULL pointer dereference, address: 00000000000002a8 .... CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker RIP: 0010:xfs_dquot_alloc+0x95/0x1e0 .... Call Trace: xfs_qm_dqread+0x46/0x440 xfs_qm_dqget_inode+0x154/0x500 xfs_qm_dqattach_one+0x142/0x3c0 xfs_qm_dqattach_locked+0x14a/0x170 xfs_qm_dqattach+0x52/0x80 xfs_inactive+0x186/0x340 xfs_inodegc_worker+0xd3/0x430 process_one_work+0x3b1/0x960 worker_thread+0x52/0x660 kthread+0x161/0x1a0 ret_from_fork+0x29/0x50 .... Prevent this race by flushing all the queued background inode inactivations pending before purging all the cached dquots when quotacheck fails. Reported-by: Pengfei Xu Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_qm.c | 40 ++++++++++++++++++++++++++-------------- 1 file changed, 26 insertions(+), 14 deletions(-) diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index ff53d40a2dae..f51960d7dcbd 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1321,15 +1321,14 @@ xfs_qm_quotacheck( error = xfs_iwalk_threaded(mp, 0, 0, xfs_qm_dqusage_adjust, 0, true, NULL); - if (error) { - /* - * The inode walk may have partially populated the dquot - * caches. We must purge them before disabling quota and - * tearing down the quotainfo, or else the dquots will leak. - */ - xfs_qm_dqpurge_all(mp); - goto error_return; - } + + /* + * On error, the inode walk may have partially populated the dquot + * caches. We must purge them before disabling quota and tearing down + * the quotainfo, or else the dquots will leak. + */ + if (error) + goto error_purge; /* * We've made all the changes that we need to make incore. Flush them @@ -1363,10 +1362,8 @@ xfs_qm_quotacheck( * and turn quotaoff. The dquots won't be attached to any of the inodes * at this point (because we intentionally didn't in dqget_noattach). 
*/ - if (error) { - xfs_qm_dqpurge_all(mp); - goto error_return; - } + if (error) + goto error_purge; /* * If one type of quotas is off, then it will lose its @@ -1376,7 +1373,7 @@ xfs_qm_quotacheck( mp->m_qflags &= ~XFS_ALL_QUOTA_CHKD; mp->m_qflags |= flags; - error_return: +error_return: xfs_buf_delwri_cancel(&buffer_list); if (error) { @@ -1395,6 +1392,21 @@ xfs_qm_quotacheck( } else xfs_notice(mp, "Quotacheck: Done."); return error; + +error_purge: + /* + * On error, we may have inodes queued for inactivation. This may try + * to attach dquots to the inode before running cleanup operations on + * the inode and this can race with the xfs_qm_destroy_quotainfo() call + * below that frees mp->m_quotainfo. To avoid this race, flush all the + * pending inodegc operations before we purge the dquots from memory, + * ensuring that background inactivation is idle whilst we turn off + * quotas. + */ + xfs_inodegc_flush(mp); + xfs_qm_dqpurge_all(mp); + goto error_return; + } /* From patchwork Tue Sep 24 18:38:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811105 Received: from mail-pg1-f171.google.com (mail-pg1-f171.google.com [209.85.215.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 404511AC45A; Tue, 24 Sep 2024 18:39:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203153; cv=none; b=UMGxj7qr6oapzLxP59DGKeCHcqMITuSHoHTkaJxYTQxIo6KKzg4F16X84uKDS1pEULOrI6h4dqp04mDkr4WJEVh/aPLLFTPpb4XXccWUg1RI02jGFHJF4KNCfExM7zqZL/4NA8t5cQ4ztaz7/ifWXQ5j9+w2pjk+uuL7XZGluNY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203153; c=relaxed/simple; bh=8yYeCIKmKPe5of7iXfagMj/76j9/rMFDJcPT0YINd00=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JJZilGIQ58EUl3vKhFpSlN/R7KL9L/HoXUQvyVas7+xWOm/RcNa9FePchmig0itcpk0ZJAJXnTPhhMvByKgQmkL7RMqwL94s8/zOo+PXYVbeuwyE9f2FertapduFOjuoQyPIm0D4EFGZ/0KKQmAXIUYLabNlov5ZTfxN+qT7ZUg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Fg7SHq1f; arc=none smtp.client-ip=209.85.215.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Fg7SHq1f" Received: by mail-pg1-f171.google.com with SMTP id 41be03b00d2f7-7c6b4222fe3so3769079a12.3; Tue, 24 Sep 2024 11:39:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203151; x=1727807951; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=iYr8XI7x/C+klcH+M93RinC8NekpG56P3408vvG5ctE=; b=Fg7SHq1fj9dkUEnsXVjSdeL/71blIjyflMnTbBHEmFfLpewyDySkhXX8xabRfd2AX8 EBenSkiOUGCfDQmtUsmCpvSbP/7B5SFCrjL/d2N07XZgTO4vbCLTWCvandhbhLOnQq0L jXdJHnXrDKb5mU1nJTkc1RD4xggHni4xSMnw0Yf99pAaTyezKtvSAGNy80+sJOwx+x+g b5OfeQe7A1kBGuCIMBtLIq2lDpz8OP6ZoLsqiYLYRlTvEhPW2ldEtuyfANTGTivPesMi 
1rryHmd9s/ipZ64VLxjZJHh1AN3j6azUhxXYNOEWNvOiOeEjsbbNSfXcotEOeHDsZLnz mbMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203151; x=1727807951; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=iYr8XI7x/C+klcH+M93RinC8NekpG56P3408vvG5ctE=; b=TvL6PkY6BNeoafuu3gPNNDVDrlAJ1F/M9IJWAFsP4V38Zz0/1enCUXTCF2JIy0H22U 186lg6zxbsFmcY/5Jqde587gs1/pBOpp55JFwzcmCdRKTtc0pgQjELW1CA8PrMvwgmvw gJameQ7lXO02pttePZBSoGLibw/lZCXeV2IxkdwR66I5C8MJv07ZKzviJACjWFcVg5Pb +p9kiVwfENhWNKViELQRRiPL1t6ZIJDa+/R3Np+jIhJyK8wc4GDzk6DaW3DbuJ7kwSGK 02Az93WSBha3Vn6a444lGAOPyWl0Ee/XuJgCRF6JM17ua50ge2G6UNdTdVAGBPi1r9Is H9Hw== X-Gm-Message-State: AOJu0YwaI+EUf/1+xi/p4NrELb/ODW2NxDPnfUVcExta2xLG7AD1vEQ5 tqy4myrSXD1B99wyzK8nsMKskY56CRk4OO/b9V4Pf2q1M7yXldp/ePYzd6D4 X-Google-Smtp-Source: AGHT+IGgGXLrZZJ+c78/UuCeudvABMzspR5IrYTrJ3b/e5umRTuhMBm1/M4N/VR/u5INid4eq/5PtA== X-Received: by 2002:a05:6a20:ac43:b0:1d2:ea38:3774 with SMTP id adf61e73a8af0-1d4e0bbe334mr115377637.32.1727203151287; Tue, 24 Sep 2024 11:39:11 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:10 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Ye Bin , "Darrick J. Wong" , Dave Chinner , Dave Chinner , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 10/26] xfs: fix BUG_ON in xfs_getbmap() Date: Tue, 24 Sep 2024 11:38:35 -0700 Message-ID: <20240924183851.1901667-11-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Ye Bin [ Upstream commit 8ee81ed581ff35882b006a5205100db0b57bf070 ] There's issue as follows: XFS: Assertion failed: (bmv->bmv_iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c, line: 329 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:102! 
invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 14612 Comm: xfs_io Not tainted 6.3.0-rc2-next-20230315-00006-g2729d23ddb3b-dirty #422 RIP: 0010:assfail+0x96/0xa0 RSP: 0018:ffffc9000fa178c0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff888179a18000 RDX: 0000000000000000 RSI: ffff888179a18000 RDI: 0000000000000002 RBP: 0000000000000000 R08: ffffffff8321aab6 R09: 0000000000000000 R10: 0000000000000001 R11: ffffed1105f85139 R12: ffffffff8aacc4c0 R13: 0000000000000149 R14: ffff888269f58000 R15: 000000000000000c FS: 00007f42f27a4740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000b92388 CR3: 000000024f006000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: xfs_getbmap+0x1a5b/0x1e40 xfs_ioc_getbmap+0x1fd/0x5b0 xfs_file_ioctl+0x2cb/0x1d50 __x64_sys_ioctl+0x197/0x210 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Above issue may happen as follows: ThreadA ThreadB do_shared_fault __do_fault xfs_filemap_fault __xfs_filemap_fault filemap_fault xfs_ioc_getbmap -> Without BMV_IF_DELALLOC flag xfs_getbmap xfs_ilock(ip, XFS_IOLOCK_SHARED); filemap_write_and_wait do_page_mkwrite xfs_filemap_page_mkwrite __xfs_filemap_fault xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); iomap_page_mkwrite ... xfs_buffered_write_iomap_begin xfs_bmapi_reserve_delalloc -> Allocate delay extent xfs_ilock_data_map_shared(ip) xfs_getbmap_report_one ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0) -> trigger BUG_ON As xfs_filemap_page_mkwrite() only hold XFS_MMAPLOCK_SHARED lock, there's small window mkwrite can produce delay extent after file write in xfs_getbmap(). To solve above issue, just skip delalloc extents. Signed-off-by: Ye Bin Reviewed-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_bmap_util.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 867645b74d88..351087cde27e 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -314,15 +314,13 @@ xfs_getbmap_report_one( if (isnullstartblock(got->br_startblock) || got->br_startblock == DELAYSTARTBLOCK) { /* - * Delalloc extents that start beyond EOF can occur due to - * speculative EOF allocation when the delalloc extent is larger - * than the largest freespace extent at conversion time. These - * extents cannot be converted by data writeback, so can exist - * here even if we are not supposed to be finding delalloc - * extents. + * Take the flush completion as being a point-in-time snapshot + * where there are no delalloc extents, and if any new ones + * have been created racily, just skip them as being 'after' + * the flush and so don't get reported. 
*/ - if (got->br_startoff < XFS_B_TO_FSB(ip->i_mount, XFS_ISIZE(ip))) - ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0); + if (!(bmv->bmv_iflags & BMV_IF_DELALLOC)) + return 0; p->bmv_oflags |= BMV_OF_DELALLOC; p->bmv_block = -2; From patchwork Tue Sep 24 18:38:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811106 Received: from mail-pl1-f180.google.com (mail-pl1-f180.google.com [209.85.214.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AEFBA1AC898; Tue, 24 Sep 2024 18:39:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203155; cv=none; b=lFRdpBA1CXx0p+AaeUu3R9K4oKvbaQknl8tHHDNJ32JM0LZfRiOjAOew0wvx2nDL2eLmpDxSmFemtdI/IK23KIohQchbFR2E+8YtJ4BvwyEtGpSWQ4WV2C1OaK9hLhRg67h8ZpyffuLHyXmGlCRJH/LNF5DGNkMgC9uoKBj7kCQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203155; c=relaxed/simple; bh=QrHQCREmjZbjtLXYa0nt5gXNnVYAhJpANMGTgB1TQIE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=FI4yldzJf+kjHd2kEiVlvyC/AIphcwE1zWgGcuadM9b4LQri/lp3GJaPXbBgQMLkpN2yvZPriNCq4/X2XOfbBLDtnHXnkTGOSaaJRoOWaGzSwKP5Z2Jx5eeFUFwHtInR/E/FZtF717tQrnite/9rjzlIa4ED4Ca+GvcyK4c6R6E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=d5/uPWKT; arc=none smtp.client-ip=209.85.214.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="d5/uPWKT" Received: by mail-pl1-f180.google.com with SMTP id d9443c01a7336-20573eb852aso1030965ad.1; Tue, 24 Sep 2024 11:39:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203153; x=1727807953; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=mmbUGPdREXaFhRVGUVBBozF+DuSG+yeczOURWI6+Go8=; b=d5/uPWKTslrkcf2Z9ei0CDrBwKKFpoxtn0vmnuJ9HS7cRM9gslYW8+V32tWU0F2Vms zo9FTTTzpoTumTiCNLzJbcZMkFkRLZTTr+mKn0jdppoU3Mt37UK5QuVJnBPwTuBoBKnr XSTtz1lnTp2aUBq7jFs4LtGk0BxyDHAUO/j/P/6miRoFKlMxw4PgwCcT4O6HeB9fIEB4 PtNlX5Doo9mV6YNKHTrA3MKemNJYFOJSowtV4vI0k9TzC4NQkg/+J1qkd5j5Rl5Z7RXg BtR0eqFvp/TlFg0+eHN357os0NY8b03sTdDRLqwTbfgNyHtpOuJJH7qLG3em5pdI3Cud lwYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203153; x=1727807953; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mmbUGPdREXaFhRVGUVBBozF+DuSG+yeczOURWI6+Go8=; b=RuV3IhuCsQnT8magAGzuAb0WsKcD+90YiXqqGbQUi9UroVYQE/q3ZehD/uYs1aUr25 snJHIcSjjnYCd+jODNUEX0/BdF/6EboYsthUpsnX9Bv5HLmM4vSptsuAuew89DQsaiwY csxrVCxr+9j+eK8/A2nwkhPF6H5YZGB8jWGqlLQ0qsftkoxy7foCtI1z8yGMdO/zUkbF txm/aK/WG0C8DF93/4NUwwJ6/oNdXjjTZ7jtrvvNom5tv3RsD+8JeFIxoVcvNc8/TKIA S4/3tTmNWwuHW2H31cMK3q4EeqdGvKlOX09yJL99K/d0NWcGZB1BJ0udELu38GSvnQuJ 
45aA== X-Gm-Message-State: AOJu0YyDqSinCxypXIMUY0HXTNha4Ht/ULTICKiCmikpIBxz7dnKCIwE NuxAby3qnZwTAUGef0EPJJgn7AIKcAvHzIB+DmPJq10e2oNPaibMHsvp1R1+ X-Google-Smtp-Source: AGHT+IEl1/DTPPVlxVTsOCFi6B+sm6cl7VFHAejl2xqd1IIlK9aGBgQuEcko+4LdO9FYl6Bx8yCJ6g== X-Received: by 2002:a17:90a:d18b:b0:2c9:7343:71f1 with SMTP id 98e67ed59e1d1-2e06ac390c5mr218008a91.14.1727203152647; Tue, 24 Sep 2024 11:39:12 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:12 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner , yangerkun , "Darrick J. Wong" , Christoph Hellwig , Dave Chinner , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 11/26] xfs: buffer pins need to hold a buffer reference Date: Tue, 24 Sep 2024 11:38:36 -0700 Message-ID: <20240924183851.1901667-12-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 89a4bf0dc3857569a77061d3d5ea2ac85f7e13c6 ] When a buffer is unpinned by xfs_buf_item_unpin(), we need to access the buffer after we've dropped the buffer log item reference count. This opens a window where we can have two racing unpins for the buffer item (e.g. shutdown checkpoint context callback processing racing with journal IO iclog completion processing) and both attempt to access the buffer after dropping the BLI reference count. If we are unlucky, the "BLI freed" context wins the race and frees the buffer before the "BLI still active" case checks the buffer pin count. This results in a use after free that can only be triggered in active filesystem shutdown situations. To fix this, we need to ensure that buffer existence extends beyond the BLI reference count checks and until the unpin processing is complete. This implies that a buffer pin operation must also take a buffer reference to ensure that the buffer cannot be freed until the buffer unpin processing is complete. Reported-by: yangerkun Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_buf_item.c | 88 ++++++++++++++++++++++++++++++++----------- 1 file changed, 65 insertions(+), 23 deletions(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index df7322ed73fa..023d4e0385dd 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -452,10 +452,18 @@ xfs_buf_item_format( * This is called to pin the buffer associated with the buf log item in memory * so it cannot be written out. * - * We also always take a reference to the buffer log item here so that the bli - * is held while the item is pinned in memory. This means that we can - * unconditionally drop the reference count a transaction holds when the - * transaction is completed. + * We take a reference to the buffer log item here so that the BLI life cycle + * extends at least until the buffer is unpinned via xfs_buf_item_unpin() and + * inserted into the AIL. 
+ * + * We also need to take a reference to the buffer itself as the BLI unpin + * processing requires accessing the buffer after the BLI has dropped the final + * BLI reference. See xfs_buf_item_unpin() for an explanation. + * If unpins race to drop the final BLI reference and only the + * BLI owns a reference to the buffer, then the loser of the race can have the + * buffer fgreed from under it (e.g. on shutdown). Taking a buffer reference per + * pin count ensures the life cycle of the buffer extends for as + * long as we hold the buffer pin reference in xfs_buf_item_unpin(). */ STATIC void xfs_buf_item_pin( @@ -470,13 +478,30 @@ xfs_buf_item_pin( trace_xfs_buf_item_pin(bip); + xfs_buf_hold(bip->bli_buf); atomic_inc(&bip->bli_refcount); atomic_inc(&bip->bli_buf->b_pin_count); } /* - * This is called to unpin the buffer associated with the buf log item which - * was previously pinned with a call to xfs_buf_item_pin(). + * This is called to unpin the buffer associated with the buf log item which was + * previously pinned with a call to xfs_buf_item_pin(). We enter this function + * with a buffer pin count, a buffer reference and a BLI reference. + * + * We must drop the BLI reference before we unpin the buffer because the AIL + * doesn't acquire a BLI reference whenever it accesses it. Therefore if the + * refcount drops to zero, the bli could still be AIL resident and the buffer + * submitted for I/O at any point before we return. This can result in IO + * completion freeing the buffer while we are still trying to access it here. + * This race condition can also occur in shutdown situations where we abort and + * unpin buffers from contexts other that journal IO completion. + * + * Hence we have to hold a buffer reference per pin count to ensure that the + * buffer cannot be freed until we have finished processing the unpin operation. + * The reference is taken in xfs_buf_item_pin(), and we must hold it until we + * are done processing the buffer state. In the case of an abort (remove = + * true) then we re-use the current pin reference as the IO reference we hand + * off to IO failure handling. */ STATIC void xfs_buf_item_unpin( @@ -493,24 +518,18 @@ xfs_buf_item_unpin( trace_xfs_buf_item_unpin(bip); - /* - * Drop the bli ref associated with the pin and grab the hold required - * for the I/O simulation failure in the abort case. We have to do this - * before the pin count drops because the AIL doesn't acquire a bli - * reference. Therefore if the refcount drops to zero, the bli could - * still be AIL resident and the buffer submitted for I/O (and freed on - * completion) at any point before we return. This can be removed once - * the AIL properly holds a reference on the bli. - */ freed = atomic_dec_and_test(&bip->bli_refcount); - if (freed && !stale && remove) - xfs_buf_hold(bp); if (atomic_dec_and_test(&bp->b_pin_count)) wake_up_all(&bp->b_waiters); - /* nothing to do but drop the pin count if the bli is active */ - if (!freed) + /* + * Nothing to do but drop the buffer pin reference if the BLI is + * still active. + */ + if (!freed) { + xfs_buf_rele(bp); return; + } if (stale) { ASSERT(bip->bli_flags & XFS_BLI_STALE); @@ -522,6 +541,15 @@ xfs_buf_item_unpin( trace_xfs_buf_item_unpin_stale(bip); + /* + * The buffer has been locked and referenced since it was marked + * stale so we own both lock and reference exclusively here. We + * do not need the pin reference any more, so drop it now so + * that we only have one reference to drop once item completion + * processing is complete. 
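/*
 * A self-contained sketch of the "one reference per pin" life cycle the
 * comments above describe; obj_hold/obj_rele/obj_pin/obj_unpin are
 * illustrative stand-ins, not the XFS buffer APIs.
 */
#include <stdatomic.h>
#include <stdlib.h>

struct obj {
	atomic_int refcount;
	atomic_int pin_count;
};

static void obj_hold(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

static void obj_rele(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcount, 1) == 1)
		free(o);			/* last reference frees the object */
}

static void obj_pin(struct obj *o)
{
	obj_hold(o);				/* every pin owns one reference */
	atomic_fetch_add(&o->pin_count, 1);
}

static void obj_unpin(struct obj *o)
{
	atomic_fetch_sub(&o->pin_count, 1);
	/* 'o' is still safe to touch here: the pin's reference is not yet dropped */
	obj_rele(o);				/* drop the pin's reference last */
}

int main(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	atomic_store(&o->refcount, 1);		/* creation reference */
	obj_pin(o);
	obj_rele(o);				/* creator lets go; the pin keeps the object alive */
	obj_unpin(o);				/* final reference dropped here, object freed */
	return 0;
}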
+ */ + xfs_buf_rele(bp); + /* * If we get called here because of an IO error, we may or may * not have the item on the AIL. xfs_trans_ail_delete() will @@ -538,16 +566,30 @@ xfs_buf_item_unpin( ASSERT(bp->b_log_item == NULL); } xfs_buf_relse(bp); - } else if (remove) { + return; + } + + if (remove) { /* - * The buffer must be locked and held by the caller to simulate - * an async I/O failure. We acquired the hold for this case - * before the buffer was unpinned. + * We need to simulate an async IO failures here to ensure that + * the correct error completion is run on this buffer. This + * requires a reference to the buffer and for the buffer to be + * locked. We can safely pass ownership of the pin reference to + * the IO to ensure that nothing can free the buffer while we + * wait for the lock and then run the IO failure completion. */ xfs_buf_lock(bp); bp->b_flags |= XBF_ASYNC; xfs_buf_ioend_fail(bp); + return; } + + /* + * BLI has no more active references - it will be moved to the AIL to + * manage the remaining BLI/buffer life cycle. There is nothing left for + * us to do here so drop the pin reference to the buffer. + */ + xfs_buf_rele(bp); } STATIC uint From patchwork Tue Sep 24 18:38:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811107 Received: from mail-pg1-f172.google.com (mail-pg1-f172.google.com [209.85.215.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D77831AC8BE; Tue, 24 Sep 2024 18:39:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203156; cv=none; b=GKBbP1DabHznyb232u/jWlWy5lbYPahctsJfNRqhcTQYbll8mqVVVwxlkryZzpH7rV9MUxdipgxtkDpGO6CsNgTk9LpLvsuFc6DIidBpx/ihdj07+i9kMzRMLEEKR7dgybdiaMmVbQ/gwp+tFdmADiFJ7DuWPQ8QmVRzK9pO1bE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203156; c=relaxed/simple; bh=GSoJ+xfdvN9VGcn4z9mhLFvNIWYJJi4wuZTOf4puMQk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=T24pwlc1QnCUjamLdFjRd3fIYW6AZn6xTNl12q8xSEAR5FTMZIONP0eHHkLfIepo+i2SqCJAGmjYCk4teYpQysGX+FBG1onOgBSZ/dD7lFXU/oEufaSxnrrhTtPCSBIFReuBvfb6IIhifi6kdAO4QNlhoayDK4V0uwUFt2LuHqk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=BRi+flTJ; arc=none smtp.client-ip=209.85.215.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="BRi+flTJ" Received: by mail-pg1-f172.google.com with SMTP id 41be03b00d2f7-656d8b346d2so3680722a12.2; Tue, 24 Sep 2024 11:39:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203154; x=1727807954; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=QG5QFevg4WSS02ZDjik0CgJaj/D5fK0JboTy2CiftOk=; b=BRi+flTJX9f6lkBLEq/nrBGXCFHmVmhyf8mES9uIv0y+UjOnxqn6Fsr3Ofi9zDFA1M 
QWIH0S7plMyJ+ebvbjW5vM2y8CXQAa3PuAPgSbcCvpjCf3lw2u1XCdogbl0nRuXWfJEu Y8JQpkEhD4FMKex+jtOqdk0Lkxd539RmMfoeE6+Raory96dJAWHsJaHTHmRU4glW0kQs 2LRRtz8+Rq9+PNxH0hjRz5OlJrPFKcsmrhCfANpZ98pNpGRuNz4Vj0/RlWPHdSOZtyhX HN7dmhzrtw5RSLGtwPjcWPUURrMFnTOSMvLsBP8qzZ0JDC22Kb3+UWR5njg5ruGKSi3c xbsw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203154; x=1727807954; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QG5QFevg4WSS02ZDjik0CgJaj/D5fK0JboTy2CiftOk=; b=j6Y/szgdGlDuEhymA5xGG2h+OMCga0LiTwPuTR39SqYLD7a8arhP4TeMi2gN8eHBxn km9O7W+G19+7mtqGNSIet1Et5un5Y0jWLIOX7OQr5c5gxBPDD2tUo4miuXW63n9DKgrM siKe+VUY9kf0QBYQjr09MRQONO2d7jKCBc2/ij93X+L+pCHm/WvvLCE1x/0XpMZ3dKFA hjWFnr4iL1YQ6W9VruMZ+h0frFs/g4Zk3M+LzwwyJALINFyuwH82MLvlAmOaNqn8czwP wM7JVrvckryM/2mtZ3JmBFoMe8eiCgPi6P/vuuKKyEqR9ZdYBORdTuFJAmKMS4webZK8 mITA== X-Gm-Message-State: AOJu0YyXBgreB//yh8ia1qs5BZ00HrQKT4kYTJG0lhcCamHUKcjGvwhV ljz0vu1QZpTGUfpf+jTx+3nY45y/Kt9hXdZbrl2CbsmNRwKqtTJCgGak0DaG X-Google-Smtp-Source: AGHT+IFVfWnKbiC5eFPmsAKgaYQ+s01s8fdmjPgzkyb7ZS58wywTxosSDvJbUhFFj0IzySfBzHlKXQ== X-Received: by 2002:a17:90b:4d90:b0:2c8:e888:26a2 with SMTP id 98e67ed59e1d1-2e06ae6200fmr114233a91.13.1727203153937; Tue, 24 Sep 2024 11:39:13 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:13 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner , "Darrick J. Wong" , Christoph Hellwig , Dave Chinner , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 12/26] xfs: defered work could create precommits Date: Tue, 24 Sep 2024 11:38:37 -0700 Message-ID: <20240924183851.1901667-13-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit cb042117488dbf0b3b38b05771639890fada9a52 ] To fix a AGI-AGF-inode cluster buffer deadlock, we need to move inode cluster buffer operations to the ->iop_precommit() method. However, this means that deferred operations can require precommits to be run on the final transaction that the deferred ops pass back to xfs_trans_commit() context. This will be exposed by attribute handling, in that the last changes to the inode in the attr set state machine "disappear" because the precommit operation is not run. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_trans.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 7bd16fbff534..a772f60de4a2 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -970,6 +970,11 @@ __xfs_trans_commit( error = xfs_defer_finish_noroll(&tp); if (error) goto out_unreserve; + + /* Run precommits from final tx in defer chain. 
*/ + error = xfs_trans_run_precommits(tp); + if (error) + goto out_unreserve; } /* From patchwork Tue Sep 24 18:38:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811108 Received: from mail-pj1-f50.google.com (mail-pj1-f50.google.com [209.85.216.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7961B1AD5EB; Tue, 24 Sep 2024 18:39:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.50 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203158; cv=none; b=XGNoH+xOx/Qw6RHJLZJm/5VMgOMfZ9eUjuPdX+YIFa/kR98ykdAyYH4ekiSeh1YXUwgAZ43YhuRghlZHAxmgT/iGyQYaEFohOw4BkoVzK1yD7/IADlDMqUUmqRo2uYpfkwMgQSAxN7dW/0f1H4ypD36tRF0Ior7zdhvdIHju6vA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203158; c=relaxed/simple; bh=YRDazTeOFomx0ThoqGkq6byg6KzCcYzPscLBTVmAQKM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cogLXGl9tTLh/to2cErw7w9Y14DxSooFyy0OqMy8krrEtC/uiqqb8Ck8TScibMWLB/jdU3aYDPn+U0PkSg7jRpmTy0xzVg1STPQgu1NF+Cuy6EodkADDYX0ok0LRigTU5uLx5CXEABVt1fAebU2WJtYBs+IDj7K4OWdhtVVuYAQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=jdKQ9BKH; arc=none smtp.client-ip=209.85.216.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="jdKQ9BKH" Received: by mail-pj1-f50.google.com with SMTP id 98e67ed59e1d1-2d873dc644dso4568386a91.3; Tue, 24 Sep 2024 11:39:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203155; x=1727807955; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=tekeTkwOi3rMYKlOM1MMVKO3o/Nz2Zsv4uC1lEB9km8=; b=jdKQ9BKHgLYrk81VC2vCg+8QiRicNXTkvAHAyzj12WYZzvOP2KGwJ5v/7wR44uhj/4 0rb///Yoap9j/SSrVqsdPkUpa0fpl3RbUSGE3j7OnlAdulpL5J9smd8yme+ne//BO10j nUpHzqYSP81N1gh9B7JIQYzHecUMFBCRDhjj+EibTuIX80UN8X/tNhgM4OjAbRKMhXIt TkerOehI9fmV20d/syKARdW/xfbJnIQEzU0pvTFEADea1BRS9p+b8tbu16bICs8/1QkD +MKq50eIY9S+CPD+ccl/godAC0WTNCZkwlCqWNYyeTFM/kPYCLYc10XsKDSACSrc0KB2 dBRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203155; x=1727807955; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tekeTkwOi3rMYKlOM1MMVKO3o/Nz2Zsv4uC1lEB9km8=; b=gRrrH+T5i6MlSEzCoBzy2XDbnnQ6JYkEPJjif6peQ2cM/2OmOEz7JWvopofl2fD7gc 9UjtnOZXQfNSBaF5T3ihswLAbM+8DKvRZM1roQsbHmPjWJ/qiqbMztTdb+Hy3j6whq5p sV8z70RbkDWLS0jHxSBS5B8ncbQYsgAD62EMlg9YatX3y2QDeqSHT8PfeJoBLbzf24PE COwiHruM6bTW+1XH5t5tUA/RQYN7EIbnxUmiMRiTIcmNzR/N5BEWtcTY7A6+JxiNznc0 0nXGwzWBbDyJTonE4iO5BNf/zfiS0/qDLI4H25PUlsHfi/amzh1J9QnNF+DV7sAG7xYe wJGQ== X-Gm-Message-State: AOJu0Ywa2BfcMzlYTk4CSnqs6KtZ8RCUz1Xs3gSx7Jac9SsgGCLvssEd juz7sZq4bhOSQg+UtP+OMYUsdYnvm0jmE8cdPkI0nhfDdRZ/6B2jIH+YUWHi 
X-Google-Smtp-Source: AGHT+IFFzN29Lk+GdywawMTr1/m1/vtOnaWmvVtbpIvKsNAI9hd70C6tRwMbOz57M740XA7vjk32uA== X-Received: by 2002:a17:90a:514f:b0:2d8:9c0a:8553 with SMTP id 98e67ed59e1d1-2e06ae7be13mr99859a91.21.1727203155220; Tue, 24 Sep 2024 11:39:15 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:14 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner , Christoph Hellwig , "Darrick J. Wong" , Dave Chinner , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 13/26] xfs: fix AGF vs inode cluster buffer deadlock Date: Tue, 24 Sep 2024 11:38:38 -0700 Message-ID: <20240924183851.1901667-14-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 82842fee6e5979ca7e2bf4d839ef890c22ffb7aa ] Lock order in XFS is AGI -> AGF, hence for operations involving inode unlinked list operations we always lock the AGI first. Inode unlinked list operations operate on the inode cluster buffer, so the lock order there is AGI -> inode cluster buffer. For O_TMPFILE operations, this now means the lock order set down in xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the unlinked ops are done before the directory modifications that may allocate space and lock the AGF. Unfortunately, we also now lock the inode cluster buffer when logging an inode so that we can attach the inode to the cluster buffer and pin it in memory. This creates a lock order of AGF -> inode cluster buffer in directory operations as we have to log the inode after we've allocated new space for it. This creates a lock inversion between the AGF and the inode cluster buffer. Because the inode cluster buffer is shared across multiple inodes, the inversion is not specific to individual inodes but can occur when inodes in the same cluster buffer are accessed in different orders. To fix this we need to move all the inode log item cluster buffer interactions to the end of the current transaction. Unfortunately, xfs_trans_log_inode() calls are littered throughout the transactions with no thought to ordering against other items or locking. This makes it difficult to do anything that involves changing the call sites of xfs_trans_log_inode() to change locking orders. However, we do now have a mechanism that allows us to postpone dirty item processing to just before we commit the transaction: the ->iop_precommit method. This will be called after all the modifications are done and high level objects like AGI and AGF buffers have been locked and modified, thereby providing a mechanism that guarantees we don't lock the inode cluster buffer before those high level objects are locked. This change is largely moving the guts of xfs_trans_log_inode() to xfs_inode_item_precommit() and providing an extra flag context in the inode log item to track the dirty state of the inode in the current transaction.
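As a rough illustration of the ->iop_precommit idea relied on here (a simplified, stand-alone sketch with made-up names, not the actual xfs_trans_run_precommits() code): the commit path sorts the transaction's log items by a stable key and then runs each item's precommit hook, so any buffer locking the hooks perform happens last and in a single well-defined order.

#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-in for a log item: a sort key plus a precommit hook. */
struct log_item {
	unsigned long long sort_key;             /* e.g. the inode number */
	int (*precommit)(struct log_item *item); /* may lock the cluster buffer */
};

/* Order items by key so precommit-time locking follows one global order. */
static int cmp_items(const void *a, const void *b)
{
	const struct log_item *l = *(const struct log_item *const *)a;
	const struct log_item *r = *(const struct log_item *const *)b;

	return (l->sort_key > r->sort_key) - (l->sort_key < r->sort_key);
}

/* Run the hooks last, after everything else in the transaction is locked. */
static int run_precommits(struct log_item **items, size_t nr)
{
	size_t i;
	int error;

	qsort(items, nr, sizeof(items[0]), cmp_items);
	for (i = 0; i < nr; i++) {
		if (!items[i]->precommit)
			continue;
		error = items[i]->precommit(items[i]);
		if (error)
			return error;
	}
	return 0;
}

/* Tiny demo: the lower-keyed item's hook runs (and would lock) first. */
static int hook(struct log_item *item) { (void)item; return 0; }

int main(void)
{
	struct log_item a = { .sort_key = 42, .precommit = hook };
	struct log_item b = { .sort_key = 7,  .precommit = hook };
	struct log_item *items[] = { &a, &b };

	return run_precommits(items, 2);
}

In the patch below, xfs_inode_item_sort() supplies that key (the inode number) and xfs_inode_item_precommit() is the hook that attaches and pins the inode cluster buffer.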
This also means we do a lot less repeated work in xfs_trans_log_inode() by only doing it once per transaction when all the work is done. Fixes: 298f7bec503f ("xfs: pin inode backing buffer to the inode log item") Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/libxfs/xfs_log_format.h | 9 +- fs/xfs/libxfs/xfs_trans_inode.c | 113 ++---------------------- fs/xfs/xfs_inode_item.c | 149 ++++++++++++++++++++++++++++++++ fs/xfs/xfs_inode_item.h | 1 + 4 files changed, 166 insertions(+), 106 deletions(-) diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index f13e0809dc63..269573c82808 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -324,7 +324,6 @@ struct xfs_inode_log_format_32 { #define XFS_ILOG_DOWNER 0x200 /* change the data fork owner on replay */ #define XFS_ILOG_AOWNER 0x400 /* change the attr fork owner on replay */ - /* * The timestamps are dirty, but not necessarily anything else in the inode * core. Unlike the other fields above this one must never make it to disk @@ -333,6 +332,14 @@ struct xfs_inode_log_format_32 { */ #define XFS_ILOG_TIMESTAMP 0x4000 +/* + * The version field has been changed, but not necessarily anything else of + * interest. This must never make it to disk - it is used purely to ensure that + * the inode item ->precommit operation can update the fsync flag triggers + * in the inode item correctly. + */ +#define XFS_ILOG_IVERSION 0x8000 + #define XFS_ILOG_NONCORE (XFS_ILOG_DDATA | XFS_ILOG_DEXT | \ XFS_ILOG_DBROOT | XFS_ILOG_DEV | \ XFS_ILOG_ADATA | XFS_ILOG_AEXT | \ diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c index 8b5547073379..cb4796b6e693 100644 --- a/fs/xfs/libxfs/xfs_trans_inode.c +++ b/fs/xfs/libxfs/xfs_trans_inode.c @@ -40,9 +40,8 @@ xfs_trans_ijoin( iip->ili_lock_flags = lock_flags; ASSERT(!xfs_iflags_test(ip, XFS_ISTALE)); - /* - * Get a log_item_desc to point at the new item. - */ + /* Reset the per-tx dirty context and add the item to the tx. */ + iip->ili_dirty_flags = 0; xfs_trans_add_item(tp, &iip->ili_item); } @@ -76,17 +75,10 @@ xfs_trans_ichgtime( /* * This is called to mark the fields indicated in fieldmask as needing to be * logged when the transaction is committed. The inode must already be - * associated with the given transaction. - * - * The values for fieldmask are defined in xfs_inode_item.h. We always log all - * of the core inode if any of it has changed, and we always log all of the - * inline data/extents/b-tree root if any of them has changed. - * - * Grab and pin the cluster buffer associated with this inode to avoid RMW - * cycles at inode writeback time. Avoid the need to add error handling to every - * xfs_trans_log_inode() call by shutting down on read error. This will cause - * transactions to fail and everything to error out, just like if we return a - * read error in a dirty transaction and cancel it. + * associated with the given transaction. All we do here is record where the + * inode was dirtied and mark the transaction and inode log item dirty; + * everything else is done in the ->precommit log item operation after the + * changes in the transaction have been completed. 
*/ void xfs_trans_log_inode( @@ -96,7 +88,6 @@ xfs_trans_log_inode( { struct xfs_inode_log_item *iip = ip->i_itemp; struct inode *inode = VFS_I(ip); - uint iversion_flags = 0; ASSERT(iip); ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); @@ -104,18 +95,6 @@ xfs_trans_log_inode( tp->t_flags |= XFS_TRANS_DIRTY; - /* - * Don't bother with i_lock for the I_DIRTY_TIME check here, as races - * don't matter - we either will need an extra transaction in 24 hours - * to log the timestamps, or will clear already cleared fields in the - * worst case. - */ - if (inode->i_state & I_DIRTY_TIME) { - spin_lock(&inode->i_lock); - inode->i_state &= ~I_DIRTY_TIME; - spin_unlock(&inode->i_lock); - } - /* * First time we log the inode in a transaction, bump the inode change * counter if it is configured for this to occur. While we have the @@ -128,86 +107,10 @@ xfs_trans_log_inode( if (!test_and_set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags)) { if (IS_I_VERSION(inode) && inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE)) - iversion_flags = XFS_ILOG_CORE; - } - - /* - * If we're updating the inode core or the timestamps and it's possible - * to upgrade this inode to bigtime format, do so now. - */ - if ((flags & (XFS_ILOG_CORE | XFS_ILOG_TIMESTAMP)) && - xfs_has_bigtime(ip->i_mount) && - !xfs_inode_has_bigtime(ip)) { - ip->i_diflags2 |= XFS_DIFLAG2_BIGTIME; - flags |= XFS_ILOG_CORE; - } - - /* - * Inode verifiers do not check that the extent size hint is an integer - * multiple of the rt extent size on a directory with both rtinherit - * and extszinherit flags set. If we're logging a directory that is - * misconfigured in this way, clear the hint. - */ - if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && - (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && - (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) { - ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | - XFS_DIFLAG_EXTSZINHERIT); - ip->i_extsize = 0; - flags |= XFS_ILOG_CORE; + flags |= XFS_ILOG_IVERSION; } - /* - * Record the specific change for fdatasync optimisation. This allows - * fdatasync to skip log forces for inodes that are only timestamp - * dirty. - */ - spin_lock(&iip->ili_lock); - iip->ili_fsync_fields |= flags; - - if (!iip->ili_item.li_buf) { - struct xfs_buf *bp; - int error; - - /* - * We hold the ILOCK here, so this inode is not going to be - * flushed while we are here. Further, because there is no - * buffer attached to the item, we know that there is no IO in - * progress, so nothing will clear the ili_fields while we read - * in the buffer. Hence we can safely drop the spin lock and - * read the buffer knowing that the state will not change from - * here. - */ - spin_unlock(&iip->ili_lock); - error = xfs_imap_to_bp(ip->i_mount, tp, &ip->i_imap, &bp); - if (error) { - xfs_force_shutdown(ip->i_mount, SHUTDOWN_META_IO_ERROR); - return; - } - - /* - * We need an explicit buffer reference for the log item but - * don't want the buffer to remain attached to the transaction. - * Hold the buffer but release the transaction reference once - * we've attached the inode log item to the buffer log item - * list. - */ - xfs_buf_hold(bp); - spin_lock(&iip->ili_lock); - iip->ili_item.li_buf = bp; - bp->b_flags |= _XBF_INODES; - list_add_tail(&iip->ili_item.li_bio_list, &bp->b_li_list); - xfs_trans_brelse(tp, bp); - } - - /* - * Always OR in the bits from the ili_last_fields field. This is to - * coordinate with the xfs_iflush() and xfs_buf_inode_iodone() routines - * in the eventual clearing of the ili_fields bits. 
See the big comment - * in xfs_iflush() for an explanation of this coordination mechanism. - */ - iip->ili_fields |= (flags | iip->ili_last_fields | iversion_flags); - spin_unlock(&iip->ili_lock); + iip->ili_dirty_flags |= flags; } int diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index ca2941ab6cbc..91c847a84e10 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -29,6 +29,153 @@ static inline struct xfs_inode_log_item *INODE_ITEM(struct xfs_log_item *lip) return container_of(lip, struct xfs_inode_log_item, ili_item); } +static uint64_t +xfs_inode_item_sort( + struct xfs_log_item *lip) +{ + return INODE_ITEM(lip)->ili_inode->i_ino; +} + +/* + * Prior to finally logging the inode, we have to ensure that all the + * per-modification inode state changes are applied. This includes VFS inode + * state updates, format conversions, verifier state synchronisation and + * ensuring the inode buffer remains in memory whilst the inode is dirty. + * + * We have to be careful when we grab the inode cluster buffer due to lock + * ordering constraints. The unlinked inode modifications (xfs_iunlink_item) + * require AGI -> inode cluster buffer lock order. The inode cluster buffer is + * not locked until ->precommit, so it happens after everything else has been + * modified. + * + * Further, we have AGI -> AGF lock ordering, and with O_TMPFILE handling we + * have AGI -> AGF -> iunlink item -> inode cluster buffer lock order. Hence we + * cannot safely lock the inode cluster buffer in xfs_trans_log_inode() because + * it can be called on a inode (e.g. via bumplink/droplink) before we take the + * AGF lock modifying directory blocks. + * + * Rather than force a complete rework of all the transactions to call + * xfs_trans_log_inode() once and once only at the end of every transaction, we + * move the pinning of the inode cluster buffer to a ->precommit operation. This + * matches how the xfs_iunlink_item locks the inode cluster buffer, and it + * ensures that the inode cluster buffer locking is always done last in a + * transaction. i.e. we ensure the lock order is always AGI -> AGF -> inode + * cluster buffer. + * + * If we return the inode number as the precommit sort key then we'll also + * guarantee that the order all inode cluster buffer locking is the same all the + * inodes and unlink items in the transaction. + */ +static int +xfs_inode_item_precommit( + struct xfs_trans *tp, + struct xfs_log_item *lip) +{ + struct xfs_inode_log_item *iip = INODE_ITEM(lip); + struct xfs_inode *ip = iip->ili_inode; + struct inode *inode = VFS_I(ip); + unsigned int flags = iip->ili_dirty_flags; + + /* + * Don't bother with i_lock for the I_DIRTY_TIME check here, as races + * don't matter - we either will need an extra transaction in 24 hours + * to log the timestamps, or will clear already cleared fields in the + * worst case. + */ + if (inode->i_state & I_DIRTY_TIME) { + spin_lock(&inode->i_lock); + inode->i_state &= ~I_DIRTY_TIME; + spin_unlock(&inode->i_lock); + } + + /* + * If we're updating the inode core or the timestamps and it's possible + * to upgrade this inode to bigtime format, do so now. + */ + if ((flags & (XFS_ILOG_CORE | XFS_ILOG_TIMESTAMP)) && + xfs_has_bigtime(ip->i_mount) && + !xfs_inode_has_bigtime(ip)) { + ip->i_diflags2 |= XFS_DIFLAG2_BIGTIME; + flags |= XFS_ILOG_CORE; + } + + /* + * Inode verifiers do not check that the extent size hint is an integer + * multiple of the rt extent size on a directory with both rtinherit + * and extszinherit flags set. 
If we're logging a directory that is + * misconfigured in this way, clear the hint. + */ + if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && + (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && + (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) { + ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | + XFS_DIFLAG_EXTSZINHERIT); + ip->i_extsize = 0; + flags |= XFS_ILOG_CORE; + } + + /* + * Record the specific change for fdatasync optimisation. This allows + * fdatasync to skip log forces for inodes that are only timestamp + * dirty. Once we've processed the XFS_ILOG_IVERSION flag, convert it + * to XFS_ILOG_CORE so that the actual on-disk dirty tracking + * (ili_fields) correctly tracks that the version has changed. + */ + spin_lock(&iip->ili_lock); + iip->ili_fsync_fields |= (flags & ~XFS_ILOG_IVERSION); + if (flags & XFS_ILOG_IVERSION) + flags = ((flags & ~XFS_ILOG_IVERSION) | XFS_ILOG_CORE); + + if (!iip->ili_item.li_buf) { + struct xfs_buf *bp; + int error; + + /* + * We hold the ILOCK here, so this inode is not going to be + * flushed while we are here. Further, because there is no + * buffer attached to the item, we know that there is no IO in + * progress, so nothing will clear the ili_fields while we read + * in the buffer. Hence we can safely drop the spin lock and + * read the buffer knowing that the state will not change from + * here. + */ + spin_unlock(&iip->ili_lock); + error = xfs_imap_to_bp(ip->i_mount, tp, &ip->i_imap, &bp); + if (error) + return error; + + /* + * We need an explicit buffer reference for the log item but + * don't want the buffer to remain attached to the transaction. + * Hold the buffer but release the transaction reference once + * we've attached the inode log item to the buffer log item + * list. + */ + xfs_buf_hold(bp); + spin_lock(&iip->ili_lock); + iip->ili_item.li_buf = bp; + bp->b_flags |= _XBF_INODES; + list_add_tail(&iip->ili_item.li_bio_list, &bp->b_li_list); + xfs_trans_brelse(tp, bp); + } + + /* + * Always OR in the bits from the ili_last_fields field. This is to + * coordinate with the xfs_iflush() and xfs_buf_inode_iodone() routines + * in the eventual clearing of the ili_fields bits. See the big comment + * in xfs_iflush() for an explanation of this coordination mechanism. + */ + iip->ili_fields |= (flags | iip->ili_last_fields); + spin_unlock(&iip->ili_lock); + + /* + * We are done with the log item transaction dirty state, so clear it so + * that it doesn't pollute future transactions. + */ + iip->ili_dirty_flags = 0; + return 0; +} + /* * The logged size of an inode fork is always the current size of the inode * fork. This means that when an inode fork is relogged, the size of the logged @@ -662,6 +809,8 @@ xfs_inode_item_committing( } static const struct xfs_item_ops xfs_inode_item_ops = { + .iop_sort = xfs_inode_item_sort, + .iop_precommit = xfs_inode_item_precommit, .iop_size = xfs_inode_item_size, .iop_format = xfs_inode_item_format, .iop_pin = xfs_inode_item_pin, diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h index bbd836a44ff0..377e06007804 100644 --- a/fs/xfs/xfs_inode_item.h +++ b/fs/xfs/xfs_inode_item.h @@ -17,6 +17,7 @@ struct xfs_inode_log_item { struct xfs_log_item ili_item; /* common portion */ struct xfs_inode *ili_inode; /* inode ptr */ unsigned short ili_lock_flags; /* inode lock flags */ + unsigned int ili_dirty_flags; /* dirty in current tx */ /* * The ili_lock protects the interactions between the dirty state and * the flush state of the inode log item. 
This allows us to do atomic
From patchwork Tue Sep 24 18:38:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13811109
From: Leah Rumancik
To: stable@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner, "Darrick J. Wong", Dave Chinner, Leah Rumancik, Chandan Babu R
Subject: [PATCH 6.1 14/26] xfs: collect errors from inodegc for unlinked inode recovery
Date: Tue, 24 Sep 2024 11:38:39 -0700
Message-ID: <20240924183851.1901667-15-leah.rumancik@gmail.com>
X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog
In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com>
References: <20240924183851.1901667-1-leah.rumancik@gmail.com>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
MIME-Version: 1.0

From: Dave Chinner

[ Upstream commit d4d12c02bf5f768f1b423c7ae2909c5afdfe0d5f ]

Unlinked list recovery requires errors removing the inode from the unlinked list to be fed back to the main recovery loop. Now that we offload the unlinking to the inodegc work, we don't get errors being fed back when we trip over a corruption that prevents the inode from being removed from the unlinked list.

This means we never clear the corrupt unlinked list bucket, resulting in runtime operations eventually tripping over it and shutting down.

Fix this by collecting inodegc worker errors and feeding them back to the flush caller. This is largely best effort - the only context that really cares is log recovery, and it only flushes a single inode at a time so we don't need complex synchronised handling. Essentially the inodegc workers will capture the first error that occurs and the next flush will gather them and clear them. The flush itself will only report the first gathered error.

In the cases where callers can return errors, propagate the collected inodegc flush error up the error handling chain.

In the case of inode unlinked list recovery, there are several superfluous calls to flush queued unlinked inodes - xlog_recover_iunlink_bucket() guarantees that it has flushed the inodegc and collected errors before it returns. Hence nothing in the calling path needs to run a flush, even when an error is returned.

Signed-off-by: Dave Chinner Reviewed-by: Darrick J.
Wong Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_icache.c | 46 ++++++++++++++++++++++++++++++++-------- fs/xfs/xfs_icache.h | 4 ++-- fs/xfs/xfs_inode.c | 20 ++++++----------- fs/xfs/xfs_inode.h | 2 +- fs/xfs/xfs_log_recover.c | 19 ++++++++--------- fs/xfs/xfs_mount.h | 1 + fs/xfs/xfs_super.c | 1 + fs/xfs/xfs_trans.c | 4 +++- 8 files changed, 60 insertions(+), 37 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index f5568fa54039..4b040740678c 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -454,6 +454,27 @@ xfs_inodegc_queue_all( return ret; } +/* Wait for all queued work and collect errors */ +static int +xfs_inodegc_wait_all( + struct xfs_mount *mp) +{ + int cpu; + int error = 0; + + flush_workqueue(mp->m_inodegc_wq); + for_each_online_cpu(cpu) { + struct xfs_inodegc *gc; + + gc = per_cpu_ptr(mp->m_inodegc, cpu); + if (gc->error && !error) + error = gc->error; + gc->error = 0; + } + + return error; +} + /* * Check the validity of the inode we just found it the cache */ @@ -1490,15 +1511,14 @@ xfs_blockgc_free_space( if (error) return error; - xfs_inodegc_flush(mp); - return 0; + return xfs_inodegc_flush(mp); } /* * Reclaim all the free space that we can by scheduling the background blockgc * and inodegc workers immediately and waiting for them all to clear. */ -void +int xfs_blockgc_flush_all( struct xfs_mount *mp) { @@ -1519,7 +1539,7 @@ xfs_blockgc_flush_all( for_each_perag_tag(mp, agno, pag, XFS_ICI_BLOCKGC_TAG) flush_delayed_work(&pag->pag_blockgc_work); - xfs_inodegc_flush(mp); + return xfs_inodegc_flush(mp); } /* @@ -1841,13 +1861,17 @@ xfs_inodegc_set_reclaimable( * This is the last chance to make changes to an otherwise unreferenced file * before incore reclamation happens. */ -static void +static int xfs_inodegc_inactivate( struct xfs_inode *ip) { + int error; + trace_xfs_inode_inactivating(ip); - xfs_inactive(ip); + error = xfs_inactive(ip); xfs_inodegc_set_reclaimable(ip); + return error; + } void @@ -1879,8 +1903,12 @@ xfs_inodegc_worker( WRITE_ONCE(gc->shrinker_hits, 0); llist_for_each_entry_safe(ip, n, node, i_gclist) { + int error; + xfs_iflags_set(ip, XFS_INACTIVATING); - xfs_inodegc_inactivate(ip); + error = xfs_inodegc_inactivate(ip); + if (error && !gc->error) + gc->error = error; } memalloc_nofs_restore(nofs_flag); @@ -1904,13 +1932,13 @@ xfs_inodegc_push( * Force all currently queued inode inactivation work to run immediately and * wait for the work to finish. 
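As a concrete illustration of the first-error-wins collection scheme described in the changelog above, here is a minimal userspace sketch that models the per-cpu error slots with a plain array; the names and types are illustrative stand-ins, not the kernel code being patched, and the flush_workqueue() synchronisation is deliberately omitted so only the collect-and-clear contract is shown.

    #include <errno.h>
    #include <stdio.h>

    #define NR_WORKERS 4

    static int gc_error[NR_WORKERS];    /* stand-in for the per-cpu gc->error slot */

    /* Worker side: record only the first error seen, drop the rest. */
    static void worker_record_error(int cpu, int error)
    {
        if (error && !gc_error[cpu])
            gc_error[cpu] = error;
    }

    /* Flush side: report the first gathered error and clear every slot. */
    static int flush_collect_errors(void)
    {
        int error = 0;

        for (int cpu = 0; cpu < NR_WORKERS; cpu++) {
            if (gc_error[cpu] && !error)
                error = gc_error[cpu];
            gc_error[cpu] = 0;
        }
        return error;
    }

    int main(void)
    {
        worker_record_error(1, -EIO);       /* first error wins on this worker */
        worker_record_error(1, -ENOSPC);    /* later errors are dropped */
        worker_record_error(3, -ENOSPC);

        printf("first flush: %d\n", flush_collect_errors());   /* reports -EIO */
        printf("second flush: %d\n", flush_collect_errors());  /* 0, slots cleared */
        return 0;
    }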
*/ -void +int xfs_inodegc_flush( struct xfs_mount *mp) { xfs_inodegc_push(mp); trace_xfs_inodegc_flush(mp, __return_address); - flush_workqueue(mp->m_inodegc_wq); + return xfs_inodegc_wait_all(mp); } /* diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h index 6cd180721659..da58984b80d2 100644 --- a/fs/xfs/xfs_icache.h +++ b/fs/xfs/xfs_icache.h @@ -59,7 +59,7 @@ int xfs_blockgc_free_dquots(struct xfs_mount *mp, struct xfs_dquot *udqp, unsigned int iwalk_flags); int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int iwalk_flags); int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_icwalk *icm); -void xfs_blockgc_flush_all(struct xfs_mount *mp); +int xfs_blockgc_flush_all(struct xfs_mount *mp); void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip); void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip); @@ -77,7 +77,7 @@ void xfs_blockgc_start(struct xfs_mount *mp); void xfs_inodegc_worker(struct work_struct *work); void xfs_inodegc_push(struct xfs_mount *mp); -void xfs_inodegc_flush(struct xfs_mount *mp); +int xfs_inodegc_flush(struct xfs_mount *mp); void xfs_inodegc_stop(struct xfs_mount *mp); void xfs_inodegc_start(struct xfs_mount *mp); void xfs_inodegc_cpu_dead(struct xfs_mount *mp, unsigned int cpu); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 54b707787f90..b0b4f6ac2397 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1620,16 +1620,7 @@ xfs_inactive_ifree( */ xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_ICOUNT, -1); - /* - * Just ignore errors at this point. There is nothing we can do except - * to try to keep going. Make sure it's not a silent error. - */ - error = xfs_trans_commit(tp); - if (error) - xfs_notice(mp, "%s: xfs_trans_commit returned error %d", - __func__, error); - - return 0; + return xfs_trans_commit(tp); } /* @@ -1696,12 +1687,12 @@ xfs_inode_needs_inactive( * now be truncated. Also, we clear all of the read-ahead state * kept for the inode here since the file is now closed. */ -void +int xfs_inactive( xfs_inode_t *ip) { struct xfs_mount *mp; - int error; + int error = 0; int truncate = 0; /* @@ -1742,7 +1733,7 @@ xfs_inactive( * reference to the inode at this point anyways. */ if (xfs_can_free_eofblocks(ip, true)) - xfs_free_eofblocks(ip); + error = xfs_free_eofblocks(ip); goto out; } @@ -1779,7 +1770,7 @@ xfs_inactive( /* * Free the inode. */ - xfs_inactive_ifree(ip); + error = xfs_inactive_ifree(ip); out: /* @@ -1787,6 +1778,7 @@ xfs_inactive( * the attached dquots. */ xfs_qm_dqdetach(ip); + return error; } /* diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index fa780f08dc89..225f6f93c2fa 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -470,7 +470,7 @@ enum layout_break_reason { (xfs_has_grpid((pip)->i_mount) || (VFS_I(pip)->i_mode & S_ISGID)) int xfs_release(struct xfs_inode *ip); -void xfs_inactive(struct xfs_inode *ip); +int xfs_inactive(struct xfs_inode *ip); int xfs_lookup(struct xfs_inode *dp, const struct xfs_name *name, struct xfs_inode **ipp, struct xfs_name *ci_name); int xfs_create(struct user_namespace *mnt_userns, diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 05e48523ea40..affe94356ed1 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -2711,7 +2711,9 @@ xlog_recover_iunlink_bucket( * just to flush the inodegc queue and wait for it to * complete. 
*/ - xfs_inodegc_flush(mp); + error = xfs_inodegc_flush(mp); + if (error) + break; } prev_agino = agino; @@ -2719,10 +2721,15 @@ xlog_recover_iunlink_bucket( } if (prev_ip) { + int error2; + ip->i_prev_unlinked = prev_agino; xfs_irele(prev_ip); + + error2 = xfs_inodegc_flush(mp); + if (error2 && !error) + return error2; } - xfs_inodegc_flush(mp); return error; } @@ -2789,7 +2796,6 @@ xlog_recover_iunlink_ag( * bucket and remaining inodes on it unreferenced and * unfreeable. */ - xfs_inodegc_flush(pag->pag_mount); xlog_recover_clear_agi_bucket(pag, bucket); } } @@ -2806,13 +2812,6 @@ xlog_recover_process_iunlinks( for_each_perag(log->l_mp, agno, pag) xlog_recover_iunlink_ag(pag); - - /* - * Flush the pending unlinked inodes to ensure that the inactivations - * are fully completed on disk and the incore inodes can be reclaimed - * before we signal that recovery is complete. - */ - xfs_inodegc_flush(log->l_mp); } STATIC void diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 69ddd5319634..c8e72f0d3965 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -62,6 +62,7 @@ struct xfs_error_cfg { struct xfs_inodegc { struct llist_head list; struct delayed_work work; + int error; /* approximate count of inodes in the list */ unsigned int items; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 12662b169b71..1c143c69da6e 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1089,6 +1089,7 @@ xfs_inodegc_init_percpu( #endif init_llist_head(&gc->list); gc->items = 0; + gc->error = 0; INIT_DELAYED_WORK(&gc->work, xfs_inodegc_worker); } return 0; diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index a772f60de4a2..b45879868f90 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -290,7 +290,9 @@ xfs_trans_alloc( * Do not perform a synchronous scan because callers can hold * other locks. 
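The caller-side half of that contract, as applied to the transaction allocation retry loop changed further below, can be sketched the same way; reserve_space() and flush_garbage() are hypothetical stand-ins for the reservation attempt and xfs_blockgc_flush_all(), and only the error-propagation shape is meant to match the patch.

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    static int free_blocks;                 /* toy model of free space */

    static int reserve_space(int need)      /* stand-in for the reservation attempt */
    {
        return free_blocks >= need ? 0 : -ENOSPC;
    }

    static int flush_garbage(void)          /* stand-in for xfs_blockgc_flush_all() */
    {
        free_blocks += 8;                   /* pretend the flush reclaimed space */
        return 0;                           /* a real flush may fail; that error now matters */
    }

    static int alloc_with_retry(int need)
    {
        bool want_retry = true;
        int error;

    retry:
        error = reserve_space(need);
        if (error == -ENOSPC && want_retry) {
            error = flush_garbage();
            if (error)
                return error;               /* propagate the flush failure, don't retry */
            want_retry = false;
            goto retry;
        }
        return error;
    }

    int main(void)
    {
        printf("%d\n", alloc_with_retry(4));    /* 0 after one flush-and-retry */
        return 0;
    }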
*/ - xfs_blockgc_flush_all(mp); + error = xfs_blockgc_flush_all(mp); + if (error) + return error; want_retry = false; goto retry; }
From patchwork Tue Sep 24 18:38:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13811110
From: Leah Rumancik
To: stable@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Long Li, Long Li, "Darrick J. Wong", Leah Rumancik, Chandan Babu R
Subject: [PATCH 6.1 15/26] xfs: fix ag count overflow during growfs
Date: Tue, 24 Sep 2024 11:38:40 -0700
Message-ID: <20240924183851.1901667-16-leah.rumancik@gmail.com>
X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog
In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com>
References: <20240924183851.1901667-1-leah.rumancik@gmail.com>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
MIME-Version: 1.0

From: Long Li

[ Upstream commit c3b880acadc95d6e019eae5d669e072afda24f1b ]

I found a corruption during growfs:

XFS (loop0): Internal error agbno >= mp->m_sb.sb_agblocks at line 3661 of file fs/xfs/libxfs/xfs_alloc.c. Caller __xfs_free_extent+0x28e/0x3c0
CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257
Call Trace:
 dump_stack_lvl+0x50/0x70
 xfs_corruption_error+0x134/0x150
 __xfs_free_extent+0x2c1/0x3c0
 xfs_ag_extend_space+0x291/0x3e0
 xfs_growfs_data+0xd72/0xe90
 xfs_file_ioctl+0x5f9/0x14a0
 __x64_sys_ioctl+0x13e/0x1c0
 do_syscall_64+0x39/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
XFS (loop0): Corruption detected. Unmount and run xfs_repair
XFS (loop0): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c. Caller xfs_growfs_data+0x691/0xe90
CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257
Call Trace:
 dump_stack_lvl+0x50/0x70
 xfs_error_report+0x93/0xc0
 xfs_trans_cancel+0x2c0/0x350
 xfs_growfs_data+0x691/0xe90
 xfs_file_ioctl+0x5f9/0x14a0
 __x64_sys_ioctl+0x13e/0x1c0
 do_syscall_64+0x39/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f2d86706577

The bug can be reproduced with the following sequence:

 # truncate -s 1073741824 xfs_test.img
 # mkfs.xfs -f -b size=1024 -d agcount=4 xfs_test.img
 # truncate -s 2305843009213693952 xfs_test.img
 # mount -o loop xfs_test.img /mnt/test
 # xfs_growfs -D 1125899907891200 /mnt/test

The root cause is that during growfs, user space passed a very large newblocks value to xfs_growfs_data_private(). Because the current sb_agblocks is small, the new AG count exceeds UINT_MAX; since the AG number type is an unsigned int, the count overflows and nagcount ends up much smaller than the real value. When the AG space is then extended, the delta blocks computed in xfs_resizefs_init_new_ags() are much larger than they should be for the incorrect nagcount, and can themselves exceed UINT_MAX. This causes the corruption that is detected in __xfs_free_extent.

Fix it by growing the filesystem up to the maximum allowed number of AGs instead of returning EINVAL when the new AG count overflows.

Signed-off-by: Long Li Reviewed-by: Darrick J.
Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/libxfs/xfs_fs.h | 2 ++ fs/xfs/xfs_fsops.c | 13 +++++++++---- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 1cfd5bc6520a..9c60ebb328b4 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -257,6 +257,8 @@ typedef struct xfs_fsop_resblks { #define XFS_MAX_AG_BLOCKS (XFS_MAX_AG_BYTES / XFS_MIN_BLOCKSIZE) #define XFS_MAX_CRC_AG_BLOCKS (XFS_MAX_AG_BYTES / XFS_MIN_CRC_BLOCKSIZE) +#define XFS_MAX_AGNUMBER ((xfs_agnumber_t)(NULLAGNUMBER - 1)) + /* keep the maximum size under 2^31 by a small amount */ #define XFS_MAX_LOG_BYTES \ ((2 * 1024 * 1024 * 1024ULL) - XFS_MIN_LOG_BYTES) diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 332da0d7b85c..77b14f788214 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -115,11 +115,16 @@ xfs_growfs_data_private( nb_div = nb; nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks); - nagcount = nb_div + (nb_mod != 0); - if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) { - nagcount--; - nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks; + if (nb_mod && nb_mod >= XFS_MIN_AG_BLOCKS) + nb_div++; + else if (nb_mod) + nb = nb_div * mp->m_sb.sb_agblocks; + + if (nb_div > XFS_MAX_AGNUMBER + 1) { + nb_div = XFS_MAX_AGNUMBER + 1; + nb = nb_div * mp->m_sb.sb_agblocks; } + nagcount = nb_div; delta = nb - mp->m_sb.sb_dblocks; /* * Reject filesystems with a single AG because they are not From patchwork Tue Sep 24 18:38:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811111 Received: from mail-pj1-f44.google.com (mail-pj1-f44.google.com [209.85.216.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CF5831AD9C9; Tue, 24 Sep 2024 18:39:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203161; cv=none; b=PAo92J4s1gMeozsudnlzQl1mjH5/fy0fapH+fH2ByiXZ1hIFBCBaJOdsXpCXdhPpmtExntdrl2NjsCbM+HwHbZokWvjw2JFThUjXrUauFlT+D5C8JCKaGl9R/MAC71I0Xbjol+PZvLFuo0ZA7lamHmP55rSvY5ATN62tZHD9d+w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203161; c=relaxed/simple; bh=+zlQsXTCVYOj1wvZVYOxcabsbl6Tg4oTY01p1ZCtiys=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NA6uN1fJzJ1pLgt4qo2W/ao0a/Aeu1GrGylAo1iaheqbR3LvYw9ThXH0Lt2oC5IzmdyZ06gvfLs1LUjC+qVS0zUnOBcdh+IQ01/pHLpkEUyd+WJ2wG6l8i4mvkSzu2WitblmKWx/93HIeOIqo5YCRT9GK3mczznZ20wpj2xO4vE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=a9jwg5Iw; arc=none smtp.client-ip=209.85.216.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="a9jwg5Iw" Received: by mail-pj1-f44.google.com with SMTP id 98e67ed59e1d1-2db89fb53f9so3849659a91.3; Tue, 24 Sep 2024 11:39:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203159; x=1727807959; 
darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=QXMidJQ1AYXovK5QnNrVPj6wmS+iZCPj6KfD3MCeBIw=; b=a9jwg5Iw3OtR8K/qNaxciouXqRLT4hgBVL5A67lWysSuY3nLfVkyxu9iXmqfwzat0k hCJ9bC8Rk8gBfAvj2qqdVCaUsPWqzkIQPmI/3+CGjWxn2Q03D2C66ftbiMacUTzhOekN TmicckSZgaSizFsQCX74tfVzFdH5zBTeoj3G74kuMW4UdM2bkqeuMNk/Qor9T7KoPVRc 90UzGZMf6elXvXDP9ridNEBDj5Hi9eQqGonB0UHN3tikY4F+sWGLEgiWnt4eGSHuKBjB fWorLFc52STw7m/lg0pIkPVsdUR+jqUOxpRaivgfRb5CGBQUGohOBlpwexQpiI9wY8UF JbpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203159; x=1727807959; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QXMidJQ1AYXovK5QnNrVPj6wmS+iZCPj6KfD3MCeBIw=; b=MK0FkODR+yrqZtJt2ld6nVmurhycrNkThzlBmS7wtAhmivOKWqkTxnQ3e5cIEzjIqF 7iJfn0hK2mfKu3+G/48hAbMtlYgdlc4eoTVjCpl2+RmT6XGCrPw9FYN7w7KACHyU1mSL FrnZXvFrcgKbc26rzvmAUDUN/yJEO3yH1l2GhPvdnhKQDpk2ovT32027+rHUhPe0nflJ 2Hg50H/3DYJjlN/zNNe81oZ7fuJwiDs+qQAJI4zvY5TIX6bu3tMXtLmiG6nFxl4q7t6C kR5tkGNelwidcIw3Et5wNvkKBQcHEigllXYNiLhqRjzL4mRYueXbHI6JZBvl5pnNRpz9 ZbPw== X-Gm-Message-State: AOJu0YxRqtthFuiysX4i0/Lk6wfJTdL9rl8OX0oyXUt71nhFHLLekyGp vDe58Uf4Ik+GFgn/Hb7TUHTmhl0AjBsBUFrFV1IWDoc7a+J3UBYkboHnopWq X-Google-Smtp-Source: AGHT+IG0agEHkPzG4NeGNcKYOf6Wo8vQm/YgPbfVbPqIvQnht6QVnTkRqKAJbrxLbNluUIiMf3wdiQ== X-Received: by 2002:a17:90a:3883:b0:2bd:7e38:798e with SMTP id 98e67ed59e1d1-2e06afbddd6mr81616a91.28.1727203159018; Tue, 24 Sep 2024 11:39:19 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:18 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner , syzbot+6ae213503fb12e87934f@syzkaller.appspotmail.com, "Darrick J. Wong" , Dave Chinner , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 16/26] xfs: remove WARN when dquot cache insertion fails Date: Tue, 24 Sep 2024 11:38:41 -0700 Message-ID: <20240924183851.1901667-17-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 4b827b3f305d1fcf837265f1e12acc22ee84327c ] It just creates unnecessary bot noise these days. Reported-by: syzbot+6ae213503fb12e87934f@syzkaller.appspotmail.com Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_dquot.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index 8fb90da89787..7f071757f278 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -798,7 +798,6 @@ xfs_qm_dqget_cache_insert( error = radix_tree_insert(tree, id, dqp); if (unlikely(error)) { /* Duplicate found! Caller must try again. 
*/ - WARN_ON(error != -EEXIST); mutex_unlock(&qi->qi_tree_lock); trace_xfs_dqget_dup(dqp); return error; From patchwork Tue Sep 24 18:38:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811112 Received: from mail-pj1-f41.google.com (mail-pj1-f41.google.com [209.85.216.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 377741ACE10; Tue, 24 Sep 2024 18:39:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203162; cv=none; b=jYFY5EweVMUIH5BO9puP1ioKbojHGbA+0yQ+rNIbwr9rvKsfXwhNI/+aiTfH1N+7Ib6PMpvSUWVg7UXAUX+gZRfzzq1s1FE8wu+yXT/O3TzbbpL9icBVEIdee9CgRMB4hWEeoeD1/e6RXX3oR6vy5A985NJv5pU6qb3Wt9xWvpg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203162; c=relaxed/simple; bh=BqhpH8jm1bo0e8mrsvc10FfNERVGh7QMOiC/BpJ3VxU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LSCdlR/XG8tKkv9668aXsBGPnGuecZlcHCg3q7lsIYxydKFcrqip+YP8yOy14xADAyAJdeUOS44bxHDlKqZ9ZHOKtiV+Zlp1m67VIzl0Mx3YnZDoaHR7/o2wcr9u680V6T1GFEnpjDb/D2VXqdXZ/doApfjr6AmIG0yfgBwmTwc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gpX4M3pn; arc=none smtp.client-ip=209.85.216.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gpX4M3pn" Received: by mail-pj1-f41.google.com with SMTP id 98e67ed59e1d1-2d89dbb60bdso4115754a91.1; Tue, 24 Sep 2024 11:39:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203160; x=1727807960; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=bnPsE9L919nUKOYw2LureM0PQ/hRlQrq9CZOkfmsz9U=; b=gpX4M3pn1XjESSpAB1RzrCh3eoSdHA6ckgGq0zDKuMqcx2OQdJiCfhxRivck9RkL8X vj7biJupnikI7G4/tbpLVkb6JMnpnAsK4KuCy7FfbYCKS7SmUaEBV7gHMPlZNXKC514A QsajZrPWw0qp3cL1bPNy51k7EeKg/QrT2M3TMflIHsC3RBx84o8F1Q56LpZ7hI0eD1n6 jFrtOpOqneir9Mu/PeFkcvgHrE+AISRmrGux+iqeUNkH1IPmM73AhqV0LBX5SQGPFtxI MCxrH0BpdbeOyLR8W7e5ySabZeeF8cReJraQTL6GFiyRcI8WrZs5uXsMzjgrGDHfSRZk /Emw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203160; x=1727807960; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=bnPsE9L919nUKOYw2LureM0PQ/hRlQrq9CZOkfmsz9U=; b=Ly1QRCojdSJM/+XwUTvCIqMOA81Ov6sXOErSZ6y36srfpG4sEb70+HQkJMhOFnPnqO F82DdMJIdNK74YR4fS44C8wv97i39+dn4oA23wGori8IlJ5bP49rM2VmWqiDilzs4S9K +tdDsQlvKTx65sR5O9jVcOCH1i3fbWWW872j5b8yafPLr4oLxWVjatxYsmACsLapFKbD tFvuz3u5QbAaSzz7U9xq5oaT7j6hIFylQZLWRJ+qWkRGgVgHR9fZXjZ0EZRRCdrtVGeV VDPK73VhRPlKQ80XoTiVZMlfEDPOPBveqfehWt2ZowfjJiAYF53EvKR9wJlM5uNNc1C7 twJQ== X-Gm-Message-State: AOJu0YzDb52Zf+KyXbqu0TwaKUMv+TA+61Nop+1fYkf7MRMc+fO6TZ+m 
9rnknhsOF4ImjvV3XjePoiIiGA18EvLNfc1w680u0zcCWWeGY2xyvh6YhDcW X-Google-Smtp-Source: AGHT+IEVLSzgZ2uSJoF9TZc0xpP0R/K89j7MGUD5pGweC5fVKWgKV9Z1v/JWJjC7E0+Zkz2ov4Ymhw== X-Received: by 2002:a17:90a:e183:b0:2d8:b205:2345 with SMTP id 98e67ed59e1d1-2e06ae7a941mr119347a91.23.1727203160250; Tue, 24 Sep 2024 11:39:20 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:19 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Shiyang Ruan , "Darrick J. Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 17/26] xfs: fix the calculation for "end" and "length" Date: Tue, 24 Sep 2024 11:38:42 -0700 Message-ID: <20240924183851.1901667-18-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Shiyang Ruan [ Upstream commit 5cf32f63b0f4c520460c1a5dd915dc4f09085f29 ] The value of "end" should be "start + length - 1". Signed-off-by: Shiyang Ruan Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_notify_failure.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c index c4078d0ec108..4a9bbd3fe120 100644 --- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -114,7 +114,8 @@ xfs_dax_notify_ddev_failure( int error = 0; xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, daddr); xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, fsbno); - xfs_fsblock_t end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen); + xfs_fsblock_t end_fsbno = XFS_DADDR_TO_FSB(mp, + daddr + bblen - 1); xfs_agnumber_t end_agno = XFS_FSB_TO_AGNO(mp, end_fsbno); error = xfs_trans_alloc_empty(mp, &tp); @@ -210,7 +211,7 @@ xfs_dax_notify_failure( ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1; /* Ignore the range out of filesystem area */ - if (offset + len < ddev_start) + if (offset + len - 1 < ddev_start) return -ENXIO; if (offset > ddev_end) return -ENXIO; @@ -222,8 +223,8 @@ xfs_dax_notify_failure( len -= ddev_start - offset; offset = 0; } - if (offset + len > ddev_end) - len -= ddev_end - offset; + if (offset + len - 1 > ddev_end) + len = ddev_end - offset + 1; return xfs_dax_notify_ddev_failure(mp, BTOBB(offset), BTOBB(len), mf_flags); From patchwork Tue Sep 24 18:38:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811113 Received: from mail-pf1-f182.google.com (mail-pf1-f182.google.com [209.85.210.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AEDE51AD9C9; Tue, 24 Sep 2024 18:39:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203164; cv=none; 
b=l2wA/j8CXrDbGyZeR4lBzfpH2Rsfr5F7patvNI+mLPvu+OQs84g0dYhZs7UkxLKrTXlVqcnDHz2JUY0c+AIJiE/cGD+hvPFijlkjKlq6gZbEkpsgG8pT/5Jlk6aG8lVxa/PLt2Vqt361jdp2SV4r0/rAP+jU9pLKyxLeRB/1F6s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203164; c=relaxed/simple; bh=pFrUX4LHWjx+Zz4VAmYop1W3UHAZUAJMRacVzDjvOec=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=meEyjYUkxg6gq78A+eUd3/RMiKVb1DqMkAmPu9efGLlkXt+ZsP/foWuTzpfp674ENyHESfQDyWQaM5xYXhxB86hcXX0o3urLogzHe719qSr5s+gOQJ6hK+U0vrqfduOLNlMkSJ7ZtGAV1zcjQlQxZDVkbJwFY3TfrUUaDI90Vrs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=KtBJ2QuV; arc=none smtp.client-ip=209.85.210.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="KtBJ2QuV" Received: by mail-pf1-f182.google.com with SMTP id d2e1a72fcca58-71b0722f221so451749b3a.3; Tue, 24 Sep 2024 11:39:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203162; x=1727807962; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=6UkqPyulwqtKNbTTvS0V1pYpEDlaZdg76rYzAqGAq4Y=; b=KtBJ2QuVDkSmoISykVqwVqeTK4QbhXZ9LdAtAiu5GleSqqyqsswigvnvV7228f8plB u6SmcqS4YTh8b6/PATbOKbQK/qv9LGJNFgxtuybM0h/3h4abaPCgi4x/TFGwVbx8aBJA ZdLdSThpAPqSmmE3QWewewpQHWD7GPshv1PXIryfDwFMto9yR+jDh5wQwm46csfkWjqJ Py7sqBlIl9VsgbA1041Zunc47dY+5NjFH8HUsJY71cZnF+K+uFUBNz3HWPJsq2owfSex YsudzGjNODc6SIMYZJGOhtSa28E4K4GEQfRDeUNCm7lLuUjywjEKjPSZitWbKWV6kXAD 4X1w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203162; x=1727807962; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6UkqPyulwqtKNbTTvS0V1pYpEDlaZdg76rYzAqGAq4Y=; b=WWWyiN847P+aWyLbpEGp0icozaQTlZri312z6yAfXEAWPPnuT9WnkkFuz98CNduX+J av1kUd65vornEVJMZz37sCOBR531ZkwnEwsQjehmhU0GW9Z1C1iC1x50mlpSy9uMPIPb zsxZzveNfxEO2eN9bM6EV8mmZn39tJU5/Kzemu+pcEZWLSmxNlFv9gbARjy71Hq+zeDA ygO/+c/U5E2B+TMIpkdYCfwhTODQShBveojgYILKodaDM6hVjdLyYVoIw5h8VC7RQOuy J+ErEe0uiYEnRdGRq/CcM6d3jPGAjMkm6fryL4Hx5aj2DYME7b41qNGw0cfgStDi9PQ0 Amog== X-Gm-Message-State: AOJu0YzxuOUZHpcxTB39T0HEoaqVZD/EiLALvgVRln0wsD5LLNaBN2HA YuMPOdPrYae6oVzBDBGJ7ehoIs+uGXat2h9HZ7axdvHpqRQqlu1q0Ut/SOE2 X-Google-Smtp-Source: AGHT+IHab9K6X7mdTZHgRs6zFjP1xCuGWsigZWi8qeF0s1NNNG7b4yJPZquS7IedzIvO32NmcEkX8A== X-Received: by 2002:a05:6a21:1693:b0:1cf:4458:8b0d with SMTP id adf61e73a8af0-1d4d4aa962bmr133265637.11.1727203161809; Tue, 24 Sep 2024 11:39:21 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:21 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, "Darrick J. 
Wong" , shrikanth hegde , Ritesh Harjani , Dave Chinner , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 18/26] xfs: load uncached unlinked inodes into memory on demand Date: Tue, 24 Sep 2024 11:38:43 -0700 Message-ID: <20240924183851.1901667-19-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 68b957f64fca1930164bfc6d6d379acdccd547d7 ] shrikanth hegde reports that filesystems fail shortly after mount with the following failure: WARNING: CPU: 56 PID: 12450 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs] This of course is the WARN_ON_ONCE in xfs_iunlink_lookup: ip = radix_tree_lookup(&pag->pag_ici_root, agino); if (WARN_ON_ONCE(!ip || !ip->i_ino)) { ... } From diagnostic data collected by the bug reporters, it would appear that we cleanly mounted a filesystem that contained unlinked inodes. Unlinked inodes are only processed as a final step of log recovery, which means that clean mounts do not process the unlinked list at all. Prior to the introduction of the incore unlinked lists, this wasn't a problem because the unlink code would (very expensively) traverse the entire ondisk metadata iunlink chain to keep things up to date. However, the incore unlinked list code complains when it realizes that it is out of sync with the ondisk metadata and shuts down the fs, which is bad. Ritesh proposed to solve this problem by unconditionally parsing the unlinked lists at mount time, but this imposes a mount time cost for every filesystem to catch something that should be very infrequent. Instead, let's target the places where we can encounter a next_unlinked pointer that refers to an inode that is not in cache, and load it into cache. Note: This patch does not address the problem of iget loading an inode from the middle of the iunlink list and needing to set i_prev_unlinked correctly. Reported-by: shrikanth hegde Triaged-by: Ritesh Harjani Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_inode.c | 80 +++++++++++++++++++++++++++++++++++++++++++--- fs/xfs/xfs_trace.h | 25 +++++++++++++++ 2 files changed, 100 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index b0b4f6ac2397..4e73dd4a4d82 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1829,12 +1829,17 @@ xfs_iunlink_lookup( rcu_read_lock(); ip = radix_tree_lookup(&pag->pag_ici_root, agino); + if (!ip) { + /* Caller can handle inode not being in memory. */ + rcu_read_unlock(); + return NULL; + } /* - * Inode not in memory or in RCU freeing limbo should not happen. - * Warn about this and let the caller handle the failure. + * Inode in RCU freeing limbo should not happen. Warn about this and + * let the caller handle the failure. */ - if (WARN_ON_ONCE(!ip || !ip->i_ino)) { + if (WARN_ON_ONCE(!ip->i_ino)) { rcu_read_unlock(); return NULL; } @@ -1843,7 +1848,10 @@ xfs_iunlink_lookup( return ip; } -/* Update the prev pointer of the next agino. */ +/* + * Update the prev pointer of the next agino. Returns -ENOLINK if the inode + * is not in cache. 
+ */ static int xfs_iunlink_update_backref( struct xfs_perag *pag, @@ -1858,7 +1866,8 @@ xfs_iunlink_update_backref( ip = xfs_iunlink_lookup(pag, next_agino); if (!ip) - return -EFSCORRUPTED; + return -ENOLINK; + ip->i_prev_unlinked = prev_agino; return 0; } @@ -1902,6 +1911,62 @@ xfs_iunlink_update_bucket( return 0; } +/* + * Load the inode @next_agino into the cache and set its prev_unlinked pointer + * to @prev_agino. Caller must hold the AGI to synchronize with other changes + * to the unlinked list. + */ +STATIC int +xfs_iunlink_reload_next( + struct xfs_trans *tp, + struct xfs_buf *agibp, + xfs_agino_t prev_agino, + xfs_agino_t next_agino) +{ + struct xfs_perag *pag = agibp->b_pag; + struct xfs_mount *mp = pag->pag_mount; + struct xfs_inode *next_ip = NULL; + xfs_ino_t ino; + int error; + + ASSERT(next_agino != NULLAGINO); + +#ifdef DEBUG + rcu_read_lock(); + next_ip = radix_tree_lookup(&pag->pag_ici_root, next_agino); + ASSERT(next_ip == NULL); + rcu_read_unlock(); +#endif + + xfs_info_ratelimited(mp, + "Found unrecovered unlinked inode 0x%x in AG 0x%x. Initiating recovery.", + next_agino, pag->pag_agno); + + /* + * Use an untrusted lookup just to be cautious in case the AGI has been + * corrupted and now points at a free inode. That shouldn't happen, + * but we'd rather shut down now since we're already running in a weird + * situation. + */ + ino = XFS_AGINO_TO_INO(mp, pag->pag_agno, next_agino); + error = xfs_iget(mp, tp, ino, XFS_IGET_UNTRUSTED, 0, &next_ip); + if (error) + return error; + + /* If this is not an unlinked inode, something is very wrong. */ + if (VFS_I(next_ip)->i_nlink != 0) { + error = -EFSCORRUPTED; + goto rele; + } + + next_ip->i_prev_unlinked = prev_agino; + trace_xfs_iunlink_reload_next(next_ip); +rele: + ASSERT(!(VFS_I(next_ip)->i_state & I_DONTCACHE)); + xfs_irele(next_ip); + return error; +} + static int xfs_iunlink_insert_inode( struct xfs_trans *tp, @@ -1933,6 +1998,8 @@ xfs_iunlink_insert_inode( * inode. 
*/ error = xfs_iunlink_update_backref(pag, agino, next_agino); + if (error == -ENOLINK) + error = xfs_iunlink_reload_next(tp, agibp, agino, next_agino); if (error) return error; @@ -2027,6 +2094,9 @@ xfs_iunlink_remove_inode( */ error = xfs_iunlink_update_backref(pag, ip->i_prev_unlinked, ip->i_next_unlinked); + if (error == -ENOLINK) + error = xfs_iunlink_reload_next(tp, agibp, ip->i_prev_unlinked, + ip->i_next_unlinked); if (error) return error; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 5587108d5678..d713e10dff8a 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3679,6 +3679,31 @@ TRACE_EVENT(xfs_iunlink_update_dinode, __entry->new_ptr) ); +TRACE_EVENT(xfs_iunlink_reload_next, + TP_PROTO(struct xfs_inode *ip), + TP_ARGS(ip), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_agnumber_t, agno) + __field(xfs_agino_t, agino) + __field(xfs_agino_t, prev_agino) + __field(xfs_agino_t, next_agino) + ), + TP_fast_assign( + __entry->dev = ip->i_mount->m_super->s_dev; + __entry->agno = XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino); + __entry->agino = XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino); + __entry->prev_agino = ip->i_prev_unlinked; + __entry->next_agino = ip->i_next_unlinked; + ), + TP_printk("dev %d:%d agno 0x%x agino 0x%x prev_unlinked 0x%x next_unlinked 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->agno, + __entry->agino, + __entry->prev_agino, + __entry->next_agino) +); + DECLARE_EVENT_CLASS(xfs_ag_inode_class, TP_PROTO(struct xfs_inode *ip), TP_ARGS(ip), From patchwork Tue Sep 24 18:38:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811114 Received: from mail-pj1-f53.google.com (mail-pj1-f53.google.com [209.85.216.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EAA131ACE10; Tue, 24 Sep 2024 18:39:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.53 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203165; cv=none; b=ZQcPJplTWxi3qxwCce7tHffX7fHc2zwCU4alJLWMC3ouFtyhbG90FdVcjABi5wlbuoQ7o5wO2jkXPWbxVnMfFhyT+ELv+ICOFwo0SowQN7bqxhxUhnHBs9bwNwSFeEuSwJpfGb0OtOvSyYOKQso+XgXw3WUUHG8r58OohKqpHEM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203165; c=relaxed/simple; bh=tcENrtcMJxRF70LM2MBJbIsWcpwzbv86B1hK9uz8Mn8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LhZPDR3TDNGRk3gMGz7LgIK8oFhMycv81Gg5Zo20zkV6WEA8J3u6Ly/rM9BE9Avd65I1145iTpGjKvcYeDgZz3rLvsInTiCC/KQSjj7taoRH9LIm5WbYoBcDXog8aQTVKbO7LfRW5PGhEICCLI/vZTdrleaBwgb56slI0peQFr4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=NGdtx7Ix; arc=none smtp.client-ip=209.85.216.53 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="NGdtx7Ix" Received: by mail-pj1-f53.google.com with SMTP id 98e67ed59e1d1-2d877e9054eso4108113a91.3; Tue, 24 Sep 2024 11:39:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; 
s=20230601; t=1727203163; x=1727807963; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=o6EXbfRZ2aZwFPjG7rvKiOifRZyraOAGhm74E6z7psA=; b=NGdtx7IxOovDCL53/kFpANWd/M3nkIaD5X+AGTaLLSQCWT28c1KEnX6vy8GDKFp1Ku fbgwavbAVmixYNk69EVHBbGEvUsJNDFCxhz4idvtZeNN5t8hq3AS/6JttPNdUv2KWg/U SW8ESn6pc4XxvMbM4VRF5ci7rSG1PhQeJbbbcqjnQr/FtNu8RJqfRUfnYoe75Ny9eOxm GDj35YCmxb6TysAMV8wd9jHd/9Xfp3k5Fi1WfaBh4DIhf39Qa+prYxDMbGvHwzwvDajo M3v3kXZiSsP2Wg8n/XekRNQT7mJ60hGQz9H6n+ygGPeJdvNY+eop9eX5jkvpfPielmMW hdaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203163; x=1727807963; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=o6EXbfRZ2aZwFPjG7rvKiOifRZyraOAGhm74E6z7psA=; b=vDoxsYAV8lSO17TjFqEtKdF90a5dqN3RHDb9YuylSTFjVyd0L2zoxa94+gSv8krKmN jW6oYUMaVXaOZ53sP55PQ1U927semXkWPDqSxJB4i0gke8Vq0QamGONk8tQtgRni41pb z4FJuS+jz64UvoIPOFlSxbAK8G5PWbCnLr8v4N7rP7M61YfxnH2WJK3viDgLQ7eRXpDg 7+ZeIkfQb3nzEvD4tX4q7P5+I6/ipKP7KkPpbz5oZigi2uCG7yNlGcfl5hUAAls06Vgp prnbfuCt+1Ao2ZRltMFM01klOoCfK1DFkPi08EUDf5/NsD8y9jj6vCIiRNwsQDCbHfkr vmKA== X-Gm-Message-State: AOJu0Yz3gUYMnZfqg3j8mN7eILmBe1lWLA5EXNZkVIOfstqcKHaOp2kb IKTF1WZBxfed4k0WYThasS/asp7dt836rHgyLPsBWiVhdRqSMlbOE57mj7vO X-Google-Smtp-Source: AGHT+IF19QN7JkA/TOXDRDdHHq1ZcXzQ9EGx2SStW1gMtkBdiaFCiOmkqPYOB8mFXJ7iUtP8jWhIEg== X-Received: by 2002:a17:90b:384b:b0:2d1:bf48:e767 with SMTP id 98e67ed59e1d1-2e06afc0ff2mr87497a91.29.1727203163083; Tue, 24 Sep 2024 11:39:23 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:22 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, "Darrick J. Wong" , syzbot+c103d3808a0de5faaf80@syzkaller.appspotmail.com, Dave Chinner , Dave Chinner , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 19/26] xfs: fix negative array access in xfs_getbmap Date: Tue, 24 Sep 2024 11:38:44 -0700 Message-ID: <20240924183851.1901667-20-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 1bba82fe1afac69c85c1f5ea137c8e73de3c8032 ] In commit 8ee81ed581ff, Ye Bin complained about an ASSERT in the bmapx code that trips if we encounter a delalloc extent after flushing the pagecache to disk. The ioctl code does not hold MMAPLOCK so it's entirely possible that a racing write page fault can create a delalloc extent after the file has been flushed. The proposed solution was to replace the assertion with an early return that avoids filling out the bmap recordset with a delalloc entry if the caller didn't ask for it. At the time, I recall thinking that the forward logic sounded ok, but felt hesitant because I suspected that changing this code would cause something /else/ to burst loose due to some other subtlety. syzbot of course found that subtlety. 
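The hazard involved here, flagging the last emitted record without first checking that any record was emitted at all, can be shown in a few lines of userspace C; the record structure and flag bit below are simplified stand-ins for the bmapx output records rather than the kernel definitions.

    #include <stdio.h>

    #define OF_LAST 0x4     /* illustrative flag bit */

    struct toy_rec {
        unsigned int oflags;
    };

    static void mark_last(struct toy_rec *out, int entries)
    {
        /* Without the entries check this writes to out[-1] when nothing was emitted. */
        if (entries > 0)
            out[entries - 1].oflags |= OF_LAST;
    }

    int main(void)
    {
        struct toy_rec out[4] = { { 0 } };

        mark_last(out, 0);  /* no records emitted: must be a no-op */
        mark_last(out, 3);  /* normal case: flag the third record */
        printf("out[2].oflags = %#x\n", out[2].oflags);
        return 0;
    }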
If all the extent mappings found after the flush are delalloc mappings, we'll reach the end of the data fork without ever incrementing bmv->bmv_entries. This is new, since before we'd have emitted the delalloc mappings even though the caller didn't ask for them. Once we reach the end, we'll try to set BMV_OF_LAST on the -1st entry (because bmv_entries is zero) and go corrupt something else in memory. Yay. I really dislike all these stupid patches that fiddle around with debug code and break things that otherwise worked well enough. Nobody was complaining that calling XFS_IOC_BMAPX without BMV_IF_DELALLOC would return BMV_OF_DELALLOC records, and now we've gone from "weird behavior that nobody cared about" to "bad behavior that must be addressed immediately". Maybe I'll just ignore anything from Huawei from now on for my own sake. Reported-by: syzbot+c103d3808a0de5faaf80@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-xfs/20230412024907.GP360889@frogsfrogsfrogs/ Fixes: 8ee81ed581ff ("xfs: fix BUG_ON in xfs_getbmap()") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_bmap_util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 351087cde27e..ce8e17ab5434 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -558,7 +558,9 @@ xfs_getbmap( if (!xfs_iext_next_extent(ifp, &icur, &got)) { xfs_fileoff_t end = XFS_B_TO_FSB(mp, XFS_ISIZE(ip)); - out[bmv->bmv_entries - 1].bmv_oflags |= BMV_OF_LAST; + if (bmv->bmv_entries > 0) + out[bmv->bmv_entries - 1].bmv_oflags |= + BMV_OF_LAST; if (whichfork != XFS_ATTR_FORK && bno < end && !xfs_getbmap_full(bmv)) { From patchwork Tue Sep 24 18:38:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811115 Received: from mail-pl1-f171.google.com (mail-pl1-f171.google.com [209.85.214.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8096D1AD9C9; Tue, 24 Sep 2024 18:39:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203167; cv=none; b=h5LWsPc81kPiQUWYhCVKQ2ZoapyOksKA4rQkCmgp5LRHrc0Bv2BboosVrb8v1k5B9TjfoYgt+rG2dZyZBuwlE4LICQis5tB8swlyxIu93xfxYmpmlWj0n7rdW6FF7+YTJiQfdsmwsirh0+b0cNPtjM4yq0jxF9B3m+UB6q1rtCQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203167; c=relaxed/simple; bh=sjSUKGUwTMb14PWlOL3b1BP8YgApVgVH6m0JOZLXlMw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uz6oPK7XLYUjT50tTD2rkyldGmWgGZaNHl7rry5EJ5o0C7NmddSWvYUapweQk9TFexzK+3T0S5+xE5vARyENMr05cy10zTkO7T5yi8r6QkvA3pBZ4RagquOrqpKthQDIzvhfrSbprYzYgcEYcZIQB8SRCAk3+qD2Q4oBq4d80Z4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=VFa/ERGL; arc=none smtp.client-ip=209.85.214.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=gmail.com header.i=@gmail.com header.b="VFa/ERGL" Received: by mail-pl1-f171.google.com with SMTP id d9443c01a7336-205722ba00cso49568335ad.0; Tue, 24 Sep 2024 11:39:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203165; x=1727807965; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=qgj9ZUQAEWKSnyua1fCPRFwXSsrGMjRXKQ10U9x+mqE=; b=VFa/ERGLp/O9QXjSNzdoNkkY3/nEBbgTcl78rsVyL2tEu/F0iPIAGxVdNylg8cf+CD UgURAbIKF6MCgDzJOO1s33iaw5gknLshj8oY1OOIEPBU3rnLYpYUgWekDAoQTE9CulBE E6b3hu+/OykvjQoy4brXTgw0ml1WuANJiWs1eT8RDyTtAu0dhr2ifRmz2+JXOjIlZ4Y8 mTyl4kXDud0A9CXqKpaKxJ9nNKW6l2BxExmRzXlbU43YqIqW1zOSa5Mkoh69fi2TYm78 Rj9rrNqIcyd/ix8OuBJu0SqNvKBLutf/yStAZ18LrpA/aAaGfx0RSkDjo3avP7tjUTap Tdcw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203165; x=1727807965; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=qgj9ZUQAEWKSnyua1fCPRFwXSsrGMjRXKQ10U9x+mqE=; b=aqTd/2jK5dgaieGZReQ9teMubPQmURQtjMJni6YV/fXmiioXgo+WPGcLOBap9Ef0bo BteU/ciwqFMU7ufc2qgkIxIaXNuBd6CXzf+yfMgqhLb3SAza7cQgGVORGX61fzvMq7Ip e/5hTOxuvGrel9Wc883eEkF2QKkBqbjQAMoo7AwN0b/Grqw5aYKRY51WY0vE5o6wLGW5 Z9rFo9ISgvYJ6iUZbPvtpm6QrBWrxASKU5wYeSXGwT3z77joUApotCsr3A9gxN8hwvPc kI3AirwZtcXJ6ZFBPYreByA7zRzq10Lil4+6dekn9fGjmv/uWyZUYi2BxY9WDnQMwpcu eWbQ== X-Gm-Message-State: AOJu0YwAiFMyVRq8nTvsb+b2DAimyn8TU6nd2+VhrzLgP4dlRcFH9WCu bbpTxDYsLUuBgtr2JZcHqHSIQtSSTO2RntayUNi6CGvPQXd32EYmShhospWt X-Google-Smtp-Source: AGHT+IG54w6mdol1psy5pWYEP0QRwKQ2aWkJUR+cxS4OXym9oCc1J8zLLxT48MpwQKlvPGzo1ymIcw== X-Received: by 2002:a17:90b:4a52:b0:2d3:ce99:44b6 with SMTP id 98e67ed59e1d1-2e06afbbdbfmr82388a91.29.1727203164413; Tue, 24 Sep 2024 11:39:24 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:23 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Dave Chinner , Luis Chamberlain , Christoph Hellwig , "Darrick J. Wong" , Chandan Babu R , Leah Rumancik Subject: [PATCH 6.1 20/26] xfs: fix unlink vs cluster buffer instantiation race Date: Tue, 24 Sep 2024 11:38:45 -0700 Message-ID: <20240924183851.1901667-21-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Dave Chinner [ Upstream commit 348a1983cf4cf5099fc398438a968443af4c9f65 ] Luis has been reporting an assert failure when freeing an inode cluster during inode inactivation for a while. The assert looks like: XFS: Assertion failed: bp->b_flags & XBF_DONE, file: fs/xfs/xfs_trans_buf.c, line: 241 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:102! 
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 4 PID: 73 Comm: kworker/4:1 Not tainted 6.10.0-rc1 #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Workqueue: xfs-inodegc/loop5 xfs_inodegc_worker [xfs] RIP: 0010:assfail (fs/xfs/xfs_message.c:102) xfs RSP: 0018:ffff88810188f7f0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88816e748250 RCX: 1ffffffff844b0e7 RDX: 0000000000000004 RSI: ffff88810188f558 RDI: ffffffffc2431fa0 RBP: 1ffff11020311f01 R08: 0000000042431f9f R09: ffffed1020311e9b R10: ffff88810188f4df R11: ffffffffac725d70 R12: ffff88817a3f4000 R13: ffff88812182f000 R14: ffff88810188f998 R15: ffffffffc2423f80 FS: 0000000000000000(0000) GS:ffff8881c8400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055fe9d0f109c CR3: 000000014426c002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:241 (discriminator 1)) xfs xfs_imap_to_bp (fs/xfs/xfs_trans.h:210 fs/xfs/libxfs/xfs_inode_buf.c:138) xfs xfs_inode_item_precommit (fs/xfs/xfs_inode_item.c:145) xfs xfs_trans_run_precommits (fs/xfs/xfs_trans.c:931) xfs __xfs_trans_commit (fs/xfs/xfs_trans.c:966) xfs xfs_inactive_ifree (fs/xfs/xfs_inode.c:1811) xfs xfs_inactive (fs/xfs/xfs_inode.c:2013) xfs xfs_inodegc_worker (fs/xfs/xfs_icache.c:1841 fs/xfs/xfs_icache.c:1886) xfs process_one_work (kernel/workqueue.c:3231) worker_thread (kernel/workqueue.c:3306 (discriminator 2) kernel/workqueue.c:3393 (discriminator 2)) kthread (kernel/kthread.c:389) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) It occurs when the inode precommit handler attempts to look up the inode cluster buffer to attach the inode for writeback. The trail of logic that I can reconstruct is as follows. 1. the inode is clean when inodegc runs, so it is not attached to a cluster buffer when precommit runs. 2. #1 implies the inode cluster buffer may be clean and not pinned by dirty inodes when inodegc runs. 3. #2 implies that the inode cluster buffer can be reclaimed by memory pressure at any time. 4. The assert failure implies that the cluster buffer was attached to the transaction but not marked done, even though it had been accessed earlier in the transaction. 5. #4 implies the cluster buffer has been invalidated (i.e. marked stale). 6. #5 implies that the inode cluster buffer was instantiated uninitialised in the transaction in xfs_ifree_cluster(), which only instantiates the buffers to invalidate them and never marks them as done. Given factors 1-3, this issue is highly dependent on timing and environmental factors. Hence the issue can be very difficult to reproduce in some situations, but highly reliable in others. Luis has an environment where it can be reproduced easily by g/531 but, OTOH, I've reproduced it only once in ~2000 cycles of g/531. I think the fix is to have xfs_ifree_cluster() set the XBF_DONE flag on the cluster buffers, even though they may not be initialised. The reasons why I think this is safe are: 1. A buffer cache lookup hit on an XBF_STALE buffer will clear the XBF_DONE flag. Hence all future users of the buffer know they have to re-initialise the contents before use and mark it done themselves. 2.
xfs_trans_binval() sets the XFS_BLI_STALE flag, which means the buffer remains locked until the journal commit completes and the buffer is unpinned. Hence once marked XBF_STALE/XFS_BLI_STALE by xfs_ifree_cluster(), the only context that can access the freed buffer is the currently running transaction. 3. #2 implies that future buffer lookups in the currently running transaction will hit the transaction match code and not the buffer cache. Hence XBF_STALE and XFS_BLI_STALE will not be cleared unless the transaction initialises and logs the buffer with valid contents again. At which point, the buffer will be marked marked XBF_DONE again, so having XBF_DONE already set on the stale buffer is a moot point. 4. #2 also implies that any concurrent access to that cluster buffer will block waiting on the buffer lock until the inode cluster has been fully freed and is no longer an active inode cluster buffer. 5. #4 + #1 means that any future user of the disk range of that buffer will always see the range of disk blocks covered by the cluster buffer as not done, and hence must initialise the contents themselves. 6. Setting XBF_DONE in xfs_ifree_cluster() then means the unlinked inode precommit code will see a XBF_DONE buffer from the transaction match as it expects. It can then attach the stale but newly dirtied inode to the stale but newly dirtied cluster buffer without unexpected failures. The stale buffer will then sail through the journal and do the right thing with the attached stale inode during unpin. Hence the fix is just one line of extra code. The explanation of why we have to set XBF_DONE in xfs_ifree_cluster, OTOH, is long and complex.... Fixes: 82842fee6e59 ("xfs: fix AGF vs inode cluster buffer deadlock") Signed-off-by: Dave Chinner Tested-by: Luis Chamberlain Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_inode.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 4e73dd4a4d82..8c7cbe7f47ef 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2297,11 +2297,26 @@ xfs_ifree_cluster( * This buffer may not have been correctly initialised as we * didn't read it from disk. That's not important because we are * only using to mark the buffer as stale in the log, and to - * attach stale cached inodes on it. That means it will never be - * dispatched for IO. If it is, we want to know about it, and we - * want it to fail. We can acheive this by adding a write - * verifier to the buffer. + * attach stale cached inodes on it. + * + * For the inode that triggered the cluster freeing, this + * attachment may occur in xfs_inode_item_precommit() after we + * have marked this buffer stale. If this buffer was not in + * memory before xfs_ifree_cluster() started, it will not be + * marked XBF_DONE and this will cause problems later in + * xfs_inode_item_precommit() when we trip over a (stale, !done) + * buffer to attached to the transaction. + * + * Hence we have to mark the buffer as XFS_DONE here. This is + * safe because we are also marking the buffer as XBF_STALE and + * XFS_BLI_STALE. That means it will never be dispatched for + * IO and it won't be unlocked until the cluster freeing has + * been committed to the journal and the buffer unpinned. If it + * is written, we want to know about it, and we want it to + * fail. We can acheive this by adding a write verifier to the + * buffer. 
*/ + bp->b_flags |= XBF_DONE; bp->b_ops = &xfs_inode_buf_ops; /* From patchwork Tue Sep 24 18:38:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811116 Received: from mail-pl1-f177.google.com (mail-pl1-f177.google.com [209.85.214.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6411C11CA0; Tue, 24 Sep 2024 18:39:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203167; cv=none; b=qsU1ZojDx/4/eKCjVsQF9xpAHxJkvVw4087hORCfDBvvUbk81YPbM5FYN5RLBt923fkhIuzrkII4wVygT7pG3Yteuo0vsoBy+KHylMG3HJIPhSTwksCbgHXBlZj6u4TT3I7vrrFgk4EezAAKCQD0eML5vfAzoBjY7AASD7mWdvA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203167; c=relaxed/simple; bh=mJ+0+PF9dIrRrbKqrqCbHxEPbAmviyK6H/hUlOm3LgQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=nT7JAS1cfsFt2h0M/hie7yPVS2S8qNXoiplGrJyZB2IGEnxcCeEHzQo477LFK7ykGbD7mVeGZZMhSsuUNVwL+ANnHgteuSmXYvgLAZXumnqazkP3CRs+cQPSQKzL3Mq0KoFiJVlwT0rwf0FkO3GMSyq3EOvPR+YOMmylsI00Zbg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=U3Bnwuv1; arc=none smtp.client-ip=209.85.214.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="U3Bnwuv1" Received: by mail-pl1-f177.google.com with SMTP id d9443c01a7336-20570b42f24so65399385ad.1; Tue, 24 Sep 2024 11:39:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203165; x=1727807965; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=wdJjExmLoyiJTe5TvB/BM44OCYja0VM81kfpCdWY7bY=; b=U3Bnwuv1CAL/wYySBCfyrp++J48QcZR/jB2yMTuuPwWobZyRtslSUIp8D60yBDaDDz PA/zT/tk6oT1L1gFNgE0mK6n5BarLUxjt6abXyKMt6SdXXjyuo3hPBIdi0rNNh5xiFVO Yr7ozYYLPq9kpsLwllG5woyDuHB1wU00zblCYJUwhmPoI0t7tD1h9KLqHewXirNNOgZr J30KnPgGO6cGaCOnnSEd+JYUXK9jSCaukwkOTpv9Oxa9AUqTtFeuxnrxcD07q89RKpV7 xhXUGo3SF75sM+HyCtrkLf+q55ad61sp0xC6DcZJ1TF9GmMPjybFdqDpf8hRnKbCeJr0 S/1w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203165; x=1727807965; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=wdJjExmLoyiJTe5TvB/BM44OCYja0VM81kfpCdWY7bY=; b=H/mpxp5squI0lNIgX6mqqXeB2ku8TpNawXeZ4H9pyNLRKVjcw0zGMnAX5cYz16YMS0 WO0wJNE75853ExV9+99oxf+7pZt9ZzyoAJM3EOqchGJbK590rQ6haBkemsnzcnMSiva1 X2ru4v+DIyDI17W1w3IGuBT78LgLwslwPSVJnl4AW8ziCyTKGHuswyGBooLwzJa7mkPt 0c+liEiXByAlHL/UsXPjqo1PET0sJ10KxvCkvg+uTYZHVoic2YjP4e0w5OZbcmyucGa9 EXbiGX1qCUvQCG1TqdRyDrRIlpQeunvVFGFvuD+20SXS0tlVsjpojraA/Pt2DquFdWh0 Ll4Q== X-Gm-Message-State: AOJu0YxidSP+luvHqy2qHKg1ZkjGDLdfJqBEt6s/yvq+U+uQ7u6fk2UU 5WN/V2cMxP4zrZBzpYIu0PrFoQFDZY4rBLjo9bVvx6UVVA+mRWhLFyC2JMGd X-Google-Smtp-Source: 
AGHT+IHvDv2xraWu2YbEC/nqfi4bMStC7yHqZN7MNDK5mfrVob3V+BTgdPd7HueOSWhjBdrkLP2mEw== X-Received: by 2002:a17:90a:d490:b0:2d8:abdf:2ca9 with SMTP id 98e67ed59e1d1-2e06ae2cbfemr113729a91.3.1727203165553; Tue, 24 Sep 2024 11:39:25 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:25 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, Shiyang Ruan , "Darrick J. Wong" , Chandan Babu R , Leah Rumancik Subject: [PATCH 6.1 21/26] xfs: correct calculation for agend and blockcount Date: Tue, 24 Sep 2024 11:38:46 -0700 Message-ID: <20240924183851.1901667-22-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Shiyang Ruan [ Upstream commit 3c90c01e49342b166e5c90ec2c85b220be15a20e ] The agend should be "start + length - 1", then, blockcount should be "end + 1 - start". Correct 2 calculation mistakes. Also, rename "agend" to "range_agend" because it's not the end of the AG per se; it's the end of the dead region within an AG's agblock space. Fixes: 5cf32f63b0f4 ("xfs: fix the calculation for "end" and "length"") Signed-off-by: Shiyang Ruan Reviewed-by: "Darrick J. Wong" Signed-off-by: Chandan Babu R Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_notify_failure.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c index 4a9bbd3fe120..a7daa522e00f 100644 --- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -126,8 +126,8 @@ xfs_dax_notify_ddev_failure( struct xfs_rmap_irec ri_low = { }; struct xfs_rmap_irec ri_high; struct xfs_agf *agf; - xfs_agblock_t agend; struct xfs_perag *pag; + xfs_agblock_t range_agend; pag = xfs_perag_get(mp, agno); error = xfs_alloc_read_agf(pag, tp, 0, &agf_bp); @@ -148,10 +148,10 @@ xfs_dax_notify_ddev_failure( ri_high.rm_startblock = XFS_FSB_TO_AGBNO(mp, end_fsbno); agf = agf_bp->b_addr; - agend = min(be32_to_cpu(agf->agf_length), + range_agend = min(be32_to_cpu(agf->agf_length) - 1, ri_high.rm_startblock); notify.startblock = ri_low.rm_startblock; - notify.blockcount = agend - ri_low.rm_startblock; + notify.blockcount = range_agend + 1 - ri_low.rm_startblock; error = xfs_rmap_query_range(cur, &ri_low, &ri_high, xfs_dax_failure_fn, ¬ify); From patchwork Tue Sep 24 18:38:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811117 Received: from mail-pg1-f180.google.com (mail-pg1-f180.google.com [209.85.215.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DD4361AC884; Tue, 24 Sep 2024 18:39:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203169; cv=none; 
b=NtbDpLRsHb2rMXzHilKto8H6JMbmmkUdJRw+khEXEF13SS4wmstvXJAlhnwAaEpfFNcQJOJk8mGSdubYzsEGes6l0CQaej7CaREw6a+8adZeGHjN3O4MtzQz36m/3SsVXEoiHt20wqIIQUdym9ERuJH1U9K4BE7ueApykijg28w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203169; c=relaxed/simple; bh=xn1TSxOH9t9D6ncb5xsyR6YORTVeRRBLGHiIjk38a8E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=FZroNLDAxx/98TxfIP28Uh+aKMy7O1bVybY151Tv8hYj0P6Ue8ASrksdRQ24u+8pnksTbjID4zstdkNvy8Yo4RoaccCwgVKZAXXlBb+2OVn7lSskvti1nwuA5WNmCQbSvy6HYE2sgW9opS+B1e7yYgArUCHeu+zgyHt3lFlQ0K0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=LeIF2BZS; arc=none smtp.client-ip=209.85.215.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="LeIF2BZS" Received: by mail-pg1-f180.google.com with SMTP id 41be03b00d2f7-7d916b6a73aso3708468a12.1; Tue, 24 Sep 2024 11:39:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203167; x=1727807967; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=BZUeJe/rRA7SJET1iqfLUgqNK6rrqLShtm4GagY4nYk=; b=LeIF2BZSrpF65XRPtekXpqZuFmIR/NBFfXj0Lt+0n6BofKyXw3DXi7IYydoqpQ6qlV YeDfpbXac59g6qWghYg53qzW8WZ8mIx7RCuh00NxpTAuBRJ4TN9Lnlt4BHQSamidgdzW czIRo8nOP5fbNcsgssYmAt2CSkczzwtB7QKMWa67bx29O93Q1e+yNoEFV0IUlKOGxXbE 5a3bkKGHR2uwYBt7gomN9PfGqThnQJMFVLvJH8epLJp0ucMhsAIQ/P9tvG7ou+Z3kGQg fQ0NlWsgqEUat8ZcKHS3gbIO6xm7lPADkXD5eCkfv2EJd0zFK/8og5QTrt78WWaWBBqK LnrA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203167; x=1727807967; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BZUeJe/rRA7SJET1iqfLUgqNK6rrqLShtm4GagY4nYk=; b=HXOVng+zghvj3jvNQfghgAhy1l3qdOOH49yYS63z34Xj+XHH3Sc/agl4XqMptQUlzO /xapHp7sbOdG8askWS3azMnFDKcfYsDRqdhywwCtiEiCxFXI2iR3o7qj0lItrVqar+Ge n5crLa+u2r2b80WAA/SOYd2aHbxHqZgyFpziZ8zCuQuxu40iDvDyCda8gpbW0C3sgdt/ Ys+IBmccIrvuobgVEL6mtYKOEP2UBg/gII5VyZ1uy8NJ6IsOOVkANdsqjDnqApwi9Mks /4Szxg6JwjZufpDsOFPh3mbJCyqsii1w+6/NIRnPuASPiiEs7B9LazPtbHMM6FLwZn4R +AtQ== X-Gm-Message-State: AOJu0YzamEW3firlVEXANRTNAYEU8AdlOYbRcH7/QMJbN/w5l6D7uyx6 a5oslkSqBtk3mq6ndf7c/mGgUCJb4agowZvUw+0r0rQsafOs69Y+QiJGDWDl X-Google-Smtp-Source: AGHT+IGb1dWKCpGdqgojkJahejDdLsMckptxWWQmqFAfxOdlXze9MbjgfgdqYzNSX50EIbc0+Z0AzQ== X-Received: by 2002:a17:90b:c12:b0:2c9:6a38:54e4 with SMTP id 98e67ed59e1d1-2e06b0029fcmr65779a91.41.1727203166628; Tue, 24 Sep 2024 11:39:26 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:26 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, "Darrick J. 
Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 22/26] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list Date: Tue, 24 Sep 2024 11:38:47 -0700 Message-ID: <20240924183851.1901667-23-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit f12b96683d6976a3a07fdf3323277c79dbe8f6ab ] Alter the definition of i_prev_unlinked slightly to make it more obvious when an inode with 0 link count is not part of the iunlink bucket lists rooted in the AGI. This distinction is necessary because it is not sufficient to check inode.i_nlink to decide if an inode is on the unlinked list. Updates to i_nlink can happen while holding only ILOCK_EXCL, but updates to an inode's position in the AGI unlinked list (which happen after the nlink update) requires both ILOCK_EXCL and the AGI buffer lock. The next few patches will make it possible to reload an entire unlinked bucket list when we're walking the inode table or performing handle operations and need more than the ability to iget the last inode in the chain. The upcoming directory repair code also needs to be able to make this distinction to decide if a zero link count directory should be moved to the orphanage or allowed to inactivate. An upcoming enhancement to the online AGI fsck code will need this distinction to check and rebuild the AGI unlinked buckets. Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_icache.c | 2 +- fs/xfs/xfs_inode.c | 3 ++- fs/xfs/xfs_inode.h | 20 +++++++++++++++++++- 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 4b040740678c..6df826fc787c 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -113,7 +113,7 @@ xfs_inode_alloc( INIT_LIST_HEAD(&ip->i_ioend_list); spin_lock_init(&ip->i_ioend_lock); ip->i_next_unlinked = NULLAGINO; - ip->i_prev_unlinked = NULLAGINO; + ip->i_prev_unlinked = 0; return ip; } diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 8c7cbe7f47ef..8c1782a72487 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2015,6 +2015,7 @@ xfs_iunlink_insert_inode( } /* Point the head of the list to point to this inode. */ + ip->i_prev_unlinked = NULLAGINO; return xfs_iunlink_update_bucket(tp, pag, agibp, bucket_index, agino); } @@ -2117,7 +2118,7 @@ xfs_iunlink_remove_inode( } ip->i_next_unlinked = NULLAGINO; - ip->i_prev_unlinked = NULLAGINO; + ip->i_prev_unlinked = 0; return error; } diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 225f6f93c2fa..c0211ff2874e 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -68,8 +68,21 @@ typedef struct xfs_inode { uint64_t i_diflags2; /* XFS_DIFLAG2_... */ struct timespec64 i_crtime; /* time created */ - /* unlinked list pointers */ + /* + * Unlinked list pointers. These point to the next and previous inodes + * in the AGI unlinked bucket list, respectively. These fields can + * only be updated with the AGI locked. + * + * i_next_unlinked caches di_next_unlinked. + */ xfs_agino_t i_next_unlinked; + + /* + * If the inode is not on an unlinked list, this field is zero. If the + * inode is the first element in an unlinked list, this field is + * NULLAGINO. 
Otherwise, i_prev_unlinked points to the previous inode + * in the unlinked list. + */ xfs_agino_t i_prev_unlinked; /* VFS inode */ @@ -81,6 +94,11 @@ typedef struct xfs_inode { struct list_head i_ioend_list; } xfs_inode_t; +static inline bool xfs_inode_on_unlinked_list(const struct xfs_inode *ip) +{ + return ip->i_prev_unlinked != 0; +} + static inline bool xfs_inode_has_attr_fork(struct xfs_inode *ip) { return ip->i_forkoff > 0; From patchwork Tue Sep 24 18:38:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811118 Received: from mail-pj1-f46.google.com (mail-pj1-f46.google.com [209.85.216.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B935C11CA0; Tue, 24 Sep 2024 18:39:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203170; cv=none; b=T5aoLaaSV8nz8BSPs8+oyLmoSKI1AOLyLA95rTV780+Dba66s0pH/tBfW3aGCZtaFFS8ekl+KE2/KR37clVmzTvSq4FxIl1DVrwp8Y2UKFrrQQWeU+ogDJpu5MbZF0kxPuH1jqhHeg5iTbzauXVlExOPPoNt1mtXE7Jck8lsg/M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203170; c=relaxed/simple; bh=q+UPFJGYT8nxbelhuMllsIqBuCBZNZmx9ZwJftIMsOk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Y7eHUEevgCNhEYj3dklyteyA/UVsgZBbWjBblmCNca1uRa6bmHaRGKvrsJGzBGtN6lPqV3GHxVEQbVa0uwmVUTW3NHnT5EqoH56cb/88iqZCHazD+qkDkgbDXgtnK7jHJFVwYo+xnfNXk0iP7oLkXjeiKgABMJZsX+diByHKIJ0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=aRZZRiWs; arc=none smtp.client-ip=209.85.216.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="aRZZRiWs" Received: by mail-pj1-f46.google.com with SMTP id 98e67ed59e1d1-2d8b68bddeaso4672450a91.1; Tue, 24 Sep 2024 11:39:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203168; x=1727807968; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ebsEMFRoJLREm1Uxmlgwzjc+nX6q5GCxsem7KjM+c2E=; b=aRZZRiWsbdNRxTeW7a+HPSzaDMxbO7eJToUjgeJe0gdxP8PWo+4+O61yGwffPS8dXi 6hgXDesD6A3mQaspUh+BH8JoQ/KOzxBQm3Ozz6WJxl+45wl6Uc/VWmEqFrySQlzMv7+1 FtosiMid/Ojpnu0vAuk1a/EoWgR8WdALaIwXV9zQ/WHKbpVAe1/x3JHoXRwWBFf2P7C1 sFoSBqBb7xBRFMnOHGDFeyuXb2p4/ODj4lTiDvxeHRS/DA+9p+lHtORaDdTLCprYcOHw cbpszaKrxTwGbdfO/s/oci02NSqg62uYM0SWbblEaqfUeJH9CNyKAW2GTJX4lS8n+opu 8gnQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203168; x=1727807968; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ebsEMFRoJLREm1Uxmlgwzjc+nX6q5GCxsem7KjM+c2E=; b=HpjA9sPkO5UYTCSjL/42rdZX3XELSKnZwe/USddj+pRKDjhckcXwVybXmpc1AmSzmm enJZttDXhorkjMUIpQMzzs7zEtGlmzxHCkoAW5qTvFRT9CD4WItVNYkYjLqm2EeIlejY 
rPynoMLs1Bj+ESM+XwtmsZtLcBUqO1/42BgTtgWGoyN7v9yaolHlWqWrJNy856WpF6nb oRRV21Gl99E4L/klJln6G0e6OVMo3njqnVrWyN+yyQkayVIbDwD273FCEcSX4c2Qwjvx kOCuyI+R+ESryPHVAtct63GIhIRhvNd/rP7VKqUsNqI8gSBu/HAoJQvddf0LnS+aIMlD hJlQ== X-Gm-Message-State: AOJu0YznVVsQr309TACAgjXe7oZb1z6a0Cga1qQzSxxJGof3Cabn3kXk ZHEg4Hh+qjW/xjZGMICldlJlKRnwmNmUtpYC89Ik9Y0KM5GaD5rp4XhNbPAr X-Google-Smtp-Source: AGHT+IGiKY4bWh1IYhcQaSSPGyGqCzXDsmMEl8qM8+jN6owT06xTsNH/nsHQScgb/BZgHbs6xdjOeA== X-Received: by 2002:a17:90b:1651:b0:2d8:8ab3:2889 with SMTP id 98e67ed59e1d1-2e06ae53a33mr111268a91.11.1727203167788; Tue, 24 Sep 2024 11:39:27 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:27 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, "Darrick J. Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 23/26] xfs: reload entire unlinked bucket lists Date: Tue, 24 Sep 2024 11:38:48 -0700 Message-ID: <20240924183851.1901667-24-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 83771c50e42b92de6740a63e152c96c052d37736 ] The previous patch to reload unrecovered unlinked inodes when adding a newly created inode to the unlinked list is missing a key piece of functionality. It doesn't handle the case that someone calls xfs_iget on an inode that is not the last item in the incore list. For example, if at mount time the ondisk iunlink bucket looks like this: AGI -> 7 -> 22 -> 3 -> NULL None of these three inodes are cached in memory. Now let's say that someone tries to open inode 3 by handle. We need to walk the list to make sure that inodes 7 and 22 get loaded cold, and that the i_prev_unlinked of inode 3 gets set to 22. Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_export.c | 6 +++ fs/xfs/xfs_inode.c | 100 ++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_inode.h | 9 ++++ fs/xfs/xfs_itable.c | 9 ++++ fs/xfs/xfs_trace.h | 20 +++++++++ 5 files changed, 144 insertions(+) diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c index 1064c2342876..f71ea786a6d2 100644 --- a/fs/xfs/xfs_export.c +++ b/fs/xfs/xfs_export.c @@ -146,6 +146,12 @@ xfs_nfs_get_inode( return ERR_PTR(error); } + error = xfs_inode_reload_unlinked(ip); + if (error) { + xfs_irele(ip); + return ERR_PTR(error); + } + if (VFS_I(ip)->i_generation != generation) { xfs_irele(ip); return ERR_PTR(-ESTALE); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 8c1782a72487..06cdf5dd88af 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3622,3 +3622,103 @@ xfs_iunlock2_io_mmap( if (ip1 != ip2) inode_unlock(VFS_I(ip1)); } + +/* + * Reload the incore inode list for this inode. Caller should ensure that + * the link count cannot change, either by taking ILOCK_SHARED or otherwise + * preventing other threads from executing. 
+ */ +int +xfs_inode_reload_unlinked_bucket( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_buf *agibp; + struct xfs_agi *agi; + struct xfs_perag *pag; + xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino); + xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino); + xfs_agino_t prev_agino, next_agino; + unsigned int bucket; + bool foundit = false; + int error; + + /* Grab the first inode in the list */ + pag = xfs_perag_get(mp, agno); + error = xfs_ialloc_read_agi(pag, tp, &agibp); + xfs_perag_put(pag); + if (error) + return error; + + bucket = agino % XFS_AGI_UNLINKED_BUCKETS; + agi = agibp->b_addr; + + trace_xfs_inode_reload_unlinked_bucket(ip); + + xfs_info_ratelimited(mp, + "Found unrecovered unlinked inode 0x%x in AG 0x%x. Initiating list recovery.", + agino, agno); + + prev_agino = NULLAGINO; + next_agino = be32_to_cpu(agi->agi_unlinked[bucket]); + while (next_agino != NULLAGINO) { + struct xfs_inode *next_ip = NULL; + + if (next_agino == agino) { + /* Found this inode, set its backlink. */ + next_ip = ip; + next_ip->i_prev_unlinked = prev_agino; + foundit = true; + } + if (!next_ip) { + /* Inode already in memory. */ + next_ip = xfs_iunlink_lookup(pag, next_agino); + } + if (!next_ip) { + /* Inode not in memory, reload. */ + error = xfs_iunlink_reload_next(tp, agibp, prev_agino, + next_agino); + if (error) + break; + + next_ip = xfs_iunlink_lookup(pag, next_agino); + } + if (!next_ip) { + /* No incore inode at all? We reloaded it... */ + ASSERT(next_ip != NULL); + error = -EFSCORRUPTED; + break; + } + + prev_agino = next_agino; + next_agino = next_ip->i_next_unlinked; + } + + xfs_trans_brelse(tp, agibp); + /* Should have found this inode somewhere in the iunlinked bucket. */ + if (!error && !foundit) + error = -EFSCORRUPTED; + return error; +} + +/* Decide if this inode is missing its unlinked list and reload it. 
*/ +int +xfs_inode_reload_unlinked( + struct xfs_inode *ip) +{ + struct xfs_trans *tp; + int error; + + error = xfs_trans_alloc_empty(ip->i_mount, &tp); + if (error) + return error; + + xfs_ilock(ip, XFS_ILOCK_SHARED); + if (xfs_inode_unlinked_incomplete(ip)) + error = xfs_inode_reload_unlinked_bucket(tp, ip); + xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_trans_cancel(tp); + + return error; +} diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index c0211ff2874e..0467d297531e 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -593,4 +593,13 @@ void xfs_end_io(struct work_struct *work); int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2); void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2); +static inline bool +xfs_inode_unlinked_incomplete( + struct xfs_inode *ip) +{ + return VFS_I(ip)->i_nlink == 0 && !xfs_inode_on_unlinked_list(ip); +} +int xfs_inode_reload_unlinked_bucket(struct xfs_trans *tp, struct xfs_inode *ip); +int xfs_inode_reload_unlinked(struct xfs_inode *ip); + #endif /* __XFS_INODE_H__ */ diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index a1c2bcf65d37..ee3eb3181e3e 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -80,6 +80,15 @@ xfs_bulkstat_one_int( if (error) goto out; + if (xfs_inode_unlinked_incomplete(ip)) { + error = xfs_inode_reload_unlinked_bucket(tp, ip); + if (error) { + xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_irele(ip); + return error; + } + } + ASSERT(ip != NULL); ASSERT(ip->i_imap.im_blkno != 0); inode = VFS_I(ip); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index d713e10dff8a..0cd62031e53f 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3704,6 +3704,26 @@ TRACE_EVENT(xfs_iunlink_reload_next, __entry->next_agino) ); +TRACE_EVENT(xfs_inode_reload_unlinked_bucket, + TP_PROTO(struct xfs_inode *ip), + TP_ARGS(ip), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_agnumber_t, agno) + __field(xfs_agino_t, agino) + ), + TP_fast_assign( + __entry->dev = ip->i_mount->m_super->s_dev; + __entry->agno = XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino); + __entry->agino = XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino); + ), + TP_printk("dev %d:%d agno 0x%x agino 0x%x bucket %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->agno, + __entry->agino, + __entry->agino % XFS_AGI_UNLINKED_BUCKETS) +); + DECLARE_EVENT_CLASS(xfs_ag_inode_class, TP_PROTO(struct xfs_inode *ip), TP_ARGS(ip), From patchwork Tue Sep 24 18:38:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811119 Received: from mail-pg1-f179.google.com (mail-pg1-f179.google.com [209.85.215.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D8C2D1AC884; Tue, 24 Sep 2024 18:39:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203171; cv=none; b=Y7VO4j+0qIK+3z+5EyZ0ccpYMrD3vbaRuiZE5a6KaOcpLAZbqDH72WwTsWKks7Qa5jG2kHk3JlOA/hf5arVrvlyowAVPckmtyPItALAg0n5ROMltAUkVMN5SJWQ4DhZEYvrTbsTkfXDIuJxT0ifSqxyU6joC68IvmRrOEHcr5gI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203171; c=relaxed/simple; bh=6xLbpDitZiIPFDwXSGtLReWkN8SDx5sBqQL1AYJGfcg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=HBobkSj2AHBJtnTnIDvRvK2G50+HMj6ibCaiy6Vgacj4rbTyUvMPmhye4QPVL8Iyg68dLx56xKzp2PU0kfbNF2m5YByQ+VPzYmlSCJsRYHe8dV59W0EzobbbxsiS5bX/PkyRCmGqxGRy/Am+NpTbPyblq1svWYe1lJUnYUgfX6I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=nq+KV6HG; arc=none smtp.client-ip=209.85.215.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="nq+KV6HG" Received: by mail-pg1-f179.google.com with SMTP id 41be03b00d2f7-7c1324be8easo113583a12.1; Tue, 24 Sep 2024 11:39:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203169; x=1727807969; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=dPoDCUrC5ElHF7Zfg36m2emgueaMlx0UA0ALS/+pe9c=; b=nq+KV6HGBS5kTNqdiA0oKTtkqEn3TvvE9TvBGy019lE4rnRGkQOT1V7YA0wXfqujPl 5a3afMnwifuIU0dFFVe6/bpTmtfi6X0H+VQe/KFNe2bRYCd38weiQ/IH9PtpW+90IByv OYbmi51tVJS8zU2OlFgntTwyeNlStEhDj2Rwxy6nm9jMjIYCCRgpXIT+lyMsSh4HAwgt 7R8Qpxobi0n2qlqxIh3hOJkMxwWiF9s6BGaw6lxAJlOjPAOv2HeFeJVaM0xrwZJLCdmp 4YKTPkhTQ4Hs50bilu9oHDjHImH/tJtugFNPlUpGTibim32hfxy+pAbXPdeZ0e7pU91A J8oQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203169; x=1727807969; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dPoDCUrC5ElHF7Zfg36m2emgueaMlx0UA0ALS/+pe9c=; b=ljwegWJvGDwMiNI//vBm9oczWTDTIaRl5yljjmjQXQ5SPFvwynNPS4+pjOy+ONErjI HyIQn5gMyjjwQ9gVPWPtpH8QC0gkGsj32KTirdmCpv7L68YiXk8owNxwojFI1voVuJ4r 44sqs0C5D3j5qegZraVEXpdEPyeEF8k1FzBpPIRrYmbI3nNKlpftUHDjrp4aJf/6KSHO FjiCkxDBjEL1lTY8bbwMkx3shJVBiibqKIW9x/Qs1WaHLp8ZJ3PNssyRZFAyOaIvr4N2 t5xVd5RmflOlhaQJDeEeHkXaCdH3Lsan5chIjfTXEXJiHbLvlf0VnJ7/t0XVAPatn09G fQlQ== X-Gm-Message-State: AOJu0YxZ6F/+5Qe9yPPm5DMAiTA6spWovEuhZYrCsr/0kH0KKSN1Sp6c 4XjgRMWHl09ugsHxyd5aZ6pViQ4LeT6QtE6GxuskH68XdhT3QVCpbpS6CsRj X-Google-Smtp-Source: AGHT+IG2YuPgU5PZ7SWyvJmupHd4nlNCiJ7UoQWFkzUlV4BYRvNktIsHPLzgshieNnvXzwd2/yn0FA== X-Received: by 2002:a17:90a:ca0f:b0:2d8:8a3a:7b88 with SMTP id 98e67ed59e1d1-2e0567e8c63mr6294150a91.6.1727203168964; Tue, 24 Sep 2024 11:39:28 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:28 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, "Darrick J. 
Wong" , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 24/26] xfs: make inode unlinked bucket recovery work with quotacheck Date: Tue, 24 Sep 2024 11:38:49 -0700 Message-ID: <20240924183851.1901667-25-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 49813a21ed57895b73ec4ed3b99d4beec931496f ] Teach quotacheck to reload the unlinked inode lists when walking the inode table. This requires extra state handling, since it's possible that a reloaded inode will get inactivated before quotacheck tries to scan it; in this case, we need to ensure that the reloaded inode does not have dquots attached when it is freed. Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_attr_inactive.c | 1 - fs/xfs/xfs_inode.c | 12 +++++++++--- fs/xfs/xfs_inode.h | 5 ++++- fs/xfs/xfs_mount.h | 10 +++++++++- fs/xfs/xfs_qm.c | 7 +++++++ 5 files changed, 29 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c index 5db87b34fb6e..89c7a9f4f930 100644 --- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -333,7 +333,6 @@ xfs_attr_inactive( int error = 0; mp = dp->i_mount; - ASSERT(! XFS_NOT_DQATTACHED(mp, dp)); xfs_ilock(dp, lock_mode); if (!xfs_inode_has_attr_fork(dp)) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 06cdf5dd88af..00f41bc76bd7 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1743,9 +1743,13 @@ xfs_inactive( ip->i_df.if_nextents > 0 || ip->i_delayed_blks > 0)) truncate = 1; - error = xfs_qm_dqattach(ip); - if (error) - goto out; + if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) { + xfs_qm_dqdetach(ip); + } else { + error = xfs_qm_dqattach(ip); + if (error) + goto out; + } if (S_ISLNK(VFS_I(ip)->i_mode)) error = xfs_inactive_symlink(ip); @@ -1963,6 +1967,8 @@ xfs_iunlink_reload_next( trace_xfs_iunlink_reload_next(next_ip); rele: ASSERT(!(VFS_I(next_ip)->i_state & I_DONTCACHE)); + if (xfs_is_quotacheck_running(mp) && next_ip) + xfs_iflags_set(next_ip, XFS_IQUOTAUNCHECKED); xfs_irele(next_ip); return error; } diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 0467d297531e..85395ad2859c 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -344,6 +344,9 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip) */ #define XFS_INACTIVATING (1 << 13) +/* Quotacheck is running but inode has not been added to quota counts. */ +#define XFS_IQUOTAUNCHECKED (1 << 14) + /* All inode state flags related to inode reclaim. */ #define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \ XFS_IRECLAIM | \ @@ -358,7 +361,7 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip) #define XFS_IRECLAIM_RESET_FLAGS \ (XFS_IRECLAIMABLE | XFS_IRECLAIM | \ XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \ - XFS_INACTIVATING) + XFS_INACTIVATING | XFS_IQUOTAUNCHECKED) /* * Flags for inode locking. diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index c8e72f0d3965..9dc0acf7314f 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -401,6 +401,8 @@ __XFS_HAS_FEAT(nouuid, NOUUID) #define XFS_OPSTATE_WARNED_SHRINK 8 /* Kernel has logged a warning about logged xattr updates being used. 
*/ #define XFS_OPSTATE_WARNED_LARP 9 +/* Mount time quotacheck is running */ +#define XFS_OPSTATE_QUOTACHECK_RUNNING 10 #define __XFS_IS_OPSTATE(name, NAME) \ static inline bool xfs_is_ ## name (struct xfs_mount *mp) \ @@ -423,6 +425,11 @@ __XFS_IS_OPSTATE(inode32, INODE32) __XFS_IS_OPSTATE(readonly, READONLY) __XFS_IS_OPSTATE(inodegc_enabled, INODEGC_ENABLED) __XFS_IS_OPSTATE(blockgc_enabled, BLOCKGC_ENABLED) +#ifdef CONFIG_XFS_QUOTA +__XFS_IS_OPSTATE(quotacheck_running, QUOTACHECK_RUNNING) +#else +# define xfs_is_quotacheck_running(mp) (false) +#endif static inline bool xfs_should_warn(struct xfs_mount *mp, long nr) @@ -440,7 +447,8 @@ xfs_should_warn(struct xfs_mount *mp, long nr) { (1UL << XFS_OPSTATE_BLOCKGC_ENABLED), "blockgc" }, \ { (1UL << XFS_OPSTATE_WARNED_SCRUB), "wscrub" }, \ { (1UL << XFS_OPSTATE_WARNED_SHRINK), "wshrink" }, \ - { (1UL << XFS_OPSTATE_WARNED_LARP), "wlarp" } + { (1UL << XFS_OPSTATE_WARNED_LARP), "wlarp" }, \ + { (1UL << XFS_OPSTATE_QUOTACHECK_RUNNING), "quotacheck" } /* * Max and min values for mount-option defined I/O diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index f51960d7dcbd..bbd0805fa94e 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1160,6 +1160,10 @@ xfs_qm_dqusage_adjust( if (error) return error; + error = xfs_inode_reload_unlinked(ip); + if (error) + goto error0; + ASSERT(ip->i_delayed_blks == 0); if (XFS_IS_REALTIME_INODE(ip)) { @@ -1173,6 +1177,7 @@ xfs_qm_dqusage_adjust( } nblks = (xfs_qcnt_t)ip->i_nblocks - rtblks; + xfs_iflags_clear(ip, XFS_IQUOTAUNCHECKED); /* * Add the (disk blocks and inode) resources occupied by this @@ -1319,8 +1324,10 @@ xfs_qm_quotacheck( flags |= XFS_PQUOTA_CHKD; } + xfs_set_quotacheck_running(mp); error = xfs_iwalk_threaded(mp, 0, 0, xfs_qm_dqusage_adjust, 0, true, NULL); + xfs_clear_quotacheck_running(mp); /* * On error, the inode walk may have partially populated the dquot From patchwork Tue Sep 24 18:38:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811120 Received: from mail-pj1-f45.google.com (mail-pj1-f45.google.com [209.85.216.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A0D51AE850; Tue, 24 Sep 2024 18:39:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.45 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203172; cv=none; b=GBVuznR7LlY2tzQLt2A2FnaiRrat/+hcLN6SLx9it+D3Xzc3I9bGjxb2n+EI99/IAFHR8Gd7fx+5kvYOkDT6pd1y6MVrJ52OPSBthpj5RHGFeuEMAghEy4K4RHXT+8VW7uCRIE6heA03zHNSG4IO2Z//+1kkpfTsteGOwYDiqrY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203172; c=relaxed/simple; bh=MJ6LUi4hISTd8UZRPe2I8Sknsxu1sivFQVTG5DDtNI8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZHbVMM+VmjW9UbnpYM/0x44siB9RusN3iK/ObpHC7Xez9lzLPNSaqcjS+VliC4QuO4Iezyxe+CebyPM4SohFLOliQP8Mwqpf6UH2ADo+qWpTuKdu0PgS5AyvjGi04KFSdn5c/q5CZCuSHtbCAO5JXXlpRt2Vgr1YbMwx9Jt6B7k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=PrTkBXjK; arc=none smtp.client-ip=209.85.216.45 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; 
spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="PrTkBXjK" Received: by mail-pj1-f45.google.com with SMTP id 98e67ed59e1d1-2d88c0f8e79so4688787a91.3; Tue, 24 Sep 2024 11:39:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727203170; x=1727807970; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GwHenpa9G1DzD4VLD9ZfYSVIPY3k6s5QJDZYwbzAJLQ=; b=PrTkBXjK9OjXvJfxzYDesdJoc6nEpvl/z09qjRKppGYmWqjPhPp4xzBOizvJgzlCCI JAda1KPG9pWXHgj5F3rGTEyfD8MDCGYjxHUIrTRjNFSgRRjWZBCrI9AjcvtzWw8MEiVd q+ZGV7uMMDgOgMcob6/y9WPfulZQt/YbRhLmlQhT8fz6D1uSSRKa7c/gA1JiNvBLA3qx rUVPI7+FO3f8Rh7imYIZu2MoOg6uB7qoKuAd/IaJOQ6TMnvF/B6nkBBG0Ipb/Cvj6SQk KyhvqdEAd7KGU5ajGH8Q/Y1uhKcP4idgERg0UkYSqpp2GARKp1p6dV+reUizd7wp+VuR phcg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727203170; x=1727807970; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GwHenpa9G1DzD4VLD9ZfYSVIPY3k6s5QJDZYwbzAJLQ=; b=YCWcFKjSmz8wNX6rgFhnqmiide59Z/6BWwmsBc37vXiOO/UPW2Fb47cZI6Bo0JhaFn ApOw+Tie1hRv8Y9XKGjR5fpGYXX45bgzFnZxbwatbw17vt4rQWuNSoI0oXJGgBNcaxoK /eLkO9tGzyBc9lXHipWcj0eRXqiWd+9iIcFzGZpoEcpuNx5XroXbL0hsVlXW7+jEm3Hc Sw2deAPwaM0X30F+mvk9BuN2fpdjOcwcQAlvYkk3bmvivmtX5NyMehFf2kgg1CUhFgoc vmbh8NFaPRGOnpnN16QBsPvz6JxOP3Bs7TqeGjmkMh+7jllaWuw1WTLzXpNoYF5uV3db KdFg== X-Gm-Message-State: AOJu0YwhnfnrYP3J8ippXiRgMaxXMDGUOojm4djWCPJTWFdzOMEbLFsW +yyr94M60sRlrL5VBfkpqJX6Q9ugawfTAPSxkFT7uR25Z3+S9LAeO/3UkqI3 X-Google-Smtp-Source: AGHT+IGCY0k8VvBMPabQgk6LNQvv0O1qgN4RR2JAcnG5TJGmr95yd90Qsv8G+9neTsSlbVWe4CfERQ== X-Received: by 2002:a17:90b:96:b0:2d8:a672:186d with SMTP id 98e67ed59e1d1-2e06ae7896fmr122247a91.20.1727203170220; Tue, 24 Sep 2024 11:39:30 -0700 (PDT) Received: from lrumancik.svl.corp.google.com ([2620:15c:2a3:200:3987:6b77:4621:58ca]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2dd6ef93b2fsm11644349a91.49.2024.09.24.11.39.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Sep 2024 11:39:29 -0700 (PDT) From: Leah Rumancik To: stable@vger.kernel.org Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com, cem@kernel.org, catherine.hoang@oracle.com, "Darrick J. Wong" , Dave Chinner , Leah Rumancik , Chandan Babu R Subject: [PATCH 6.1 25/26] xfs: fix reloading entire unlinked bucket lists Date: Tue, 24 Sep 2024 11:38:50 -0700 Message-ID: <20240924183851.1901667-26-leah.rumancik@gmail.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com> References: <20240924183851.1901667-1-leah.rumancik@gmail.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: "Darrick J. Wong" [ Upstream commit 537c013b140d373d1ffe6290b841dc00e67effaa ] During review of the patcheset that provided reloading of the incore iunlink list, Dave made a few suggestions, and I updated the copy in my dev tree. Unfortunately, I then got distracted by ... who even knows what ... and forgot to backport those changes from my dev tree to my release candidate branch. I then sent multiple pull requests with stale patches, and that's what was merged into -rc3. So. 
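In rough terms, the shape being restored here is the usual double-checked pattern: a cheap unlocked test up front, then a re-check once the locks are held, and only then the expensive reload. A minimal userspace sketch of that shape follows; the names demo_inode, unlinked_incomplete and reload_bucket are invented for illustration and are not the kernel functions touched by this patch.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct demo_inode {
	pthread_mutex_t lock;		/* stands in for ILOCK + AGI buffer lock */
	bool unlinked_incomplete;	/* stands in for xfs_inode_unlinked_incomplete() */
};

/* Expensive path: stands in for walking and reloading the iunlink bucket. */
static int reload_bucket(struct demo_inode *ip)
{
	ip->unlinked_incomplete = false;
	return 0;
}

static int maybe_reload(struct demo_inode *ip)
{
	int error = 0;

	/* Cheap unlocked check: the common, fully recovered case pays nothing. */
	if (!ip->unlinked_incomplete)
		return 0;

	pthread_mutex_lock(&ip->lock);
	/* Re-check under the lock in case another thread reloaded it first. */
	if (ip->unlinked_incomplete)
		error = reload_bucket(ip);
	pthread_mutex_unlock(&ip->lock);
	return error;
}

int main(void)
{
	struct demo_inode ip = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.unlinked_incomplete = true,
	};
	int first, second;

	first = maybe_reload(&ip);	/* slow path: takes the lock and reloads */
	second = maybe_reload(&ip);	/* fast path: unlocked check finds nothing to do */
	printf("first=%d second=%d incomplete=%d\n",
	       first, second, ip.unlinked_incomplete);
	return 0;
}

The hunks below follow the same shape, with xfs_inode_unlinked_incomplete() as the unlocked test and the reload performed under ILOCK_SHARED and the AGI buffer lock.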
This patch re-adds the use of an unlocked iunlink list check to determine if we want to allocate the resources to recreate the incore list. Since lost iunlinked inodes are supposed to be rare, this change helps us avoid paying the transaction and AGF locking costs every time we open any inode. This also re-adds the shutdowns on failure, and re-applies the restructuring of the inner loop in xfs_inode_reload_unlinked_bucket, and re-adds a requested comment about the quotachecking code. Retain the original RVB tag from Dave since there's no code change from the last submission. Fixes: 68b957f64fca1 ("xfs: load uncached unlinked inodes into memory on demand") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R --- fs/xfs/xfs_export.c | 16 +++++++++++---- fs/xfs/xfs_inode.c | 48 +++++++++++++++++++++++++++++++++------------ fs/xfs/xfs_itable.c | 2 ++ fs/xfs/xfs_qm.c | 15 +++++++++++--- 4 files changed, 61 insertions(+), 20 deletions(-) diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c index f71ea786a6d2..7cd09c3a82cb 100644 --- a/fs/xfs/xfs_export.c +++ b/fs/xfs/xfs_export.c @@ -146,10 +146,18 @@ xfs_nfs_get_inode( return ERR_PTR(error); } - error = xfs_inode_reload_unlinked(ip); - if (error) { - xfs_irele(ip); - return ERR_PTR(error); + /* + * Reload the incore unlinked list to avoid failure in inodegc. + * Use an unlocked check here because unrecovered unlinked inodes + * should be somewhat rare. + */ + if (xfs_inode_unlinked_incomplete(ip)) { + error = xfs_inode_reload_unlinked(ip); + if (error) { + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + xfs_irele(ip); + return ERR_PTR(error); + } } if (VFS_I(ip)->i_generation != generation) { diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 00f41bc76bd7..909085269227 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1744,6 +1744,14 @@ xfs_inactive( truncate = 1; if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) { + /* + * If this inode is being inactivated during a quotacheck and + * has not yet been scanned by quotacheck, we /must/ remove + * the dquots from the inode before inactivation changes the + * block and inode counts. Most probably this is a result of + * reloading the incore iunlinked list to purge unrecovered + * unlinked inodes. + */ xfs_qm_dqdetach(ip); } else { error = xfs_qm_dqattach(ip); @@ -3657,6 +3665,16 @@ xfs_inode_reload_unlinked_bucket( if (error) return error; + /* + * We've taken ILOCK_SHARED and the AGI buffer lock to stabilize the + * incore unlinked list pointers for this inode. Check once more to + * see if we raced with anyone else to reload the unlinked list. + */ + if (!xfs_inode_unlinked_incomplete(ip)) { + foundit = true; + goto out_agibp; + } + bucket = agino % XFS_AGI_UNLINKED_BUCKETS; agi = agibp->b_addr; @@ -3671,25 +3689,27 @@ xfs_inode_reload_unlinked_bucket( while (next_agino != NULLAGINO) { struct xfs_inode *next_ip = NULL; + /* Found this caller's inode, set its backlink. */ if (next_agino == agino) { - /* Found this inode, set its backlink. */ next_ip = ip; next_ip->i_prev_unlinked = prev_agino; foundit = true; + goto next_inode; } - if (!next_ip) { - /* Inode already in memory. */ - next_ip = xfs_iunlink_lookup(pag, next_agino); - } - if (!next_ip) { - /* Inode not in memory, reload. */ - error = xfs_iunlink_reload_next(tp, agibp, prev_agino, - next_agino); - if (error) - break; - next_ip = xfs_iunlink_lookup(pag, next_agino); - } + /* Try in-memory lookup first. 
*/ + next_ip = xfs_iunlink_lookup(pag, next_agino); + if (next_ip) + goto next_inode; + + /* Inode not in memory, try reloading it. */ + error = xfs_iunlink_reload_next(tp, agibp, prev_agino, + next_agino); + if (error) + break; + + /* Grab the reloaded inode. */ + next_ip = xfs_iunlink_lookup(pag, next_agino); if (!next_ip) { /* No incore inode at all? We reloaded it... */ ASSERT(next_ip != NULL); @@ -3697,10 +3717,12 @@ xfs_inode_reload_unlinked_bucket( break; } +next_inode: prev_agino = next_agino; next_agino = next_ip->i_next_unlinked; } +out_agibp: xfs_trans_brelse(tp, agibp); /* Should have found this inode somewhere in the iunlinked bucket. */ if (!error && !foundit) diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index ee3eb3181e3e..44d603364d5a 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -80,10 +80,12 @@ xfs_bulkstat_one_int( if (error) goto out; + /* Reload the incore unlinked list to avoid failure in inodegc. */ if (xfs_inode_unlinked_incomplete(ip)) { error = xfs_inode_reload_unlinked_bucket(tp, ip); if (error) { xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); xfs_irele(ip); return error; } diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index bbd0805fa94e..bd907bbc389c 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1160,9 +1160,18 @@ xfs_qm_dqusage_adjust( if (error) return error; - error = xfs_inode_reload_unlinked(ip); - if (error) - goto error0; + /* + * Reload the incore unlinked list to avoid failure in inodegc. + * Use an unlocked check here because unrecovered unlinked inodes + * should be somewhat rare. + */ + if (xfs_inode_unlinked_incomplete(ip)) { + error = xfs_inode_reload_unlinked(ip); + if (error) { + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + goto error0; + } + } ASSERT(ip->i_delayed_blks == 0); From patchwork Tue Sep 24 18:38:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leah Rumancik X-Patchwork-Id: 13811121 Received: from mail-pj1-f48.google.com (mail-pj1-f48.google.com [209.85.216.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 645DE1AE85E; Tue, 24 Sep 2024 18:39:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.48 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203173; cv=none; b=Q1PZFpFQLCgZRqOMTN+ZZFgswWf6TFWWRAycuKyVRhi99QkftPaa/GNyNvpDG/4Wa/3yCVI0Ad/3NlCz4gwDfCs7ZqioqwCEOWghfg25b+kQKdCyyGLZmWeGjxUfKXf1YtVv34fvOwHbUwighgKR1Ns3GoZMU+IWxw7xJd9cOb8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727203173; c=relaxed/simple; bh=8Ixj3FOqUYJNs/W6ZWsxAuz3Mg1c1IJvpiZ0tJJJOGg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZvtykTxBg0qezM5GFDnB3gkBrHg1Hy4mRzb/XpBAcBne2LplrFRRSLQJUgHKS+1+DrVhMrfH/6H3Rzf2Ktf4msYDX+jayqj11jfw0opWlwitI82cHM1H+9lFXW1lKXblHqhBGeyyi2MYyXZaKYv/+vCh6e8FBlC84cpdBgVX26k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=UU+29lEB; arc=none smtp.client-ip=209.85.216.48 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: 
From patchwork Tue Sep 24 18:38:51 2024
X-Patchwork-Submitter: Leah Rumancik
X-Patchwork-Id: 13811121
From: Leah Rumancik
To: stable@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, amir73il@gmail.com, chandan.babu@oracle.com,
 cem@kernel.org, catherine.hoang@oracle.com, "Darrick J. Wong" ,
 Dave Chinner , Dave Chinner , Leah Rumancik , Chandan Babu R
Subject: [PATCH 6.1 26/26] xfs: set bnobt/cntbt numrecs correctly when
 formatting new AGs
Date: Tue, 24 Sep 2024 11:38:51 -0700
Message-ID: <20240924183851.1901667-27-leah.rumancik@gmail.com>
In-Reply-To: <20240924183851.1901667-1-leah.rumancik@gmail.com>
References: <20240924183851.1901667-1-leah.rumancik@gmail.com>

From: "Darrick J. Wong"

[ Upstream commit 8e698ee72c4ecbbf18264568eb310875839fd601 ]

Through generic/300, I discovered that mkfs.xfs creates corrupt
filesystems when given these parameters:

Filesystems formatted with --unsupported are not supported!!
meta-data=/dev/sda               isize=512    agcount=8, agsize=16352 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
data     =                       bsize=4096   blocks=130816, imaxpct=25
         =                       sunit=32     swidth=128 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=8192, version=2
         =                       sectsz=512   sunit=32 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
         =                       rgcount=0    rgsize=0 blks
Discarding blocks...Done.

Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - 16:30:50: zeroing log - 16320 of 16320 blocks done
        - scan filesystem freespace and inode maps...
agf_freeblks 25, counted 0 in ag 4
sb_fdblocks 8823, counted 8798

The root cause of this problem is the numrecs handling in
xfs_freesp_init_recs, which is used to initialize a new AG.  Prior to
calling the function, we set up the new bnobt block with numrecs == 1
and rely on _freesp_init_recs to format that new record.  If the last
record created has a blockcount of zero, then it sets numrecs = 0.

That last bit isn't correct if the AG contains the log, the start of
the log is not immediately after the initial blocks due to stripe
alignment, and the end of the log is perfectly aligned with the end of
the AG.  For this case, we actually formatted a single bnobt record to
handle the free space before the start of the (stripe aligned) log, and
incremented arec to try to format a second record.  That second record
turned out to be unnecessary, so what we really want is to leave
numrecs at 1.

The numrecs handling itself is overly complicated because a different
function sets numrecs == 1.  Change the bnobt creation code to start
with numrecs set to zero and only increment it after successfully
formatting a free space extent into the btree block.

Fixes: f327a00745ff ("xfs: account for log space when formatting new AGs")
Signed-off-by: Darrick J. Wong
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner
Signed-off-by: Leah Rumancik
Acked-by: Chandan Babu R
---
 fs/xfs/libxfs/xfs_ag.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index bb0c700afe3c..bf47efe08a58 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -415,10 +415,12 @@ xfs_freesp_init_recs(
 	ASSERT(start >= mp->m_ag_prealloc_blocks);
 	if (start != mp->m_ag_prealloc_blocks) {
 		/*
-		 * Modify first record to pad stripe align of log
+		 * Modify first record to pad stripe align of log and
+		 * bump the record count.
 		 */
 		arec->ar_blockcount = cpu_to_be32(start -
 				mp->m_ag_prealloc_blocks);
+		be16_add_cpu(&block->bb_numrecs, 1);
 		nrec = arec + 1;
 
 		/*
@@ -429,7 +431,6 @@ xfs_freesp_init_recs(
 				be32_to_cpu(arec->ar_startblock) +
 				be32_to_cpu(arec->ar_blockcount));
 		arec = nrec;
-		be16_add_cpu(&block->bb_numrecs, 1);
 	}
 	/*
 	 * Change record start to after the internal log
@@ -438,15 +439,13 @@ xfs_freesp_init_recs(
 	}
 
 	/*
-	 * Calculate the record block count and check for the case where
-	 * the log might have consumed all available space in the AG. If
-	 * so, reset the record count to 0 to avoid exposure of an invalid
-	 * record start block.
+	 * Calculate the block count of this record; if it is nonzero,
+	 * increment the record count.
 	 */
 	arec->ar_blockcount = cpu_to_be32(id->agsize -
 					be32_to_cpu(arec->ar_startblock));
-	if (!arec->ar_blockcount)
-		block->bb_numrecs = 0;
+	if (arec->ar_blockcount)
+		be16_add_cpu(&block->bb_numrecs, 1);
 }
 
 /*
@@ -458,7 +457,7 @@ xfs_bnoroot_init(
 	struct xfs_buf		*bp,
 	struct aghdr_init_data	*id)
 {
-	xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, id->agno);
+	xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 0, id->agno);
 	xfs_freesp_init_recs(mp, bp, id);
 }
 
@@ -468,7 +467,7 @@ xfs_cntroot_init(
 	struct xfs_buf		*bp,
 	struct aghdr_init_data	*id)
 {
-	xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 1, id->agno);
+	xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 0, id->agno);
 	xfs_freesp_init_recs(mp, bp, id);
 }
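The numrecs change above boils down to one accounting rule: start the
in-block record count at zero and bump it only after a record has
actually been formatted, so a zero-length trailing extent can never
leave the count wrong. A minimal standalone C sketch of that rule, with
hypothetical names (demo_rec, format_free_extents) and none of the real
on-disk byte-order handling, might look like this:

/*
 * Editor's sketch, not XFS code: count records only after one is
 * written, so an AG whose internal log runs to the very end of the
 * AG yields exactly one free-space record, never zero or two.
 */
#include <stdint.h>
#include <stdio.h>

struct demo_rec {
	uint32_t startblock;
	uint32_t blockcount;
};

/*
 * Format free-space records for an AG of agsize blocks whose internal
 * log occupies [log_start, log_end).  Returns the record count.
 */
static unsigned int format_free_extents(struct demo_rec *recs,
					uint32_t prealloc, uint32_t agsize,
					uint32_t log_start, uint32_t log_end)
{
	unsigned int numrecs = 0;		/* start at zero ... */
	uint32_t start = prealloc;

	if (log_start > start) {
		/* Free space between the preallocated blocks and the log. */
		recs[numrecs].startblock = start;
		recs[numrecs].blockcount = log_start - start;
		numrecs++;			/* ... bump only after formatting */
		start = log_end;
	} else if (log_start == start) {
		start = log_end;
	}

	/* Free space after the log; may be empty if the log ends at agsize. */
	if (agsize > start) {
		recs[numrecs].startblock = start;
		recs[numrecs].blockcount = agsize - start;
		numrecs++;
	}
	return numrecs;
}

int main(void)
{
	struct demo_rec recs[2];

	/* Stripe-aligned log that runs to the end of the AG: one record. */
	unsigned int n = format_free_extents(recs, 64, 16352, 8160, 16352);

	printf("numrecs = %u\n", n);	/* prints 1 */
	return 0;
}

With a stripe-aligned log that ends exactly at the AG boundary, the
sketch emits a single record, which is precisely the case the old
pre-seeded count (set to 1 up front, reset to 0 at the end) mishandled.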