[v4] xfs: load uncached unlinked inodes into memory on demand

From: Darrick J. Wong <djwong@kernel.org>

shrikanth hegde reports that filesystems shut down shortly after mount,
logging the following warning:

	WARNING: CPU: 56 PID: 12450 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs]

This of course is the WARN_ON_ONCE in xfs_iunlink_lookup:

	ip = radix_tree_lookup(&pag->pag_ici_root, agino);
	if (WARN_ON_ONCE(!ip || !ip->i_ino)) { ... }

From diagnostic data collected by the bug reporters, it would appear
that we cleanly mounted a filesystem that contained unlinked inodes.
Unlinked inodes are only processed as a final step of log recovery,
which means that clean mounts do not process the unlinked list at all.

Prior to the introduction of the incore unlinked lists, this wasn't a
problem because the unlink code would (very expensively) traverse the
entire ondisk metadata iunlink chain to keep things up to date.
However, the incore unlinked list code complains when it realizes that
it is out of sync with the ondisk metadata and shuts down the fs, which
is bad.
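
To illustrate why the old traversal was expensive, here is a minimal
userspace sketch (not kernel code) of finding an inode's predecessor by
walking a singly linked chain from the bucket head. The names
chain_find_prev and NR_INODES and the array-as-disk layout are
assumptions made for illustration; only the NULLAGINO end-of-chain
marker mirrors XFS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NULLAGINO	UINT32_MAX	/* end-of-chain marker */
#define NR_INODES	8		/* toy AG size (assumption) */

/*
 * Walk the chain starting at @head until @target is found.  Each hop
 * stands in for reading one ondisk inode, counted in @reads.  On
 * success, *prev_out is the predecessor, or NULLAGINO if @target is
 * the head of the chain.
 */
static bool chain_find_prev(const uint32_t *next, uint32_t head,
			    uint32_t target, uint32_t *prev_out,
			    unsigned int *reads)
{
	uint32_t prev = NULLAGINO;
	uint32_t cur = head;
	unsigned int steps = 0;

	/* Bound the walk so a corrupt (cyclic) chain cannot loop forever. */
	while (cur != NULLAGINO && cur < NR_INODES && steps++ < NR_INODES) {
		if (cur == target) {
			*prev_out = prev;
			return true;
		}
		(*reads)++;	/* had to read this inode to follow it */
		prev = cur;
		cur = next[cur];
	}
	return false;
}
```

The cost grows with the position of the target in the chain, whereas a
cached back reference such as i_prev_unlinked answers the same question
without a walk.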

Ritesh proposed to solve this problem by unconditionally parsing the
unlinked lists at mount time, but this imposes a mount time cost for
every filesystem to catch something that should be very infrequent.
Instead, let's target the places where we can encounter a next_unlinked
pointer that refers to an inode that is not in cache, and load it into
cache.
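
The idea can be sketched in userspace as follows. This is a toy model
under stated assumptions, not the code in this patch: the toy_* names,
the fixed-size arrays standing in for the disk and the inode cache, and
the loads counter are all hypothetical. It only shows the shape of the
approach: when following a next_unlinked pointer misses the cache, load
the inode and record its back reference instead of failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NULLAGINO	UINT32_MAX	/* end-of-chain marker */
#define NR_INODES	8		/* toy AG size (assumption) */

/* Toy ondisk inode: only the forward chain pointer. */
struct toy_dinode {
	uint32_t	next_unlinked;
};

/* Toy incore inode: forward and backward chain pointers. */
struct toy_inode {
	bool		cached;
	uint32_t	next_unlinked;
	uint32_t	prev_unlinked;
};

struct toy_ag {
	struct toy_dinode	disk[NR_INODES];
	struct toy_inode	cache[NR_INODES];
	unsigned int		loads;	/* on-demand loads performed */
};

/* Stand-in for the cache lookup: NULL when the inode is not cached. */
static struct toy_inode *toy_lookup(struct toy_ag *ag, uint32_t agino)
{
	if (agino >= NR_INODES || !ag->cache[agino].cached)
		return NULL;
	return &ag->cache[agino];
}

/*
 * Follow a next_unlinked pointer from @prev_agino to @agino.  On a
 * cache miss, read the inode from "disk" and set its back reference
 * to @prev_agino rather than treating the miss as corruption.
 */
static struct toy_inode *toy_iunlink_get(struct toy_ag *ag,
					 uint32_t prev_agino, uint32_t agino)
{
	struct toy_inode *ip;

	if (agino == NULLAGINO || agino >= NR_INODES)
		return NULL;

	ip = toy_lookup(ag, agino);
	if (ip)
		return ip;

	ip = &ag->cache[agino];
	ip->cached = true;
	ip->next_unlinked = ag->disk[agino].next_unlinked;
	ip->prev_unlinked = prev_agino;
	ag->loads++;
	return ip;
}
```

A second call for the same inode hits the cache, so the load cost is
paid only for inodes that are actually encountered, not at every mount.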

Note: This patch does not address the problem of iget loading an inode
from the middle of the iunlink list and needing to set i_prev_unlinked
correctly.

Eric Sandeen adds:

"One way to end up in this situation is to have at one point run a very
old kernel which did not contain this commit, merged in kernel v4.14:

commit 6f4a1eefdd0ad4561543270a7fceadabcca075dd
Author: Eric Sandeen <sandeen@sandeen.net>
Date:   Tue Aug 8 18:21:49 2017 -0700

    xfs: toggle readonly state around xfs_log_mount_finish

    When we do log recovery on a readonly mount, unlinked inode
    processing does not happen due to the readonly checks in
    xfs_inactive(), which are trying to prevent any I/O on a
    readonly mount.

    This is misguided - we do I/O on readonly mounts all the time,
    for consistency; for example, log recovery.  So do the same
    RDONLY flag twiddling around xfs_log_mount_finish() as we
    do around xfs_log_mount(), for the same reason.

    This all cries out for a big rework but for now this is a
    simple fix to an obvious problem.

    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

"so if you:

1) Crash with unlinked inodes
2) mount -o ro <recovers log but skips unlinked inode recovery>
3) mount -o remount,rw
4) umount <writes clean log record>

"You now have a filesystem with on-disk unlinked inodes and a clean log,
and those inodes won't get cleaned up until log recovery runs again or
xfs_repair is run.

"And in testing an old OS (RHEL7) it does seem that the root filesystem
goes through a mount -o ro, mount -o remount,rw transition at boot time.
So this situation may be somewhat common."
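
The four steps above can be modelled with a small state machine. This
is an illustrative userspace sketch only; struct toy_fs, the toy_*
helpers and the has_ro_recovery_fix flag (standing in for "kernel
contains commit 6f4a1eefdd0a") are hypothetical names, and the model
deliberately ignores what a read-only unmount does to the log.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_fs {
	bool		log_dirty;	/* log needs recovery at next mount */
	unsigned int	unlinked;	/* inodes on the ondisk unlinked lists */
	bool		mounted;
	bool		readonly;
};

/* Step 1: crash while @nr_unlinked inodes sit on the unlinked lists. */
static void toy_crash(struct toy_fs *fs, unsigned int nr_unlinked)
{
	fs->log_dirty = true;
	fs->unlinked = nr_unlinked;
	fs->mounted = false;
}

/*
 * Step 2: mount.  Unlinked inodes are processed only as the final
 * step of log recovery, and in this model a kernel without the fix
 * skips that step on a read-only mount.
 */
static void toy_mount(struct toy_fs *fs, bool readonly,
		      bool has_ro_recovery_fix)
{
	fs->mounted = true;
	fs->readonly = readonly;

	if (!fs->log_dirty)
		return;		/* clean mount: unlinked lists untouched */

	if (!readonly || has_ro_recovery_fix)
		fs->unlinked = 0;
}

/* Step 3: mount -o remount,rw does not rerun log recovery. */
static void toy_remount_rw(struct toy_fs *fs)
{
	fs->readonly = false;
}

/* Step 4: unmounting a read-write fs writes a clean log record. */
static void toy_umount(struct toy_fs *fs)
{
	if (!fs->readonly)
		fs->log_dirty = false;
	fs->mounted = false;
}
```

Running crash, read-only mount, remount read-write and unmount without
the fix leaves the model with a clean log and a nonzero unlinked count,
and a later mount on any kernel returns early without processing them.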

Reported-by: shrikanth hegde <sshegde@linux.vnet.ibm.com>
Triaged-by: Ritesh Harjani <ritesh.list@gmail.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
---
v4: update the comment for update_backref
v3: add RVB tags and historical context from sandeen
v2: log that we're doing runtime recovery, don't mess with DONTCACHE,
    and actually return ENOLINK
---
 fs/xfs/xfs_inode.c |   80 +++++++++++++++++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_trace.h |   25 ++++++++++++++++
 2 files changed, 100 insertions(+), 5 deletions(-)

Message ID	20230901160020.GT28186@frogsfrogsfrogs (mailing list archive)
State	Superseded, archived
Date	Fri, 1 Sep 2023 09:00:20 -0700
To	Chandan Babu R <chandanrlinux@gmail.com>
Cc	"Ritesh Harjani (IBM)" <ritesh.list@gmail.com>, Dave Chinner <david@fromorbit.com>, Eric Sandeen <sandeen@redhat.com>, xfs <linux-xfs@vger.kernel.org>, shrikanth hegde <sshegde@linux.vnet.ibm.com>, Bill O'Donnell <bodonnel@redhat.com>, Eric Sandeen <sandeen@sandeen.net>