[01/30] xfs: Don't allow logging of XFS_ISTALE inodes

From: Dave Chinner <dchinner@redhat.com>

In tracking down a problem in this patchset, I discovered we are
reclaiming dirty stale inodes. This went unnoticed until inodes
were always attached to the cluster buffer, at which point the RCU
callback that frees inodes started failing an assert because the
inode still had an active pointer to the cluster buffer after it had
been reclaimed.

Debugging the issue indicated that this was a pre-existing issue
resulting from the way the inodes are handled in xfs_inactive_ifree.
When we free a cluster buffer from xfs_ifree_cluster, all the inodes
in cache are marked XFS_ISTALE. Those that are clean have nothing
else done to them and so eventually get cleaned up by background
reclaim. i.e. it is assumed we'll never dirty/relog an inode marked
XFS_ISTALE.

On journal commit, dirty stale inodes are handled by both the buffer
and inode log items: they are either run through xfs_istale_done()
and removed from the AIL (buffer log item commit), or the inode log
item simply unpins them because the buffer log item will clean them.
What happens to any specific inode is entirely dependent on which log
item wins the commit race, but the result is the same - stale inodes
are clean, not attached to the cluster buffer, and not in the AIL.
Hence inode reclaim can just free these inodes without further care.

However, if the stale inode is relogged, it gets dirtied again and
relogged into the CIL. Most of the time this isn't an issue, because
relogging simply changes the inode's location in the current
checkpoint. Problems arise, however, when the CIL checkpoints
between two transactions in the xfs_inactive_ifree() deferops
processing. This results in the XFS_ISTALE inode being redirtied
and inserted into the CIL without any of the other stale cluster
buffer infrastructure being in place.

Hence on journal commit, it simply gets unpinned, so it remains
dirty in memory. Everything in inode writeback avoids XFS_ISTALE
inodes so it can't be written back, and it is not tracked in the AIL
so there's not even a trigger to attempt to clean the inode. Hence
the inode just sits dirty in memory until inode reclaim comes along,
sees that it is XFS_ISTALE, and goes to reclaim it. Reclaiming a
dirty inode in this way caused use-after-free, list corruption and
other nasty issues later in this patchset.

Hence this patch addresses a violation of the "never log XFS_ISTALE
inodes" rule caused by the deferops processing rolling a transaction
and relogging a stale inode in xfs_inactive_ifree. It also adds a
bunch of asserts to catch this problem in debug kernels so that
we don't reintroduce it in future.
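
As a rough illustration of the rule described above, here is a small
userspace toy model. It is not the kernel code and not the patch
itself: every name in it (toy_inode, TOY_ISTALE, TOY_IDIRTY,
toy_log_inode and so on) is hypothetical and exists only for this
sketch. It shows the two checks the text argues for: logging must
refuse a stale inode, and reclaim may only free a stale inode that is
clean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag values for this toy model, not the kernel's. */
#define TOY_ISTALE (1u << 0)   /* cluster buffer has been freed */
#define TOY_IDIRTY (1u << 1)   /* inode has unwritten changes   */

struct toy_inode {
	unsigned int flags;
};

/* Model of the cluster free path marking a cached inode stale. */
static void toy_mark_stale(struct toy_inode *ip)
{
	assert(ip != NULL);
	ip->flags |= TOY_ISTALE;
}

/*
 * Model of logging an inode in a transaction. A stale inode must
 * never be (re)logged, so refuse and leave it untouched. A debug
 * kernel would assert here instead of returning an error.
 */
static bool toy_log_inode(struct toy_inode *ip)
{
	assert(ip != NULL);
	if (ip->flags & TOY_ISTALE) {
		fprintf(stderr, "toy: refusing to log a stale inode\n");
		return false;
	}
	ip->flags |= TOY_IDIRTY;
	return true;
}

/* Reclaim may free a stale inode only if it is not dirty. */
static bool toy_can_reclaim(const struct toy_inode *ip)
{
	assert(ip != NULL);
	return !((ip->flags & TOY_ISTALE) && (ip->flags & TOY_IDIRTY));
}
```

In this model, calling toy_log_inode() after toy_mark_stale() fails
and the inode stays clean, so toy_can_reclaim() remains true. The bug
described in this commit message corresponds to reaching the
"stale and dirty" state that toy_can_reclaim() rejects.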

Reproducer for this issue was generic/558 on a v4 filesystem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_trans_inode.c |  2 ++
 fs/xfs/xfs_icache.c             |  3 ++-
 fs/xfs/xfs_inode.c              | 25 ++++++++++++++++++++++---
 3 files changed, 26 insertions(+), 4 deletions(-)

Message ID	20200601214251.4167140-2-david@fromorbit.com
State	Superseded
To	linux-xfs@vger.kernel.org
Date	Tue, 2 Jun 2020 07:42:22 +1000
In-Reply-To	20200601214251.4167140-1-david@fromorbit.com
Series	xfs: rework inode flushing to make inode reclaim fully asynchronous
