[2/2] xfs: only run COW extent recovery when there are no live extents

From: Darrick J. Wong <djwong@kernel.org>

As part of multiple customer escalations due to file data corruption
after copy on write operations, I wrote some fstests that use fsstress
to hammer on COW to shake things loose.  Regrettably, I caught some
filesystem shutdowns due to incorrect rmap operations with the following
loop:

mount <filesystem>				# (0)
fsstress <run only readonly ops> &		# (1)
while true; do
	fsstress <run all ops>
	mount -o remount,ro			# (2)
	fsstress <run only readonly ops>
	mount -o remount,rw			# (3)
done

When (2) happens, notice that (1) is still running.  xfs_remount_ro will
call xfs_blockgc_stop to walk the inode cache to free all the COW
extents, but the blockgc mechanism races with (1)'s reader threads to
take IOLOCKs and loses, which means that it doesn't clean them all out.
Call such a file (A).

When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
walks the ondisk refcount btree and frees any COW extent that it finds.
This function does not check the inode cache, which means that the
incore COW fork of inode (A) is now inconsistent with the ondisk
metadata.  If one of those former COW extents is allocated and mapped
into another file (B) and someone triggers a COW to the stale
reservation in (A), A's
dirty data will be written into (B) and once that's done, those blocks
will be transferred to (A)'s data fork without bumping the refcount.
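
The sequence above can be sketched with a toy model.  This is only an
illustration under stated assumptions: every name below (toy_fs,
toy_inode, toy_recover_cow, and so on) is hypothetical and is not an XFS
kernel interface, and the "disk" is just a small array of blocks.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS		8
#define NO_OWNER	(-1)

/* Hypothetical toy model; none of these names exist in XFS. */
struct toy_fs {
	bool	staging[NBLOCKS];	/* ondisk: block is a COW staging extent */
	int	owner[NBLOCKS];		/* ondisk: inode whose data fork maps it */
	int	data[NBLOCKS];		/* block contents */
};

struct toy_inode {
	int	ino;
	int	cow_block;		/* incore COW fork reservation, or -1 */
};

void toy_init(struct toy_fs *fs)
{
	assert(fs);
	for (int b = 0; b < NBLOCKS; b++) {
		fs->staging[b] = false;
		fs->owner[b] = NO_OWNER;
		fs->data[b] = 0;
	}
}

/* Reserve the first free block as a COW staging extent for ip. */
int toy_cow_reserve(struct toy_fs *fs, struct toy_inode *ip)
{
	for (int b = 0; b < NBLOCKS; b++) {
		if (!fs->staging[b] && fs->owner[b] == NO_OWNER) {
			fs->staging[b] = true;
			ip->cow_block = b;
			return b;
		}
	}
	return -1;
}

/* Free every ondisk staging extent without looking at incore forks. */
void toy_recover_cow(struct toy_fs *fs)
{
	for (int b = 0; b < NBLOCKS; b++)
		fs->staging[b] = false;
}

/* Allocate the first free block to ip's data fork and write to it. */
int toy_alloc_write(struct toy_fs *fs, struct toy_inode *ip, int value)
{
	for (int b = 0; b < NBLOCKS; b++) {
		if (!fs->staging[b] && fs->owner[b] == NO_OWNER) {
			fs->owner[b] = ip->ino;
			fs->data[b] = value;
			return b;
		}
	}
	return -1;
}

/*
 * Write through ip's incore COW reservation and remap the block into
 * ip's data fork.  Returns true if the block was already mapped by a
 * different inode, i.e. the write clobbered someone else's data.
 */
bool toy_cow_write(struct toy_fs *fs, struct toy_inode *ip, int value)
{
	int b = ip->cow_block;
	bool clobbered;

	if (b < 0)
		return false;
	clobbered = fs->owner[b] != NO_OWNER && fs->owner[b] != ip->ino;
	fs->data[b] = value;
	fs->owner[b] = ip->ino;		/* no refcount bump is modelled */
	fs->staging[b] = false;
	ip->cow_block = -1;
	return clobbered;
}
```

In this model, reserving a block for (A), calling toy_recover_cow(),
allocating a block for (B), and then calling toy_cow_write() on (A)
hands both files the same block, so A's write overwrites B's data.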

The results are catastrophic -- file (B) and the refcount btree are now
corrupt.  In the first patch, we fixed the race condition in (2) so that
(A) will always flush the COW fork.  In this second patch, we move the
_recover_cow call to the initial mount call in (0) for safety.

As mentioned previously, xfs_reflink_recover_cow walks the refcount
btree looking for COW staging extents, and frees them.  This was
intended to be run at mount time (when we know there are no live inodes)
to clean up any leftover staging extents that may have been left behind
during an unclean shutdown.  As a time "optimization" for readonly
mounts, we deferred this to the ro->rw transition, not realizing that
any failure to clean all COW forks during a rw->ro transition would
result in catastrophic corruption.

Therefore, remove this optimization and only run the recovery routine
when we're guaranteed not to have any live COW staging extents anywhere,
which means we always run this at mount time.  While we're at it, move
the callsite to xfs_log_mount_finish because any refcount btree
expansion (however unlikely given that we're removing records from the
right side of the index) must be fed by a per-AG reservation, which
doesn't exist in its current location.
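
The intended ordering can be sketched as follows.  This is a toy model
of the rule stated above, not the code in this patch: struct toy_mount
and the toy_* functions are hypothetical names, and the assumption that
recovery runs once per mount (readonly or not) is taken only from the
description in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in XFS. */
struct toy_mount {
	bool	readonly;
	int	live_inodes;	/* inodes that may hold incore COW forks */
	int	stale_staging;	/* staging extents left by an unclean shutdown */
	int	recoveries;	/* how many times recovery has run */
};

/* Recovery is only safe while nothing incore can reference the extents. */
void toy_recover_cow(struct toy_mount *mp)
{
	assert(mp->live_inodes == 0);
	mp->stale_staging = 0;
	mp->recoveries++;
}

/* Mount: run recovery unconditionally, before any inode becomes live. */
void toy_mount(struct toy_mount *mp, bool readonly, int leftover)
{
	mp->readonly = readonly;
	mp->live_inodes = 0;
	mp->stale_staging = leftover;
	mp->recoveries = 0;
	toy_recover_cow(mp);
}

/* Remount: flip the readonly state only; recovery is never re-run here. */
void toy_remount(struct toy_mount *mp, bool readonly)
{
	mp->readonly = readonly;
}
```

With this shape, cycling between ro and rw any number of times leaves
the recovery count at one, so a remount can never free staging extents
behind the back of a live inode.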

Fixes: 174edb0e46e5 ("xfs: store in-progress CoW allocations in the refcount btree")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_log.c     |   23 ++++++++++++++++++++++-
 fs/xfs/xfs_mount.c   |   10 ----------
 fs/xfs/xfs_reflink.c |    5 ++++-
 fs/xfs/xfs_super.c   |    9 ---------
 4 files changed, 26 insertions(+), 21 deletions(-)

Message-ID: <163900531629.374528.14641806907962114873.stgit@magnolia>
In-Reply-To: <163900530491.374528.3847809977076817523.stgit@magnolia>
State: Superseded, archived
Subject: [PATCH 2/2] xfs: only run COW extent recovery when there are no live extents
From: "Darrick J. Wong" <djwong@kernel.org>
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com, wen.gang.wang@oracle.com
Date: Wed, 08 Dec 2021 15:15:16 -0800
Series: [PATCHSET,V2,for-5.16,0/2] xfs: fix data corruption when cycling ro/rw mounts
  [1/2] xfs: remove all COW fork extents when remounting readonly
  [2/2] xfs: only run COW extent recovery when there are no live extents
