[04/14] xfs: buffered write failure should not truncate the page cache

From: Dave Chinner <dchinner@redhat.com>

xfs_buffered_write_iomap_end() currently invalidates the page cache
over the unused range of the delalloc extent it allocated. While the
write allocated the delalloc extent, it does not own it exclusively
as the write does not hold any locks that prevent either writeback
or mmap page faults from changing the state of either the page cache
or the extent state backing this range.

Whilst xfs_bmap_punch_delalloc_range() already handles races in
extent conversion - it will only punch out delalloc extents and it
ignores any other type of extent - the page cache truncate does not
discriminate between data written by this write and data written by
some other task.
As a result, truncating the page cache can result in data corruption
if the write races with mmap modifications to the file over the same
range.

generic/346 exercises this workload, and if we randomly fail writes
(as will happen when iomap gets stale iomap detection later in the
patchset), it will randomly corrupt the file data because it removes
data written by mmap() in the same page as the write() that failed.

Hence we do not want to punch out the page cache over the range of
the extent we failed to write to - what we actually need to do is
detect the ranges that have dirty data in cache over them and *not
punch them out*.

To do this, we have to walk the page cache over the range of the
delalloc extent we want to remove. This is made complex by the fact
that we have to handle partially up-to-date folios correctly, and
this can happen even when the FSB size == PAGE_SIZE because we now
support multi-page folios in the page cache.

Because we are only interested in discovering the edges of data
ranges in the page cache (i.e. hole-data boundaries) we can make use
of mapping_seek_hole_data() to find those transitions in the page
cache. As we hold the invalidate_lock, we know that the boundaries
are not going to change while we walk the range. This interface is
also byte-based and is sub-page block aware, so we can find the data
ranges in the cache based on byte offsets rather than page, folio or
fs block sized chunks. This greatly simplifies the logic of finding
dirty cached ranges in the page cache.
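
As an illustration only, the hole/data walk described above can be
modelled in userspace. This is a sketch, not the code in this patch:
struct toy_cache, toy_seek_hole_data() and toy_find_data_ranges() are
hypothetical names, and the "cache" is just a per-byte array of flags.
Unlike the kernel's mapping_seek_hole_data(), the toy seek helper
returns the end offset rather than an error when no data is found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the page cache: cached[i] is true when
 * byte i of the file has data in the cache. */
struct toy_cache {
	const bool *cached;
	size_t size;
};

enum toy_whence { TOY_SEEK_DATA, TOY_SEEK_HOLE };

struct toy_range {
	size_t start;	/* inclusive */
	size_t end;	/* exclusive */
};

/*
 * Return the first offset in [start, end) that is data (TOY_SEEK_DATA)
 * or a hole (TOY_SEEK_HOLE). Returns end if there is no such offset.
 */
static size_t toy_seek_hole_data(const struct toy_cache *c, size_t start,
				 size_t end, enum toy_whence whence)
{
	bool want = (whence == TOY_SEEK_DATA);

	assert(start <= end && end <= c->size);
	for (size_t off = start; off < end; off++)
		if (c->cached[off] == want)
			return off;
	return end;
}

/*
 * Record each contiguous cached data range in [start, end) by
 * alternating data and hole seeks. Stores at most max ranges in out[]
 * and returns how many were stored.
 */
static size_t toy_find_data_ranges(const struct toy_cache *c, size_t start,
				   size_t end, struct toy_range *out,
				   size_t max)
{
	size_t n = 0;

	while (start < end && n < max) {
		size_t data = toy_seek_hole_data(c, start, end, TOY_SEEK_DATA);
		if (data == end)
			break;	/* nothing but hole left in the range */

		size_t hole = toy_seek_hole_data(c, data, end, TOY_SEEK_HOLE);
		out[n].start = data;
		out[n].end = hole;
		n++;
		start = hole;
	}
	return n;
}
```

Because the boundaries are byte offsets, the caller never has to
reason about page, folio or fs block alignment while finding the
ranges.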

Once we've identified a range that contains cached data, we can then
iterate the range folio by folio. This allows us to determine if the
data is dirty and hence perform the correct delalloc extent punching
operations. The seek interface we use to iterate data ranges will
give us sub-folio start/end granularity, so we may end up looking up
the same folio multiple times as the seek interface iterates across
each discontiguous data region in the folio.
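
A rough userspace model of the folio-by-folio decision follows. It is
a sketch under simplifying assumptions and not the code in this patch:
folios are a fixed, hypothetical TOY_FOLIO_SIZE, each folio has a
single state, and toy_punch_ranges() is an invented name. It only
shows the rule that bytes under a dirty folio are skipped while
everything else in the range is collected for punching.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FOLIO_SIZE 4096u	/* hypothetical fixed folio size */

enum toy_folio_state { TOY_ABSENT, TOY_CLEAN, TOY_DIRTY };

struct toy_range {
	size_t start;	/* inclusive */
	size_t end;	/* exclusive */
};

static size_t toy_min(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Walk [start, end) one folio at a time and collect the byte ranges
 * that are safe to punch: everything except the bytes covered by a
 * dirty folio. Stores at most max ranges in out[] (any further ranges
 * are dropped) and returns how many were stored.
 */
static size_t toy_punch_ranges(const enum toy_folio_state *folios,
			       size_t nr_folios, size_t start, size_t end,
			       struct toy_range *out, size_t max)
{
	size_t n = 0;
	size_t punch_start = start;

	assert(start <= end && end <= nr_folios * TOY_FOLIO_SIZE);

	for (size_t pos = start; pos < end; ) {
		size_t idx = pos / TOY_FOLIO_SIZE;
		size_t folio_end = toy_min(end, (idx + 1) * TOY_FOLIO_SIZE);

		if (folios[idx] == TOY_DIRTY) {
			/* Punch what accumulated before the dirty folio. */
			if (pos > punch_start && n < max)
				out[n++] = (struct toy_range){ punch_start, pos };
			/* Resume punching only after the dirty folio. */
			punch_start = folio_end;
		}
		pos = folio_end;
	}

	/* Punch whatever is left after the last dirty folio. */
	if (end > punch_start && n < max)
		out[n++] = (struct toy_range){ punch_start, end };
	return n;
}
```

With folio states { clean, dirty, absent, dirty } and the byte range
[1000, 16384), the model collects [1000, 4096) and [8192, 12288) and
leaves both dirty folios alone.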

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_iomap.c |  151 +++++++++++++++++++++++++++++++++++++++++++++++++---
 mm/filemap.c       |    1 
 2 files changed, 142 insertions(+), 10 deletions(-)

Message ID	166801776726.3992140.478044755009071447.stgit@magnolia (mailing list archive)
State	New, archived
Subject: [PATCH 04/14] xfs: buffered write failure should not truncate the page cache
From: "Darrick J. Wong" <djwong@kernel.org>
To: djwong@kernel.org
Cc: Dave Chinner <dchinner@redhat.com>, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, david@fromorbit.com, hch@infradead.org
Date: Wed, 09 Nov 2022 10:16:07 -0800
In-Reply-To: <166801774453.3992140.241667783932550826.stgit@magnolia>
References: <166801774453.3992140.241667783932550826.stgit@magnolia>
Series	xfs, iomap: fix data corruption due to stale cached iomaps
  [PATCHSET,RFCRAP,v2,00/14] xfs, iomap: fix data corruption due to stale cached iomaps
  [01/14] xfs: write page faults in iomap are not buffered writes
  [02/14] xfs: punching delalloc extents on write failure is racy
  [03/14] xfs: use byte ranges for write cleanup ranges
  [04/14] xfs: buffered write failure should not truncate the page cache
  [05/14] iomap: write iomap validity checks
  [06/14] xfs: use iomap_valid method to detect stale cached iomaps
  [07/14] xfs: drop write error injection is unfixable, remove it
  [08/14] iomap: pass iter to ->iomap_begin implementations
  [09/14] iomap: pass iter to ->iomap_end implementations
  [10/14] iomap: pass a private pointer to iomap_file_buffered_write
  [11/14] xfs: move the seq counters for buffered writes to a private struct
  [12/14] xfs: validate COW fork sequence counters during buffered writes
  [13/14] xfs: add debug knob to slow down writeback for fun
  [14/14] xfs: add debug knob to slow down write for fun
  [15/14] fstest: regression test for writeback corruption bug
  [16/14] fstest: regression test for writes racing with reclaim writeback
