[15/14] fstest: regression test for writeback corruption bug

Message ID	Y2vv+tdWVEStVpaO@magnolia (mailing list archive)
State	New, archived
Headers
Return-Path: <linux-xfs-owner@kernel.org>
Date: Wed, 9 Nov 2022 10:22:50 -0800
From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>, Dave Chinner <dchinner@redhat.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, david@fromorbit.com, hch@infradead.org
Subject: [PATCH 15/14] fstest: regression test for writeback corruption bug
Message-ID: <Y2vv+tdWVEStVpaO@magnolia>
References: <166801774453.3992140.241667783932550826.stgit@magnolia>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <166801774453.3992140.241667783932550826.stgit@magnolia>
Precedence: bulk

Series	xfs, iomap: fix data corruption due to stale cached iomaps
[PATCHSET,RFCRAP,v2,00/14] xfs, iomap: fix data corruption due to stale cached iomaps
[01/14] xfs: write page faults in iomap are not buffered writes
[02/14] xfs: punching delalloc extents on write failure is racy
[03/14] xfs: use byte ranges for write cleanup ranges
[04/14] xfs: buffered write failure should not truncate the page cache
[05/14] iomap: write iomap validity checks
[06/14] xfs: use iomap_valid method to detect stale cached iomaps
[07/14] xfs: drop write error injection is unfixable, remove it
[08/14] iomap: pass iter to ->iomap_begin implementations
[09/14] iomap: pass iter to ->iomap_end implementations
[10/14] iomap: pass a private pointer to iomap_file_buffered_write
[11/14] xfs: move the seq counters for buffered writes to a private struct
[12/14] xfs: validate COW fork sequence counters during buffered writes
[13/14] xfs: add debug knob to slow down writeback for fun
[14/14] xfs: add debug knob to slow down write for fun
[15/14] fstest: regression test for writeback corruption bug
[16/14] fstest: regression test for writes racing with reclaim writeback

diff --git a/tests/xfs/924 b/tests/xfs/924
new file mode 100755
index 0000000000..486afefedd
--- /dev/null
+++ b/tests/xfs/924
@@ -0,0 +1,197 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle.  All Rights Reserved.
+#
+# FS QA Test 924
+#
+# This is a regression test for a data corruption bug that existed in XFS' copy
+# on write code between 4.9 and 4.19.  The root cause is a concurrency bug
+# wherein we would drop ILOCK_SHARED after querying the CoW fork in xfs_map_cow
+# and retake it before querying the data fork in xfs_map_blocks.  If a second
+# thread changes the CoW fork mappings between the two calls, it's possible for
+# xfs_map_blocks to return a zero-block mapping, which results in writeback
+# being elided for that block.  Elided writeback of dirty data results in
+# silent loss of writes.
+#
+# Worse yet, kernels from that era still used buffer heads, which means that an
+# elided writeback leaves the page clean but the bufferheads dirty.  Due to a
+# naïve optimization in mark_buffer_dirty, the SetPageDirty call is elided if
+# the bufferhead is dirty, which means that a subsequent rewrite of the data
+# block will never result in the page being marked dirty, and all subsequent
+# writes are lost.
+#
+# It turns out that Christoph Hellwig unwittingly fixed the race in commit
+# 5c665e5b5af6 ("xfs: remove xfs_map_cow"), and no testcase was ever written.
+# Four years later, we hit it on a production 4.14 kernel.  This testcase
+# relies on a debugging knob that introduces artificial delays into writeback.
+#
+# Before the race, the file blocks 0-1 are not shared and blocks 2-5 are
+# shared.  There are no extents in the CoW fork.
+#
+# Two threads race like this:
+#
+# Thread 1 (writeback block 0)     | Thread 2 (write to block 2)
+# ---------------------------------|--------------------------------
+#                                  |
+# 1. Check if block 0 in CoW fork  |
+#    from xfs_map_cow.             |
+#                                  |
+# 2. Block 0 not found in CoW      |
+#    fork; the block is considered |
+#    not shared.                   |
+#                                  |
+# 3. xfs_map_blocks looks up data  |
+#    fork to get a map covering    |
+#    block 0.                      |
+#                                  |
+# 4. It gets a data fork mapping   |
+#    for block 0 with length 2.    |
+#                                  |
+#                                  | 1. A buffered write to block 2 sees
+#                                  |    that it is a shared block and no
+#                                  |    extent covers block 2 in CoW fork.
+#                                  |
+#                                  |    It creates a new CoW fork mapping.
+#                                  |    Due to the cowextsize, the new
+#                                  |    extent starts at block 0 with
+#                                  |    length 128.
+#                                  |
+#                                  |
+# 5. It looks up CoW fork again to |
+#    trim the map (0, 2) to a      |
+#    shared block boundary.        |
+#                                  |
+# 5a. It finds (0, 128) in CoW fork|
+# 5b. It trims the data fork map   |
+#     from (0, 2) to (0, 0) (!!!)  |
+#                                  |
+# 6. The xfs_imap_valid call after |
+#    the xfs_map_blocks call checks|
+#    if the mapping (0, 0) covers  |
+#    block 0. The result is "NO".  |
+#                                  |
+# 7. Since block 0 has no physical |
+#    block mapped, it's not added  |
+#    to the ioend. This is the     |
+#    first problem.                |
+#                                  |
+# 8. xfs_add_to_ioend usually      |
+#    clears the bufferhead dirty   |
+#    flag. Because this is skipped,|
+#    we leave the page clean with  |
+#    the associated buffer head(s) |
+#    dirty (the second problem).   |
+#    Now the dirty state is        |
+#    inconsistent.
+#
+# On newer kernels, this is also a functionality test for the ifork sequence
+# counter because the writeback completions will change the data fork and force
+# revalidations of the wb mapping.
+#
+. ./common/preamble
+_begin_fstest auto quick clone
+
+# Import common functions.
+. ./common/reflink
+. ./common/inject
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs xfs
+_fixed_by_kernel_commit 5c665e5b5af6 "xfs: remove xfs_map_cow"
+_require_error_injection
+_require_scratch_reflink
+_require_cp_reflink
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount >> $seqres.full
+
+knob="$(_find_xfs_mountdev_errortag_knob $SCRATCH_DEV "wb_delay_ms")"
+test -w "$knob" || _notrun "Kernel does not have wb_delay_ms error injector"
+
+blksz=65536
+_require_congruent_file_oplen $SCRATCH_MNT $blksz
+
+# Make sure we have sufficient extent size to create speculative CoW
+# preallocations.
+$XFS_IO_PROG -c 'cowextsize 1m' $SCRATCH_MNT
+
+# Write out a file with the first two blocks unshared and the rest shared.
+_pwrite_byte 0x59 0 $((160 * blksz)) $SCRATCH_MNT/file >> $seqres.full
+_pwrite_byte 0x59 0 $((160 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+_cp_reflink $SCRATCH_MNT/file $SCRATCH_MNT/file.reflink
+
+_pwrite_byte 0x58 0 $((2 * blksz)) $SCRATCH_MNT/file >> $seqres.full
+_pwrite_byte 0x58 0 $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+# Avoid creation of large folios on newer kernels by cycling the mount and
+# immediately writing to the page cache.
+_scratch_cycle_mount
+
+# Write the same data to file.compare as we're about to do to file.  Do this
+# before slowing down writeback to avoid unnecessary delay.
+_pwrite_byte 0x57 0 $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+_pwrite_byte 0x56 $((2 * blksz)) $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+# Introduce a half-second wait to each writeback block mapping call.  This
+# gives us a chance to race speculative cow prealloc with writeback.
+wb_delay=500
+echo $wb_delay > $knob
+curval="$(cat $knob)"
+test "$curval" -eq $wb_delay || echo "expected wb_delay_ms == $wb_delay"
+
+# Start thread 1 + writeback above
+$XFS_IO_PROG -c "pwrite -S 0x57 0 $((2 * blksz))" \
+	-c 'bmap -celpv' -c 'bmap -elpv' \
+	-c 'fsync' $SCRATCH_MNT/file >> $seqres.full &
+sleep 1
+
+# Start a sentry to look for evidence of the XFS_ERRORTAG_REPORT logging.  If
+# we see that, we know we've forced writeback to revalidate a mapping.  The
+# test has been successful, so turn off the delay.
+sentryfile=$TEST_DIR/$seq.sentry
+wait_for_errortag() {
+	while [ -e "$sentryfile" ]; do
+		if _check_dmesg_for XFS_ERRTAG_WB_DELAY_MS; then
+			echo 0 > $knob
+			break;
+		fi
+		sleep 1
+	done
+}
+touch $sentryfile
+wait_for_errortag &
+
+# Start thread 2 to create the cowextsize reservation
+$XFS_IO_PROG -c "pwrite -S 0x56 $((2 * blksz)) $((2 * blksz))" \
+	-c 'bmap -celpv' -c 'bmap -elpv' \
+	-c 'fsync' $SCRATCH_MNT/file >> $seqres.full
+rm -f $sentryfile
+
+_check_dmesg_for XFS_ERRTAG_WB_DELAY_MS
+saw_delay=$?
+
+# Flush everything to disk.  If the bug manifests, then after the cycle,
+# file should have stale 0x58 in block 0 because we silently dropped a write.
+_scratch_cycle_mount
+
+if ! cmp -s $SCRATCH_MNT/file $SCRATCH_MNT/file.compare; then
+	echo file and file.compare do not match
+	$XFS_IO_PROG -c 'bmap -celpv' -c 'bmap -elpv' $SCRATCH_MNT/file >> $seqres.full
+	echo file.compare
+	od -tx1 -Ad -c $SCRATCH_MNT/file.compare
+	echo file
+	od -tx1 -Ad -c $SCRATCH_MNT/file
+elif [ $saw_delay -ne 0 ]; then
+	# The files matched, but nothing got logged about the revalidation?
+	echo "Expected to hear about XFS_ERRTAG_WB_DELAY_MS?"
+fi
+
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/924.out b/tests/xfs/924.out
new file mode 100644
index 0000000000..c6655da35a
--- /dev/null
+++ b/tests/xfs/924.out
@@ -0,0 +1,2 @@
+QA output created by 924
+Silence is golden
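
The arithmetic behind steps 5 and 6 of the race in the test's header comment can be sketched in a few lines of bash. This is only a simplified model, not kernel code: extents are (start, length) pairs in file blocks, and the helper names trim_to_cow_boundary and mapping_covers are hypothetical, made up for this illustration. It assumes the trim shortens the data fork mapping so that it ends where the overlapping CoW fork extent begins.

```shell
#!/bin/bash
# Hypothetical model of steps 5 and 6 in the race description; NOT kernel code.

# trim_to_cow_boundary START LEN COW_START COW_LEN
# Print the length left in the data fork mapping (START, LEN) after trimming
# it to end where the CoW fork extent (COW_START, COW_LEN) begins.
trim_to_cow_boundary() {
	local start=$1 len=$2 cow_start=$3 cow_len=$4
	local end=$((start + len))
	local cow_end=$((cow_start + cow_len))

	# No CoW extent, or no overlap: the mapping is left alone.
	if [ "$cow_len" -eq 0 ] || [ "$cow_start" -ge "$end" ] ||
	   [ "$cow_end" -le "$start" ]; then
		echo "$len"
		return
	fi

	# The CoW extent begins at or before the mapping: nothing is left.
	if [ "$cow_start" -le "$start" ]; then
		echo 0
		return
	fi

	echo $((cow_start - start))
}

# mapping_covers START LEN BLOCK
# Succeed if the mapping (START, LEN) contains BLOCK.
mapping_covers() {
	[ "$3" -ge "$1" ] && [ "$3" -lt $(($1 + $2)) ]
}

# Before thread 2 runs there is no CoW extent, so (0, 2) survives intact.
trim_to_cow_boundary 0 2 0 0		# prints 2

# After thread 2 creates the (0, 128) CoW extent, the same trim leaves (0, 0).
newlen=$(trim_to_cow_boundary 0 2 0 128)
echo "$newlen"				# prints 0

# Step 6: a zero-length mapping cannot cover block 0, so writeback skips it.
mapping_covers 0 "$newlen" 0 || echo "block 0 not covered"
```

In this model the zero-length result appears only when the CoW extent is created between the first CoW fork lookup and the trim, which is the window the wb_delay_ms knob is meant to widen.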
