diff mbox series

[03/17] logwrites: warn if we don't think read after discard returns zeroes

Message ID 173229420060.358248.11054238752146807489.stgit@frogsfrogsfrogs (mailing list archive)
State New, archived
Headers show
Series [01/17] generic/757: fix various bugs in this test | expand

Commit Message

Darrick J. Wong Nov. 22, 2024, 4:51 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

The logwrites replay program expects that it can issue a DISCARD against
the block device passed to _log_writes_init and that will cause all
subsequent reads to return zeroes.  This is required for correct log
recovery on filesystems such as XFS that skip recovering buffers if
newer ones are found on disk.

Unfortunately, there's no way to discover if a device's discard
implementation actually guarantees zeroes.  There used to be a sysfs
knob keyed to an allowlist, but it is now hardwired to return 0.  So
either we need a magic device that does discard-and-zero, or we need to
do the zeroing ourselves.  The logwrites program does its own zeroing if
there is no discard support, and some tests do their own zeroing.

The only devices we know to work reliably are the software defined ones
that are provided by the kernel itself -- which means dm-thinp.  Warn if
we have a device that supports discard that isn't thinp and the test
fails.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 common/dmlogwrites |   31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

Comments

Christoph Hellwig Nov. 25, 2024, 5:12 a.m. UTC | #1
This feels like it's heading in the wrong direction, as it just adds
more hacks.

IMHO the right thing is to:

 a) document current assumptions and tell users very clearly to use
    dm-thin to get the expected semantics
 b) if we have a use for it, provide an option to use write zeroes
    instead that can be used on any device

I'm happy to do that work, but until then I'd suggest to skip this and
the next two patches.
Darrick J. Wong Nov. 25, 2024, 5:19 p.m. UTC | #2
On Sun, Nov 24, 2024 at 09:12:26PM -0800, Christoph Hellwig wrote:
> This feels like it's heading in the wrong direction, as it just adds
> more hacks.
> 
> IMHO the right thing is to:
> 
>  a) document current assumptions and tell users very clearly to use
>     dm-thin to get the expected semantics
>  b) if we have a use for it, provide an option to use write zeroes
>     instead that can be used on any device
> 
> I'm happy to do that work, but until then I'd suggest to skip this and
> the next two patches.

Works for me. :)

Also feel free to grab/modify patch 4 if you like.

--D
diff mbox series

Patch

diff --git a/common/dmlogwrites b/common/dmlogwrites
index c1c85de9dd43ac..24a8a25ace277f 100644
--- a/common/dmlogwrites
+++ b/common/dmlogwrites
@@ -59,6 +59,35 @@  _require_log_writes_dax_mountopt()
 	fi
 }
 
+_log_writes_check_bdev()
+{
+	local sysfs="/sys/block/$(_short_dev $1)"
+
+	# Some filesystems (e.g. XFS) optimize log recovery by assuming that
+	# they can elide replay of metadata blocks if the block has a higher
+	# log serial number than the transaction being recovered.  This is a
+	# problem if the filesystem log contents can go back in time, which is
+	# what the logwrites replay program does.
+	#
+	# The logwrites replay program begins by erasing the block device's
+	# contents.  This can be done very quickly with DISCARD provided the
+	# device guarantees that all reads after a DISCARD return zeroes, or
+	# very slowly by writing zeroes to the device.  Fast is preferable, but
+	# there's no longer any way to detect that DISCARD actually unmaps
+	# zeroes, so warn the user about this requirement if the test happens
+	# to fail.
+
+	# No discard support means the logwrites will do its own zeroing
+	test "$(cat "$sysfs/queue/discard_max_bytes")" -eq 0 && return
+
+	# dm-thinp guarantees that reads after discards return zeroes
+	dmsetup status "$blkdev" 2>/dev/null | grep -q '^0.* thin ' && return
+
+	echo "HINT: $blkdev doesn't guarantee that reads after DISCARD will return zeroes" >> $seqres.hints
+	echo "      This is required for correct journal replay on some filesystems (e.g. xfs)" >> $seqres.hints
+	echo >> $seqres.hints
+}
+
 # Set up a dm-log-writes device
 #
 # blkdev: the specified target device
@@ -84,6 +113,8 @@  _log_writes_init()
 	LOGWRITES_NAME=logwrites-test
 	LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME
 	LOGWRITES_TABLE="0 $BLK_DEV_SIZE log-writes $blkdev $LOGWRITES_DEV"
+
+	_log_writes_check_bdev "$blkdev"
 	_dmsetup_create $LOGWRITES_NAME --table "$LOGWRITES_TABLE" || \
 		_fail "failed to create log-writes device"
 }