[v3] fstests: btrfs: Check snapshot creation and deletion with dm-logwrites
diff mbox series

Message ID 20190826062045.18670-1-wqu@suse.com
State New
Headers show
Series
  • [v3] fstests: btrfs: Check snapshot creation and deletion with dm-logwrites
Related show

Commit Message

Qu WenRuo Aug. 26, 2019, 6:20 a.m. UTC
We have generic dm-logwrites with fsstress test case (generic/482), but
it doesn't cover fs specific operations like btrfs snapshot creation and
deletion.

Furthermore, that test is not heavy enough to bump btrfs tree height by
its short runtime.

And finally, btrfs check doesn't consider dirty log as an error, unlike
ext*/xfs, that's to say we don't need to mount the fs to replay the log,
but just run btrfs check on the fs is enough.

So introduce a similar test case but for btrfs only.

The test case will stress btrfs by:
- Use small nodesize to bump tree height
- Create a base tree which is already high enough
- Trim tree blocks to find possible trim bugs
- Call snapshot creation and deletion along with fsstress

To utilize replay-log --check and --fsck command, we fix one bug in
replay-log first:
- Return 1 when fsck failed
  Original when fsck failed, run_fsck() returns -1, but to make
  replay_log prog to return 1, we need to return a minus value, so
  fix it by setting @ret to -EUCLEAN when run_fsck() failed, so that
  we can detect the fsck failure by simply checking the return value
  of replay-log.

Also it includes certain workaround for btrfs:
- Use no-holes feature
  To avoid missing hole file extents.
  Although that behavior doesn't follow the on-disk format spec, it
  doesn't cause data loss. And will follow the new on-disk format spec
  of no-holes feature, so it's better to workaround it.

And an optimization for btrfs only:
- Use replay-log --fsck/--check command
  Since dm-log-writes records bios sequentially, there is no way to
  locate certain entry unless we iterate all entries.
  This is becoming a big performance penalty if we replay certain a
  range, check the fs, then re-execute replay-log to replay another
  range.

  We need to records the previous entry location, or we need to
  re-iterate all previous entries.

  Thankfully, replay-log has already address it by providing --fsck and
  --check command, thus we don't need to break replay-log command.

Please note, for fast storage (e.g. fast NVME or unsafe cache mode),
it's recommended to use log devices larger than 15G, or we can't record
the full log of just 30s fsstress run.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
For the log devices size problem, I have submitted dm-logwrites bio flag
filter support, to filter out data bios.

But that is not yet merged into kernel, thus we need a large log device
for short run.

For reference, if using unsafe cache mode for all test devices, on a
system with 32G dual-channel DDR4 3200 RAM, 5G log device will be
filled up in less than 15 seconds.
So to ensure dm-log-writes covers all the operations, one needs at least
15G log device, and even more if using RAM with more channels.

Changelog:
v2
- Better expression/words for comment
- Add requirement for no-holes features
- Use xattr to bump up tree height
  So no need for max_inline mount option
- Coding style fixes for function definition
- Add -f for rm to avoid user alias setting
- Add new workload (update time stamp and create new files) for snapshot
  workload
- Remove an unnecessary sync call
- Get rid of wrong 2>&1 redirection
- Add to group "snapshot" and "stress"

v3:
- Add '_require_attrs' and source common/attr
- Introduce '_require_fsck_not_report_dirty_logs_as_error'
- Add comment for the replay-log code fix
- Wait after killing all background fsstress
- Use $BLKDISCARD_PROG instead of plain 'blkdiscard'
- Add trap for snapshot and delete workload
---
 common/config               |   1 +
 common/dmlogwrites          |  44 ++++++++++
 src/log-writes/replay-log.c |   2 +
 tests/btrfs/192             | 156 ++++++++++++++++++++++++++++++++++++
 tests/btrfs/192.out         |   2 +
 tests/btrfs/group           |   1 +
 6 files changed, 206 insertions(+)
 create mode 100755 tests/btrfs/192
 create mode 100644 tests/btrfs/192.out

Comments

Eryu Guan Aug. 31, 2019, 6:10 p.m. UTC | #1
On Mon, Aug 26, 2019 at 02:20:45PM +0800, Qu Wenruo wrote:
> We have generic dm-logwrites with fsstress test case (generic/482), but
> it doesn't cover fs specific operations like btrfs snapshot creation and
> deletion.
> 
> Furthermore, that test is not heavy enough to bump btrfs tree height by
> its short runtime.
> 
> And finally, btrfs check doesn't consider dirty log as an error, unlike
> ext*/xfs, that's to say we don't need to mount the fs to replay the log,
> but just run btrfs check on the fs is enough.
> 
> So introduce a similar test case but for btrfs only.
> 
> The test case will stress btrfs by:
> - Use small nodesize to bump tree height
> - Create a base tree which is already high enough
> - Trim tree blocks to find possible trim bugs
> - Call snapshot creation and deletion along with fsstress
> 
> To utilize replay-log --check and --fsck command, we fix one bug in
> replay-log first:
> - Return 1 when fsck failed
>   Original when fsck failed, run_fsck() returns -1, but to make
>   replay_log prog to return 1, we need to return a minus value, so
>   fix it by setting @ret to -EUCLEAN when run_fsck() failed, so that
>   we can detect the fsck failure by simply checking the return value
>   of replay-log.

Sorry, I didn't quite get this. run_fsck() already returns a negative
value (-1) on fsck failure (thus @ret is -1 in this case), and
replay_log exits with 1 if @ret < 0. All seem fine to me, setting
-EUCLEAN doesn't seem necessary to me. Did I miss anything?

Anyway, I think this bugfix could be in a separate patch.

> 
> Also it includes certain workaround for btrfs:
> - Use no-holes feature
>   To avoid missing hole file extents.
>   Although that behavior doesn't follow the on-disk format spec, it
>   doesn't cause data loss. And will follow the new on-disk format spec
>   of no-holes feature, so it's better to workaround it.
> 
> And an optimization for btrfs only:
> - Use replay-log --fsck/--check command
>   Since dm-log-writes records bios sequentially, there is no way to
>   locate certain entry unless we iterate all entries.
>   This is becoming a big performance penalty if we replay certain a
>   range, check the fs, then re-execute replay-log to replay another
>   range.
> 
>   We need to records the previous entry location, or we need to
>   re-iterate all previous entries.
> 
>   Thankfully, replay-log has already address it by providing --fsck and
>   --check command, thus we don't need to break replay-log command.
> 
> Please note, for fast storage (e.g. fast NVME or unsafe cache mode),
> it's recommended to use log devices larger than 15G, or we can't record
> the full log of just 30s fsstress run.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> For the log devices size problem, I have submitted dm-logwrites bio flag
> filter support, to filter out data bios.
> 
> But that is not yet merged into kernel, thus we need a large log device
> for short run.
> 
> For reference, if using unsafe cache mode for all test devices, on a
> system with 32G dual-channel DDR4 3200 RAM, 5G log device will be
> filled up in less than 15 seconds.
> So to ensure dm-log-writes covers all the operations, one needs at least
> 15G log device, and even more if using RAM with more channels.
> 
> Changelog:
> v2
> - Better expression/words for comment
> - Add requirement for no-holes features
> - Use xattr to bump up tree height
>   So no need for max_inline mount option
> - Coding style fixes for function definition
> - Add -f for rm to avoid user alias setting
> - Add new workload (update time stamp and create new files) for snapshot
>   workload
> - Remove an unnecessary sync call
> - Get rid of wrong 2>&1 redirection
> - Add to group "snapshot" and "stress"
> 
> v3:
> - Add '_require_attrs' and source common/attr
> - Introduce '_require_fsck_not_report_dirty_logs_as_error'
> - Add comment for the replay-log code fix
> - Wait after killing all background fsstress
> - Use $BLKDISCARD_PROG instead of plain 'blkdiscard'
> - Add trap for snapshot and delete workload
> ---
>  common/config               |   1 +
>  common/dmlogwrites          |  44 ++++++++++
>  src/log-writes/replay-log.c |   2 +
>  tests/btrfs/192             | 156 ++++++++++++++++++++++++++++++++++++
>  tests/btrfs/192.out         |   2 +
>  tests/btrfs/group           |   1 +
>  6 files changed, 206 insertions(+)
>  create mode 100755 tests/btrfs/192
>  create mode 100644 tests/btrfs/192.out
> 
> diff --git a/common/config b/common/config
> index bd64be62..4c86a492 100644
> --- a/common/config
> +++ b/common/config
> @@ -183,6 +183,7 @@ export LOGGER_PROG="$(type -P logger)"
>  export DBENCH_PROG="$(type -P dbench)"
>  export DMSETUP_PROG="$(type -P dmsetup)"
>  export WIPEFS_PROG="$(type -P wipefs)"
> +export BLKDISCARD_PROG="$(type -P blkdiscard)"
>  export DUMP_PROG="$(type -P dump)"
>  export RESTORE_PROG="$(type -P restore)"
>  export LVM_PROG="$(type -P lvm)"
> diff --git a/common/dmlogwrites b/common/dmlogwrites
> index ae2cbc6a..474ec570 100644
> --- a/common/dmlogwrites
> +++ b/common/dmlogwrites
> @@ -175,3 +175,47 @@ _log_writes_replay_log_range()
>  		>> $seqres.full 2>&1
>  	[ $? -ne 0 ] && _fail "replay failed"
>  }
> +
> +# Require fsck not to report dirty logs as error
> +#
> +# This is a special requirement to use _log_writes_fast_replay_check
> +# The reasons are:
> +# - To avoid unnecessary seek when there are a lot of entries
> +#   replay-log doesn't have a tree-like structure to do fast index,
> +#   thus it iterate all entries one by one, this can be very slow
> +# - No way to revert the log replay for next check
> +#   A lot of fsck will replay the log, which will pollute the replay device
> +#   for next entry
> +_require_fsck_not_report_dirty_logs_as_error()
> +{
> +	if [ $FSTYP != "btrfs" ]; then
> +		_notrun "fsck of $FSTYP reports dirty jounal/log as error, skipping test"
> +	fi
> +}

The rule name seems ugly :)

> +
> +# Replay and check each fua/flush (specified by $2) point.
> +#
> +# Since dm-log-writes records bio sequentially, even just replaying a range
> +# still needs to iterate all records before the end point.
> +# When number of records grows, it will be unacceptably slow, thus we need
> +# to use relay-log itself to trigger fsck, avoid unnecessary seek.
> +_log_writes_fast_replay_check()
> +{
> +	local check_point=$1
> +	local blkdev=$2
> +	local fsck_command
> +
> +	_require_fsck_not_report_dirty_logs_as_error
> +
> +	[ -z "$check_point" -o -z "$blkdev" ] && _fail \
> +	"check_point and blkdev must be specified for _log_writes_fast_replay_check"
> +	case $FSTYP in
> +	btrfs)
> +		fsck_command="$BTRFS_UTIL_PROG check $blkdev"
> +		;;
> +	esac
> +	$here/src/log-writes/replay-log --log $LOGWRITES_DEV \
> +		--replay $blkdev --check $check_point --fsck "$fsck_command" \
> +		2>&1 | tail -n 128 >> $seqres.full
> +	[ $? -ne 0 ] && _fail "fsck failed during replay"
> +}

And I think we could make _log_writes_fast_replay_check(), which seems
only useful to btrfs, a local function in the test, so we avoid all
these $FSTYP == btrfs checks.

Thanks,
Eryu

> diff --git a/src/log-writes/replay-log.c b/src/log-writes/replay-log.c
> index 829b18e2..1e1cd524 100644
> --- a/src/log-writes/replay-log.c
> +++ b/src/log-writes/replay-log.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include <stdio.h>
> +#include <errno.h>
>  #include <unistd.h>
>  #include <getopt.h>
>  #include <stdlib.h>
> @@ -375,6 +376,7 @@ int main(int argc, char **argv)
>  				fprintf(stderr, "Fsck errored out on entry "
>  					"%llu\n",
>  					(unsigned long long)log->cur_entry - 1);
> +				ret = -EUCLEAN;
>  				break;
>  			}
>  		}
> diff --git a/tests/btrfs/192 b/tests/btrfs/192
> new file mode 100755
> index 00000000..db9bc40e
> --- /dev/null
> +++ b/tests/btrfs/192
> @@ -0,0 +1,156 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 192
> +#
> +# Test btrfs consistency after each FUA for a workload with snapshot creation
> +# and removal
> +#
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	kill -q $pid1 &> /dev/null
> +	kill -q $pid2 &> /dev/null
> +	"$KILLALL_PROG" -q $FSSTRESS_PROG &> /dev/null
> +	wait
> +	_log_writes_cleanup &> /dev/null
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/attr
> +. ./common/dmlogwrites
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +
> +_require_command "$KILLALL_PROG" killall
> +_require_command "$BLKDISCARD_PROG" blkdiscard
> +_require_btrfs_fs_feature "no_holes"
> +_require_btrfs_mkfs_feature "no-holes"
> +_require_fsck_not_report_dirty_logs_as_error
> +_require_log_writes
> +_require_scratch
> +_require_attrs
> +
> +# To generate 3 level fs tree for 64K nodesize, we need 32768 xattr items.
> +# That will cause too many transactions, bumping replay check time
> +# from ~60s to ~300s. (VM alreayd using unsafe cache for the test devices)
> +# So here we skip non-4K page size system, in favor of a shorter default
> +# test time
> +if [ $(get_page_size) -ne 4096 ]; then
> +	_notrun "This test doesn't support non-4K page size yet"
> +fi
> +
> +runtime=30
> +nr_cpus=$("$here/src/feature" -o)
> +# cap nr_cpus to 8 to avoid spending too much time on hosts with many cpus
> +if [ $nr_cpus -gt 8 ]; then
> +	nr_cpus=8
> +fi
> +fsstress_args=$(_scale_fsstress_args -w -d $SCRATCH_MNT -n 99999 -p $nr_cpus \
> +		$FSSTRESS_AVOID)
> +_log_writes_init $SCRATCH_DEV
> +
> +# Discard the whole devices so when some tree pointer is wrong, it won't point
> +# to some older valid tree blocks, so we can detect it.
> +$BLKDISCARD_PROG $LOGWRITES_DMDEV > /dev/null 2>&1
> +
> +# Workaround minor file extent discountinous.
> +# And use 4K nodesize to bump tree height.
> +_log_writes_mkfs -O no-holes -n 4k >> $seqres.full
> +_log_writes_mount
> +
> +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/src > /dev/null
> +mkdir -p $SCRATCH_MNT/snapshots
> +mkdir -p $SCRATCH_MNT/src/padding
> +
> +random_file()
> +{
> +	local basedir=$1
> +	echo "$basedir/$(ls $basedir | sort -R | tail -1)"
> +}
> +
> +snapshot_workload()
> +{
> +	trap "wait; exit" SIGTERM
> +
> +	local i=0
> +	while true; do
> +		$BTRFS_UTIL_PROG subvolume snapshot \
> +			$SCRATCH_MNT/src $SCRATCH_MNT/snapshots/$i \
> +			> /dev/null
> +		# Do something small to make snapshots different
> +		rm -f "$(random_file $SCRATCH_MNT/src/padding)"
> +		rm -f "$(random_file $SCRATCH_MNT/src/padding)"
> +		touch "$(random_file $SCRATCH_MNT/src/padding)"
> +		touch "$SCRATCH_MNT/src/padding/random_$RANDOM"
> +
> +		i=$(($i + 1))
> +		sleep 1
> +	done
> +}
> +
> +delete_workload()
> +{
> +	trap "wait; exit" SIGTERM
> +
> +	while true; do
> +		sleep 2
> +		$BTRFS_UTIL_PROG subvolume delete \
> +			"$(random_file $SCRATCH_MNT/snapshots)" \
> +			> /dev/null 2>&1
> +	done
> +}
> +
> +xattr_value=$(printf '%0.sX' $(seq 1 3800))
> +
> +# Bumping tree height to level 2.
> +for ((i = 0; i < 64; i++)); do
> +	touch "$SCRATCH_MNT/src/padding/$i"
> +	$SETFATTR_PROG -n 'user.x1' -v $xattr_value \
> +		"$SCRATCH_MNT/src/padding/$i"
> +done
> +
> +_log_writes_mark prepare
> +
> +snapshot_workload &
> +pid1=$!
> +delete_workload &
> +pid2=$!
> +
> +"$FSSTRESS_PROG" $fsstress_args > /dev/null &
> +sleep $runtime
> +
> +"$KILLALL_PROG" -q "$FSSTRESS_PROG" &> /dev/null
> +kill $pid1 &> /dev/null
> +kill $pid2 &> /dev/null
> +wait
> +_log_writes_unmount
> +_log_writes_remove
> +
> +_log_writes_fast_replay_check fua "$SCRATCH_DEV"
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/192.out b/tests/btrfs/192.out
> new file mode 100644
> index 00000000..6779aa77
> --- /dev/null
> +++ b/tests/btrfs/192.out
> @@ -0,0 +1,2 @@
> +QA output created by 192
> +Silence is golden
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index 2474d43e..cab10d19 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -194,3 +194,4 @@
>  189 auto quick send clone
>  190 auto quick replay balance qgroup
>  191 auto quick send dedupe
> +192 auto replay snapshot stress
> -- 
> 2.22.0
>
Qu Wenruo Aug. 31, 2019, 11:43 p.m. UTC | #2
On 2019/9/1 上午2:10, Eryu Guan wrote:
> On Mon, Aug 26, 2019 at 02:20:45PM +0800, Qu Wenruo wrote:
>> We have generic dm-logwrites with fsstress test case (generic/482), but
>> it doesn't cover fs specific operations like btrfs snapshot creation and
>> deletion.
>>
>> Furthermore, that test is not heavy enough to bump btrfs tree height by
>> its short runtime.
>>
>> And finally, btrfs check doesn't consider dirty log as an error, unlike
>> ext*/xfs, that's to say we don't need to mount the fs to replay the log,
>> but just run btrfs check on the fs is enough.
>>
>> So introduce a similar test case but for btrfs only.
>>
>> The test case will stress btrfs by:
>> - Use small nodesize to bump tree height
>> - Create a base tree which is already high enough
>> - Trim tree blocks to find possible trim bugs
>> - Call snapshot creation and deletion along with fsstress
>>
>> To utilize replay-log --check and --fsck command, we fix one bug in
>> replay-log first:
>> - Return 1 when fsck failed
>>   Original when fsck failed, run_fsck() returns -1, but to make
>>   replay_log prog to return 1, we need to return a minus value, so
>>   fix it by setting @ret to -EUCLEAN when run_fsck() failed, so that
>>   we can detect the fsck failure by simply checking the return value
>>   of replay-log.
> 
> Sorry, I didn't quite get this. run_fsck() already returns a negative
> value (-1) on fsck failure (thus @ret is -1 in this case), and
> replay_log exits with 1 if @ret < 0. All seem fine to me, setting
> -EUCLEAN doesn't seem necessary to me. Did I miss anything?

My bad, it looks like so.

> 
> Anyway, I think this bugfix could be in a separate patch.
> 
>>
>> Also it includes certain workaround for btrfs:
>> - Use no-holes feature
>>   To avoid missing hole file extents.
>>   Although that behavior doesn't follow the on-disk format spec, it
>>   doesn't cause data loss. And will follow the new on-disk format spec
>>   of no-holes feature, so it's better to workaround it.
>>
>> And an optimization for btrfs only:
>> - Use replay-log --fsck/--check command
>>   Since dm-log-writes records bios sequentially, there is no way to
>>   locate certain entry unless we iterate all entries.
>>   This is becoming a big performance penalty if we replay certain a
>>   range, check the fs, then re-execute replay-log to replay another
>>   range.
>>
>>   We need to records the previous entry location, or we need to
>>   re-iterate all previous entries.
>>
>>   Thankfully, replay-log has already address it by providing --fsck and
>>   --check command, thus we don't need to break replay-log command.
>>
>> Please note, for fast storage (e.g. fast NVME or unsafe cache mode),
>> it's recommended to use log devices larger than 15G, or we can't record
>> the full log of just 30s fsstress run.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> For the log devices size problem, I have submitted dm-logwrites bio flag
>> filter support, to filter out data bios.
>>
>> But that is not yet merged into kernel, thus we need a large log device
>> for short run.
>>
>> For reference, if using unsafe cache mode for all test devices, on a
>> system with 32G dual-channel DDR4 3200 RAM, 5G log device will be
>> filled up in less than 15 seconds.
>> So to ensure dm-log-writes covers all the operations, one needs at least
>> 15G log device, and even more if using RAM with more channels.
>>
>> Changelog:
>> v2
>> - Better expression/words for comment
>> - Add requirement for no-holes features
>> - Use xattr to bump up tree height
>>   So no need for max_inline mount option
>> - Coding style fixes for function definition
>> - Add -f for rm to avoid user alias setting
>> - Add new workload (update time stamp and create new files) for snapshot
>>   workload
>> - Remove an unnecessary sync call
>> - Get rid of wrong 2>&1 redirection
>> - Add to group "snapshot" and "stress"
>>
>> v3:
>> - Add '_require_attrs' and source common/attr
>> - Introduce '_require_fsck_not_report_dirty_logs_as_error'
>> - Add comment for the replay-log code fix
>> - Wait after killing all background fsstress
>> - Use $BLKDISCARD_PROG instead of plain 'blkdiscard'
>> - Add trap for snapshot and delete workload
>> ---
>>  common/config               |   1 +
>>  common/dmlogwrites          |  44 ++++++++++
>>  src/log-writes/replay-log.c |   2 +
>>  tests/btrfs/192             | 156 ++++++++++++++++++++++++++++++++++++
>>  tests/btrfs/192.out         |   2 +
>>  tests/btrfs/group           |   1 +
>>  6 files changed, 206 insertions(+)
>>  create mode 100755 tests/btrfs/192
>>  create mode 100644 tests/btrfs/192.out
>>
>> diff --git a/common/config b/common/config
>> index bd64be62..4c86a492 100644
>> --- a/common/config
>> +++ b/common/config
>> @@ -183,6 +183,7 @@ export LOGGER_PROG="$(type -P logger)"
>>  export DBENCH_PROG="$(type -P dbench)"
>>  export DMSETUP_PROG="$(type -P dmsetup)"
>>  export WIPEFS_PROG="$(type -P wipefs)"
>> +export BLKDISCARD_PROG="$(type -P blkdiscard)"
>>  export DUMP_PROG="$(type -P dump)"
>>  export RESTORE_PROG="$(type -P restore)"
>>  export LVM_PROG="$(type -P lvm)"
>> diff --git a/common/dmlogwrites b/common/dmlogwrites
>> index ae2cbc6a..474ec570 100644
>> --- a/common/dmlogwrites
>> +++ b/common/dmlogwrites
>> @@ -175,3 +175,47 @@ _log_writes_replay_log_range()
>>  		>> $seqres.full 2>&1
>>  	[ $? -ne 0 ] && _fail "replay failed"
>>  }
>> +
>> +# Require fsck not to report dirty logs as error
>> +#
>> +# This is a special requirement to use _log_writes_fast_replay_check
>> +# The reasons are:
>> +# - To avoid unnecessary seek when there are a lot of entries
>> +#   replay-log doesn't have a tree-like structure to do fast index,
>> +#   thus it iterate all entries one by one, this can be very slow
>> +# - No way to revert the log replay for next check
>> +#   A lot of fsck will replay the log, which will pollute the replay device
>> +#   for next entry
>> +_require_fsck_not_report_dirty_logs_as_error()
>> +{
>> +	if [ $FSTYP != "btrfs" ]; then
>> +		_notrun "fsck of $FSTYP reports dirty jounal/log as error, skipping test"
>> +	fi
>> +}
> 
> The rule name seems ugly :)
> 
>> +
>> +# Replay and check each fua/flush (specified by $2) point.
>> +#
>> +# Since dm-log-writes records bio sequentially, even just replaying a range
>> +# still needs to iterate all records before the end point.
>> +# When number of records grows, it will be unacceptably slow, thus we need
>> +# to use relay-log itself to trigger fsck, avoid unnecessary seek.
>> +_log_writes_fast_replay_check()
>> +{
>> +	local check_point=$1
>> +	local blkdev=$2
>> +	local fsck_command
>> +
>> +	_require_fsck_not_report_dirty_logs_as_error
>> +
>> +	[ -z "$check_point" -o -z "$blkdev" ] && _fail \
>> +	"check_point and blkdev must be specified for _log_writes_fast_replay_check"
>> +	case $FSTYP in
>> +	btrfs)
>> +		fsck_command="$BTRFS_UTIL_PROG check $blkdev"
>> +		;;
>> +	esac
>> +	$here/src/log-writes/replay-log --log $LOGWRITES_DEV \
>> +		--replay $blkdev --check $check_point --fsck "$fsck_command" \
>> +		2>&1 | tail -n 128 >> $seqres.full
>> +	[ $? -ne 0 ] && _fail "fsck failed during replay"
>> +}
> 
> And I think we could make _log_writes_fast_replay_check(), which seems
> only useful to btrfs, a local function in the test, so we avoid all
> these $FSTYP == btrfs checks.

Yep, that's much better.

Thanks,
Qu
> 
> Thanks,
> Eryu
> 
>> diff --git a/src/log-writes/replay-log.c b/src/log-writes/replay-log.c
>> index 829b18e2..1e1cd524 100644
>> --- a/src/log-writes/replay-log.c
>> +++ b/src/log-writes/replay-log.c
>> @@ -1,5 +1,6 @@
>>  // SPDX-License-Identifier: GPL-2.0
>>  #include <stdio.h>
>> +#include <errno.h>
>>  #include <unistd.h>
>>  #include <getopt.h>
>>  #include <stdlib.h>
>> @@ -375,6 +376,7 @@ int main(int argc, char **argv)
>>  				fprintf(stderr, "Fsck errored out on entry "
>>  					"%llu\n",
>>  					(unsigned long long)log->cur_entry - 1);
>> +				ret = -EUCLEAN;
>>  				break;
>>  			}
>>  		}
>> diff --git a/tests/btrfs/192 b/tests/btrfs/192
>> new file mode 100755
>> index 00000000..db9bc40e
>> --- /dev/null
>> +++ b/tests/btrfs/192
>> @@ -0,0 +1,156 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
>> +#
>> +# FS QA Test 192
>> +#
>> +# Test btrfs consistency after each FUA for a workload with snapshot creation
>> +# and removal
>> +#
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1	# failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> +	cd /
>> +	kill -q $pid1 &> /dev/null
>> +	kill -q $pid2 &> /dev/null
>> +	"$KILLALL_PROG" -q $FSSTRESS_PROG &> /dev/null
>> +	wait
>> +	_log_writes_cleanup &> /dev/null
>> +	rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +. ./common/attr
>> +. ./common/dmlogwrites
>> +
>> +# remove previous $seqres.full before test
>> +rm -f $seqres.full
>> +
>> +# real QA test starts here
>> +
>> +# Modify as appropriate.
>> +_supported_fs btrfs
>> +_supported_os Linux
>> +
>> +_require_command "$KILLALL_PROG" killall
>> +_require_command "$BLKDISCARD_PROG" blkdiscard
>> +_require_btrfs_fs_feature "no_holes"
>> +_require_btrfs_mkfs_feature "no-holes"
>> +_require_fsck_not_report_dirty_logs_as_error
>> +_require_log_writes
>> +_require_scratch
>> +_require_attrs
>> +
>> +# To generate 3 level fs tree for 64K nodesize, we need 32768 xattr items.
>> +# That will cause too many transactions, bumping replay check time
>> +# from ~60s to ~300s. (VM alreayd using unsafe cache for the test devices)
>> +# So here we skip non-4K page size system, in favor of a shorter default
>> +# test time
>> +if [ $(get_page_size) -ne 4096 ]; then
>> +	_notrun "This test doesn't support non-4K page size yet"
>> +fi
>> +
>> +runtime=30
>> +nr_cpus=$("$here/src/feature" -o)
>> +# cap nr_cpus to 8 to avoid spending too much time on hosts with many cpus
>> +if [ $nr_cpus -gt 8 ]; then
>> +	nr_cpus=8
>> +fi
>> +fsstress_args=$(_scale_fsstress_args -w -d $SCRATCH_MNT -n 99999 -p $nr_cpus \
>> +		$FSSTRESS_AVOID)
>> +_log_writes_init $SCRATCH_DEV
>> +
>> +# Discard the whole devices so when some tree pointer is wrong, it won't point
>> +# to some older valid tree blocks, so we can detect it.
>> +$BLKDISCARD_PROG $LOGWRITES_DMDEV > /dev/null 2>&1
>> +
>> +# Workaround minor file extent discountinous.
>> +# And use 4K nodesize to bump tree height.
>> +_log_writes_mkfs -O no-holes -n 4k >> $seqres.full
>> +_log_writes_mount
>> +
>> +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/src > /dev/null
>> +mkdir -p $SCRATCH_MNT/snapshots
>> +mkdir -p $SCRATCH_MNT/src/padding
>> +
>> +random_file()
>> +{
>> +	local basedir=$1
>> +	echo "$basedir/$(ls $basedir | sort -R | tail -1)"
>> +}
>> +
>> +snapshot_workload()
>> +{
>> +	trap "wait; exit" SIGTERM
>> +
>> +	local i=0
>> +	while true; do
>> +		$BTRFS_UTIL_PROG subvolume snapshot \
>> +			$SCRATCH_MNT/src $SCRATCH_MNT/snapshots/$i \
>> +			> /dev/null
>> +		# Do something small to make snapshots different
>> +		rm -f "$(random_file $SCRATCH_MNT/src/padding)"
>> +		rm -f "$(random_file $SCRATCH_MNT/src/padding)"
>> +		touch "$(random_file $SCRATCH_MNT/src/padding)"
>> +		touch "$SCRATCH_MNT/src/padding/random_$RANDOM"
>> +
>> +		i=$(($i + 1))
>> +		sleep 1
>> +	done
>> +}
>> +
>> +delete_workload()
>> +{
>> +	trap "wait; exit" SIGTERM
>> +
>> +	while true; do
>> +		sleep 2
>> +		$BTRFS_UTIL_PROG subvolume delete \
>> +			"$(random_file $SCRATCH_MNT/snapshots)" \
>> +			> /dev/null 2>&1
>> +	done
>> +}
>> +
>> +xattr_value=$(printf '%0.sX' $(seq 1 3800))
>> +
>> +# Bumping tree height to level 2.
>> +for ((i = 0; i < 64; i++)); do
>> +	touch "$SCRATCH_MNT/src/padding/$i"
>> +	$SETFATTR_PROG -n 'user.x1' -v $xattr_value \
>> +		"$SCRATCH_MNT/src/padding/$i"
>> +done
>> +
>> +_log_writes_mark prepare
>> +
>> +snapshot_workload &
>> +pid1=$!
>> +delete_workload &
>> +pid2=$!
>> +
>> +"$FSSTRESS_PROG" $fsstress_args > /dev/null &
>> +sleep $runtime
>> +
>> +"$KILLALL_PROG" -q "$FSSTRESS_PROG" &> /dev/null
>> +kill $pid1 &> /dev/null
>> +kill $pid2 &> /dev/null
>> +wait
>> +_log_writes_unmount
>> +_log_writes_remove
>> +
>> +_log_writes_fast_replay_check fua "$SCRATCH_DEV"
>> +
>> +echo "Silence is golden"
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/btrfs/192.out b/tests/btrfs/192.out
>> new file mode 100644
>> index 00000000..6779aa77
>> --- /dev/null
>> +++ b/tests/btrfs/192.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 192
>> +Silence is golden
>> diff --git a/tests/btrfs/group b/tests/btrfs/group
>> index 2474d43e..cab10d19 100644
>> --- a/tests/btrfs/group
>> +++ b/tests/btrfs/group
>> @@ -194,3 +194,4 @@
>>  189 auto quick send clone
>>  190 auto quick replay balance qgroup
>>  191 auto quick send dedupe
>> +192 auto replay snapshot stress
>> -- 
>> 2.22.0
>>
Filipe Manana Sept. 5, 2019, 3:48 p.m. UTC | #3
On Mon, Aug 26, 2019 at 7:21 AM Qu Wenruo <wqu@suse.com> wrote:
>
> We have generic dm-logwrites with fsstress test case (generic/482), but
> it doesn't cover fs specific operations like btrfs snapshot creation and
> deletion.
>
> Furthermore, that test is not heavy enough to bump btrfs tree height by
> its short runtime.
>
> And finally, btrfs check doesn't consider dirty log as an error, unlike
> ext*/xfs, that's to say we don't need to mount the fs to replay the log,
> but just run btrfs check on the fs is enough.
>
> So introduce a similar test case but for btrfs only.
>
> The test case will stress btrfs by:
> - Use small nodesize to bump tree height
> - Create a base tree which is already high enough
> - Trim tree blocks to find possible trim bugs
> - Call snapshot creation and deletion along with fsstress
>
> To utilize replay-log --check and --fsck command, we fix one bug in
> replay-log first:
> - Return 1 when fsck failed
>   Original when fsck failed, run_fsck() returns -1, but to make
>   replay_log prog to return 1, we need to return a minus value, so
>   fix it by setting @ret to -EUCLEAN when run_fsck() failed, so that
>   we can detect the fsck failure by simply checking the return value
>   of replay-log.
>
> Also it includes certain workaround for btrfs:
> - Use no-holes feature
>   To avoid missing hole file extents.
>   Although that behavior doesn't follow the on-disk format spec, it
>   doesn't cause data loss. And will follow the new on-disk format spec
>   of no-holes feature, so it's better to workaround it.
>
> And an optimization for btrfs only:
> - Use replay-log --fsck/--check command
>   Since dm-log-writes records bios sequentially, there is no way to
>   locate certain entry unless we iterate all entries.
>   This is becoming a big performance penalty if we replay certain a
>   range, check the fs, then re-execute replay-log to replay another
>   range.
>
>   We need to records the previous entry location, or we need to
>   re-iterate all previous entries.
>
>   Thankfully, replay-log has already address it by providing --fsck and
>   --check command, thus we don't need to break replay-log command.
>
> Please note, for fast storage (e.g. fast NVME or unsafe cache mode),
> it's recommended to use log devices larger than 15G, or we can't record
> the full log of just 30s fsstress run.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> For the log devices size problem, I have submitted dm-logwrites bio flag
> filter support, to filter out data bios.
>
> But that is not yet merged into kernel, thus we need a large log device
> for short run.
>
> For reference, if using unsafe cache mode for all test devices, on a
> system with 32G dual-channel DDR4 3200 RAM, 5G log device will be
> filled up in less than 15 seconds.
> So to ensure dm-log-writes covers all the operations, one needs at least
> 15G log device, and even more if using RAM with more channels.
>
> Changelog:
> v2
> - Better expression/words for comment
> - Add requirement for no-holes features
> - Use xattr to bump up tree height
>   So no need for max_inline mount option
> - Coding style fixes for function definition
> - Add -f for rm to avoid user alias setting
> - Add new workload (update time stamp and create new files) for snapshot
>   workload
> - Remove an unnecessary sync call
> - Get rid of wrong 2>&1 redirection
> - Add to group "snapshot" and "stress"
>
> v3:
> - Add '_require_attrs' and source common/attr
> - Introduce '_require_fsck_not_report_dirty_logs_as_error'
> - Add comment for the replay-log code fix
> - Wait after killing all background fsstress
> - Use $BLKDISCARD_PROG instead of plain 'blkdiscard'
> - Add trap for snapshot and delete workload
> ---
>  common/config               |   1 +
>  common/dmlogwrites          |  44 ++++++++++
>  src/log-writes/replay-log.c |   2 +
>  tests/btrfs/192             | 156 ++++++++++++++++++++++++++++++++++++
>  tests/btrfs/192.out         |   2 +
>  tests/btrfs/group           |   1 +
>  6 files changed, 206 insertions(+)
>  create mode 100755 tests/btrfs/192
>  create mode 100644 tests/btrfs/192.out
>
> diff --git a/common/config b/common/config
> index bd64be62..4c86a492 100644
> --- a/common/config
> +++ b/common/config
> @@ -183,6 +183,7 @@ export LOGGER_PROG="$(type -P logger)"
>  export DBENCH_PROG="$(type -P dbench)"
>  export DMSETUP_PROG="$(type -P dmsetup)"
>  export WIPEFS_PROG="$(type -P wipefs)"
> +export BLKDISCARD_PROG="$(type -P blkdiscard)"
>  export DUMP_PROG="$(type -P dump)"
>  export RESTORE_PROG="$(type -P restore)"
>  export LVM_PROG="$(type -P lvm)"
> diff --git a/common/dmlogwrites b/common/dmlogwrites
> index ae2cbc6a..474ec570 100644
> --- a/common/dmlogwrites
> +++ b/common/dmlogwrites
> @@ -175,3 +175,47 @@ _log_writes_replay_log_range()
>                 >> $seqres.full 2>&1
>         [ $? -ne 0 ] && _fail "replay failed"
>  }
> +
> +# Require fsck not to report dirty logs as error
> +#
> +# This is a special requirement to use _log_writes_fast_replay_check
> +# The reasons are:
> +# - To avoid unnecessary seek when there are a lot of entries
> +#   replay-log doesn't have a tree-like structure to do fast index,
> +#   thus it iterate all entries one by one, this can be very slow
> +# - No way to revert the log replay for next check
> +#   A lot of fsck will replay the log, which will pollute the replay device
> +#   for next entry
> +_require_fsck_not_report_dirty_logs_as_error()
> +{
> +       if [ $FSTYP != "btrfs" ]; then
> +               _notrun "fsck of $FSTYP reports dirty jounal/log as error, skipping test"
> +       fi
> +}
> +
> +# Replay and check each fua/flush (specified by $2) point.
> +#
> +# Since dm-log-writes records bio sequentially, even just replaying a range
> +# still needs to iterate all records before the end point.
> +# When number of records grows, it will be unacceptably slow, thus we need
> +# to use relay-log itself to trigger fsck, avoid unnecessary seek.
> +_log_writes_fast_replay_check()
> +{
> +       local check_point=$1
> +       local blkdev=$2
> +       local fsck_command
> +
> +       _require_fsck_not_report_dirty_logs_as_error
> +
> +       [ -z "$check_point" -o -z "$blkdev" ] && _fail \
> +       "check_point and blkdev must be specified for _log_writes_fast_replay_check"
> +       case $FSTYP in
> +       btrfs)
> +               fsck_command="$BTRFS_UTIL_PROG check $blkdev"
> +               ;;
> +       esac
> +       $here/src/log-writes/replay-log --log $LOGWRITES_DEV \
> +               --replay $blkdev --check $check_point --fsck "$fsck_command" \
> +               2>&1 | tail -n 128 >> $seqres.full
> +       [ $? -ne 0 ] && _fail "fsck failed during replay"
> +}
> diff --git a/src/log-writes/replay-log.c b/src/log-writes/replay-log.c
> index 829b18e2..1e1cd524 100644
> --- a/src/log-writes/replay-log.c
> +++ b/src/log-writes/replay-log.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include <stdio.h>
> +#include <errno.h>
>  #include <unistd.h>
>  #include <getopt.h>
>  #include <stdlib.h>
> @@ -375,6 +376,7 @@ int main(int argc, char **argv)
>                                 fprintf(stderr, "Fsck errored out on entry "
>                                         "%llu\n",
>                                         (unsigned long long)log->cur_entry - 1);
> +                               ret = -EUCLEAN;
>                                 break;
>                         }
>                 }
> diff --git a/tests/btrfs/192 b/tests/btrfs/192
> new file mode 100755
> index 00000000..db9bc40e
> --- /dev/null
> +++ b/tests/btrfs/192
> @@ -0,0 +1,156 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 192
> +#
> +# Test btrfs consistency after each FUA for a workload with snapshot creation
> +# and removal
> +#
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1       # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +       cd /
> +       kill -q $pid1 &> /dev/null
> +       kill -q $pid2 &> /dev/null
> +       "$KILLALL_PROG" -q $FSSTRESS_PROG &> /dev/null
> +       wait
> +       _log_writes_cleanup &> /dev/null
> +       rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/attr
> +. ./common/dmlogwrites
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +
> +_require_command "$KILLALL_PROG" killall
> +_require_command "$BLKDISCARD_PROG" blkdiscard
> +_require_btrfs_fs_feature "no_holes"
> +_require_btrfs_mkfs_feature "no-holes"
> +_require_fsck_not_report_dirty_logs_as_error
> +_require_log_writes
> +_require_scratch
> +_require_attrs
> +
> +# To generate 3 level fs tree for 64K nodesize, we need 32768 xattr items.
> +# That will cause too many transactions, bumping replay check time
> +# from ~60s to ~300s. (VM alreayd using unsafe cache for the test devices)
> +# So here we skip non-4K page size system, in favor of a shorter default
> +# test time

So I would skip mentioning how much time it takes to run on your environment.
That highly depends on hardware, system configuration, mount and mkfs
options, etc, so mentioning values there is a bit odd.
Just say that we require a 4K nodesize to ensure the test isn't too
slow, that's enough to explain why the 4K requirement.

> +if [ $(get_page_size) -ne 4096 ]; then
> +       _notrun "This test doesn't support non-4K page size yet"
> +fi
> +
> +runtime=30
> +nr_cpus=$("$here/src/feature" -o)
> +# cap nr_cpus to 8 to avoid spending too much time on hosts with many cpus
> +if [ $nr_cpus -gt 8 ]; then
> +       nr_cpus=8
> +fi
> +fsstress_args=$(_scale_fsstress_args -w -d $SCRATCH_MNT -n 99999 -p $nr_cpus \
> +               $FSSTRESS_AVOID)
> +_log_writes_init $SCRATCH_DEV
> +
> +# Discard the whole devices so when some tree pointer is wrong, it won't point
> +# to some older valid tree blocks, so we can detect it.
> +$BLKDISCARD_PROG $LOGWRITES_DMDEV > /dev/null 2>&1
> +
> +# Workaround minor file extent discountinous.

Use no-holes to avoid warnings of missing file extent items (expected
for holes due to mix of buffered and direct IO writes).

> +# And use 4K nodesize to bump tree height.
> +_log_writes_mkfs -O no-holes -n 4k >> $seqres.full
> +_log_writes_mount
> +
> +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/src > /dev/null
> +mkdir -p $SCRATCH_MNT/snapshots
> +mkdir -p $SCRATCH_MNT/src/padding
> +
> +random_file()
> +{
> +       local basedir=$1
> +       echo "$basedir/$(ls $basedir | sort -R | tail -1)"
> +}
> +
> +snapshot_workload()
> +{
> +       trap "wait; exit" SIGTERM
> +
> +       local i=0
> +       while true; do
> +               $BTRFS_UTIL_PROG subvolume snapshot \
> +                       $SCRATCH_MNT/src $SCRATCH_MNT/snapshots/$i \
> +                       > /dev/null
> +               # Do something small to make snapshots different
> +               rm -f "$(random_file $SCRATCH_MNT/src/padding)"
> +               rm -f "$(random_file $SCRATCH_MNT/src/padding)"
> +               touch "$(random_file $SCRATCH_MNT/src/padding)"
> +               touch "$SCRATCH_MNT/src/padding/random_$RANDOM"
> +
> +               i=$(($i + 1))
> +               sleep 1
> +       done
> +}
> +
> +delete_workload()
> +{
> +       trap "wait; exit" SIGTERM
> +
> +       while true; do
> +               sleep 2
> +               $BTRFS_UTIL_PROG subvolume delete \
> +                       "$(random_file $SCRATCH_MNT/snapshots)" \
> +                       > /dev/null 2>&1
> +       done
> +}
> +
> +xattr_value=$(printf '%0.sX' $(seq 1 3800))
> +
> +# Bumping tree height to level 2.
> +for ((i = 0; i < 64; i++)); do
> +       touch "$SCRATCH_MNT/src/padding/$i"
> +       $SETFATTR_PROG -n 'user.x1' -v $xattr_value \
> +               "$SCRATCH_MNT/src/padding/$i"

No need to split to 2 lines, everything fits in 80 characters.

Other than that it looks good to me. After adressing Eryu's comments
and those small changes, you can have:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> +done
> +
> +_log_writes_mark prepare
> +
> +snapshot_workload &
> +pid1=$!
> +delete_workload &
> +pid2=$!
> +
> +"$FSSTRESS_PROG" $fsstress_args > /dev/null &
> +sleep $runtime
> +
> +"$KILLALL_PROG" -q "$FSSTRESS_PROG" &> /dev/null
> +kill $pid1 &> /dev/null
> +kill $pid2 &> /dev/null
> +wait
> +_log_writes_unmount
> +_log_writes_remove
> +
> +_log_writes_fast_replay_check fua "$SCRATCH_DEV"
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/192.out b/tests/btrfs/192.out
> new file mode 100644
> index 00000000..6779aa77
> --- /dev/null
> +++ b/tests/btrfs/192.out
> @@ -0,0 +1,2 @@
> +QA output created by 192
> +Silence is golden
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index 2474d43e..cab10d19 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -194,3 +194,4 @@
>  189 auto quick send clone
>  190 auto quick replay balance qgroup
>  191 auto quick send dedupe
> +192 auto replay snapshot stress
> --
> 2.22.0
>

Patch
diff mbox series

diff --git a/common/config b/common/config
index bd64be62..4c86a492 100644
--- a/common/config
+++ b/common/config
@@ -183,6 +183,7 @@  export LOGGER_PROG="$(type -P logger)"
 export DBENCH_PROG="$(type -P dbench)"
 export DMSETUP_PROG="$(type -P dmsetup)"
 export WIPEFS_PROG="$(type -P wipefs)"
+export BLKDISCARD_PROG="$(type -P blkdiscard)"
 export DUMP_PROG="$(type -P dump)"
 export RESTORE_PROG="$(type -P restore)"
 export LVM_PROG="$(type -P lvm)"
diff --git a/common/dmlogwrites b/common/dmlogwrites
index ae2cbc6a..474ec570 100644
--- a/common/dmlogwrites
+++ b/common/dmlogwrites
@@ -175,3 +175,47 @@  _log_writes_replay_log_range()
 		>> $seqres.full 2>&1
 	[ $? -ne 0 ] && _fail "replay failed"
 }
+
+# Require fsck not to report dirty logs as error
+#
+# This is a special requirement to use _log_writes_fast_replay_check
+# The reasons are:
+# - To avoid unnecessary seek when there are a lot of entries
+#   replay-log doesn't have a tree-like structure to do fast index,
+#   thus it iterate all entries one by one, this can be very slow
+# - No way to revert the log replay for next check
+#   A lot of fsck will replay the log, which will pollute the replay device
+#   for next entry
+_require_fsck_not_report_dirty_logs_as_error()
+{
+	if [ $FSTYP != "btrfs" ]; then
+		_notrun "fsck of $FSTYP reports dirty jounal/log as error, skipping test"
+	fi
+}
+
+# Replay and check each fua/flush (specified by $2) point.
+#
+# Since dm-log-writes records bio sequentially, even just replaying a range
+# still needs to iterate all records before the end point.
+# When number of records grows, it will be unacceptably slow, thus we need
+# to use relay-log itself to trigger fsck, avoid unnecessary seek.
+_log_writes_fast_replay_check()
+{
+	local check_point=$1
+	local blkdev=$2
+	local fsck_command
+
+	_require_fsck_not_report_dirty_logs_as_error
+
+	[ -z "$check_point" -o -z "$blkdev" ] && _fail \
+	"check_point and blkdev must be specified for _log_writes_fast_replay_check"
+	case $FSTYP in
+	btrfs)
+		fsck_command="$BTRFS_UTIL_PROG check $blkdev"
+		;;
+	esac
+	$here/src/log-writes/replay-log --log $LOGWRITES_DEV \
+		--replay $blkdev --check $check_point --fsck "$fsck_command" \
+		2>&1 | tail -n 128 >> $seqres.full
+	[ $? -ne 0 ] && _fail "fsck failed during replay"
+}
diff --git a/src/log-writes/replay-log.c b/src/log-writes/replay-log.c
index 829b18e2..1e1cd524 100644
--- a/src/log-writes/replay-log.c
+++ b/src/log-writes/replay-log.c
@@ -1,5 +1,6 @@ 
 // SPDX-License-Identifier: GPL-2.0
 #include <stdio.h>
+#include <errno.h>
 #include <unistd.h>
 #include <getopt.h>
 #include <stdlib.h>
@@ -375,6 +376,7 @@  int main(int argc, char **argv)
 				fprintf(stderr, "Fsck errored out on entry "
 					"%llu\n",
 					(unsigned long long)log->cur_entry - 1);
+				ret = -EUCLEAN;
 				break;
 			}
 		}
diff --git a/tests/btrfs/192 b/tests/btrfs/192
new file mode 100755
index 00000000..db9bc40e
--- /dev/null
+++ b/tests/btrfs/192
@@ -0,0 +1,156 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 192
+#
+# Test btrfs consistency after each FUA for a workload with snapshot creation
+# and removal
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	kill -q $pid1 &> /dev/null
+	kill -q $pid2 &> /dev/null
+	"$KILLALL_PROG" -q $FSSTRESS_PROG &> /dev/null
+	wait
+	_log_writes_cleanup &> /dev/null
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+. ./common/dmlogwrites
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+
+_require_command "$KILLALL_PROG" killall
+_require_command "$BLKDISCARD_PROG" blkdiscard
+_require_btrfs_fs_feature "no_holes"
+_require_btrfs_mkfs_feature "no-holes"
+_require_fsck_not_report_dirty_logs_as_error
+_require_log_writes
+_require_scratch
+_require_attrs
+
+# To generate 3 level fs tree for 64K nodesize, we need 32768 xattr items.
+# That will cause too many transactions, bumping replay check time
+# from ~60s to ~300s. (VM alreayd using unsafe cache for the test devices)
+# So here we skip non-4K page size system, in favor of a shorter default
+# test time
+if [ $(get_page_size) -ne 4096 ]; then
+	_notrun "This test doesn't support non-4K page size yet"
+fi
+
+runtime=30
+nr_cpus=$("$here/src/feature" -o)
+# cap nr_cpus to 8 to avoid spending too much time on hosts with many cpus
+if [ $nr_cpus -gt 8 ]; then
+	nr_cpus=8
+fi
+fsstress_args=$(_scale_fsstress_args -w -d $SCRATCH_MNT -n 99999 -p $nr_cpus \
+		$FSSTRESS_AVOID)
+_log_writes_init $SCRATCH_DEV
+
+# Discard the whole devices so when some tree pointer is wrong, it won't point
+# to some older valid tree blocks, so we can detect it.
+$BLKDISCARD_PROG $LOGWRITES_DMDEV > /dev/null 2>&1
+
+# Workaround minor file extent discountinous.
+# And use 4K nodesize to bump tree height.
+_log_writes_mkfs -O no-holes -n 4k >> $seqres.full
+_log_writes_mount
+
+$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/src > /dev/null
+mkdir -p $SCRATCH_MNT/snapshots
+mkdir -p $SCRATCH_MNT/src/padding
+
+random_file()
+{
+	local basedir=$1
+	echo "$basedir/$(ls $basedir | sort -R | tail -1)"
+}
+
+snapshot_workload()
+{
+	trap "wait; exit" SIGTERM
+
+	local i=0
+	while true; do
+		$BTRFS_UTIL_PROG subvolume snapshot \
+			$SCRATCH_MNT/src $SCRATCH_MNT/snapshots/$i \
+			> /dev/null
+		# Do something small to make snapshots different
+		rm -f "$(random_file $SCRATCH_MNT/src/padding)"
+		rm -f "$(random_file $SCRATCH_MNT/src/padding)"
+		touch "$(random_file $SCRATCH_MNT/src/padding)"
+		touch "$SCRATCH_MNT/src/padding/random_$RANDOM"
+
+		i=$(($i + 1))
+		sleep 1
+	done
+}
+
+delete_workload()
+{
+	trap "wait; exit" SIGTERM
+
+	while true; do
+		sleep 2
+		$BTRFS_UTIL_PROG subvolume delete \
+			"$(random_file $SCRATCH_MNT/snapshots)" \
+			> /dev/null 2>&1
+	done
+}
+
+xattr_value=$(printf '%0.sX' $(seq 1 3800))
+
+# Bumping tree height to level 2.
+for ((i = 0; i < 64; i++)); do
+	touch "$SCRATCH_MNT/src/padding/$i"
+	$SETFATTR_PROG -n 'user.x1' -v $xattr_value \
+		"$SCRATCH_MNT/src/padding/$i"
+done
+
+_log_writes_mark prepare
+
+snapshot_workload &
+pid1=$!
+delete_workload &
+pid2=$!
+
+"$FSSTRESS_PROG" $fsstress_args > /dev/null &
+sleep $runtime
+
+"$KILLALL_PROG" -q "$FSSTRESS_PROG" &> /dev/null
+kill $pid1 &> /dev/null
+kill $pid2 &> /dev/null
+wait
+_log_writes_unmount
+_log_writes_remove
+
+_log_writes_fast_replay_check fua "$SCRATCH_DEV"
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/192.out b/tests/btrfs/192.out
new file mode 100644
index 00000000..6779aa77
--- /dev/null
+++ b/tests/btrfs/192.out
@@ -0,0 +1,2 @@ 
+QA output created by 192
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 2474d43e..cab10d19 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -194,3 +194,4 @@ 
 189 auto quick send clone
 190 auto quick replay balance qgroup
 191 auto quick send dedupe
+192 auto replay snapshot stress