diff mbox

[v3,10/13] fstests: crash consistency fsx test using dm-log-writes

Message ID 1504638680-25682-11-git-send-email-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Amir Goldstein Sept. 5, 2017, 7:11 p.m. UTC
Cherry-picked the test from commit 70d41e17164b
in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
Quoting from Josef's commit message:

  The test just runs some ops and exits, then finds all of the good buffers
  in the directory we provided and:
  - replays up to the mark given
  - mounts the file system and compares the md5sum
  - unmounts and fsck's to check for metadata integrity

  dm-log-writes will pretend to do discard and the replay-log tool will
  replay it properly depending on the underlying device, either by writing
  0's or actually calling the discard ioctl, so I've enabled discard in the
  test for maximum fun.

[Amir:]
- Removed unneeded _test_falloc_support dynamic FSX_OPTS
- Fold repetitions into for loops
- Added place holders for using constant random seeds
- Add pre umount checkpint
- Add test to new 'replay' group
- Address review comments by Eryu Guan

Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 tests/generic/500     | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/500.out |   2 +
 tests/generic/group   |   1 +
 3 files changed, 138 insertions(+)
 create mode 100755 tests/generic/500
 create mode 100644 tests/generic/500.out

Comments

Eryu Guan Sept. 7, 2017, 7:50 a.m. UTC | #1
On Tue, Sep 05, 2017 at 10:11:17PM +0300, Amir Goldstein wrote:
> Cherry-picked the test from commit 70d41e17164b
> in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
> Quoting from Josef's commit message:
> 
>   The test just runs some ops and exits, then finds all of the good buffers
>   in the directory we provided and:
>   - replays up to the mark given
>   - mounts the file system and compares the md5sum
>   - unmounts and fsck's to check for metadata integrity
> 
>   dm-log-writes will pretend to do discard and the replay-log tool will
>   replay it properly depending on the underlying device, either by writing
>   0's or actually calling the discard ioctl, so I've enabled discard in the
>   test for maximum fun.
> 
> [Amir:]
> - Removed unneeded _test_falloc_support dynamic FSX_OPTS
> - Fold repetitions into for loops
> - Added place holders for using constant random seeds
> - Add pre umount checkpint
> - Add test to new 'replay' group
> - Address review comments by Eryu Guan
> 
> Cc: Josef Bacik <jbacik@fb.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  tests/generic/500     | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/500.out |   2 +
>  tests/generic/group   |   1 +
>  3 files changed, 138 insertions(+)
>  create mode 100755 tests/generic/500
>  create mode 100644 tests/generic/500.out
> 
> diff --git a/tests/generic/500 b/tests/generic/500
> new file mode 100755
> index 0000000..82f7a93
> --- /dev/null
> +++ b/tests/generic/500
> @@ -0,0 +1,135 @@
> +#! /bin/bash
> +# FS QA Test No. 500
> +#
> +# Run fsx with log writes to verify power fail safeness.
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2015 Facebook. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +status=1	# failure is the default!
> +
> +_cleanup()
> +{
> +	_log_writes_cleanup
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmlogwrites
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +_require_test
> +_require_scratch_nocheck
> +_require_log_writes
> +
> +rm -f $seqres.full
> +
> +check_files()
> +{
> +	local name=$1
> +
> +	# Now look for our files
> +	for i in $(find $SANITY_DIR -type f | grep $name | grep mark)
> +	do
> +		local filename=$(basename $i)
> +		local mark="${filename##*.}"
> +		echo "checking $filename" >> $seqres.full
> +		_log_writes_replay_log $filename
> +		_scratch_mount
> +		local expected_md5=$(_md5_checksum $i)
> +		local md5=$(_md5_checksum $SCRATCH_MNT/$name)
> +		[ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched"
> +		_scratch_unmount
> +		_check_scratch_fs
> +	done
> +}
> +
> +SANITY_DIR=$TEST_DIR/fsxtests
> +rm -rf $SANITY_DIR
> +mkdir $SANITY_DIR
> +
> +# Create the log
> +_log_writes_init
> +
> +_log_writes_mkfs >> $seqres.full 2>&1
> +
> +# Log writes emulates discard support, turn it on for maximum crying.
> +_log_writes_mount -o discard
> +
> +NUM_FILES=4
> +NUM_OPS=200
> +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV"
> +# Set random seeds for fsx runs (0 for timestamp + pid)
> +seeds=(0 0 0 0)

Hmm, is this array necessary? Why not pass 0 to -S directly in the
commandline? And generic/502 follows the same pattern.

> +# Run fsx for a while
> +for j in `seq 0 $((NUM_FILES-1))`
> +do
> +	run_check $here/ltp/fsx $FSX_OPTS -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j &
> +done
> +wait

Thanks,
Eryu
Amir Goldstein Sept. 7, 2017, 8:50 a.m. UTC | #2
On Thu, Sep 7, 2017 at 10:50 AM, Eryu Guan <eguan@redhat.com> wrote:
> On Tue, Sep 05, 2017 at 10:11:17PM +0300, Amir Goldstein wrote:
>> Cherry-picked the test from commit 70d41e17164b
>> in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
>> Quoting from Josef's commit message:
>>
>>   The test just runs some ops and exits, then finds all of the good buffers
>>   in the directory we provided and:
>>   - replays up to the mark given
>>   - mounts the file system and compares the md5sum
>>   - unmounts and fsck's to check for metadata integrity
>>
>>   dm-log-writes will pretend to do discard and the replay-log tool will
>>   replay it properly depending on the underlying device, either by writing
>>   0's or actually calling the discard ioctl, so I've enabled discard in the
>>   test for maximum fun.
>>
>> [Amir:]
>> - Removed unneeded _test_falloc_support dynamic FSX_OPTS
>> - Fold repetitions into for loops
>> - Added place holders for using constant random seeds
>> - Add pre umount checkpint
>> - Add test to new 'replay' group
>> - Address review comments by Eryu Guan
>>
>> Cc: Josef Bacik <jbacik@fb.com>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  tests/generic/500     | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  tests/generic/500.out |   2 +
>>  tests/generic/group   |   1 +
>>  3 files changed, 138 insertions(+)
>>  create mode 100755 tests/generic/500
>>  create mode 100644 tests/generic/500.out
>>
>> diff --git a/tests/generic/500 b/tests/generic/500
>> new file mode 100755
>> index 0000000..82f7a93
>> --- /dev/null
>> +++ b/tests/generic/500
>> @@ -0,0 +1,135 @@
>> +#! /bin/bash
>> +# FS QA Test No. 500
>> +#
>> +# Run fsx with log writes to verify power fail safeness.
>> +#
>> +#-----------------------------------------------------------------------
>> +# Copyright (c) 2015 Facebook. All Rights Reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#-----------------------------------------------------------------------
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +status=1     # failure is the default!
>> +
>> +_cleanup()
>> +{
>> +     _log_writes_cleanup
>> +}
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +. ./common/dmlogwrites
>> +
>> +# real QA test starts here
>> +_supported_fs generic
>> +_supported_os Linux
>> +_require_test
>> +_require_scratch_nocheck
>> +_require_log_writes
>> +
>> +rm -f $seqres.full
>> +
>> +check_files()
>> +{
>> +     local name=$1
>> +
>> +     # Now look for our files
>> +     for i in $(find $SANITY_DIR -type f | grep $name | grep mark)
>> +     do
>> +             local filename=$(basename $i)
>> +             local mark="${filename##*.}"
>> +             echo "checking $filename" >> $seqres.full
>> +             _log_writes_replay_log $filename
>> +             _scratch_mount
>> +             local expected_md5=$(_md5_checksum $i)
>> +             local md5=$(_md5_checksum $SCRATCH_MNT/$name)
>> +             [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched"
>> +             _scratch_unmount
>> +             _check_scratch_fs
>> +     done
>> +}
>> +
>> +SANITY_DIR=$TEST_DIR/fsxtests
>> +rm -rf $SANITY_DIR
>> +mkdir $SANITY_DIR
>> +
>> +# Create the log
>> +_log_writes_init
>> +
>> +_log_writes_mkfs >> $seqres.full 2>&1
>> +
>> +# Log writes emulates discard support, turn it on for maximum crying.
>> +_log_writes_mount -o discard
>> +
>> +NUM_FILES=4
>> +NUM_OPS=200
>> +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV"
>> +# Set random seeds for fsx runs (0 for timestamp + pid)
>> +seeds=(0 0 0 0)
>
> Hmm, is this array necessary? Why not pass 0 to -S directly in the
> commandline? And generic/502 follows the same pattern.
>

The reason for the array is trying to follow your guidelines for running
the test:
- Normally test runs with random seed (for which the array is not needed)
  and every fsx process prints out the seed to the full log
- When test detects a failure, you want to retry the test with same seed
  numbers, but those are different seed numbers for every fsx process
- So the array is here to make setting those presets possible
Eryu Guan Sept. 7, 2017, 8:55 a.m. UTC | #3
On Thu, Sep 07, 2017 at 11:50:25AM +0300, Amir Goldstein wrote:
> On Thu, Sep 7, 2017 at 10:50 AM, Eryu Guan <eguan@redhat.com> wrote:
> > On Tue, Sep 05, 2017 at 10:11:17PM +0300, Amir Goldstein wrote:
> >> Cherry-picked the test from commit 70d41e17164b
> >> in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
> >> Quoting from Josef's commit message:
> >>
> >>   The test just runs some ops and exits, then finds all of the good buffers
> >>   in the directory we provided and:
> >>   - replays up to the mark given
> >>   - mounts the file system and compares the md5sum
> >>   - unmounts and fsck's to check for metadata integrity
> >>
> >>   dm-log-writes will pretend to do discard and the replay-log tool will
> >>   replay it properly depending on the underlying device, either by writing
> >>   0's or actually calling the discard ioctl, so I've enabled discard in the
> >>   test for maximum fun.
> >>
> >> [Amir:]
> >> - Removed unneeded _test_falloc_support dynamic FSX_OPTS
> >> - Fold repetitions into for loops
> >> - Added place holders for using constant random seeds
> >> - Add pre umount checkpint
> >> - Add test to new 'replay' group
> >> - Address review comments by Eryu Guan
> >>
> >> Cc: Josef Bacik <jbacik@fb.com>
> >> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >> ---
> >>  tests/generic/500     | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  tests/generic/500.out |   2 +
> >>  tests/generic/group   |   1 +
> >>  3 files changed, 138 insertions(+)
> >>  create mode 100755 tests/generic/500
> >>  create mode 100644 tests/generic/500.out
> >>
> >> diff --git a/tests/generic/500 b/tests/generic/500
> >> new file mode 100755
> >> index 0000000..82f7a93
> >> --- /dev/null
> >> +++ b/tests/generic/500
> >> @@ -0,0 +1,135 @@
> >> +#! /bin/bash
> >> +# FS QA Test No. 500
> >> +#
> >> +# Run fsx with log writes to verify power fail safeness.
> >> +#
> >> +#-----------------------------------------------------------------------
> >> +# Copyright (c) 2015 Facebook. All Rights Reserved.
> >> +#
> >> +# This program is free software; you can redistribute it and/or
> >> +# modify it under the terms of the GNU General Public License as
> >> +# published by the Free Software Foundation.
> >> +#
> >> +# This program is distributed in the hope that it would be useful,
> >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> +# GNU General Public License for more details.
> >> +#
> >> +# You should have received a copy of the GNU General Public License
> >> +# along with this program; if not, write the Free Software Foundation,
> >> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> >> +#-----------------------------------------------------------------------
> >> +#
> >> +
> >> +seq=`basename $0`
> >> +seqres=$RESULT_DIR/$seq
> >> +echo "QA output created by $seq"
> >> +
> >> +here=`pwd`
> >> +status=1     # failure is the default!
> >> +
> >> +_cleanup()
> >> +{
> >> +     _log_writes_cleanup
> >> +}
> >> +trap "_cleanup; exit \$status" 0 1 2 3 15
> >> +
> >> +# get standard environment, filters and checks
> >> +. ./common/rc
> >> +. ./common/filter
> >> +. ./common/dmlogwrites
> >> +
> >> +# real QA test starts here
> >> +_supported_fs generic
> >> +_supported_os Linux
> >> +_require_test
> >> +_require_scratch_nocheck
> >> +_require_log_writes
> >> +
> >> +rm -f $seqres.full
> >> +
> >> +check_files()
> >> +{
> >> +     local name=$1
> >> +
> >> +     # Now look for our files
> >> +     for i in $(find $SANITY_DIR -type f | grep $name | grep mark)
> >> +     do
> >> +             local filename=$(basename $i)
> >> +             local mark="${filename##*.}"
> >> +             echo "checking $filename" >> $seqres.full
> >> +             _log_writes_replay_log $filename
> >> +             _scratch_mount
> >> +             local expected_md5=$(_md5_checksum $i)
> >> +             local md5=$(_md5_checksum $SCRATCH_MNT/$name)
> >> +             [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched"
> >> +             _scratch_unmount
> >> +             _check_scratch_fs
> >> +     done
> >> +}
> >> +
> >> +SANITY_DIR=$TEST_DIR/fsxtests
> >> +rm -rf $SANITY_DIR
> >> +mkdir $SANITY_DIR
> >> +
> >> +# Create the log
> >> +_log_writes_init
> >> +
> >> +_log_writes_mkfs >> $seqres.full 2>&1
> >> +
> >> +# Log writes emulates discard support, turn it on for maximum crying.
> >> +_log_writes_mount -o discard
> >> +
> >> +NUM_FILES=4
> >> +NUM_OPS=200
> >> +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV"
> >> +# Set random seeds for fsx runs (0 for timestamp + pid)
> >> +seeds=(0 0 0 0)
> >
> > Hmm, is this array necessary? Why not pass 0 to -S directly in the
> > commandline? And generic/502 follows the same pattern.
> >
> 
> The reason for the array is trying to follow your guidelines for running
> the test:
> - Normally test runs with random seed (for which the array is not needed)
>   and every fsx process prints out the seed to the full log
> - When test detects a failure, you want to retry the test with same seed
>   numbers, but those are different seed numbers for every fsx process
> - So the array is here to make setting those presets possible

Yeah, an array makes debug easier. Then I think a comment about the
debug usage of the array would be sufficient. Thanks!

Eryu
Amir Goldstein Sept. 7, 2017, 10:10 a.m. UTC | #4
On Thu, Sep 7, 2017 at 11:55 AM, Eryu Guan <eguan@redhat.com> wrote:

>> >> +NUM_FILES=4
>> >> +NUM_OPS=200
>> >> +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV"
>> >> +# Set random seeds for fsx runs (0 for timestamp + pid)
>> >> +seeds=(0 0 0 0)
>> >
>> > Hmm, is this array necessary? Why not pass 0 to -S directly in the
>> > commandline? And generic/502 follows the same pattern.
>> >
>>
>> The reason for the array is trying to follow your guidelines for running
>> the test:
>> - Normally test runs with random seed (for which the array is not needed)
>>   and every fsx process prints out the seed to the full log
>> - When test detects a failure, you want to retry the test with same seed
>>   numbers, but those are different seed numbers for every fsx process
>> - So the array is here to make setting those presets possible
>
> Yeah, an array makes debug easier. Then I think a comment about the
> debug usage of the array would be sufficient. Thanks!
>

FYI, I removed the array from generic/502, because it was never useful
during debugging. I guess the randomness stems mostly from timing
of forking the fsx processes in that test.

I reduced NUM_OPS in cloned files test to 10 in v4, because after a few
holes and writes the clone is no longer a clone anyway.
I already found a seed value to reproduce a failure of generic/502 on xfs
failures are still random, but on spinning disk, I get them every 2 test runs.

Amir.
Amir Goldstein Nov. 27, 2017, 9:56 a.m. UTC | #5
On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> Cherry-picked the test from commit 70d41e17164b
> in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
> Quoting from Josef's commit message:
>
>   The test just runs some ops and exits, then finds all of the good buffers
>   in the directory we provided and:
>   - replays up to the mark given
>   - mounts the file system and compares the md5sum
>   - unmounts and fsck's to check for metadata integrity
>
>   dm-log-writes will pretend to do discard and the replay-log tool will
>   replay it properly depending on the underlying device, either by writing
>   0's or actually calling the discard ioctl, so I've enabled discard in the
>   test for maximum fun.
>
> [Amir:]
> - Removed unneeded _test_falloc_support dynamic FSX_OPTS
> - Fold repetitions into for loops
> - Added place holders for using constant random seeds
> - Add pre umount checkpint
> - Add test to new 'replay' group
> - Address review comments by Eryu Guan
>
> Cc: Josef Bacik <jbacik@fb.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>


Josef,

As you know, this test is now merged to xfstest as generic/455.
I have been running the test for a while on xfs and it occasionally
reports inconsistencies which I try to investigate.

In some of the reports, it appears that dm-log-writes may be exhibiting
a reliability issue (see below).

> ---
>  tests/generic/500     | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/500.out |   2 +
>  tests/generic/group   |   1 +
>  3 files changed, 138 insertions(+)
>  create mode 100755 tests/generic/500
>  create mode 100644 tests/generic/500.out
>
> diff --git a/tests/generic/500 b/tests/generic/500
> new file mode 100755
> index 0000000..82f7a93
> --- /dev/null
> +++ b/tests/generic/500
> @@ -0,0 +1,135 @@
> +#! /bin/bash
> +# FS QA Test No. 500
> +#
> +# Run fsx with log writes to verify power fail safeness.
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2015 Facebook. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +status=1       # failure is the default!
> +
> +_cleanup()
> +{
> +       _log_writes_cleanup
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmlogwrites
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +_require_test
> +_require_scratch_nocheck
> +_require_log_writes
> +
> +rm -f $seqres.full
> +
> +check_files()
> +{
> +       local name=$1
> +
> +       # Now look for our files
> +       for i in $(find $SANITY_DIR -type f | grep $name | grep mark)
> +       do
> +               local filename=$(basename $i)
> +               local mark="${filename##*.}"
> +               echo "checking $filename" >> $seqres.full
> +               _log_writes_replay_log $filename
> +               _scratch_mount
> +               local expected_md5=$(_md5_checksum $i)
> +               local md5=$(_md5_checksum $SCRATCH_MNT/$name)
> +               [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched"

One time, the test reported md5 mismatch on a file, but when I replayed
the log to the same mark I found that md5 of the file is correct compared to the
'snapshot' file in test partition.

> +               _scratch_unmount
> +               _check_scratch_fs
> +       done
> +}
> +
> +SANITY_DIR=$TEST_DIR/fsxtests
> +rm -rf $SANITY_DIR
> +mkdir $SANITY_DIR
> +
> +# Create the log
> +_log_writes_init
> +
> +_log_writes_mkfs >> $seqres.full 2>&1
> +
> +# Log writes emulates discard support, turn it on for maximum crying.
> +_log_writes_mount -o discard
> +
> +NUM_FILES=4
> +NUM_OPS=200
> +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV"
> +# Set random seeds for fsx runs (0 for timestamp + pid)
> +seeds=(0 0 0 0)
> +# Run fsx for a while
> +for j in `seq 0 $((NUM_FILES-1))`
> +do
> +       run_check $here/ltp/fsx $FSX_OPTS -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j &
> +done
> +wait
> +
> +test_md5=()
> +for j in `seq 0 $((NUM_FILES-1))`
> +do
> +       test_md5[$j]=$(_md5_checksum $SCRATCH_MNT/testfile$j)
> +done
> +
> +# Unmount the scratch dir and tear down the log writes target
> +_log_writes_mark last
> +_log_writes_unmount
> +_log_writes_mark end
> +_log_writes_remove
> +_check_scratch_fs
> +

Another time xfs_repair complained about dirty log right after
_log_writes_remove
and I wasn't able to seek to the "end" mark. Log entries were valid a
few entries
after the "last" mark.

This leads me to believe that perhaps dm-log-writes doesn't flush all
its pending
log io before remove? Anyway, I added a "sync $LOGWRITES_DEV" call inside
_log_writes_remove. Not sure if that helps or if that is required
before removing the
target?

I will report if I see that the problem persists.
Thought? suggestions?

Thanks,
Amir.
Ashlie Martinez Nov. 27, 2017, 2:23 p.m. UTC | #6
Amir,

Just to throw in what I believe I've found about dm-log-writes (though
Josef would know more about this as it's just what I've concluded
after looking at  the code): dm-log-writes logs at IO completion,
meaning disks that ignore flush operations could exhibit bugs where
there are none, and dm-log-writes does synchronous marks though this
may not line up with the stream of IO operations (due to buffering)
unless you just did a sync. The CrashMonkey team wanted to make
something similar to the "mark" operation dm-log-writes has and we
concluded that the only place we could guarantee a mark would line up
with the stream of IO operations the user had performed was right
after a call to sync as that would force cached updates to disk. If
you call mark without a sync, then you could insert a mark after a
write(2) returns, but before the delayed allocation for that write(2)
actually allocates blocks and changes the extent tree on disk.

On Mon, Nov 27, 2017 at 3:56 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Cherry-picked the test from commit 70d41e17164b
>> in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
>> Quoting from Josef's commit message:
>>
>>   The test just runs some ops and exits, then finds all of the good buffers
>>   in the directory we provided and:
>>   - replays up to the mark given
>>   - mounts the file system and compares the md5sum
>>   - unmounts and fsck's to check for metadata integrity
>>
>>   dm-log-writes will pretend to do discard and the replay-log tool will
>>   replay it properly depending on the underlying device, either by writing
>>   0's or actually calling the discard ioctl, so I've enabled discard in the
>>   test for maximum fun.
>>
>> [Amir:]
>> - Removed unneeded _test_falloc_support dynamic FSX_OPTS
>> - Fold repetitions into for loops
>> - Added place holders for using constant random seeds
>> - Add pre umount checkpint
>> - Add test to new 'replay' group
>> - Address review comments by Eryu Guan
>>
>> Cc: Josef Bacik <jbacik@fb.com>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
>
> Josef,
>
> As you know, this test is now merged to xfstest as generic/455.
> I have been running the test for a while on xfs and it occasionally
> reports inconsistencies which I try to investigate.
>
> In some of the reports, it appears that dm-log-writes may be exhibiting
> a reliability issue (see below).
>
>> ---
>>  tests/generic/500     | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  tests/generic/500.out |   2 +
>>  tests/generic/group   |   1 +
>>  3 files changed, 138 insertions(+)
>>  create mode 100755 tests/generic/500
>>  create mode 100644 tests/generic/500.out
>>
>> diff --git a/tests/generic/500 b/tests/generic/500
>> new file mode 100755
>> index 0000000..82f7a93
>> --- /dev/null
>> +++ b/tests/generic/500
>> @@ -0,0 +1,135 @@
>> +#! /bin/bash
>> +# FS QA Test No. 500
>> +#
>> +# Run fsx with log writes to verify power fail safeness.
>> +#
>> +#-----------------------------------------------------------------------
>> +# Copyright (c) 2015 Facebook. All Rights Reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#-----------------------------------------------------------------------
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +status=1       # failure is the default!
>> +
>> +_cleanup()
>> +{
>> +       _log_writes_cleanup
>> +}
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +. ./common/dmlogwrites
>> +
>> +# real QA test starts here
>> +_supported_fs generic
>> +_supported_os Linux
>> +_require_test
>> +_require_scratch_nocheck
>> +_require_log_writes
>> +
>> +rm -f $seqres.full
>> +
>> +check_files()
>> +{
>> +       local name=$1
>> +
>> +       # Now look for our files
>> +       for i in $(find $SANITY_DIR -type f | grep $name | grep mark)
>> +       do
>> +               local filename=$(basename $i)
>> +               local mark="${filename##*.}"
>> +               echo "checking $filename" >> $seqres.full
>> +               _log_writes_replay_log $filename
>> +               _scratch_mount
>> +               local expected_md5=$(_md5_checksum $i)
>> +               local md5=$(_md5_checksum $SCRATCH_MNT/$name)
>> +               [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched"
>
> One time, the test reported md5 mismatch on a file, but when I replayed
> the log to the same mark I found that md5 of the file is correct compared to the
> 'snapshot' file in test partition.
>
>> +               _scratch_unmount
>> +               _check_scratch_fs
>> +       done
>> +}
>> +
>> +SANITY_DIR=$TEST_DIR/fsxtests
>> +rm -rf $SANITY_DIR
>> +mkdir $SANITY_DIR
>> +
>> +# Create the log
>> +_log_writes_init
>> +
>> +_log_writes_mkfs >> $seqres.full 2>&1
>> +
>> +# Log writes emulates discard support, turn it on for maximum crying.
>> +_log_writes_mount -o discard
>> +
>> +NUM_FILES=4
>> +NUM_OPS=200
>> +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV"
>> +# Set random seeds for fsx runs (0 for timestamp + pid)
>> +seeds=(0 0 0 0)
>> +# Run fsx for a while
>> +for j in `seq 0 $((NUM_FILES-1))`
>> +do
>> +       run_check $here/ltp/fsx $FSX_OPTS -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j &
>> +done
>> +wait
>> +
>> +test_md5=()
>> +for j in `seq 0 $((NUM_FILES-1))`
>> +do
>> +       test_md5[$j]=$(_md5_checksum $SCRATCH_MNT/testfile$j)
>> +done
>> +
>> +# Unmount the scratch dir and tear down the log writes target
>> +_log_writes_mark last
>> +_log_writes_unmount
>> +_log_writes_mark end
>> +_log_writes_remove
>> +_check_scratch_fs
>> +
>
> Another time xfs_repair complained about dirty log right after
> _log_writes_remove
> and I wasn't able to seek to the "end" mark. Log entries were valid a
> few entries
> after the "last" mark.
>
> This leads me to believe that perhaps dm-log-writes doesn't flush all
> its pending
> log io before remove? Anyway, I added a "sync $LOGWRITES_DEV" call inside
> _log_writes_remove. Not sure if that helps or if that is required
> before removing the
> target?
>
> I will report if I see that the problem persists.
> Thought? suggestions?
>
> Thanks,
> Amir.
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josef Bacik Nov. 27, 2017, 3:07 p.m. UTC | #7
On Mon, Nov 27, 2017 at 08:23:49AM -0600, Ashlie Martinez wrote:
> Amir,
> 
> Just to throw in what I believe I've found about dm-log-writes (though
> Josef would know more about this as it's just what I've concluded
> after looking at  the code): dm-log-writes logs at IO completion,
> meaning disks that ignore flush operations could exhibit bugs where
> there are none, and dm-log-writes does synchronous marks though this
> may not line up with the stream of IO operations (due to buffering)
> unless you just did a sync. The CrashMonkey team wanted to make
> something similar to the "mark" operation dm-log-writes has and we
> concluded that the only place we could guarantee a mark would line up
> with the stream of IO operations the user had performed was right
> after a call to sync as that would force cached updates to disk. If
> you call mark without a sync, then you could insert a mark after a
> write(2) returns, but before the delayed allocation for that write(2)
> actually allocates blocks and changes the extent tree on disk.
> 

Yeah dm-log-writes gladly hands you the loaded gun, it's your job to not shoot
yourself with it.  This fsx test is specifically set up to only do the mark
_after_ a successful call of fsync(), which means that the mark will be after
where we expect our data to be consistent.  Thanks,

Josef
diff mbox

Patch

diff --git a/tests/generic/500 b/tests/generic/500
new file mode 100755
index 0000000..82f7a93
--- /dev/null
+++ b/tests/generic/500
@@ -0,0 +1,135 @@ 
+#! /bin/bash
+# FS QA Test No. 500
+#
+# Run fsx with log writes to verify power fail safeness.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2015 Facebook. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_log_writes_cleanup
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmlogwrites
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_test
+_require_scratch_nocheck
+_require_log_writes
+
+rm -f $seqres.full
+
+check_files()
+{
+	local name=$1
+
+	# Now look for our files
+	for i in $(find $SANITY_DIR -type f | grep $name | grep mark)
+	do
+		local filename=$(basename $i)
+		local mark="${filename##*.}"
+		echo "checking $filename" >> $seqres.full
+		_log_writes_replay_log $filename
+		_scratch_mount
+		local expected_md5=$(_md5_checksum $i)
+		local md5=$(_md5_checksum $SCRATCH_MNT/$name)
+		[ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched"
+		_scratch_unmount
+		_check_scratch_fs
+	done
+}
+
+SANITY_DIR=$TEST_DIR/fsxtests
+rm -rf $SANITY_DIR
+mkdir $SANITY_DIR
+
+# Create the log
+_log_writes_init
+
+_log_writes_mkfs >> $seqres.full 2>&1
+
+# Log writes emulates discard support, turn it on for maximum crying.
+_log_writes_mount -o discard
+
+NUM_FILES=4
+NUM_OPS=200
+FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV"
+# Set random seeds for fsx runs (0 for timestamp + pid)
+seeds=(0 0 0 0)
+# Run fsx for a while
+for j in `seq 0 $((NUM_FILES-1))`
+do
+	run_check $here/ltp/fsx $FSX_OPTS -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j &
+done
+wait
+
+test_md5=()
+for j in `seq 0 $((NUM_FILES-1))`
+do
+	test_md5[$j]=$(_md5_checksum $SCRATCH_MNT/testfile$j)
+done
+
+# Unmount the scratch dir and tear down the log writes target
+_log_writes_mark last
+_log_writes_unmount
+_log_writes_mark end
+_log_writes_remove
+_check_scratch_fs
+
+# check pre umount
+echo "checking pre umount" >> $seqres.full
+_log_writes_replay_log last
+_scratch_mount
+_scratch_unmount
+_check_scratch_fs
+
+for j in `seq 0 $((NUM_FILES-1))`
+do
+	check_files testfile$j
+done
+
+# Check the end
+echo "checking post umount" >> $seqres.full
+_log_writes_replay_log end
+_scratch_mount
+for j in `seq 0 $((NUM_FILES-1))`
+do
+	md5=$(_md5_checksum $SCRATCH_MNT/testfile$j)
+	[ "${md5}" != "${test_md5[$j]}" ] && _fail "testfile$j end md5sum mismatched"
+done
+_scratch_unmount
+_check_scratch_fs
+
+echo "Silence is golden"
+status=0
+exit
+
diff --git a/tests/generic/500.out b/tests/generic/500.out
new file mode 100644
index 0000000..883b2ca
--- /dev/null
+++ b/tests/generic/500.out
@@ -0,0 +1,2 @@ 
+QA output created by 500
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 1894de0..c857153 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -457,3 +457,4 @@ 
 452 auto quick
 453 auto quick dir
 454 auto quick attr
+500 auto log replay