Message ID | 20171011231544.6746-1-david@fromorbit.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Oct 12, 2017 at 10:15:44AM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > generic/166 is takes way too long to run on iscsi disks - over an *hour* on > flash based iscsi targets. In comparison, it takes 18s to run on a pmem device. > > The issue is that it takes 3-4s per file write cycle on slow disks, and it does > a thousand write cycles. The problem is taht reflink is so much faster than the > write cycle that it's doing many more snapshots on slow disks than fast > disks, and this slows it down even more. > > e.g. the pmem system that takes 18s to run does just under 1000 snapshots - > roughly one per file write. 20 minutes into the iscsi based test, it's only > done ~300 write cycles but there are almost 10,000 snapshots been taken. IOWs, > we're doing 30 snapshots a file write, not ~1. > > Fix this by rate limiting snapshots to at most 1 per whole file write. This > reduces the number of snapshots taken on fast devices by ~50% (runtime on pmem > device went from 18s -> 8s) but reduced it to 1000 on slow devices and reduced > runtime from 3671s to just 311s. > > Signed-Off-By: Dave Chinner <dchinner@redhat.com> > --- > tests/generic/166 | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/tests/generic/166 b/tests/generic/166 > index 8600a133f2d3..9b53307b761c 100755 > --- a/tests/generic/166 > +++ b/tests/generic/166 > @@ -55,6 +55,7 @@ _scratch_mount >> $seqres.full 2>&1 > > testdir=$SCRATCH_MNT/test-$seq > finished_file=/tmp/finished > +do_snapshot=/tmp/snapshot > rm -rf $finished_file > mkdir $testdir > > @@ -68,15 +69,24 @@ _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full > _scratch_cycle_mount > > # Snapshot creator... > +# > +# We rate limit the snapshot creator to one snapshot per full file write. this > +# limits the runtime on slow devices, whilst not substantially reducing the the > +# number of snapshots taken on fast devices. Hmm... but the whole point of this test is to try to get a simultaneous reflink and dio rewrite (of the source file) to collide and crash the system. Adding $do_snapshot would bias the race towards the beginning of the dio rewrite, whereas the goal was to try to make it happen anywhere. Granted, with proper locking it really doesn't matter... in the early days of rmap/reflink it was useful for shaking bugs out of the rmap code. As an alternative, how about we add another helper to (sleep 120; touch $timeout_file) and then teach both loops to (test -f $timeout_file && break)? --D > snappy() { > n=0 > while [ ! -e $finished_file ]; do > + if [ ! -e $do_snapshot ]; then > + sleep 0.01 > + continue; > + fi > out="$(_cp_reflink $testdir/file1 $testdir/snap_$n 2>&1)" > res=$? > echo "$out" | grep -q "No space left" && break > test -n "$out" && echo "$out" > test $res -ne 0 && break > n=$((n + 1)) > + rm -f $do_snapshot > done > } > > @@ -84,6 +94,7 @@ echo "Snapshot a file undergoing directio rewrite" > snappy & > seq $nr_loops -1 0 | while read i; do > _pwrite_byte 0x63 $((i * blksz)) $blksz -d $testdir/file1 >> $seqres.full > + touch $do_snapshot > done > touch $finished_file > wait > -- > 2.15.0.rc0 > > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 11, 2017 at 04:41:27PM -0700, Darrick J. Wong wrote: > On Thu, Oct 12, 2017 at 10:15:44AM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > generic/166 is takes way too long to run on iscsi disks - over an *hour* on > > flash based iscsi targets. In comparison, it takes 18s to run on a pmem device. > > > > The issue is that it takes 3-4s per file write cycle on slow disks, and it does > > a thousand write cycles. The problem is taht reflink is so much faster than the > > write cycle that it's doing many more snapshots on slow disks than fast > > disks, and this slows it down even more. > > > > e.g. the pmem system that takes 18s to run does just under 1000 snapshots - > > roughly one per file write. 20 minutes into the iscsi based test, it's only > > done ~300 write cycles but there are almost 10,000 snapshots been taken. IOWs, > > we're doing 30 snapshots a file write, not ~1. > > > > Fix this by rate limiting snapshots to at most 1 per whole file write. This > > reduces the number of snapshots taken on fast devices by ~50% (runtime on pmem > > device went from 18s -> 8s) but reduced it to 1000 on slow devices and reduced > > runtime from 3671s to just 311s. > > > > Signed-Off-By: Dave Chinner <dchinner@redhat.com> > > --- > > tests/generic/166 | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/tests/generic/166 b/tests/generic/166 > > index 8600a133f2d3..9b53307b761c 100755 > > --- a/tests/generic/166 > > +++ b/tests/generic/166 > > @@ -55,6 +55,7 @@ _scratch_mount >> $seqres.full 2>&1 > > > > testdir=$SCRATCH_MNT/test-$seq > > finished_file=/tmp/finished > > +do_snapshot=/tmp/snapshot > > rm -rf $finished_file > > mkdir $testdir > > > > @@ -68,15 +69,24 @@ _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full > > _scratch_cycle_mount > > > > # Snapshot creator... > > +# > > +# We rate limit the snapshot creator to one snapshot per full file write. this > > +# limits the runtime on slow devices, whilst not substantially reducing the the > > +# number of snapshots taken on fast devices. > > Hmm... but the whole point of this test is to try to get a simultaneous > reflink and dio rewrite (of the source file) to collide and crash the > system. Adding $do_snapshot would bias the race towards the beginning > of the dio rewrite, whereas the goal was to try to make it happen > anywhere. As we discussed on IRC, that's still happens on faster devices because the sleep time is longer than the file write time. On slow devices, I don't think it really matters because the reflink race windows are so small compared to the IO times... > Granted, with proper locking it really doesn't matter... in the early > days of rmap/reflink it was useful for shaking bugs out of the rmap > code. *nod* > As an alternative, how about we add another helper to (sleep 120; touch > $timeout_file) and then teach both loops to (test -f $timeout_file && > break)? Not fussed. Unless there's some pressing reason biasing the write/reflink collision to the start of the file is a problem, then I'm happy with the 5m runtime this patch results in on these machines... Cheers, Dave.
On Thu, Oct 12, 2017 at 10:58:23AM +1100, Dave Chinner wrote: > On Wed, Oct 11, 2017 at 04:41:27PM -0700, Darrick J. Wong wrote: > > On Thu, Oct 12, 2017 at 10:15:44AM +1100, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > generic/166 is takes way too long to run on iscsi disks - over an *hour* on > > > flash based iscsi targets. In comparison, it takes 18s to run on a pmem device. > > > > > > The issue is that it takes 3-4s per file write cycle on slow disks, and it does > > > a thousand write cycles. The problem is taht reflink is so much faster than the > > > write cycle that it's doing many more snapshots on slow disks than fast > > > disks, and this slows it down even more. > > > > > > e.g. the pmem system that takes 18s to run does just under 1000 snapshots - > > > roughly one per file write. 20 minutes into the iscsi based test, it's only > > > done ~300 write cycles but there are almost 10,000 snapshots been taken. IOWs, > > > we're doing 30 snapshots a file write, not ~1. > > > > > > Fix this by rate limiting snapshots to at most 1 per whole file write. This > > > reduces the number of snapshots taken on fast devices by ~50% (runtime on pmem > > > device went from 18s -> 8s) but reduced it to 1000 on slow devices and reduced > > > runtime from 3671s to just 311s. > > > > > > Signed-Off-By: Dave Chinner <dchinner@redhat.com> > > > --- > > > tests/generic/166 | 11 +++++++++++ > > > 1 file changed, 11 insertions(+) > > > > > > diff --git a/tests/generic/166 b/tests/generic/166 > > > index 8600a133f2d3..9b53307b761c 100755 > > > --- a/tests/generic/166 > > > +++ b/tests/generic/166 > > > @@ -55,6 +55,7 @@ _scratch_mount >> $seqres.full 2>&1 > > > > > > testdir=$SCRATCH_MNT/test-$seq > > > finished_file=/tmp/finished > > > +do_snapshot=/tmp/snapshot > > > rm -rf $finished_file > > > mkdir $testdir > > > > > > @@ -68,15 +69,24 @@ _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full > > > _scratch_cycle_mount > > > > > > # Snapshot creator... > > > +# > > > +# We rate limit the snapshot creator to one snapshot per full file write. this > > > +# limits the runtime on slow devices, whilst not substantially reducing the the > > > +# number of snapshots taken on fast devices. > > > > Hmm... but the whole point of this test is to try to get a simultaneous > > reflink and dio rewrite (of the source file) to collide and crash the > > system. Adding $do_snapshot would bias the race towards the beginning > > of the dio rewrite, whereas the goal was to try to make it happen > > anywhere. > > As we discussed on IRC, that's still happens on faster devices > because the sleep time is longer than the file write time. On slow > devices, I don't think it really matters because the reflink race > windows are so small compared to the IO times... > > > Granted, with proper locking it really doesn't matter... in the early > > days of rmap/reflink it was useful for shaking bugs out of the rmap > > code. > > *nod* > > > As an alternative, how about we add another helper to (sleep 120; touch > > $timeout_file) and then teach both loops to (test -f $timeout_file && > > break)? > > Not fussed. Unless there's some pressing reason biasing the > write/reflink collision to the start of the file is a problem, then > I'm happy with the 5m runtime this patch results in on these > machines... Not really. In theory the two files should both be locked during the reflink so it probably doesn't matter exactly when they race... Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/tests/generic/166 b/tests/generic/166 index 8600a133f2d3..9b53307b761c 100755 --- a/tests/generic/166 +++ b/tests/generic/166 @@ -55,6 +55,7 @@ _scratch_mount >> $seqres.full 2>&1 testdir=$SCRATCH_MNT/test-$seq finished_file=/tmp/finished +do_snapshot=/tmp/snapshot rm -rf $finished_file mkdir $testdir @@ -68,15 +69,24 @@ _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full _scratch_cycle_mount # Snapshot creator... +# +# We rate limit the snapshot creator to one snapshot per full file write. this +# limits the runtime on slow devices, whilst not substantially reducing the the +# number of snapshots taken on fast devices. snappy() { n=0 while [ ! -e $finished_file ]; do + if [ ! -e $do_snapshot ]; then + sleep 0.01 + continue; + fi out="$(_cp_reflink $testdir/file1 $testdir/snap_$n 2>&1)" res=$? echo "$out" | grep -q "No space left" && break test -n "$out" && echo "$out" test $res -ne 0 && break n=$((n + 1)) + rm -f $do_snapshot done } @@ -84,6 +94,7 @@ echo "Snapshot a file undergoing directio rewrite" snappy & seq $nr_loops -1 0 | while read i; do _pwrite_byte 0x63 $((i * blksz)) $blksz -d $testdir/file1 >> $seqres.full + touch $do_snapshot done touch $finished_file wait