generic/166: speed up on slow disks

Message ID 20171011231544.6746-1-david@fromorbit.com (mailing list archive)
State New, archived

Commit Message

Dave Chinner Oct. 11, 2017, 11:15 p.m. UTC
From: Dave Chinner <dchinner@redhat.com>

generic/166 takes way too long to run on iscsi disks - over an *hour* on
flash based iscsi targets. In comparison, it takes 18s to run on a pmem device.

The issue is that it takes 3-4s per file write cycle on slow disks, and the
test does a thousand write cycles. The problem is that reflink is so much
faster than the write cycle that it does many more snapshots on slow disks
than on fast disks, and this slows it down even more.

e.g. the pmem system that takes 18s to run does just under 1000 snapshots -
roughly one per file write. 20 minutes into the iscsi based test, it had only
done ~300 write cycles but almost 10,000 snapshots had been taken. IOWs,
we're doing ~30 snapshots per file write, not ~1.

Fix this by rate limiting snapshots to at most one per whole-file write. This
reduces the number of snapshots taken on fast devices by ~50% (runtime on the
pmem device went from 18s to 8s) and caps it at 1000 on slow devices, cutting
runtime from 3671s to just 311s.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 tests/generic/166 | 11 +++++++++++
 1 file changed, 11 insertions(+)
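As a standalone illustration (outside the xfstests harness; all file names here are made up for the sketch, and a simple counter stands in for the test's _cp_reflink call), the flag-file handshake the patch adds works like this: the writer touches a flag after each full-file write, and the snapshot loop takes at most one snapshot per flag:

```shell
#!/bin/sh
# Standalone sketch of the patch's rate-limiting handshake.
# The real test calls _cp_reflink where this just counts iterations;
# all paths are illustrative temp files, not the test's real names.
do_snapshot=$(mktemp -u)
finished_file=$(mktemp -u)
count_file=$(mktemp)

snappy() {
	n=0
	while [ ! -e "$finished_file" ]; do
		if [ ! -e "$do_snapshot" ]; then
			sleep 0.01	# wait for the writer to request a snapshot
			continue
		fi
		n=$((n + 1))		# the real test does _cp_reflink here
		rm -f "$do_snapshot"	# consume the flag: one snapshot per write
	done
	echo "$n" > "$count_file"
}

snappy &
for i in 1 2 3; do
	sleep 0.05		# stands in for one full-file write
	touch "$do_snapshot"	# allow exactly one more snapshot
done
touch "$finished_file"
wait
count=$(cat "$count_file")
echo "snapshots taken: $count"
rm -f "$do_snapshot" "$finished_file" "$count_file"
```

Because the snapshot loop removes the flag after each copy, the count can never exceed the number of writer iterations, which is the cap the patch relies on.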

Comments

Darrick J. Wong Oct. 11, 2017, 11:41 p.m. UTC | #1
On Thu, Oct 12, 2017 at 10:15:44AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> generic/166 takes way too long to run on iscsi disks - over an *hour* on
> flash based iscsi targets. In comparison, it takes 18s to run on a pmem device.
> 
> The issue is that it takes 3-4s per file write cycle on slow disks, and the
> test does a thousand write cycles. The problem is that reflink is so much
> faster than the write cycle that it does many more snapshots on slow disks
> than on fast disks, and this slows it down even more.
> 
> e.g. the pmem system that takes 18s to run does just under 1000 snapshots -
> roughly one per file write. 20 minutes into the iscsi based test, it had only
> done ~300 write cycles but almost 10,000 snapshots had been taken. IOWs,
> we're doing ~30 snapshots per file write, not ~1.
> 
> Fix this by rate limiting snapshots to at most one per whole-file write. This
> reduces the number of snapshots taken on fast devices by ~50% (runtime on the
> pmem device went from 18s to 8s) and caps it at 1000 on slow devices, cutting
> runtime from 3671s to just 311s.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  tests/generic/166 | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/tests/generic/166 b/tests/generic/166
> index 8600a133f2d3..9b53307b761c 100755
> --- a/tests/generic/166
> +++ b/tests/generic/166
> @@ -55,6 +55,7 @@ _scratch_mount >> $seqres.full 2>&1
>  
>  testdir=$SCRATCH_MNT/test-$seq
>  finished_file=/tmp/finished
> +do_snapshot=/tmp/snapshot
>  rm -rf $finished_file
>  mkdir $testdir
>  
> @@ -68,15 +69,24 @@ _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full
>  _scratch_cycle_mount
>  
>  # Snapshot creator...
> +#
> +# We rate limit the snapshot creator to one snapshot per full file write. This
> +# limits the runtime on slow devices, whilst not substantially reducing the
> +# number of snapshots taken on fast devices.

Hmm... but the whole point of this test is to try to get a simultaneous
reflink and dio rewrite (of the source file) to collide and crash the
system.  Adding $do_snapshot would bias the race towards the beginning
of the dio rewrite, whereas the goal was to try to make it happen
anywhere.

Granted, with proper locking it really doesn't matter... in the early
days of rmap/reflink it was useful for shaking bugs out of the rmap
code.

As an alternative, how about we add another helper to (sleep 120; touch
$timeout_file) and then teach both loops to (test -f $timeout_file &&
break)?
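For reference, that alternative might look something like the following hypothetical sketch ($timeout_file, the loop body, and the timings are all illustrative — the suggestion above uses a 120s budget, shortened here so the sketch runs quickly):

```shell
#!/bin/sh
# Hypothetical sketch of the timeout-file alternative: a background
# watchdog touches $timeout_file after a fixed budget, and each work
# loop polls the file and breaks out. Names/timings are illustrative.
timeout_file=$(mktemp -u)

watchdog() {
	sleep 1			# the suggested budget is 120s
	touch "$timeout_file"
}

writer_loop() {
	writes=0
	while true; do
		test -f "$timeout_file" && break	# budget exhausted, stop
		writes=$((writes + 1))			# stands in for one write
		sleep 0.05
	done
}

watchdog &
writer_loop
wait
echo "stopped after $writes writes"
rm -f "$timeout_file"
```

In the real test both the write loop and the snappy loop would carry the same `test -f $timeout_file && break` check, so total runtime is bounded regardless of device speed.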

--D

>  snappy() {
>  	n=0
>  	while [ ! -e $finished_file ]; do
> +		if [ ! -e $do_snapshot ]; then
> +			sleep 0.01
> +			continue;
> +		fi
>  		out="$(_cp_reflink $testdir/file1 $testdir/snap_$n 2>&1)"
>  		res=$?
>  		echo "$out" | grep -q "No space left" && break
>  		test -n "$out" && echo "$out"
>  		test $res -ne 0 && break
>  		n=$((n + 1))
> +		rm -f $do_snapshot
>  	done
>  }
>  
> @@ -84,6 +94,7 @@ echo "Snapshot a file undergoing directio rewrite"
>  snappy &
>  seq $nr_loops -1 0 | while read i; do
>  	_pwrite_byte 0x63 $((i * blksz)) $blksz -d $testdir/file1 >> $seqres.full
> +	touch $do_snapshot
>  done
>  touch $finished_file
>  wait
> -- 
> 2.15.0.rc0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Oct. 11, 2017, 11:58 p.m. UTC | #2
On Wed, Oct 11, 2017 at 04:41:27PM -0700, Darrick J. Wong wrote:
> On Thu, Oct 12, 2017 at 10:15:44AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > generic/166 takes way too long to run on iscsi disks - over an *hour* on
> > flash based iscsi targets. In comparison, it takes 18s to run on a pmem device.
> > 
> > The issue is that it takes 3-4s per file write cycle on slow disks, and the
> > test does a thousand write cycles. The problem is that reflink is so much
> > faster than the write cycle that it does many more snapshots on slow disks
> > than on fast disks, and this slows it down even more.
> > 
> > e.g. the pmem system that takes 18s to run does just under 1000 snapshots -
> > roughly one per file write. 20 minutes into the iscsi based test, it had only
> > done ~300 write cycles but almost 10,000 snapshots had been taken. IOWs,
> > we're doing ~30 snapshots per file write, not ~1.
> > 
> > Fix this by rate limiting snapshots to at most one per whole-file write. This
> > reduces the number of snapshots taken on fast devices by ~50% (runtime on the
> > pmem device went from 18s to 8s) and caps it at 1000 on slow devices, cutting
> > runtime from 3671s to just 311s.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  tests/generic/166 | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/tests/generic/166 b/tests/generic/166
> > index 8600a133f2d3..9b53307b761c 100755
> > --- a/tests/generic/166
> > +++ b/tests/generic/166
> > @@ -55,6 +55,7 @@ _scratch_mount >> $seqres.full 2>&1
> >  
> >  testdir=$SCRATCH_MNT/test-$seq
> >  finished_file=/tmp/finished
> > +do_snapshot=/tmp/snapshot
> >  rm -rf $finished_file
> >  mkdir $testdir
> >  
> > @@ -68,15 +69,24 @@ _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full
> >  _scratch_cycle_mount
> >  
> >  # Snapshot creator...
> > +#
> > +# We rate limit the snapshot creator to one snapshot per full file write. This
> > +# limits the runtime on slow devices, whilst not substantially reducing the
> > +# number of snapshots taken on fast devices.
> 
> Hmm... but the whole point of this test is to try to get a simultaneous
> reflink and dio rewrite (of the source file) to collide and crash the
> system.  Adding $do_snapshot would bias the race towards the beginning
> of the dio rewrite, whereas the goal was to try to make it happen
> anywhere.

As we discussed on IRC, that still happens on faster devices
because the sleep time is longer than the file write time. On slow
devices, I don't think it really matters because the reflink race
windows are so small compared to the IO times...

> Granted, with proper locking it really doesn't matter... in the early
> days of rmap/reflink it was useful for shaking bugs out of the rmap
> code.

*nod*

> As an alternative, how about we add another helper to (sleep 120; touch
> $timeout_file) and then teach both loops to (test -f $timeout_file &&
> break)?

Not fussed. Unless there's some pressing reason why biasing the
write/reflink collision to the start of the file is a problem, I'm happy
with the 5m runtime this patch results in on these machines...

Cheers,

Dave.
Darrick J. Wong Oct. 12, 2017, 1:32 a.m. UTC | #3
On Thu, Oct 12, 2017 at 10:58:23AM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 04:41:27PM -0700, Darrick J. Wong wrote:
> > On Thu, Oct 12, 2017 at 10:15:44AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > generic/166 takes way too long to run on iscsi disks - over an *hour* on
> > > flash based iscsi targets. In comparison, it takes 18s to run on a pmem device.
> > > 
> > > The issue is that it takes 3-4s per file write cycle on slow disks, and the
> > > test does a thousand write cycles. The problem is that reflink is so much
> > > faster than the write cycle that it does many more snapshots on slow disks
> > > than on fast disks, and this slows it down even more.
> > > 
> > > e.g. the pmem system that takes 18s to run does just under 1000 snapshots -
> > > roughly one per file write. 20 minutes into the iscsi based test, it had only
> > > done ~300 write cycles but almost 10,000 snapshots had been taken. IOWs,
> > > we're doing ~30 snapshots per file write, not ~1.
> > > 
> > > Fix this by rate limiting snapshots to at most one per whole-file write. This
> > > reduces the number of snapshots taken on fast devices by ~50% (runtime on the
> > > pmem device went from 18s to 8s) and caps it at 1000 on slow devices, cutting
> > > runtime from 3671s to just 311s.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  tests/generic/166 | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/tests/generic/166 b/tests/generic/166
> > > index 8600a133f2d3..9b53307b761c 100755
> > > --- a/tests/generic/166
> > > +++ b/tests/generic/166
> > > @@ -55,6 +55,7 @@ _scratch_mount >> $seqres.full 2>&1
> > >  
> > >  testdir=$SCRATCH_MNT/test-$seq
> > >  finished_file=/tmp/finished
> > > +do_snapshot=/tmp/snapshot
> > >  rm -rf $finished_file
> > >  mkdir $testdir
> > >  
> > > @@ -68,15 +69,24 @@ _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full
> > >  _scratch_cycle_mount
> > >  
> > >  # Snapshot creator...
> > > +#
> > > +# We rate limit the snapshot creator to one snapshot per full file write. This
> > > +# limits the runtime on slow devices, whilst not substantially reducing the
> > > +# number of snapshots taken on fast devices.
> > 
> > Hmm... but the whole point of this test is to try to get a simultaneous
> > reflink and dio rewrite (of the source file) to collide and crash the
> > system.  Adding $do_snapshot would bias the race towards the beginning
> > of the dio rewrite, whereas the goal was to try to make it happen
> > anywhere.
> 
> As we discussed on IRC, that still happens on faster devices
> because the sleep time is longer than the file write time. On slow
> devices, I don't think it really matters because the reflink race
> windows are so small compared to the IO times...
> 
> > Granted, with proper locking it really doesn't matter... in the early
> > days of rmap/reflink it was useful for shaking bugs out of the rmap
> > code.
> 
> *nod*
> 
> > As an alternative, how about we add another helper to (sleep 120; touch
> > $timeout_file) and then teach both loops to (test -f $timeout_file &&
> > break)?
> 
> Not fussed. Unless there's some pressing reason why biasing the
> write/reflink collision to the start of the file is a problem, I'm happy
> with the 5m runtime this patch results in on these machines...

Not really.  In theory the two files should both be locked during the
reflink so it probably doesn't matter exactly when they race...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Patch

diff --git a/tests/generic/166 b/tests/generic/166
index 8600a133f2d3..9b53307b761c 100755
--- a/tests/generic/166
+++ b/tests/generic/166
@@ -55,6 +55,7 @@  _scratch_mount >> $seqres.full 2>&1
 
 testdir=$SCRATCH_MNT/test-$seq
 finished_file=/tmp/finished
+do_snapshot=/tmp/snapshot
 rm -rf $finished_file
 mkdir $testdir
 
@@ -68,15 +69,24 @@  _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full
 _scratch_cycle_mount
 
 # Snapshot creator...
+#
+# We rate limit the snapshot creator to one snapshot per full file write. This
+# limits the runtime on slow devices, whilst not substantially reducing the
+# number of snapshots taken on fast devices.
 snappy() {
 	n=0
 	while [ ! -e $finished_file ]; do
+		if [ ! -e $do_snapshot ]; then
+			sleep 0.01
+			continue;
+		fi
 		out="$(_cp_reflink $testdir/file1 $testdir/snap_$n 2>&1)"
 		res=$?
 		echo "$out" | grep -q "No space left" && break
 		test -n "$out" && echo "$out"
 		test $res -ne 0 && break
 		n=$((n + 1))
+		rm -f $do_snapshot
 	done
 }
 
@@ -84,6 +94,7 @@  echo "Snapshot a file undergoing directio rewrite"
 snappy &
 seq $nr_loops -1 0 | while read i; do
 	_pwrite_byte 0x63 $((i * blksz)) $blksz -d $testdir/file1 >> $seqres.full
+	touch $do_snapshot
 done
 touch $finished_file
 wait