Message ID | 173197064562.904310.6083759089693476713.stgit@frogsfrogsfrogs (mailing list archive)
---|---
State | New, archived
Series | [01/12] generic/757: fix various bugs in this test
On Mon, Nov 18, 2024 at 03:03:43PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> On my test fleet, this test can run for well in excess of 20 minutes:
>
>    613 generic/251
>    616 generic/251
>    624 generic/251
>    630 generic/251
>    634 generic/251
>    652 generic/251
>    675 generic/251
>    749 generic/251
>    777 generic/251
>    808 generic/251
>    832 generic/251
>    946 generic/251
>   1082 generic/251
>   1221 generic/251
>   1241 generic/251
>   1254 generic/251
>   1305 generic/251
>   1366 generic/251
>   1646 generic/251
>   1936 generic/251
>   1952 generic/251
>   2358 generic/251
>   4359 generic/251
>   5325 generic/251
>  34046 generic/251
>
> because it hardcodes 20 threads and 10 copies. It's not great to have a
> test that results in a significant fraction of the total test runtime.
> Fix the looping and load on this test to use LOAD and TIME_FACTOR to
> scale up its operations, along with the usual SOAK_DURATION override.
> That brings the default runtime down to less than a minute.
>
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

Question for you: Does your $here directory contain a .git subdir?

One of the causes of long runtime for me has been that $here might
only contain 30MB of files, but the .git subdir balloons to several
hundred MB over time, resulting in really long runtimes because it's
copying GBs of data from the .git subdir.

I have this patch in my tree:

--- a/tests/generic/251
+++ b/tests/generic/251
@@ -175,9 +175,12 @@ nproc=20
 # Copy $here to the scratch fs and make coipes of the replica. The fstests
 # output (and hence $seqres.full) could be in $here, so we need to snapshot
 # $here before computing file checksums.
+#
+# $here/* as the files to copy so we avoid any .git directory that might be
+# much, much larger than the rest of the fstests source tree we are copying.
 content=$SCRATCH_MNT/orig
 mkdir -p $content
-cp -axT $here/ $content/
+cp -ax $here/* $content/

 mkdir -p $tmp

And that's made the runtime drop from (typically) 10-15 minutes
down to around 5 minutes....

Does this have any impact on the runtime on your test systems?

-Dave.
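The behavioural difference between the two cp invocations comes down to shell globbing: `$here/*` never expands to dot-entries, so a top-level .git is silently skipped. A minimal sketch with throwaway temp paths (not the real fstests $here) demonstrating this:

```shell
#!/bin/sh
# Demonstrate that `cp -ax $src/*` skips top-level dotfiles (like .git),
# while `cp -axT $src/ $dst/` copies the whole tree. All paths here are
# throwaway examples, not the real fstests $here.
set -e
src=$(mktemp -d); dst1=$(mktemp -d); dst2=$(mktemp -d)

mkdir -p "$src/.git" "$src/tests"
echo history > "$src/.git/blob"
echo test > "$src/tests/generic-251"

cp -axT "$src/" "$dst1/"    # -T: $dst1 is the tree root, dotfiles included
cp -ax "$src"/* "$dst2/"    # the glob expands to non-hidden names only

[ -d "$dst1/.git" ] && axT_copied_git=yes || axT_copied_git=no
[ -d "$dst2/.git" ] && glob_copied_git=yes || glob_copied_git=no
echo "cp -axT copied .git: $axT_copied_git"
echo "cp -ax \$src/* copied .git: $glob_copied_git"
```

Note that the glob form also changes symlink-at-top-level and empty-directory semantics slightly; for this test only the dotfile skip matters.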
On Tue, Nov 19, 2024 at 12:45:05PM +1100, Dave Chinner wrote:
> Question for you: Does your $here directory contain a .git subdir?
>
> One of the causes of long runtime for me has been that $here might
> only contain 30MB of files, but the .git subdir balloons to several
> hundred MB over time, resulting in really long runtimes because it's
> copying GBs of data from the .git subdir.

Or the results/ directory when run in a persistent test VM like the
one for quick runs on my laptop. I currently need to persistently
purge that for just this test.

> I have this patch in my tree:
>
> --- a/tests/generic/251
> +++ b/tests/generic/251
> @@ -175,9 +175,12 @@ nproc=20
> # Copy $here to the scratch fs and make coipes of the replica. The fstests
> # output (and hence $seqres.full) could be in $here, so we need to snapshot
> # $here before computing file checksums.
> +#
> +# $here/* as the files to copy so we avoid any .git directory that might be
> +# much, much larger than the rest of the fstests source tree we are copying.
> content=$SCRATCH_MNT/orig
> mkdir -p $content
> -cp -axT $here/ $content/
> +cp -ax $here/* $content/

Maybe we just need a way to generate more predictable file system
content?
On Mon, Nov 18, 2024 at 10:13:23PM -0800, Christoph Hellwig wrote:
> On Tue, Nov 19, 2024 at 12:45:05PM +1100, Dave Chinner wrote:
> > Question for you: Does your $here directory contain a .git subdir?
> >
> > One of the causes of long runtime for me has been that $here might
> > only contain 30MB of files, but the .git subdir balloons to several
> > hundred MB over time, resulting in really long runtimes because it's
> > copying GBs of data from the .git subdir.
>
> Or the results/ directory when run in a persistent test VM like the
> one for quick runs on my laptop. I currently need to persistently
> purge that for just this test.
>
> > I have this patch in my tree:
> >
> > --- a/tests/generic/251
> > +++ b/tests/generic/251
> > @@ -175,9 +175,12 @@ nproc=20
> > # Copy $here to the scratch fs and make coipes of the replica. The fstests
> > # output (and hence $seqres.full) could be in $here, so we need to snapshot
> > # $here before computing file checksums.
> > +#
> > +# $here/* as the files to copy so we avoid any .git directory that might be
> > +# much, much larger than the rest of the fstests source tree we are copying.
> > content=$SCRATCH_MNT/orig
> > mkdir -p $content
> > -cp -axT $here/ $content/
> > +cp -ax $here/* $content/
>
> Maybe we just need a way to generate more predictable file system
> content?

How about running fsstress for ~50000ops or so, to generate some test
files and directory tree?

--D
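fsstress would give a realistic tree, though run-to-run reproducibility depends on fixing its seed. As an illustration of the "predictable content" idea using nothing but coreutils and awk, here is a sketch; the directory counts, block size, and seed are arbitrary choices for the example, not values from the thread:

```shell
#!/bin/sh
# Sketch: generate a predictable dataset instead of copying $here.
# A fixed seed drives awk's srand(), so every run produces the same
# file sizes; paths here are throwaway examples.
set -e
base=$(mktemp -d)
seed=251

for d in 0 1 2 3; do
    mkdir -p "$base/$d"
    for f in 0 1 2 3 4; do
        # Deterministic pseudo-random length, 1-4 "blocks" of 4096 bytes.
        blocks=$(awk -v s=$((seed + d * 5 + f)) \
            'BEGIN { srand(s); print 1 + int(rand() * 4) }')
        dd if=/dev/zero of="$base/$d/$f" bs=4096 count="$blocks" 2>/dev/null
    done
done

nfiles=$(find "$base" -type f | wc -l)
echo "created $nfiles files under $base"
```

The exact sizes still depend on the awk implementation's rand(), but they are stable on any one test machine, which is what matters for comparing runtimes across runs.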
On Tue, Nov 19, 2024 at 12:45:05PM +1100, Dave Chinner wrote:
> On Mon, Nov 18, 2024 at 03:03:43PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > On my test fleet, this test can run for well in excess of 20 minutes:
> >
> >    613 generic/251
> >    616 generic/251
> >    624 generic/251
> >    630 generic/251
> >    634 generic/251
> >    652 generic/251
> >    675 generic/251
> >    749 generic/251
> >    777 generic/251
> >    808 generic/251
> >    832 generic/251
> >    946 generic/251
> >   1082 generic/251
> >   1221 generic/251
> >   1241 generic/251
> >   1254 generic/251
> >   1305 generic/251
> >   1366 generic/251
> >   1646 generic/251
> >   1936 generic/251
> >   1952 generic/251
> >   2358 generic/251
> >   4359 generic/251
> >   5325 generic/251
> >  34046 generic/251
> >
> > because it hardcodes 20 threads and 10 copies. It's not great to have a
> > test that results in a significant fraction of the total test runtime.
> > Fix the looping and load on this test to use LOAD and TIME_FACTOR to
> > scale up its operations, along with the usual SOAK_DURATION override.
> > That brings the default runtime down to less than a minute.
> >
> > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
>
> Question for you: Does your $here directory contain a .git subdir?
>
> One of the causes of long runtime for me has been that $here might
> only contain 30MB of files, but the .git subdir balloons to several
> hundred MB over time, resulting in really long runtimes because it's
> copying GBs of data from the .git subdir.
>
> I have this patch in my tree:
>
> --- a/tests/generic/251
> +++ b/tests/generic/251
> @@ -175,9 +175,12 @@ nproc=20
> # Copy $here to the scratch fs and make coipes of the replica. The fstests
> # output (and hence $seqres.full) could be in $here, so we need to snapshot
> # $here before computing file checksums.
> +#
> +# $here/* as the files to copy so we avoid any .git directory that might be
> +# much, much larger than the rest of the fstests source tree we are copying.
> content=$SCRATCH_MNT/orig
> mkdir -p $content
> -cp -axT $here/ $content/
> +cp -ax $here/* $content/
>
> mkdir -p $tmp
>
> And that's made the runtime drop from (typically) 10-15 minutes
> down to around 5 minutes....
>
> Does this have any impact on the runtime on your test systems?

Nope, I do vpath builds (sort of) so there's no .git history getting
sucked up by generic/251. The fstests directory on the test VMs is
~34MB spread across ~4800 files.

--D

> -Dave.
>
> --
> Dave Chinner
> david@fromorbit.com
On Tue, Nov 19, 2024 at 07:45:20AM -0800, Darrick J. Wong wrote:
> On Mon, Nov 18, 2024 at 10:13:23PM -0800, Christoph Hellwig wrote:
> > On Tue, Nov 19, 2024 at 12:45:05PM +1100, Dave Chinner wrote:
> > > Question for you: Does your $here directory contain a .git subdir?
> > >
> > > One of the causes of long runtime for me has been that $here might
> > > only contain 30MB of files, but the .git subdir balloons to several
> > > hundred MB over time, resulting in really long runtimes because it's
> > > copying GBs of data from the .git subdir.
> >
> > Or the results/ directory when run in a persistent test VM like the
> > one for quick runs on my laptop. I currently need to persistently
> > purge that for just this test.

Yeah, I use persistent VMs and that's why the .git dir grows...

> > > --- a/tests/generic/251
> > > +++ b/tests/generic/251
> > > @@ -175,9 +175,12 @@ nproc=20
> > > # Copy $here to the scratch fs and make coipes of the replica. The fstests
> > > # output (and hence $seqres.full) could be in $here, so we need to snapshot
> > > # $here before computing file checksums.
> > > +#
> > > +# $here/* as the files to copy so we avoid any .git directory that might be
> > > +# much, much larger than the rest of the fstests source tree we are copying.
> > > content=$SCRATCH_MNT/orig
> > > mkdir -p $content
> > > -cp -axT $here/ $content/
> > > +cp -ax $here/* $content/
> >
> > Maybe we just need a way to generate more predictable file system
> > content?
>
> How about running fsstress for ~50000ops or so, to generate some test
> files and directory tree?

Do we even need to do that? It's a set of small files distributed
over a few directories. There are few large files in the mix, so we
could just create a heap of 1-4 block files across a dozen or so
directories and get the same sort of data set to copy.

And given this observation, if we are generating the data set in the
first place, why use cp to copy it every time?
Why not just have each thread generate the data set on the fly?

# create a directory structure with numdirs directories and numfiles
# files per directory. Files are 0-3 blocks in length, space is
# allocated by fallocate to avoid needing to write data. Files are
# created concurrently across directories to create the data set as
# fast as possible.
create_files()
{
	local numdirs=$1
	local numfiles=$2
	local basedir=$3

	for ((i=0; i<$numdirs; i++)); do
		mkdir -p $basedir/$i
		for ((j=0; j<$numfiles; j++)); do
			local len=$((RANDOM % 4))
			$XFS_IO_PROG -fc "falloc 0 ${len}b" $basedir/$i/$j
		done &
	done
	wait
}

-Dave
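Dave's helper leans on the fstests $XFS_IO_PROG wrapper. A self-contained variant of the same pattern, with coreutils' truncate -s standing in for the fallocate call (so files come out sparse rather than preallocated, which is enough to show the per-directory concurrency structure):

```shell
#!/bin/bash
# Variant of the create_files sketch that needs only coreutils:
# truncate -s replaces xfs_io's falloc. Each directory is populated
# by its own background job, then we wait for all of them.
create_files()
{
	local numdirs=$1
	local numfiles=$2
	local basedir=$3

	for ((i = 0; i < numdirs; i++)); do
		mkdir -p "$basedir/$i"
		for ((j = 0; j < numfiles; j++)); do
			# 0-3 filesystem blocks of 4096 bytes each.
			local len=$((RANDOM % 4))
			truncate -s $((len * 4096)) "$basedir/$i/$j"
		done &
	done
	wait
}

base=$(mktemp -d)
create_files 3 10 "$base"
total=$(find "$base" -type f | wc -l)
echo "created $total files"
```

Because all per-directory jobs run in the background and are reaped by a single wait, the wall-clock cost is roughly that of the slowest directory, not the sum.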
On Wed, Nov 20, 2024 at 08:04:43AM +1100, Dave Chinner wrote:
> On Tue, Nov 19, 2024 at 07:45:20AM -0800, Darrick J. Wong wrote:
> > On Mon, Nov 18, 2024 at 10:13:23PM -0800, Christoph Hellwig wrote:
> > > On Tue, Nov 19, 2024 at 12:45:05PM +1100, Dave Chinner wrote:
> > > > Question for you: Does your $here directory contain a .git subdir?
> > > >
> > > > One of the causes of long runtime for me has been that $here might
> > > > only contain 30MB of files, but the .git subdir balloons to several
> > > > hundred MB over time, resulting in really long runtimes because it's
> > > > copying GBs of data from the .git subdir.
> > >
> > > Or the results/ directory when run in a persistent test VM like the
> > > one for quick runs on my laptop. I currently need to persistently
> > > purge that for just this test.
>
> Yeah, I use persistent VMs and that's why the .git dir grows...
>
> > > > --- a/tests/generic/251
> > > > +++ b/tests/generic/251
> > > > @@ -175,9 +175,12 @@ nproc=20
> > > > # Copy $here to the scratch fs and make coipes of the replica. The fstests
> > > > # output (and hence $seqres.full) could be in $here, so we need to snapshot
> > > > # $here before computing file checksums.
> > > > +#
> > > > +# $here/* as the files to copy so we avoid any .git directory that might be
> > > > +# much, much larger than the rest of the fstests source tree we are copying.
> > > > content=$SCRATCH_MNT/orig
> > > > mkdir -p $content
> > > > -cp -axT $here/ $content/
> > > > +cp -ax $here/* $content/
> > >
> > > Maybe we just need a way to generate more predictable file system
> > > content?
> >
> > How about running fsstress for ~50000ops or so, to generate some test
> > files and directory tree?
>
> Do we even need to do that? It's a set of small files distributed
> over a few directories. There are few large files in the mix, so we
> could just create a heap of 1-4 block files across a dozen or so
> directories and get the same sort of data set to copy.
>
> And given this observation, if we are generating the data set in the
> first place, why use cp to copy it every time? Why not just have
> each thread generate the data set on the fly?

run_process compares the copies to the original to try to discover
places where written blocks got discarded, so they actually do need
to be copies.

/me suspects that this test is kinda bogus if the block device
doesn't set discard_zeroes_data because it won't trip on discard
errors for crappy sata ssds that don't actually clear the remapping
tables until minutes later.

--D

> # create a directory structure with numdirs directories and numfiles
> # files per directory. Files are 0-3 blocks in length, space is
> # allocated by fallocate to avoid needing to write data. Files are
> # created concurrently across directories to create the data set as
> # fast as possible.
> create_files()
> {
> 	local numdirs=$1
> 	local numfiles=$2
> 	local basedir=$3
>
> 	for ((i=0; i<$numdirs; i++)); do
> 		mkdir -p $basedir/$i
> 		for ((j=0; j<$numfiles; j++)); do
> 			local len=$((RANDOM % 4))
> 			$XFS_IO_PROG -fc "falloc 0 ${len}b" $basedir/$i/$j
> 		done &
> 	done
> 	wait
> }
>
> -Dave
>
> --
> Dave Chinner
> david@fromorbit.com
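For reference, discard_zeroes_data is exposed per-device under sysfs, but kernels since v4.12 hardwire it to 0 precisely because devices lied about the guarantee, so it can no longer prove that discarded blocks read back as zeroes. A purely illustrative query:

```shell
#!/bin/sh
# List the discard_zeroes_data queue attribute for each block device.
# Kernels since v4.12 report 0 unconditionally, so this attribute can
# no longer be trusted to mean "discarded blocks read back as zeroes".
report=""
for q in /sys/block/*/queue/discard_zeroes_data; do
    [ -e "$q" ] || continue
    dev=${q#/sys/block/}
    dev=${dev%%/*}
    report="$report$dev: discard_zeroes_data=$(cat "$q")
"
done
[ -n "$report" ] || report="no block devices expose discard_zeroes_data
"
printf '%s' "$report"
```

On hardware that genuinely zeroes on discard, the only reliable check left is behavioural: discard a range and read it back, which is essentially what generic/251's checksum comparison does.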
diff --git a/tests/generic/251 b/tests/generic/251
index d59e91c3e0a33a..b4ddda10cef403 100755
--- a/tests/generic/251
+++ b/tests/generic/251
@@ -15,7 +15,6 @@ _begin_fstest ioctl trim auto
 tmp=`mktemp -d`
 trap "_cleanup; exit \$status" 0 1 3
 trap "_destroy; exit \$status" 2 15
-chpid=0
 mypid=$$

 # Import common functions.
@@ -151,29 +150,28 @@ function check_sums() {
 function run_process() {
 	local p=$1
-	repeat=10
+	if [ -n "$SOAK_DURATION" ]; then
+		local duration="$SOAK_DURATION"
+	else
+		local duration="$((30 * TIME_FACTOR))"
+	fi
+	local stopat="$(( $(date +%s) + duration))"

-	sleep $((5*$p))s &
-	export chpid=$! && wait $chpid &> /dev/null
-	chpid=0
-
-	while [ $repeat -gt 0 ]; do
+	sleep $((5*$p))s
+	while [ "$(date +%s)" -lt "$stopat" ]; do
 		# Remove old directories.
 		rm -rf $SCRATCH_MNT/$p
-		export chpid=$! && wait $chpid &> /dev/null

 		# Copy content -> partition.
 		mkdir $SCRATCH_MNT/$p
 		cp -axT $content/ $SCRATCH_MNT/$p/
-		export chpid=$! && wait $chpid &> /dev/null

 		check_sums
-		repeat=$(( $repeat - 1 ))
 	done
 }

-nproc=20
+nproc=$((4 * LOAD_FACTOR))

 # Copy $here to the scratch fs and make coipes of the replica. The fstests
 # output (and hence $seqres.full) could be in $here, so we need to snapshot
@@ -194,11 +192,9 @@ pids=""
 echo run > $tmp.fstrim_loop
 fstrim_loop &
 fstrim_pid=$!
-p=1
-while [ $p -le $nproc ]; do
+for ((p = 1; p < nproc; p++)); do
 	run_process $p &
 	pids="$pids $!"
-	p=$(($p+1))
 done

 echo "done."
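The core change in the patch above replaces a fixed repeat count with a wall-clock deadline. The pattern in isolation, with the numbers shrunk so the sketch finishes in a couple of seconds (the real test uses 30 * TIME_FACTOR and a full rm/copy/checksum pass in the loop body):

```shell
#!/bin/sh
# The deadline-loop pattern from the patch: honour SOAK_DURATION if the
# user set it, otherwise scale a default by TIME_FACTOR, then loop on
# wall-clock time instead of a fixed iteration count.
TIME_FACTOR=${TIME_FACTOR:-1}
if [ -n "$SOAK_DURATION" ]; then
    duration=$SOAK_DURATION
else
    duration=$((2 * TIME_FACTOR))    # the real test uses 30 * TIME_FACTOR
fi
stopat=$(( $(date +%s) + duration ))

iters=0
while [ "$(date +%s)" -lt "$stopat" ]; do
    # Stand-in for one rm/copy/checksum pass of run_process().
    sleep 1
    iters=$((iters + 1))
done
echo "ran $iters passes in ${duration}s"
```

The appeal of this shape is that runtime is bounded regardless of how slow the storage is, whereas a fixed repeat count makes slow devices pay a proportionally larger share of the total test time.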