Message ID: 20230817051317.3825299-1-naohiro.aota@wdc.com (mailing list archive)
State: New, archived
Series: btrfs/179: optimize remove file selection
On Thu, Aug 17, 2023 at 02:13:17PM +0900, Naohiro Aota wrote:
> Currently, we use "ls ... | sort -R | head -n1" to choose a removal
> victim. It sorts the files with "ls", sorts them randomly, and picks
> the first line, which wastes the initial "ls" sort.
>
> Also, using "sort -R | head -n1" is inefficient. For example, in a
> directory with 1000000 files, it takes more than 15 seconds to pick a
> file.
>
> $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
> bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
>
> $ time bash -c "ls -U | shuf -n 1 >/dev/null"
> bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
>
> So, just use "ls -U" and "shuf -n 1" to choose a victim.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
On Thu, Aug 17, 2023 at 02:13:17PM +0900, Naohiro Aota wrote:
> Currently, we use "ls ... | sort -R | head -n1" to choose a removal
> victim. It sorts the files with "ls", sorts them randomly, and picks
> the first line, which wastes the initial "ls" sort.
>
> Also, using "sort -R | head -n1" is inefficient. For example, in a
> directory with 1000000 files, it takes more than 15 seconds to pick a
> file.
>
> $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
> bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
>
> $ time bash -c "ls -U | shuf -n 1 >/dev/null"
> bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
>
> So, just use "ls -U" and "shuf -n 1" to choose a victim.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  tests/btrfs/179 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/btrfs/179 b/tests/btrfs/179
> index 2f17c9f9fb4a..0fbd875cf01b 100755
> --- a/tests/btrfs/179
> +++ b/tests/btrfs/179
> @@ -45,7 +45,7 @@ fill_workload()
>
> 	# Randomly remove some files for every 5 loop
> 	if [ $(( $i % 5 )) -eq 0 ]; then
> -		victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
> +		victim=$(ls -U "$SCRATCH_MNT/src" | shuf -n 1)

Thanks for this improvement. This test case has two lines with this
similar logic, why not change them both?

And btrfs/192 has a similar line too:

$ grep -rsn -- "sort -R" tests
tests/btrfs/179:48:		victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
tests/btrfs/179:72:		victim=$(ls "$SCRATCH_MNT/snapshots" | sort -R | head -n1)
tests/btrfs/192:75:	echo "$basedir/$(ls $basedir | sort -R | tail -1)"
tests/btrfs/004:204:	for file in `find $dir -name f\* -size +0 | sort -R`; do

Do we need to change those too? And a common helper might help, if more
cases would like to use this logic?

Thanks,
Zorro

> 		rm "$SCRATCH_MNT/src/$victim"
> 	fi
> 	i=$((i + 1))
> --
> 2.41.0
>
On Sat, Aug 19, 2023 at 03:35:33AM +0800, Zorro Lang wrote:
> On Thu, Aug 17, 2023 at 02:13:17PM +0900, Naohiro Aota wrote:
> > Currently, we use "ls ... | sort -R | head -n1" to choose a removal
> > victim. It sorts the files with "ls", sorts them randomly, and picks
> > the first line, which wastes the initial "ls" sort.
> >
> > [...]
> >
> > @@ -45,7 +45,7 @@ fill_workload()
> >
> > 	# Randomly remove some files for every 5 loop
> > 	if [ $(( $i % 5 )) -eq 0 ]; then
> > -		victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
> > +		victim=$(ls -U "$SCRATCH_MNT/src" | shuf -n 1)
>
> Thanks for this improvement. This test case has two lines with this
> similar logic, why not change them both?

Indeed. I forgot to pick it.

> And btrfs/192 has a similar line too:
>
> $ grep -rsn -- "sort -R" tests
> tests/btrfs/179:48:		victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
> tests/btrfs/179:72:		victim=$(ls "$SCRATCH_MNT/snapshots" | sort -R | head -n1)
> tests/btrfs/192:75:	echo "$basedir/$(ls $basedir | sort -R | tail -1)"
> tests/btrfs/004:204:	for file in `find $dir -name f\* -size +0 | sort -R`; do

Yeah, we can use "shuf" there too. I found that even a simple "sort -R"
(without head or tail) is really slow compared to "shuf". I'll address
them too.

> Do we need to change those too? And a common helper might help, if more
> cases would like to use this logic?

Yeah, maybe we can define _random_file(), as named in btrfs/192.

> Thanks,
> Zorro
>
> > 		rm "$SCRATCH_MNT/src/$victim"
> > 	fi
> > 	i=$((i + 1))
> > --
> > 2.41.0
> >
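A shared helper along the lines discussed above could look like the sketch below. The name `_random_file()` comes from the local helper in btrfs/192; its exact final form and location in fstests are assumptions, and the usage example below (temporary directory, `file1`..`file100`) is purely illustrative:

```shell
#!/bin/bash
# Sketch of a possible shared helper (name taken from btrfs/192's local
# helper; the final fstests form is an assumption).
# Print the path of one randomly chosen entry in the given directory.
# "ls -U" skips ls's alphabetical sort, and "shuf -n 1" picks a single
# line without shuffling the entire list the way "sort -R" does.
_random_file()
{
	local basedir=$1

	echo "$basedir/$(ls -U "$basedir" | shuf -n 1)"
}

# Hypothetical usage: pick a removal victim from a populated directory.
dir=$(mktemp -d)
touch "$dir"/file{1..100}
victim=$(_random_file "$dir")
echo "victim: $victim"
rm -r "$dir"
```

Callers that only want the file name rather than the full path (as in btrfs/179) could instead use `basename` on the result or drop the `$basedir/` prefix from the `echo`.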
diff --git a/tests/btrfs/179 b/tests/btrfs/179
index 2f17c9f9fb4a..0fbd875cf01b 100755
--- a/tests/btrfs/179
+++ b/tests/btrfs/179
@@ -45,7 +45,7 @@ fill_workload()
 
 	# Randomly remove some files for every 5 loop
 	if [ $(( $i % 5 )) -eq 0 ]; then
-		victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
+		victim=$(ls -U "$SCRATCH_MNT/src" | shuf -n 1)
 		rm "$SCRATCH_MNT/src/$victim"
 	fi
 	i=$((i + 1))
Currently, we use "ls ... | sort -R | head -n1" to choose a removal
victim. It sorts the files with "ls", sorts them randomly, and picks
the first line, which wastes the initial "ls" sort.

Also, using "sort -R | head -n1" is inefficient. For example, in a
directory with 1000000 files, it takes more than 15 seconds to pick a
file.

$ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total

$ time bash -c "ls -U | shuf -n 1 >/dev/null"
bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total

So, just use "ls -U" and "shuf -n 1" to choose a victim.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 tests/btrfs/179 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
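The measurement above can be reproduced with a small standalone script like the following sketch. The directory location and file count are arbitrary choices for illustration, and absolute timings will vary by machine:

```shell
#!/bin/bash
# Reproduce the "sort -R | head -n 1" vs "shuf -n 1" comparison from the
# commit message. Creates 100000 empty files in a temporary directory,
# then times both ways of picking one at random.
dir=$(mktemp -d)
(cd "$dir" && seq 1 100000 | xargs touch)

echo "== ls -U | sort -R | head -n 1 =="
time { ls -U "$dir" | sort -R | head -n 1 >/dev/null; }

echo "== ls -U | shuf -n 1 =="
time { ls -U "$dir" | shuf -n 1 >/dev/null; }

# Both pipelines emit exactly one existing file name; show one pick.
pick=$(ls -U "$dir" | shuf -n 1)
echo "picked: $pick"
rm -r "$dir"
```

"shuf -n 1" wins because it only needs to keep one reservoir-sampled line, while "sort -R" hashes and orders every line before "head" discards all but the first.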