diff mbox series

btrfs/179: optimize remove file selection

Message ID 20230817051317.3825299-1-naohiro.aota@wdc.com (mailing list archive)
State New, archived
Headers show
Series btrfs/179: optimize remove file selection | expand

Commit Message

Naohiro Aota Aug. 17, 2023, 5:13 a.m. UTC
Currently, we use "ls ... | sort -R | head -n1" to choose a removing
victim. It sorts the files with "ls", sort it randomly and pick the first
line, which wastes the "ls" sort.

Also, using "sort -R | head -n1" is inefficient. For example, in a
directory with 1000000 files, it takes more than 15 seconds to pick a file.

  $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
  bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total

  $ time bash -c "ls -U | shuf -n 1 >/dev/null"
  bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total

So, just use "ls -U" and "shuf -n 1" to choose a victim.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 tests/btrfs/179 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Josef Bacik Aug. 17, 2023, 12:58 p.m. UTC | #1
On Thu, Aug 17, 2023 at 02:13:17PM +0900, Naohiro Aota wrote:
> Currently, we use "ls ... | sort -R | head -n1" to choose a removing
> victim. It sorts the files with "ls", sort it randomly and pick the first
> line, which wastes the "ls" sort.
> 
> Also, using "sort -R | head -n1" is inefficient. For example, in a
> directory with 1000000 files, it takes more than 15 seconds to pick a file.
> 
>   $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
>   bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
> 
>   $ time bash -c "ls -U | shuf -n 1 >/dev/null"
>   bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
> 
> So, just use "ls -U" and "shuf -n 1" to choose a victim.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
Zorro Lang Aug. 18, 2023, 7:35 p.m. UTC | #2
On Thu, Aug 17, 2023 at 02:13:17PM +0900, Naohiro Aota wrote:
> Currently, we use "ls ... | sort -R | head -n1" to choose a removing
> victim. It sorts the files with "ls", sort it randomly and pick the first
> line, which wastes the "ls" sort.
> 
> Also, using "sort -R | head -n1" is inefficient. For example, in a
> directory with 1000000 files, it takes more than 15 seconds to pick a file.
> 
>   $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
>   bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
> 
>   $ time bash -c "ls -U | shuf -n 1 >/dev/null"
>   bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
> 
> So, just use "ls -U" and "shuf -n 1" to choose a victim.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  tests/btrfs/179 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/btrfs/179 b/tests/btrfs/179
> index 2f17c9f9fb4a..0fbd875cf01b 100755
> --- a/tests/btrfs/179
> +++ b/tests/btrfs/179
> @@ -45,7 +45,7 @@ fill_workload()
>  
>  		# Randomly remove some files for every 5 loop
>  		if [ $(( $i % 5 )) -eq 0 ]; then
> -			victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
> +			victim=$(ls -U "$SCRATCH_MNT/src" | shuf -n 1)

Thanks for this improvement. This case has two lines have this similar logic,
Why not change them both?

And btrfs/192 has a similar line too:

$ grep -rsn -- "sort -R" tests
tests/btrfs/179:48:                     victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
tests/btrfs/179:72:             victim=$(ls "$SCRATCH_MNT/snapshots" | sort -R | head -n1)
tests/btrfs/192:75:     echo "$basedir/$(ls $basedir | sort -R | tail -1)"
tests/btrfs/004:204:    for file in `find $dir -name f\* -size +0 | sort -R`; do

Do we need to change that too? And a common helper might help, if more cases
would like to have this helper?

Thanks,
Zorro

>  			rm "$SCRATCH_MNT/src/$victim"
>  		fi
>  		i=$((i + 1))
> -- 
> 2.41.0
>
Naohiro Aota Aug. 21, 2023, 5:17 a.m. UTC | #3
On Sat, Aug 19, 2023 at 03:35:33AM +0800, Zorro Lang wrote:
> On Thu, Aug 17, 2023 at 02:13:17PM +0900, Naohiro Aota wrote:
> > Currently, we use "ls ... | sort -R | head -n1" to choose a removing
> > victim. It sorts the files with "ls", sort it randomly and pick the first
> > line, which wastes the "ls" sort.
> > 
> > Also, using "sort -R | head -n1" is inefficient. For example, in a
> > directory with 1000000 files, it takes more than 15 seconds to pick a file.
> > 
> >   $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
> >   bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
> > 
> >   $ time bash -c "ls -U | shuf -n 1 >/dev/null"
> >   bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
> > 
> > So, just use "ls -U" and "shuf -n 1" to choose a victim.
> > 
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> >  tests/btrfs/179 | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tests/btrfs/179 b/tests/btrfs/179
> > index 2f17c9f9fb4a..0fbd875cf01b 100755
> > --- a/tests/btrfs/179
> > +++ b/tests/btrfs/179
> > @@ -45,7 +45,7 @@ fill_workload()
> >  
> >  		# Randomly remove some files for every 5 loop
> >  		if [ $(( $i % 5 )) -eq 0 ]; then
> > -			victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
> > +			victim=$(ls -U "$SCRATCH_MNT/src" | shuf -n 1)
> 
> Thanks for this improvement. This case has two lines have this similar logic,
> Why not change them both?

Indeed. I forgot to pick it.

> And btrfs/192 has a similar line too:
> 
> $ grep -rsn -- "sort -R" tests
> tests/btrfs/179:48:                     victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
> tests/btrfs/179:72:             victim=$(ls "$SCRATCH_MNT/snapshots" | sort -R | head -n1)
> tests/btrfs/192:75:     echo "$basedir/$(ls $basedir | sort -R | tail -1)"
> tests/btrfs/004:204:    for file in `find $dir -name f\* -size +0 | sort -R`; do

Yeah, we can use "shuf" there too. I found even the simple "sort -R"
(without head or tail) is really slower than "shuf". I'll address them too.

> 
> Do we need to change that too? And a common helper might help, if more cases
> would like to have this helper?

Yeah, maybe, we can define _random_file() as named in btrfs/192.

> Thanks,
> Zorro
> 
> >  			rm "$SCRATCH_MNT/src/$victim"
> >  		fi
> >  		i=$((i + 1))
> > -- 
> > 2.41.0
> > 
>
diff mbox series

Patch

diff --git a/tests/btrfs/179 b/tests/btrfs/179
index 2f17c9f9fb4a..0fbd875cf01b 100755
--- a/tests/btrfs/179
+++ b/tests/btrfs/179
@@ -45,7 +45,7 @@  fill_workload()
 
 		# Randomly remove some files for every 5 loop
 		if [ $(( $i % 5 )) -eq 0 ]; then
-			victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
+			victim=$(ls -U "$SCRATCH_MNT/src" | shuf -n 1)
 			rm "$SCRATCH_MNT/src/$victim"
 		fi
 		i=$((i + 1))