diff mbox series

[v3,1/3] common/rc: introduce _random_file() helper

Message ID 63147107b1aee89c21ef848857e0dc3964134392.1692600259.git.naohiro.aota@wdc.com (mailing list archive)
State New, archived
Headers show
Series use shuf to choose a random file | expand

Commit Message

Naohiro Aota Aug. 21, 2023, 7:12 a.m. UTC
Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a
random file in a directory.It sorts the files with "ls", sort it randomly
and pick the first line, which wastes the "ls" sort.

Also, using "sort -R | head -n1" is inefficient. For example, in a
directory with 1000000 files, it takes more than 15 seconds to pick a file.

  $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
  bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total

  $ time bash -c "ls -U | shuf -n 1 >/dev/null"
  bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total

So, we should just use "ls -U" and "shuf -n 1" to choose a random file.
Introduce _random_file() helper to do it properly.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/rc | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Anand Jain Aug. 21, 2023, 9:24 a.m. UTC | #1
LGTM

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Zorro Lang Aug. 25, 2023, 1:39 p.m. UTC | #2
On Mon, Aug 21, 2023 at 04:12:11PM +0900, Naohiro Aota wrote:
> Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a
> random file in a directory.It sorts the files with "ls", sort it randomly
> and pick the first line, which wastes the "ls" sort.
> 
> Also, using "sort -R | head -n1" is inefficient. For example, in a
> directory with 1000000 files, it takes more than 15 seconds to pick a file.
> 
>   $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
>   bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
> 
>   $ time bash -c "ls -U | shuf -n 1 >/dev/null"
>   bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
> 
> So, we should just use "ls -U" and "shuf -n 1" to choose a random file.
> Introduce _random_file() helper to do it properly.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  common/rc | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/common/rc b/common/rc
> index 5c4429ed0425..4d414955f6d9 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -5224,6 +5224,13 @@ _soak_loop_running() {
>  	return 0
>  }
>  
> +# Return a random file in a directory. A directory is *not* followed
> +# recursively.
> +_random_file() {
> +	local basedir=$1
> +	echo "$basedir/$(ls -U $basedir | shuf -n 1)"

I think the "1" can be the second argument, for we might want to get a random
file list sometimes. For example:

  local basedir=$1
  local num=$2
  local opt

  if [ -n "$num" ];then
	  opt="-n $num"
  fi
  echo "$basedir/$(ls -U $basedir | shuf $opt)"

What do you think?

Thanks,
Zorro

> +}
> +
>  init_rc
>  
>  ################################################################################
> -- 
> 2.41.0
>
Zorro Lang Aug. 25, 2023, 2 p.m. UTC | #3
On Fri, Aug 25, 2023 at 09:39:48PM +0800, Zorro Lang wrote:
> On Mon, Aug 21, 2023 at 04:12:11PM +0900, Naohiro Aota wrote:
> > Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a
> > random file in a directory.It sorts the files with "ls", sort it randomly
> > and pick the first line, which wastes the "ls" sort.
> > 
> > Also, using "sort -R | head -n1" is inefficient. For example, in a
> > directory with 1000000 files, it takes more than 15 seconds to pick a file.
> > 
> >   $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
> >   bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
> > 
> >   $ time bash -c "ls -U | shuf -n 1 >/dev/null"
> >   bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
> > 
> > So, we should just use "ls -U" and "shuf -n 1" to choose a random file.
> > Introduce _random_file() helper to do it properly.
> > 
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> >  common/rc | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/common/rc b/common/rc
> > index 5c4429ed0425..4d414955f6d9 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -5224,6 +5224,13 @@ _soak_loop_running() {
> >  	return 0
> >  }
> >  
> > +# Return a random file in a directory. A directory is *not* followed
> > +# recursively.
> > +_random_file() {
> > +	local basedir=$1
> > +	echo "$basedir/$(ls -U $basedir | shuf -n 1)"
> 
> I think the "1" can be the second argument, for we might want to get a random
> file list sometimes. For example:
> 
>   local basedir=$1
>   local num=$2
>   local opt
> 
>   if [ -n "$num" ];then
> 	  opt="-n $num"
>   fi
>   echo "$basedir/$(ls -U $basedir | shuf $opt)"
> 
> What do you think?

Hmm... nack my review point. Looks like this makes a simple change to be
complicated, especially multiple output lines. I'll merge this patchset
at first, then we can support that second argument If we need that
feature in one day. Or if you're interested in it.

Thanks,
Zorro

> 
> Thanks,
> Zorro
> 
> > +}
> > +
> >  init_rc
> >  
> >  ################################################################################
> > -- 
> > 2.41.0
> >
Naohiro Aota Aug. 28, 2023, 1:37 a.m. UTC | #4
On Fri, Aug 25, 2023 at 10:00:09PM +0800, Zorro Lang wrote:
> On Fri, Aug 25, 2023 at 09:39:48PM +0800, Zorro Lang wrote:
> > On Mon, Aug 21, 2023 at 04:12:11PM +0900, Naohiro Aota wrote:
> > > Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a
> > > random file in a directory.It sorts the files with "ls", sort it randomly
> > > and pick the first line, which wastes the "ls" sort.
> > > 
> > > Also, using "sort -R | head -n1" is inefficient. For example, in a
> > > directory with 1000000 files, it takes more than 15 seconds to pick a file.
> > > 
> > >   $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
> > >   bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
> > > 
> > >   $ time bash -c "ls -U | shuf -n 1 >/dev/null"
> > >   bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
> > > 
> > > So, we should just use "ls -U" and "shuf -n 1" to choose a random file.
> > > Introduce _random_file() helper to do it properly.
> > > 
> > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > > ---
> > >  common/rc | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/common/rc b/common/rc
> > > index 5c4429ed0425..4d414955f6d9 100644
> > > --- a/common/rc
> > > +++ b/common/rc
> > > @@ -5224,6 +5224,13 @@ _soak_loop_running() {
> > >  	return 0
> > >  }
> > >  
> > > +# Return a random file in a directory. A directory is *not* followed
> > > +# recursively.
> > > +_random_file() {
> > > +	local basedir=$1
> > > +	echo "$basedir/$(ls -U $basedir | shuf -n 1)"
> > 
> > I think the "1" can be the second argument, for we might want to get a random
> > file list sometimes. For example:
> > 
> >   local basedir=$1
> >   local num=$2
> >   local opt
> > 
> >   if [ -n "$num" ];then
> > 	  opt="-n $num"
> >   fi
> >   echo "$basedir/$(ls -U $basedir | shuf $opt)"
> > 
> > What do you think?
> 
> Hmm... nack my review point. Looks like this makes a simple change to be
> complicated, especially multiple output lines. I'll merge this patchset
> at first, then we can support that second argument If we need that
> feature in one day. Or if you're interested in it.

Yeah, I think so too. Currently, we don't have "shuf -n N" usage where N >
1. Also, while there is a plain "shuf" usage, it is used with "find".

So, I agree we can extent the function when we need it.

> 
> Thanks,
> Zorro
> 
> > 
> > Thanks,
> > Zorro
> > 
> > > +}
> > > +
> > >  init_rc
> > >  
> > >  ################################################################################
> > > -- 
> > > 2.41.0
> > > 
>
diff mbox series

Patch

diff --git a/common/rc b/common/rc
index 5c4429ed0425..4d414955f6d9 100644
--- a/common/rc
+++ b/common/rc
@@ -5224,6 +5224,13 @@  _soak_loop_running() {
 	return 0
 }
 
+# Return a random file in a directory. A directory is *not* followed
+# recursively.
+_random_file() {
+	local basedir=$1
+	echo "$basedir/$(ls -U $basedir | shuf -n 1)"
+}
+
 init_rc
 
 ################################################################################