[v3,03/10] generate-cmdlist.sh: spawn fewer processes

Message ID patch-v3-03.10-737cca59d99-20211105T135058Z-avarab@gmail.com (mailing list archive)
State Accepted
Commit 191eb491ed44b86c7ab4fe279c8ce47f030624b0
Series generate-cmdlist.sh: make it (and "make") run faster

Commit Message

Ævar Arnfjörð Bjarmason Nov. 5, 2021, 2:08 p.m. UTC
From: Johannes Sixt <j6t@kdbg.org>

The function get_categories() is invoked in a loop over all commands.
As it runs several processes, this takes an awful lot of time on
Windows. To reduce the number of processes, move the process that
filters empty lines to the other invoker of the function, where it is
needed. The invocation of get_categories() in the loop does not need
the empty line filtered away because the result is word-split by the
shell, which eliminates the empty line automatically.
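
The word-splitting behaviour described above can be seen in isolation with a
small sketch. The input string and the variable names are hypothetical and are
not taken from generate-cmdlist.sh; only the tr invocation mirrors
get_categories(). It assumes the default IFS (space, tab, newline).

```shell
# Hypothetical input: category words separated by spaces, with a doubled
# space that becomes an empty line once spaces are turned into newlines.
input='mainporcelain  history info'

# Same tr invocation as in get_categories(); sorting is left out here.
lines=$(printf '%s\n' "$input" | tr ' ' '\012')

# Count lines as a pipeline consumer sees them (the empty line is included).
line_count=$(printf '%s\n' "$lines" | wc -l | tr -d ' ')

# Count words as an unquoted expansion sees them: field splitting on the
# default IFS collapses consecutive newlines, so the empty line disappears.
word_count=0
for word in $lines
do
	word_count=$((word_count + 1))
done

echo "lines: $line_count, words: $word_count"
```

With this input the pipeline sees one more line than the loop sees words, which
is why only the pipeline consumer (category_list) still needs the grep.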

Furthermore, use sort -u instead of sort | uniq to remove yet another
process.
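
As a rough check that the two spellings agree, the following sketch compares
them on a hypothetical word list (the list and variable names are made up for
illustration). Both sides are run under LC_ALL=C so that collation cannot make
the results differ.

```shell
# Hypothetical category words, one per line, including a duplicate.
words='info
history
info
mainporcelain'

# Two processes: sort, then drop adjacent duplicates.
old=$(printf '%s\n' "$words" | LC_ALL=C sort | LC_ALL=C uniq)

# One process: sort and drop duplicates in a single step.
new=$(printf '%s\n' "$words" | LC_ALL=C sort -u)

if test "$old" = "$new"
then
	echo "same output"
else
	echo "output differs"
fi
```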

[Ævar: on Linux this seems to speed things up a bit, although with
hyperfine(1) the results are fuzzy enough to land within the
confidence interval]:

$ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old
$ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt'
Benchmark #1: sh generate-cmdlist.sh command-list.txt
  Time (mean ± σ):     371.3 ms ±  64.2 ms    [User: 430.4 ms, System: 72.5 ms]
  Range (min … max):   320.5 ms … 517.7 ms    10 runs

Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
  Time (mean ± σ):     489.9 ms ± 185.4 ms    [User: 724.7 ms, System: 141.3 ms]
  Range (min … max):   346.0 ms … 885.3 ms    10 runs

Summary
  'sh generate-cmdlist.sh command-list.txt' ran
    1.32 ± 0.55 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Junio C Hamano Nov. 5, 2021, 10:47 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index 5114f46680a..27367915611 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -11,15 +11,14 @@ command_list () {
>  
>  get_categories () {
>  	tr ' ' '\012' |
> -	grep -v '^$' |
> -	sort |
> -	uniq
> +	LC_ALL=C sort -u
>  }
>  
>  category_list () {
>  	command_list "$1" |
>  	cut -c 40- |
> -	get_categories
> +	get_categories |
> +	grep -v '^$'
>  }

It is funny that this changes "grep then sort" into "sort then
grep", which will be "corrected" in two steps down.  The series
seems a bit over-engineered and broken down too much, at least to
me, but let's not waste any more time on it by an extra reroll.

Ævar Arnfjörð Bjarmason Nov. 6, 2021, 4:23 a.m. UTC | #2
On Fri, Nov 05 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
>> index 5114f46680a..27367915611 100755
>> --- a/generate-cmdlist.sh
>> +++ b/generate-cmdlist.sh
>> @@ -11,15 +11,14 @@ command_list () {
>>  
>>  get_categories () {
>>  	tr ' ' '\012' |
>> -	grep -v '^$' |
>> -	sort |
>> -	uniq
>> +	LC_ALL=C sort -u
>>  }
>>  
>>  category_list () {
>>  	command_list "$1" |
>>  	cut -c 40- |
>> -	get_categories
>> +	get_categories |
>> +	grep -v '^$'
>>  }
>
> It is funny that this changes "grep then sort" into "sort then
> grep", which will be "corrected" in two steps down.  The series
> seems a bit over-engineered and broken down too much, at least to
> me, but let's not waste any more time on it by an extra reroll.

Yes, it's a bit of back and forth, but I didn't want to outright drop
Johannes's patches which I'd integrated here, and thought it would be
helpful to others to distill the history of various optimization steps
(starting with Johannes's work here) into the permanent commit history.

Patch

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 5114f46680a..27367915611 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -11,15 +11,14 @@  command_list () {
 
 get_categories () {
 	tr ' ' '\012' |
-	grep -v '^$' |
-	sort |
-	uniq
+	LC_ALL=C sort -u
 }
 
 category_list () {
 	command_list "$1" |
 	cut -c 40- |
-	get_categories
+	get_categories |
+	grep -v '^$'
 }
 
 get_synopsis () {