diff mbox series

[6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature

Message ID patch-6.8-e0b11514b8d-20211020T183533Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series Makefile: make command-list.h 2-5x as fast with -jN | expand

Commit Message

Ævar Arnfjörð Bjarmason Oct. 20, 2021, 6:39 p.m. UTC
From: Johannes Sixt <j6t@kdbg.org>

This is just a small code reduction. There is a small probability that
the new code breaks when the category list is empty. But that would be
noticed during the compile step.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 generate-cmdlist.sh | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Comments

Jeff King Oct. 21, 2021, 2:42 p.m. UTC | #1
On Wed, Oct 20, 2021 at 08:39:57PM +0200, Ævar Arnfjörð Bjarmason wrote:

> From: Johannes Sixt <j6t@kdbg.org>
> 
> This is just a small code reduction. There is a small probability that
> the new code breaks when the category list is empty. But that would be
> noticed during the compile step.
> 
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  generate-cmdlist.sh | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index e517c33710a..a1ab2b1f077 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -67,10 +67,7 @@ print_command_list () {
>  	while read cmd rest
>  	do
>  		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
> -		for cat in $(echo "$rest" | get_category_line)
> -		do
> -			printf " | CAT_$cat"
> -		done
> +		printf " | CAT_%s" $(echo "$rest" | get_category_line)
>  		echo " },"

I think this is fine, but regardless of what happens in patch 7, it's
probably worth dropping this get_category_line call. All it does is sort
and de-dup the tokens in $rest, but we don't care because we're just
OR-ing them together. And of the 3 processes spawned by each loop, it is
responsible for 2 of them.

Even if this loop is broken out into individual bits of Makefile
snippet, avoiding the extra processes is worth doing.

The patch below gives me:

  $ git show HEAD:generate-cmdlist.sh >generate-cmdlist.sh.old
  $ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt'
  Benchmark #1: sh generate-cmdlist.sh command-list.txt
    Time (mean ± σ):     591.3 ms ±  31.5 ms    [User: 392.9 ms, System: 243.7 ms]
    Range (min … max):   543.7 ms … 630.6 ms    10 runs
   
  Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
    Time (mean ± σ):      1.236 s ±  0.060 s    [User: 1.100 s, System: 0.556 s]
    Range (min … max):    1.089 s …  1.275 s    10 runs
   
  Summary
    'sh generate-cmdlist.sh command-list.txt' ran
      2.09 ± 0.15 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

---
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index a1ab2b1f07..fab9e6a671 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -67,7 +67,7 @@ print_command_list () {
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		printf " | CAT_%s" $(echo "$rest" | get_category_line)
+		printf " | CAT_%s" $rest
 		echo " },"
 	done
 	echo "};"

I think you could also delete get_category_line, as it was inlined in
the other caller.

-Peff
Jeff King Oct. 21, 2021, 4:25 p.m. UTC | #2
On Thu, Oct 21, 2021 at 10:42:52AM -0400, Jeff King wrote:

> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index a1ab2b1f07..fab9e6a671 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -67,7 +67,7 @@ print_command_list () {
>  	while read cmd rest
>  	do
>  		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
> -		printf " | CAT_%s" $(echo "$rest" | get_category_line)
> +		printf " | CAT_%s" $rest
>  		echo " },"
>  	done
>  	echo "};"
> 
> I think you could also delete get_category_line, as it was inlined in
> the other caller.

Just for fun, I did a pure-shell loop to drop get_synopsis, which means
we don't exec any processes inside the loop. That patch is below, which
yields the timings (orig is up to your patch 6, no-sort is the patch
above, and pure-shell is the patch below on top):

  $ hyperfine --warmup 1 -L v orig,no-sort,pure-shell -p 'make clean' 'sh generate-cmdlist.sh.{v} command-list.txt'
  Benchmark #1: sh generate-cmdlist.sh.orig command-list.txt
    Time (mean ± σ):      1.286 s ±  0.148 s    [User: 1.503 s, System: 0.781 s]
    Range (min … max):    0.938 s …  1.451 s    10 runs
   
  Benchmark #2: sh generate-cmdlist.sh.no-sort command-list.txt
    Time (mean ± σ):     553.6 ms ± 143.3 ms    [User: 396.7 ms, System: 198.3 ms]
    Range (min … max):   192.6 ms … 683.5 ms    10 runs
   
  Benchmark #3: sh generate-cmdlist.sh.pure-shell command-list.txt
    Time (mean ± σ):      29.7 ms ±  15.6 ms    [User: 22.6 ms, System: 19.4 ms]
    Range (min … max):    12.0 ms …  49.1 ms    10 runs
   
  Summary
    'sh generate-cmdlist.sh.pure-shell command-list.txt' ran
     18.65 ± 10.93 times faster than 'sh generate-cmdlist.sh.no-sort command-list.txt'
     43.33 ± 23.32 times faster than 'sh generate-cmdlist.sh.orig command-list.txt'

So that's building all of the commands faster than I could get even
"touch Documentation/git-add.txt && make command-list.h" to run with
your patch (not entirely fair; I'm not invoking make here, which
probably does add 100ms of overhead, but I think it's still a net win).

The patch below doesn't enforce the /NAME/ section as the sed does. IMHO
that's not of much value because it uses the line with the command-name
as the lower bound. But it could be done pretty easily with an extra
$seen_name variable.

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index fab9e6a671..eae4bbb4c7 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -22,16 +22,6 @@ category_list () {
 	LC_ALL=C sort -u
 }
 
-get_synopsis () {
-	sed -n '
-		/^NAME/,/'"$1"'/H
-		${
-			x
-			s/.*'"$1"' - \(.*\)/N_("\1")/
-			p
-		}' "Documentation/$1.txt"
-}
-
 define_categories () {
 	echo
 	echo "/* Command categories */"
@@ -66,7 +56,18 @@ print_command_list () {
 	command_list "$1" |
 	while read cmd rest
 	do
-		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
+		synopsis=
+		while read line
+		do
+			case "$line" in
+			"$cmd - "*)
+				synopsis=${line#$cmd - }
+				break
+				;;
+			esac
+		done <"Documentation/$cmd.txt"
+
+		printf '\t{ "%s", N_("%s"), 0' "$cmd" "$synopsis"
 		printf " | CAT_%s" $rest
 		echo " },"
 	done
diff mbox series

Patch

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index e517c33710a..a1ab2b1f077 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -67,10 +67,7 @@  print_command_list () {
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		for cat in $(echo "$rest" | get_category_line)
-		do
-			printf " | CAT_$cat"
-		done
+		printf " | CAT_%s" $(echo "$rest" | get_category_line)
 		echo " },"
 	done
 	echo "};"