diff mbox series

[v3,09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version

Message ID patch-v3-09.10-e2702bcc1d0-20211105T135058Z-avarab@gmail.com (mailing list archive)
State Accepted
Commit e88842ee1c81d45157e2a4e87f6065b6837e279a
Headers show
Series generate-cmdlist.sh: make it (and "make") run faster | expand

Commit Message

Ævar Arnfjörð Bjarmason Nov. 5, 2021, 2:08 p.m. UTC
Replace the "grep" we run to exclude certain programs from the
generated output with a pure-shell loop that strips out the comments,
and sees if the "cmd" we're reading is on a list of excluded
programs. This uses a trick similar to test_have_prereq() in
test-lib-functions.sh.

On my *nix system this makes things quite a bit slower compared to
HEAD~:
o
  'sh generate-cmdlist.sh.old command-list.txt' ran
    1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt'
   18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt'

But when I tried running generate-cmdlist.sh 100 times in CI I found
that it helped across the board even on OSX & Linux. I tried testing
it in CI with this ad-hoc few-liner:

    for i in $(seq -w 0 11 | sort -nr)
    do
    	git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh &&
    	git add generate-cmdlist* &&
    	cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : &&
    	perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh &&
    	git add t/t00*.sh
    done && git commit -m"generated it"

Here HEAD~02 and the t0002* file refers to this change, and HEAD~03
and t0003* file to the preceding commit, the relevant results were:

    linux-gcc:

    [12:05:33] t0002-generate-cmdlist.sh .. ok       14 ms ( 0.00 usr  0.00 sys +  3.64 cusr  3.09 csys =  6.73 CPU)
    [12:05:30] t0003-generate-cmdlist.sh .. ok       32 ms ( 0.00 usr  0.00 sys +  2.66 cusr  1.81 csys =  4.47 CPU)

    osx-gcc:

    [11:58:04] t0002-generate-cmdlist.sh .. ok    80081 ms ( 0.02 usr  0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)
    [11:58:16] t0003-generate-cmdlist.sh .. ok    92127 ms ( 0.02 usr  0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU)

    vs-test:

    [12:03:14] t0002-generate-cmdlist.sh .. ok       30 s ( 0.02 usr  0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)
    [12:03:20] t0003-generate-cmdlist.sh .. ok       32 s ( 0.00 usr  0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU)

I.e. even on *nix running 100 of these in a loop was up to ~2x faster
in absolute runtime, I suspect it's due factors that are exacerbated
in the CI, e.g. much slower process startup due to some platform
limits, or a slower FS.

The "cut -d" change here is because we're not emitting the
40-character aligned output anymore, i.e. we'll get the output from
command_list() now, not an as-is line from command-list.txt.

This also makes the parsing more reliable, as we could tweak the
whitespace alignment without breaking this parser. Let's reword a
now-inaccurate comment in "command-list.txt" describing that previous
alignment limitation. We'll still need the "### command-list [...]"
line due to the "Documentation/cmd-list.perl" logic added in
11c6659d85d (command-list: prepare machinery for upcoming "common
groups" section, 2015-05-21).

There was a proposed change subsequent to this one[3] which continued
moving more logic into the "command_list() function, i.e. replaced the
"cut | tr | grep" chain in "category_list()" with an argument to
"command_list()".

That change might have had a bit of an effect, but not as much as the
preceding commit, so I decided to drop it. The relevant performance
numbers from it were:

    linux-gcc:

    [12:05:33] t0001-generate-cmdlist.sh .. ok       13 ms ( 0.00 usr  0.00 sys +  3.33 cusr  2.78 csys =  6.11 CPU)
    [12:05:33] t0002-generate-cmdlist.sh .. ok       14 ms ( 0.00 usr  0.00 sys +  3.64 cusr  3.09 csys =  6.73 CPU)

    osx-gcc:

    [11:58:03] t0001-generate-cmdlist.sh .. ok    78416 ms ( 0.02 usr  0.01 sys + 11.78 cusr  6.22 csys = 18.03 CPU)
    [11:58:04] t0002-generate-cmdlist.sh .. ok    80081 ms ( 0.02 usr  0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)

    vs-test:

    [12:03:20] t0001-generate-cmdlist.sh .. ok       34 s ( 0.00 usr  0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU)
    [12:03:14] t0002-generate-cmdlist.sh .. ok       30 s ( 0.02 usr  0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)

As above HEAD~2 and t0002* are testing the code in this commit (and
the line is the same), but HEAD~1 and t0001* are testing that dropped
change in [3].

1. https://lore.kernel.org/git/cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com/
2. https://lore.kernel.org/git/patch-v2-08.10-83318d6c0da-20211022T193027Z-avarab@gmail.com/
3. https://lore.kernel.org/git/patch-v2-10.10-e10a43756d1-20211022T193027Z-avarab@gmail.com/
---
 command-list.txt    |  2 +-
 generate-cmdlist.sh | 24 ++++++++++++++++++++----
 2 files changed, 21 insertions(+), 5 deletions(-)
diff mbox series

Patch

diff --git a/command-list.txt b/command-list.txt
index 04cde20c3da..675c28f0bd0 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -43,7 +43,7 @@ 
 # specified here, which can only have "guide" attribute and nothing
 # else.
 #
-### command list (do not change this line, also do not change alignment)
+### command list (do not change this line)
 # command name                          category [category] [category]
 git-add                                 mainporcelain           worktree
 git-am                                  mainporcelain
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9b7d6aea629..cfe0454d1de 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -6,12 +6,28 @@  die () {
 }
 
 command_list () {
-	eval "grep -ve '^#' $exclude_programs" <"$1"
+	while read cmd rest
+	do
+		case "$cmd" in
+		"#"* | '')
+			# Ignore comments and allow empty lines
+			continue
+			;;
+		*)
+			case "$exclude_programs" in
+			*":$cmd:"*)
+				;;
+			*)
+				echo "$cmd $rest"
+				;;
+			esac
+		esac
+	done <"$1"
 }
 
 category_list () {
 	command_list "$1" |
-	cut -c 40- |
+	cut -d' ' -f2- |
 	tr ' ' '\012' |
 	grep -v '^$' |
 	LC_ALL=C sort -u
@@ -69,11 +85,11 @@  print_command_list () {
 	echo "};"
 }
 
-exclude_programs=
+exclude_programs=:
 while test "--exclude-program" = "$1"
 do
 	shift
-	exclude_programs="$exclude_programs -e \"^$1 \""
+	exclude_programs="$exclude_programs$1:"
 	shift
 done