[0/8] Makefile: make command-list.h 2-5x as fast with -jN

Message ID: cover-0.8-00000000000-20211020T183533Z-avarab@gmail.com
Series: Makefile: make command-list.h 2-5x as fast with -jN

Message

Ævar Arnfjörð Bjarmason Oct. 20, 2021, 6:39 p.m. UTC
This series is based on an off-hand comment I made[1] about making
cmdlist.sh faster; in the meantime many of the same methods are
already cooking in "next" for the "lint-docs" target.

See 7/8 for the main performance numbers. Along the way I stole some
patches from Johannes Sixt, who'd worked on optimizing the script
before; they complement this new method of generating the file by
piggy-backing more on GNU make to manage the dependency graph for us.

1. https://lore.kernel.org/git/87r1gqxqxn.fsf@evledraar.gmail.com/

Johannes Sixt (2):
  generate-cmdlist.sh: spawn fewer processes
  generate-cmdlist.sh: replace for loop by printf's auto-repeat feature

Ævar Arnfjörð Bjarmason (6):
  command-list.txt: sort with "LC_ALL=C sort"
  generate-cmdlist.sh: trivial whitespace change
  generate-cmdlist.sh: don't call get_categories() from category_list()
  generate-cmdlist.sh: run "grep | sort", not "sort | grep"
  Makefile: stop having command-list.h depend on a wildcard
  Makefile: assert correct generate-cmdlist.sh output

 .gitignore          |  1 +
 Makefile            | 57 ++++++++++++++++++++++++++++++++++++++++-----
 command-list.txt    | 20 ++++++++--------
 generate-cmdlist.sh | 53 ++++++++++++++++++++++++++++-------------
 help.c              |  3 ---
 5 files changed, 99 insertions(+), 35 deletions(-)

Comments

Jeff King Oct. 20, 2021, 8:35 p.m. UTC | #1
On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This series is based on an off-hand comment I made[1] about making
> cmdlist.sh faster; in the meantime many of the same methods are
> already cooking in "next" for the "lint-docs" target.
> 
> See 7/8 for the main performance numbers. Along the way I stole some
> patches from Johannes Sixt, who'd worked on optimizing the script
> before; they complement this new method of generating the file by
> piggy-backing more on GNU make to manage the dependency graph for us.

I still think this is a much more complicated and error-prone approach
than just making the script faster. I know we can't rely on perl, but
could we use it optimistically?

The proof-of-concept below on top of your patch 6 does two things:

  - observes that there is no need for get_category_line in the loop; it
    is just sorting and de-duping the bitfields, but since we just OR
    them together, neither of those things matters

  - uses perl to open each individual doc file to get the synopsis. It
    _feels_ like this should be something that sed or awk could do, but
    it is beyond me. However, speculatively trying perl is an easy win,
    and we can fall back to the shell loop.

Here are my timings:

Benchmark #1: sh generate-cmdlist.sh command-list.txt
  Time (mean ± σ):      40.4 ms ±  18.1 ms    [User: 44.9 ms, System: 7.1 ms]
  Range (min … max):    20.3 ms …  65.5 ms    10 runs

Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
  Time (mean ± σ):      1.414 s ±  0.038 s    [User: 1.641 s, System: 0.863 s]
  Range (min … max):    1.344 s …  1.451 s    10 runs

Summary
  'sh generate-cmdlist.sh command-list.txt' ran
   34.96 ± 15.66 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

I hate having fallbacks, because the seldom-used version may bitrot. I'm
tempted to just write that loop in C, but there's a circular dependency
with using any of libgit.a (even though it's really only the git
porcelain that cares about command-list.h, it goes into help.o, which
goes into libgit.a; we could break that dependency if we wanted,
though). If we can do it in awk, that may be worthwhile.

But either way, I think this is superior to trying to parallelize the
Makefile:

  - it actually uses less CPU, rather than just trying to improve
    wall-clock time by using more cores

  - there's little chance of having some subtle dependency problem

Parallelizing makes a lot of sense to me when the operation is truly
expensive. But in this case it's literally just opening a file, and the
only reason it's slow is because we spawn a ton of processes to do it.
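
A rough illustration of the difference in shape (stand-in code, not the
script's actual loop; the awk one-liner is just one possible
single-process reader):

    # roughly the current shape: at least one subshell plus a sed per
    # command, i.e. hundreds of processes for ~100 commands
    for cmd in $cmds
    do
	sed -n "s/^$cmd - \(.*\)/\1/p" "Documentation/$cmd.txt"
    done

    # versus a single process that opens every file itself
    awk 'FNR == 1 { cmd = FILENAME; sub(/.*\//, "", cmd); sub(/\.txt$/, "", cmd) }
         $0 ~ "^" cmd " - " { print substr($0, length(cmd) + 4) }' \
	Documentation/git-*.txt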

---
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index a1ab2b1f07..f922eebe23 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -63,11 +63,23 @@ define_category_names () {
 print_command_list () {
 	echo "static struct cmdname_help command_list[] = {"
 
+	# try perl first, as we can do it all in one process
+	command_list "$1" |
+	perl -ne '
+		my ($cmd, @rest) = split;
+		open(my $fh, "<", "Documentation/$cmd.txt");
+		while (<$fh>) {
+			next unless /^$cmd - (.*)/;
+			print "{ \"$cmd\", N_(\"$1\"), 0";
+			print " | CAT_$_" for (@rest);
+			print " },\n";
+		}
+	' ||
 	command_list "$1" |
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		printf " | CAT_%s" $(echo "$rest" | get_category_line)
+		printf " | CAT_%s" $rest
 		echo " },"
 	done
 	echo "};"
Taylor Blau Oct. 20, 2021, 9:31 p.m. UTC | #2
On Wed, Oct 20, 2021 at 04:35:38PM -0400, Jeff King wrote:
> On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> > This series is based on an off-hand comment I made[1] about making
> > cmdlist.sh faster; in the meantime many of the same methods are
> > already cooking in "next" for the "lint-docs" target.
> >
> > See 7/8 for the main performance numbers. Along the way I stole some
> > patches from Johannes Sixt, who'd worked on optimizing the script
> > before; they complement this new method of generating the file by
> > piggy-backing more on GNU make to manage the dependency graph for us.
>
> I still think this is a much more complicated and error-prone approach
> than just making the script faster. I know we can't rely on perl, but
> could we use it optimistically?

I'll take credit for this terrible idea of using Perl when available.

But I don't think we even need to, since we could just rely on Awk. That
has all the benefits you described while still avoiding the circular
dependency on libgit.a. But the killer feature is that we don't have to
rely on two implementations, the lesser-used of which is likely to
bitrot over time.

The resulting awk is a little ugly, because of the nested-ness. I'm also
no awk-spert, but I think that something like the below gets the job
done.

It also has the benefit of being slightly faster than the equivalent
Perl implementation, for whatever those extra ~9 ms are worth ;).

Benchmark #1: sh generate-cmdlist.sh command-list.txt
  Time (mean ± σ):      25.3 ms ±   5.3 ms    [User: 31.1 ms, System: 8.3 ms]
  Range (min … max):    15.5 ms …  31.7 ms    95 runs

Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
  Time (mean ± σ):      34.9 ms ±   9.8 ms    [User: 41.0 ms, System: 6.9 ms]
  Range (min … max):    22.4 ms …  54.8 ms    64 runs

Summary
  'sh generate-cmdlist.sh command-list.txt' ran
    1.38 ± 0.49 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

---

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index a1ab2b1f07..39338ef1cc 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -64,12 +64,19 @@ print_command_list () {
 	echo "static struct cmdname_help command_list[] = {"

 	command_list "$1" |
-	while read cmd rest
-	do
-		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		printf " | CAT_%s" $(echo "$rest" | get_category_line)
-		echo " },"
-	done
+	awk '{
+		f="Documentation/" $1 ".txt"
+		while((getline line<f) > 0) {
+			if (match(line, "^" $1 " - ")) {
+				syn=substr(line, RLENGTH+1)
+				printf "\t{ \"%s\", N_(\"%s\"), 0", $1, syn
+				for (i=2; i<=NF; i++) {
+					printf " | CAT_%s", $i
+				}
+				print " },"
+			}
+		}
+	}'
 	echo "};"
 }
Ævar Arnfjörð Bjarmason Oct. 20, 2021, 11:14 p.m. UTC | #3
On Wed, Oct 20 2021, Taylor Blau wrote:

> On Wed, Oct 20, 2021 at 04:35:38PM -0400, Jeff King wrote:
>> On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > This series is based on an off-hand comment I made[1] about making
>> > cmdlist.sh faster; in the meantime many of the same methods are
>> > already cooking in "next" for the "lint-docs" target.
>> >
>> > See 7/8 for the main performance numbers. Along the way I stole some
>> > patches from Johannes Sixt, who'd worked on optimizing the script
>> > before; they complement this new method of generating the file by
>> > piggy-backing more on GNU make to manage the dependency graph for us.
>>
>> I still think this is a much more complicated and error-prone approach
>> than just making the script faster. I know we can't rely on perl, but
>> could we use it optimistically?

Jeff: Just in terms of being error-prone, both of these implementations
will accept bad input that's caught by 8/8 of this series.

We accept a lot of bad input now, ending up with some combination of
bad output or compile errors if you screw with the input *.txt files. I
think I've addressed all of those in this series.

If you mean the general concept of making a "foo.gen" from a "foo.txt"
as an intermediate with make as a way to get to "many-foo.h" I don't
really see how it's error prone conceptually. You get error checking
each step of the way, and it encourages logic that's simpler each step
of the way.
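
Schematically the pattern is just something like this (hypothetical
file and helper names, not the actual rules from the series):

    DOC_TXT = $(wildcard Documentation/git-*.txt)
    GEN = $(patsubst Documentation/%.txt,.build/doc/%.gen,$(DOC_TXT))

    # one small, independently checkable step per input file...
    .build/doc/%.gen: Documentation/%.txt
	mkdir -p $(@D) && ./extract-synopsis.sh $< >$@

    # ...and a cheap final assembly step that only re-runs when needed
    many-foo.h: $(GEN)
	cat $^ >$@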

> I'll take credit for this terrible idea of using Perl when available.
>
> But I don't think we even need to, since we could just rely on Awk. That
> has all the benefits you described while still avoiding the circular
> dependency on libgit.a. But the killer feature is that we don't have to
> rely on two implementations, the lesser-used of which is likely to
> bitrot over time.
>
> The resulting awk is a little ugly, because of the nested-ness. I'm also
> no awk-spert, but I think that something like the below gets the job
> done.
>
> It also has the benefit of being slightly faster than the equivalent
> Perl implementation, for whatever those extra ~9 ms are worth ;).
>
> Benchmark #1: sh generate-cmdlist.sh command-list.txt
>   Time (mean ± σ):      25.3 ms ±   5.3 ms    [User: 31.1 ms, System: 8.3 ms]
>   Range (min … max):    15.5 ms …  31.7 ms    95 runs
>
> Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
>   Time (mean ± σ):      34.9 ms ±   9.8 ms    [User: 41.0 ms, System: 6.9 ms]
>   Range (min … max):    22.4 ms …  54.8 ms    64 runs
>
> Summary
>   'sh generate-cmdlist.sh command-list.txt' ran
>     1.38 ± 0.49 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
>
> ---
>
> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index a1ab2b1f07..39338ef1cc 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -64,12 +64,19 @@ print_command_list () {
>  	echo "static struct cmdname_help command_list[] = {"
>
>  	command_list "$1" |
> -	while read cmd rest
> -	do
> -		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
> -		printf " | CAT_%s" $(echo "$rest" | get_category_line)
> -		echo " },"
> -	done
> +	awk '{
> +		f="Documentation/" $1 ".txt"
> +		while((getline line<f) > 0) {
> +			if (match(line, "^" $1 " - ")) {
> +				syn=substr(line, RLENGTH+1)
> +				printf "\t{ \"%s\", N_(\"%s\"), 0", $1, syn
> +				for (i=2; i<=NF; i++) {
> +					printf " | CAT_%s", $i
> +				}
> +				print " },"
> +			}
> +		}
> +	}'
>  	echo "};"
>  }

Per Eric Sunshine's upthread comments, an awk and a Perl implementation
were both considered before[1].

I also care a bit about the timings of the from-scratch build, but I
think they're way less interesting than a partial build.

I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
with/without this awk version the difference in runtime is within the
error bars. I.e. making the loop faster isn't necessary. It's better to
get to a point where make can save you from doing all/most of the work
by checking modification times, rather than making an O(n) loop faster.

The only reason there's even a loop there is because it's used by the
cmake logic in contrib/* (how we've ended up with a hard dependency in
contrib is another matter...).

I'm also interested in (and have WIP patches for) simplifying things
more generally in the Makefile. Once we have a file exploded out that
has just the synopsis line, it can be used to replace what's now in
Documentation/cmd-list.perl, i.e. those summary blurbs also end up in
"man git".

There are subtle dependency issues there as well, and just having a
one-off solution for the command-list.h doesn't get us closer to
addressing that sibling implementation.

In terms of future Makefile work I was hoping to get this in, then
untangle some of the complexity in the inter-dependency between the
Makefile & Documentation/Makefile (eventually just merging the two, and
leaving a stub in Documentation/Makefile). I've also got a working
implementation for getting rid of all of the "FORCE" dependencies
(except the version one).

1. https://lore.kernel.org/git/CAPig+cSzKoOzU-zPOZqfNpPYBFpcWqvDP3mwLvAn5WkiNW0UMw@mail.gmail.com/
Jeff King Oct. 20, 2021, 11:46 p.m. UTC | #4
On Thu, Oct 21, 2021 at 01:14:37AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Jeff: Just in terms of being error-prone, both of these implementations
> will accept bad input that's caught by 8/8 of this series.
> 
> We accept a lot of bad input now, ending up with some combination of
> bad output or compile errors if you screw with the input *.txt files. I
> think I've addressed all of those in this series.

I don't mind more error-checking, though TBH I don't find a huge value
in it. But what I did mean was:

> If you mean the general concept of making a "foo.gen" from a "foo.txt"
> as an intermediate with make as a way to get to "many-foo.h" I don't
> really see how it's error prone conceptually. You get error checking
> each step of the way, and it encourages logic that's simpler each step
> of the way.

Yes. It just seems like the Makefile gets more complicated, and
sometimes that can lead to subtle dependency issues (e.g., the ".build"
dependency in the earlier iteration of the series).

And in general I'd much rather debug an awk script than a Makefile.

> Per Eric Sunshine's upthread comments, an awk and a Perl implementation
> were both considered before[1].

Ah sorry, I thought it was just a perl one that had been the
show-stopper. I hadn't noticed the awk one. However, the point of my
patch was to use perl if available, and fall back otherwise. Maybe
that's too ugly, but it does address the concern with Eric's
implementation.

> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
> with/without this awk version the difference in runtime is within the
> error bars. I.e. making the loop faster isn't necessary. It's better to
> get to a point where make can save you from doing all/most of the work
> by checking modification times, rather than making an O(n) loop faster.

FWIW, I don't agree with this paragraph at all. Parallelizing or reusing
partial results is IMHO inferior to just making things faster.

> I'm also interested in (and have WIP patches for) simplifying things
> more generally in the Makefile. Once we have a file exploded out that
> has just the synopsis line, it can be used to replace what's now in
> Documentation/cmd-list.perl, i.e. those summary blurbs also end up in
> "man git".
> 
> There are subtle dependency issues there as well, and just having a
> one-off solution for the command-list.h doesn't get us closer to
> addressing that sibling implementation.

So I don't know what "subtle dependency issues" you found here, but this
is exactly the kind of complexity it was my goal to avoid.

-Peff
Ævar Arnfjörð Bjarmason Oct. 21, 2021, 12:48 a.m. UTC | #5
On Wed, Oct 20 2021, Jeff King wrote:

> On Thu, Oct 21, 2021 at 01:14:37AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Jeff: Just in terms of being error-prone, both of these implementations
>> will accept bad input that's caught by 8/8 of this series.
>> 
>> We accept a lot of bad input now, ending up with some combination of
>> bad output or compile errors if you screw with the input *.txt files. I
>> think I've addressed all of those in this series.
>
> I don't mind more error-checking, though TBH I don't find a huge value
> in it. But what I did mean was:
>
>> If you mean the general concept of making a "foo.gen" from a "foo.txt"
>> as an intermediate with make as a way to get to "many-foo.h" I don't
>> really see how it's error prone conceptually. You get error checking
>> each step of the way, and it encourages logic that's simpler each step
>> of the way.
>
> Yes. It just seems like the Makefile gets more complicated, and
> sometimes that can lead to subtle dependency issues (e.g., the ".build"
> dependency in the earlier iteration of the series).

FWIW there wasn't an earlier version of the series, just a POC patch I
had as a comment in
https://lore.kernel.org/git/87r1gqxqxn.fsf@evledraar.gmail.com/

> And in general I'd much rather debug an awk script than a Makefile.
>
>> Per Eric Sunshine's upthread comments, an awk and a Perl implementation
>> were both considered before[1].
>
> Ah sorry, I thought it was just a perl one that had been the
> show-stopper. I hadn't noticed the awk one. However, the point of my
> patch was to use perl if available, and fall back otherwise. Maybe
> that's too ugly, but it does address the concern with Eric's
> implementation.

I think carrying two implementations is worse than just having the one
slightly slower one.

>> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
>> with/without this awk version the difference in runtime is within the
>> error bars. I.e. making the loop faster isn't necessary. It's better to
>> get to a point where make can save you from doing all/most of the work
>> by checking modification times, rather than making an O(n) loop faster.
>
> FWIW, I don't agree with this paragraph at all. Parallelizing or reusing
> partial results is IMHO inferior to just making things faster.

I agree with you in the general case, but for something that's consumed
by a make dependency graph I find it easier to debug things if
e.g. changing git-add.txt results in a change to git-add.gen, with the
.gen files then being cat'd together.

IOW if we had a sufficiently fast C compiler I think I'd still prefer
make's existing rules over some equivalent of:

    cat *.c | super-fast-cc

Similarly to how the *.sp files depend on the *.o files now, declaring
the dependency graph allows you to easily add more built things.

>> I'm also interested in (and have WIP patches for) simplifying things
>> more generally in the Makefile. Once we have a file exploded out that
>> has just the synopsis line, it can be used to replace what's now in
>> Documentation/cmd-list.perl, i.e. those summary blurbs also end up in
>> "man git".
>> 
>> There are subtle dependency issues there as well, and just having a
>> one-off solution for the command-list.h doesn't get us closer to
>> addressing that sibling implementation.
>
> So I don't know what "subtle dependency issues" you found here, but this
> is exactly the kind of complexity it was my goal to avoid.

But how? I don't see how narrowly making the loop in generate-cmdlist.sh
faster gets us closer to generating the "cmds_txt" in the
Documentation/Makefile.

Whereas after this series we're pretty much there in terms of generating
those files. I.e. try:

    cat Documentation/cmds-mainporcelain.txt

All of those synopsis blurbs are extracted, and reverse-attributable to
the corresponding files.

The dependencies there are (arguably) subtly broken because those files
aren't re-made if a "cmd-list.made" is more recent. So if you remove one
of the generated text files the Makefile logic will get stuck because
the graph is incomplete (which can happen e.g. if "make clean" is
interrupted, or if you run "git clean -dxf '*.txt'").
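
The stamp pattern in question looks roughly like this (schematic, not
the verbatim rules):

    $(cmds_txt): cmd-list.made
    cmd-list.made: cmd-list.perl ../command-list.txt $(MAN1_TXT)
	$(PERL_PATH) ./cmd-list.perl ../command-list.txt $(cmds_txt) && \
	date >$@

Delete one of the $(cmds_txt) files while cmd-list.made survives, and
make considers the stamp up to date, so the deleted file is never
re-made.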

I ran that "git clean" myself recently and ran into this, because I was
trying to ad-hoc fix another, more general dependency issue we tend to
have: using wildcards on potentially generated files. If you check out a
new version, build, and then check out an old version (or are adding one
of the files involved), a script like build-docdep.perl will "helpfully"
pick up bad dependencies.

I guess you could argue that those are all problems with the Makefile,
but I think they're ultimately best solved by driving the dependencies
from the Makefile.

I.e. all we need is the one list of built-ins in command-list.txt; pair
that up with the "category" and we can always generate everything down
to the manpages correctly without relying on FS wildcards.
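
E.g. each command-list.txt line pairs a command with its categories, so
a per-category list can in principle be derived directly from it. As a
sketch (the column layout here is illustrative):

    # command-list.txt:
    #   git-add      mainporcelain    worktree
    #   git-am       mainporcelain
    awk '!/^#/ && $2 == "mainporcelain" { print $1 }' command-list.txt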
Taylor Blau Oct. 21, 2021, 2:20 a.m. UTC | #6
On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> Per Eric Sunshine's upthread comments, an awk and a Perl implementation
> >> were both considered before[1].
> >
> > Ah sorry, I thought it was just a perl one that had been the
> > show-stopper. I hadn't noticed the awk one. However, the point of my
> > patch was to use perl if available, and fall back otherwise. Maybe
> > that's too ugly, but it does address the concern with Eric's
> > implementation.
>
> I think carrying two implementations is worse than just having the one
> slightly slower one.

I have no opinion on whether assuming that awk or Perl exists and can
be relied upon during the build is reasonable. It seems like the former
might be a slightly safer assumption than the latter, but in all
honesty it seems like both are always likely to be around.

In any case, I think the point was that we could improve upon Peff's
patch by just having a single implementation done in awk. And when I
wrote that I definitely was in the mindset of being able to rely on awk
during compilation.

> >> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
> >> with/without this awk version the difference in runtime is within the
> >> error bars. I.e. making the loop faster isn't necessary. It's better to
> >> get to a point where make can save you from doing all/most of the work
> >> by checking modification times, rather than making an O(n) loop faster.
> >
> > FWIW, I don't agree with this paragraph at all. Parallelizing or reusing
> > partial results is IMHO inferior to just making things faster.
>
> I agree with you in the general case, but for something that's consumed
> by a make dependency graph I find it easier to debug things if
> e.g. changing git-add.txt results in a change to git-add.gen, with the
> .gen files then being cat'd together.
>
> IOW if we had a sufficiently fast C compiler I think I'd still prefer
> make's existing rules over some equivalent of:
>
>     cat *.c | super-fast-cc
>
> Similarly to how the *.sp files depend on the *.o files now, declaring
> the dependency graph allows you to easily add more built things.

This seems like an unfair comparison to me. I might be more sympathetic
if we were generating a more complicated artifact by running
generate-cmdlist.sh, but its inputs and outputs seem very well defined
(and non-complicated) to me.

In any case, I agree with Peff that this isn't the approach that I would
have taken. But I also think that *just* parallelizing isn't necessarily
a win here. There are two reasons I think that:

  - The cognitive load required to parallelize this process is high;
    the .build directory seems like another thing to keep track of, and
    it's not clear to me what updates it, or what the result of
    touching some file in that directory is.

  - But even if the parallelization were achievable by more
    straightforward means, you still have to do the slow thing when
    you're rebuilding from scratch. So this is strictly worse the first
    time you are compiling, at least on machines with fewer cores.

In any case, this is all overkill in my mind for what we are talking
about. I agree that 'cat *.c | super-fast-cc' is worse than a competent
Makefile that knows what to build and when. But the problem here is a
slow loop in shell that is easily made much faster by implementing it
in a language that can execute the whole loop in a single process.

Thanks,
Taylor
Eric Sunshine Oct. 21, 2021, 5:39 a.m. UTC | #7
On Wed, Oct 20, 2021 at 7:34 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Wed, Oct 20 2021, Taylor Blau wrote:
> > But I don't think we even need to, since we could just rely on Awk. That
> > has all the benefits you described while still avoiding the circular
> > dependency on libgit.a. But the killer feature is that we don't have to
> > rely on two implementations, the lesser-used of which is likely to
> > bitrot over time.
>
> Per Eric Sunshine's upthread comments, an awk and a Perl implementation
> were both considered before[1].
> 1. https://lore.kernel.org/git/CAPig+cSzKoOzU-zPOZqfNpPYBFpcWqvDP3mwLvAn5WkiNW0UMw@mail.gmail.com/

Thanks for saving me the trouble of digging up that email reference.
Jeff King Oct. 21, 2021, 2:34 p.m. UTC | #8
On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> There are subtle dependency issues there as well, and just having a
> >> one-off solution for the command-list.h doesn't get us closer to
> >> addressing that sibling implementation.
> >
> > So I don't know what "subtle dependency issues" you found here, but this
> > is exactly the kind of complexity it was my goal to avoid.
> 
> But how? I don't see how narrowly making the loop in generate-cmdlist.sh
> faster gets us closer to generating the "cmds_txt" in the
> Documentation/Makefile.

What I meant is that the work to get everything right in the Makefile to
correctly handle dependencies and a partial rebuild can be tricky. For
instance, you're still stuck with a big wildcard dependency on
Documentation/git*.txt (and a manual list of exclusions in the Makefile)
because it's hard in make to create new dynamic rules based on an
existing one (i.e., the list _should_ come from what's in
command-list.txt). Or the fact that we apparently need to keep the old
script around for cmake anyway.

It's also much slower. Here are from-scratch builds before and after
your patch 7:

  $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h'
  Benchmark #1: make command-list.h
    Time (mean ± σ):      1.527 s ±  0.060 s    [User: 1.320 s, System: 0.649 s]
    Range (min … max):    1.433 s …  1.625 s    10 runs
   
  
  $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h'
  Benchmark #1: make command-list.h
    Time (mean ± σ):      2.661 s ±  0.080 s    [User: 2.359 s, System: 1.082 s]
    Range (min … max):    2.481 s …  2.756 s    10 runs

I know that partial builds will offset that in some cases, but it still
feels like a step in the wrong direction. Even with a partial build,
swapping out "make clean" for "touch Documentation/git-add.txt" takes
about 200ms for me. Whereas with the faster version of
generate-cmdlist.sh I showed, it takes 150ms.
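
Those partial-rebuild numbers come from the same kind of measurement as
above, just with a different prepare step, e.g.:

  $ hyperfine --warmup 1 -p 'touch Documentation/git-add.txt' 'make command-list.h'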

Now performance isn't everything, and it's possible these partial
snippets will be useful in other places. But I'm not sure I see any real
advantage in this series, and it seems like we're taking a hit in both
performance and complexity in the meantime.

-Peff
Junio C Hamano Oct. 21, 2021, 10:34 p.m. UTC | #9
Jeff King <peff@peff.net> writes:

> Now performance isn't everything, and it's possible these partial
> snippets will be useful in other places. But I'm not sure I see any real
> advantage in this series, and it seems like we're taking a hit in both
> performance and complexity in the meantime.

Let's not forget about a hit we are taking in reviewer bandwidth.
And added complexity will cost more over time.  I am not sure if
these 8 patches deserved this much attention, compared to other
neglected topics.
Ævar Arnfjörð Bjarmason Oct. 22, 2021, 10:51 a.m. UTC | #10
On Thu, Oct 21 2021, Jeff King wrote:

> On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> >> There are subtle dependency issues there as well, and just having a
>> >> one-off solution for the command-list.h doesn't get us closer to
>> >> addressing that sibling implementation.
>> >
>> > So I don't know what "subtle dependency issues" you found here, but this
>> > is exactly the kind of complexity it was my goal to avoid.
>> 
>> But how? I don't see how narrowly making the loop in generate-cmdlist.sh
>> faster gets us closer to generating the "cmds_txt" in the
>> Documentation/Makefile.
>
> What I meant is that the work to get everything right in the Makefile to
> correctly handle dependencies and a partial rebuild can be tricky. For
> instance, you're still stuck with a big wildcard dependency on
> Documentation/git*.txt (and a manual list of exclusions in the Makefile)
> because it's hard in make to create new dynamic rules based on an
> existing one (i.e., the list _should_ come from what's in
> command-list.txt). Or the fact that we apparently need to keep the old
> script around for cmake anyway.
>
> It's also much slower. Here are from-scratch builds before and after
> your patch 7:
>
>   $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h'
>   Benchmark #1: make command-list.h
>     Time (mean ± σ):      1.527 s ±  0.060 s    [User: 1.320 s, System: 0.649 s]
>     Range (min … max):    1.433 s …  1.625 s    10 runs
>    
>   
>   $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h'
>   Benchmark #1: make command-list.h
>     Time (mean ± σ):      2.661 s ±  0.080 s    [User: 2.359 s, System: 1.082 s]
>     Range (min … max):    2.481 s …  2.756 s    10 runs
>
> I know that partial builds will offset that in some cases, but it still
> feels like a step in the wrong direction. Even with a partial build,
> swapping out "make clean" for "touch Documentation/git-add.txt" takes
> about 200ms for me. Whereas with the faster version of
> generate-cmdlist.sh I showed, it takes 150ms.
>
> Now performance isn't everything, and it's possible these partial
> snippets will be useful in other places. But I'm not sure I see any real
> advantage in this series, and it seems like we're taking a hit in both
> performance and complexity in the meantime.

Yes, the same numbers are noted in the 7/8 commit message. I.e. it's
slower on -j1, but faster with higher -j<n> numbers.

Aside from any changes I'm proposing here it seems rather pointless to
me to optimize the runtime of -j1 runs.

I think we use those in e.g. CI, so of course if they become *really*
slow it will matter, but the purpose of this change is to make hacking
on git easier, both in terms of runtime and discovering what the
Makefile is doing with V=1.

I think anyone hacking on git is going to be on a system with -j2 at
least. So again, separate from these specific changes, if we've got a
change that speeds up -jN runs at the cost of a -j1 run that seems like
a good thing.

In terms of the utility of benchmarks, this benchmark uses "make" and
is meaningful, but e.g. <YXCKqAEwtwFozWk6@nand.local> (and I think some
other ones?) in this thread invoke the shellscript directly.

Those sorts of benchmarks may or may not matter, and in this case the
script is always called in the context of a Makefile, so that's really
the only meaningful way to test it. If e.g. its performance changes in
a way that won't be noticed in other Makefile noise, it probably won't
matter to anyone.
Ævar Arnfjörð Bjarmason Oct. 22, 2021, 12:37 p.m. UTC | #11
On Wed, Oct 20 2021, Taylor Blau wrote:

> On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >> Per Eric Sunshine's upthread comments, an awk and a Perl implementation
>> >> were both considered before[1].
>> >
>> > Ah sorry, I thought it was just a perl one that had been the
>> > show-stopper. I hadn't noticed the awk one. However, the point of my
>> > patch was to use perl if available, and fall back otherwise. Maybe
>> > that's too ugly, but it does address the concern with Eric's
>> > implementation.
>>
>> I think carrying two implementations is worse than just having the one
>> slightly slower one.
>
> I have no opinion on whether assuming that awk or Perl exists and can
> be relied upon during the build is reasonable. It seems like the
> former might be a slightly safer assumption than the latter, but in
> all honesty it seems like both are always likely to be around.
>
> In any case, I think the point was that we could improve upon Peff's
> patch by just having a single implementation done in awk. And when I
> wrote that I definitely was in the mindset of being able to rely on awk
> during compilation.
>
>> >> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
>> >> with/without this awk version the difference in runtime is within the
>> >> error bars. I.e. making the loop faster isn't necessary. It's better to
>> >> get to a point where make can save you from doing all/most of the work
>> >> by checking modification times, rather than making an O(n) loop faster.
>> >
>> > FWIW, I don't agree with this paragraph at all. Parallelizing or reusing
>> > partial results is IMHO inferior to just making things faster.
>>
>> I agree with you in the general case, but for something that's consumed
>> by a make dependency graph I find it easier to debug things if
>> e.g. changing git-add.txt results in a change to git-add.gen, with the
>> .gen files then being cat'd together.
>>
>> IOW if we had a sufficiently fast C compiler I think I'd still prefer
>> make's existing rules over some equivalent of:
>>
>>     cat *.c | super-fast-cc
>>
>> Similarly to how the *.sp files depend on the *.o files now, declaring
>> the dependency graph allows you to easily add more built things.
>
> This seems like an unfair comparison to me. I might be more sympathetic
> if we were generating a more complicated artifact by running
> generate-cmdlist.sh, but its inputs and outputs seem very well defined
> (and non-complicated) to me.

They are? A foo.c to foo.o build is relatively uncomplicated, and you
can discover the exact dependencies with 3rd-party tools, like the GCC
and Clang switches we use to generate the .depend dirs[1].

Whereas with custom shellscripts that have for-loops of their own, like
generate-cmdlist.sh, what exactly they depend on is relatively opaque
to you until you read the shellscript.

I guess it's a matter of taste, but if you run this with/without this
series:

    touch Documentation/git-a*.txt; time make -j1 command-list.h --debug=b V=1

You'll see that before we'd spot that e.g. git-add.txt changed, but
we'll run one target in response to that at the end.

So it's just like what you'd get when you make %.o from %.c to produce a
"program" that links all those %.o together at the end.

So I do think it's a fair comparison; if anything it's unfair to this
series, because as noted you can discover these dependencies
independently with GCC etc. for C code. But for a custom *.txt format
with an ad-hoc *.sh to parse it there's no such aid available.

1. $ cat .depend/help.o.d 
help.o: help.c cache.h git-compat-util.h compat/bswap.h wildmatch.h \
 banned.h strbuf.h hashmap.h hash.h repository.h path.h sha1dc_git.h \
 sha1collisiondetection/lib/sha1.h sha256/block/sha256.h list.h advice.h \
 gettext.h convert.h string-list.h trace.h trace2.h pack-revindex.h \
 oid-array.h mem-pool.h config.h builtin.h commit.h object.h tree.h \
 decorate.h gpg-interface.h pretty.h commit-slab.h commit-slab-decl.h \
 commit-slab-impl.h exec-cmd.h run-command.h thread-utils.h strvec.h \
 levenshtein.h help.h command-list.h column.h version.h refs.h \
 parse-options.h prompt.h
[...]
Jeff King Oct. 22, 2021, 6:31 p.m. UTC | #12
On Fri, Oct 22, 2021 at 12:51:11PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Yes, the same numbers are noted in the 7/8 commit message. I.e. it's
> slower on -j1, but faster with higher -j<n> numbers.
> 
> Aside from any changes I'm proposing here it seems rather pointless to
> me to optimize the runtime of -j1 runs.
> 
> I think we use those in e.g. CI, so of course if they become *really*
> slow it will matter, but the purpose of this change is to make hacking
> on git easier, both in terms of runtime and discovering what the
> Makefile is doing with V=1.
> 
> I think anyone hacking on git is going to be on a system with -j2 at
> least. So again, separate from these specific changes, if we've got a
> change that speeds up -jN runs at the cost of a -j1 run that seems like
> a good thing.

It seems weird to me to assume that all of our processors are available
to build command-list.h. In most cases you are not running "make -j16
command-list.h", but rather "make -j16", and those other processors are
doing useful things, like say, building actual C code.

So counting CPU time is the interesting thing, because every cycle you
save there gets used for other work. And "make -j1" just brings
wall-clock and CPU time together.
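
A quick way to see the two measures, as a sketch:

    $ time make -j16    # "user" + "sys" = total CPU cost, "real" = wall-clock
    $ time make -j1     # here "real" converges on the CPU cost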

-Peff
Ævar Arnfjörð Bjarmason Oct. 22, 2021, 8:50 p.m. UTC | #13
On Fri, Oct 22 2021, Jeff King wrote:

> On Fri, Oct 22, 2021 at 12:51:11PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Yes, the same numbers are noted in the 7/8 commit message. I.e. it's
>> slower on -j1, but faster with higher -j<n> numbers.
>> 
>> Aside from any changes I'm proposing here it seems rather pointless to
>> me to optimize the runtime of -j1 runs.
>> 
>> I think we use those in e.g. CI, so of course if they become *really*
>> slow it will matter, but the purpose of this change is to make hacking
>> on git easier, both in terms of runtime and discovering what the
>> Makefile is doing with V=1.
>> 
>> I think anyone hacking on git is going to be on a system with -j2 at
>> least. So again, separate from these specific changes, if we've got a
>> change that speeds up -jN runs at the cost of a -j1 run that seems like
>> a good thing.
>
> It seems weird to me to assume that all of our processors are available
> to build command-list.h. In most cases you are not running "make -j16
> command-list.h", but rather "make -j16", and those other processors are
> doing useful things, like say, building actual C code.
>
> So counting CPU time is the interesting thing, because every cycle you
> save there gets used for other work. And "make -j1" just brings
> wall-clock and CPU time together.

There's been a lot of goalpost moving in the discussion around this
series, but if you look at the numbers for the v1 I proposed at
<patch-7.8-0c6f9b80d3b-20211020T183533Z-avarab@gmail.com> it also had a
significant drop in "user" time.

The only thing that was slower in my tests, in either wallclock or
user time, was the wallclock time of the building-from-scratch case
(the user time was lower). Maybe those measurements were bad or
whatever, but that's the context for the above.

Now, since then I've submitted a v2 with just the "make the shellscript
faster" parts of this, incorporating some of your suggestions to make
the shellscript version faster.

FWIW there's a rather trivial addition to my version that makes the
"make all the things" case faster again in wallclock/user time. We
don't need to re-parse command-list.txt again to emit the headers if
command-list.txt doesn't change, which is the common case. So we can
just cache the whole header portion in another *.gen file and "cat" it.
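
Schematically (hypothetical target and script names; the recipes are
stand-ins):

    # the header portion only depends on command-list.txt...
    cmd-list-header.gen: command-list.txt
	./emit-header-portion.sh $< >$@

    # ...so when only Documentation/*.txt changes it's a plain "cat"
    command-list.h: cmd-list-header.gen $(PER_COMMAND_GEN)
	cat $^ >$@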

Anyway, I've been running these changes on top of other Makefile
changes I've got locally where we re-build almost nothing as HEAD
changes. No FORCE-target shellscripting (but the same via native GNU
make features), and no *.sh/*.perl script re-generation on HEAD changes
(the former being on-list).

So with that, the common case really is that everything hangs on the
command-list.h generation. We have maybe 1-5 modified *.c files, then
we need to link everything, and the linking hangs on help.o, which
needs command-list.h.
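
Schematically, the critical path being described is:

    # incremental rebuild after touching a few *.c files:
    git: $(OBJECTS)                  # the link waits on every object...
    help.o: help.c command-list.h    # ...including help.o, which waits on
    command-list.h: command-list.txt # ...the generated header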