mbox series

[v2,00/10] Makefile: make generate-cmdlist.sh much faster

Message ID cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com (mailing list archive)
Headers show
Series Makefile: make generate-cmdlist.sh much faster | expand

Message

Ævar Arnfjörð Bjarmason Oct. 22, 2021, 7:36 p.m. UTC
This version of this series drops the Makefile-powered version of the
cmdlist in favor of making the shellscript much faster, mostly with
suggestions from Jeff King.

I still think that splitting out the generated data into files may be
useful for unifying the Documentation/ and C code build processes,
there's another custom parser for command-list.txt in
Documentation/cmd-list.perl.

But if and when I've got something for that I can dig that out of the
v1, in the meantime the v1 of this should be mostly uncontroversial.

The last tow patches make things a bit slower for me, but since they
replace command invocations with pure-shell logic they presumably make
things a bit less painful on e.g. Windows, and the 8th patch here
already made things quite very fast already.

Jeff King (1):
  generate-cmdlist.sh: do not shell out to "sed"

Johannes Sixt (2):
  generate-cmdlist.sh: spawn fewer processes
  generate-cmdlist.sh: replace for loop by printf's auto-repeat feature

Ævar Arnfjörð Bjarmason (7):
  command-list.txt: sort with "LC_ALL=C sort"
  generate-cmdlist.sh: trivial whitespace change
  generate-cmdlist.sh: don't call get_categories() from category_list()
  generate-cmdlist.sh: run "grep | sort", not "sort | grep"
  generate-cmdlist.sh: stop sorting category lines
  generate-cmdlist.sh: replace "grep' invocation with a shell version
  generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell

 command-list.txt    | 20 +++++++-------
 generate-cmdlist.sh | 66 ++++++++++++++++++++++++++-------------------
 2 files changed, 48 insertions(+), 38 deletions(-)

Range-diff against v1:
 1:  96885282988 =  1:  96885282988 command-list.txt: sort with "LC_ALL=C sort"
 2:  5e8fef90e42 =  2:  5e8fef90e42 generate-cmdlist.sh: trivial whitespace change
 3:  6b4de6a6088 =  3:  6b4de6a6088 generate-cmdlist.sh: spawn fewer processes
 4:  074685cf714 =  4:  074685cf714 generate-cmdlist.sh: don't call get_categories() from category_list()
 5:  f01c1fd8088 =  5:  f01c1fd8088 generate-cmdlist.sh: run "grep | sort", not "sort | grep"
 6:  e0b11514b8d =  6:  e0b11514b8d generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
 7:  0c6f9b80d3b <  -:  ----------- Makefile: stop having command-list.h depend on a wildcard
 8:  23d4cc77b6c <  -:  ----------- Makefile: assert correct generate-cmdlist.sh output
 -:  ----------- >  7:  f2f37c2963b generate-cmdlist.sh: stop sorting category lines
 -:  ----------- >  8:  83318d6c0da generate-cmdlist.sh: do not shell out to "sed"
 -:  ----------- >  9:  7903dd1f8c2 generate-cmdlist.sh: replace "grep' invocation with a shell version
 -:  ----------- > 10:  e10a43756d1 generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell

Comments

Taylor Blau Oct. 22, 2021, 9:20 p.m. UTC | #1
On Fri, Oct 22, 2021 at 09:36:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
> This version of this series drops the Makefile-powered version of the
> cmdlist in favor of making the shellscript much faster, mostly with
> suggestions from Jeff King.

I'd be happy with the version that you suggest here, since at least the
net-effect is that generate-commandlist.sh is faster than before without
much additional complexity.

That said, I did find the structure of these patches somewhat confusing.
There is a lot of refactoring of get_categories() and related functions,
and those patches were a little tricky to read for me. I wonder how much
could be cleaned up by placing "generate-cmdlist.sh: stop sorting
category lines" earlier in the series, getting rid of the caller.

It's too bad that the penultimate patch slowed things down a bit, but
I think that things are so fast now that much more discussion in this
area is really just splitting hairs. It would be interesting to hear
from somebody on Windows whether or not the speed-up there was worth it
(otherwise dropping that patch might make sense).

Anyway, I think we've all spent a lot of time discussing a rather
straightforward set of patches ;-). So this version looks good to me.

Thanks,
Taylor
Junio C Hamano Oct. 23, 2021, 10:34 p.m. UTC | #2
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Subject: Re: [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster

Stale topic as there is no change to the Makefile.

> This version of this series drops the Makefile-powered version of the
> cmdlist in favor of making the shellscript much faster, mostly with
> suggestions from Jeff King.

OK.

I looked at the whole thing and it looked almost done, modulo just a
little breakages here and there whose corrections should be fairly
obvious.

Thanks.
Jeff King Oct. 25, 2021, 4:57 p.m. UTC | #3
On Fri, Oct 22, 2021 at 09:36:04PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This version of this series drops the Makefile-powered version of the
> cmdlist in favor of making the shellscript much faster, mostly with
> suggestions from Jeff King.
> 
> I still think that splitting out the generated data into files may be
> useful for unifying the Documentation/ and C code build processes,
> there's another custom parser for command-list.txt in
> Documentation/cmd-list.perl.
> 
> But if and when I've got something for that I can dig that out of the
> v1, in the meantime the v1 of this should be mostly uncontroversial.

Thanks, up through patch 8 this all looks good to me.

> The last tow patches make things a bit slower for me, but since they
> replace command invocations with pure-shell logic they presumably make
> things a bit less painful on e.g. Windows, and the 8th patch here
> already made things quite very fast already.

These ones I could take or leave. They probably do help a little on
Windows, but I'm much more concerned about O(nr_of_commands) process
invocations than I am in reducing the base number of invocations
(because one gives a 169x speedup over the other).

And in patch 9 in particular, we're trading a grep one-liner for a
much-longer shell loop.  And I don't think this is hypocritical with
respect to patch 8; there we are replacing ugly sed with ugly shell, and
the speed benefit is clear and large.

-Peff