mbox series

[0/9] send-email: various optimizations to speed up by >2x

Message ID cover-0.9-0000000000-20210512T132955Z-avarab@gmail.com (mailing list archive)
Headers show
Series send-email: various optimizations to speed up by >2x | expand

Message

Ævar Arnfjörð Bjarmason May 12, 2021, 1:48 p.m. UTC
This is on top of my just-submitted [1] which in turn is on top of
send-email work of mine sitting in "next".

I was meaning to hold off on these patches for a bit, but given the
concurrent on-list discussion about doing config discovery in
send-email I wanted to send this now.

This combines by not-picked-up[1] recent patches to remove the support
for the "sendemail.smtpssl" variable with the later patches showing
where that effort was really going.

As noted in the subject this speeds up git-send-email invocations by
~2x or more, and brings the very slow t9001 test from running in ~26s
on my box to ~12s. It's no longer consistently the slowest test I run.

This is basically done in two ways: We lazily invoke "git config" to
get config, before it's very eager, and deferring Perl compilation
with s/use/require/g.

1. https://lore.kernel.org/git/patch-1.1-92571a8cf7-20210512T094803Z-avarab@gmail.com/
2. https://lore.kernel.org/git/cover-0.2-00000000000-20210411T144128Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (9):
  send-email: remove non-working support for "sendemail.smtpssl"
  send-email: refactor sendemail.smtpencryption config parsing
  send-email: lazily load config for a big speedup
  send-email: lazily shell out to "git var"
  send-email: use function syntax instead of barewords
  send-email: get rid of indirect object syntax
  send-email: lazily load modules for a big speedup
  perl: lazily load some common Git.pm setup code
  send-email: move trivial config handling to Perl

 Documentation/config/sendemail.txt |   3 -
 git-send-email.perl                | 145 +++++++++++++++++------------
 perl/Git.pm                        |  49 +++++-----
 3 files changed, 111 insertions(+), 86 deletions(-)

Comments

Eric Wong May 12, 2021, 6:04 p.m. UTC | #1
Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> As noted in the subject this speeds up git-send-email invocations by
> ~2x or more, and brings the very slow t9001 test from running in ~26s
> on my box to ~12s. It's no longer consistently the slowest test I run.
> 
> This is basically done in two ways: We lazily invoke "git config" to
> get config, before it's very eager, and deferring Perl compilation
> with s/use/require/g.

Nice.  I've been doing similar things elsewhere and hoping to
find time to get around to git-svn at some point.

> Ævar Arnfjörð Bjarmason (9):
>   send-email: remove non-working support for "sendemail.smtpssl"
>   send-email: refactor sendemail.smtpencryption config parsing
>   send-email: lazily load config for a big speedup
>   send-email: lazily shell out to "git var"
>   send-email: use function syntax instead of barewords
>   send-email: get rid of indirect object syntax
>   send-email: lazily load modules for a big speedup
>   perl: lazily load some common Git.pm setup code
>   send-email: move trivial config handling to Perl

I spotted some further optimizations for 7 and 8,
but otherwise consider this series:

Reviewed-by: Eric Wong <e@80x24.org>
Jeff King May 12, 2021, 11:34 p.m. UTC | #2
On Wed, May 12, 2021 at 03:48:16PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This combines by not-picked-up[1] recent patches to remove the support
> for the "sendemail.smtpssl" variable with the later patches showing
> where that effort was really going.
> 
> As noted in the subject this speeds up git-send-email invocations by
> ~2x or more, and brings the very slow t9001 test from running in ~26s
> on my box to ~12s. It's no longer consistently the slowest test I run.

Nice. I have observed that with a decent number of cores, the running
time of the entire test suite correlates strongly with the running time
of t9001. :)

Here are timings for individual tests run with "prove --state=slow,save".
(This is on an 8-core machine using -j32, skipping cvs/svn/p4 tests,
and using a tmpfs via --root). The timings were computed with:

  perl -MYAML -e '
    $_ = do { local $/; <> };
    # prove puts this non-YAML cruft at the end
    s/\.\.\.$//s;

    my $t = YAML::Load($_)->{tests};
    print "$_->[1] $_->[0]\n" for
      sort { $b->[1] <=> $a->[1] }
      map { [$_, $t->{$_}->{elapsed}] }
      keys(%$t);
  ' t/.prove | head

Before your patches, the whole sweet takes ~60-63s, and the top timings
(from a 63s run) are:

    63.2607979774475 t9001-send-email.sh
    51.742644071579 t0027-auto-crlf.sh
    37.7909920215607 t3070-wildmatch.sh
    27.09605717659 t7610-mergetool.sh
    24.7028169631958 t7112-reset-submodule.sh
    24.5535898208618 t5572-pull-submodule.sh
    23.8404550552368 t9500-gitweb-standalone-no-errors.sh
    22.3544380664825 t7400-submodule-basic.sh
    21.7017750740051 t5510-fetch.sh
    21.4575610160828 t3305-notes-fanout.sh

Now after, which takes ~54-59s (this is from a 54s run):

  46.796669960022 t0027-auto-crlf.sh
  32.5747599601746 t3070-wildmatch.sh
  21.5069420337677 t7610-mergetool.sh
  20.8392388820648 t1701-racy-split-index.sh
  19.7403028011322 t5572-pull-submodule.sh
  19.7386808395386 t9001-send-email.sh
  19.4622302055359 t7112-reset-submodule.sh
  18.9555768966675 t9500-gitweb-standalone-no-errors.sh
  18.0672709941864 t7400-submodule-basic.sh
  17.641391992569 t5510-fetch.sh

I have some messy patches to split t9001 into two segments. They were
waiting to get polished, but perhaps I can just discard them now. :)

Some side notes for those interested in timing the test suite:

  - If I run t9001 standalone, it goes much faster, of course; the CPU
    throttles down when we're running all the tests in parallel.

  - Those are with "-x --verbose-log", which is nice for catching flaky
    results. Dropping those seems to shave a few seconds off.

  - A big chunk of time for t0027 and t3070 is spent running the sed-based
    chain-linting for their huge tables of auto-generated tests (1400+
    and 1800+ respectively). Dropping the sed linting for just those
    tests knocks off about 30 CPU-seconds.

-Peff
Jeff King May 12, 2021, 11:36 p.m. UTC | #3
On Wed, May 12, 2021 at 03:48:16PM +0200, Ævar Arnfjörð Bjarmason wrote:

> As noted in the subject this speeds up git-send-email invocations by
> ~2x or more, and brings the very slow t9001 test from running in ~26s
> on my box to ~12s. It's no longer consistently the slowest test I run.
> 
> This is basically done in two ways: We lazily invoke "git config" to
> get config, before it's very eager, and deferring Perl compilation
> with s/use/require/g.

Splitting my reply, since the other one got deep into test-suite timing
details.

The techniques here look overall pretty reasonable. I think the module
lazy-loading makes the overall code a _little_ uglier, but IMHO the
speedup you're getting is worth it (I was surprised how much of the
improvement comes from that versus avoiding git-config subprocesses).

My only concern is changing the interface of Git::config_regexp() in the
final patch. Do we need to have a config_regexp_with_values() to avoid
breaking third-party users of the module?

-Peff
Ævar Arnfjörð Bjarmason May 13, 2021, 7:37 a.m. UTC | #4
On Wed, May 12 2021, Jeff King wrote:

> On Wed, May 12, 2021 at 03:48:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> As noted in the subject this speeds up git-send-email invocations by
>> ~2x or more, and brings the very slow t9001 test from running in ~26s
>> on my box to ~12s. It's no longer consistently the slowest test I run.
>> 
>> This is basically done in two ways: We lazily invoke "git config" to
>> get config, before it's very eager, and deferring Perl compilation
>> with s/use/require/g.
>
> Splitting my reply, since the other one got deep into test-suite timing
> details.
>
> The techniques here look overall pretty reasonable. I think the module
> lazy-loading makes the overall code a _little_ uglier, but IMHO the
> speedup you're getting is worth it (I was surprised how much of the
> improvement comes from that versus avoiding git-config subprocesses).

Yeah, it's mostly uglier, but I think it's worth it. Some parts are
better afterwards though, i.e. the SOME_RARE_BARE_WORD is now
Module::It::Is::In::SOME_RARE_BARE_WORD, which makes it easier to
understand where it's from.

> My only concern is changing the interface of Git::config_regexp() in the
> final patch. Do we need to have a config_regexp_with_values() to avoid
> breaking third-party users of the module?

As noted in 3/9 I don't think we need to worry about it, it's recently
introduced (a few months) API in Git.pm for send-email itself. I think
we can just change it.

In general I think it's unfortunate that we have (at least in principle)
a "public by default" module like Git.pm that's mostly for our own use.

This series doesn't try to deal with that in general at all, I'm
somewhat of the opinion that we should just fork it at this
point. I.e. have a Git.pm we freeze in time, and a Git/Ours.pm that's
going to be the private API.

I stopped with these optimizations at the point of refactoring away
Error.pm, which is a large contributor to compilation time, but as long
as it's a public API that can't be done without changing the public
API. If all we needed to worry about was send-email, git-svn etc. just
changing it to Perl-native exceptions would be trivial.
Jeff King May 13, 2021, 7:49 a.m. UTC | #5
On Thu, May 13, 2021 at 09:37:36AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > My only concern is changing the interface of Git::config_regexp() in the
> > final patch. Do we need to have a config_regexp_with_values() to avoid
> > breaking third-party users of the module?
> 
> As noted in 3/9 I don't think we need to worry about it, it's recently
> introduced (a few months) API in Git.pm for send-email itself. I think
> we can just change it.

Ah, thanks for pointing that out. I _thought_ I had seen you mention it
earlier, but when I went back to look I couldn't find it.

I'm not entirely convinced, though. I agree it's probably not heavily
used, but the existing interface was shipped in three releases already
(v2.29 and up).

> In general I think it's unfortunate that we have (at least in principle)
> a "public by default" module like Git.pm that's mostly for our own use.

I'd certainly agree with that sentiment. :)

> This series doesn't try to deal with that in general at all, I'm
> somewhat of the opinion that we should just fork it at this
> point. I.e. have a Git.pm we freeze in time, and a Git/Ours.pm that's
> going to be the private API.
> 
> I stopped with these optimizations at the point of refactoring away
> Error.pm, which is a large contributor to compilation time, but as long
> as it's a public API that can't be done without changing the public
> API. If all we needed to worry about was send-email, git-svn etc. just
> changing it to Perl-native exceptions would be trivial.

Yeah, I don't have any real problem with that, as long as we don't break
third-party scripts that we've promised not to. I'd even be OK with
deprecating Git.pm and eventually phasing it out, if we think it's a
maintenance burden.

-Peff
Elijah Newren May 27, 2021, 7:21 a.m. UTC | #6
Hi,

On Wed, May 12, 2021 at 6:50 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> As noted in the subject this speeds up git-send-email invocations by
> ~2x or more, and brings the very slow t9001 test from running in ~26s
> on my box to ~12s. It's no longer consistently the slowest test I run.
>
> This is basically done in two ways: We lazily invoke "git config" to
> get config, before it's very eager, and deferring Perl compilation
> with s/use/require/g.

I know I'm very late to the party, but I just wanted to comment that
this is super cool.  Thanks for speeding this up; some really good
finds here.