diff mbox series

[9/9] send-email: move trivial config handling to Perl

Message ID patch-9.9-0d87c9a5a3-20210512T132955Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series send-email: various optimizations to speed up by >2x | expand

Commit Message

Ævar Arnfjörð Bjarmason May 12, 2021, 1:48 p.m. UTC
Optimize the startup time of git-send-email by using an amended
config_regexp() function to retrieve the list of config keys and
values we're interested in.

For boolean keys we can handle the [true|false] case ourselves, and
the "--get" case didn't need any parsing. Let's leave "--path" and
other "--bool" cases to "git config". As noted in a preceding commit
we're free to change the config_regexp() function, it's only used by
"git send-email".

This brings the runtime of "git send-email" from ~60-~70ms to a very
steady ~40ms on my test box. We no run just one "git config"
invocation on startup instead of 8, the exact number will differ based
on the local sendemail.* config. I happen to have 8 of those set.

This brings the runtime of t9001-send-email.sh from ~13s down to ~12s
for me. The change there is less impressive as many of those tests set
various config values, and we're also getting to the point of
diminishing returns for optimizing "git send-email" itself.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 git-send-email.perl | 17 ++++++++++-------
 perl/Git.pm         | 10 +++++-----
 2 files changed, 15 insertions(+), 12 deletions(-)

Comments

Eric Sunshine May 12, 2021, 7:52 p.m. UTC | #1
On Wed, May 12, 2021 at 9:48 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Optimize the startup time of git-send-email by using an amended
> config_regexp() function to retrieve the list of config keys and
> values we're interested in.
>
> For boolean keys we can handle the [true|false] case ourselves, and
> the "--get" case didn't need any parsing. Let's leave "--path" and
> other "--bool" cases to "git config". As noted in a preceding commit
> we're free to change the config_regexp() function, it's only used by
> "git send-email".
>
> This brings the runtime of "git send-email" from ~60-~70ms to a very
> steady ~40ms on my test box. We no run just one "git config"

s/no/now/

> invocation on startup instead of 8, the exact number will differ based
> on the local sendemail.* config. I happen to have 8 of those set.
>
> This brings the runtime of t9001-send-email.sh from ~13s down to ~12s
> for me. The change there is less impressive as many of those tests set
> various config values, and we're also getting to the point of
> diminishing returns for optimizing "git send-email" itself.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Jeff King May 12, 2021, 11:08 p.m. UTC | #2
On Wed, May 12, 2021 at 03:48:25PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Optimize the startup time of git-send-email by using an amended
> config_regexp() function to retrieve the list of config keys and
> values we're interested in.
> 
> For boolean keys we can handle the [true|false] case ourselves, and
> the "--get" case didn't need any parsing. Let's leave "--path" and
> other "--bool" cases to "git config". As noted in a preceding commit
> we're free to change the config_regexp() function, it's only used by
> "git send-email".

I think both of these cases should be safe.

It is a bit unfortunate to have to go through these contortions, but
this is definitely the best we can do for now. I think in the long run
it would be nice to have a "--stdin" mode for git-config, where we could
do something like:

  git config --stdin <<\EOF
  key=foo.bar
  type=bool
  default=false

  key=another.key
  type=color
  default=red
  EOF

But that doesn't exist yet, and using it would probably involve
rearranging send-email a bit (we would need an up-front list of all of
the keys we care about and their types). So I'm perfectly content with
this strategy in the meantime.

> diff --git a/perl/Git.pm b/perl/Git.pm
> index f18852fb09..a9020d0d01 100644
> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -739,18 +739,18 @@ sub config_int {
>  =item config_regexp ( RE )
>  
>  Retrieve the list of configuration key names matching the regular
> -expression C<RE>. The return value is a list of strings matching
> -this regex.
> +expression C<RE>. The return value is an ARRAY of key-value pairs.

The Git.pm interface is public-facing, isn't it? Are we breaking
third-party callers of config_regexp here?

-Peff
Felipe Contreras May 13, 2021, 7:04 a.m. UTC | #3
Jeff King wrote:
> It is a bit unfortunate to have to go through these contortions, but
> this is definitely the best we can do for now. I think in the long run
> it would be nice to have a "--stdin" mode for git-config, where we could
> do something like:
> 
>   git config --stdin <<\EOF
>   key=foo.bar
>   type=bool
>   default=false
> 
>   key=another.key
>   type=color
>   default=red
>   EOF

Why do we even have to specify the type? Shouldn't there be a registry
of configurations (a schema), so that all users don't have to do this?
Ævar Arnfjörð Bjarmason May 13, 2021, 7:07 a.m. UTC | #4
On Thu, May 13 2021, Felipe Contreras wrote:

> Jeff King wrote:
>> It is a bit unfortunate to have to go through these contortions, but
>> this is definitely the best we can do for now. I think in the long run
>> it would be nice to have a "--stdin" mode for git-config, where we could
>> do something like:
>> 
>>   git config --stdin <<\EOF
>>   key=foo.bar
>>   type=bool
>>   default=false
>> 
>>   key=another.key
>>   type=color
>>   default=red
>>   EOF
>
> Why do we even have to specify the type? Shouldn't there be a registry
> of configurations (a schema), so that all users don't have to do this?

Yes, we should be moving towards that. Currently we're not there, the
closest we've got is the generated config-list.h.

If we closed that loop properly you should be able to do:

    git config --get --type=dwim clean.requireForce # or auto, or no --type

Or whatever, instead of:

    git config --get --type=bool clean.requireForce

But we're not there yet, there's also edge cases here and there that
make a plain exhaustive registry hard, and send-email is one of them.

In its case the value of sendemail.identity=something (if any)
determines if/how e.g. sendemail.something.smtpEncryption is
interpreted.

I think we can do it with just a list of config variables where the
variable is either a string or a regex (in this case,
"^sendemail(?:[^.]+\.)?\.smtpEncryption$"), but I haven't tried. There
may be other tricky special-cases.
Jeff King May 13, 2021, 7:26 a.m. UTC | #5
On Thu, May 13, 2021 at 02:04:08AM -0500, Felipe Contreras wrote:

> Jeff King wrote:
> > It is a bit unfortunate to have to go through these contortions, but
> > this is definitely the best we can do for now. I think in the long run
> > it would be nice to have a "--stdin" mode for git-config, where we could
> > do something like:
> > 
> >   git config --stdin <<\EOF
> >   key=foo.bar
> >   type=bool
> >   default=false
> > 
> >   key=another.key
> >   type=color
> >   default=red
> >   EOF
> 
> Why do we even have to specify the type? Shouldn't there be a registry
> of configurations (a schema), so that all users don't have to do this?

One of the purposes of git-config is to serve third-party scripts that
store their own config keys that Git does not know about. So we can't
know the set of all possible types that will be asked about.

Obviously we could have git-config know internally about all of the keys
other parts of Git would ask about. But generally we have pushed that
knowledge out to the users of the keys, rather than any kind of central
registry.

-Peff
Felipe Contreras May 13, 2021, 8:15 a.m. UTC | #6
Jeff King wrote:
> On Thu, May 13, 2021 at 02:04:08AM -0500, Felipe Contreras wrote:
> 
> > Jeff King wrote:
> > > It is a bit unfortunate to have to go through these contortions, but
> > > this is definitely the best we can do for now. I think in the long run
> > > it would be nice to have a "--stdin" mode for git-config, where we could
> > > do something like:
> > > 
> > >   git config --stdin <<\EOF
> > >   key=foo.bar
> > >   type=bool
> > >   default=false
> > > 
> > >   key=another.key
> > >   type=color
> > >   default=red
> > >   EOF
> > 
> > Why do we even have to specify the type? Shouldn't there be a registry
> > of configurations (a schema), so that all users don't have to do this?
> 
> One of the purposes of git-config is to serve third-party scripts that
> store their own config keys that Git does not know about. So we can't
> know the set of all possible types that will be asked about.

Yes, I know, I maintain several tools that have such configurations. For
those you would need to specify the type (or find some way to install
the schema so that git parses it).

But I'm talking about git.git configurations. If you don't specify the
type in --stdin it should fetch it from some database. That would be
much more user-friendly.
Ævar Arnfjörð Bjarmason May 13, 2021, 11:45 a.m. UTC | #7
On Thu, May 13 2021, Felipe Contreras wrote:

> Jeff King wrote:
>> On Thu, May 13, 2021 at 02:04:08AM -0500, Felipe Contreras wrote:
>> 
>> > Jeff King wrote:
>> > > It is a bit unfortunate to have to go through these contortions, but
>> > > this is definitely the best we can do for now. I think in the long run
>> > > it would be nice to have a "--stdin" mode for git-config, where we could
>> > > do something like:
>> > > 
>> > >   git config --stdin <<\EOF
>> > >   key=foo.bar
>> > >   type=bool
>> > >   default=false
>> > > 
>> > >   key=another.key
>> > >   type=color
>> > >   default=red
>> > >   EOF
>> > 
>> > Why do we even have to specify the type? Shouldn't there be a registry
>> > of configurations (a schema), so that all users don't have to do this?
>> 
>> One of the purposes of git-config is to serve third-party scripts that
>> store their own config keys that Git does not know about. So we can't
>> know the set of all possible types that will be asked about.
>
> Yes, I know, I maintain several tools that have such configurations. For
> those you would need to specify the type (or find some way to install
> the schema so that git parses it).
>
> But I'm talking about git.git configurations. If you don't specify the
> type in --stdin it should fetch it from some database. That would be
> much more user-friendly.

For what it's worth my idea of hacking a plumbing thingy for
git-send-email here before ultimately deciding that my simpler caching
approach was easier and gave me 95% of the win, was to just teach it a
mode where it spews out all config variables \0-delimited with all
possible interpretations of it. I.e.:

    some.variable	123	bool	true	path	123	string	123	[...]

It's rather cheap to do the "interpret this as --type=X for me" on the
C-level, so we might as well spew out all possible interpretations.

That means that any external tool would be guaranteed to only need one
"git config" invocation to parse any of its config, i.e. in a case where
variable X decides if variable Y is a bool or path or whatever. They'd
already have all possible values.

Something like:

	git config -l -z --type=bool,path
	git config -l -z --type=ALL
diff mbox series

Patch

diff --git a/git-send-email.perl b/git-send-email.perl
index 907dc1425f..35ba19470d 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -325,7 +325,10 @@  sub read_config {
 		my $target = $config_bool_settings{$setting};
 		my $key = "$prefix.$setting";
 		next unless exists $known_keys->{$key};
-		my $v = Git::config_bool(@repo, $key);
+		my $v = (@{$known_keys->{$key}} == 1 &&
+			 $known_keys->{$key}->[0] =~ /^(?:true|false)$/s)
+			? $known_keys->{$key}->[0] eq 'true'
+			: Git::config_bool(@repo, $key);
 		next unless defined $v;
 		next if $configured->{$setting}++;
 		$$target = $v;
@@ -354,14 +357,12 @@  sub read_config {
 		my $key = "$prefix.$setting";
 		next unless exists $known_keys->{$key};
 		if (ref($target) eq "ARRAY") {
-			my @values = Git::config(@repo, $key);
-			next unless @values;
+			my @values = @{$known_keys->{$key}};
 			next if $configured->{$setting}++;
 			@$target = @values;
 		}
 		else {
-			my $v = Git::config(@repo, $key);
-			next unless defined $v;
+			my $v = $known_keys->{$key}->[0];
 			next if $configured->{$setting}++;
 			$$target = $v;
 		}
@@ -372,8 +373,10 @@  sub read_config {
 # parses 'bool' etc.) by only doing so for config keys that exist.
 my %known_config_keys;
 {
-	my @known_config_keys = Git::config_regexp("^sende?mail[.]");
-	@known_config_keys{@known_config_keys} = ();
+	my @kv = Git::config_regexp("^sende?mail[.]");
+	while (my ($k, $v) = splice @kv, 0, 2) {
+		push @{$known_config_keys{$k}} => $v;
+	}
 }
 
 # sendemail.identity yields to --identity. We must parse this
diff --git a/perl/Git.pm b/perl/Git.pm
index f18852fb09..a9020d0d01 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -739,18 +739,18 @@  sub config_int {
 =item config_regexp ( RE )
 
 Retrieve the list of configuration key names matching the regular
-expression C<RE>. The return value is a list of strings matching
-this regex.
+expression C<RE>. The return value is an ARRAY of key-value pairs.
 
 =cut
 
 sub config_regexp {
 	my ($self, $regex) = _maybe_self(@_);
 	try {
-		my @cmd = ('config', '--name-only', '--get-regexp', $regex);
+		my @cmd = ('config', '--null', '--get-regexp', $regex);
 		unshift @cmd, $self if $self;
-		my @matches = command(@cmd);
-		return @matches;
+		my $data = command(@cmd);
+		my (@kv) = map { split /\n/, $_, 2 } split /\0/, $data;
+		return @kv;
 	} catch Git::Error::Command with {
 		my $E = shift;
 		if ($E->value() == 1) {