Message ID | patch-9.9-0d87c9a5a3-20210512T132955Z-avarab@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | send-email: various optimizations to speed up by >2x | expand |
On Wed, May 12, 2021 at 9:48 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > Optimize the startup time of git-send-email by using an amended > config_regexp() function to retrieve the list of config keys and > values we're interested in. > > For boolean keys we can handle the [true|false] case ourselves, and > the "--get" case didn't need any parsing. Let's leave "--path" and > other "--bool" cases to "git config". As noted in a preceding commit > we're free to change the config_regexp() function, it's only used by > "git send-email". > > This brings the runtime of "git send-email" from ~60-~70ms to a very > steady ~40ms on my test box. We no run just one "git config" s/no/now/ > invocation on startup instead of 8, the exact number will differ based > on the local sendemail.* config. I happen to have 8 of those set. > > This brings the runtime of t9001-send-email.sh from ~13s down to ~12s > for me. The change there is less impressive as many of those tests set > various config values, and we're also getting to the point of > diminishing returns for optimizing "git send-email" itself. > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
On Wed, May 12, 2021 at 03:48:25PM +0200, Ævar Arnfjörð Bjarmason wrote: > Optimize the startup time of git-send-email by using an amended > config_regexp() function to retrieve the list of config keys and > values we're interested in. > > For boolean keys we can handle the [true|false] case ourselves, and > the "--get" case didn't need any parsing. Let's leave "--path" and > other "--bool" cases to "git config". As noted in a preceding commit > we're free to change the config_regexp() function, it's only used by > "git send-email". I think both of these cases should be safe. It is a bit unfortunate to have to go through these contortions, but this is definitely the best we can do for now. I think in the long run it would be nice to have a "--stdin" mode for git-config, where we could do something like: git config --stdin <<\EOF key=foo.bar type=bool default=false key=another.key type=color default=red EOF But that doesn't exist yet, and using it would probably involve rearranging send-email a bit (we would need an up-front list of all of the keys we care about and their types). So I'm perfectly content with this strategy in the meantime. > diff --git a/perl/Git.pm b/perl/Git.pm > index f18852fb09..a9020d0d01 100644 > --- a/perl/Git.pm > +++ b/perl/Git.pm > @@ -739,18 +739,18 @@ sub config_int { > =item config_regexp ( RE ) > > Retrieve the list of configuration key names matching the regular > -expression C<RE>. The return value is a list of strings matching > -this regex. > +expression C<RE>. The return value is an ARRAY of key-value pairs. The Git.pm interface is public-facing, isn't it? Are we breaking third-party callers of config_regexp here? -Peff
Jeff King wrote: > It is a bit unfortunate to have to go through these contortions, but > this is definitely the best we can do for now. I think in the long run > it would be nice to have a "--stdin" mode for git-config, where we could > do something like: > > git config --stdin <<\EOF > key=foo.bar > type=bool > default=false > > key=another.key > type=color > default=red > EOF Why do we even have to specify the type? Shouldn't there be a registry of configurations (a schema), so that all users don't have to do this?
On Thu, May 13 2021, Felipe Contreras wrote: > Jeff King wrote: >> It is a bit unfortunate to have to go through these contortions, but >> this is definitely the best we can do for now. I think in the long run >> it would be nice to have a "--stdin" mode for git-config, where we could >> do something like: >> >> git config --stdin <<\EOF >> key=foo.bar >> type=bool >> default=false >> >> key=another.key >> type=color >> default=red >> EOF > > Why do we even have to specify the type? Shouldn't there be a registry > of configurations (a schema), so that all users don't have to do this? Yes, we should be moving towards that. Currently we're not there, the closest we've got is the generated config-list.h. If we closed that loop properly you should be able to do: git config --get --type=dwim clean.requireForce # or auto, or no --type Or whatever, instead of: git config --get --type=bool clean.requireForce But we're not there yet, there's also edge cases here and there that make a plain exhaustive registry hard, and send-email is one of them. In its case the value of sendemail.identity=something (if any) determines if/how e.g. sendemail.something.smtpEncryption is interpreted. I think we can do it with just a list of config variables where the variable is either a string or a regex (in this case, "^sendemail(?:[^.]+\.)?\.smtpEncryption$"), but I haven't tried. There may be other tricky special-cases.
On Thu, May 13, 2021 at 02:04:08AM -0500, Felipe Contreras wrote: > Jeff King wrote: > > It is a bit unfortunate to have to go through these contortions, but > > this is definitely the best we can do for now. I think in the long run > > it would be nice to have a "--stdin" mode for git-config, where we could > > do something like: > > > > git config --stdin <<\EOF > > key=foo.bar > > type=bool > > default=false > > > > key=another.key > > type=color > > default=red > > EOF > > Why do we even have to specify the type? Shouldn't there be a registry > of configurations (a schema), so that all users don't have to do this? One of the purposes of git-config is to serve third-party scripts that store their own config keys that Git does not know about. So we can't know the set of all possible types that will be asked about. Obviously we could have git-config know internally about all of the keys other parts of Git would ask about. But generally we have pushed that knowledge out to the users of the keys, rather than any kind of central registry. -Peff
Jeff King wrote: > On Thu, May 13, 2021 at 02:04:08AM -0500, Felipe Contreras wrote: > > > Jeff King wrote: > > > It is a bit unfortunate to have to go through these contortions, but > > > this is definitely the best we can do for now. I think in the long run > > > it would be nice to have a "--stdin" mode for git-config, where we could > > > do something like: > > > > > > git config --stdin <<\EOF > > > key=foo.bar > > > type=bool > > > default=false > > > > > > key=another.key > > > type=color > > > default=red > > > EOF > > > > Why do we even have to specify the type? Shouldn't there be a registry > > of configurations (a schema), so that all users don't have to do this? > > One of the purposes of git-config is to serve third-party scripts that > store their own config keys that Git does not know about. So we can't > know the set of all possible types that will be asked about. Yes, I know, I maintain several tools that have such configurations. For those you would need to specify the type (or find some way to install the schema so that git parses it). But I'm talking about git.git configurations. If you don't specify the type in --stdin it should fetch it from some database. That would be much more user-friendly.
On Thu, May 13 2021, Felipe Contreras wrote: > Jeff King wrote: >> On Thu, May 13, 2021 at 02:04:08AM -0500, Felipe Contreras wrote: >> >> > Jeff King wrote: >> > > It is a bit unfortunate to have to go through these contortions, but >> > > this is definitely the best we can do for now. I think in the long run >> > > it would be nice to have a "--stdin" mode for git-config, where we could >> > > do something like: >> > > >> > > git config --stdin <<\EOF >> > > key=foo.bar >> > > type=bool >> > > default=false >> > > >> > > key=another.key >> > > type=color >> > > default=red >> > > EOF >> > >> > Why do we even have to specify the type? Shouldn't there be a registry >> > of configurations (a schema), so that all users don't have to do this? >> >> One of the purposes of git-config is to serve third-party scripts that >> store their own config keys that Git does not know about. So we can't >> know the set of all possible types that will be asked about. > > Yes, I know, I maintain several tools that have such configurations. For > those you would need to specify the type (or find some way to install > the schema so that git parses it). > > But I'm talking about git.git configurations. If you don't specify the > type in --stdin it should fetch it from some database. That would be > much more user-friendly. For what it's worth my idea of hacking a plumbing thingy for git-send-email here before ultimately deciding that my simpler caching approach was easier and gave me 95% of the win, was to just teach it a mode where it spews out all config variables \0-delimited with all possible interpretations of it. I.e.: some.variable 123 bool true path 123 string 123 [...] It's rather cheap to do the "interpret this as --type=X for me" on the C-level, so we might as well spew out all possible interpretations. That means that any external tool would be guaranteed to only need one "git config" invocation to parse any of its config, i.e. in a case where variable X decides if variable Y is a bool or path or whatever. They'd already have all possible values. Something like: git config -l -z --type=bool,path git config -l -z --type=ALL
diff --git a/git-send-email.perl b/git-send-email.perl index 907dc1425f..35ba19470d 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -325,7 +325,10 @@ sub read_config { my $target = $config_bool_settings{$setting}; my $key = "$prefix.$setting"; next unless exists $known_keys->{$key}; - my $v = Git::config_bool(@repo, $key); + my $v = (@{$known_keys->{$key}} == 1 && + $known_keys->{$key}->[0] =~ /^(?:true|false)$/s) + ? $known_keys->{$key}->[0] eq 'true' + : Git::config_bool(@repo, $key); next unless defined $v; next if $configured->{$setting}++; $$target = $v; @@ -354,14 +357,12 @@ sub read_config { my $key = "$prefix.$setting"; next unless exists $known_keys->{$key}; if (ref($target) eq "ARRAY") { - my @values = Git::config(@repo, $key); - next unless @values; + my @values = @{$known_keys->{$key}}; next if $configured->{$setting}++; @$target = @values; } else { - my $v = Git::config(@repo, $key); - next unless defined $v; + my $v = $known_keys->{$key}->[0]; next if $configured->{$setting}++; $$target = $v; } @@ -372,8 +373,10 @@ sub read_config { # parses 'bool' etc.) by only doing so for config keys that exist. my %known_config_keys; { - my @known_config_keys = Git::config_regexp("^sende?mail[.]"); - @known_config_keys{@known_config_keys} = (); + my @kv = Git::config_regexp("^sende?mail[.]"); + while (my ($k, $v) = splice @kv, 0, 2) { + push @{$known_config_keys{$k}} => $v; + } } # sendemail.identity yields to --identity. We must parse this diff --git a/perl/Git.pm b/perl/Git.pm index f18852fb09..a9020d0d01 100644 --- a/perl/Git.pm +++ b/perl/Git.pm @@ -739,18 +739,18 @@ sub config_int { =item config_regexp ( RE ) Retrieve the list of configuration key names matching the regular -expression C<RE>. The return value is a list of strings matching -this regex. +expression C<RE>. The return value is an ARRAY of key-value pairs. =cut sub config_regexp { my ($self, $regex) = _maybe_self(@_); try { - my @cmd = ('config', '--name-only', '--get-regexp', $regex); + my @cmd = ('config', '--null', '--get-regexp', $regex); unshift @cmd, $self if $self; - my @matches = command(@cmd); - return @matches; + my $data = command(@cmd); + my (@kv) = map { split /\n/, $_, 2 } split /\0/, $data; + return @kv; } catch Git::Error::Command with { my $E = shift; if ($E->value() == 1) {
Optimize the startup time of git-send-email by using an amended config_regexp() function to retrieve the list of config keys and values we're interested in. For boolean keys we can handle the [true|false] case ourselves, and the "--get" case didn't need any parsing. Let's leave "--path" and other "--bool" cases to "git config". As noted in a preceding commit we're free to change the config_regexp() function, it's only used by "git send-email". This brings the runtime of "git send-email" from ~60-~70ms to a very steady ~40ms on my test box. We no run just one "git config" invocation on startup instead of 8, the exact number will differ based on the local sendemail.* config. I happen to have 8 of those set. This brings the runtime of t9001-send-email.sh from ~13s down to ~12s for me. The change there is less impressive as many of those tests set various config values, and we're also getting to the point of diminishing returns for optimizing "git send-email" itself. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- git-send-email.perl | 17 ++++++++++------- perl/Git.pm | 10 +++++----- 2 files changed, 15 insertions(+), 12 deletions(-)