refs: remove lookup cache for reference-transaction hook

Message ID	0db8ad8cdb69afb9d6453bf60a808e8b82382a4e.1597998473.git.ps@pks.im (mailing list archive)
State	Superseded
Headers	Date: Fri, 21 Aug 2020 10:29:18 +0200 From: Patrick Steinhardt <ps@pks.im> To: git@vger.kernel.org Cc: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>, Taylor Blau <me@ttaylorr.com> Subject: [PATCH] refs: remove lookup cache for reference-transaction hook
Series	refs: remove lookup cache for reference-transaction hook

Patrick Steinhardt Aug. 21, 2020, 8:29 a.m. UTC

When adding the reference-transaction hook, there were concerns about
the performance impact it may have on setups which do not make use of
the new hook at all. After all, git looks for the hook every time a
reference transaction is prepared, committed or aborted, so the cost
scales linearly with the number of reference transactions created per
session. And as there are code paths like `git push` which create a new
transaction for each reference to be updated, this may translate to
calling `find_hook()` quite a lot.
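For context, the hook interface itself is small. Per githooks(5), git runs `reference-transaction` with the transaction state (`prepared`, `committed` or `aborted`) as its only argument and writes one `<old-value> SP <new-value> SP <ref-name>` line per queued update to its stdin. The sketch below is a hypothetical logging hook written as a shell function (`log_ref_transaction` is an invented name; an actual hook would be an executable `.git/hooks/reference-transaction` calling it with `"$1"`). Since git has to locate that file on every state change, setups without the hook still pay for a lookup each time.

```shell
#!/bin/sh
# Hypothetical reference-transaction hook body.
# $1 is the state: "prepared", "committed" or "aborted".
# Each stdin line is "<old-value> SP <new-value> SP <ref-name>".
log_ref_transaction () {
	state=$1
	while read -r old new ref; do
		printf '%s %s\n' "$state" "$ref"
	done
}

# Placeholder object IDs; a real hook receives actual hashes from git.
zero=0000000000000000000000000000000000000000
oid=1111111111111111111111111111111111111111
printf '%s %s refs/heads/main\n' "$zero" "$oid" | log_ref_transaction prepared
```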

To address this concern, a cache was added to avoid repeated negative
hook lookups. It turns out this cache caused a regression, which was
fixed via e5256c82e5 (refs: fix interleaving hook calls with
reference-transaction hook, 2020-08-07). While discussing the fix, we
realized that the cache doesn't really help even in the negative-lookup
case. Although the performance tests added to benchmark this did show a
slight improvement in the 1% range, that really doesn't warrant having a
cache. Furthermore, the benchmark is quite flaky, too. E.g. running it
twice in succession produces the following results:

Test                         master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref           2.79(2.16+0.74)   2.73(2.12+0.71) -2.2%
1400.3: update-ref --stdin   0.22(0.08+0.14)   0.21(0.08+0.12) -4.5%

Test                         master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref           2.70(2.09+0.72)   2.74(2.13+0.71) +1.5%
1400.3: update-ref --stdin   0.21(0.10+0.10)   0.21(0.08+0.13) +0.0%
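To make the idea of a negative lookup cache concrete, here is a hypothetical shell sketch (the function names are invented; git's real lookup is the C function `find_hook()`). It only remembers that the hook was absent, so repeated lookups in one process skip the filesystem check; when the hook does exist, every call still checks for it. The script counts filesystem checks for 100 simulated transactions with and without the cache.

```shell
#!/bin/sh
# Hypothetical negative lookup cache; names are made up for illustration.
lookups=0        # number of real filesystem checks performed
hook_missing=""  # set to "yes" once a lookup has failed

find_hook_uncached () {
	lookups=$((lookups + 1))
	[ -x "$1/reference-transaction" ]
}

find_hook_cached () {
	# Negative hit: we already know the hook is absent, skip the check.
	[ "$hook_missing" = yes ] && return 1
	if find_hook_uncached "$1"; then
		return 0
	fi
	hook_missing=yes
	return 1
}

hooks_dir=$(mktemp -d)   # empty directory: no hook installed

i=0
while [ "$i" -lt 100 ]; do find_hook_uncached "$hooks_dir" || :; i=$((i + 1)); done
uncached=$lookups

lookups=0
i=0
while [ "$i" -lt 100 ]; do find_hook_cached "$hooks_dir" || :; i=$((i + 1)); done
cached=$lookups

rmdir "$hooks_dir"
echo "uncached=$uncached cached=$cached"   # uncached=100 cached=1
```

All the cache saves is one cheap executable-bit check per transaction, which is consistent with the benchmarks above finding no reliable difference.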

One case notably absent from those benchmarks is a single process
searching for the hook hundreds of times, which is exactly the case for
which the negative cache was added: p1400.2 spawns a new update-ref for
each transaction, and p1400.3 uses a single reference transaction for
all reference updates. So this commit adds a third benchmark, which
performs a non-atomic push of a thousand references. This creates a new
reference transaction per reference. But even in this case, the
negative cache doesn't consistently improve performance:

Test                         master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.4: nonatomic push       6.63(6.50+0.13)   6.81(6.67+0.14) +2.7%
1400.4: nonatomic push       6.35(6.21+0.14)   6.39(6.23+0.16) +0.6%
1400.4: nonatomic push       6.43(6.31+0.13)   6.42(6.28+0.15) -0.2%

So let's just remove the cache altogether to simplify the code.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c                     | 11 ++---------
 t/perf/p1400-update-ref.sh |  9 ++++++++-
 2 files changed, 10 insertions(+), 10 deletions(-)

Jeff King Aug. 21, 2020, 2:37 p.m. UTC | #1

On Fri, Aug 21, 2020 at 10:29:18AM +0200, Patrick Steinhardt wrote:

> One case notably absent from those benchmarks is a single executable
> searching for the hook hundreds of times, which is exactly the case for
> which the negative cache was added. p1400.2 will spawn a new update-ref
> for each transaction and p1400.3 only has a single reference-transaction
> for all reference updates. So this commit adds a third benchmark, which
> performs an non-atomic push of a thousand references. This will create a
> new reference transaction per reference. But even for this case, the
> negative cache doesn't consistently improve performance:

Ah, right, I forgot that update-ref would use one single transaction. So
what we were testing in our earlier discussion was not even useful. :)

>  test_expect_success "setup" '
> +	git init --bare target-repo.git &&
>  	test_commit PRE &&
>  	test_commit POST &&
>  	printf "create refs/heads/%d PRE\n" $(test_seq 1000) >create &&
>  	printf "update refs/heads/%d POST PRE\n" $(test_seq 1000) >update &&
> -	printf "delete refs/heads/%d POST\n" $(test_seq 1000) >delete
> +	printf "delete refs/heads/%d POST\n" $(test_seq 1000) >delete &&
> +	printf "create refs/heads/branch-%d PRE\n" $(test_seq 1000) | git update-ref --stdin
>  '

OK, we need these new branches to have something to push into and delete
from the remote. They might impact the timings of the other tests,
though (since we now have 1000 entries in .git/refs/heads/, which might
affect filesystem performance). But it should do so uniformly, so I
don't think it invalidates their results.

However, I wondered...

> +test_perf "nonatomic push" '
> +	git push ./target-repo.git branch-{1..1000} &&
> +	git push --delete ./target-repo.git branch-{1..1000}
> +'

...if it might make the test more consistent (not to mention isolated
from the cost of other parts of the push) if we used update-ref here, as
well. You added the code necessary to control individual transactions,
so I thought that:

  printf 'start\ncreate refs/heads/%d PRE\ncommit\n' \
    $(test_seq 1000) >create-transaction

might work. But it doesn't, because after the first transaction is
closed, we refuse to accept any other commands. That makes sense for
"prepare", etc, but there's no reason we couldn't start a new one.

Is that worth supporting? It would allow a caller to use a single
update-ref to make a series of non-atomic updates, which is something
that can't currently be done. And we're so close.

Even if it is, though, that's definitely outside the scope of this
patch, and I think we should take it as-is with "push".

-Peff
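Peff's one-liner relies on a POSIX printf detail: the format string is reused until all arguments are consumed, so each number yields one `start`/`create`/`commit` block. The sketch below uses three literal numbers in place of `$(test_seq 1000)` to show the resulting stream. Whether `git update-ref --stdin` accepts a second `start` after the first `commit` depends on the git version; as noted above, at the time of this thread it did not.

```shell
#!/bin/sh
# printf reuses its format for each remaining argument, so the numbers
# 1 2 3 produce three start/create/commit blocks (nine lines in total).
transactions=$(printf 'start\ncreate refs/heads/%d PRE\ncommit\n' 1 2 3)
printf '%s\n' "$transactions"
```

Fed to `git update-ref --stdin`, such a stream would request one transaction per reference from a single process, which is the repeated hook lookup the benchmark wants to measure, without the rest of the push machinery.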

Junio C Hamano Aug. 21, 2020, 4:42 p.m. UTC | #2

Jeff King <peff@peff.net> writes:

> However, I wondered...
>
>> +test_perf "nonatomic push" '
>> +	git push ./target-repo.git branch-{1..1000} &&
>> +	git push --delete ./target-repo.git branch-{1..1000}
>> +'

Is this a bash-and-ksh-only test?  At least, the above would not try
to push 1000 branches with the version of dash I have.
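The underlying issue: `{1..1000}` is brace expansion, a bash/ksh/zsh feature that POSIX sh lacks, so dash hands `branch-{1..1000}` to git as one literal word. The setup step quoted above already sidesteps this by pairing printf with `test_seq`. A test-lib-free sketch of the same idea, with `branch_names` as a hypothetical helper:

```shell
#!/bin/sh
# POSIX-only replacement for "branch-{1..N}": a plain counter loop.
branch_names () {
	i=1
	while [ "$i" -le "$1" ]; do
		printf 'branch-%d\n' "$i"
		i=$((i + 1))
	done
}

branch_names 3
```

Inside a test, `git push ./target-repo.git $(branch_names 1000)` would then rely on ordinary word splitting, which every POSIX shell performs.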

Jeff King Aug. 21, 2020, 5:21 p.m. UTC | #3

On Fri, Aug 21, 2020 at 09:42:45AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > However, I wondered...
> >
> >> +test_perf "nonatomic push" '
> >> +	git push ./target-repo.git branch-{1..1000} &&
> >> +	git push --delete ./target-repo.git branch-{1..1000}
> >> +'
> 
> Is this a bash-and-ksh-only test?  At least, the above would not try
> to push 1000 branches with the version of dash I have.

Heh, I was so focused on the "push" part of it that I didn't even look
carefully at the second half of the command-line. ;)

I think pushing "refs/heads/branch-*" would work for pushing. For
deletion, though, I don't think we allow wildcards in the refspecs.
You could abuse pruning:

  git push --prune ../dst.git refs/heads/does-not-exist/*:refs/heads/*

It also may be OK to just omit that half of the test. I think the
initial push exercises the case we care about. Though I guess we do run
the test repeatedly, so we might have to do:

  rm -rf dst.git &&
  git init dst.git &&
  git push dst.git refs/heads/branch-*

-Peff
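Following the suggestions above, a POSIX-sh body for the "nonatomic push" benchmark might push with a glob refspec and delete with `--prune`, so it can run repeatedly without recreating the target. This is a standalone hedged sketch, not the eventual p1400 test: the temporary repositories, `N=100` and the committer identity are assumptions made so it runs on its own. The refspecs are quoted so the shell never tries to glob them against the filesystem.

```shell
#!/bin/sh
# Hedged sketch of a portable push/delete cycle; not the actual p1400 test.
set -e

N=100   # the benchmark uses 1000; smaller here so the sketch runs quickly
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q "$tmp/src"
git init -q --bare "$tmp/target.git"
cd "$tmp/src"
git -c user.name=Sketch -c user.email=sketch@example.com \
	commit -q --allow-empty -m PRE
oid=$(git rev-parse HEAD)

# Create branch-1 .. branch-N in one update-ref session (no brace expansion).
i=1
while [ "$i" -le "$N" ]; do
	printf 'create refs/heads/branch-%d %s\n' "$i" "$oid"
	i=$((i + 1))
done | git update-ref --stdin

# Push every branch-* ref with a single glob refspec.
git push -q ../target.git 'refs/heads/branch-*:refs/heads/branch-*'
pushed=$(git -C ../target.git for-each-ref refs/heads | wc -l)

# Delete them again: no local ref matches the source side of the refspec,
# so --prune deletes every remote ref matched by its destination side.
git push -q --prune ../target.git 'refs/heads/does-not-exist/*:refs/heads/*'
left=$(git -C ../target.git for-each-ref refs/heads | wc -l)

echo "pushed=$pushed left=$left"
```

The prune trick keeps the cycle self-cleaning, which avoids the `rm -rf && git init` step Patrick prefers not to include in the timed section.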

Patrick Steinhardt Aug. 22, 2020, 8:59 a.m. UTC | #4

On Fri, Aug 21, 2020 at 10:37:27AM -0400, Jeff King wrote:
> On Fri, Aug 21, 2020 at 10:29:18AM +0200, Patrick Steinhardt wrote:
> 
> > One case notably absent from those benchmarks is a single executable
> > searching for the hook hundreds of times, which is exactly the case for
> > which the negative cache was added. p1400.2 will spawn a new update-ref
> > for each transaction and p1400.3 only has a single reference-transaction
> > for all reference updates. So this commit adds a third benchmark, which
> > performs an non-atomic push of a thousand references. This will create a
> > new reference transaction per reference. But even for this case, the
> > negative cache doesn't consistently improve performance:
> 
> Ah, right, I forgot that update-ref would use one single transaction. So
> what we were testing in our earlier discussion was not even useful. :)
> 
> >  test_expect_success "setup" '
> > +	git init --bare target-repo.git &&
> >  	test_commit PRE &&
> >  	test_commit POST &&
> >  	printf "create refs/heads/%d PRE\n" $(test_seq 1000) >create &&
> >  	printf "update refs/heads/%d POST PRE\n" $(test_seq 1000) >update &&
> > -	printf "delete refs/heads/%d POST\n" $(test_seq 1000) >delete
> > +	printf "delete refs/heads/%d POST\n" $(test_seq 1000) >delete &&
> > +	printf "create refs/heads/branch-%d PRE\n" $(test_seq 1000) | git update-ref --stdin
> >  '
> 
> OK, we need these new branches to have something to push into and delete
> from the remote. They might impact the timings of the other tests,
> though (since we now have 1000 entries in .git/refs/heads/, which might
> affect filesystem performance). But it should do so uniformly, so I
> don't think it invalidates their results.
> 
> However, I wondered...
> 
> > +test_perf "nonatomic push" '
> > +	git push ./target-repo.git branch-{1..1000} &&
> > +	git push --delete ./target-repo.git branch-{1..1000}
> > +'
> 
> ...if it might make the test more consistent (not to mention isolated
> from the cost of other parts of the push) if we used update-ref here, as
> well. You added the code necessary to control individual transactions,
> so I thought that:
> 
>   printf 'start\ncreate refs/heads/%d PRE\ncommit\n' \
>     $(test_seq 1000) >create-transaction
> 
> might work. But it doesn't, because after the first transaction is
> closed, we refuse to accept any other commands. That makes sense for
> "prepare", etc, but there's no reason we couldn't start a new one.
> 
> Is that worth supporting? It would allow a caller to use a single
> update-ref to make a series of non-atomic updates, which is something
> that can't currently be done. And we're so close.

Yeah, I had the exact same thought and I do think it's useful to be able
to create multiple reference transactions per git-update-ref(1) session.
I might whip something up as soon as I find the time to do so, it really
shouldn't be a lot of work.

Patrick

> Even if it is, though, that's definitely outside the scope of this
> patch, and I think we should take it as-is with "push".
> 
> -Peff

Patrick Steinhardt Aug. 22, 2020, 9:02 a.m. UTC | #5

On Fri, Aug 21, 2020 at 01:21:37PM -0400, Jeff King wrote:
> On Fri, Aug 21, 2020 at 09:42:45AM -0700, Junio C Hamano wrote:
> 
> > Jeff King <peff@peff.net> writes:
> > 
> > > However, I wondered...
> > >
> > >> +test_perf "nonatomic push" '
> > >> +	git push ./target-repo.git branch-{1..1000} &&
> > >> +	git push --delete ./target-repo.git branch-{1..1000}
> > >> +'
> > 
> > Is this a bash-and-ksh-only test?  At least, the above would not try
> > to push 1000 branches with the version of dash I have.

I didn't realize it's shell-specific behaviour, thanks for highlighting.

> Heh, I was so focused on the "push" part of it that I didn't even look
> carefully at the second half of the command-line. ;)
> 
> I think pushing "refs/heads/branch-*" would work for pushing. For
> deletion, though, I don't think we allow wildcards in the refspecs.
> You could abuse pruning:
> 
>   git push --prune ../dst.git refs/heads/does-not-exist/*:refs/heads/*
> 
> It also may be OK to just omit that half of the test. I think the
> initial push exercises the case we care about. Though I guess we do run
> the test repeatedly, so we might have to do:
> 
>   rm -rf dst.git &&
>   git init dst.git &&
>   git push dst.git refs/heads/branch-*

I'm not too keen to use `rm -rf && git init` as it muddies the subject
under test a bit. I'll try to come up with a non-shell-specific version
of this on Monday.

Patrick
