[v2] refs: remove lookup cache for reference-transaction hook

Message ID c1cae6dd19ffe00e4456e4f96ad92277ceeced27.1598349284.git.ps@pks.im (mailing list archive)
State Accepted
Commit 6ddd76fd6c356c037b5d5272732900f1f952721e

Commit Message

Patrick Steinhardt Aug. 25, 2020, 10:35 a.m. UTC
When adding the reference-transaction hook, there were concerns about
the performance impact it may have on setups which do not make use of
the new hook at all. After all, it gets executed every time a reftx is
prepared, committed or aborted, which linearly scales with the number of
reference-transactions created per session. And as there are code paths
like `git push` which create a new transaction for each reference to be
updated, this may translate to calling `find_hook()` quite a lot.

To address this concern, a cache was added with the intention to not
repeatedly do negative hook lookups. Turns out this cache caused a
regression, which was fixed via e5256c82e5 (refs: fix interleaving hook
calls with reference-transaction hook, 2020-08-07). In the process of
discussing the fix, we realized that the cache doesn't really help even
in the negative-lookup case. While performance tests added to benchmark
this did show a slight improvement in the 1% range, this really doesn't
warrant having a cache. Furthermore, it's quite flaky, too. E.g. running
it twice in succession produces the following results:

Test                         master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref           2.79(2.16+0.74)   2.73(2.12+0.71) -2.2%
1400.3: update-ref --stdin   0.22(0.08+0.14)   0.21(0.08+0.12) -4.5%

Test                         master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref           2.70(2.09+0.72)   2.74(2.13+0.71) +1.5%
1400.3: update-ref --stdin   0.21(0.10+0.10)   0.21(0.08+0.13) +0.0%

One case notably absent from those benchmarks is a single executable
searching for the hook hundreds of times, which is exactly the case for
which the negative cache was added. p1400.2 will spawn a new update-ref
for each transaction and p1400.3 only has a single reference-transaction
for all reference updates. So this commit adds a third benchmark, which
performs a non-atomic push of a thousand references. This will create a
new reference transaction per reference. But even for this case, the
negative cache doesn't consistently improve performance:

Test                         master            pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.4: nonatomic push       6.63(6.50+0.13)   6.81(6.67+0.14) +2.7%
1400.4: nonatomic push       6.35(6.21+0.14)   6.39(6.23+0.16) +0.6%
1400.4: nonatomic push       6.43(6.31+0.13)   6.42(6.28+0.15) -0.2%

So let's just remove the cache altogether to simplify the code.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---

The only change compared to v1 is that I've addressed the unportable
`branch-{1..1000}` syntax in favor of `test_seq`. I had to setup refs as
part of the setup and change the ordering for "update-ref --stdin" from
create/update/delete to update/delete/create, but I don't think that's
too bad. At least timings didn't seem to really change because of that.

 refs.c                     | 11 ++---------
 t/perf/p1400-update-ref.sh | 13 ++++++++++---
 2 files changed, 12 insertions(+), 12 deletions(-)

Comments

Jeff King Aug. 25, 2020, 3:10 p.m. UTC | #1
On Tue, Aug 25, 2020 at 12:35:24PM +0200, Patrick Steinhardt wrote:

> The only change compared to v1 is that I've addressed the unportable
> `branch-{1..1000}` syntax in favor of `test_seq`. I had to setup refs as
> part of the setup and change the ordering for "update-ref --stdin" from
> create/update/delete to update/delete/create, but I don't think that's
> too bad. At least timings didn't seem to really change because of that.

Another option instead of changing the order in the other tests is to do
another untimed setup step before the push test. I'm OK either way,
though.

> +test_perf "nonatomic push" '
> +	git push ./target-repo.git $(test_seq 1000) &&
> +	git push --delete ./target-repo.git $(test_seq 1000)
>  '

This works as far as Git is concerned, but "seq 1000" output with NULs
is 3893 bytes. I wonder if some platforms might run into command-line
limits there. I guess we will see when Windows CI runs. :)

-Peff
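
The 3893-byte figure quoted above can be reproduced with a small sketch. It assumes only a POSIX shell; the function name `seq_arg_bytes` is hypothetical and not part of Git's test suite. It counts the digits of each number from 1 to N plus one terminating NUL per argument, and then prints the local `ARG_MAX` for comparison where `getconf` reports it.

```shell
#!/bin/sh
# Count the bytes needed to pass the integers 1..N as separate
# command-line arguments: the digits of each number plus one
# terminating NUL byte per argument.
# NOTE: seq_arg_bytes is a hypothetical helper for illustration only.
seq_arg_bytes () {
	n=$1
	i=1
	total=0
	while [ "$i" -le "$n" ]
	do
		# ${#i} is the number of digits in $i; +1 for the NUL.
		total=$((total + ${#i} + 1))
		i=$((i + 1))
	done
	echo "$total"
}

echo "bytes for 1..1000: $(seq_arg_bytes 1000)"
# Local limit for argv plus environment, where getconf supports it.
echo "ARG_MAX here: $(getconf ARG_MAX 2>/dev/null || echo unknown)"
```

This only measures the arguments produced by the sequence. The rest of the command line and the environment also count against the platform's limit.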

Junio C Hamano Aug. 25, 2020, 6:09 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> On Tue, Aug 25, 2020 at 12:35:24PM +0200, Patrick Steinhardt wrote:
>
>> The only change compared to v1 is that I've addressed the unportable
>> `branch-{1..1000}` syntax in favor of `test_seq`. I had to setup refs as
>> part of the setup and change the ordering for "update-ref --stdin" from
>> create/update/delete to update/delete/create, but I don't think that's
>> too bad. At least timings didn't seem to really change because of that.
>
> Another option instead of changing the order in the other tests is to do
> another untimed setup step before the push test. I'm OK either way,
> though.
>
>> +test_perf "nonatomic push" '
>> +	git push ./target-repo.git $(test_seq 1000) &&
>> +	git push --delete ./target-repo.git $(test_seq 1000)
>>  '
>
> This works as far as Git is concerned, but "seq 1000" output with NULs
> is 3893 bytes. I wonder if some platforms might run into command-line
> limits there.

That was my thought when I saw the above as well.  In addition, I do
not think it is a good idea to encourage digit-only refnames.

Jeff King Aug. 25, 2020, 6:29 p.m. UTC | #3
On Tue, Aug 25, 2020 at 11:09:54AM -0700, Junio C Hamano wrote:

> >> +test_perf "nonatomic push" '
> >> +	git push ./target-repo.git $(test_seq 1000) &&
> >> +	git push --delete ./target-repo.git $(test_seq 1000)
> >>  '
> >
> > This works as far as Git is concerned, but "seq 1000" output with NULs
> > is 3893 bytes. I wonder if some platforms might run into command-line
> > limits there.
> 
> That was my thought when I saw the above as well.  In addition, I do
> not think it is a good idea to encourage digit-only refnames.

Good point. It gets hairy at four digits:

  $ git show 1000
  error: short SHA1 1000 is ambiguous
  hint: The candidates are:
  hint:   10000434d2 tree
  hint:   10007bcb9e tree
  hint:   10008a0e22 tree
  hint:   1000bdf512 tree
  hint:   1000dc2368 blob
  fatal: ambiguous argument '1000': unknown revision or path not in the working tree.

So I think if the test works it may be relying on the exact object ids
that happen to be generated (which fortunately are at least
deterministic these days, but it may be a trap waiting to spring for
somebody later).

-Peff
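
The concern about digit-only refnames can be illustrated with a rough sketch. It assumes that a name made up solely of hex digits, and at least four of them, is a candidate for being read as an abbreviated object name, which matches the "four digits" remark above. The helper names and the `branch-` prefix (borrowed from the v1 syntax mentioned earlier in the thread) are hypothetical. The sketch checks only the shape of a name; whether Git actually reports an ambiguity depends on the refs and objects present in the repository.

```shell
#!/bin/sh
# Return success if $1 could be read as an abbreviated object name:
# nothing but hex digits, and at least four of them.
# ASSUMPTION: four hex digits is the shortest abbreviation considered.
looks_like_abbrev_oid () {
	case "$1" in
	*[!0-9a-fA-F]*)
		return 1 ;;	# contains a non-hex character
	esac
	[ "${#1}" -ge 4 ]
}

# Count how many of the names <prefix>1 .. <prefix>N have that shape.
# NOTE: count_risky_names is a hypothetical helper for illustration only.
count_risky_names () {
	prefix=$1
	n=$2
	i=1
	risky=0
	while [ "$i" -le "$n" ]
	do
		if looks_like_abbrev_oid "$prefix$i"
		then
			risky=$((risky + 1))
		fi
		i=$((i + 1))
	done
	echo "$risky"
}

echo "digit-only names at risk: $(count_risky_names "" 1000)"
echo "branch-N names at risk:   $(count_risky_names "branch-" 1000)"
```

Among the names 1 to 1000 only `1000` reaches four characters, so it is the single digit-only name with that shape. A non-hex prefix avoids the problem for every name.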

Patch

diff --git a/refs.c b/refs.c
index cf91711968..cb9bfc5c5c 100644
--- a/refs.c
+++ b/refs.c
@@ -1924,24 +1924,17 @@  int ref_update_reject_duplicates(struct string_list *refnames,
 	return 0;
 }
 
-static const char hook_not_found;
-static const char *hook;
-
 static int run_transaction_hook(struct ref_transaction *transaction,
 				const char *state)
 {
 	struct child_process proc = CHILD_PROCESS_INIT;
 	struct strbuf buf = STRBUF_INIT;
+	const char *hook;
 	int ret = 0, i;
 
-	if (hook == &hook_not_found)
-		return ret;
+	hook = find_hook("reference-transaction");
 	if (!hook)
-		hook = xstrdup_or_null(find_hook("reference-transaction"));
-	if (!hook) {
-		hook = &hook_not_found;
 		return ret;
-	}
 
 	strvec_pushl(&proc.args, hook, state, NULL);
 	proc.in = -1;
diff --git a/t/perf/p1400-update-ref.sh b/t/perf/p1400-update-ref.sh
index d275a81248..ce5ac3ed85 100755
--- a/t/perf/p1400-update-ref.sh
+++ b/t/perf/p1400-update-ref.sh
@@ -7,11 +7,13 @@  test_description="Tests performance of update-ref"
 test_perf_fresh_repo
 
 test_expect_success "setup" '
+	git init --bare target-repo.git &&
 	test_commit PRE &&
 	test_commit POST &&
 	printf "create refs/heads/%d PRE\n" $(test_seq 1000) >create &&
 	printf "update refs/heads/%d POST PRE\n" $(test_seq 1000) >update &&
-	printf "delete refs/heads/%d POST\n" $(test_seq 1000) >delete
+	printf "delete refs/heads/%d POST\n" $(test_seq 1000) >delete &&
+	git update-ref --stdin <create
 '
 
 test_perf "update-ref" '
@@ -24,9 +26,14 @@  test_perf "update-ref" '
 '
 
 test_perf "update-ref --stdin" '
-	git update-ref --stdin <create &&
 	git update-ref --stdin <update &&
-	git update-ref --stdin <delete
+	git update-ref --stdin <delete &&
+	git update-ref --stdin <create
+'
+
+test_perf "nonatomic push" '
+	git push ./target-repo.git $(test_seq 1000) &&
+	git push --delete ./target-repo.git $(test_seq 1000)
 '
 
 test_done