diff mbox series

[v4] refs: implement reference transaction hook

Message ID 55c58e9b09691dc11dea911f26424594fb3922c9.1592549570.git.ps@pks.im (mailing list archive)
State New, archived
Headers show
Series [v4] refs: implement reference transaction hook | expand

Commit Message

Patrick Steinhardt June 19, 2020, 6:56 a.m. UTC
The low-level reference transactions used to update references are
currently completely opaque to the user. While certainly desirable in
most usecases, there are some which might want to hook into the
transaction to observe all queued reference updates as well as observing
the abortion or commit of a prepared transaction.

One such usecase would be to have a set of replicas of a given Git
repository, where we perform Git operations on all of the repositories
at once and expect the outcome to be the same in all of them. While
there exist hooks already for a certain subset of Git commands that
could be used to implement a voting mechanism for this, many others
currently don't have any mechanism for this.

The above scenario is the motivation for the new "reference-transaction"
hook that reaches directly into Git's reference transaction mechanism.
The hook receives as parameter the current state the transaction was
moved to ("prepared", "committed" or "aborted") and gets via its
standard input all queued reference updates. While the exit code gets
ignored in the "committed" and "aborted" states, a non-zero exit code in
the "prepared" state will cause the transaction to be aborted
prematurely.

Given the usecase described above, a voting mechanism can now be
implemented via this hook: as soon as it gets called, it will take all
of stdin and use it to cast a vote to a central service. When all
replicas of the repository agree, the hook will exit with zero,
otherwise it will abort the transaction by returning non-zero. The most
important upside is that this will catch _all_ commands writing
references at once, allowing to implement strong consistency for
reference updates via a single mechanism.

In order to test the impact on the case where we don't have any
"reference-transaction" hook installed in the repository, this commit
introduce two new performance tests for git-update-refs(1). Run against
an empty repository, it produces the following results:

  Test                         origin/master     HEAD
  --------------------------------------------------------------------
  1400.2: update-ref           2.70(2.10+0.71)   2.71(2.10+0.73) +0.4%
  1400.3: update-ref --stdin   0.21(0.09+0.11)   0.21(0.07+0.14) +0.0%

The performance test p1400.2 creates, updates and deletes a branch a
thousand times, thus averaging runtime of git-update-refs over 3000
invocations. p1400.3 instead calls `git-update-refs --stdin` three times
and queues a thousand creations, updates and deletes respectively.

As expected, p1400.3 consistently shows no noticeable impact, as for
each batch of updates there's a single call to access(3P) for the
negative hook lookup. On the other hand, for p1400.2, one can see an
impact caused by this patchset. But doing five runs of the performance
tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead
ranged from -1.5% to +1.1%. These inconsistent performance numbers can
be explained by the overhead of spawning 3000 processes. This shows that
the overhead of assembling the hook path and executing access(3P) once
to check if it's there is mostly outweighed by the operating system's
overhead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---

The only change was simplified error handling as proposed by Junio.
Thanks for your feedback, the result definitely looks easier to read to
me.

Range-diff against v3:
1:  1de96b96e3 ! 1:  55c58e9b09 refs: implement reference transaction hook
    @@ refs.c: int ref_update_reject_duplicates(struct string_list *refnames,
     +{
     +	struct child_process proc = CHILD_PROCESS_INIT;
     +	struct strbuf buf = STRBUF_INIT;
    -+	int saved_errno = 0, ret, i;
    ++	int ret = 0, i;
     +
     +	if (hook == &hook_not_found)
    -+		return 0;
    ++		return ret;
     +	if (!hook)
     +		hook = find_hook("reference-transaction");
     +	if (!hook) {
     +		hook = &hook_not_found;
    -+		return 0;
    ++		return ret;
     +	}
     +
     +	argv_array_pushl(&proc.args, hook, state, NULL);
    @@ refs.c: int ref_update_reject_duplicates(struct string_list *refnames,
     +
     +		if (write_in_full(proc.in, buf.buf, buf.len) < 0) {
     +			if (errno != EPIPE)
    -+				saved_errno = errno;
    ++				ret = -1;
     +			break;
     +		}
     +	}
    @@ refs.c: int ref_update_reject_duplicates(struct string_list *refnames,
     +	sigchain_pop(SIGPIPE);
     +	strbuf_release(&buf);
     +
    -+	ret = finish_command(&proc);
    -+	if (ret)
    -+		return ret;
    -+
    -+	return saved_errno;
    ++	ret |= finish_command(&proc);
    ++	return ret;
     +}
     +
      int ref_transaction_prepare(struct ref_transaction *transaction,

 Documentation/githooks.txt       |  29 ++++++++
 refs.c                           |  76 ++++++++++++++++++++-
 t/perf/p1400-update-ref.sh       |  32 +++++++++
 t/t1416-ref-transaction-hooks.sh | 109 +++++++++++++++++++++++++++++++
 4 files changed, 244 insertions(+), 2 deletions(-)
 create mode 100755 t/perf/p1400-update-ref.sh
 create mode 100755 t/t1416-ref-transaction-hooks.sh

Comments

Jeff King Oct. 23, 2020, 1:03 a.m. UTC | #1
On Fri, Jun 19, 2020 at 08:56:14AM +0200, Patrick Steinhardt wrote:

> The above scenario is the motivation for the new "reference-transaction"
> hook that reaches directly into Git's reference transaction mechanism.
> The hook receives as parameter the current state the transaction was
> moved to ("prepared", "committed" or "aborted") and gets via its
> standard input all queued reference updates. While the exit code gets
> ignored in the "committed" and "aborted" states, a non-zero exit code in
> the "prepared" state will cause the transaction to be aborted
> prematurely.

I know this has been merged for a while, but Taylor and I were looking
at it today and came across a question. The docs say:

> +For each reference update that was added to the transaction, the hook
> +receives on standard input a line of the format:
> +
> +  <old-value> SP <new-value> SP <ref-name> LF

but that doesn't define <old-value>. I take it to mean the current value
of the ref before the proposed update. But in the tests:

> diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
> new file mode 100755
> index 0000000000..da58d867a5
> --- /dev/null
> +++ b/t/t1416-ref-transaction-hooks.sh
> [...]
> +test_expect_success 'hook gets all queued updates in prepared state' '
> +	test_when_finished "rm .git/hooks/reference-transaction actual" &&
> +	git reset --hard PRE &&
> +	write_script .git/hooks/reference-transaction <<-\EOF &&
> +		if test "$1" = prepared
> +		then
> +			while read -r line
> +			do
> +				printf "%s\n" "$line"
> +			done >actual
> +		fi
> +	EOF
> +	cat >expect <<-EOF &&
> +		$ZERO_OID $POST_OID HEAD
> +		$ZERO_OID $POST_OID refs/heads/master
> +	EOF

We are expecting to see $ZERO_OID in that slot, even though the current
value of the ref is "PRE" due to our reset above.

Should this be $PRE_OID (we don't have that variable, but it would be
the result of "git rev-parse PRE")?

I could alternatively see an argument that <old-value> is the old-value
that the caller asked for. So seeing $ZERO_OID is saying that the caller
wants to move from _anything_ to $POST_OID, and we're not willing to
tell the hook what the current value actually is.

We could actually fill in the current value for zero cost. The reason we
found this is that we have a custom patch at GitHub that fills in these
values when we open the ref after locking.

In real usage, I'm not sure how much the distinction would matter,
because any careful caller would provide a non-zero "old" value. And if
that doesn't match the current value, we'd reject the transaction before
we even hit the hook, I think. It's only the fact that the update-ref
calls are sloppy and do not provide an expected old value that it even
matters.

So I wonder if:

diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
index f6e741c6c0..8155522a1a 100755
--- a/t/t1416-ref-transaction-hooks.sh
+++ b/t/t1416-ref-transaction-hooks.sh
@@ -9,6 +9,7 @@ test_expect_success setup '
 	test_commit PRE &&
 	PRE_OID=$(git rev-parse PRE) &&
 	test_commit POST &&
+	PRE_OID=$(git rev-parse PRE) &&
 	POST_OID=$(git rev-parse POST)
 '
 
@@ -52,10 +53,10 @@ test_expect_success 'hook gets all queued updates in prepared state' '
 		fi
 	EOF
 	cat >expect <<-EOF &&
-		$ZERO_OID $POST_OID HEAD
-		$ZERO_OID $POST_OID refs/heads/master
+		$PRE_OID $POST_OID HEAD
+		$PRE_OID $POST_OID refs/heads/master
 	EOF
-	git update-ref HEAD POST <<-EOF &&
+	git update-ref HEAD POST POST <<-EOF &&
 		update HEAD $ZERO_OID $POST_OID
 		update refs/heads/master $ZERO_OID $POST_OID
 	EOF
@@ -75,10 +76,10 @@ test_expect_success 'hook gets all queued updates in committed state' '
 		fi
 	EOF
 	cat >expect <<-EOF &&
-		$ZERO_OID $POST_OID HEAD
-		$ZERO_OID $POST_OID refs/heads/master
+		$PRE_OID $POST_OID HEAD
+		$PRE_OID $POST_OID refs/heads/master
 	EOF
-	git update-ref HEAD POST &&
+	git update-ref HEAD POST PRE &&
 	test_cmp expect actual
 '
 

would be a step forward. This isn't changing the actual behavior,
obviously. It's just tweaking the test so that it tests the more likely
real-world case.  But we'd possibly want to actually change the behavior
to always send the actual ref value to the hook.

-Peff
Junio C Hamano Oct. 23, 2020, 3:59 a.m. UTC | #2
Jeff King <peff@peff.net> writes:

> So I wonder if:
>
> diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
> index f6e741c6c0..8155522a1a 100755
> --- a/t/t1416-ref-transaction-hooks.sh
> +++ b/t/t1416-ref-transaction-hooks.sh
> @@ -9,6 +9,7 @@ test_expect_success setup '
>  	test_commit PRE &&
>  	PRE_OID=$(git rev-parse PRE) &&
>  	test_commit POST &&
> +	PRE_OID=$(git rev-parse PRE) &&
>  	POST_OID=$(git rev-parse POST)
>  '
>  
> @@ -52,10 +53,10 @@ test_expect_success 'hook gets all queued updates in prepared state' '
>  		fi
>  	EOF
>  	cat >expect <<-EOF &&
> -		$ZERO_OID $POST_OID HEAD
> -		$ZERO_OID $POST_OID refs/heads/master
> +		$PRE_OID $POST_OID HEAD
> +		$PRE_OID $POST_OID refs/heads/master
>  	EOF
> -	git update-ref HEAD POST <<-EOF &&
> +	git update-ref HEAD POST POST <<-EOF &&
>  		update HEAD $ZERO_OID $POST_OID
>  		update refs/heads/master $ZERO_OID $POST_OID
>  	EOF
> @@ -75,10 +76,10 @@ test_expect_success 'hook gets all queued updates in committed state' '
>  		fi
>  	EOF
>  	cat >expect <<-EOF &&
> -		$ZERO_OID $POST_OID HEAD
> -		$ZERO_OID $POST_OID refs/heads/master
> +		$PRE_OID $POST_OID HEAD
> +		$PRE_OID $POST_OID refs/heads/master
>  	EOF
> -	git update-ref HEAD POST &&
> +	git update-ref HEAD POST PRE &&
>  	test_cmp expect actual
>  '

I think that makes a lot of sense.  In addition, 

> ...  But we'd possibly want to actually change the behavior
> to always send the actual ref value to the hook.

I offhand do not see a reason why we shouldn't do that.

Thanks for carefully thinking things through.
Taylor Blau Oct. 23, 2020, 7:57 p.m. UTC | #3
On Thu, Oct 22, 2020 at 08:59:23PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> [...]
> I think that makes a lot of sense.  In addition,
>
> > ...  But we'd possibly want to actually change the behavior
> > to always send the actual ref value to the hook.
>
> I offhand do not see a reason why we shouldn't do that.

I can't see a reason either, but teaching the new ref transaction hook
about these updates gets a little tricky. In particular, we update the
old_oid for ref updates that were split off with something like:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 04e85e7002..9c105a376b 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2484,9 +2484,20 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 		     parent_update = parent_update->parent_update) {
 			struct ref_lock *parent_lock = parent_update->backend_data;
 			oidcpy(&parent_lock->old_oid, &lock->old_oid);
+			/*
+			 * Now the parent knows its old OID too;
+			 * record that fact for logging.
+			 */
+			parent_update->flags |= REF_HAVE_OLD;
 		}
 	}

+	/* Now we know the old value. Record that fact for logging. */
+	if (!(update->flags & REF_HAVE_OLD)) {
+		oidcpy(&update->old_oid, &lock->old_oid);
+		update->flags |= REF_HAVE_OLD;
+	}
+
 	if ((update->flags & REF_HAVE_NEW) &&
 	    !(update->flags & REF_DELETING) &&
 	    !(update->flags & REF_LOG_ONLY)) {

...but by that time we have already recorded via an oidcpy the old and
new state of HEAD. So, Peff's patch passes in isolation, but applying
this on top produces failures in t1416 like:

--- expect	2020-10-23 19:56:15.649101024 +0000
+++ actual	2020-10-23 19:56:15.657100941 +0000
@@ -1,2 +1,2 @@
-63ac8e7bcdb882293465435909f54a96de17d4f7 99d53161c3a0a903b6561b9f6c0c665b3a476401 HEAD
+0000000000000000000000000000000000000000 99d53161c3a0a903b6561b9f6c0c665b3a476401 HEAD
 63ac8e7bcdb882293465435909f54a96de17d4f7 99d53161c3a0a903b6561b9f6c0c665b3a476401 refs/heads/master

It should be possible to keep track of the old and new OIDs via a
non-copied pointer, but I have to untangle this code a little bit more
before I can be sure.

> Thanks for carefully thinking things through.

Thanks,
Taylor
Taylor Blau Oct. 23, 2020, 8 p.m. UTC | #4
On Thu, Oct 22, 2020 at 09:03:15PM -0400, Jeff King wrote:
> We are expecting to see $ZERO_OID in that slot, even though the current
> value of the ref is "PRE" due to our reset above.
>
> Should this be $PRE_OID (we don't have that variable, but it would be
> the result of "git rev-parse PRE")?

I also skimmed past it, but we do already have $PRE_OID, which is
convenient.

> I could alternatively see an argument that <old-value> is the old-value
> that the caller asked for. So seeing $ZERO_OID is saying that the caller
> wants to move from _anything_ to $POST_OID, and we're not willing to
> tell the hook what the current value actually is.
>
> We could actually fill in the current value for zero cost. The reason we
> found this is that we have a custom patch at GitHub that fills in these
> values when we open the ref after locking.

Yup, modulo being easy for symrefs (which I talk about in [1]), but it
shouldn't be impossible.

[1]: https://lore.kernel.org/git/X5M1oe4lfkUy9lAh@nand.local

> In real usage, I'm not sure how much the distinction would matter,
> because any careful caller would provide a non-zero "old" value. And if
> that doesn't match the current value, we'd reject the transaction before
> we even hit the hook, I think. It's only the fact that the update-ref
> calls are sloppy and do not provide an expected old value that it even
> matters.
>
> So I wonder if:
>
> diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
> index f6e741c6c0..8155522a1a 100755
> --- a/t/t1416-ref-transaction-hooks.sh
> +++ b/t/t1416-ref-transaction-hooks.sh
> @@ -9,6 +9,7 @@ test_expect_success setup '
>  	test_commit PRE &&
>  	PRE_OID=$(git rev-parse PRE) &&
>  	test_commit POST &&
> +	PRE_OID=$(git rev-parse PRE) &&
>  	POST_OID=$(git rev-parse POST)
>  '
>
> @@ -52,10 +53,10 @@ test_expect_success 'hook gets all queued updates in prepared state' '
>  		fi
>  	EOF
>  	cat >expect <<-EOF &&
> -		$ZERO_OID $POST_OID HEAD
> -		$ZERO_OID $POST_OID refs/heads/master
> +		$PRE_OID $POST_OID HEAD
> +		$PRE_OID $POST_OID refs/heads/master
>  	EOF
> -	git update-ref HEAD POST <<-EOF &&
> +	git update-ref HEAD POST POST <<-EOF &&

This should be "git update-ref HEAD POST PRE", since PRE is the before
state.

> would be a step forward. This isn't changing the actual behavior,
> obviously. It's just tweaking the test so that it tests the more likely
> real-world case.  But we'd possibly want to actually change the behavior
> to always send the actual ref value to the hook.

I have to look at the issue in [1] a little bit more to determine
whether or not it requires major surgery. If it does, then I'd be fine
going forward with just your patch to the tests. If it doesn't, then
updating the refs machinery to invoke the hook with the before OIDs
correctly even for symrefs seems sensible to me.

> -Peff

Thanks,
Taylor
Taylor Blau Oct. 23, 2020, 10:07 p.m. UTC | #5
On Fri, Oct 23, 2020 at 03:57:21PM -0400, Taylor Blau wrote:
> It should be possible to keep track of the old and new OIDs via a
> non-copied pointer, but I have to untangle this code a little bit more
> before I can be sure.

This may be both easier and harder than I was imagining ;-). In the
easier direction, the following patch is sufficient to get the tests
passing again:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 04e85e7002..744c93b7ff 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2311,12 +2311,11 @@ static int split_symref_update(struct ref_update *update,
 	new_update->parent_update = update;

 	/*
-	 * Change the symbolic ref update to log only. Also, it
-	 * doesn't need to check its old OID value, as that will be
-	 * done when new_update is processed.
+	 * Change the symbolic ref update to log only. Though we don't
+	 * need to check its old OID value, leave REF_HAVE_OLD alone so
+	 * we can propagate it to the ref transaction hook.
 	 */
 	update->flags |= REF_LOG_ONLY | REF_NO_DEREF;
-	update->flags &= ~REF_HAVE_OLD;

 	/*
 	 * Add the referent. This insertion is O(N) in the transaction

But, it also means that we're now needlessly re-verifying the before
state of symrefs (or so I think, I haven't yet proved it one way or the
other).

So, I need to look into that before deciding if this is a good direction
to go.

Thanks,
Taylor
Patrick Steinhardt Oct. 26, 2020, 7:43 a.m. UTC | #6
On Thu, Oct 22, 2020 at 08:59:23PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > So I wonder if:
> >
> > diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
> > index f6e741c6c0..8155522a1a 100755
> > --- a/t/t1416-ref-transaction-hooks.sh
> > +++ b/t/t1416-ref-transaction-hooks.sh
> > @@ -9,6 +9,7 @@ test_expect_success setup '
> >  	test_commit PRE &&
> >  	PRE_OID=$(git rev-parse PRE) &&
> >  	test_commit POST &&
> > +	PRE_OID=$(git rev-parse PRE) &&
> >  	POST_OID=$(git rev-parse POST)
> >  '
> >  
> > @@ -52,10 +53,10 @@ test_expect_success 'hook gets all queued updates in prepared state' '
> >  		fi
> >  	EOF
> >  	cat >expect <<-EOF &&
> > -		$ZERO_OID $POST_OID HEAD
> > -		$ZERO_OID $POST_OID refs/heads/master
> > +		$PRE_OID $POST_OID HEAD
> > +		$PRE_OID $POST_OID refs/heads/master
> >  	EOF
> > -	git update-ref HEAD POST <<-EOF &&
> > +	git update-ref HEAD POST POST <<-EOF &&
> >  		update HEAD $ZERO_OID $POST_OID
> >  		update refs/heads/master $ZERO_OID $POST_OID
> >  	EOF
> > @@ -75,10 +76,10 @@ test_expect_success 'hook gets all queued updates in committed state' '
> >  		fi
> >  	EOF
> >  	cat >expect <<-EOF &&
> > -		$ZERO_OID $POST_OID HEAD
> > -		$ZERO_OID $POST_OID refs/heads/master
> > +		$PRE_OID $POST_OID HEAD
> > +		$PRE_OID $POST_OID refs/heads/master
> >  	EOF
> > -	git update-ref HEAD POST &&
> > +	git update-ref HEAD POST PRE &&
> >  	test_cmp expect actual
> >  '
> 
> I think that makes a lot of sense.  In addition, 
> 
> > ...  But we'd possibly want to actually change the behavior
> > to always send the actual ref value to the hook.
> 
> I offhand do not see a reason why we shouldn't do that.
> 
> Thanks for carefully thinking things through.

Thanks for having a look! I agree, changing this seems sensible to me.
In the end, the intention of the hook is to have a single script which
is able to verify all reference updates, no matter where they come from.
And behaving differently based on whether the user passed the zero OID
or not doesn't seem useful to me in that context.

We should also a better job than I did and describe what the old OID
actually is in the documentation.

@Taylor, given that you've already dug into the code: do you already
have plans to post a patch for this?

Patrick
Taylor Blau Oct. 26, 2020, 11:52 p.m. UTC | #7
On Mon, Oct 26, 2020 at 08:43:03AM +0100, Patrick Steinhardt wrote:
> @Taylor, given that you've already dug into the code: do you already
> have plans to post a patch for this?

You are likely in a better position to do that than I am. I am
unfamiliar enough with the refs.c code to feel confident that my change
is correct, let alone working. The combination of REF_HAVE_OLD, the lock
OID, the update OID, and so on is very puzzling.

Ordinarily, I'd be happy to post a patch after familiarizing myself, but
right now I don't have the time. So, I may come back to this in six
months, but I certainly won't feel bad if you beat me to it ;-).

In the meantime, I'd be fine to apply Peff's patch with some fix-ups,
maybe something like what's below the scissors line.

Taylor

--- >8 ---

Subject: [PATCH] t1416: specify pre-state for ref-updates

The ref-transaction hook documentation says that the expected format for
each line is:

  <old-value> SP <new-value> SP <ref-name> LF

without defining what <old-value> is. It could either be the current
state of the refs (after locking, but before committing the
transaction), or the optional caller-provided pre-state.

If <old-value> is to mean the caller-provided pre-state, than $ZERO_OID
could be taken to mean that the update is allowed to take place without
requiring the ref to be at some state. On the other hand, if <old-value>
is taken to mean "the current value of the reference", then that
requires a behavior change.

But that may only be semi-realistic, since any careful callers are
likely to pass a pre-state around anyway, and failing to meet that
pre-state means the hook will not even be invoked.

So, tweak the tests to more closely match how callers will actually
invoke this hook by providing a pre-state explicitly and then asserting
that it made its way down to the ref-transaction hook.

If we do decide to go further and implement a behavior change, it would
make sense to modify the tests to instead look something like:

  for before in "$PRE" ""
  do
    cat >expect <<-EOF &&
      $ZERO_OID $POST_OID HEAD
      $ZERO_OID $POST_OID refs/heads/master
      $PRE_OID $POST_OID HEAD
      $PRE_OID $POST_OID refs/heads/master
    EOF
    git reset --hard $PRE &&
    git update-ref HEAD POST $before &&
    test_cmp expect actual
  done

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t1416-ref-transaction-hooks.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
index f6e741c6c0..74f94e293c 100755
--- a/t/t1416-ref-transaction-hooks.sh
+++ b/t/t1416-ref-transaction-hooks.sh
@@ -52,10 +52,10 @@ test_expect_success 'hook gets all queued updates in prepared state' '
 		fi
 	EOF
 	cat >expect <<-EOF &&
-		$ZERO_OID $POST_OID HEAD
-		$ZERO_OID $POST_OID refs/heads/master
+		$PRE_OID $POST_OID HEAD
+		$PRE_OID $POST_OID refs/heads/master
 	EOF
-	git update-ref HEAD POST <<-EOF &&
+	git update-ref HEAD POST PRE <<-EOF &&
 		update HEAD $ZERO_OID $POST_OID
 		update refs/heads/master $ZERO_OID $POST_OID
 	EOF
@@ -75,10 +75,10 @@ test_expect_success 'hook gets all queued updates in committed state' '
 		fi
 	EOF
 	cat >expect <<-EOF &&
-		$ZERO_OID $POST_OID HEAD
-		$ZERO_OID $POST_OID refs/heads/master
+		$PRE_OID $POST_OID HEAD
+		$PRE_OID $POST_OID refs/heads/master
 	EOF
-	git update-ref HEAD POST &&
+	git update-ref HEAD POST PRE &&
 	test_cmp expect actual
 '

--
2.29.1.9.g605042ee00
Jeff King Oct. 27, 2020, 5:37 a.m. UTC | #8
On Mon, Oct 26, 2020 at 07:52:23PM -0400, Taylor Blau wrote:

> On Mon, Oct 26, 2020 at 08:43:03AM +0100, Patrick Steinhardt wrote:
> > @Taylor, given that you've already dug into the code: do you already
> > have plans to post a patch for this?
> 
> You are likely in a better position to do that than I am. I am
> unfamiliar enough with the refs.c code to feel confident that my change
> is correct, let alone working. The combination of REF_HAVE_OLD, the lock
> OID, the update OID, and so on is very puzzling.
> 
> Ordinarily, I'd be happy to post a patch after familiarizing myself, but
> right now I don't have the time. So, I may come back to this in six
> months, but I certainly won't feel bad if you beat me to it ;-).
> 
> In the meantime, I'd be fine to apply Peff's patch with some fix-ups,
> maybe something like what's below the scissors line.

Thanks for moving this forward. I'm definitely OK with leaving the code
for now and just tightening the test. But there is one curiosity, still.
Your patch tweaks two tests:

> @@ -52,10 +52,10 @@ test_expect_success 'hook gets all queued updates in prepared state' '
> [...]
> @@ -75,10 +75,10 @@ test_expect_success 'hook gets all queued updates in committed state' '

but there's a third one, which checks the hook's view of the "aborted"
state. And there we _do_ pass in an expected state which is not
$PRE_OID. But it's $ZERO_OID, so we can't tell if if the hook is getting
what we passed in, or some sensible value.

If we instead do this:

diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
index 74f94e293c..a7ed983d3c 100755
--- a/t/t1416-ref-transaction-hooks.sh
+++ b/t/t1416-ref-transaction-hooks.sh
@@ -94,14 +94,15 @@ test_expect_success 'hook gets all queued updates in aborted state' '
 			done >actual
 		fi
 	EOF
+	nonsense_oid=$(echo $ZERO_OID | tr 0 a) &&
 	cat >expect <<-EOF &&
-		$ZERO_OID $POST_OID HEAD
-		$ZERO_OID $POST_OID refs/heads/master
+		$nonsense_oid $POST_OID HEAD
+		$nonsense_oid $POST_OID refs/heads/master
 	EOF
 	git update-ref --stdin <<-EOF &&
 		start
-		update HEAD POST $ZERO_OID
-		update refs/heads/master POST $ZERO_OID
+		update HEAD POST $nonsense_oid
+		update refs/heads/master POST $nonsense_oid
 		abort
 	EOF
 	test_cmp expect actual

Then test still passes, because it's passing along the value we
provided. And I think that's the only sensible thing we can do: show the
hook the proposed update. There's little point in telling it what the
actual ref values were, I would think.

So I think it's worth adding in to the test, but:

  - probably that meaning of old-value needs to be discussed in more
    detail in the hook documentation, because it feels like it differs a
    bit in the "aborted" case versus the "committed" case

  - we'd want to make sure this keeps passing if further changes to the
    code are made to support the case without an expected state
    specified (and not, say, accidentally passing $PRE_OID to the hook)

  - we'd likewise want eventually a test for the case without an
    expected state (which I guess would actually pass $ZERO_OID to the
    hook). Of course we're using a mismatch of the expected value as the
    reason to abort, so we'd have to find another reliable way to make
    the transaction abort. :) Perhaps a refname with illegal characters,
    or trying to create "foo/bar" when "foo" exists (or vice versa).

Most of that can be bumped to later, but I think it's worth squashing
something like the hunk above into the patch you posted.

-Peff
Patrick Steinhardt Oct. 28, 2020, 6:22 p.m. UTC | #9
On Mon, Oct 26, 2020 at 07:52:23PM -0400, Taylor Blau wrote:
> On Mon, Oct 26, 2020 at 08:43:03AM +0100, Patrick Steinhardt wrote:
> > @Taylor, given that you've already dug into the code: do you already
> > have plans to post a patch for this?
> 
> You are likely in a better position to do that than I am. I am
> unfamiliar enough with the refs.c code to feel confident that my change
> is correct, let alone working. The combination of REF_HAVE_OLD, the lock
> OID, the update OID, and so on is very puzzling.
> 
> Ordinarily, I'd be happy to post a patch after familiarizing myself, but
> right now I don't have the time. So, I may come back to this in six
> months, but I certainly won't feel bad if you beat me to it ;-).
> 
> In the meantime, I'd be fine to apply Peff's patch with some fix-ups,
> maybe something like what's below the scissors line.

Fair enough, let's do it like this and submit the test change first.
I'll try to squeeze in doing the hook change soonish, but I'm currently
lacking time myself. So no promise I'll get to it soonish,
unfortunately.

Patrick
diff mbox series

Patch

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 81f2a87e88..642471109f 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -404,6 +404,35 @@  Both standard output and standard error output are forwarded to
 `git send-pack` on the other end, so you can simply `echo` messages
 for the user.
 
+ref-transaction
+~~~~~~~~~~~~~~~
+
+This hook is invoked by any Git command that performs reference
+updates. It executes whenever a reference transaction is prepared,
+committed or aborted and may thus get called multiple times.
+
+The hook takes exactly one argument, which is the current state the
+given reference transaction is in:
+
+    - "prepared": All reference updates have been queued to the
+      transaction and references were locked on disk.
+
+    - "committed": The reference transaction was committed and all
+      references now have their respective new value.
+
+    - "aborted": The reference transaction was aborted, no changes
+      were performed and the locks have been released.
+
+For each reference update that was added to the transaction, the hook
+receives on standard input a line of the format:
+
+  <old-value> SP <new-value> SP <ref-name> LF
+
+The exit status of the hook is ignored for any state except for the
+"prepared" state. In the "prepared" state, a non-zero exit status will
+cause the transaction to be aborted. The hook will not be called with
+"aborted" state in that case.
+
 push-to-checkout
 ~~~~~~~~~~~~~~~~
 
diff --git a/refs.c b/refs.c
index 224ff66c7b..f05295f503 100644
--- a/refs.c
+++ b/refs.c
@@ -9,6 +9,7 @@ 
 #include "iterator.h"
 #include "refs.h"
 #include "refs/refs-internal.h"
+#include "run-command.h"
 #include "object-store.h"
 #include "object.h"
 #include "tag.h"
@@ -16,6 +17,7 @@ 
 #include "worktree.h"
 #include "argv-array.h"
 #include "repository.h"
+#include "sigchain.h"
 
 /*
  * List of all available backends
@@ -1986,10 +1988,65 @@  int ref_update_reject_duplicates(struct string_list *refnames,
 	return 0;
 }
 
+static const char hook_not_found;
+static const char *hook;
+
+static int run_transaction_hook(struct ref_transaction *transaction,
+				const char *state)
+{
+	struct child_process proc = CHILD_PROCESS_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = 0, i;
+
+	if (hook == &hook_not_found)
+		return ret;
+	if (!hook)
+		hook = find_hook("reference-transaction");
+	if (!hook) {
+		hook = &hook_not_found;
+		return ret;
+	}
+
+	argv_array_pushl(&proc.args, hook, state, NULL);
+	proc.in = -1;
+	proc.stdout_to_stderr = 1;
+	proc.trace2_hook_name = "reference-transaction";
+
+	ret = start_command(&proc);
+	if (ret)
+		return ret;
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+
+		strbuf_reset(&buf);
+		strbuf_addf(&buf, "%s %s %s\n",
+			    oid_to_hex(&update->old_oid),
+			    oid_to_hex(&update->new_oid),
+			    update->refname);
+
+		if (write_in_full(proc.in, buf.buf, buf.len) < 0) {
+			if (errno != EPIPE)
+				ret = -1;
+			break;
+		}
+	}
+
+	close(proc.in);
+	sigchain_pop(SIGPIPE);
+	strbuf_release(&buf);
+
+	ret |= finish_command(&proc);
+	return ret;
+}
+
 int ref_transaction_prepare(struct ref_transaction *transaction,
 			    struct strbuf *err)
 {
 	struct ref_store *refs = transaction->ref_store;
+	int ret;
 
 	switch (transaction->state) {
 	case REF_TRANSACTION_OPEN:
@@ -2012,7 +2069,17 @@  int ref_transaction_prepare(struct ref_transaction *transaction,
 		return -1;
 	}
 
-	return refs->be->transaction_prepare(refs, transaction, err);
+	ret = refs->be->transaction_prepare(refs, transaction, err);
+	if (ret)
+		return ret;
+
+	ret = run_transaction_hook(transaction, "prepared");
+	if (ret) {
+		ref_transaction_abort(transaction, err);
+		die(_("ref updates aborted by hook"));
+	}
+
+	return 0;
 }
 
 int ref_transaction_abort(struct ref_transaction *transaction,
@@ -2036,6 +2103,8 @@  int ref_transaction_abort(struct ref_transaction *transaction,
 		break;
 	}
 
+	run_transaction_hook(transaction, "aborted");
+
 	ref_transaction_free(transaction);
 	return ret;
 }
@@ -2064,7 +2133,10 @@  int ref_transaction_commit(struct ref_transaction *transaction,
 		break;
 	}
 
-	return refs->be->transaction_finish(refs, transaction, err);
+	ret = refs->be->transaction_finish(refs, transaction, err);
+	if (!ret)
+		run_transaction_hook(transaction, "committed");
+	return ret;
 }
 
 int refs_verify_refname_available(struct ref_store *refs,
diff --git a/t/perf/p1400-update-ref.sh b/t/perf/p1400-update-ref.sh
new file mode 100755
index 0000000000..d275a81248
--- /dev/null
+++ b/t/perf/p1400-update-ref.sh
@@ -0,0 +1,32 @@ 
+#!/bin/sh
+
+test_description="Tests performance of update-ref"
+
+. ./perf-lib.sh
+
+test_perf_fresh_repo
+
+test_expect_success "setup" '
+	test_commit PRE &&
+	test_commit POST &&
+	printf "create refs/heads/%d PRE\n" $(test_seq 1000) >create &&
+	printf "update refs/heads/%d POST PRE\n" $(test_seq 1000) >update &&
+	printf "delete refs/heads/%d POST\n" $(test_seq 1000) >delete
+'
+
+test_perf "update-ref" '
+	for i in $(test_seq 1000)
+	do
+		git update-ref refs/heads/branch PRE &&
+		git update-ref refs/heads/branch POST PRE &&
+		git update-ref -d refs/heads/branch
+	done
+'
+
+test_perf "update-ref --stdin" '
+	git update-ref --stdin <create &&
+	git update-ref --stdin <update &&
+	git update-ref --stdin <delete
+'
+
+test_done
diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
new file mode 100755
index 0000000000..da58d867a5
--- /dev/null
+++ b/t/t1416-ref-transaction-hooks.sh
@@ -0,0 +1,109 @@ 
+#!/bin/sh
+
+test_description='reference transaction hooks'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	mkdir -p .git/hooks &&
+	test_commit PRE &&
+	test_commit POST &&
+	POST_OID=$(git rev-parse POST)
+'
+
+test_expect_success 'hook allows updating ref if successful' '
+	test_when_finished "rm .git/hooks/reference-transaction" &&
+	git reset --hard PRE &&
+	write_script .git/hooks/reference-transaction <<-\EOF &&
+		echo "$*" >>actual
+	EOF
+	cat >expect <<-EOF &&
+		prepared
+		committed
+	EOF
+	git update-ref HEAD POST &&
+	test_cmp expect actual
+'
+
+test_expect_success 'hook aborts updating ref in prepared state' '
+	test_when_finished "rm .git/hooks/reference-transaction" &&
+	git reset --hard PRE &&
+	write_script .git/hooks/reference-transaction <<-\EOF &&
+		if test "$1" = prepared
+		then
+			exit 1
+		fi
+	EOF
+	test_must_fail git update-ref HEAD POST 2>err &&
+	test_i18ngrep "ref updates aborted by hook" err
+'
+
+test_expect_success 'hook gets all queued updates in prepared state' '
+	test_when_finished "rm .git/hooks/reference-transaction actual" &&
+	git reset --hard PRE &&
+	write_script .git/hooks/reference-transaction <<-\EOF &&
+		if test "$1" = prepared
+		then
+			while read -r line
+			do
+				printf "%s\n" "$line"
+			done >actual
+		fi
+	EOF
+	cat >expect <<-EOF &&
+		$ZERO_OID $POST_OID HEAD
+		$ZERO_OID $POST_OID refs/heads/master
+	EOF
+	git update-ref HEAD POST <<-EOF &&
+		update HEAD $ZERO_OID $POST_OID
+		update refs/heads/master $ZERO_OID $POST_OID
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'hook gets all queued updates in committed state' '
+	test_when_finished "rm .git/hooks/reference-transaction actual" &&
+	git reset --hard PRE &&
+	write_script .git/hooks/reference-transaction <<-\EOF &&
+		if test "$1" = committed
+		then
+			while read -r line
+			do
+				printf "%s\n" "$line"
+			done >actual
+		fi
+	EOF
+	cat >expect <<-EOF &&
+		$ZERO_OID $POST_OID HEAD
+		$ZERO_OID $POST_OID refs/heads/master
+	EOF
+	git update-ref HEAD POST &&
+	test_cmp expect actual
+'
+
+test_expect_success 'hook gets all queued updates in aborted state' '
+	test_when_finished "rm .git/hooks/reference-transaction actual" &&
+	git reset --hard PRE &&
+	write_script .git/hooks/reference-transaction <<-\EOF &&
+		if test "$1" = aborted
+		then
+			while read -r line
+			do
+				printf "%s\n" "$line"
+			done >actual
+		fi
+	EOF
+	cat >expect <<-EOF &&
+		$ZERO_OID $POST_OID HEAD
+		$ZERO_OID $POST_OID refs/heads/master
+	EOF
+	git update-ref --stdin <<-EOF &&
+		start
+		update HEAD POST $ZERO_OID
+		update refs/heads/master POST $ZERO_OID
+		abort
+	EOF
+	test_cmp expect actual
+'
+
+test_done