diff mbox series

refs: honor reference-transaction semantics when deleting refs

Message ID pull.1228.git.1651676435634.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series refs: honor reference-transaction semantics when deleting refs | expand

Commit Message

Michael Heemskerk May 4, 2022, 3 p.m. UTC
From: Michael Heemskerk <mheemskerk@atlassian.com>

When deleting refs from the loose-files refs backend, files_delete_refs
first rewrites packed-refs if any of the to-be-deleted refs were packed
and then removes loose refs. While making those changes, it invokes the
reference-transaction hook incorrectly; a single transaction is used to
prepare and commit the changes to packed-refs, followed by another
transaction per deleted ref, each with another prepared and committed
reference-transaction hook invocation.

This behaviour is problematic for a number of reasons. First of all,
deleting a ref through `git branch -d` or `git tag -d` results in a
different sequence of reference-transaction callbacks than deleting the
same ref through `git update-ref`:

- update-ref of a loose ref: aborted, prepared, committed
- update-ref of a packed ref: prepared, prepared, committed, commited
- branch -d: prepared, committed, aborted, prepared, committed

The bigger problem is that the packed-refs update is committed before
the prepared reference-transaction invocation is done for the loose
ref. Returning a non-zero exit code from the second
reference-transaction callback leads to an invalid sequence of
reference-transaction callbacks:

1. prepared - hook returns 0, so the change is allowed to go through.
2. committed - git informs the hook that the changes are now final,
   but are they really? Any loose refs may still survive if the
   subsequent prepared callback is canceled.
3. aborted - 'fake' invocation from the packed-transaction because the
   ref does not exist in packed-refs.
4. prepared - hook returns 1, so the change should be blocked.
5. aborted - git informs the hook that it has rolled back the change,
   but it really hasn't; packed-refs has already been rewritten and
   if the ref only existed as a packed ref, it is now gone.

The changes to the reference-transaction invocations that were planned
for git 2.36 but have been reverted make the problem more pronounced.
Those changes suppress the 'fake' invocations for the packed-transaction
(invocations 1-3 in the above list). In that case, the prepared callback
in step 4 cannot prevent a ref from being deleted if it only existed in
packed-refs.

To address the issue, the use a separate transactions to update the
packed and loose refs has been removed from files_delete_refs. Instead,
it now uses a single transaction, queues up the refs-to-be-deleted and
relies on the standard transaction mechanism to invoke the
reference-transaction hooks as expected.

Signed-off-by: Michael Heemskerk <mheemskerk@atlassian.com>
---
    refs: honor reference-transaction semantics when deleting refs
    
    When deleting refs from the loose-files refs backend, files_delete_refs
    first rewrites packed-refs if any of the to-be-deleted refs were packed
    and then removes loose refs. While making those changes, it invokes the
    reference-transaction hook incorrectly; a single transaction is used to
    prepare and commit the changes to packed-refs, followed by another
    transaction per deleted ref, each with another prepared and committed
    reference-transaction hook invocation.
    
    This behaviour is problematic for a number of reasons. First of all,
    deleting a ref through git branch -d or git tag -d results in a
    different sequence of reference-transaction callbacks than deleting the
    same ref through git update-ref:
    
     * update-ref of a loose ref: aborted, prepared, committed
     * update-ref of a packed ref: prepared, prepared, committed, commited
     * branch -d: prepared, committed, aborted, prepared, committed
    
    The bigger problem is that the packed-refs update is committed before
    the prepared reference-transaction invocation is done for the loose ref.
    Returning a non-zero exit code from the second reference-transaction
    callback leads to an invalid sequence of reference-transaction
    callbacks:
    
     1. prepared - hook returns 0, so the change is allowed to go through.
     2. committed - git informs the hook that the changes are now final, but
        are they really? Any loose refs may still survive if the subsequent
        prepared callback is canceled.
     3. aborted - 'fake' invocation from the packed-transaction because the
        ref does not exist in packed-refs.
     4. prepared - hook returns 1, so the change should be blocked.
     5. aborted - git informs the hook that it has rolled back the change,
        but it really hasn't; packed-refs has already been rewritten and if
        the ref only existed as a packed ref, it is now gone.
    
    The changes to the reference-transaction invocations that were planned
    for git 2.36 but have been reverted make the problem more pronounced.
    Those changes suppress the 'fake' invocations for the packed-transaction
    (invocations 1-3 in the above list). In that case, the prepared callback
    in step 4 cannot prevent a ref from being deleted if it only existed in
    packed-refs.
    
    To address the issue, the use a separate transactions to update the
    packed and loose refs has been removed from files_delete_refs. Instead,
    it now uses a single transaction, queues up the refs-to-be-deleted and
    relies on the standard transaction mechanism to invoke the
    reference-transaction hooks as expected.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1228%2Fmheemskerk%2Ffiles-delete-ref-reference-transaction-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1228/mheemskerk/files-delete-ref-reference-transaction-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1228

 refs/files-backend.c             | 33 +++++++--------
 t/t1416-ref-transaction-hooks.sh | 70 ++++++++++++++++++++++++++++++++
 t/t5510-fetch.sh                 |  6 +--
 3 files changed, 87 insertions(+), 22 deletions(-)


base-commit: 0f828332d5ac36fc63b7d8202652efa152809856

Comments

Patrick Steinhardt May 9, 2022, 6:23 a.m. UTC | #1
On Wed, May 04, 2022 at 03:00:35PM +0000, Michael Heemskerk via GitGitGadget wrote:
> From: Michael Heemskerk <mheemskerk@atlassian.com>
> 
> When deleting refs from the loose-files refs backend, files_delete_refs
> first rewrites packed-refs if any of the to-be-deleted refs were packed
> and then removes loose refs. While making those changes, it invokes the
> reference-transaction hook incorrectly; a single transaction is used to
> prepare and commit the changes to packed-refs, followed by another
> transaction per deleted ref, each with another prepared and committed
> reference-transaction hook invocation.
> 
> This behaviour is problematic for a number of reasons. First of all,
> deleting a ref through `git branch -d` or `git tag -d` results in a
> different sequence of reference-transaction callbacks than deleting the
> same ref through `git update-ref`:
> 
> - update-ref of a loose ref: aborted, prepared, committed
> - update-ref of a packed ref: prepared, prepared, committed, commited
> - branch -d: prepared, committed, aborted, prepared, committed
> 
> The bigger problem is that the packed-refs update is committed before
> the prepared reference-transaction invocation is done for the loose
> ref. Returning a non-zero exit code from the second
> reference-transaction callback leads to an invalid sequence of
> reference-transaction callbacks:
> 
> 1. prepared - hook returns 0, so the change is allowed to go through.
> 2. committed - git informs the hook that the changes are now final,
>    but are they really? Any loose refs may still survive if the
>    subsequent prepared callback is canceled.
> 3. aborted - 'fake' invocation from the packed-transaction because the
>    ref does not exist in packed-refs.
> 4. prepared - hook returns 1, so the change should be blocked.
> 5. aborted - git informs the hook that it has rolled back the change,
>    but it really hasn't; packed-refs has already been rewritten and
>    if the ref only existed as a packed ref, it is now gone.
> 
> The changes to the reference-transaction invocations that were planned
> for git 2.36 but have been reverted make the problem more pronounced.
> Those changes suppress the 'fake' invocations for the packed-transaction
> (invocations 1-3 in the above list). In that case, the prepared callback
> in step 4 cannot prevent a ref from being deleted if it only existed in
> packed-refs.
> 
> To address the issue, the use a separate transactions to update the
> packed and loose refs has been removed from files_delete_refs. Instead,
> it now uses a single transaction, queues up the refs-to-be-deleted and
> relies on the standard transaction mechanism to invoke the
> reference-transaction hooks as expected.
> 
> Signed-off-by: Michael Heemskerk <mheemskerk@atlassian.com>
> ---
>     refs: honor reference-transaction semantics when deleting refs
>     
>     When deleting refs from the loose-files refs backend, files_delete_refs
>     first rewrites packed-refs if any of the to-be-deleted refs were packed
>     and then removes loose refs. While making those changes, it invokes the
>     reference-transaction hook incorrectly; a single transaction is used to
>     prepare and commit the changes to packed-refs, followed by another
>     transaction per deleted ref, each with another prepared and committed
>     reference-transaction hook invocation.
>     
>     This behaviour is problematic for a number of reasons. First of all,
>     deleting a ref through git branch -d or git tag -d results in a
>     different sequence of reference-transaction callbacks than deleting the
>     same ref through git update-ref:
>     
>      * update-ref of a loose ref: aborted, prepared, committed
>      * update-ref of a packed ref: prepared, prepared, committed, commited
>      * branch -d: prepared, committed, aborted, prepared, committed
>     
>     The bigger problem is that the packed-refs update is committed before
>     the prepared reference-transaction invocation is done for the loose ref.
>     Returning a non-zero exit code from the second reference-transaction
>     callback leads to an invalid sequence of reference-transaction
>     callbacks:
>     
>      1. prepared - hook returns 0, so the change is allowed to go through.
>      2. committed - git informs the hook that the changes are now final, but
>         are they really? Any loose refs may still survive if the subsequent
>         prepared callback is canceled.
>      3. aborted - 'fake' invocation from the packed-transaction because the
>         ref does not exist in packed-refs.
>      4. prepared - hook returns 1, so the change should be blocked.
>      5. aborted - git informs the hook that it has rolled back the change,
>         but it really hasn't; packed-refs has already been rewritten and if
>         the ref only existed as a packed ref, it is now gone.
>     
>     The changes to the reference-transaction invocations that were planned
>     for git 2.36 but have been reverted make the problem more pronounced.
>     Those changes suppress the 'fake' invocations for the packed-transaction
>     (invocations 1-3 in the above list). In that case, the prepared callback
>     in step 4 cannot prevent a ref from being deleted if it only existed in
>     packed-refs.
>     
>     To address the issue, the use a separate transactions to update the
>     packed and loose refs has been removed from files_delete_refs. Instead,
>     it now uses a single transaction, queues up the refs-to-be-deleted and
>     relies on the standard transaction mechanism to invoke the
>     reference-transaction hooks as expected.
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1228%2Fmheemskerk%2Ffiles-delete-ref-reference-transaction-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1228/mheemskerk/files-delete-ref-reference-transaction-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1228
> 
>  refs/files-backend.c             | 33 +++++++--------
>  t/t1416-ref-transaction-hooks.sh | 70 ++++++++++++++++++++++++++++++++
>  t/t5510-fetch.sh                 |  6 +--
>  3 files changed, 87 insertions(+), 22 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 8db7882aacb..5c23277eda7 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1265,44 +1265,39 @@ static int files_delete_refs(struct ref_store *ref_store, const char *msg,
>  	struct files_ref_store *refs =
>  		files_downcast(ref_store, REF_STORE_WRITE, "delete_refs");
>  	struct strbuf err = STRBUF_INIT;
> -	int i, result = 0;
> +	int i;
> +	struct ref_transaction *transaction;
>  
>  	if (!refnames->nr)
>  		return 0;
>  
> -	if (packed_refs_lock(refs->packed_ref_store, 0, &err))
> -		goto error;
> -
> -	if (refs_delete_refs(refs->packed_ref_store, msg, refnames, flags)) {
> -		packed_refs_unlock(refs->packed_ref_store);
> +	transaction = ref_store_transaction_begin(&refs->base, &err);
> +	if (!transaction)
>  		goto error;
> -	}
> -
> -	packed_refs_unlock(refs->packed_ref_store);
>  
>  	for (i = 0; i < refnames->nr; i++) {
>  		const char *refname = refnames->items[i].string;
> -
> -		if (refs_delete_ref(&refs->base, msg, refname, NULL, flags))
> -			result |= error(_("could not remove reference %s"), refname);
> +		if (ref_transaction_delete(transaction, refname, NULL,
> +					   flags, msg, &err))
> +			goto error;
>  	}
>  
> +	if (ref_transaction_commit(transaction, &err))
> +		goto error;
> +
> +	ref_transaction_free(transaction);
>  	strbuf_release(&err);
> -	return result;
> +	return 0;
>  
>  error:
> -	/*
> -	 * If we failed to rewrite the packed-refs file, then it is
> -	 * unsafe to try to remove loose refs, because doing so might
> -	 * expose an obsolete packed value for a reference that might
> -	 * even point at an object that has been garbage collected.
> -	 */
>  	if (refnames->nr == 1)
>  		error(_("could not delete reference %s: %s"),
>  		      refnames->items[0].string, err.buf);
>  	else
>  		error(_("could not delete references: %s"), err.buf);
>  
> +	if (transaction)
> +		ref_transaction_free(transaction);
>  	strbuf_release(&err);
>  	return -1;
>  }

I really like these changes given that they simplify things, but I
wonder whether we can do them. In the preimage we're eagerly removing
loose refs: any error encountered when deleting a reference is recorded,
but we keep on trying to remove the other refs, as well. With the new
behaviour we now create a single transaction for all refs and try to
commit it. This also means that we'll abort the transaction when locking
any of the refs fails, which is a change in behaviour.

The current behaviour is explicitly documented in `refs.h:refs_delete_refs()`:

    /*
     * Delete the specified references. If there are any problems, emit
     * errors but attempt to keep going (i.e., the deletes are not done in
     * an all-or-nothing transaction). msg and flags are passed through to
     * ref_transaction_delete().
     */
    int refs_delete_refs(struct ref_store *refs, const char *msg,
                         struct string_list *refnames, unsigned int flags);

There are multiple callsites of this function via `delete_refs()`. Now
honestly, most of these callsites look somewhat broken:

    - `bisect.c` simply does its best to clean up bisect state. This
      usecase looks fine to me.

    - `builtin/branch.c` reports the branches as deleted even if
      `delete_refs()` failed.

    - `builtin/remote.c` also misreports the deleted branches for the
      `prune` verb. The `rm` verb looks alright: if deletion of any
      branch failed then it doesn't prune the remote's config in the end
      and reports an error.

    - `builtin/fetch.c` also misreports deleted branches with `--prune`.

So most of these commands incorrectly handle the case where only a
subset of branches has been deleted. This raises the question whether
the interface provided by `refs_delete_refs()` is actually sensible if
it's so easy to get wrong. It doesn't even report which branches could
be removed and which couldn't. Furthermore, the question is whether new
backends like the reftable backend which write all refs into a single
slice would actually even be in a position to efficiently retain
semantics of this function.

I'm torn. There are valid usecases for eagerly deleting refs even if a
subset of deletions failed, making this change a tough sell, but most of
the callsites don't actually handle this correctly in the first place.

> diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
> index 27731722a5b..e3d4fe624f7 100755
> --- a/t/t1416-ref-transaction-hooks.sh
> +++ b/t/t1416-ref-transaction-hooks.sh
> @@ -133,4 +133,74 @@ test_expect_success 'interleaving hook calls succeed' '
>  	test_cmp expect target-repo.git/actual
>  '
>  
> +test_expect_success 'hook allows deleting loose ref if successful' '
> +	test_when_finished "rm actual" &&
> +	git branch to-be-deleted $PRE_OID &&
> +	test_hook reference-transaction <<-\EOF &&
> +		echo "$*" >>actual
> +	EOF
> +	cat >expect <<-EOF &&
> +		aborted
> +		prepared
> +		committed
> +	EOF
> +	git branch -d to-be-deleted &&
> +	test_cmp expect actual &&
> +	test_must_fail git rev-parse refs/heads/to-be-deleted
> +'
> +
> +test_expect_success 'hook allows deleting packed ref if successful' '
> +	test_when_finished "rm actual" &&
> +	git branch to-be-deleted $PRE_OID &&
> +	git pack-refs --all --prune &&
> +	test_hook reference-transaction <<-\EOF &&
> +		echo "$*" >>actual
> +	EOF
> +	cat >expect <<-EOF &&
> +		prepared
> +		prepared
> +		committed
> +		committed
> +	EOF
> +	git branch -d to-be-deleted &&
> +	test_cmp expect actual &&
> +	test_must_fail git rev-parse refs/heads/to-be-deleted
> +'
> +
> +test_expect_success 'hook aborts deleting loose ref in prepared state' '
> +	test_when_finished "rm actual" &&
> +	test_when_finished "git branch -d to-be-deleted" &&
> +	git branch to-be-deleted $PRE_OID &&
> +	test_hook reference-transaction <<-\EOF &&
> +		echo "$*" >>actual
> +		exit 1
> +	EOF
> +	cat >expect <<-EOF &&
> +		aborted
> +		prepared
> +		aborted
> +	EOF

It's not really clear why we get the first "aborted" result here. I know
that it is because we didn't queue up any refs to the packed backend,
and thus we don't even try to prepare the transaction. But it's likely
confusing to a reader and might thus warrant a comment. On the other
hand this is going away anyway if and when my patch series lands again
that stops calling the hook from the nested packed backend.

> +	test_must_fail git branch -d to-be-deleted &&
> +	test_cmp expect actual &&
> +	git rev-parse refs/heads/to-be-deleted
> +'
> +
> +test_expect_success 'hook aborts deleting packed ref in prepared state' '
> +	test_when_finished "rm actual" &&
> +	test_when_finished "git branch -d to-be-deleted" &&
> +	git branch to-be-deleted $PRE_OID &&
> +	git pack-refs --all --prune &&
> +	test_hook reference-transaction <<-\EOF &&
> +		echo "$*" >>actual
> +		exit 1
> +	EOF
> +	cat >expect <<-EOF &&
> +		prepared
> +		aborted
> +	EOF
> +	test_must_fail git branch -d to-be-deleted &&
> +	test_cmp expect actual &&
> +	git rev-parse refs/heads/to-be-deleted
> +'
> +

From your description one more interesting case is when the packed-refs
transaction is committed, but the loose-refs backend is aborted. It's a
bit harder to test given that we have no way to indicate what backend
the reftx hook is being invoked from though.

One thing that worries me is that these patches kind of set the current
behaviour of driving the reftx hook via both packed and loose backend
into stone. My patch series that got reverted is going to change that
behaviour though so that we don't execute the hook from the packed
backend, and consequentially we'd have to change all these tests again
to match the new behaviour. This makes it a lot harder to argue though
that we can safely switch to the new behaviour without breaking any
assumptions when we even codified our current assumptions into tests.

Taking a step back I wonder whether my previous approach to just hide
the hook for the packed backend was the right thing to do though. An
alternative would be to instead expose additional information to the
hook so that it can decide by itself whether it wants to execute the
hook or not. This could e.g. trivially be done by exposing a new
"GIT_REFS_BACKEND" environment variable to the reftx hook that gets set
to either "packed-refs", "loose-refs" or "reftables" depending on which
backend is currently in use. Equipped with this information a hook
script can then easily ignore any updates to the packed-refs file by
itself without having to change the way it is invoked right now and thus
we wouldn't regress any currently existing hooks.

>  test_done
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index 4620f0ca7fa..8b09a99c2e8 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -169,10 +169,10 @@ test_expect_success REFFILES 'fetch --prune fails to delete branches' '
>  	git clone . prune-fail &&
>  	cd prune-fail &&
>  	git update-ref refs/remotes/origin/extrabranch main &&
> -	: this will prevent --prune from locking packed-refs for deleting refs, but adding loose refs still succeeds  &&
> -	>.git/packed-refs.new &&
> +	: this will prevent --prune from locking refs/remotes/origin/extra for deletion &&
> +	>.git/refs/remotes/origin/extrabranch.lock &&
>  
> -	test_must_fail git fetch --prune origin
> +	test_must_fail git fetch --prune origin > outputs 2> errors

It would be nice to have an explanation why exactly this change is
needed, and why it is fine that the visible behaviour changes here.

Patrick

>  
>  test_expect_success 'fetch --atomic works with a single branch' '
> 
> base-commit: 0f828332d5ac36fc63b7d8202652efa152809856
> -- 
> gitgitgadget
Michael Heemskerk May 9, 2022, 10:18 a.m. UTC | #2
On Mon, May 9, 2022 at 8:23 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> I really like these changes given that they simplify things, but I
> wonder whether we can do them. In the preimage we're eagerly removing
> loose refs: any error encountered when deleting a reference is recorded,
> but we keep on trying to remove the other refs, as well. With the new
> behaviour we now create a single transaction for all refs and try to
> commit it. This also means that we'll abort the transaction when locking
> any of the refs fails, which is a change in behaviour.
>
> The current behaviour is explicitly documented in `refs.h:refs_delete_refs()`:
>
>     /*
>      * Delete the specified references. If there are any problems, emit
>      * errors but attempt to keep going (i.e., the deletes are not done in
>      * an all-or-nothing transaction). msg and flags are passed through to
>      * ref_transaction_delete().
>      */
>     int refs_delete_refs(struct ref_store *refs, const char *msg,
>                          struct string_list *refnames, unsigned int flags);
>
> There are multiple callsites of this function via `delete_refs()`. Now
> honestly, most of these callsites look somewhat broken:
>
>     - `bisect.c` simply does its best to clean up bisect state. This
>       usecase looks fine to me.
>
>     - `builtin/branch.c` reports the branches as deleted even if
>       `delete_refs()` failed.
>
>     - `builtin/remote.c` also misreports the deleted branches for the
>       `prune` verb. The `rm` verb looks alright: if deletion of any
>       branch failed then it doesn't prune the remote's config in the end
>       and reports an error.
>
>     - `builtin/fetch.c` also misreports deleted branches with `--prune`.
>
> So most of these commands incorrectly handle the case where only a
> subset of branches has been deleted. This raises the question whether
> the interface provided by `refs_delete_refs()` is actually sensible if
> it's so easy to get wrong. It doesn't even report which branches could
> be removed and which couldn't. Furthermore, the question is whether new
> backends like the reftable backend which write all refs into a single
> slice would actually even be in a position to efficiently retain
> semantics of this function.
>
> I'm torn. There are valid usecases for eagerly deleting refs even if a
> subset of deletions failed, making this change a tough sell, but most of
> the callsites don't actually handle this correctly in the first place.

Exactly; this is the reason that I initially suggested fixing the issue by
just removing the upfront rewrite of packed-refs. With that rewrite removed,
the refs-to-be-deleted are deleted in individual transactions, which may or
may not rewrite packed-refs. The downside, as you correctly pointed out,
is that we could end up rewriting packed-refs multiple times, which could
come at a significant performance penalty for repositories with large
packed-refs files.

Unfortunately, the current approach of updating packed-refs in one
transaction and updating the loose refs in individual transactions doesn't
work either.

So what are our options?

- delete each of the refs in a separate transaction and pay a (potentially
  significant) performance penalty in repositories with large packed-refs
  files when deleting many refs. I'll note that this scenario is similar to
  deleting a set of refs through a non-atomic push.
- switch to a single transaction and update refs.h:refs_delete_refs to use
  an all-or-nothing approach (the approach I've taken in my patch).
- improve the reference-transaction mechanism to support the
  'batch-of-transactions' mode more efficiently. If I remember correctly,
  something like that has been suggested before, but I'm not sure if it's
  actually been built or spiked. In this batch-of-transactions mode, git
  could try to prepare all refs, and only invoke the hook for the refs that
  could be successfully prepared. The hook should then be able to
  reject individual ref updates, and git would then apply only the
  non-rejected ref updates. While such a change would make many
  scenarios where multiple refs are being updated more efficient, it's also
  a much bigger change that's hard to make without breaking the current
  reference-transaction protocol.

Sticking to a transaction per ref and rewriting packed-refs multiple times
is the safer option. Deleting the refs in a single transaction is the more
performant option, but changes the behavior. A stay/stale lock file could
then make it impossible to remove a remote, or to prune /refs/remotes/
refs.

My suggestion would be to stick to a transaction per ref and pay the
same performance penalty as you'd get when deleting many refs through
a non-atomic push.

> > +test_expect_success 'hook aborts deleting loose ref in prepared state' '
> > +     test_when_finished "rm actual" &&
> > +     test_when_finished "git branch -d to-be-deleted" &&
> > +     git branch to-be-deleted $PRE_OID &&
> > +     test_hook reference-transaction <<-\EOF &&
> > +             echo "$*" >>actual
> > +             exit 1
> > +     EOF
> > +     cat >expect <<-EOF &&
> > +             aborted
> > +             prepared
> > +             aborted
> > +     EOF
>
> It's not really clear why we get the first "aborted" result here. I know
> that it is because we didn't queue up any refs to the packed backend,
> and thus we don't even try to prepare the transaction. But it's likely
> confusing to a reader and might thus warrant a comment. On the other
> hand this is going away anyway if and when my patch series lands again
> that stops calling the hook from the nested packed backend.

I can add a comment explaining why we get the 'aborted' callback
and also add a comment that that 'aborted' callback is expected to be
removed in an upcoming version.

> > +     test_must_fail git branch -d to-be-deleted &&
> > +     test_cmp expect actual &&
> > +     git rev-parse refs/heads/to-be-deleted
> > +'
> > +
> > +test_expect_success 'hook aborts deleting packed ref in prepared state' '
> > +     test_when_finished "rm actual" &&
> > +     test_when_finished "git branch -d to-be-deleted" &&
> > +     git branch to-be-deleted $PRE_OID &&
> > +     git pack-refs --all --prune &&
> > +     test_hook reference-transaction <<-\EOF &&
> > +             echo "$*" >>actual
> > +             exit 1
> > +     EOF
> > +     cat >expect <<-EOF &&
> > +             prepared
> > +             aborted
> > +     EOF
> > +     test_must_fail git branch -d to-be-deleted &&
> > +     test_cmp expect actual &&
> > +     git rev-parse refs/heads/to-be-deleted
> > +'
> > +
>
> From your description one more interesting case is when the packed-refs
> transaction is committed, but the loose-refs backend is aborted. It's a
> bit harder to test given that we have no way to indicate what backend
> the reftx hook is being invoked from though.

That is indeed the more interesting case, and exactly the case we saw
fail when the packed-refs reference-transactions callbacks had been
suppressed in 2.36.0-rc1 and -rc2. But as you said, they're pretty hard to
test. In Bitbucket's tests we now verify from the reference-transaction
callback that the ref-updates have not been committed (yet) when the
prepared or aborted callback is received, and that they _have_ been
committed when the committed callback is received. Perhaps a similar
approach could be used in git's reference-transaction tests for more
thorough testing.

> One thing that worries me is that these patches kind of set the current
> behaviour of driving the reftx hook via both packed and loose backend
> into stone. My patch series that got reverted is going to change that
> behaviour though so that we don't execute the hook from the packed
> backend, and consequentially we'd have to change all these tests again
> to match the new behaviour. This makes it a lot harder to argue though
> that we can safely switch to the new behaviour without breaking any
> assumptions when we even codified our current assumptions into tests.

The counter argument to that is that it's kind of scary if you could remove
 half of the reference-transaction callbacks without needing to update a
test. I'd rather have tests that verify current behavior that you need to
update when you intentionally change the behavior, then not have those
tests?

> Taking a step back I wonder whether my previous approach to just hide
> the hook for the packed backend was the right thing to do though. An
> alternative would be to instead expose additional information to the
> hook so that it can decide by itself whether it wants to execute the
> hook or not. This could e.g. trivially be done by exposing a new
> "GIT_REFS_BACKEND" environment variable to the reftx hook that gets set
> to either "packed-refs", "loose-refs" or "reftables" depending on which
> backend is currently in use. Equipped with this information a hook
> script can then easily ignore any updates to the packed-refs file by
> itself without having to change the way it is invoked right now and thus
> we wouldn't regress any currently existing hooks.

From the reference-transaction hook writer's perspective, the backend
involved is an implementation detail that the hook should not have to
care about. Getting separate callbacks for the loose and the packed
backends makes it a lot harder to write a good reference-transaction
hook, especially when the callbacks differ if a ref is packed or not.

IMO, there should really be a "files" backend transaction, that internally
takes care of locking individual refs and possibly "packed-refs" in case
of a deletion. In addition, not getting "artificial" packed callbacks also
saves us a few extra reference-transaction callbacks when deleting refs.
Even if those take only a few ms per invocation, when deleting hundreds
of refs, it's still something that we'd like to avoid if we can.

> >  test_done
> > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> > index 4620f0ca7fa..8b09a99c2e8 100755
> > --- a/t/t5510-fetch.sh
> > +++ b/t/t5510-fetch.sh
> > @@ -169,10 +169,10 @@ test_expect_success REFFILES 'fetch --prune fails to delete branches' '
> >       git clone . prune-fail &&
> >       cd prune-fail &&
> >       git update-ref refs/remotes/origin/extrabranch main &&
> > -     : this will prevent --prune from locking packed-refs for deleting refs, but adding loose refs still succeeds  &&
> > -     >.git/packed-refs.new &&
> > +     : this will prevent --prune from locking refs/remotes/origin/extra for deletion &&
> > +     >.git/refs/remotes/origin/extrabranch.lock &&
> >
> > -     test_must_fail git fetch --prune origin
> > +     test_must_fail git fetch --prune origin > outputs 2> errors
>
> It would be nice to have an explanation why exactly this change is
> needed, and why it is fine that the visible behaviour changes here.

The  test was relying on the fact that the packed-refs file was locked when
git fetch --prune is called. My patch replaces that unconditional lock with
a transaction. The transaction only takes out the lock if packed-refs
actually needs to be updated, and since the ref being pruned only exists
as a loose ref, git fetch --prune failed to fail.

I've replaced the lock on packed-refs with a lock on the loose ref that
should be pruned.

Michael
Patrick Steinhardt May 9, 2022, 1:48 p.m. UTC | #3
On Mon, May 09, 2022 at 12:18:51PM +0200, Michael Heemskerk wrote:
> On Mon, May 9, 2022 at 8:23 AM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > I really like these changes given that they simplify things, but I
> > wonder whether we can do them. In the preimage we're eagerly removing
> > loose refs: any error encountered when deleting a reference is recorded,
> > but we keep on trying to remove the other refs, as well. With the new
> > behaviour we now create a single transaction for all refs and try to
> > commit it. This also means that we'll abort the transaction when locking
> > any of the refs fails, which is a change in behaviour.
> >
> > The current behaviour is explicitly documented in `refs.h:refs_delete_refs()`:
> >
> >     /*
> >      * Delete the specified references. If there are any problems, emit
> >      * errors but attempt to keep going (i.e., the deletes are not done in
> >      * an all-or-nothing transaction). msg and flags are passed through to
> >      * ref_transaction_delete().
> >      */
> >     int refs_delete_refs(struct ref_store *refs, const char *msg,
> >                          struct string_list *refnames, unsigned int flags);
> >
> > There are multiple callsites of this function via `delete_refs()`. Now
> > honestly, most of these callsites look somewhat broken:
> >
> >     - `bisect.c` simply does its best to clean up bisect state. This
> >       usecase looks fine to me.
> >
> >     - `builtin/branch.c` reports the branches as deleted even if
> >       `delete_refs()` failed.
> >
> >     - `builtin/remote.c` also misreports the deleted branches for the
> >       `prune` verb. The `rm` verb looks alright: if deletion of any
> >       branch failed then it doesn't prune the remote's config in the end
> >       and reports an error.
> >
> >     - `builtin/fetch.c` also misreports deleted branches with `--prune`.
> >
> > So most of these commands incorrectly handle the case where only a
> > subset of branches has been deleted. This raises the question whether
> > the interface provided by `refs_delete_refs()` is actually sensible if
> > it's so easy to get wrong. It doesn't even report which branches could
> > be removed and which couldn't. Furthermore, the question is whether new
> > backends like the reftable backend which write all refs into a single
> > slice would actually even be in a position to efficiently retain
> > semantics of this function.
> >
> > I'm torn. There are valid usecases for eagerly deleting refs even if a
> > subset of deletions failed, making this change a tough sell, but most of
> > the callsites don't actually handle this correctly in the first place.
> 
> Exactly; this is the reason that I initially suggested fixing the issue by
> just removing the upfront rewrite of packed-refs. With that rewrite removed,
> the refs-to-be-deleted are deleted in individual transactions, which may or
> may not rewrite packed-refs. The downside, as you correctly pointed out,
> is that we could end up rewriting packed-refs multiple times, which could
> come at a significant performance penalty for repositories with large
> packed-refs files.
> 
> Unfortunately, the current approach of updating packed-refs in one
> transaction and updating the loose refs in individual transactions doesn't
> work either.
> 
> So what are our options?
> 
> - delete each of the refs in a separate transaction and pay a (potentially
>   significant) performance penalty in repositories with large packed-refs
>   files when deleting many refs. I'll note that this scenario is similar to
>   deleting a set of refs through a non-atomic push.

By chance I know that on gitlab.com we've had huge performance issues
with access patterns like this. In some cases with repos that have
millions of refs I have seen that certain actions were completely
dominated by rewriting the packed-refs file. So I'd rather avoid going
there given that it would be a serious performance regression.

> - switch to a single transaction and update refs.h:refs_delete_refs to use
>   an all-or-nothing approach (the approach I've taken in my patch).

This is the easiest approach, but also backwards incompatible as I've
layed out above. I personally wouldn't mind, and as said I think that
most of the usecases are broken anyway because of implementations which
misreport the case where we only partially deleted refs. But it's not a
decision we can make without more discussion, I guess.

> - improve the reference-transaction mechanism to support the
>   'batch-of-transactions' mode more efficiently. If I remember correctly,
>   something like that has been suggested before, but I'm not sure if it's
>   actually been built or spiked. In this batch-of-transactions mode, git
>   could try to prepare all refs, and only invoke the hook for the refs that
>   could be successfully prepared. The hook should then be able to
>   reject individual ref updates, and git would then apply only the
>   non-rejected ref updates. While such a change would make many
>   scenarios where multiple refs are being updated more efficient, it's also
>   a much bigger change that's hard to make without breaking the current
>   reference-transaction protocol.

Peff and I had such a discussion in the past, and having transactions
with eager semantics would also fix some edge cases we have in other
parts of the codebase. You already mentioned git-fetch(1) without
`--atomic`, and there are likely others that could benefit.

I did have a go at this several times already, but none of the
approaches I took resulted in a clean-to-use API. I consider this to be
the best solution, but also the hardest one to implement, unfortunately.

> Sticking to a transaction per ref and rewriting packed-refs multiple times
> is the safer option. Deleting the refs in a single transaction is the more
> performant option, but changes the behavior. A stay/stale lock file could
> then make it impossible to remove a remote, or to prune /refs/remotes/
> refs.
> 
> My suggestion would be to stick to a transaction per ref and pay the
> same performance penalty as you'd get when deleting many refs through
> a non-atomic push.

I don't think we should do this. The one-transaction-per-ref issue is
why I added the `--atomic` flag to git-fetch(1) in the first place,
because it has been biting us in repos with millions of refs. Quoting
583bc41923 (fetch: make `--atomic` flag cover pruning of refs,
2022-02-17), which introduces this flag:

    Benchmark 1: git fetch --prune --atomic +refs/*:refs/* (HEAD~)
      Time (mean ± σ):      2.366 s ±  0.021 s    [User: 0.858 s, System: 1.508 s]
      Range (min … max):    2.328 s …  2.407 s    10 runs

    Benchmark 2: git fetch --prune --atomic +refs/*:refs/* (HEAD)
      Time (mean ± σ):      1.369 s ±  0.017 s    [User: 0.715 s, System: 0.641 s]
      Range (min … max):    1.346 s …  1.400 s    10 runs

    Summary
      'git fetch --prune --atomic +refs/*:refs/* (HEAD)' ran
        1.73 ± 0.03 times faster than 'git fetch --prune --atomic +refs/*:refs/* (HEAD~)'

And this is only with a 100k references. The issue gets a lot worse when
you're in the millions of refs, both because you have 10x more refs to
prune, but also because the packed-refs file is 10x larger. So
performance doesn't scale linearly with the number of refs, but is a
product of file size and number of refs.

[snip]
> > One thing that worries me is that these patches kind of set the current
> > behaviour of driving the reftx hook via both packed and loose backend
> > into stone. My patch series that got reverted is going to change that
> > behaviour though so that we don't execute the hook from the packed
> > backend, and consequentially we'd have to change all these tests again
> > to match the new behaviour. This makes it a lot harder to argue though
> > that we can safely switch to the new behaviour without breaking any
> > assumptions when we even codified our current assumptions into tests.
> 
> The counter argument to that is that it's kind of scary if you could remove
>  half of the reference-transaction callbacks without needing to update a
> test. I'd rather have tests that verify current behavior that you need to
> update when you intentionally change the behavior, then not have those
> tests?

Oh, I certainly agree there. And we have stated in the past that the
reftx hook is _not_ providing a stable interface with regards to exactly
when it is called. We only want to guarantee that it is called on a
transaction.

> > Taking a step back I wonder whether my previous approach to just hide
> > the hook for the packed backend was the right thing to do though. An
> > alternative would be to instead expose additional information to the
> > hook so that it can decide by itself whether it wants to execute the
> > hook or not. This could e.g. trivially be done by exposing a new
> > "GIT_REFS_BACKEND" environment variable to the reftx hook that gets set
> > to either "packed-refs", "loose-refs" or "reftables" depending on which
> > backend is currently in use. Equipped with this information a hook
> > script can then easily ignore any updates to the packed-refs file by
> > itself without having to change the way it is invoked right now and thus
> > we wouldn't regress any currently existing hooks.
> 
> From the reference-transaction hook writer's perspective, the backend
> involved is an implementation detail that the hook should not have to
> care about. Getting separate callbacks for the loose and the packed
> backends makes it a lot harder to write a good reference-transaction
> hook, especially when the callbacks differ if a ref is packed or not.
>
> IMO, there should really be a "files" backend transaction, that internally
> takes care of locking individual refs and possibly "packed-refs" in case
> of a deletion. In addition, not getting "artificial" packed callbacks also
> saves us a few extra reference-transaction callbacks when deleting refs.
> Even if those take only a few ms per invocation, when deleting hundreds
> of refs, it's still something that we'd like to avoid if we can.

True. In this case it's just an alternative way to tackle this specific
issue given that one wouldn't typically care about the packed-refs
changes. And it's very much an artifact of the files-backend really
being two backends in one. In any case I can see arguments for both
approaches, and ultimately I agree that it would be best if the
files-backend behaved as if it was a single one.

> > >  test_done
> > > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> > > index 4620f0ca7fa..8b09a99c2e8 100755
> > > --- a/t/t5510-fetch.sh
> > > +++ b/t/t5510-fetch.sh
> > > @@ -169,10 +169,10 @@ test_expect_success REFFILES 'fetch --prune fails to delete branches' '
> > >       git clone . prune-fail &&
> > >       cd prune-fail &&
> > >       git update-ref refs/remotes/origin/extrabranch main &&
> > > -     : this will prevent --prune from locking packed-refs for deleting refs, but adding loose refs still succeeds  &&
> > > -     >.git/packed-refs.new &&
> > > +     : this will prevent --prune from locking refs/remotes/origin/extra for deletion &&
> > > +     >.git/refs/remotes/origin/extrabranch.lock &&
> > >
> > > -     test_must_fail git fetch --prune origin
> > > +     test_must_fail git fetch --prune origin > outputs 2> errors
> >
> > It would be nice to have an explanation why exactly this change is
> > needed, and why it is fine that the visible behaviour changes here.
> 
> The  test was relying on the fact that the packed-refs file was locked when
> git fetch --prune is called. My patch replaces that unconditional lock with
> a transaction. The transaction only takes out the lock if packed-refs
> actually needs to be updated, and since the ref being pruned only exists
> as a loose ref, git fetch --prune failed to fail.
> 
> I've replaced the lock on packed-refs with a lock on the loose ref that
> should be pruned.

Ah, thanks. I was missing the fact that the lock for the packed-refs
file was taken unconditionally before.

Patrick
diff mbox series

Patch

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8db7882aacb..5c23277eda7 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1265,44 +1265,39 @@  static int files_delete_refs(struct ref_store *ref_store, const char *msg,
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_WRITE, "delete_refs");
 	struct strbuf err = STRBUF_INIT;
-	int i, result = 0;
+	int i;
+	struct ref_transaction *transaction;
 
 	if (!refnames->nr)
 		return 0;
 
-	if (packed_refs_lock(refs->packed_ref_store, 0, &err))
-		goto error;
-
-	if (refs_delete_refs(refs->packed_ref_store, msg, refnames, flags)) {
-		packed_refs_unlock(refs->packed_ref_store);
+	transaction = ref_store_transaction_begin(&refs->base, &err);
+	if (!transaction)
 		goto error;
-	}
-
-	packed_refs_unlock(refs->packed_ref_store);
 
 	for (i = 0; i < refnames->nr; i++) {
 		const char *refname = refnames->items[i].string;
-
-		if (refs_delete_ref(&refs->base, msg, refname, NULL, flags))
-			result |= error(_("could not remove reference %s"), refname);
+		if (ref_transaction_delete(transaction, refname, NULL,
+					   flags, msg, &err))
+			goto error;
 	}
 
+	if (ref_transaction_commit(transaction, &err))
+		goto error;
+
+	ref_transaction_free(transaction);
 	strbuf_release(&err);
-	return result;
+	return 0;
 
 error:
-	/*
-	 * If we failed to rewrite the packed-refs file, then it is
-	 * unsafe to try to remove loose refs, because doing so might
-	 * expose an obsolete packed value for a reference that might
-	 * even point at an object that has been garbage collected.
-	 */
 	if (refnames->nr == 1)
 		error(_("could not delete reference %s: %s"),
 		      refnames->items[0].string, err.buf);
 	else
 		error(_("could not delete references: %s"), err.buf);
 
+	if (transaction)
+		ref_transaction_free(transaction);
 	strbuf_release(&err);
 	return -1;
 }
diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
index 27731722a5b..e3d4fe624f7 100755
--- a/t/t1416-ref-transaction-hooks.sh
+++ b/t/t1416-ref-transaction-hooks.sh
@@ -133,4 +133,74 @@  test_expect_success 'interleaving hook calls succeed' '
 	test_cmp expect target-repo.git/actual
 '
 
+test_expect_success 'hook allows deleting loose ref if successful' '
+	test_when_finished "rm actual" &&
+	git branch to-be-deleted $PRE_OID &&
+	test_hook reference-transaction <<-\EOF &&
+		echo "$*" >>actual
+	EOF
+	cat >expect <<-EOF &&
+		aborted
+		prepared
+		committed
+	EOF
+	git branch -d to-be-deleted &&
+	test_cmp expect actual &&
+	test_must_fail git rev-parse refs/heads/to-be-deleted
+'
+
+test_expect_success 'hook allows deleting packed ref if successful' '
+	test_when_finished "rm actual" &&
+	git branch to-be-deleted $PRE_OID &&
+	git pack-refs --all --prune &&
+	test_hook reference-transaction <<-\EOF &&
+		echo "$*" >>actual
+	EOF
+	cat >expect <<-EOF &&
+		prepared
+		prepared
+		committed
+		committed
+	EOF
+	git branch -d to-be-deleted &&
+	test_cmp expect actual &&
+	test_must_fail git rev-parse refs/heads/to-be-deleted
+'
+
+test_expect_success 'hook aborts deleting loose ref in prepared state' '
+	test_when_finished "rm actual" &&
+	test_when_finished "git branch -d to-be-deleted" &&
+	git branch to-be-deleted $PRE_OID &&
+	test_hook reference-transaction <<-\EOF &&
+		echo "$*" >>actual
+		exit 1
+	EOF
+	cat >expect <<-EOF &&
+		aborted
+		prepared
+		aborted
+	EOF
+	test_must_fail git branch -d to-be-deleted &&
+	test_cmp expect actual &&
+	git rev-parse refs/heads/to-be-deleted
+'
+
+test_expect_success 'hook aborts deleting packed ref in prepared state' '
+	test_when_finished "rm actual" &&
+	test_when_finished "git branch -d to-be-deleted" &&
+	git branch to-be-deleted $PRE_OID &&
+	git pack-refs --all --prune &&
+	test_hook reference-transaction <<-\EOF &&
+		echo "$*" >>actual
+		exit 1
+	EOF
+	cat >expect <<-EOF &&
+		prepared
+		aborted
+	EOF
+	test_must_fail git branch -d to-be-deleted &&
+	test_cmp expect actual &&
+	git rev-parse refs/heads/to-be-deleted
+'
+
 test_done
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 4620f0ca7fa..8b09a99c2e8 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -169,10 +169,10 @@  test_expect_success REFFILES 'fetch --prune fails to delete branches' '
 	git clone . prune-fail &&
 	cd prune-fail &&
 	git update-ref refs/remotes/origin/extrabranch main &&
-	: this will prevent --prune from locking packed-refs for deleting refs, but adding loose refs still succeeds  &&
-	>.git/packed-refs.new &&
+	: this will prevent --prune from locking refs/remotes/origin/extra for deletion &&
+	>.git/refs/remotes/origin/extrabranch.lock &&
 
-	test_must_fail git fetch --prune origin
+	test_must_fail git fetch --prune origin > outputs 2> errors
 '
 
 test_expect_success 'fetch --atomic works with a single branch' '