diff mbox series

[v4,27/28] reftable: fixup for new base topic 2/3

Message ID patch-v4-27.28-c4f9fb42d9e-20210823T120208Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series Support reftable ref backend for Git | expand

Commit Message

Ævar Arnfjörð Bjarmason Aug. 23, 2021, 12:12 p.m. UTC
Since my "refs API: remove OID argument to reflog_expire()" we don't
have the "oid" as part of the reflog_expire() signature. Instead the
reflog_expire() should pass the OID of the tip of the "locked" ref to
the prepare_fn().

In files_reflog_expire() we do that by getting the OID from
lock_ref_oid_basic(). I'm assuming (but am not familiar enough with
reftable...) that by the time we get here we've got a locked ref
already in some way, so let's just use
refs_resolve_ref_unsafe_with_errno() to lookup the current OID of that
presumably-locked ref.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 refs/reftable-backend.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Han-Wen Nienhuys Aug. 30, 2021, 12:32 p.m. UTC | #1
On Mon, Aug 23, 2021 at 2:13 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> Since my "refs API: remove OID argument to reflog_expire()" we don't
> have the "oid" as part of the reflog_expire() signature. Instead the
> reflog_expire() should pass the OID of the tip of the "locked" ref to
> the prepare_fn().
>
> In files_reflog_expire() we do that by getting the OID from
> lock_ref_oid_basic(). I'm assuming (but am not familiar enough with
> reftable...) that by the time we get here we've got a locked ref
> already in some way, so let's just use
> refs_resolve_ref_unsafe_with_errno() to lookup the current OID of that
> presumably-locked ref.

I quickly looked at the files code, but I don't understand why the OID
needs to be passed-in (before your refactoring): in builtin/reflog.c
(before), the current OID is read, with any protection. This means
that its value can't be trusted.

After your refactoring, you lock the ref. I guess in the files backend
this protects against non-atomic update of (ref, reflog) racing with a
concurrent reflog expiry? In reftable, the (ref,reflog) update is
atomic, so there is no need for locking to properly sequence
operations.
Ævar Arnfjörð Bjarmason Aug. 30, 2021, 1:01 p.m. UTC | #2
On Mon, Aug 30 2021, Han-Wen Nienhuys wrote:

> On Mon, Aug 23, 2021 at 2:13 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> Since my "refs API: remove OID argument to reflog_expire()" we don't
>> have the "oid" as part of the reflog_expire() signature. Instead the
>> reflog_expire() should pass the OID of the tip of the "locked" ref to
>> the prepare_fn().
>>
>> In files_reflog_expire() we do that by getting the OID from
>> lock_ref_oid_basic(). I'm assuming (but am not familiar enough with
>> reftable...) that by the time we get here we've got a locked ref
>> already in some way, so let's just use
>> refs_resolve_ref_unsafe_with_errno() to lookup the current OID of that
>> presumably-locked ref.
>
> I quickly looked at the files code, but I don't understand why the OID
> needs to be passed-in (before your refactoring): in builtin/reflog.c
> (before), the current OID is read, with any protection. This means
> that its value can't be trusted.
>
> After your refactoring, you lock the ref. I guess in the files backend
> this protects against non-atomic update of (ref, reflog) racing with a
> concurrent reflog expiry? In reftable, the (ref,reflog) update is
> atomic, so there is no need for locking to properly sequence
> operations.

Before my [1] we'd do:

 1. Read the OID for the branch in builtin/reflog.c
 2. Pass it to refs/files-backend.c, it would lock at that OID (or fail if it changed)
 3. Pass the OID with the now-locked OID to the builtin/reflog.c code

After that [1] we do:

 1. Lock the branch in refs/files-backend.c
 2. Pass the OID with the now-locked OID to the builtin/reflog.c code
    (ditto #3 above)

Whatever reftable itself does with updates doesn't change that we need
to do that 2nd step of passing the OID to builtin/reflog.c, as it makes
use of it.

That code is a bit confusing, if you want to understand it I recommend
reading it at the tip of my yet-unsubmitted
avar/cleanup-refs-api-and-reflog-expire-post-no-eisdir, it makes the
control flow a lot cleaner.

So as far as what we do here is concerned, we're stuck with the refs
files backend inherently wanting to pass "I locked this for you, here's
the OID".

I guess it could also pass "OK, now go ahead and expire" and pass no
OID. We'd then in builtin/reflog.c lookup the current OID for the logic
there, but just having the reftable backend appease the common API by
looking up the OID and passing it seemed like the most straightforward
thing to do.

I haven't tested this or thought it through, but I don't understand how
reftable isn't going to race in reflog expiry then. Sure, the ref/reflog
update itself is atomic, so it won't suffer from the needing-a-lock
problem of two concurrent file backend writers doing the equivalent of:

    echo $NEW_SHA1 >.git/refs/heads/some-branch

But we will need at least the optimistic locking of code like
builtin/reflog.c wanting to do an expiry, and deciding whether to do
that expiry based on a given state of the ref/reflog. I.e. we don't
want:

    1. Start reflog expiry
    2. Code in builtin/reflog.c looks up the OID
    3. Code in builtin/reflog.c decides whether expire the reflog
    4. Concurrent with #4, another writer updates the ref/reflog pair
    5. Code in builtin/reflog.c says "OK, expire it!"
    6. Reftable queues a delete/prune of the reflog per #5.

This would be a sequente of updates to the ref/reflog, none of whom were
racy as far as the reftable semantics itself are concerneb, but where
we'd do the wrong thing because the writer thought we had A when we
really had B. So we need the equivalent of an "git update-ref" with the
"<oldvalue>".

Is there a better way to do that in this case that I'm missing?

1. https://lore.kernel.org/git/patch-v5-09.13-aba12606cea-20210823T113115Z-avarab@gmail.com/
2. https://github.com/avar/git/compare/avar/files-backend-remove-dead-errno-eisdir-6...avar:avar/cleanup-refs-api-and-reflog-expire-post-no-eisdir
Han-Wen Nienhuys Aug. 30, 2021, 1:48 p.m. UTC | #3
On Mon, Aug 30, 2021 at 3:22 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> But we will need at least the optimistic locking of code like
> builtin/reflog.c wanting to do an expiry, and deciding whether to do
> that expiry based on a given state of the ref/reflog. I.e. we don't
> want:
>
>     1. Start reflog expiry
>     2. Code in builtin/reflog.c looks up the OID
>     3. Code in builtin/reflog.c decides whether expire the reflog
>     4. Concurrent with #4, another writer updates the ref/reflog pair
>     5. Code in builtin/reflog.c says "OK, expire it!"
>     6. Reftable queues a delete/prune of the reflog per #5.
>
> This would be a sequente of updates to the ref/reflog, none of whom were
> racy as far as the reftable semantics itself are concerneb, but where
> we'd do the wrong thing because the writer thought we had A when we
> really had B. So we need the equivalent of an "git update-ref" with the
> "<oldvalue>".
>
> Is there a better way to do that in this case that I'm missing?

I spent some more time looking at builtin/reflog.c, but I am still not
100% sure what the locking is used for.

From a quick glance, the OID goes into tip_commit, and tip_commit goes
into a reachable list (?). The reachable list is then for something,
but I can't really tell what.

In your example with 1.-6., it's still not clear to me what the
undesired behavior is precisely. If the reflog is pruned in #6, is the
worry that the update in #4 is pruned immediately after being
effected?
Ævar Arnfjörð Bjarmason Aug. 30, 2021, 2:03 p.m. UTC | #4
On Mon, Aug 30 2021, Han-Wen Nienhuys wrote:

> On Mon, Aug 30, 2021 at 3:22 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> But we will need at least the optimistic locking of code like
>> builtin/reflog.c wanting to do an expiry, and deciding whether to do
>> that expiry based on a given state of the ref/reflog. I.e. we don't
>> want:
>>
>>     1. Start reflog expiry
>>     2. Code in builtin/reflog.c looks up the OID
>>     3. Code in builtin/reflog.c decides whether expire the reflog
>>     4. Concurrent with #4, another writer updates the ref/reflog pair
>>     5. Code in builtin/reflog.c says "OK, expire it!"
>>     6. Reftable queues a delete/prune of the reflog per #5.
>>
>> This would be a sequente of updates to the ref/reflog, none of whom were
>> racy as far as the reftable semantics itself are concerneb, but where
>> we'd do the wrong thing because the writer thought we had A when we
>> really had B. So we need the equivalent of an "git update-ref" with the
>> "<oldvalue>".
>>
>> Is there a better way to do that in this case that I'm missing?
>
> I spent some more time looking at builtin/reflog.c, but I am still not
> 100% sure what the locking is used for.
>
> From a quick glance, the OID goes into tip_commit, and tip_commit goes
> into a reachable list (?). The reachable list is then for something,
> but I can't really tell what.
>
> In your example with 1.-6., it's still not clear to me what the
> undesired behavior is precisely. If the reflog is pruned in #6, is the
> worry that the update in #4 is pruned immediately after being
> effected?

Yes, I think so. But I'm not sure. I skimmed the code quickly today, and
when I wrote the referenced series didn't focus much on the nitty-gritty
of the builtin/reflog.c behavior other than assuring myself that I was
doing the exact same thing as before as far as its logic was concerned.

I.e. it always locked at a given OID. Before my in-flight "reflog
expire: don't lock reflogs using previously seen OI" it might not lock
but get this error:

    error: cannot lock ref '<refname>': ref '<refname>' is at <OID-A> but expected <OID-B>

But at least it wouldn't do anything, but the current code does require
the passed-in OID. See the code that needs "unreachable_expire_kind" and
"tip_commit".

Perhaps that whole thing can also be refactored somehow. If I change the
"commit = lookup_commit(the_repository, oid);" in
"reflog_expiry_prepare()" to just "commit = NULL" all tests pass, but
that might just be missing test coverage in the face of same race...
diff mbox series

Patch

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index dcc792e5e87..94917c85cf7 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1463,7 +1463,7 @@  static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
 
 static int
 git_reftable_reflog_expire(struct ref_store *ref_store, const char *refname,
-			   const struct object_id *oid, unsigned int flags,
+			   unsigned int flags,
 			   reflog_expiry_prepare_fn prepare_fn,
 			   reflog_expiry_should_prune_fn should_prune_fn,
 			   reflog_expiry_cleanup_fn cleanup_fn,
@@ -1497,6 +1497,9 @@  git_reftable_reflog_expire(struct ref_store *ref_store, const char *refname,
 	struct reftable_iterator it = { NULL };
 	struct reftable_addition *add = NULL;
 	int err = 0;
+	int ignore_errno;
+	struct object_id oid;
+
 	if (refs->err < 0) {
 		return refs->err;
 	}
@@ -1515,7 +1518,14 @@  git_reftable_reflog_expire(struct ref_store *ref_store, const char *refname,
 	if (err) {
 		goto done;
 	}
-	prepare_fn(refname, oid, policy_cb_data);
+	if (!refs_resolve_ref_unsafe_with_errno(ref_store, refname,
+					       RESOLVE_REF_READING, &oid,
+					       NULL, &ignore_errno)) {
+		err = -1;
+		goto done;
+	}
+	prepare_fn(refname, &oid, policy_cb_data);
+
 	while (1) {
 		struct reftable_log_record log = { NULL };
 		int err = reftable_iterator_next_log(&it, &log);