diff mbox series

[v2,7/7] reflog expire: don't assert the OID when locking refs

Message ID 20190314123439.4347-8-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series gc: minor code cleanup + contention fixes | expand

Commit Message

Ævar Arnfjörð Bjarmason March 14, 2019, 12:34 p.m. UTC
During reflog expiry, the cmd_reflog_expire() function first iterates
over all reflogs in logs/*, and then one-by-one acquires the lock for
each one to expire its reflog by getting a *.lock file on the
corresponding loose ref[1] (even if the actual ref is packed).

This lock is needed, but what isn't needed is locking the loose ref as
a function of the OID we found from that first iteration. By the time
we get around to re-visiting the reference some of the OIDs may have
changed.

Thus the verify_lock() function called by the lock_ref_oid_basic()
function being changed here would fail with e.g. "ref '%s' is at %s
but expected %s" if the repository was being updated concurrent to the
"reflog expire".

By not passing the OID to it we'll try to lock the reference
regardless of it last known OID. Locking as a function of the OID
would make "reflog expire" exit with a non-zero exit status under such
contention, which in turn meant that a "gc" command (which expires
reflogs before forking to the background) would encounter a hard
error.

This behavior of considering the OID when locking has been here ever
since "reflog expire" was initially implemented in 4264dc15e1 ("git
reflog expire", 2006-12-19). As seen in that simpler initial version
of the code we subsequently use the OID to inform the expiry (and
still do), but never needed to use it to lock the reference associated
with the reflog.

By locking the reference without considering what OID we last saw it
at, we won't encounter user-visible contention to the extent that
core.filesRefLockTimeout mitigates it. See 4ff0f01cb7 ("refs: retry
acquiring reference locks for 100ms", 2017-08-21).

Unfortunately this sort of probabilistic contention is hard to turn
into a test. I've tested this by running the following three subshells
in concurrent terminals:

    (
        cd /tmp &&
        rm -rf git &&
        git init git &&
        cd git &&
        while true
        do
            head -c 10 /dev/urandom | hexdump >out &&
            git add out &&
            git commit -m"out"
        done
    )

    (
        cd /tmp &&
        rm -rf git-clone &&
        git clone file:///tmp/git git-clone &&
        cd git-clone &&
        while git pull
        do
            date
        done
    )

    (
        cd /tmp/git-clone &&
        while git reflog expire --all
        do
            date
        done
    )

Before this change the "reflog expire" would fail really quickly with
a "but expected" error. After this change both the "pull" and "reflog
expire" will run for a while, but eventually fail because I get
unlucky with core.filesRefLockTimeout (the "reflog expire" is in a
really tight loop). That can be resolved by being more generous with
higher values of core.filesRefLockTimeout than the 100ms default.

1. https://public-inbox.org/git/54857871.5090805@alum.mit.edu/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Duy Nguyen March 15, 2019, 11:10 a.m. UTC | #1
On Thu, Mar 14, 2019 at 7:35 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> During reflog expiry, the cmd_reflog_expire() function first iterates
> over all reflogs in logs/*, and then one-by-one acquires the lock for
> each one to expire its reflog by getting a *.lock file on the
> corresponding loose ref[1] (even if the actual ref is packed).
>
> This lock is needed, but what isn't needed is locking the loose ref as
> a function of the OID we found from that first iteration. By the time
> we get around to re-visiting the reference some of the OIDs may have
> changed.
>
> Thus the verify_lock() function called by the lock_ref_oid_basic()
> function being changed here would fail with e.g. "ref '%s' is at %s
> but expected %s" if the repository was being updated concurrent to the
> "reflog expire".
>
> By not passing the OID to it we'll try to lock the reference
> regardless of it last known OID. Locking as a function of the OID
> would make "reflog expire" exit with a non-zero exit status under such
> contention, which in turn meant that a "gc" command (which expires
> reflogs before forking to the background) would encounter a hard
> error.

Passing NULL oid has another side(?) effect, which I don't know if it
matters at all. Before, the mustexist flag in lock_ref_oid_basic() is
true. Now it's false. This affects refs_resolve_ref_unsafe() calls in
there. But that's where I'm stuck.

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index ef053f716c..4d4d226601 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3037,7 +3037,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
>          * reference itself, plus we might need to update the
>          * reference if --updateref was specified:
>          */
> -       lock = lock_ref_oid_basic(refs, refname, oid,
> +       lock = lock_ref_oid_basic(refs, refname, NULL /* NOT oid! */,

Maybe mention this not oid in the comment block above. Or just drop
it. Reading this comment without the commit message does not really
help answer "why not oid?". Or perhaps /* don't verify oid */

>                                   NULL, NULL, REF_NO_DEREF,
>                                   &type, &err);
>         if (!lock) {
> --
Ævar Arnfjörð Bjarmason March 15, 2019, 3:52 p.m. UTC | #2
On Fri, Mar 15 2019, Duy Nguyen wrote:

> On Thu, Mar 14, 2019 at 7:35 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> During reflog expiry, the cmd_reflog_expire() function first iterates
>> over all reflogs in logs/*, and then one-by-one acquires the lock for
>> each one to expire its reflog by getting a *.lock file on the
>> corresponding loose ref[1] (even if the actual ref is packed).
>>
>> This lock is needed, but what isn't needed is locking the loose ref as
>> a function of the OID we found from that first iteration. By the time
>> we get around to re-visiting the reference some of the OIDs may have
>> changed.
>>
>> Thus the verify_lock() function called by the lock_ref_oid_basic()
>> function being changed here would fail with e.g. "ref '%s' is at %s
>> but expected %s" if the repository was being updated concurrent to the
>> "reflog expire".
>>
>> By not passing the OID to it we'll try to lock the reference
>> regardless of it last known OID. Locking as a function of the OID
>> would make "reflog expire" exit with a non-zero exit status under such
>> contention, which in turn meant that a "gc" command (which expires
>> reflogs before forking to the background) would encounter a hard
>> error.
>
> Passing NULL oid has another side(?) effect, which I don't know if it
> matters at all. Before, the mustexist flag in lock_ref_oid_basic() is
> true. Now it's false. This affects refs_resolve_ref_unsafe() calls in
> there. But that's where I'm stuck.

This case was tricky, I ended up doing the same thing in v3, but now
very carefully explain the rationale since none of it is obvious.

>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index ef053f716c..4d4d226601 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -3037,7 +3037,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
>>          * reference itself, plus we might need to update the
>>          * reference if --updateref was specified:
>>          */
>> -       lock = lock_ref_oid_basic(refs, refname, oid,
>> +       lock = lock_ref_oid_basic(refs, refname, NULL /* NOT oid! */,
>
> Maybe mention this not oid in the comment block above. Or just drop
> it. Reading this comment without the commit message does not really
> help answer "why not oid?". Or perhaps /* don't verify oid */

Also fixed.

>>                                   NULL, NULL, REF_NO_DEREF,
>>                                   &type, &err);
>>         if (!lock) {
>> --
diff mbox series

Patch

diff --git a/refs/files-backend.c b/refs/files-backend.c
index ef053f716c..4d4d226601 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3037,7 +3037,7 @@  static int files_reflog_expire(struct ref_store *ref_store,
 	 * reference itself, plus we might need to update the
 	 * reference if --updateref was specified:
 	 */
-	lock = lock_ref_oid_basic(refs, refname, oid,
+	lock = lock_ref_oid_basic(refs, refname, NULL /* NOT oid! */,
 				  NULL, NULL, REF_NO_DEREF,
 				  &type, &err);
 	if (!lock) {